Today, my friend Angjoo Kanazawa (Kim) presented her proposal on modeling the 3D pose and shape of animals from 2D images.

Link: https://talks.cs.umd.edu/talks/1040

  • Initialize camera projection matrix (Initial fitting)
  • Until convergence
    • Solve SDP
    • Rotate the convex region
    • Update camera with latest \phi
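The alternating structure of the steps above can be illustrated with a toy loop. This is only a rough sketch under several assumptions of mine, not the method from the talk: the camera is modeled as an affine camera fitted by least squares, the SDP step is replaced by a simple ridge-regularized least-squares update that pulls points toward the template, and the "rotate the convex region" step is omitted entirely. All function names (`fit_affine_camera`, `update_shape`, `alternate`) are hypothetical.

```python
import numpy as np

def fit_affine_camera(X, x):
    """Least-squares affine camera P of shape (4, 2) with x ~ [X, 1] @ P."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    P, *_ = np.linalg.lstsq(Xh, x, rcond=None)
    return P

def project(X, P):
    """Apply the affine camera: linear part P[:3], translation P[3]."""
    return X @ P[:3] + P[3]

def reprojection_error(X, P, x):
    """Sum of squared 2D residuals."""
    return float(np.sum((project(X, P) - x) ** 2))

def update_shape(template, P, x, weight):
    """Stand-in for the SDP step (assumption): per point, minimize
    ||A X_i + t - x_i||^2 + weight * ||X_i - T_i||^2 in closed form."""
    A, t = P[:3].T, P[3]                      # A is (2, 3), t is (2,)
    H = A.T @ A + weight * np.eye(3)          # same 3x3 system for all points
    rhs = (x - t) @ A + weight * template     # (n, 3), one row per point
    return np.linalg.solve(H, rhs.T).T

def objective(X, P, x, template, weight):
    """Joint objective that both alternating steps decrease."""
    return reprojection_error(X, P, x) + weight * float(np.sum((X - template) ** 2))

def alternate(template, x, weight=1.0, tol=1e-10, max_iter=100):
    """Initialize the camera, then alternate shape and camera updates."""
    X = template.copy()
    P = fit_affine_camera(X, x)               # initial fitting
    prev = objective(X, P, x, template, weight)
    for _ in range(max_iter):
        X = update_shape(template, P, x, weight)   # shape step (SDP stand-in)
        P = fit_affine_camera(X, x)                # camera step with latest shape
        cur = objective(X, P, x, template, weight)
        if prev - cur < tol:                       # converged
            break
        prev = cur
    return X, P, reprojection_error(X, P, x)
```

Each step minimizes the same joint objective over one block of variables, so the objective cannot increase from one iteration to the next. That property is what makes a simple "until convergence" test reasonable in this toy version.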

Experiments

  • 10 images of cats and horses
  • 2 annotated points
  • 1000 tetrahedral faces

Comparison with other distortion bounds

  • Input image
  • No bounds
  • Uniform bounds
  • Stiffness
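To make the idea of a distortion bound concrete, here is a small sketch of one common way to measure how much a single tetrahedron is distorted: take the singular values of the affine map between its rest and deformed shapes. The specific measure, max(σ_max, 1/σ_min), is an assumption of mine and may differ from the one used in the talk; the function names are hypothetical. A uniform bound would be a single number for every tetrahedron, while a stiffness model would supply a separate bound per tetrahedron.

```python
import numpy as np

def deformation_gradient(rest, deformed):
    """3x3 linear map F taking rest edge vectors to deformed edge vectors.
    rest, deformed: (4, 3) arrays of tetrahedron vertices."""
    Dr = (rest[1:] - rest[0]).T       # columns are rest edge vectors
    Dd = (deformed[1:] - deformed[0]).T
    return Dd @ np.linalg.inv(Dr)

def distortion(rest, deformed):
    """Assumed measure: max(sigma_max, 1 / sigma_min); equals 1 for rigid motion."""
    s = np.linalg.svd(deformation_gradient(rest, deformed), compute_uv=False)
    if s.min() == 0.0:
        return np.inf                 # the tetrahedron collapsed
    return float(max(s.max(), 1.0 / s.min()))

def within_bounds(distortions, bounds):
    """bounds may be a scalar (uniform) or one value per tetrahedron (stiffness-like)."""
    return bool(np.all(np.asarray(distortions) <= np.asarray(bounds)))
```

For example, a pure rotation of a tetrahedron gives a distortion of 1, while stretching it by a factor of 2 along one axis gives a distortion of 2. A uniform bound of 1.5 would reject the stretch everywhere, whereas per-tetrahedron bounds could allow it in flexible regions only.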

Camera estimation

Poor camera estimate when pose deviates a lot from the template

Synthesize

Goal: Deform template to match both pose and shape

Use silhouettes for guiding shape

Towards automatic single-view reconstruction

Abstract

With the rise of Augmented Reality, Virtual Reality and 3D printing, methods for acquiring 3D models from the real world are more important than ever. However, even with today’s high-quality depth sensors and sophisticated multi-view stereo algorithms, obtaining 3D models of highly non-rigid and articulated objects like live animals remains a challenge. One approach to generating 3D models is to modify an existing template 3D mesh to fit the pose and shape of similar objects in images.

Automatic or user-annotated 3D-to-2D point correspondences along with silhouettes can guide the modification of the 3D mesh. If possible, this will allow for applications where ordinary users can produce 3D models of their pets from a personal photo collection that can be readily 3D printed or used in virtual reality applications.

In order to deform the template to match the object pose and shape naturally, it is essential to have a model that spans the 3D pose and shape variations of an object class. I propose that it is possible to learn such a model from a set of annotated 2D images and a template 3D mesh. The preliminary work presents a data-driven approach that learns a class model of articulation and deformation from a set of annotated Internet images. To do so, we incorporate the idea of local stiffness, which specifies the amount of distortion allowed for a local region. Our system jointly learns the stiffness as it deforms a template 3D mesh to the pose of the objects in images. We show that this seemingly complex task can be solved with a sequence of convex optimization programs. I plan to extend this approach to model intra-class shape variations, such as fat vs. thin and tall vs. short, from annotated 2D images and silhouettes. The final model will allow for synthesizing new 3D models of the object class in realistic poses and shapes. The obtained 3D models can be rendered from many viewpoints to obtain a virtually unlimited amount of training data. This data can be used to explore deep learning methods that automatically obtain 3D-to-2D correspondences from images for single-view reconstruction and markerless motion capture from video.

Examining Committee:

Committee Chair: Dr. David Jacobs

Dept’s Representative: Dr. Tom Goldstein

Committee Member(s): Dr. Larry Davis