It was not surprising to me to learn that Richard Newcombe, Dieter Fox, and Steve Seitz won the Best Paper Award at CVPR 2015 for DynamicFusion. UW really rocks in the 3D reconstruction field.


First let’s read the abstract of the paper:

We present the first dense SLAM system capable of reconstructing non-rigidly deforming scenes in real-time, by fusing together RGBD scans captured from commodity sensors. Our DynamicFusion approach reconstructs scene geometry whilst simultaneously estimating a dense volumetric 6D motion field that warps the estimated geometry into a live frame. Like KinectFusion, our system produces increasingly denoised, detailed, and complete reconstructions as more measurements are fused, and displays the updated model in real time. Because we do not require a template or other prior scene model, the approach is applicable to a wide range of moving objects and scenes.

Here is the video with splendid results:

Overall, DynamicFusion decomposes a non-rigidly deforming scene into a latent geometric surface, reconstructed into a rigid canonical space S ⊆ R^3, and a per-frame volumetric warp field that transforms that surface into the live frame. The steps are:

  1. Estimation of the volumetric model-to-frame warp field parameters
  2. Fusion of the live frame depth map into the canonical space via the estimated warp field
  3. Adaptation of the warp-field structure to capture newly added geometry
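The three steps above can be read as a per-frame loop. Here is a hypothetical skeleton of that loop; every function name and placeholder body is mine for illustration, not the authors' code:

```python
def estimate_warp(warp_field, depth_map):
    """Step 1: solve for the model-to-frame warp parameters (placeholder)."""
    return warp_field  # the real system runs a non-linear optimization here

def fuse_depth(tsdf, depth_map, warp_field):
    """Step 2: fuse the live depth map into the canonical TSDF (placeholder)."""
    tsdf["frames_fused"] += 1  # stand-in for the per-voxel running average
    return tsdf

def extend_warp_field(warp_field, tsdf):
    """Step 3: insert deformation nodes covering newly added geometry."""
    return warp_field

def process_frame(warp_field, tsdf, depth_map):
    warp_field = estimate_warp(warp_field, depth_map)
    tsdf = fuse_depth(tsdf, depth_map, warp_field)
    warp_field = extend_warp_field(warp_field, tsdf)
    return warp_field, tsdf

wf, vol = process_frame({}, {"frames_fused": 0}, depth_map=None)
print(vol["frames_fused"])  # 1
```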

The paper itself contains a lot of work; the following summaries are only for my own reference. I suggest reading the paper if you would like to go into detail.

Volumetric Warp (motion) Field

The motion field transforms the canonical model space into the live frame, so that the scene motion can be undone and all depth maps can be densely fused into a single rigid TSDF reconstruction.

The structure of the warp field is constructed as a set of sparse 6D transformation nodes that are smoothly interpolated through a k-nearest node average in the canonical frame.
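To make the interpolation concrete, here is a minimal sketch of the k-nearest-node blending idea. The paper stores full 6D transforms and blends them with dual quaternions; to keep this short I blend only translations, and the Gaussian radial weights and `node_sigma` support radii are my assumptions:

```python
import numpy as np

def interpolate_warp(x, node_pos, node_trans, node_sigma, k=4):
    """Warp a canonical-space point x by blending its k nearest nodes."""
    d2 = np.sum((node_pos - x) ** 2, axis=1)             # squared distances to all nodes
    idx = np.argsort(d2)[:k]                             # indices of the k nearest nodes
    w = np.exp(-d2[idx] / (2.0 * node_sigma[idx] ** 2))  # radial support weights
    w /= w.sum()                                         # normalize weights
    return x + w @ node_trans[idx]                       # weighted translation blend

# Two nodes translating in opposite directions; a point midway between
# them gets symmetric weights, so the translations cancel.
nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
trans = np.array([[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]])
sigma = np.array([1.0, 1.0])
print(interpolate_warp(np.array([0.5, 0.0, 0.0]), nodes, trans, sigma, k=2))
# the midway point stays at [0.5, 0, 0]
```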

As visualized in the teaser image, motion trails for a sub-sample of model vertices can be observed in Fig. 2(e).

Dense Non-Rigid Surface Fusion

The canonical model geometry is updated by the model-to-frame warp field.

They used the projective TSDF fusion approach to operate over non-rigidly deforming scenes. (B. L. Curless. New Methods for Surface Reconstruction from Range Images. PhD thesis, Stanford University, 1997)

Unlike the static fusion scenario, where the weight w(x) encodes only the uncertainty of the depth value observed at the projected pixel in the depth frame, the authors also account for the uncertainty associated with the warp function at each voxel center x_c.
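The fusion itself is a Curless-style weighted running average per voxel. The sketch below shows that update; the `warp_conf` factor down-weighting voxels whose warp is uncertain is my simplified stand-in for the paper's warp-dependent weight, and the truncation and cap values are assumed:

```python
import numpy as np

def fuse_voxel(tsdf, weight, obs_dist, warp_conf, trunc=0.05, w_max=100.0):
    """Fold one truncated signed-distance observation into a voxel."""
    d = np.clip(obs_dist, -trunc, trunc) / trunc   # truncated, normalized distance
    w_obs = warp_conf                              # weight shrinks with warp uncertainty
    new_tsdf = (tsdf * weight + d * w_obs) / (weight + w_obs)  # running average
    new_weight = min(weight + w_obs, w_max)        # cap the accumulated weight
    return new_tsdf, new_weight

tsdf, w = 0.0, 1.0
tsdf, w = fuse_voxel(tsdf, w, obs_dist=0.05, warp_conf=1.0)
print(tsdf, w)  # 0.5 2.0
```

A voxel far from any deformation node would get a small `warp_conf`, so an unreliable warped observation barely moves the stored TSDF value.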

Estimating the Warp-field State

Given a newly observed depth map, the authors propose an efficient approach to estimate the transformations of a volumetric 6D warp field.

They obtain an initial estimate of the data association (correspondence) between the model geometry and the live frame by rendering the warped surface V into the live frame, shaded with canonical-frame vertex positions, using a rasterizing rendering pipeline.

It is crucial for the non-rigid TSDF fusion technique to estimate a deformation not only of currently visible surfaces, but over all of space within S. The authors use a deformation-graph-based regularization defined between transformation nodes, where an edge in the graph between nodes i and j adds an as-rigid-as-possible regularization term to the total error being minimized, under the discontinuity-preserving Huber penalty ψ_reg.
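The regularizer can be sketched as follows: each graph edge penalizes the disagreement between the two nodes' predictions for a shared position, robustified by the Huber function. I simplify the 6D node transforms to translations (so the residual reduces to their difference), and the threshold `delta` is an assumed value:

```python
import numpy as np

def huber(r, delta=0.01):
    """Standard Huber penalty: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def reg_energy(node_trans, edges, delta=0.01):
    """Sum as-rigid-as-possible residuals over all graph edges."""
    e = 0.0
    for i, j in edges:
        r = node_trans[i] - node_trans[j]  # disagreement between neighbors
        e += huber(r, delta).sum()
    return e

trans = np.array([[0.1, 0.0, 0.0], [0.1, 0.0, 0.0]])
print(reg_energy(trans, [(0, 1)]))  # identical motion costs 0.0
```

Because the penalty grows only linearly for large residuals, a genuine motion discontinuity between two nodes is not smoothed away as aggressively as a quadratic penalty would.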

Extending the Warp-field

DynamicFusion obtains reconstructions of objects whilst they deform and provides dense correspondence across time.

Given the newly updated set of deformation nodes, DynamicFusion constructs an L ≥ 1 level regularisation graph node hierarchy, where the l = 0 level nodes will simply be N_{warp}.
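One plausible way to build such a hierarchy is to keep level 0 as N_warp and greedily subsample each level with a growing minimum-separation radius so coarser levels have fewer, more widely spaced nodes. The greedy scheme and all parameter values below are my assumptions, not the authors' exact procedure:

```python
import numpy as np

def subsample(points, radius):
    """Greedily keep points that are at least `radius` from all kept points."""
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q) >= radius for q in kept):
            kept.append(p)
    return np.array(kept)

def build_hierarchy(nodes, levels=3, base_radius=1.0, growth=2.0):
    """Level 0 is N_warp itself; each coarser level doubles the radius."""
    hierarchy = [nodes]
    for l in range(1, levels):
        r = base_radius * growth ** l          # coarser separation per level
        hierarchy.append(subsample(hierarchy[-1], r))
    return hierarchy

nodes = np.array([[i * 1.0, 0.0, 0.0] for i in range(5)])  # spacing 1.0
h = build_hierarchy(nodes, levels=3, base_radius=1.0, growth=2.0)
print([len(level) for level in h])  # [5, 3, 2]
```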


Limitations

  • It is currently limited in its ability to achieve dynamic reconstruction of scenes that quickly move from a closed to an open topology (for example, starting a reconstruction with closed hands and then opening them).
  • Failures common to real-time differential tracking can cause unrecoverable model corruption or result in loop-closure failures.
  • More challenging still is the estimation of a growing warp field. As the size and complexity of the scene increase, proportionally more of it is occluded from the camera, and predicting the motion of occluded areas becomes much harder.

The last bullet points to a really ambitious research direction. It is indeed the best paper of CVPR 2015!