10:15 - 11:00
One question in machine learning that has recently drawn wide interest is whether the 3D form of an object can be predicted from a 2D image without annotated training data. An unsupervised method for this was presented by Shubham Goel et al. in 2020. They propose a framework called U-CMR (Unsupervised Category-Specific Mesh Reconstruction) that recovers the shape, pose, and texture of an object from a single image without ground-truth keypoints. Here, keypoint supervision refers to annotations that give the model clues about the general 3D shape the object conforms to. The method trains a shape and texture predictor while optimizing a “camera-multiplex” (a set of camera-pose hypotheses for each image): the predicted shape and texture are rendered from every camera in the multiplex, and a reconstruction loss is computed per camera. The predictor is then updated against the expected loss over the multiplex, and each camera is updated based on the loss it incurred. Finally, a feed-forward model is trained to predict the best camera in the multiplex, so that the 3D form can be recovered from a single image. The reported results approach those of earlier keypoint-supervised models. Not having to collect keypoint-labeled data for 3D shape and texture modeling is the key advantage, and it may open U-CMR to applications where such labels are unavailable.
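The camera-multiplex step described above can be illustrated with a small toy sketch. This is not the U-CMR implementation: the learned shape and texture predictor is replaced by a fixed, known 3D point set, the differentiable renderer by an orthographic projection, and each camera by a single azimuth angle. The softmax weighting over negative losses and all function names (`project`, `per_camera_losses`, `multiplex_weights`, `expected_loss`, `refine_cameras`) are assumptions made for illustration only.

```python
import numpy as np

def project(points, azimuth):
    """Rotate 3D points about the y-axis and drop depth (toy stand-in for a renderer)."""
    c, s = np.cos(azimuth), np.sin(azimuth)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (points @ rot.T)[:, :2]

def per_camera_losses(points, observed, cameras):
    """Mean squared reconstruction error of each camera hypothesis."""
    return np.array([np.mean((project(points, a) - observed) ** 2) for a in cameras])

def multiplex_weights(losses, temperature=0.1):
    """Softmax over negative losses (assumed weighting): low loss means high probability."""
    w = np.exp(-(losses - losses.min()) / temperature)
    return w / w.sum()

def expected_loss(losses, temperature=0.1):
    """Loss averaged over the multiplex; a shape predictor would be trained against this."""
    return float(np.sum(multiplex_weights(losses, temperature) * losses))

def refine_cameras(points, observed, cameras, lr=0.5, steps=200, eps=1e-4):
    """Update every camera on its own loss using central finite-difference gradients."""
    cameras = cameras.copy()
    for _ in range(steps):
        grad = (per_camera_losses(points, observed, cameras + eps)
                - per_camera_losses(points, observed, cameras - eps)) / (2 * eps)
        cameras -= lr * grad
    return cameras

# Toy data: a random point cloud observed from an azimuth unknown to the optimizer.
rng = np.random.default_rng(0)
shape = rng.normal(size=(50, 3))
true_azimuth = 0.7
observed = project(shape, true_azimuth)

# Multiplex of 8 evenly spaced camera hypotheses, refined independently.
cameras = np.linspace(-np.pi, np.pi, 8, endpoint=False)
refined = refine_cameras(shape, observed, cameras)
losses = per_camera_losses(shape, observed, refined)
best = refined[np.argmin(losses)]  # camera a feed-forward model would learn to predict
```

Keeping several hypotheses per image is the point of the design: a single camera estimate can settle on a poor viewpoint, whereas a multiplex lets the lowest-loss hypothesis win while the others are still being refined.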