1. Introduction
The problem of structure from motion, that is, recovering the 3D structure of an object and the camera locations from a monocular video stream, has been studied extensively in computer vision. For rigid scenes, many algorithms build on the seminal work of Tomasi and Kanade [32], who showed that, for an affine camera, a noise-free measurement matrix of point tracks has rank at most 3 once the data are centered at the origin. The 3D locations of all tracked points and the camera positions can then be obtained easily from a factorization of this matrix. Due to occlusion, however, many matrix entries are typically missing, and standard matrix factorization techniques can no longer be applied.
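The rank argument can be illustrated with a minimal NumPy sketch. It assumes synthetic, noise-free data: random 3D points projected by random 2x3 affine cameras with per-frame image translations. The helper names `factorize_affine` and `synthetic_tracks` are hypothetical and used only for illustration. Factorization is done by truncating the SVD to rank 3. The recovered motion and shape are defined only up to an invertible 3x3 transform, and the metric-upgrade step is not shown.

```python
import numpy as np

def factorize_affine(W, rank=3):
    """Rank-`rank` factorization of a complete 2F x P measurement matrix.

    Rows 2f and 2f+1 hold the x and y image coordinates of all P points in
    frame f. Returns (M, S, t): motion (2F x rank), shape (rank x P), and the
    per-row mean t. M and S are only defined up to an invertible transform.
    """
    t = W.mean(axis=1, keepdims=True)    # per-row mean over all points
    W0 = W - t                           # center the data at the origin
    # np.linalg.svd needs every entry, so this fails when tracks are missing.
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    root = np.sqrt(s[:rank])
    M = U[:, :rank] * root               # scale the leading columns of U
    S = root[:, None] * Vt[:rank]        # scale the leading rows of Vt
    return M, S, t

def synthetic_tracks(F=10, P=40, seed=0):
    """Hypothetical test data: noise-free affine projections of P random 3D points into F frames."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((3, P))      # 3D points
    A = rng.standard_normal((2 * F, 3))  # stacked 2x3 affine cameras
    b = rng.standard_normal((2 * F, 1))  # per-frame image translations
    return A @ X + b

W = synthetic_tracks()
M, S, t = factorize_affine(W)
print(np.linalg.matrix_rank(W))       # 4: the translations add one rank
print(np.linalg.matrix_rank(W - t))   # 3: rank at most 3 after centering
print(np.allclose(M @ S + t, W))      # True
```

Without centering, the translation term `b` raises the generic rank to 4. After subtracting the row means, `W - t` factors exactly into a 2F x 3 motion matrix and a 3 x P shape matrix.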