I. Introduction
Advanced 3D visual media often require dynamic light fields, or multi-view videos, for ambient communication and immersive user experiences [1]–[3]. To enhance such media without restricting them to computer-generated synthetic content, we need technologies that acquire dynamic light fields composed of real multi-view videos inexpensively and robustly. To date, dynamic light fields have been obtained as multi-view videos captured by camera arrays [4]–[9]. Massive camera arrays allow us to acquire high-quality real multi-view videos with sufficient image resolution from a large number of viewpoints. However, because a camera array requires many device components, including those for video synchronization and aggregation, it is not only expensive but also suffers severely from low fault tolerance. In contrast, a light field camera, which combines a micro-lens array with a single image sensor, offers an inexpensive way to acquire multi-view still images [10]–[15]. Unfortunately, although it captures very dense light fields, its optical device causes insufficient exposure, which leads to severe noise in videos. Thus, light field cameras cannot easily obtain high-quality multi-view videos and so fall short of robust dynamic light field acquisition.