
Deep Realistic Novel View Generation for City-Scale Aerial Images


Abstract:

In this paper we introduce a novel end-to-end framework for the generation of large, aerial, city-scale, realistic synthetic image sequences with associated accurate and precise camera metadata. The two main purposes of this data are (i) to enable objective, quantitative evaluation of computer vision algorithms and methods such as feature detection, description, and matching, or of full computer vision pipelines such as 3D reconstruction; and (ii) to supply large amounts of high-quality training data for deep learning guided computer vision methods. The proposed framework consists of three main modules: a 3D voxel renderer for data generation, a deep neural network for artifact removal, and a quantitative evaluation module for Multi-View Stereo (MVS) as an example. The 3D voxel renderer enables generation of seen or unseen views of a scene from arbitrary camera poses with accurate camera metadata parameters. The artifact removal module proposes a novel edge-augmented deep learning network with an explicit edgemap processing stream to remove image artifacts while preserving and recovering scene structures for more realistic results. Our experiments on two urban, city-scale, aerial datasets for Albuquerque (ABQ), NM and Los Angeles (LA), CA show promising results in terms of structural similarity to real data and accuracy of reconstructed 3D point clouds.
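The abstract's "explicit edgemap processing stream" presupposes an edgemap computed from each rendered frame. The paper does not specify the edge operator, so the sketch below uses a standard 3x3 Sobel gradient magnitude, written with numpy only, as one plausible way such an auxiliary input could be produced; the function name and normalization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sobel_edgemap(img):
    """Gradient-magnitude edgemap of a 2D grayscale array via 3x3 Sobel
    kernels. Hypothetical stand-in for the edge stream's input; the paper's
    actual edge operator is not specified here."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)  # horizontal gradient response
            gy[i, j] = np.sum(patch * ky)  # vertical gradient response
    mag = np.hypot(gx, gy)
    return mag / mag.max() if mag.max() > 0 else mag

# A vertical step edge: the edgemap responds only along the boundary columns.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edgemap(img)
```

In an edge-augmented network, an edgemap like `edges` would be concatenated with (or processed in parallel to) the artifact-corrupted image so the network can preserve scene structure while cleaning rendering artifacts.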
Date of Conference: 10-15 January 2021
Date Added to IEEE Xplore: 05 May 2021
Print on Demand(PoD) ISSN: 1051-4651
Conference Location: Milan, Italy

I. Introduction

High quality image/video data with precise and accurate camera metadata is a growing necessity for objective, quantitative evaluation of computer vision methods and pipelines and for development of new data-driven, artificial intelligence guided systems [1]. Of particular interest to our city-scale aerial image/video analysis field [2]–[4] are the evaluation and development of fundamental computer vision methods such as feature point detection, description, and matching, which constitute the core of many applications including optical flow estimation, image registration, Bundle Adjustment (BA), and Structure-from-Motion (SfM) [5], and of larger 3D reconstruction pipelines such as Multi-View Stereo (MVS). Large-scale aerial video of urban scenes suffers from perspective shape distortions and other complications caused by oblique viewing angles, making accurate feature point detection and matching a challenging task. In order to find robust processing algorithms that achieve optimal results, it is important to evaluate such operators on scenario-specific data. Furthermore, deep learning approaches, which have proven successful in many computer vision tasks, generally require large amounts of training data [6], [7]. However, data availability in some domains such as city-scale aerial video may be rather limited. In some cases, the availability of accurate ground truth camera metadata is the main obstacle to quantitative evaluation of feature matching, bundle adjustment, or dense 3D reconstruction methods.
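The point about ground-truth camera metadata can be made concrete: given known intrinsics K and pose [R|t] for each synthetic frame, reconstruction or matching results can be scored by reprojection error rather than by visual inspection. The following is a minimal numpy sketch of that idea under a simple pinhole model; the specific K, R, t values and function names are illustrative assumptions, not values from the paper's datasets.

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D world points X (N,3) through a pinhole camera K[R|t],
    returning (N,2) pixel coordinates."""
    Xc = (R @ X.T).T + t           # world -> camera coordinates
    uv = (K @ Xc.T).T              # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]  # perspective divide

def reprojection_rmse(K, R, t, X, observed):
    """RMSE (pixels) between projected 3D points and observed detections —
    a simple metric enabled by exact ground-truth metadata."""
    err = project(K, R, t, X) - observed
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

# Hypothetical metadata: identity rotation, camera 10 units in front of the scene.
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0,    0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 10.0])
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0]])

observed = project(K, R, t, X)   # perfect detections -> zero error
rmse = reprojection_rmse(K, R, t, X, observed)
```

With real detections or a reconstructed point cloud in place of `observed` and `X`, the same metric quantifies feature matching or MVS accuracy against the rendered ground truth.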

