2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) - Conference Table of Contents | IEEE Xplore
Conference on Computer Vision and Pattern Recognition (CVPR)

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

DOI: 10.1109/CVPR42600.2020

13-19 June 2020

Proceedings

The proceedings of this conference will be available for purchase through Curran Associates.

Computer Vision and Pattern Recognition (CVPR), 2020 IEEE/CVF Conference on

[Title page i]

Publication Year: 2020,Page(s):1 - 1

[Title page iii]

Publication Year: 2020,Page(s):3 - 3

[Copyright notice]

Publication Year: 2020,Page(s):4 - 4

Table of Contents

Publication Year: 2020,Page(s):5 - 208

Reviewers

Publication Year: 2020,Page(s):216 - 216
We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We sho...Show More
Understanding the shape of a scene from a single color image is a formidable computer vision task. However, most methods aim to predict the geometry of surfaces that are visible to the camera, which is of limited use when planning paths for robots or augmented reality agents. Such agents can only move when grounded on a traversable surface, which we define as the set of classes which humans can al...Show More
Recovering the dynamic fluid surface is a long-standing challenging problem in computer vision. Most existing image-based methods require multiple views or a dedicated imaging system. Here we present a learning-based single-image approach for 3D fluid surface reconstruction. Specifically, we design a deep neural network that estimates the depth and normal maps of a fluid surface by analyzing the r...Show More
Any solid object can be decomposed into a collection of convex polytopes (in short, convexes). When a small number of convexes are used, such a decomposition can be thought of as a piece-wise approximation of the geometry. This decomposition is fundamental in computer graphics, where it provides one of the most common ways to approximate geometry, for example, in real-time physics simulation. A co...Show More
Polygonal meshes are ubiquitous in the digital 3D domain, yet they have only played a minor role in the deep learning revolution. Leading methods for learning generative models of shapes rely on implicit functions, and generate meshes only after expensive iso-surfacing routines. To overcome these challenges, we are inspired by a classical spatial data structure from computer graphics, Binary Space...Show More
Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead o...Show More
Beyond depth estimation from a single image, the monocular cue is useful in a broader range of depth inference applications and settings---such as when one can leverage other available depth cues for improved accuracy. Currently, different applications, with different inference tasks and combinations of depth cues, are solved via different specialized networks---trained separately for each applica...Show More
We propose a novel learnable representation for detail preserving shape deformation. The goal of our method is to warp a source shape to match the general structure of a target shape, while preserving the surface details of the source. Our method extends a traditional cage-based deformation technique, where the source shape is enclosed by a coarse control mesh termed cage, and translations prescri...Show More
Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated the potential in real world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily form...Show More
Under the conventional diffuse shading model with unknown directional lighting, the set of quadratic surface shapes that are consistent with the spatial derivatives of intensity at a single image point is a two-dimensional algebraic variety embedded in the five-dimensional space of quadratic shapes. We describe the geometry of this variety, and we introduce a concise feedforward model that compute...Show More
The accuracy of monocular 3D human pose estimation depends on the viewpoint from which the image is captured. While freely moving cameras, such as on drones, provide control over this viewpoint, automatically positioning them at the location which will yield the highest accuracy remains an open problem. This is the problem that we address in this paper. Specifically, given a short video sequence, ...Show More
We address the challenging task of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts of the plane. We infer these planes from a single input image with a novel neural network archite...Show More
Fine-grained action recognition datasets exhibit environmental bias, where multiple video sequences are captured from a limited number of environments. Training a model in one environment and deploying in another results in a drop in performance due to an unavoidable domain shift. Unsupervised Domain Adaptation (UDA) approaches have frequently utilised adversarial training between the source and t...Show More
We present a new method to learn video representations from large-scale unlabeled video data. Ideally, this representation will be generic and transferable, directly usable for new tasks such as action recognition and zero or few-shot learning. We formulate unsupervised representation learning as a multi-modal, multi-task learning problem, where the representations are shared across different moda...Show More
Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-ran...Show More
Training competitive deep video models is an order of magnitude slower than training their counterpart image models. Slow training causes long research cycles, which hinders progress in video understanding research. Following standard practice for training image models, video model training has used a fixed mini-batch shape: a specific number of clips, frames, and spatial size. However, what is th...Show More
First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on his intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a hu...Show More
Automatically reasoning about future human behaviors is a difficult problem but has significant practical applications to assistive systems. Part of this difficulty stems from learning systems' inability to represent all kinds of behaviors. Some behaviors, such as motion, are best described with continuous representations, whereas others, such as picking up a cup, are best described with discrete ...Show More

Proceedings

The proceedings of this conference will be available for purchase through Curran Associates.

Computer Vision and Pattern Recognition (CVPR), 2020 IEEE/CVF Conference on