1. Introduction
Line segments and junctions are prominent visual patterns in low-level vision, and are thus often used as important cues/features to facilitate many downstream vision tasks such as camera pose estimation [24], [25], [11], image matching [36], image rectification [37], structure from motion (SfM) [4], [22], visual SLAM [19], [39], [42], and surface reconstruction [17]. Both line segment detection and junction detection remain challenging problems in computer vision [32], [34], [35]. Because line segments and junctions are often statistically coupled in images, a new research task, wireframe parsing, has recently emerged to tackle the problem of jointly detecting meaningful and salient line segments and junctions, with large-scale benchmarks available [15]. End-to-end trainable approaches based on deep neural networks (DNNs) are among the most interesting frameworks for this task and have shown remarkable performance.
Illustration of the proposed HAWP in comparison with L-CNN [41] in wireframe parsing. The two methods adopt the same two-stage parsing pipeline: proposal (line segments and junctions) generation and proposal verification. They use the same junction prediction (d) and verification modules. The key difference lies in line segment proposal generation. L-CNN bypasses directly learning a line segment prediction module and resorts to a sophisticated sampling-based approach for generating line segment proposals in (e). Our HAWP proposes a novel line segment prediction method in (b) for more accurate and efficient parsing; see, e.g., the parsing results for the window in (c) and (f).