I. Introduction
Forestry has seen a large mechanization effort, yet little has been done to automate tasks requiring high-level cognition. In the last decade, other industries, such as agriculture and mining, have made significant progress towards automation. While facing different challenges, forestry is catching up, moving towards autonomous machines both in the forest and in the mills [1]. Just as the ferrying of ore was one of the first tasks automated in mining [2], forwarding operations, i.e., extracting logs from the forest with heavy machinery, are presumed to be the first candidate for automation [3]. Log picking is an essential component of this forwarding task, but it is challenging from both a perception and a manipulation perspective. This difficulty exacerbates the ongoing manpower shortage in forestry operations, as novice operators require lengthy training to accomplish this repetitive task [4]. In the short run, teleoperation [6] could allow operators to work from remote locations while benefiting from better situational awareness through real-time navigation and augmented reality. Other assistance systems, such as Intelligent Boom Control (IBC), also simplify the operators' work, thus reducing their cognitive load [7]. Taking advantage of these technological advances, unmanned logging machines are getting closer to reality. However, previous work on autonomous log handling [8] assumes that the positions and orientations of the logs are known, which is generally not the case.
(a) View from a forestry forwarder, captured with one of our dashcams during actual log picking operations. Mask2Former [5], the best-performing network trained on our dataset, predicts masks for the wood logs in the scene (right). We focus on detecting the top logs from an elevated viewpoint, hence the undetected logs on the left. (b) Rotated bounding boxes (solid blue line) are better suited for detecting logs, because logs are elongated and randomly oriented. In contrast, axis-aligned bounding boxes (dashed orange line) also encompass neighboring logs. (c) For localization purposes, oriented boxes are not sufficient: for crooked logs, the box center can fall off the log itself, as shown by the stars. Therefore, we focus on instance segmentation with masks.
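The geometric argument of panels (b) and (c) can be checked numerically. The following is a minimal sketch under our own assumptions, not the method used in this work: a log mask is approximated by grid points within a fixed radius of a centerline polyline (`rasterize_log` is a hypothetical helper), the oriented box is PCA-aligned rather than a true minimum-area rectangle, and `mask_anchor` (the mask pixel nearest the mask centroid) is one illustrative localization heuristic chosen only for this example.

```python
import math

Point = tuple[float, float]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Clamp the projection parameter so the closest point stays on the segment.
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_polyline(p: Point, line: list[Point]) -> float:
    return min(dist_to_segment(p, a, b) for a, b in zip(line, line[1:]))


def rasterize_log(centerline: list[Point], radius: float, step: float = 0.25) -> list[Point]:
    """Hypothetical stand-in for a predicted mask: grid points within
    `radius` of the log's centerline polyline."""
    xs = [x for x, _ in centerline]
    ys = [y for _, y in centerline]
    x0, y0 = min(xs) - radius, min(ys) - radius
    nx = int(math.ceil((max(xs) + radius - x0) / step)) + 1
    ny = int(math.ceil((max(ys) + radius - y0) / step)) + 1
    return [
        (x0 + i * step, y0 + j * step)
        for i in range(nx)
        for j in range(ny)
        if dist_to_polyline((x0 + i * step, y0 + j * step), centerline) <= radius
    ]


def centroid(pts: list[Point]) -> Point:
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def aabb_area(pts: list[Point]) -> float:
    xs, ys = zip(*pts)
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def oriented_box(pts: list[Point]) -> tuple[float, Point]:
    """PCA-aligned box (an approximation of the minimum-area rotated box).
    Returns (area, center)."""
    mx, my = centroid(pts)
    n = len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis angle
    ux, uy = math.cos(theta), math.sin(theta)  # along the log
    vx, vy = -uy, ux  # across the log
    a = [(x - mx) * ux + (y - my) * uy for x, y in pts]
    b = [(x - mx) * vx + (y - my) * vy for x, y in pts]
    ca, cb = (max(a) + min(a)) / 2, (max(b) + min(b)) / 2
    center = (mx + ca * ux + cb * vx, my + ca * uy + cb * vy)
    return (max(a) - min(a)) * (max(b) - min(b)), center


def mask_anchor(pts: list[Point]) -> Point:
    """Hypothetical heuristic: the mask pixel closest to the mask centroid,
    which by construction lies on the log."""
    cx, cy = centroid(pts)
    return min(pts, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


RADIUS = 0.5
straight = [(0.0, 0.0), (10.0, 10.0)]
# Crooked log: centerline sampled along a semicircle of radius 5.
crooked = [(5 * math.cos(k * math.pi / 8), 5 * math.sin(k * math.pi / 8)) for k in range(9)]

if __name__ == "__main__":
    for name, line in [("straight", straight), ("crooked", crooked)]:
        mask = rasterize_log(line, RADIUS)
        obb_area, obb_center = oriented_box(mask)
        print(name, "AABB area:", round(aabb_area(mask), 1), "OBB area:", round(obb_area, 1))
        print("  OBB center on log:", dist_to_polyline(obb_center, line) <= RADIUS)
        print("  mask anchor on log:", dist_to_polyline(mask_anchor(mask), line) <= RADIUS)
```

For the straight diagonal log, the oriented box covers only a fraction of the axis-aligned area and its center lies on the wood. For the semicircular log, both the box center and the raw mask centroid land in the empty space inside the bend, whereas a point taken from the mask itself stays on the log.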