I. Introduction
Aerial cinematography is an increasingly popular way to capture footage in media production contexts such as sports events, commercials, and movies. Manual operation offers a high degree of artistic control and can yield professional-looking shots, but recent advances in autonomous technology have made it possible to achieve comparable results automatically. The literature has identified and described visually pleasing combinations of framing shot types and camera movements [1]. A common thread among these descriptions is that the target's 3D position in a world coordinate system must be known at all times. This calls for a target detection, localization, and tracking system, particularly in unstructured environments. Such a system must run in real time onboard the unmanned aerial vehicle (UAV) and should not rely on external sensors. Recent computer vision techniques provide an efficient means of object detection and tracking. Combining them with a pinhole camera model and a fast 3D position estimation algorithm yields a real-time object localization tool for short- and medium-distance UAV cinematography and aerial shot planning.