I. Introduction
Perceiving and representing the environment is a key ingredient of automated driving systems. A multitude of data fusion methods and ways of modeling the local area around the ego vehicle have been proposed [1], [2]. With regard to dynamic objects, the majority of work focuses on detecting known object classes such as vehicles, cyclists, or pedestrians. Deep neural networks are trained on established labeled datasets such as KITTI or nuScenes to detect these predefined object classes from camera, LiDAR, and RADAR data [3], [4]. In reality, however, the spectrum of dynamic objects is not limited to predefined classes: nearly anything can move. Examples include shopping carts, rolling tires, or all kinds of animals. Moreover, even standard classes such as vehicles occur in all kinds of non-standard appearances, see Fig. 1. Detectors trained on predefined object classes are incapable of perceiving such generic dynamic objects, let alone estimating their velocities or accelerations, which can lead to dangerous situations.