Michael Arens - IEEE Xplore Author Profile


The First Visual Object Tracking Segmentation VOTS2023 Challenge Results

Matej Kristan;Jiří Matas;Martin Danelljan;Michael Felsberg;Hyung Jin Chang;Luka Čehovin Zajc;Alan Lukežič;Ondrej Drbohlav;Zhongqun Zhang;Khanh-Tung Tran;Xuan-Son Vu;Johanna Björklund;Christoph Mayer;Yushan Zhang;Lei Ke;Jie Zhao;Gustavo Fernández;Noor Al-Shakarji;Dong An;Michael Arens;Stefan Becker;Goutam Bhat;Sebastian Bullinger;Antoni B. Chan;Shijie Chang;Hanyuan Chen;Xin Chen;Yan Chen;Zhenyu Chen;Yangming Cheng;Yutao Cui;Chunyuan Deng;Jiahua Dong;Matteo Dunnhofer;Wei Feng;Jianlong Fu;Jie Gao;Ruize Han;Zeqi Hao;Jun-Yan He;Keji He;Zhenyu He;Xiantao Hu;Kaer Huang;Yuqing Huang;Yi Jiang;Ben Kang;Jin-Peng Lan;Hyungjun Lee;Chenyang Li;Jiahao Li;Ning Li;Wangkai Li;Xiaodi Li;Xin Li;Pengyu Liu;Yue Liu;Huchuan Lu;Bin Luo;Ping Luo;Yinchao Ma;Deshui Miao;Christian Micheloni;Kannappan Palaniappan;Hancheol Park;Matthieu Paul;HouWen Peng;Zekun Qian;Gani Rahmon;Norbert Scherer-Negenborn;Pengcheng Shao;Wooksu Shin;Elham Soltani Kazemi;Tianhui Song;Rainer Stiefelhagen;Rui Sun;Chuanming Tang;Zhangyong Tang;Imad Eddine Toubal;Jack Valmadre;Joost van de Weijer;Luc Van Gool;Jash Vira;Stèphane Vujasinović;Cheng Wan;Jia Wan;Dong Wang;Fei Wang;Feifan Wang;He Wang;Limin Wang;Song Wang;Yaowei Wang;Zhepeng Wang;Gangshan Wu;Jiannan Wu;Qiangqiang Wu;Xiaojun Wu;Anqi Xiao;Jinxia Xie;Chenlong Xu;Min Xu;Tianyang Xu;Yuanyou Xu;Bin Yan;Dawei Yang;Ming-Hsuan Yang;Tianyu Yang;Yi Yang;Zongxin Yang;Xuanwu Yin;Fisher Yu;Hongyuan Yu;Qianjin Yu;Weichen Yu;YongSheng Yuan;Zehuan Yuan;Jianlin Zhang;Lu Zhang;Tianzhu Zhang;Guodongfang Zhao;Shaochuan Zhao;Yaozong Zheng;Bineng Zhong;Jiawen Zhu;Xuefeng Zhu;Yueting Zhuang;ChengAo Zong;Kunlong Zuo

2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Year: 2023 | Conference Paper
Cited by: Papers (1)
The Visual Object Tracking Segmentation VOTS2023 challenge is the eleventh annual tracker benchmarking activity of the VOT initiative. This challenge is the first to merge short-term and long-term as well as single-target and multiple-target tracking with segmentation masks as the only target location specification. A new dataset was created; the ground truth has been withheld to prevent overfitti...


This paper proposes a novel method for vision-based metric cross-view geolocalization (CVGL) that matches the camera images captured from a ground-based vehicle with an aerial image to determine the vehicle's geo-pose. Since aerial images are globally available at low cost, they represent a potential compromise between two established paradigms of autonomous driving, i.e. using expensive high-defi...
This paper proposes a novel method for geo-tracking, i.e. continuous metric self-localization in outdoor environments by registering a vehicle's sensor information with aerial imagery of an unseen target region. Geo-tracking methods offer the potential to supplant noisy signals from global navigation satellite systems (GNSS) and expensive and hard-to-maintain prior maps that are typically used fo...
Despite their black-box nature, deep neural networks have been successfully used in practical applications lately. In areas where the results of these applications can lead to safety hazards or decisions of ethical relevance, the application provider is accountable for the resulting decisions and should therefore be able to explain how and why a specific decision was made. For image processing n...
While current methods for interactive Video Object Segmentation (iVOS) rely on scribble-based interactions to generate precise object masks, we propose a Click-based interactive Video Object Segmentation (CiVOS) framework to simplify the required user workload as much as possible. CiVOS builds on de-coupled modules reflecting user interaction and mask propagation. The interaction module converts c...
Methods to quantify the complexity of trajectory datasets are still a missing piece in benchmarking human trajectory prediction models. In order to gain a better understanding of the complexity of trajectory prediction tasks, and following the intuition that more complex datasets contain more information, an approach for quantifying the amount of information contained in a dataset from a prototype...
In the field of indoor localization, ultra-wideband (UWB) technology has become indispensable. The market demands that UWB hardware be cheap, precise, and accurate. These requirements have led to the popularity of the DecaWave UWB system. The great majority of the publications about this system deal with the correction of the signal power, hardware delay, or clock drift. It has traditionall...
Precise indoor localization is a major challenge in the field of localization. In this article, we investigate multiple error corrections for the ultrawideband (UWB) technology, in particular the DecaWave DW1000 transceiver. Both the time-of-arrival (TOA) and the time-difference-of-arrival (TDOA) methods are considered. Various clock-drift correction methods for TOA from the literature are reviewe...
The amount of available surveillance video data is increasing rapidly, which makes manual inspection impractical. The goal of activity detection is to automatically localize activities spatially and temporally in a large collection of video data. In this work, we answer the question of to what extent context plays a role in spatio-temporal activity detection in extended videos. Towards thi...
Self-calibration of time-of-arrival positioning systems is made difficult by the non-linearity of the relevant set of equations. This work applies dimension lifting to this problem. The objective function is extended by an additional dimension to allow the dynamics of the optimization to avoid local minima. In addition to the usual numerical optimization, a partially analytical method is suggested, which...
Statistical models of the human body surface are generally learned from thousands of high-quality 3D scans in predefined poses to cover the wide variety of human body shapes and articulations. Acquisition of such data requires expensive equipment, calibration procedures, and is limited to cooperative subjects who can understand and follow instructions, such as adults. We present a method for learn...
In this paper, several variants of two-stream architectures for temporal action proposal generation in long, untrimmed videos are presented. Inspired by the recent advances in the field of human action recognition utilizing 3D convolutions in combination with two-stream networks and based on the Single-Stream Temporal Action Proposals (SST) architecture, four different two-stream architectures uti...
Three-dimensional environment perception is a key element of autonomous driving and driver assistance systems. A common image-based approach to determine three-dimensional scene information is stereo matching, which is limited by the stereo camera baseline. In contrast to methods based on stereo matching, we present an approach to reconstruct three-dimensional object trajectories combining temporal a...
This paper presents a method to reconstruct three-dimensional object motion trajectories in stereo video sequences. We apply stereo matching to each image pair of a stereo sequence to compute corresponding binocular disparities. By combining instance-aware semantic segmentation techniques and optical flow cues, we track two-dimensional object shapes at the pixel level. This allows us to determine for ...
The precise determination of correspondences between pairs of images is still a fundamental building block of many computer vision systems. Despite the maturity of modern feature matchers, multispectral methods are still lacking robustness and speed. We focus on the problem of finding point correspondences in a multispectral imaging setup. Most methods aim at invariant feature transforms (e.g. mul...
Most multiple object tracking methods rely on object detection methods in order to initialize new tracks and to update existing tracks. Although strongly interconnected, tracking and detection are usually addressed as separate building blocks. However, both parts can benefit from each other, e.g. the affinity model from the tracking method can reuse appearance features already calculated by the det...
Recurrent neural networks are able to learn complex long-term relationships from sequential data and output a probability density function over the state space. Therefore, recurrent models are a natural choice to address path prediction tasks, where a trained model is used to generate future expectations from past observations. When applied to security applications, like predicting pedestrian path...
This work proposes a novel approach to reconstruct three-dimensional vehicle trajectories in monocular video sequences. We leverage state-of-the-art instance-aware semantic segmentation and optical flow methods to compute object video tracks at the pixel level. This approach uses Structure from Motion to determine camera poses relative to vehicle instances and environment structures. We parameterize v...
The quadratic system provided by the Time of Arrival technique can be solved analytically or by non-linear least squares minimization. In real environments, the measurements are always corrupted by noise. This measurement noise affects the analytical solution more than non-linear optimization algorithms. On the other hand, local optimization tends to find the local minimum, instea...
Visual object tracking is a challenging task in computer vision, especially if there are no constraints on the scenario and the objects are arbitrary. The number of tracking algorithms is very large, and each has its own advantages and disadvantages. They typically behave differently, and their failures in the tracking process occur at different moments in the sequence. So far, there is no tracke...
We present a method to perform online Multiple Object Tracking (MOT) of known object categories in monocular video data. Current Tracking-by-Detection MOT approaches build on top of 2D bounding box detections. In contrast, we exploit state-of-the-art instance-aware semantic segmentation techniques to compute 2D shape representations of target objects in each frame. We predict position and shape of...
A technique widely used in video-based situation assessment, and especially in anomaly detection, is the analysis of spatial behavior in terms of motion profiles recorded along trajectories. An intuitive assessment metric is the deviation from normal behavior, where generative models are a natural choice for capturing the underlying statistics. Applying such outlier methods in open world scenarios...
Real images contain symmetric Gestalten with high probability. That is, certain parts can be mapped onto certain other parts by the usual Gestalt laws and are repeated there with high similarity. Moreover, such mappings come in nested hierarchies - e.g. a reflection Gestalt that is made of repetition friezes, whose parts are again reflection symmetric compositions. This can be explicitly modelled by con...
Real images contain reflection symmetry and repetition in rows with high probability. That is, certain parts can be mapped onto certain other parts by the usual Gestalt laws and are repeated there with high similarity. Moreover, such mappings come in nested hierarchies - e.g. a reflection Gestalt that is made of repetition friezes, whose parts are again reflection symmetric compositions. It is our inten...
Motion analysis of infants is used for early detection of movement disorders like cerebral palsy. For the development of automated methods, capturing the infant's pose accurately is crucial. Our system for predicting 3D joint positions is based on a recently introduced pixelwise body part classifier using random ferns, to which we propose multiple enhancements. We apply a feature selection step be...