
Exploring Spatiotemporal Consistency of Features for Video Translation in Consumer Internet of Things



Abstract:

Video data has emerged as a primary information source in contemporary Consumer Internet of Things (CIoT) systems, significantly driving their development. However, owing to the diversity of video capture devices, videos exhibit substantial heterogeneity in aspects such as color, texture, and lighting conditions, which complicates video manipulation and analysis. Moreover, different information processing terminals accept only limited data types, creating a demand for translation between heterogeneous videos. In this paper, we propose a novel method named the Structure and Motion Consistency Network (SMCN). It updates and optimizes the model at the feature level, making it more efficient at extracting invariant spatiotemporal information from different types of video data. Specifically, SMCN fuses structure information, i.e., the mean and standard deviation of features computed across channels at each spatial position, and re-injects it to refine spatial consistency; it also maximizes the motion mutual information between features of adjacent frames to improve the temporal consistency of intermediate features. We conducted experiments on the common video translation dataset Viper and the infrared-to-visible video translation dataset IRVI. Extensive experiments indicate that SMCN outperforms state-of-the-art methods, and that the lightweight module can be applied to other models in a plug-and-play manner, showing significant advantages in addressing the problem of heterogeneous video data transformation.
Published in: IEEE Transactions on Consumer Electronics ( Volume: 70, Issue: 1, February 2024)
Page(s): 3077 - 3087
Date of Publication: 14 November 2023


I. Introduction

In recent years, the Consumer Internet of Things (CIoT) has experienced explosive growth, and the increasing use of multimedia such as videos and images has created larger and more diverse datasets. This trend has enabled the rapid development of smart cities, with video emerging as a crucial sensing data source for applications such as Smart Home Systems [1], [2], [3], [4], [5], Video Surveillance Systems [6], [7], [8], [9], and Autonomous Driving [10], [11], [12], [13]. However, the heterogeneous nature of video data, which results from the diverse array of recording devices, impedes the development of IoT technology: specific devices capture specific types of information, e.g., structural, semantic, and positional information from infrared, visible light, and point cloud recordings, respectively. These differences in data distribution pose critical challenges for the recognition of various information types across the information processing terminals in IoT systems, making effective and timely decisions difficult. Thus, the heterogeneity of video data creates a need for translation between heterogeneous videos so that CIoT systems can make real-time decisions. For instance, translating infrared video into clear visible video at night would enable CIoT systems to analyze and respond to events in real time. It is therefore crucial to develop techniques that address video heterogeneity and leverage the full potential of video data in CIoT systems.
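As a concrete illustration of the structure information described in the abstract — the mean and standard deviation of features computed across channels at each spatial position, re-injected to refine spatial consistency — the following is a minimal NumPy sketch. The re-injection scheme shown here (a simple per-position normalization) is hypothetical: in SMCN the fusion is learned, so this only conveys the shape and role of the statistics.

```python
import numpy as np

def structure_stats(feat):
    """Per-position statistics across channels.

    feat: (C, H, W) feature map from one frame.
    Returns two (H, W) maps: the channel-wise mean and standard
    deviation at each spatial position, serving as the 'structure
    information' described in the abstract.
    """
    mu = feat.mean(axis=0)      # (H, W)
    sigma = feat.std(axis=0)    # (H, W)
    return mu, sigma

def reinject_structure(feat, mu, sigma, eps=1e-5):
    """Re-inject the structure maps into the features.

    Hypothetical scheme for illustration: normalize every channel by
    the per-position statistics, so all channels share a consistent
    spatial structure. The paper's actual fusion is learned.
    """
    return (feat - mu[None]) / (sigma[None] + eps)
```

For example, for an 8-channel feature map the normalized output has (approximately) zero mean across channels at every spatial position, which is the sense in which the statistics enforce spatial consistency.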

