Yixiao Wang - IEEE Xplore Author Profile

In the rapidly evolving field of autonomous driving, accurately detecting and understanding dynamic environments remains a challenge. Federated learning (FL) offers a promising approach by integrating decentralized models from multiple Connected Autonomous Vehicles (CAVs) to enhance the performance of Deep Learning (DL) based object detection methods. However, traditional FL faces hurdles such as ...
Rapid advances in deep learning have made it easy to synthesize high-fidelity fake images with many types of image content. Detecting such images and attributing them to their generative models (GMs) is crucial. Existing deep learning methods attempt to identify and classify GM-specific artifacts but often struggle with content-independence and generalizability. ...
Detecting driver distraction and emotion has attracted attention because of its importance to driving safety, especially given the increasing number of accidents caused by distracted or emotionally unstable drivers. Previous research has used multi-task methods to detect these two factors simultaneously but has paid insufficient attention to emotion detection. Meanwhile, existing ...
Deepfake technology has already impacted the integrity of news and may grow into a hugely destructive political and social force. The realistic and convincing nature of deepfakes poses a threat to the authenticity of information, alarming individuals and organizations. While many studies have explored the issue of deepfakes, the majority of them have focused on swapping entire faces rather than partia...
Video/image compression codecs utilize the characteristics of the human visual system and its varying sensitivity to certain frequencies, brightness, contrast, and colors to achieve high compression. Inevitably, compression introduces undesirable visual artifacts. As compression standards improve, restoring image quality becomes more challenging. Recently, deep learning based models, especially tr...
Explainable decision-making is critical for building trust in autonomous vehicles. We investigate the use of a pre-trained large language model (LLM) to derive comprehensible driving decisions from multi-modal time-series data captured by a monocular camera on an autonomous vehicle. Leveraging a graph-of-thought structure, the LLM learns policies that perform robustly while generating natural lang...
Lowering the brightness of digital displays as a means to reduce power consumption and extend battery life is a widely adopted strategy. However, this course of action inevitably results in decreased image contrast and a negative influence on the overall image quality. In this paper, we propose a method to enhance the visual quality of dimmed displays while keeping the overall brightness and power...
Impressive advancements in capture, display, and broadcasting technologies have significantly elevated image and video quality and, with it, the need to design new reference and no-reference image and video quality metrics. One of the latest and most perceptually accurate video quality metrics is the Video Multi-Method Assessment Fusion (VMAF) method. However, VMAF considers the temporal nature of vid...
Federated learning has shown great potential in improving the accuracy of models designed for connected autonomous vehicles (CAVs). However, existing approaches focus only on data collected by CAVs, ignoring the valuable insights provided by other types of clients, such as road-side units (RSUs). In this paper, we propose an approach that combines federated learning with cooperative perception to ...
The use of Stable Diffusion models to generate realistic images has become a popular topic in recent years. However, this technology has also raised concerns about the potential harm it may cause to copyright holders, particularly in the realm of art, where these synthesized images can closely resemble the original work. As these synthesized images are hard for humans to distinguish from authen...
In this paper, a new, calibrated light field (LF) video dataset is introduced, which focuses on outdoor scenes and objects. Each video stream is 10 seconds long and is captured with a dense camera array consisting of $5\times 5$ camera modules at $1640\times 1232$ resolution and 40 frames per second. As multiple cameras in an array setup may suffer from various conditions of camera setting...
The increasing demand for emerging 8K video content has made its transmission one of the main challenges for broadcasting companies. Upgrading the existing infrastructure thus seems to be the only option for transmitting 8K content at the desired quality level, which is costly and time-consuming. To address this critical challenge, we propose a novel approach that avoids such a prematu...
Human face recognition systems are now widely used in applications that require identity recognition. The performance of current face recognition algorithms is negatively affected by occlusions, such as facial masks, and by varying human poses. To address these challenges, we re-trained a modified version of the VGG19 deep learning model on masked and unmasked images of 62 ...
An accurate no-reference image quality assessment metric for compression artifacts is essential for the broadcasting and streaming industries. Although we have witnessed impressive advances in capture, delivery, and display technologies, we have not managed to match them with an accurate, perceptually based no-reference image quality metric. In this paper, we propose a unique perceptual base...
One of the main challenges in designing deep learning networks for autonomous driving is the lack of labeled data. Recent trends that address this problem involve the use of unlabeled data. In this paper, we propose a unified semi-supervised and federated learning (FL) approach that is designed to offer cost-efficient and practical training of deep learning object detection models for autonomous d...
Forged image localization is an important research task, as such images may have a tremendous impact on various aspects of society. Images can be manipulated using image editing tools (known as “shallowfakes”) or, recently, artificial intelligence techniques (“deepfakes”). While there are many existing works that are designed to localize manipulated areas in either shallow- or deep-fake image...
High dynamic range (HDR) has arguably been established as the preferred image and video format for content providers. As standard dynamic range (SDR) displays still dominate the market, there is a need for finding efficient ways to convert HDR content to the SDR format, a process known as tone mapping. Recently, many tone mapping operators (TMOs) have been proposed that are based on deep learning ...
Standard dynamic range (SDR) technology has been the foundation of medical imaging to this day, leaving medical images lacking in contrast and detail. To address this issue, a feasible option before adopting high dynamic range (HDR) technology in medical image capture is to use inverse tone mapping (iTMO) to convert SDR images into HDR images. This approach can synthetically recover some of the in...
Telehealth applications, such as remote diagnosis and examination, have become increasingly popular. However, the large number of medical images generated, and their impractically large digital size for telehealth use, have put pressure on communication infrastructures. More specifically, the image processing time has increased dramatically due to the size of the digitized images, whi...
As deepfake technology emerges at a breathtaking pace, it threatens to become a destructive political and social force with unpredictable impact on society. Therefore, detecting deepfakes, and even determining which deep learning generative models (GMs) created such images, is of extreme importance. There are already several methods that find and categorize artifacts left by GMs, with the lates...
Timely road condition inspection and maintenance are key components of infrastructure management for smart cities, as they reduce traffic congestion, accidents, and repair costs. Traditional road inspection methods that employ vibrations and/or laser scanning for detecting road deterioration use expensive equipment and dedicated municipality vehicles. Recently, computer vision techniques and art...
Advances in display technology have led to the introduction of 8K Ultra High Definition (UHD) displays to the consumer market, offering an improved visual experience. However, the lack of 8K High Dynamic Range (HDR) content is a major obstacle to wide adoption. In this paper, we introduce a deep learning approach based on generative adversarial networks to generate 8K UHD HDR content from Fu...
This paper proposes a unique real-time street parking detection scheme that utilizes visual information and the YOLOv4 convolutional neural network to accurately detect available parking spaces. We also introduce a new video dataset that is captured specifically for this task and is used for training our network. Our network, the first of its kind, successfully detects available street parkin...
Video compression has been necessary since the early days of digital video for efficiently transmitting the large amount of raw captured data. Over the years, many video compression standards, mainly under the umbrella of the Moving Picture Experts Group (MPEG), have been introduced, ranging from H.264 to the latest VVC standard. Recently, an alliance of technical companies promoted an open...