IEEE Xplore Search Results

Showing 1-25 of 3,358 results

Results

Japanese automatic speech recognition (ASR) in conversation is considered challenging and highly ambiguous because the subjects are usually omitted and the number of homonyms is quite large. Since the speaker's utterance alone may not provide enough information, the context of the previous utterance from the dialog partner might help. A limited number of studies have addressed the idea of i...
Automatic image segmentation is critical for medical image segmentation. For example, automatic segmentation of infection area of COVID-19 before and after diagnosis and treatment can help us automatically analyze the diagnosis and treatment effect. The existing algorithms do not solve the problems of insufficient data and insufficient feature extraction at the same time. In this paper, we propose...
Context-aware neural machine translation has attracted much attention recently by promising sophisticated contextual information integration into conventional neural machine translation. However, context-aware NMT is challenged by effective context aggregation and increased training time due to the integration of extra information. In this work, we study the effect of encoding selective contextual info...
The pursuit of accurate and fluent English communication is the cornerstone of global academic exchange and expression. The accuracy of written English is crucial for effective discourse; however, despite advancements in grammatical error correction (GEC) technology, these systems often fall short when analyzing coherent text. This paper delves into the complexities of English writing and the chal...
Global contextual information needs to be modeled precisely for accurate segmentation of images taken by Unmanned Aerial Vehicles (UAVs). This paper presents a transformer-based method for UAV street scene semantic segmentation. The method uses an encoder-decoder-based architecture to capture local and global context information in UAV images. Experimental results of the proposed method show compe...
Personalized speech enhancement (PSE) methods typically rely on pre-trained speaker verification models or self-designed speaker encoders to extract target speaker clues, guiding the PSE model in isolating the desired speech. However, these approaches suffer from significant model complexity and often underutilize enrollment speaker information, limiting the potential performance of the PSE model....
In the field of image inpainting, there are some deep learning schemes, but the pixel inpainting of these schemes generally does not consider the semantics of the image. In this paper, the Semantic Control Context Encoder (SCCE) is proposed, which combines the confrontation network of text-generated images with traditional image restoration to form a comprehensive image restoration method. In this ...
In recent years, with the growth of data, storage, and computing power, many challenging problems can be solved. One of the most challenging and interesting tasks is to recover data at the pixel level of an image without using any additional information. Context encoder, a combination of auto-encoder and GAN, uses an unsupervised visual feature learning algorithm to predict missing pixel data in images. Th...
Context encoders with a loss function based on generative adversarial networks (GANs) have been shown to be superior in image inpainting. However, when using the adversarial loss alone, the texture of the original image and the recovered regions is occasionally inconsistent. In order to solve this problem, this paper introduces a new constraint called contextual consistent loss and proposes a novel algorith...
We present a multi-speaker Japanese audiobook text-to-speech (TTS) system that leverages multimodal context information of preceding acoustic context and bilateral textual context to improve the prosody of synthetic speech. Previous work either uses unilateral or single-modality context, which does not fully represent the context information. The proposed method uses an acoustic context encoder an...
This paper presents a generalized form of large-context language models (LCLMs) that can take linguistic contexts beyond utterance boundaries into consideration. In discourse-level and conversation-level automatic speech recognition (ASR) tasks, which have to handle a series of utterances, it is essential to capture long-range linguistic contexts beyond utterance boundaries. The LCLMs of previous ...
The encoder-decoder architecture is widely used as a lightweight semantic segmentation network. However, it suffers from limited performance compared to a well-designed Dilated-FCN model because of two major problems. First, commonly used upsampling methods in the decoder, such as interpolation and deconvolution, suffer from a local receptive field and are unable to encode global contexts. Second, low-level f...
Neural machine translation (NMT) heavily relies on its encoder to capture the underlying meaning of a source sentence so as to generate a faithful translation. However, most NMT encoders are built upon either unidirectional or bidirectional recurrent neural networks, which either do not deal with future context or simply concatenate the history and future context to form context-dependent word rep...
Traditional automatic speech recognition (ASR) systems usually focus on individual utterances, without considering long-form speech with useful historical information, which is more practical in real scenarios. Simply attending to a longer transcription history for a vanilla neural transducer model shows little gain in our preliminary experiments, since the prediction network is not a pure language mo...
Pixel-wise classification of high-resolution (HR) synthetic aperture radar (SAR) images is a challenging task, due to the limited availability of labeled SAR data, as well as the difficulty of exploring context information affected by coherent speckle. In this article, we propose a novel supervised classification method for HR SAR images, which combines a context-aware encoder network (CAEN) an...
We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and its effective training method based on knowledge distillation. Common E2E-ASR models have mainly focused on utterance-level processing in which each utterance is independently transcribed. On the other hand, large-context E2E-ASR models, which take into account long-range sequential contexts beyond utteran...
We show how to construct an optimal context tree for a given plaintext and context order. Our algorithm runs in time linear in the size of the plaintext and the size of the context, and consumes space linear in the size of the plaintext. We evaluate the algorithm's performance on the CCITT test set for bilevel images with the Euclidean-norm context order.
Recognising apparent emotion from audio-visual signals in naturalistic conditions remains an open problem. Existing methods that build on recurrent models, or that model contextual dependencies at the feature level using self-attention, fail to model the long-term dependencies that subtly occur at different levels of abstraction. Affective Processes have emerged as a novel paradigm to the ...
Conversational Text-to-Speech (TTS) aims to synthesize an utterance with the right linguistic and affective prosody in a conversational context. The correlation between the current utterance and the dialogue history at the utterance level was used to improve the expressiveness of synthesized speech. However, the fine-grained information in the dialogue history at the word level also has an importan...
This paper presents an arithmetic coding scheme for DCT coefficients in video compression, in which the number of non-zero coefficients, significance map and level information for a DCT block are used as coding elements. To exploit the statistical correlations, a hierarchical dependency context model (HDCM) is proposed, where the number of non-zero coefficients and scanned position are used to cap...
With the acceleration of globalization, the demand for cross-language information retrieval is increasing. The traditional machine translation technology has some limitations, and it is difficult to accurately capture the semantic differences between the source and target languages. In this study, an artificial intelligence translation model based on context-awareness is proposed. The model utiliz...
Sufficient context is essential for modeling the geometric distribution of large-scale dynamic point clouds. However, previous methods gather the context without considering the distinctive characteristics of different contexts, which leads to suboptimal performance. In this paper, we propose an octree-based diverse context model that captures the large-scale context, local detailed context, and ...
Predicting new user behavior has always been a challenging issue in intelligent recommender systems. This challenge is mainly due to the extreme asymmetry of information between new users and old users. Existing factorization models can efficiently process and map asymmetric information, but they are not good at mining deep relationships between contexts when compressing high-dimensional data. In ...
In this paper, we analyze the binary arithmetic coding of High Efficiency Video Coding (HEVC) and the second generation of audio and video coding standard (AVS2). Then an optimized probability estimation scheme is proposed for the arithmetic coder. The proposed scheme is incorporated into the HEVC reference software (HM 16.0) and AVS2 reference software (RD 10.1). Experimental results demonstrate that...
In text compression, statistical context modeling aims to construct a model to calculate the probability distribution of a character based upon its context. The order-k context of a symbol is defined as the string formed by its preceding k symbols. This study introduces compressed context modeling, which defines the order-k context of a character as the sequence of k-bits composed of the entropy c...