Showing 1-8 of 8 results

Filter Results

Show

Results

Despite the impressive success achieved in various domains, multi-agent reinforcement learning (MARL) is still faced with the problem of incomplete information and how to deal with complex decision-making tasks with multi-task coupled. The information fusion algorithm ensures the consistency of the global state information by integrating local information to cope with information loss and communic...Show More
This paper proposes an adaptive direct-discretized recurrent neural network (ADD-RNN) algorithm with fuzzy factor to address the problem of future distinct-layer inequality and equation system (FDLIES). For comparison purposes, a traditional indirect-discretized RNN (TID-RNN) algorithm is also developed to solve the same problem. The superior performance of the proposed ADD-RNN algorithm is valida...Show More
Interactive Segmentation (IS) segments specific objects or parts in the image according to user input. Current IS pipelines fall into two categories: single-granularity out-put and multi-granularity output. The latter aims to allevi-ate the spatial ambiguity present in the former. However, the multi-granularity output pipeline suffers from limited interaction flexibility and produces redundant res...Show More
Interactive segmentation enables users to segment as needed by providing cues of objects, which introduces human-computer interaction for many fields, such as image editing and medical image analysis. Typically, massive and expansive pixel-level annotations are spent to train deep models by object-oriented interactions with manually labeled object masks. In this work, we reveal that informative in...Show More
Position information is critical for Vision Transformers (VTs) due to the permutation-invariance of self-attention operations. A typical way to introduce position information is adding the absolute Position Embedding (PE) to patch embedding before entering VTs. However, this approach operates the same Layer Normalization (LN) to token embedding and PE, and delivers the same PE to each layer. This ...Show More
Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query). While straightforward, this de facto paradigm overlooks the underlying data distribution p(query), which makes it challenging to identify out-of-distribution data. To address this limitation, we creatively tackle this task from a generative viewp...Show More
Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS). The extracted relationship among pixel-level representations typically contains rich class-aware information that semantically ide...Show More
Weakly supervised semantic segmentation is typically inspired by class activation maps, which serve as pseudo masks with class-discriminative regions highlighted. Although tremendous efforts have been made to recall precise and complete locations for each class, existing methods still commonly suffer from the unsolicited Out-of-Candidate (OC) error predictions that do not belong to the label candi...Show More