Author: "Zhai, Wei" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhai, Wei"' showing total 2,079 results

Start Over Author "Zhai, Wei"

2,079 results on '"Zhai, Wei"'

1. Improved Video VAE for Latent Video Diffusion Model

Author: Wu, Pingyu, Zhu, Kai, Liu, Yu, Zhao, Liming, Zhai, Wei, Cao, Yang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Variational Autoencoder (VAE) aims to compress pixel data into low-dimensional latent space, playing an important role in OpenAI's Sora and other latent video diffusion generation models. While most of existing video VAEs inflate a pretrained image VAE into the 3D causal structure for temporal-spatial compression, this paper presents two astonishing findings: (1) The initialization from a well-trained image VAE with the same latent dimensions suppresses the improvement of subsequent temporal compression capabilities. (2) The adoption of causal reasoning leads to unequal information interactions and unbalanced performance between frames. To alleviate these problems, we propose a keyframe-based temporal compression (KTC) architecture and a group causal convolution (GCConv) module to further improve video VAE (IV-VAE). Specifically, the KTC architecture divides the latent space into two branches, in which one half completely inherits the compression prior of keyframes from a lower-dimension image VAE while the other half involves temporal compression through 3D group causal convolution, reducing temporal-spatial conflicts and accelerating the convergence speed of video VAE. The GCConv in above 3D half uses standard convolution within each frame group to ensure inter-frame equivalence, and employs causal logical padding between groups to maintain flexibility in processing variable frame video. Extensive experiments on five benchmarks demonstrate the SOTA video reconstruction and generation capabilities of the proposed IV-VAE (https://wpy1999.github.io/IV-VAE/).
Published: 2024

2. EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting

Author: Liao, Bohao, Zhai, Wei, Wan, Zengyu, Zhang, Tianzhu, Cao, Yang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently low-frame-rate) scenarios. Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution, providing valuable scene and motion information in blind inter-frame intervals. In this paper, we introduce the event camera to aid scene construction from a casually captured video for the first time, and propose Event-Aided Free-Trajectory 3DGS, called EF-3DGS, which seamlessly integrates the advantages of event cameras into 3DGS through three key components. First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream. Second, we adopt the Contrast Maximization (CMax) framework in a piece-wise manner to extract motion information by maximizing the contrast of the Image of Warped Events (IWE), thereby calibrating the estimated poses. Besides, based on the Linear Event Generation Model (LEGM), the brightness information encoded in the IWE is also utilized to constrain the 3DGS in the gradient domain. Third, to mitigate the absence of color information of events, we introduce photometric bundle adjustment (PBA) to ensure view consistency across events and frames. We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS. Our project page is https://lbh666.github.io/ef-3dgs/., Comment: Project Page: https://lbh666.github.io/ef-3dgs/
Published: 2024

3. Visual-Geometric Collaborative Guidance for Affordance Learning

Author: Luo, Hongchen, Zhai, Wei, Wang, Jiao, Cao, Yang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Perceiving potential ``action possibilities'' (\ie, affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions. Prevailing affordance learning algorithms often adopt the label assignment paradigm and presume that there is a unique relationship between functional region and affordance label, yielding poor performance when adapting to unseen environments with large appearance variations. In this paper, we propose to leverage interactive affinity for affordance learning, \ie extracting interactive affinity from human-object interaction and transferring it to non-interactive objects. Interactive affinity, which represents the contacts between different parts of the human body and local regions of the target object, can provide inherent cues of interconnectivity between humans and objects, thereby reducing the ambiguity of the perceived action possibilities. To this end, we propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues to excavate interactive affinity from human-object interactions jointly. Besides, a contact-driven affordance learning (CAL) dataset is constructed by collecting and labeling over 55,047 images. Experimental results demonstrate that our method outperforms the representative models regarding objective metrics and visual quality. Project: \href{https://github.com/lhc1224/VCR-Net}{github.com/lhc1224/VCR-Net}.
Published: 2024

4. MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

Author: Zhai, Wei, Bai, Nan, Zhao, Qing, Li, Jianqiang, Wang, Fan, Qi, Hongzhi, Jiang, Meng, Wang, Xiaoqin, Yang, Bing Xiang, and Fu, Guanghui
Subjects: Computer Science - Computation and Language
Abstract: As the prevalence of mental health challenges, social media has emerged as a key platform for individuals to express their emotions.Deep learning tends to be a promising solution for analyzing mental health on social media. However, black box models are often inflexible when switching between tasks, and their results typically lack explanations. With the rise of large language models (LLMs), their flexibility has introduced new approaches to the field. Also due to the generative nature, they can be prompted to explain decision-making processes. However, their performance on complex psychological analysis still lags behind deep learning. In this paper, we introduce the first multi-task Chinese Social Media Interpretable Mental Health Instructions (C-IMHI) dataset, consisting of 9K samples, which has been quality-controlled and manually validated. We also propose MentalGLM series models, the first open-source LLMs designed for explainable mental health analysis targeting Chinese social media, trained on a corpus of 50K instructions. The proposed models were evaluated on three downstream tasks and achieved better or comparable performance compared to deep learning models, generalized LLMs, and task fine-tuned LLMs. We validated a portion of the generated decision explanations with experts, showing promising results. We also evaluated the proposed models on a clinical dataset, where they outperformed other LLMs, indicating their potential applicability in the clinical field. Our models show strong performance, validated across tasks and perspectives. The decision explanations enhance usability and facilitate better understanding and practical application of the models. Both the constructed dataset and the models are publicly available via: https://github.com/zwzzzQAQ/MentalGLM.
Published: 2024

5. MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Author: Yang, Jian, Yin, Dacheng, Zhou, Yizhou, Rao, Fengyun, Zhai, Wei, Cao, Yang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in multi-modal large language models have propelled the development of joint probabilistic models capable of both image understanding and generation. However, we have identified that recent methods inevitably suffer from loss of image information during understanding task, due to either image discretization or diffusion denoising steps. To address this issue, we propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework. Unlike discretization line of method, MMAR takes in continuous-valued image tokens to avoid information loss. Differing from diffusion-based approaches, we disentangle the diffusion process from auto-regressive backbone model by employing a light-weight diffusion head on top each auto-regressed image patch embedding. In this way, when the model transits from image generation to understanding through text generation, the backbone model's hidden representation of the image is not limited to the last denoising step. To successfully train our method, we also propose a theoretically proven technique that addresses the numerical stability issue and a training strategy that balances the generation and understanding task goals. Through extensive evaluations on 18 image understanding benchmarks, MMAR demonstrates much more superior performance than other joint multi-modal models, matching the method that employs pretrained CLIP vision encoder, meanwhile being able to generate high quality images at the same time. We also showed that our method is scalable with larger data and model size.
Published: 2024

6. VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection

Author: Deng, Huilin, Luo, Hongchen, Zhai, Wei, Cao, Yang, and Kang, Yu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects by establishing feature mapping between textual prompts and inspection images, demonstrating excellent research value in flexible industrial manufacturing. However, existing ZSAD methods are limited by closed-world settings, struggling to unseen defects with predefined prompts. Recently, adapting Multimodal Large Language Models (MLLMs) for Industrial Anomaly Detection (IAD) presents a viable solution. Unlike fixed-prompt methods, MLLMs exhibit a generative paradigm with open-ended text interpretation, enabling more adaptive anomaly analysis. However, this adaption faces inherent challenges as anomalies often manifest in fine-grained regions and exhibit minimal visual discrepancies from normal samples. To address these challenges, we propose a novel framework VMAD (Visual-enhanced MLLM Anomaly Detection) that enhances MLLM with visual-based IAD knowledge and fine-grained perception, simultaneously providing precise detection and comprehensive analysis of anomalies. Specifically, we design a Defect-Sensitive Structure Learning scheme that transfers patch-similarities cues from visual branch to our MLLM for improved anomaly discrimination. Besides, we introduce a novel visual projector, Locality-enhanced Token Compression, which mines multi-level features in local contexts to enhance fine-grained detection. Furthermore, we introduce the Real Industrial Anomaly Detection (RIAD), a comprehensive IAD dataset with detailed anomaly descriptions and analyses, offering a valuable resource for MLLM-based IAD development. Extensive experiments on zero-shot benchmarks, including MVTec-AD, Visa, WFDD, and RIAD datasets, demonstrate our superior performance over state-of-the-art methods. The code and dataset will be available soon.
Published: 2024

7. Grounding 3D Scene Affordance From Egocentric Interactions

Author: Liu, Cuiyu, Zhai, Wei, Yang, Yuhang, Luo, Hongchen, Liang, Sen, Cao, Yang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances based on static geometric structure and visual appearance. This passive strategy limits the agent's ability to actively perceive and engage with the environment, making it reliant on predefined semantic instructions. In contrast, humans develop complex interaction skills by observing and imitating how others interact with their surroundings. To empower the model with such abilities, we introduce a novel task: grounding 3D scene affordance from egocentric interactions, where the goal is to identify the corresponding affordance regions in a 3D scene based on an egocentric video of an interaction. This task faces the challenges of spatial complexity and alignment complexity across multiple sources. To address these challenges, we propose the Egocentric Interaction-driven 3D Scene Affordance Grounding (Ego-SAG) framework, which utilizes interaction intent to guide the model in focusing on interaction-relevant sub-regions and aligns affordance features from different sources through a bidirectional query decoder mechanism. Furthermore, we introduce the Egocentric Video-3D Scene Affordance Dataset (VSAD), covering a wide range of common interaction types and diverse 3D environments to support this task. Extensive experiments on VSAD validate both the feasibility of the proposed task and the effectiveness of our approach.
Published: 2024

8. PEAR: Phrase-Based Hand-Object Interaction Anticipation

Author: Zhang, Zichen, Luo, Hongchen, Zhai, Wei, Cao, Yang, and Kang, Yu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: First-person hand-object interaction anticipation aims to predict the interaction process over a forthcoming period based on current scenes and prompts. This capability is crucial for embodied intelligence and human-robot collaboration. The complete interaction process involves both pre-contact interaction intention (i.e., hand motion trends and interaction hotspots) and post-contact interaction manipulation (i.e., manipulation trajectories and hand poses with contact). Existing research typically anticipates only interaction intention while neglecting manipulation, resulting in incomplete predictions and an increased likelihood of intention errors due to the lack of manipulation constraints. To address this, we propose a novel model, PEAR (Phrase-Based Hand-Object Interaction Anticipation), which jointly anticipates interaction intention and manipulation. To handle uncertainties in the interaction process, we employ a twofold approach. Firstly, we perform cross-alignment of verbs, nouns, and images to reduce the diversity of hand movement patterns and object functional attributes, thereby mitigating intention uncertainty. Secondly, we establish bidirectional constraints between intention and manipulation using dynamic integration and residual connections, ensuring consistency among elements and thus overcoming manipulation uncertainty. To rigorously evaluate the performance of the proposed model, we collect a new task-relevant dataset, EGO-HOIP, with comprehensive annotations. Extensive experimental results demonstrate the superiority of our method., Comment: 22 pages, 10 figures, 4 tables
Published: 2024

9. CrysToGraph: A Comprehensive Predictive Model for Crystal Materials Properties and the Benchmark

Author: Wang, Hongyi, Sun, Ji, Liang, Jinzhe, Zhai, Li, Tang, Zitian, Li, Zijian, Zhai, Wei, Wang, Xusheng, Gao, Weihao, and Gong, Sheng
Subjects: Condensed Matter - Materials Science, Computer Science - Machine Learning, Physics - Computational Physics
Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of crystals, it is crucial to consider long-range orders. While GNN excels at capturing the local environment of atoms in crystals, they often face challenges in effectively capturing longer-ranged interactions due to their limited depth. In this paper, we propose CrysToGraph ($\textbf{Crys}$tals with $\textbf{T}$ransformers $\textbf{o}$n $\textbf{Graph}$s), a novel transformer-based geometric graph network designed specifically for unconventional crystalline systems, and UnconvBench, a comprehensive benchmark to evaluate models' predictive performance on unconventional crystal materials such as defected crystals, low-dimension crystals and MOF. CrysToGraph effectively captures short-range interactions with transformer-based graph convolution blocks as well as long-range interactions with graph-wise transformer blocks. CrysToGraph proofs its effectiveness in modelling unconventional crystal materials in multiple tasks, and moreover, it outperforms most existing methods, achieving new state-of-the-art results on the benchmarks of both unconventional crystals and traditional crystals.
Published: 2024

10. EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views

Author: Yang, Yuhang, Zhai, Wei, Wang, Chengfeng, Yu, Chengjun, Cao, Yang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception, facilitating applications like AR/VR and embodied AI. For the egocentric HOI, in addition to perceiving semantics e.g., ''what'' interaction is occurring, capturing ''where'' the interaction specifically manifests in 3D space is also crucial, which links the perception and operation. Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view. However, incomplete observations of interacting parties in the egocentric view introduce ambiguity between visual observations and interaction contents, impairing their efficacy. From the egocentric view, humans integrate the visual cortex, cerebellum, and brain to internalize their intentions and interaction concepts of objects, allowing for the pre-formulation of interactions and making behaviors even when interaction regions are out of sight. In light of this, we propose harmonizing the visual appearance, head motion, and 3D object to excavate the object interaction concept and subject intention, jointly inferring 3D human contact and object affordance from egocentric videos. To achieve this, we present EgoChoir, which links object structures with interaction contexts inherent in appearance and head motion to reveal object affordance, further utilizing it to model human contact. Additionally, a gradient modulation is employed to adopt appropriate clues for capturing interaction regions across various egocentric scenarios. Moreover, 3D contact and affordance are annotated for egocentric videos collected from Ego-Exo4D and GIMO to support the task. Extensive experiments on them demonstrate the effectiveness and superiority of EgoChoir., Comment: NeurIPS2024, project: https://yyvhang.github.io/EgoChoir/
Published: 2024

11. ViViD: Video Virtual Try-on using Diffusion Models

Author: Fang, Zixun, Zhai, Wei, Su, Aimin, Song, Hongliang, Zhu, Kai, Wang, Mao, Chen, Yu, Liu, Zhiheng, Cao, Yang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying the technique of image-based try-on to the video domain in a frame-wise manner will cause temporal-inconsistent outcomes while previous video-based try-on solutions can only generate low visual quality and blurring results. In this work, we present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on. Specifically, we design the Garment Encoder to extract fine-grained clothing semantic features, guiding the model to capture garment details and inject them into the target video through the proposed attention feature fusion mechanism. To ensure spatial-temporal consistency, we introduce a lightweight Pose Encoder to encode pose signals, enabling the model to learn the interactions between clothing and human posture and insert hierarchical Temporal Modules into the text-to-image stable diffusion model for more coherent and lifelike video synthesis. Furthermore, we collect a new dataset, which is the largest, with the most diverse types of garments and the highest resolution for the task of video virtual try-on to date. Extensive experiments demonstrate that our approach is able to yield satisfactory video try-on results. The dataset, codes, and weights will be publicly available. Project page: https://becauseimbatman0.github.io/ViViD.
Published: 2024

12. Bidirectional Progressive Transformer for Interaction Intention Anticipation

Author: Zhang, Zichen, Luo, Hongchen, Zhai, Wei, Cao, Yang, and Kang, Yu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Interaction intention anticipation aims to jointly predict future hand trajectories and interaction hotspots. Existing research often treated trajectory forecasting and interaction hotspots prediction as separate tasks or solely considered the impact of trajectories on interaction hotspots, which led to the accumulation of prediction errors over time. However, a deeper inherent connection exists between hand trajectories and interaction hotspots, which allows for continuous mutual correction between them. Building upon this relationship, a novel Bidirectional prOgressive Transformer (BOT), which introduces a Bidirectional Progressive mechanism into the anticipation of interaction intention is established. Initially, BOT maximizes the utilization of spatial information from the last observation frame through the Spatial-Temporal Reconstruction Module, mitigating conflicts arising from changes of view in first-person videos. Subsequently, based on two independent prediction branches, a Bidirectional Progressive Enhancement Module is introduced to mutually improve the prediction of hand trajectories and interaction hotspots over time to minimize error accumulation. Finally, acknowledging the intrinsic randomness in human natural behavior, we employ a Trajectory Stochastic Unit and a C-VAE to introduce appropriate uncertainty to trajectories and interaction hotspots, respectively. Our method achieves state-of-the-art results on three benchmark datasets Epic-Kitchens-100, EGO4D, and EGTEA Gaze+, demonstrating superior in complex scenarios.
Published: 2024

13. SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

Author: Qi, Hongzhi, Liu, Hanfei, Li, Jianqiang, Zhao, Qing, Zhai, Wei, Luo, Dan, He, Tian Yu, Liu, Shuo, Yang, Bing Xiang, and Fu, Guanghui
Subjects: Computer Science - Computation and Language
Abstract: In the social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with an weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large language model based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65% points in F1-score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.
Published: 2024

14. MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking

Author: Wang, Zhong, Wan, Zengyu, Han, Han, Liao, Bohao, Wu, Yuliang, Zhai, Wei, Cao, Yang, and Zha, Zheng-jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy provided by the event camera. However, the diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization. To achieve a stable event-based eye-tracking system, this paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information in response to the variability of eye movements. Specifically, the MambaPupil network is proposed, which consists of the multi-layer convolutional encoder to extract features from the event representations, a bidirectional Gated Recurrent Unit (GRU), and a Linear Time-Varying State Space Module (LTV-SSM), to selectively capture contextual correlation from the forward and backward temporal relationship. Furthermore, the Bina-rep is utilized as a compact event representation, and the tailor-made data augmentation, called as Event-Cutout, is proposed to enhance the model's robustness by applying spatial random masking to the event image. The evaluation on the ThreeET-plus benchmark shows the superior performance of the MambaPupil, which secured the 1st place in CVPR'2024 AIS Event-based Eye Tracking challenge., Comment: Accepted by CVPR 2024 Workshop (AIS: Vision, Graphics and AI for Streaming), top solution of challenge Event-based Eye Tracking, see https://www.kaggle.com/competitions/event-based-eye-tracking-ais2024
Published: 2024

15. Event-Based Eye Tracking. AIS 2024 Challenge Survey

Author: Wang, Zuowen, Gao, Chang, Wu, Zongwei, Conde, Marcos V., Timofte, Radu, Liu, Shih-Chii, Chen, Qinyu, Zha, Zheng-jun, Zhai, Wei, Han, Han, Liao, Bohao, Wu, Yuliang, Wan, Zengyu, Wang, Zhong, Cao, Yang, Tan, Ganchao, Chen, Jinze, Pei, Yan Ru, Brüers, Sasskia, Crouzet, Sébastien, McLelland, Douglas, Coenen, Oliver, Zhang, Baoheng, Gao, Yizhao, Li, Jingyuan, So, Hayden Kwok-Hay, Bich, Philippe, Boretti, Chiara, Prono, Luciano, Lică, Mircea, Dinucu-Jianu, David, Grîu, Cătălin, Lin, Xiaopeng, Ren, Hongwei, Cheng, Bojun, Zhang, Xinan, Vial, Valentin, Yezzi, Anthony, and Tsai, James
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research., Comment: Qinyu Chen is the corresponding author
Published: 2024

16. AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Author: Jiang, Meng, Yu, Yi Jing, Zhao, Qing, Li, Jianqiang, Song, Changwei, Qi, Hongzhi, Zhai, Wei, Luo, Dan, Wang, Xiaoqin, Fu, Guanghui, and Yang, Bing Xiang
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including suicidal behaviors in extreme cases. Yet, there is a notable absence of methodologies for analyzing cognitive pathways that could aid psychotherapists in conducting effective interventions online. In this study, we gathered data from social media and established the task of extracting cognitive pathways, annotating the data based on a cognitive theoretical framework. We initially categorized the task of extracting cognitive pathways as a hierarchical text classification with four main categories and nineteen subcategories. Following this, we structured a text summarization task to help psychotherapists quickly grasp the essential information. Our experiments evaluate the performance of deep learning and large language models (LLMs) on these tasks. The results demonstrate that our deep learning method achieved a micro-F1 score of 62.34% in the hierarchical text classification task. Meanwhile, in the text summarization task, GPT-4 attained a Rouge-1 score of 54.92 and a Rouge-2 score of 30.86, surpassing the experimental deep learning model's performance. However, it may suffer from an issue of hallucination. We have made all models and codes publicly available to support further research in this field.
Published: 2024

17. The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Author: Ren, Bin, Li, Yawei, Mehta, Nancy, Timofte, Radu, Yu, Hongyuan, Wan, Cheng, Hong, Yuxin, Han, Bingnan, Wu, Zhuoyuan, Zou, Yajun, Liu, Yuqing, Li, Jizhe, He, Keji, Fan, Chao, Zhang, Heng, Zhang, Xiaolin, Yin, Xuanwu, Zuo, Kunlong, Liao, Bohao, Xia, Peizhe, Peng, Long, Du, Zhibo, Di, Xin, Li, Wangkai, Wang, Yang, Zhai, Wei, Pei, Renjing, Guo, Jiaming, Xu, Songcen, Cao, Yang, Zha, Zhengjun, Wang, Yan, Liu, Yi, Wang, Qing, Zhang, Gang, Zhang, Liou, Zhao, Shijie, Sun, Long, Pan, Jinshan, Dong, Jiangxin, Tang, Jinhui, Liu, Xin, Yan, Min, Wang, Qian, Zhou, Menghan, Yan, Yiqiang, Liu, Yixuan, Chan, Wensong, Tang, Dehua, Zhou, Dong, Wang, Li, Tian, Lu, Emad, Barsoum, Jia, Bohan, Qiao, Junbo, Zhou, Yunshuai, Zhang, Yun, Li, Wei, Lin, Shaohui, Zhou, Shenglong, Chen, Binbin, Liao, Jincheng, Zhao, Suiyi, Zhang, Zhao, Wang, Bo, Luo, Yan, Wei, Yanyan, Li, Feng, Wang, Mingshen, Guan, Jinhan, Hu, Dehua, Yu, Jiawei, Xu, Qisheng, Sun, Tao, Lan, Long, Xu, Kele, Lin, Xin, Yue, Jingtong, Yang, Lehan, Du, Shiyi, Qi, Lu, Ren, Chao, Han, Zeyu, Wang, Yuhan, Chen, Chaolin, Li, Haobo, Zheng, Mingjun, Yang, Zhongbao, Song, Lianhong, Yan, Xingzhuo, Fu, Minghan, Zhang, Jingyi, Li, Baiang, Zhu, Qi, Xu, Xiaogang, Guo, Dan, Guo, Chunle, Chen, Jiadi, Long, Huanhuan, Duanmu, Chunjiang, Lei, Xiaoyan, Liu, Jie, Jia, Weilin, Cao, Weifeng, Zhang, Wenlong, Mao, Yanyu, Guo, Ruilong, Zhang, Nihao, Pandey, Manoj, Chernozhukov, Maksym, Le, Giang, Cheng, Shuli, Wang, Hongyuan, Wei, Ziyan, Tang, Qingting, Wang, Liejun, Li, Yongming, Guo, Yanhui, Xu, Hao, Khatami-Rizi, Akram, Mahmoudi-Aznaveh, Ahmad, Hsu, Chih-Chung, Lee, Chia-Ming, Chou, Yi-Shiuan, Joshi, Amogh, Akalwadi, Nikhil, Malagi, Sampada, Yashaswini, Palani, Desai, Chaitra, Tabib, Ramesh Ashok, Patil, Ujwala, and Mudenagudi, Uma
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (ie runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/., Comment: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024
Published: 2024

18. Event-based Asynchronous HDR Imaging by Temporal Incident Light Modulation

Author: Wu, Yuliang, Tan, Ganchao, Chen, Jinze, Zhai, Wei, Cao, Yang, and Zha, Zheng-Jun
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Dynamic Range (DR) is a pivotal characteristic of imaging systems. Current frame-based cameras struggle to achieve high dynamic range imaging due to the conflict between globally uniform exposure and spatially variant scene illumination. In this paper, we propose AsynHDR, a Pixel-Asynchronous HDR imaging system, based on key insights into the challenges in HDR imaging and the unique event-generating mechanism of Dynamic Vision Sensors (DVS). Our proposed AsynHDR system integrates the DVS with a set of LCD panels. The LCD panels modulate the irradiance incident upon the DVS by altering their transparency, thereby triggering the pixel-independent event streams. The HDR image is subsequently decoded from the event streams through our temporal-weighted algorithm. Experiments under standard test platform and several challenging scenes have verified the feasibility of the system in HDR imaging task.
Published: 2024

19. Intention-driven Ego-to-Exo Video Generation

Author: Luo, Hongchen, Zhu, Kai, Zhai, Wei, and Cao, Yang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Ego-to-exo video generation refers to generating the corresponding exocentric video according to the egocentric video, providing valuable applications in AR/VR and embodied AI. Benefiting from advancements in diffusion model techniques, notable progress has been achieved in video generation. However, existing methods build upon the spatiotemporal consistency assumptions between adjacent frames, which cannot be satisfied in the ego-to-exo scenarios due to drastic changes in views. To this end, this paper proposes an Intention-Driven Ego-to-exo video generation framework (IDE) that leverages action intention consisting of human movement and action description as view-independent representation to guide video generation, preserving the consistency of content and motion. Specifically, the egocentric head trajectory is first estimated through multi-view stereo matching. Then, cross-view feature perception module is introduced to establish correspondences between exo- and ego- views, guiding the trajectory transformation module to infer human full-body movement from the head trajectory. Meanwhile, we present an action description unit that maps the action semantics into the feature space consistent with the exocentric image. Finally, the inferred human movement and high-level action descriptions jointly guide the generation of exocentric motion and interaction content (i.e., corresponding optical flow and occlusion maps) in the backward process of the diffusion model, ultimately warping them into the corresponding exocentric video. We conduct extensive experiments on the relevant dataset with diverse exo-ego video pairs, and our IDE outperforms state-of-the-art models in both subjective and objective assessments, demonstrating its efficacy in ego-to-exo video generation.
Published: 2024

20. Molecular imaging of renal cell carcinomas: ready for prime time

Author: Wu, Qianyun, Shao, Hongda, Zhai, Wei, Huang, Gang, Liu, Jianjun, Calais, Jeremie, and Wei, Weijun
Published: 2024
Full Text: View/download PDF

21. 1T′-transition metal dichalcogenide monolayers stabilized on 4H-Au nanowires for ultrasensitive SERS detection

Author: Li, Zijian, Zhai, Li, Zhang, Qinghua, Zhai, Wei, Li, Pai, Chen, Bo, Chen, Changsheng, Yao, Yao, Ge, Yiyao, Yang, Hua, Qiao, Panzhe, Kang, Jianing, Shi, Zhenyu, Zhang, An, Wang, Hongyi, Liang, Jinzhe, Liu, Jiawei, Guan, Zhiqiang, Liao, Lingwen, Neacșu, Vlad Andrei, Ma, Chen, Chen, Ye, Zhu, Ye, Lee, Chun-Sing, Ma, Lu, Du, Yonghua, Gu, Lin, Li, Jian-Feng, Tian, Zhong-Qun, Ding, Feng, and Zhang, Hua
Published: 2024
Full Text: View/download PDF

22. Unlocking oxygen vacancy-rich high-entropy oxides in upgrading composite solid electrolyte

Author: Cheng, Jun, Ci, Nai-Xuan, Zhang, Hong-Qiang, Zeng, Zhen, Zhou, Xuan, Li, Yuan-Yuan, Qiu, Hua-Jun, Zhai, Wei, Gao, Dan-Dan, Ci, Li-Jie, and Li, De-Ping
Published: 2024
Full Text: View/download PDF

23. Chinese MentalBERT: Domain-Adaptive Pre-training on Social Media for Chinese Mental Health Text Analysis

Author: Zhai, Wei, Qi, Hongzhi, Zhao, Qing, Li, Jianqiang, Wang, Ziqi, Wang, Han, Yang, Bing Xiang, and Fu, Guanghui
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In the current environment, psychological issues are prevalent and widespread, with social media serving as a key outlet for individuals to share their feelings. This results in the generation of vast quantities of data daily, where negative emotions have the potential to precipitate crisis situations. There is a recognized need for models capable of efficient analysis. While pre-trained language models have demonstrated their effectiveness broadly, there's a noticeable gap in pre-trained models tailored for specialized domains like psychology. To address this, we have collected a huge dataset from Chinese social media platforms and enriched it with publicly available datasets to create a comprehensive database encompassing 3.36 million text entries. To enhance the model's applicability to psychological text analysis, we integrated psychological lexicons into the pre-training masking mechanism. Building on an existing Chinese language model, we performed adaptive training to develop a model specialized for the psychological domain. We evaluated our model's performance across six public datasets, where it demonstrated improvements compared to eight other models. Additionally, in the qualitative comparison experiment, our model provided psychologically relevant predictions given the masked sentences. Due to concerns regarding data privacy, the dataset will not be made publicly available. However, we have made the pre-trained models and codes publicly accessible to the community via: https://github.com/zwzzzQAQ/Chinese-MentalBERT.
Published: 2024

24. LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

Author: Yang, Yuhang, Zhai, Wei, Luo, Hongchen, Cao, Yang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Learning 3D human-object interaction relation is pivotal to embodied AI and interaction modeling. Most existing methods approach the goal by learning to predict isolated interaction elements, e.g., human contact, object affordance, and human-object spatial relation, primarily from the perspective of either the human or the object. Which underexploit certain correlations between the interaction counterparts (human and object), and struggle to address the uncertainty in interactions. Actually, objects' functionalities potentially affect humans' interaction intentions, which reveals what the interaction is. Meanwhile, the interacting humans and objects exhibit matching geometric structures, which presents how to interact. In light of this, we propose harnessing these inherent correlations between interaction counterparts to mitigate the uncertainty and jointly anticipate the above interaction elements in 3D space. To achieve this, we present LEMON (LEarning 3D huMan-Object iNteraction relation), a unified model that mines interaction intentions of the counterparts and employs curvatures to guide the extraction of geometric correlations, combining them to anticipate the interaction elements. Besides, the 3D Interaction Relation dataset (3DIR) is collected to serve as the test bed for training and evaluation. Extensive experiments demonstrate the superiority of LEMON over methods estimating each element in isolation., Comment: accept by CVPR2024
Published: 2023

25. Likelihood-Aware Semantic Alignment for Full-Spectrum Out-of-Distribution Detection

Author: Lu, Fan, Zhu, Kai, Zheng, Kecheng, Zhai, Wei, and Cao, Yang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Full-spectrum out-of-distribution (F-OOD) detection aims to accurately recognize in-distribution (ID) samples while encountering semantic and covariate shifts simultaneously. However, existing out-of-distribution (OOD) detectors tend to overfit the covariance information and ignore intrinsic semantic correlation, inadequate for adapting to complex domain transformations. To address this issue, we propose a Likelihood-Aware Semantic Alignment (LSA) framework to promote the image-text correspondence into semantically high-likelihood regions. LSA consists of an offline Gaussian sampling strategy which efficiently samples semantic-relevant visual embeddings from the class-conditional Gaussian distribution, and a bidirectional prompt customization mechanism that adjusts both ID-related and negative context for discriminative ID/OOD boundary. Extensive experiments demonstrate the remarkable OOD detection performance of our proposed LSA especially on the intractable Near-OOD setting, surpassing existing methods by a margin of $15.26\%$ and $18.88\%$ on two F-OOD benchmarks, respectively., Comment: 16 pages, 7 figures
Published: 2023

26. Bidirectional Progressive Transformer for Interaction Intention Anticipation

Author: Zhang, Zichen, Luo, Hongchen, Zhai, Wei, Cao, Yang, Kang, Yu, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

27. Non-coding RNAs Function in Periodontal Ligament Stem Cells

Author: Zhai, Wei, Gao, Jie, Qin, Wen, and Xu, Yuerong
Published: 2024
Full Text: View/download PDF

28. ImmunoPET/CT imaging of clear cell renal cell carcinoma with [18F]RCCB6: a first-in-human study

Author: Wu, Qianyun, Wu, Yanfei, Zhang, You, Guan, Yihui, Huang, Gang, Xie, Fang, Liu, Jianjun, Zhai, Wei, and Wei, Weijun
Published: 2024
Full Text: View/download PDF

29. Autotomy and Regeneration of Appendages in Crustaceans: A Review

Author: Liu, Lei, Tao, Dandan, Wang, Chunlin, Fu, Yuanyuan, Wang, Sixiang, Huang, Xinlian, Zhai, Wei, and Song, Weiwei
Published: 2024
Full Text: View/download PDF

30. Grounded Affordance from Exocentric View

Author: Luo, Hongchen, Zhai, Wei, Zhang, Jing, Cao, Yang, and Tao, Dacheng
Published: 2024
Full Text: View/download PDF

31. Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

Author: Zhai, Wei, Wu, Pingyu, Zhu, Kai, Cao, Yang, Wu, Feng, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve pixel-level localization. While existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and to guide the learning of the generator, this paper presents two astonishing experimental observations on the object localization learning process: For a trained network, as the foreground mask expands, 1) the cross-entropy converges to zero when the foreground mask covers only part of the object region. 2) The activation value continuously increases until the foreground mask expands to the object boundary. Therefore, to achieve a more effective localization performance, we argue for the usage of activation value to learn more object regions. In this paper, we propose a Background Activation Suppression (BAS) method. Specifically, an Activation Map Constraint (AMC) module is designed to facilitate the learning of generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension., Comment: Accepted by IJCV. arXiv admin note: text overlap with arXiv:2112.00580
Published: 2023

32. Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media

Author: Qi, Hongzhi, Zhao, Qing, Li, Jianqiang, Song, Changwei, Zhai, Wei, Luo, Dan, Liu, Shuo, Yu, Yi Jing, Wang, Fan, Zou, Huijing, Yang, Bing Xiang, and Fu, Guanghui
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: On social media, users often express their personal feelings, which may exhibit cognitive distortions or even suicidal tendencies on certain specific topics. Early recognition of these signs is critical for effective psychological intervention. In this paper, we introduce two novel datasets from Chinese social media: SOS-HL-1K for suicidal risk classification and SocialCD-3K for cognitive distortions detection. The SOS-HL-1K dataset contained 1,249 posts and SocialCD-3K dataset was a multi-label classification dataset that containing 3,407 posts. We propose a comprehensive evaluation using two supervised learning methods and eight large language models (LLMs) on the proposed datasets. From the prompt engineering perspective, we experimented with two types of prompt strategies, including four zero-shot and five few-shot strategies. We also evaluated the performance of the LLMs after fine-tuning on the proposed tasks. The experimental results show that there is still a huge gap between LLMs relying only on prompt engineering and supervised learning. In the suicide classification task, this gap is 6.95% points in F1-score, while in the cognitive distortion task, the gap is even more pronounced, reaching 31.53% points in F1-score. However, after fine-tuning, this difference is significantly reduced. In the suicide and cognitive distortion classification tasks, the gap decreases to 4.31% and 3.14%, respectively. This research highlights the potential of LLMs in psychological contexts, but supervised learning remains necessary for more challenging tasks. All datasets and code are made available., Comment: 10 pages
Published: 2023

33. Robust and sensitive conductive nanocomposite hydrogel with bridge cross-linking-dominated hierarchical structural design.

Author: Li, Tian, Qi, Haobo, Zhao, Yijing, Kumar, Punit, Zhao, Cancan, Li, Zhenming, Dong, Xinyu, Guo, Xiao, Zhao, Miao, Li, Xinwei, Wang, Xudong, Ritchie, Robert, and Zhai, Wei
Abstract: Conductive hydrogels have a remarkable potential for applications in soft electronics and robotics, owing to their noteworthy attributes, including electrical conductivity, stretchability, biocompatibility, etc. However, the limited strength and toughness of these hydrogels have traditionally impeded their practical implementation. Inspired by the hierarchical architecture of high-performance biological composites found in nature, we successfully fabricate a robust and sensitive conductive nanocomposite hydrogel through self-assembly-induced bridge cross-linking of MgB2 nanosheets and polyvinyl alcohol hydrogels. By combining the hierarchical lamellar microstructure with robust molecular B─O─C covalent bonds, the resulting conductive hydrogel exhibits an exceptional strength and toughness. Moreover, the hydrogel demonstrates exceptional sensitivity (response/relaxation time, 20 milliseconds; detection lower limit, ~1 Pascal) under external deformation. Such characteristics enable the conductive hydrogel to exhibit superior performance in soft sensing applications. This study introduces a high-performance conductive hydrogel and opens up exciting possibilities for the development of soft electronics.
Published: 2024

34. Enhancing Psychological Counseling with Large Language Model: A Multifaceted Decision-Support System for Non-Professionals

Author: Fu, Guanghui, Zhao, Qing, Li, Jianqiang, Luo, Dan, Song, Changwei, Zhai, Wei, Liu, Shuo, Wang, Fan, Wang, Yan, Cheng, Lijuan, Zhang, Juan, and Yang, Bing Xiang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: In the contemporary landscape of social media, an alarming number of users express negative emotions, some of which manifest as strong suicidal intentions. This situation underscores a profound need for trained psychological counselors who can enact effective mental interventions. However, the development of these professionals is often an imperative but time-consuming task. Consequently, the mobilization of non-professionals or volunteers in this capacity emerges as a pressing concern. Leveraging the capabilities of artificial intelligence, and in particular, the recent advances in large language models, offers a viable solution to this challenge. This paper introduces a novel model constructed on the foundation of large language models to fully assist non-professionals in providing psychological interventions on online user discourses. This framework makes it plausible to harness the power of non-professional counselors in a meaningful way. A comprehensive study was conducted involving ten professional psychological counselors of varying expertise, evaluating the system across five critical dimensions. The findings affirm that our system is capable of analyzing patients' issues with relative accuracy and proffering professional-level strategies recommendations, thereby enhancing support for non-professionals. This research serves as a compelling validation of the application of large language models in the field of psychology and lays the groundwork for a new paradigm of community-based mental health support.
Published: 2023

35. Multifunctional PVA/PNIPAM conductive hydrogel sensors enabled human-machine interaction intelligent rehabilitation training

Author: Zhao, Yanlong, Zhang, Xichong, Hao, Yilin, Zhao, Yinghe, Ding, Peng, Zhai, Wei, Dai, Kun, Zheng, Guoqiang, Liu, Chuntai, and Shen, Changyu
Published: 2024
Full Text: View/download PDF

36. ADAMTS4 exacerbates lung cancer progression via regulating c-Myc protein stability and activating MAPK signaling pathway

Author: Zhai, Wei, Yang, Wensheng, Ge, Jing, Xiao, Xuelian, Wu, Kang, She, Kelin, Zhou, Yu, Kong, Yi, Wu, Lin, Luo, Shiya, and Pu, Xingxiang
Published: 2024
Full Text: View/download PDF

37. Authentic or artificial intelligence? Faculty’s perspectives on the ChatGPT’s impact on U.S. urban planning Ph.D. programs

Author: Ye, Xinyue, Zhai, Wei, Du, Jiaxin, Van Zandt, Shannon, and Ye, Yuning
Published: 2024
Full Text: View/download PDF

38. Impact-resistant supercapacitor by hydrogel-infused lattice

Author: Zhou, Shixiang, Zhao, Yijing, Zhang, Kaixi, Xun, Yanran, Tao, Xueyu, Yan, Wentao, Zhai, Wei, and Ding, Jun
Published: 2024
Full Text: View/download PDF

39. Nitrogen Removal Performance and Microbial Community Structure of IMTA Ponds (Apostistius japonicus-Penaeus japonicus-Ulva)

Author: Chen, Daiqiang, Tian, Chen, Yuan, Haiqing, Zhai, Wei, and Chang, Zhiqiang
Published: 2024
Full Text: View/download PDF

40. 3D printable strong and tough composite organo-hydrogels inspired by natural hierarchical composite design principles

Author: Liu, Quyang, Dong, Xinyu, Qi, Haobo, Zhang, Haoqi, Li, Tian, Zhao, Yijing, Li, Guanjin, and Zhai, Wei
Published: 2024
Full Text: View/download PDF

41. Clinical study of electroacupuncture plus stuck-needle lifting method for intractable facial paralysis

Author: Fan, Li, Yang, Qianyun, and Zhai, Wei
Published: 2024
Full Text: View/download PDF

42. Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

Author: Zhai, Wei, Wu, Pingyu, Zhu, Kai, Cao, Yang, Wu, Feng, and Zha, Zheng-Jun
Published: 2024
Full Text: View/download PDF

43. Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection

Author: Lu, Fan, Zhu, Kai, Zhai, Wei, Zheng, Kecheng, and Cao, Yang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Semantically coherent out-of-distribution (SCOOD) detection aims to discern outliers from the intended data distribution with access to unlabeled extra set. The coexistence of in-distribution and out-of-distribution samples will exacerbate the model overfitting when no distinction is made. To address this problem, we propose a novel uncertainty-aware optimal transport scheme. Our scheme consists of an energy-based transport (ET) mechanism that estimates the fluctuating cost of uncertainty to promote the assignment of semantic-agnostic representation, and an inter-cluster extension strategy that enhances the discrimination of semantic property among different clusters by widening the corresponding margin distance. Furthermore, a T-energy score is presented to mitigate the magnitude gap between the parallel transport and classifier branches. Extensive experiments on two standard SCOOD benchmarks demonstrate the above-par OOD detection performance, outperforming the state-of-the-art methods by a margin of 27.69% and 34.4% on FPR@95, respectively., Comment: Accepted by CVPR2023
Published: 2023

44. Spatial-Aware Token for Weakly Supervised Object Localization

Author: Wu, Pingyu, Zhai, Wei, Cao, Yang, Luo, Jiebo, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Weakly supervised object localization (WSOL) is a challenging task aiming to localize objects with only image-level supervision. Recent works apply visual transformer to WSOL and achieve significant success by exploiting the long-range feature dependency in self-attention mechanism. However, existing transformer-based methods synthesize the classification feature maps as the localization map, which leads to optimization conflicts between classification and localization tasks. To address this problem, we propose to learn a task-specific spatial-aware token (SAT) to condition localization in a weakly supervised manner. Specifically, a spatial token is first introduced in the input space to aggregate representations for localization task. Then a spatial aware attention module is constructed, which allows spatial token to generate foreground probabilities of different patches by querying and to extract localization knowledge from the classification task. Besides, for the problem of sparse and unbalanced pixel-level supervision obtained from the image-level label, two spatial constraints, including batch area loss and normalization loss, are designed to compensate and enhance this supervision. Experiments show that the proposed SAT achieves state-of-the-art performance on both CUB-200 and ImageNet, with 98.45% and 73.13% GT-known Loc, respectively. Even under the extreme setting of using only 1 image per class from ImageNet for training, SAT already exceeds the SOTA method by 2.1% GT-known Loc. Code and models are available at https://github.com/wpy1999/SAT., Comment: Accepted by ICCV 2023. Code:https://github.com/wpy1999/SAT
Published: 2023

45. Grounding 3D Object Affordance from 2D Interactions in Images

Author: Yang, Yuhang, Zhai, Wei, Luo, Hongchen, Cao, Yang, Luo, Jiebo, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Grounding 3D object affordance seeks to locate objects' ''action possibilities'' regions in the 3D space, which serves as a link between perception and operation for embodied agents. Existing studies primarily focus on connecting visual affordances with geometry structures, e.g. relying on annotations to declare interactive regions of interest on the object and establishing a mapping between the regions and affordances. However, the essence of learning object affordance is to understand how to use it, and the manner that detaches interactions is limited in generalization. Normally, humans possess the ability to perceive object affordances in the physical world through demonstration images or videos. Motivated by this, we introduce a novel task setting: grounding 3D object affordance from 2D interactions in images, which faces the challenge of anticipating affordance through interactions of different sources. To address this problem, we devise a novel Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region feature of objects from different sources and models the interactive contexts for 3D object affordance grounding. Besides, we collect a Point-Image Affordance Dataset (PIAD) to support the proposed task. Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method. The project is available at https://github.com/yyvhang/IAGNet., Comment: ICCV2023, camera-ready version
Published: 2023

46. Examining disaster resilience perception of social media users during the billion-dollar hurricanes

Author: Zhai, Wei, Hu, Wanyang, Yuan, Zhihang, and Li, Yantong
Published: 2024
Full Text: View/download PDF

47. Predictive genomic biomarkers of therapeutic effects in renal cell carcinoma

Author: Yan, Weijie, Hou, Naiqiao, Zheng, Junhua, and Zhai, Wei
Published: 2023
Full Text: View/download PDF

48. Grounded Affordance from Exocentric View

Author: Luo, Hongchen, Zhai, Wei, Zhang, Jing, Cao, Yang, and Tao, Dacheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Affordance grounding aims to locate objects' "action possibilities" regions, which is an essential step toward embodied intelligence. Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels. Human has the ability that transforms the various exocentric interactions into invariant egocentric affordance to counter the impact of interactive diversity. To empower an agent with such ability, this paper proposes a task of affordance grounding from exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision. However, there is some "interaction bias" between personas, mainly regarding different regions and different views. To this end, we devise a cross-view affordance knowledge transfer framework that extracts affordance-specific features from exocentric interactions and transfers them to the egocentric view. Specifically, the perception of affordance regions is enhanced by preserving affordance co-relations. In addition, an affordance grounding dataset named AGD20K is constructed by collecting and labeling over 20K images from $36$ affordance categories. Experimental results demonstrate that our method outperforms the representative models regarding objective metrics and visual quality. Code is released at https://github.com/lhc1224/Cross-view-affordance-grounding., Comment: arXiv admin note: text overlap with arXiv:2203.09905
Published: 2022

49. Learning Affordance Grounding from Exocentric Images

Author: Luo, Hongchen, Zhai, Wei, Zhang, Jing, Cao, Yang, and Tao, Dacheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Affordance grounding, a task to ground (i.e., localize) action possibility region in objects, which faces the challenge of establishing an explicit link with object parts due to the diversity of interactive affordance. Human has the ability that transform the various exocentric interactions to invariant egocentric affordance so as to counter the impact of interactive diversity. To empower an agent with such ability, this paper proposes a task of affordance grounding from exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision. To this end, we devise a cross-view knowledge transfer framework that extracts affordance-specific features from exocentric interactions and enhances the perception of affordance regions by preserving affordance correlation. Specifically, an Affordance Invariance Mining module is devised to extract specific clues by minimizing the intra-class differences originated from interaction habits in exocentric images. Besides, an Affordance Co-relation Preserving strategy is presented to perceive and localize affordance by aligning the co-relation matrix of predicted results between the two views. Particularly, an affordance grounding dataset named AGD20K is constructed by collecting and labeling over 20K images from 36 affordance categories. Experimental results demonstrate that our method outperforms the representative models in terms of objective metrics and visual quality. Code: github.com/lhc1224/Cross-View-AG., Comment: CVPR2022
Published: 2022

50. Location-Free Camouflage Generation Network

Author: Li, Yangyang, Zhai, Wei, Cao, Yang, and Zha, Zheng-jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Camouflage is a common visual phenomenon, which refers to hiding the foreground objects into the background images, making them briefly invisible to the human eye. Previous work has typically been implemented by an iterative optimization process. However, these methods struggle in 1) efficiently generating camouflage images using foreground and background with arbitrary structure; 2) camouflaging foreground objects to regions with multiple appearances (e.g. the junction of the vegetation and the mountains), which limit their practical application. To address these problems, this paper proposes a novel Location-free Camouflage Generation Network (LCG-Net) that fuse high-level features of foreground and background image, and generate result by one inference. Specifically, a Position-aligned Structure Fusion (PSF) module is devised to guide structure feature fusion based on the point-to-point structure similarity of foreground and background, and introduce local appearance features point-by-point. To retain the necessary identifiable features, a new immerse loss is adopted under our pipeline, while a background patch appearance loss is utilized to ensure that the hidden objects look continuous and natural at regions with multiple appearances. Experiments show that our method has results as satisfactory as state-of-the-art in the single-appearance regions and are less likely to be completely invisible, but far exceed the quality of the state-of-the-art in the multi-appearance regions. Moreover, our method is hundreds of times faster than previous methods. Benefitting from the unique advantages of our method, we provide some downstream applications for camouflage generation, which show its potential. The related code and dataset will be released at https://github.com/Tale17/LCG-Net.
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

2,079 results on '"Zhai, Wei"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources