Author: "Yu, Weihao" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yu, Weihao"' showing total 167 results

1. Attention Prompting on Image for Large Vision-Language Models

Author: Yu, Runpeng, Yu, Weihao, and Wang, Xinchao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Compared with Large Language Models (LLMs), Large Vision-Language Models (LVLMs) can also accept images as input, thus showcasing more interesting emergent capabilities and demonstrating impressive performance on various vision-language tasks. Motivated by text prompting in LLMs, visual prompting has been explored to enhance LVLMs' capabilities of perceiving visual information. However, previous visual prompting techniques solely process visual inputs without considering text queries, limiting the models' ability to follow text instructions to complete tasks. To fill this gap, in this work, we propose a new prompting technique named Attention Prompting on Image, which just simply overlays a text-query-guided attention heatmap on the original input image and effectively enhances LVLM on various tasks. Specifically, we generate an attention heatmap for the input image dependent on the text query with an auxiliary model like CLIP. Then the heatmap simply multiplies the pixel values of the original image to obtain the actual input image for the LVLM. Extensive experiments on various vision-language benchmarks verify the effectiveness of our technique. For example, Attention Prompting on Image improves LLaVA-1.5 by 3.8% and 2.9% on MM-Vet and LLaVA-Wild benchmarks, respectively., Comment: Website, see https://yu-rp.github.io/api-prompting
Published: 2024
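
Illustrative note on record 1: the abstract says the heatmap "simply multiplies the pixel values of the original image". A minimal sketch of that one step follows, under stated assumptions. The function name apply_attention_prompt, the nested-list image layout, and the clamping of weights to [0, 1] are hypothetical choices made here for illustration and are not taken from the paper or its code. The heatmap is hard-coded; producing it from a text query with an auxiliary model such as CLIP is not shown.

```python
def apply_attention_prompt(image, heatmap):
    """Scale every RGB pixel by the heatmap weight at the same position.

    image   -- rows of (r, g, b) integer tuples (assumed layout)
    heatmap -- rows of floats, same height and width as the image
    """
    same_height = len(image) == len(heatmap)
    same_width = all(len(p) == len(w) for p, w in zip(image, heatmap))
    if not (same_height and same_width):
        raise ValueError("image and heatmap must have the same height and width")

    prompted = []
    for pixel_row, weight_row in zip(image, heatmap):
        out_row = []
        for pixel, weight in zip(pixel_row, weight_row):
            # Clamping to [0, 1] is an assumption of this sketch.
            w = min(max(weight, 0.0), 1.0)
            out_row.append(tuple(round(channel * w) for channel in pixel))
        prompted.append(out_row)
    return prompted


# Toy 2x2 image and a hand-written heatmap (illustrative values only).
image = [[(200, 100, 50), (200, 100, 50)],
         [(10, 20, 30), (255, 255, 255)]]
heatmap = [[1.0, 0.5],
           [0.0, 0.25]]
prompted = apply_attention_prompt(image, heatmap)
```

Pixels with weight 1.0 pass through unchanged and pixels with weight 0.0 become black, so regions the heatmap marks as irrelevant to the query are darkened in the image handed to the LVLM.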

2. LinFusion: 1 GPU, 1 Minute, 16K Image

Author: Liu, Songhua, Yu, Weihao, Tan, Zhenxiong, and Wang, Xinchao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Modern diffusion models, particularly those utilizing a Transformer-based UNet for denoising, rely heavily on self-attention operations to manage complex spatial relationships, thus achieving impressive generation performance. However, this existing paradigm faces significant challenges in generating high-resolution visual content due to its quadratic time and memory complexity with respect to the number of spatial tokens. To address this limitation, we aim at a novel linear attention mechanism as an alternative in this paper. Specifically, we begin our exploration from recently introduced models with linear complexity, e.g., Mamba2, RWKV6, Gated Linear Attention, etc, and identify two key features--attention normalization and non-causal inference--that enhance high-resolution visual generation performance. Building on these insights, we introduce a generalized linear attention paradigm, which serves as a low-rank approximation of a wide spectrum of popular linear token mixers. To save the training cost and better leverage pre-trained models, we initialize our models and distill the knowledge from pre-trained StableDiffusion (SD). We find that the distilled model, termed LinFusion, achieves performance on par with or superior to the original SD after only modest training, while significantly reducing time and memory complexity. Extensive experiments on SD-v1.5, SD-v2.1, and SD-XL demonstrate that LinFusion enables satisfactory and efficient zero-shot cross-resolution generation, accommodating ultra-resolution images like 16K on a single GPU. Moreover, it is highly compatible with pre-trained SD components and pipelines, such as ControlNet, IP-Adapter, DemoFusion, DistriFusion, etc, requiring no adaptation efforts. Codes are available at https://github.com/Huage001/LinFusion., Comment: Work in Progress. Codes are available at https://github.com/Huage001/LinFusion
Published: 2024

3. MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Author: Yu, Weihao, Yang, Zhengyuan, Ren, Linfeng, Li, Linjie, Wang, Jianfeng, Lin, Kevin, Lin, Chung-Ching, Liu, Zicheng, Wang, Lijuan, and Wang, Xinchao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: MM-Vet, with open-ended vision-language questions targeting at evaluating integrated capabilities, has become one of the most popular benchmarks for large multimodal model evaluation. MM-Vet assesses six core vision-language (VL) capabilities: recognition, knowledge, spatial awareness, language generation, OCR, and math. However, its question format is restricted to single image-text pairs, lacking the interleaved image and text sequences prevalent in real-world scenarios. To address this limitation, we introduce MM-Vet v2, which includes a new VL capability called "image-text sequence understanding", evaluating models' ability to process VL sequences. Furthermore, we maintain the high quality of evaluation samples while further expanding the evaluation set size. Using MM-Vet v2 to benchmark large multimodal models, we found that Claude 3.5 Sonnet is the best model with a score of 71.8, slightly outperforming GPT-4o which scored 71.0. Among open-weight models, InternVL2-Llama3-76B leads with a score of 68.4., Comment: Extension of MM-Vet: arXiv:2308.02490
Published: 2024

4. KAN or MLP: A Fairer Comparison

Author: Yu, Runpeng, Yu, Weihao, and Wang, Xinchao
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: This paper does not introduce a novel method. Instead, it offers a fairer and more comprehensive comparison of KAN and MLP models across various tasks, including machine learning, computer vision, audio processing, natural language processing, and symbolic formula representation. Specifically, we control the number of parameters and FLOPs to compare the performance of KAN and MLP. Our main observation is that, except for symbolic formula representation tasks, MLP generally outperforms KAN. We also conduct ablation studies on KAN and find that its advantage in symbolic formula representation mainly stems from its B-spline activation function. When B-spline is applied to MLP, performance in symbolic formula representation significantly improves, surpassing or matching that of KAN. However, in other tasks where MLP already excels over KAN, B-spline does not substantially enhance MLP's performance. Furthermore, we find that KAN's forgetting issue is more severe than that of MLP in a standard class-incremental continual learning setting, which differs from the findings reported in the KAN paper. We hope these results provide insights for future research on KAN and other MLP alternatives. Project link: https://github.com/yu-rp/KANbeFair, Comment: Technical Report
Published: 2024

5. GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

Author: Li, Chenxin, Liu, Xinyu, Wang, Cheng, Liu, Yifan, Yu, Weihao, Shao, Jing, and Yuan, Yixuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advances in learning multi-modal representation have witnessed the success in biomedical domains. While established techniques enable handling multi-modal information, the challenges are posed when extended to various clinical modalities and practical modality-missing setting due to the inherent modality gaps. To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o), which embeds the numerous disparate clinical modalities into a unified representation, completes the deficient embedding of missing modality and reformulates the cross-modal learning with a graph-based aggregation. Specifically, we establish a heterogeneous graph embedding to explicitly capture the diverse semantic properties on both the modality-specific features (nodes) and the cross-modal relations (edges). Then, we design a modality-prompted completion that enables completing the inadequate graph representation of missing modality through a graph prompting mechanism, which generates hallucination graphic topologies to steer the missing embedding towards the intact representation. Through the completed graph, we meticulously develop a knowledge-guided hierarchical cross-modal aggregation consisting of a global meta-path neighbouring to uncover the potential heterogeneous neighbors along the pathways driven by domain knowledge, and a local multi-relation aggregation module for the comprehensive cross-modal interaction across various heterogeneous relations. We assess the efficacy of our methodology on rigorous benchmarking experiments against prior state-of-the-arts. In a nutshell, GTP-4o presents an initial foray into the intriguing realm of embedding, relating and perceiving the heterogeneous patterns from various clinical modalities holistically via a graph theory. Project page: https://gtp-4-o.github.io/., Comment: Accepted by ECCV2024
Published: 2024

6. EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting

Author: Li, Chenxin, Feng, Brandon Y., Liu, Yifan, Liu, Hengyu, Wang, Cheng, Yu, Weihao, and Yuan, Yixuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually the case in real-world clinical scenarios. To tackle this sparsity challenge, we propose a framework leveraging the prior knowledge from multiple foundation models during the reconstruction process, dubbed as EndoSparse. Experimental results indicate that our proposed strategy significantly improves the geometric and appearance quality under challenging sparse-view conditions, including using only three views. In rigorous benchmarking experiments against state-of-the-art methods, EndoSparse achieves superior results in terms of accurate geometry, realistic appearance, and rendering efficiency, confirming the robustness to sparse-view limitations in endoscopic reconstruction. EndoSparse signifies a steady step towards the practical deployment of neural 3D reconstruction in real-world clinical scenarios. Project page: https://endo-sparse.github.io/., Comment: Accepted by MICCAI2024
Published: 2024

7. MambaOut: Do We Really Need Mamba for Vision?

Author: Yu, Weihao and Wang, Xinchao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Mamba, an architecture with RNN-like token mixer of state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming when compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba, and conceptually conclude that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. For vision tasks, as image classification does not align with either characteristic, we hypothesize that Mamba is not necessary for this task; Detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks. To empirically verify our hypotheses, we construct a series of models named MambaOut through stacking Mamba blocks while removing their core token mixer, SSM. Experimental results strongly support our hypotheses. Specifically, our MambaOut model surpasses all visual Mamba models on ImageNet image classification, indicating that Mamba is indeed unnecessary for this task. As for detection and segmentation, MambaOut cannot match the performance of state-of-the-art visual Mamba models, demonstrating the potential of Mamba for long-sequence visual tasks. The code is available at https://github.com/yuweihao/MambaOut, Comment: Code: https://github.com/yuweihao/MambaOut
Published: 2024

8. TSA-Net: a temporal knowledge graph completion method with temporal-structural adaptation

Author: Xie, Ruzhong, Ruan, Ke, Huang, Bosong, Yu, Weihao, Xiao, Jing, and Huang, Jin
Published: 2024
Full Text: View/download PDF

9. GTP-4o: Modality-Prompted Heterogeneous Graph Learning for Omni-Modal Biomedical Representation

Author: Li, Chenxin, Liu, Xinyu, Wang, Cheng, Liu, Yifan, Yu, Weihao, Shao, Jing, Yuan, Yixuan, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

10. MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

Author: Yu, Weihao, Yang, Zhengyuan, Li, Linjie, Wang, Jianfeng, Lin, Kevin, Liu, Zicheng, Wang, Xinchao, and Wang, Lijuan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on the blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Rapid model advancements pose challenges to evaluation benchmark development. Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking. To this end, we present MM-Vet, designed based on the insight that the intriguing ability to solve complicated tasks is often achieved by a generalist model being able to integrate different core vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and examines the 16 integrations of interest derived from the capability combination. For evaluation metrics, we propose an LLM-based evaluator for open-ended outputs. The evaluator enables the evaluation across different question types and answer styles, resulting in a unified scoring metric. We evaluate representative LMMs on MM-Vet, providing insights into the capabilities of different LMM system paradigms and models. Code and data are available at https://github.com/yuweihao/MM-Vet., Comment: Add results of GPT-4V. Code, data and leaderboard: https://github.com/yuweihao/MM-Vet
Published: 2023

11. Neighborhood-enhanced contrast for pre-training graph neural networks

Author: Li, Yichun, Huang, Jin, Yu, Weihao, and Zhang, Tinghua
Published: 2024
Full Text: View/download PDF

12. Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems

Author: Huang, Bosong, Yu, Weihao, Xie, Ruzhong, Xiao, Jing, and Huang, Jin
Subjects: Computer Science - Machine Learning
Abstract: Source localization is the inverse problem of graph information dissemination and has broad practical applications. However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates these challenges. Recently, deep generative models, particularly diffusion models inspired by classical non-equilibrium thermodynamics, have made significant progress. While diffusion models have proven to be powerful in solving inverse problems and producing high-quality reconstructions, applying them directly to the source localization is infeasible for two reasons. Firstly, it is impossible to calculate the posterior disseminated results on a large-scale network for iterative denoising sampling, which would incur enormous computational costs. Secondly, in the existing methods for this field, the training data itself are ill-posed (many-to-one); thus simply transferring the diffusion model would only lead to local optima. To address these challenges, we propose a two-stage optimization framework, the source localization denoising diffusion model (SL-Diff). In the coarse stage, we devise the source proximity degrees as the supervised signals to generate coarse-grained source predictions. This aims to efficiently initialize the next stage, significantly reducing its convergence time and calibrating the convergence process. Furthermore, the introduction of cascade temporal information in this training method transforms the many-to-one mapping relationship into a one-to-one relationship, perfectly addressing the ill-posed problem. In the fine stage, we design a diffusion model for the graph inverse problem that can quantify the uncertainty in the dissemination. The proposed SL-Diff yields excellent prediction results within a reasonable sampling time at extensive experiments.
Published: 2023

13. InceptionNeXt: When Inception Meets ConvNeXt

Author: Yu, Weihao, Zhou, Pan, Yan, Shuicheng, and Wang, Xinchao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution. Although such depthwise operator only consumes a few FLOPs, it largely harms the model efficiency on powerful computing devices due to the high memory access costs. For example, ConvNeXt-T has similar FLOPs with ResNet-50 but only achieves 60% throughputs when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation. It is still unclear how to speed up large-kernel-based CNN models while preserving their performance. To tackle this issue, inspired by Inceptions, we propose to decompose large-kernel depthwise convolution into four parallel branches along channel dimension, i.e. small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, we build a series of networks, namely InceptionNeXt, which not only enjoy high throughputs but also maintain competitive performance. For instance, InceptionNeXt-T achieves 1.6x higher training throughputs than ConvNeXt-T, as well as attains 0.2% top-1 accuracy improvement on ImageNet-1K. We anticipate InceptionNeXt can serve as an economical baseline for future architecture design to reduce carbon footprint. Code is available at https://github.com/sail-sg/inceptionnext., Comment: Code: https://github.com/sail-sg/inceptionnext
Published: 2023

14. Lorentz Equivariant Model for Knowledge-Enhanced Hyperbolic Collaborative Filtering

Author: Huang, Bosong, Yu, Weihao, Xie, Ruzhong, Xiao, Jing, and Huang, Jin
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Social and Information Networks
Abstract: Introducing prior auxiliary information from the knowledge graph (KG) to assist the user-item graph can improve the comprehensive performance of the recommender system. Many recent studies show that the ensemble properties of hyperbolic spaces fit the scale-free and hierarchical characteristics exhibited in the above two types of graphs well. However, existing hyperbolic methods ignore the consideration of equivariance, thus they cannot generalize symmetric features under given transformations, which seriously limits the capability of the model. Moreover, they cannot balance preserving the heterogeneity and mining the high-order entity information to users across two graphs. To fill these gaps, we propose a rigorously Lorentz group equivariant knowledge-enhanced collaborative filtering model (LECF). Innovatively, we jointly update the attribute embeddings (containing the high-order entity signals from the KG) and hyperbolic embeddings (the distance between hyperbolic embeddings reveals the recommendation tendency) by the LECF layer with Lorentz Equivariant Transformation. Moreover, we propose Hyperbolic Sparse Attention Mechanism to sample the most informative neighbor nodes. Lorentz equivariance is strictly maintained throughout the entire model, and enforcing equivariance is proven necessary experimentally. Extensive experiments on three real-world benchmarks demonstrate that LECF remarkably outperforms state-of-the-art methods., Comment: 11 pages, 6 figures
Published: 2023

15. MetaFormer Baselines for Vision

Author: Yu, Weihao, Si, Chenyang, Zhou, Pan, Luo, Mi, Zhou, Yichen, Feng, Jiashi, Yan, Shuicheng, and Wang, Xinchao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: MetaFormer, the abstracted architecture of Transformer, has been found to play a significant role in achieving competitive performance. In this paper, we further explore the capacity of MetaFormer, again, without focusing on token mixer design: we introduce several baseline models under MetaFormer using the most basic or common mixers, and summarize our observations as follows: (1) MetaFormer ensures solid lower bound of performance. By merely adopting identity mapping as the token mixer, the MetaFormer model, termed IdentityFormer, achieves >80% accuracy on ImageNet-1K. (2) MetaFormer works well with arbitrary token mixers. When specifying the token mixer as even a random matrix to mix tokens, the resulting model RandFormer yields an accuracy of >81%, outperforming IdentityFormer. Rest assured of MetaFormer's results when new token mixers are adopted. (3) MetaFormer effortlessly offers state-of-the-art results. With just conventional token mixers dated back five years ago, the models instantiated from MetaFormer already beat state of the art. (a) ConvFormer outperforms ConvNeXt. Taking the common depthwise separable convolutions as the token mixer, the model termed ConvFormer, which can be regarded as pure CNNs, outperforms the strong CNN model ConvNeXt. (b) CAFormer sets new record on ImageNet-1K. By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation. In our expedition to probe MetaFormer, we also find that a new activation, StarReLU, reduces 71% FLOPs of activation compared with GELU yet achieves better performance. We expect StarReLU to find great potential in MetaFormer-like models alongside other neural networks., Comment: Accepted to TPAMI. Code: https://github.com/sail-sg/metaformer
Published: 2022
Full Text: View/download PDF
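
Illustrative note on record 15: the abstract reports that StarReLU "reduces 71% FLOPs of activation compared with GELU" but does not define the activation or give the FLOP counts. The sketch below fills both in from outside this record and they should be read as assumptions: StarReLU is taken to be scale * relu(x)^2 + bias with a learnable scalar scale and bias, and the per-element costs are taken to be 4 FLOPs for StarReLU and 14 for GELU. The default parameter values are placeholders, not the initialization used by the authors.

```python
def star_relu(x, scale=1.0, bias=0.0):
    """Assumed form of StarReLU: scale * relu(x)**2 + bias."""
    relu = x if x > 0.0 else 0.0
    return scale * relu * relu + bias


# Assumed per-element FLOP counts (not stated in the abstract above).
STAR_RELU_FLOPS = 4
GELU_FLOPS = 14

# Fraction of activation FLOPs saved by swapping GELU for StarReLU.
reduction = 1.0 - STAR_RELU_FLOPS / GELU_FLOPS
print(f"activation FLOPs saved: {reduction:.0%}")
```

Under those assumed counts the saving is 1 - 4/14, which rounds to the 71% quoted in the abstract.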

16. ODformer: Spatial-Temporal Transformers for Long Sequence Origin-Destination Matrix Forecasting Against Cross Application Scenario

Author: Huang, Jin, Huang, Bosong, Yu, Weihao, Xiao, Jing, Xie, Ruzhong, and Ruan, Ke
Subjects: Computer Science - Artificial Intelligence
Abstract: Origin-Destination (OD) matrices record directional flow data between pairs of OD regions. The intricate spatiotemporal dependency in the matrices makes the OD matrix forecasting (ODMF) problem not only intractable but also non-trivial. However, most of the related methods are designed for very short sequence time series forecasting in specific application scenarios, which cannot meet the requirements of the variation in scenarios and forecasting length of practical applications. To address these issues, we propose a Transformer-like model named ODformer, with two salient characteristics: (i) the novel OD Attention mechanism, which captures special spatial dependencies between OD pairs of the same origin (destination), greatly improves the ability of the model to predict cross-application scenarios after combining with 2D-GCN that captures spatial dependencies between OD regions. (ii) a PeriodSparse Self-attention that effectively forecasts long sequence OD matrix series while adapting to the periodic differences in different scenarios. Generous experiments in three application backgrounds (i.e., transportation traffic, IP backbone network traffic, crowd flow) show our method outperforms the state-of-the-art methods.
Published: 2022

17. Re-thinking and Re-labeling LIDC-IDRI for Robust Pulmonary Cancer Prediction

Author: Zhang, Hanxiao, Gu, Xiao, Zhang, Minghui, Yu, Weihao, Chen, Liang, Wang, Zhexin, Yao, Feng, Gu, Yun, and Yang, Guang-Zhong
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: The LIDC-IDRI database is the most popular benchmark for lung cancer prediction. However, with subjective assessment from radiologists, nodules in LIDC may have entirely different malignancy annotations from the pathological ground truth, introducing label assignment errors and subsequent supervision bias during training. The LIDC database thus requires more objective labels for learning-based cancer prediction. Based on an extra small dataset containing 180 nodules diagnosed by pathological examination, we propose to re-label LIDC data to mitigate the effect of original annotation bias verified on this robust benchmark. We demonstrate in this paper that providing new labels by similar nodule retrieval based on metric learning would be an effective re-labeling strategy. Training on these re-labeled LIDC nodules leads to improved model performance, which is enhanced when new labels of uncertain nodules are added. We further infer that re-labeling LIDC is currently an expedient way for robust lung cancer prediction while building a large pathological-proven nodule database provides the long-term solution.
Published: 2022

18. Inception Transformer

Author: Si, Chenyang, Yu, Weihao, Zhou, Pan, Zhou, Yichen, Wang, Xinchao, and Yan, Shuicheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information. To tackle this issue, we present a novel and general-purpose Inception Transformer, or iFormer for short, that effectively learns comprehensive features with both high- and low-frequency information in visual data. Specifically, we design an Inception mixer to explicitly graft the advantages of convolution and max-pooling for capturing the high-frequency information to Transformers. Different from recent hybrid frameworks, the Inception mixer brings greater efficiency through a channel splitting mechanism to adopt parallel convolution/max-pooling path and self-attention path as high- and low-frequency mixers, while having the flexibility to model discriminative information scattered within a wide frequency range. Considering that bottom layers play more roles in capturing high-frequency details while top layers more in modeling low-frequency global information, we further introduce a frequency ramp structure, i.e. gradually decreasing the dimensions fed to the high-frequency mixer and increasing those to the low-frequency mixer, which can effectively trade-off high- and low-frequency components across different layers. We benchmark the iFormer on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection and ADE20K segmentation. For example, our iFormer-S hits the top-1 accuracy of 83.4% on ImageNet-1K, much higher than DeiT-S by 3.6%, and even slightly better than much bigger model Swin-B (83.3%) with only 1/4 parameters and 1/3 FLOPs. Code and models will be released at https://github.com/sail-sg/iFormer., Comment: Code and models will be released at https://github.com/sail-sg/iFormer
Published: 2022

19. Mugs: A Multi-Granular Self-Supervised Learning Framework

Author: Zhou, Pan, Zhou, Yichen, Si, Chenyang, Yu, Weihao, Ng, Teck Khim, and Yan, Shuicheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: In self-supervised learning, multi-granular features are heavily desired though rarely investigated, as different downstream tasks (e.g., general and fine-grained classification) often require different or multi-granular features, e.g. fine- or coarse-grained one or their mixture. In this work, for the first time, we propose an effective MUlti-Granular Self-supervised learning (Mugs) framework to explicitly learn multi-granular visual features. Mugs has three complementary granular supervisions: 1) an instance discrimination supervision (IDS), 2) a novel local-group discrimination supervision (LGDS), and 3) a group discrimination supervision (GDS). IDS distinguishes different instances to learn instance-level fine-grained features. LGDS aggregates features of an image and its neighbors into a local-group feature, and pulls local-group features from different crops of the same image together and push them away for others. It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local-groups separately to increase discriminability. Accordingly, it helps learn high-level fine-grained features at a local-group level. Finally, to prevent similar local-groups from being scattered randomly or far away, GDS brings similar samples close and thus pulls similar local-groups together, capturing coarse-grained features at a (semantic) group level. Consequently, Mugs can capture three granular features that often enjoy higher generality on diverse downstream tasks over single-granular features, e.g. instance-level fine-grained features in contrastive learning. By only pretraining on ImageNet-1K, Mugs sets new SoTA linear probing accuracy 82.1% on ImageNet-1K and improves previous SoTA by 1.1%. It also surpasses SoTAs on other tasks, e.g. transfer learning, detection and segmentation., Comment: code and models are available at https://github.com/sail-sg/mugs
Published: 2022

20. LTSP: Long-Term Slice Propagation for Accurate Airway Segmentation

Author: Wu, Yangqian, Zhang, Minghui, Yu, Weihao, Zheng, Hao, Xu, Jiasheng, and Gu, Yun
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Purpose: Bronchoscopic intervention is a widely-used clinical technique for pulmonary diseases, which requires an accurate and topologically complete airway map for its localization and guidance. The airway map could be extracted from chest computed tomography (CT) scans automatically by airway segmentation methods. Due to the complex tree-like structure of the airway, preserving its topological completeness while maintaining the segmentation accuracy is a challenging task. Methods: In this paper, a long-term slice propagation (LTSP) method is proposed for accurate airway segmentation from pathological CT scans. We also design a two-stage end-to-end segmentation framework utilizing the LTSP method in the decoding process. Stage 1 is used to generate a coarse feature map by an encoder-decoder architecture. Stage 2 is to adopt the proposed LTSP method for exploiting the continuity information and enhancing the weak airway features in the coarse feature map. The final segmentation result is predicted from the refined feature map. Results: Extensive experiments were conducted to evaluate the performance of the proposed method on 70 clinical CT scans. The results demonstrate the considerable improvements of the proposed method compared to some state-of-the-art methods as most breakages are eliminated and more tiny bronchi are detected. The ablation studies further confirm the effectiveness of the constituents of the proposed method. Conclusion: Slice continuity information is beneficial to accurate airway segmentation. Furthermore, by propagating the long-term slice feature, the airway topology connectivity is preserved with overall segmentation accuracy maintained., Comment: Accepted by IPCAI 2022
Published: 2022

21. BREAK: Bronchi Reconstruction by gEodesic transformation And sKeleton embedding

Author: Yu, Weihao, Zheng, Hao, Zhang, Minghui, Zhang, Hanxiao, Sun, Jiayuan, and Yang, Jie
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Airway segmentation is critical for virtual bronchoscopy and computer-aided pulmonary disease analysis. In recent years, convolutional neural networks (CNNs) have been widely used to delineate the bronchial tree. However, the segmentation results of the CNN-based methods usually include many discontinuous branches, which need manual repair in clinical use. A major reason for the breakages is that the appearance of the airway wall can be affected by the lung disease as well as the adjacency of the vessels, while the network tends to overfit to these special patterns in the training set. To learn robust features for these areas, we design a multi-branch framework that adopts the geodesic distance transform to capture the intensity changes between airway lumen and wall. Another reason for the breakages is the intra-class imbalance. Since the volume of the peripheral bronchi may be much smaller than the large branches in an input patch, the common segmentation loss is not sensitive to the breakages among the distal branches. Therefore, in this paper, a breakage-sensitive regularization term is designed and can be easily combined with other loss functions. Extensive experiments are conducted on publicly available datasets. Compared with state-of-the-art methods, our framework can detect more branches while maintaining competitive segmentation performance., Comment: Accepted as IEEE ISBI 2022 oral
Published: 2022

22. Enhanced edge convolution-based spatial-temporal network for network traffic prediction

Author: Hu, Zehua, Ruan, Ke, Yu, Weihao, and Chen, Siyuan
Published: 2023
Full Text: View/download PDF

23. Rhombic dodecahedral ZIF-8-supported CuFe2O4 triggers sodium percarbonate activation for enhanced sulfonamide antibiotics degradation: Synergistic roles of heterostructure and photocatalytic mechanisms

Author: Xu, Junge, Yu, Weihao, Zhang, Ziwei, Deng, Fubin, Wang, Shengkong, Zou, Rusen, Wang, Yingmu, and Yuan, Baoling
Published: 2024
Full Text: View/download PDF

24. MetaFormer Is Actually What You Need for Vision

Author: Yu, Weihao, Luo, Mi, Zhou, Pan, Si, Chenyang, Zhou, Yichen, Wang, Xinchao, Feng, Jiashi, and Yan, Shuicheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Transformers have shown great potential in computer vision tasks. A common belief is their attention-based token mixer module contributes most to their competence. However, recent works show the attention-based module in Transformers can be replaced by spatial MLPs and the resulting models still perform quite well. Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module in Transformers with an embarrassingly simple spatial pooling operator to conduct only basic token mixing. Surprisingly, we observe that the derived model, termed PoolFormer, achieves competitive performance on multiple computer vision tasks. For example, on ImageNet-1K, PoolFormer achieves 82.1% top-1 accuracy, surpassing well-tuned Vision Transformer/MLP-like baselines DeiT-B/ResMLP-B24 by 0.3%/1.1% accuracy with 35%/52% fewer parameters and 50%/62% fewer MACs. The effectiveness of PoolFormer verifies our hypothesis and urges us to initiate the concept of "MetaFormer", a general architecture abstracted from Transformers without specifying the token mixer. Based on the extensive experiments, we argue that MetaFormer is the key player in achieving superior results for recent Transformer and MLP-like models on vision tasks. This work calls for more future research dedicated to improving MetaFormer instead of focusing on the token mixer modules. Additionally, our proposed PoolFormer could serve as a starting baseline for future MetaFormer architecture design. Code is available at https://github.com/sail-sg/poolformer., Comment: CVPR 2022 (Oral). Code: https://github.com/sail-sg/poolformer
Published: 2021

25. FDA: Feature Decomposition and Aggregation for Robust Airway Segmentation

Author: Zhang, Minghui, Yu, Xin, Zhang, Hanxiao, Zheng, Hao, Yu, Weihao, Pan, Hong, Cai, Xiangran, and Gu, Yun
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D Convolutional Neural Networks (CNNs) have been widely adopted for airway segmentation. The performance of 3D CNNs is greatly influenced by the dataset while the public airway datasets are mainly clean CT scans with coarse annotation, thus difficult to be generalized to noisy CT scans (e.g. COVID-19 CT scans). In this work, we proposed a new dual-stream network to address the variability between the clean domain and noisy domain, which utilizes the clean CT scans and a small amount of labeled noisy CT scans for airway segmentation. We designed two different encoders to extract the transferable clean features and the unique noisy features separately, followed by two independent decoders. Further on, the transferable features are refined by the channel-wise feature recalibration and Signed Distance Map (SDM) regression. The feature recalibration module emphasizes critical features and the SDM pays more attention to the bronchi, which is beneficial to extracting the transferable topological features robust to the coarse labels. Extensive experimental results demonstrated the obvious improvement brought by our proposed method. Compared to other state-of-the-art transfer learning methods, our method accurately segmented more bronchi in the noisy CT scans., Comment: Accepted at MICCAI2021-DART
Published: 2021

26. LV-BERT: Exploiting Layer Variety for BERT

Author: Yu, Weihao, Jiang, Zihang, Chen, Fei, Hou, Qibin, and Feng, Jiashi
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Modern pre-trained language models are mostly built upon backbones stacking self-attention and feed-forward layers in an interleaved order. In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order. Specifically, besides the original self-attention and feed-forward layers, we introduce convolution into the layer type set, which is experimentally found beneficial to pre-trained models. Furthermore, beyond the original interleaved order, we explore more layer orders to discover more powerful architectures. However, the introduced layer variety leads to a large architecture space of more than billions of candidates, while training a single candidate model from scratch already requires huge computation cost, making it not affordable to search such a space by directly training large amounts of candidate models. To solve this problem, we first pre-train a supernet from which the weights of all candidate models can be inherited, and then adopt an evolutionary algorithm guided by pre-training accuracy to find the optimal architecture. Extensive experiments show that LV-BERT model obtained by our method outperforms BERT and its variants on various downstream tasks. For example, LV-BERT-small achieves 79.8 on the GLUE testing set, 1.8 higher than the strong baseline ELECTRA-small., Comment: Accepted to Findings of ACL 2021. The code and pre-trained models are available at https://github.com/yuweihao/LV-BERT
Published: 2021

27. Refiner: Refining Self-attention for Vision Transformers

Author: Zhou, Daquan, Shi, Yujun, Kang, Bingyi, Yu, Weihao, Jiang, Zihang, Li, Yuan, Jin, Xiaojie, Hou, Qibin, and Feng, Jiashi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs. Yet, they generally require much more data for model pre-training. Most recent works are thus dedicated to designing more complex architectures or training methods to address the data-efficiency issue of ViTs. However, few of them explore improving the self-attention mechanism, a key factor distinguishing ViTs from CNNs. Different from existing works, we introduce a conceptually simple scheme, called refiner, to directly refine the self-attention maps of ViTs. Specifically, refiner explores attention expansion that projects the multi-head attention maps to a higher-dimensional space to promote their diversity. Further, refiner applies convolutions to augment local patterns of the attention maps, which we show is equivalent to a distributed local attention: features are aggregated locally with learnable kernels and then globally aggregated with self-attention. Extensive experiments demonstrate that refiner works surprisingly well. Significantly, it enables ViTs to achieve 86% top-1 classification accuracy on ImageNet with only 81M parameters.
Published: 2021

28. Routing hypergraph convolutional recurrent network for network traffic prediction

Author: Yu, Weihao, Ruan, Ke, Tang, Hong, and Huang, Jin
Published: 2023
Full Text: View/download PDF

29. Lorentz equivariant model for knowledge-enhanced hyperbolic collaborative filtering

Author: Huang, Bosong, Yu, Weihao, Xie, Ruzhong, Luo, Junming, Xiao, Jing, and Huang, Jin
Published: 2024
Full Text: View/download PDF

30. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Author: Yuan, Li, Chen, Yunpeng, Wang, Tao, Yu, Weihao, Shi, Yujun, Jiang, Zihang, Tay, Francis EH, Feng, Jiashi, and Yan, Shuicheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Transformers, which are popular for language modeling, have been explored for solving vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model splits each image into a sequence of tokens with fixed length and then applies multiple Transformer layers to model their global relation for classification. However, ViT achieves inferior performance to CNNs when trained from scratch on a midsize dataset like ImageNet. We find it is because: 1) the simple tokenization of input images fails to model the important local structure such as edges and lines among neighboring pixels, leading to low training sample efficiency; 2) the redundant attention backbone design of ViT leads to limited feature richness for fixed computation budgets and limited training samples. To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study. Notably, T2T-ViT reduces the parameter count and MACs of vanilla ViT by half, while achieving more than 3.0% improvement when trained from scratch on ImageNet. It also outperforms ResNets and achieves comparable performance with MobileNets by directly training on ImageNet. For example, T2T-ViT with comparable size to ResNet50 (21.5M parameters) can achieve 83.3% top1 accuracy in image resolution 384×384 on ImageNet. (Code: https://github.com/yitu-opensource/T2T-ViT), Comment: ICCV 2021, codes: https://github.com/yitu-opensource/T2T-ViT
Published: 2021

31. AirwayFormer: Structure-Aware Boundary-Adaptive Transformers for Airway Anatomical Labeling

Author: Yu, Weihao, Zheng, Hao, Gu, Yun, Xie, Fangfang, Sun, Jiayuan, Yang, Jie, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Greenspan, Hayit, editor, Madabhushi, Anant, editor, Mousavi, Parvin, editor, Salcudean, Septimiu, editor, Duncan, James, editor, Syeda-Mahmood, Tanveer, editor, and Taylor, Russell, editor
Published: 2023
Full Text: View/download PDF

32. Fast Generalizable Novel View Synthesis with Uncertainty-Aware Sampling

Author: Mo, Zhixiong, Wu, Weijun, Yu, Weihao, Zhang, Tinghua, Ke, Zhilin, Huang, Jin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Iliadis, Lazaros, editor, Papaleonidas, Antonios, editor, Angelov, Plamen, editor, and Jayne, Chrisina, editor
Published: 2023
Full Text: View/download PDF

33. Local-Global Semantic Fusion Single-shot Classification Method

Author: Cai, Jianwei, Fang, Kun, Yu, Weihao, Yang, Jie, Qiao, Yu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Tanveer, Mohammad, editor, Agarwal, Sonali, editor, Ozawa, Seiichi, editor, Ekbal, Asif, editor, and Jatowt, Adam, editor
Published: 2023
Full Text: View/download PDF

34. ConvBERT: Improving BERT with Span-based Dynamic Convolution

Author: Jiang, Zihang, Yu, Weihao, Zhou, Daquan, Chen, Yunpeng, Feng, Jiashi, and Yan, Shuicheng
Subjects: Computer Science - Computation and Language
Abstract: Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT heavily relies on the global self-attention block and thus suffers from a large memory footprint and computation cost. Although all its attention heads query on the whole input sequence for generating the attention map from a global perspective, we observe some heads only need to learn local dependencies, which means the existence of computation redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads to directly model local dependencies. The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments have shown that ConvBERT significantly outperforms BERT and its variants in various downstream tasks, with lower training cost and fewer model parameters. Remarkably, the ConvBERT-base model achieves 86.4 GLUE score, 0.7 higher than ELECTRA-base, while using less than 1/4 training cost. Code and pre-trained models will be released., Comment: 17 pages
Published: 2020

35. ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Author: Yu, Weihao, Jiang, Zihang, Dong, Yanfei, and Feng, Jiashi
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent powerful pre-trained language models have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more comprehensive reasoning of text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which are often exploited by models to achieve high accuracy without truly understanding the text. In order to comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into EASY set while the rest as HARD set. Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set. However, they struggle on HARD set with poor performance near that of random guess, indicating more research is needed to essentially enhance the logical reasoning ability of current models., Comment: ICLR 2020 paper. Project page: http://whyu.me/reclor/
Published: 2020

36. What is wrong with deep knowledge tracing? Attention-based knowledge tracing

Author: Wang, Xianqing, Zheng, Zetao, Zhu, Jia, and Yu, Weihao
Published: 2023
Full Text: View/download PDF

37. Heterogeneous Graph Learning for Visual Commonsense Reasoning

Author: Yu, Weijiang, Zhou, Jingwen, Yu, Weihao, Liang, Xiaodan, and Xiao, Nong
Subjects: Computer Science - Computer Vision and Pattern Recognition, 68T01
Abstract: Visual commonsense reasoning task aims at leading the research field into solving cognition-level reasoning with the ability of predicting correct answers and meanwhile providing convincing reasoning paths, resulting in three sub-tasks i.e., Q->A, QA->R and Q->AR. It poses great challenges over the proper semantic alignment between vision and linguistic domains and knowledge reasoning to generate persuasive reasoning paths. Existing works either resort to a powerful end-to-end network that cannot produce interpretable reasoning paths or solely explore intra-relationship of visual objects (homogeneous graph) while ignoring the cross-domain semantic alignment among visual concepts and linguistic words. In this paper, we propose a new Heterogeneous Graph Learning (HGL) framework for seamlessly integrating the intra-graph and inter-graph reasoning in order to bridge vision and language domain. Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement. Moreover, our HGL integrates a contextual voting module to exploit a long-range visual context for better global reasoning. Experiments on the large-scale Visual Commonsense Reasoning benchmark demonstrate the superior performance of our proposed modules on three tasks (improving 5% accuracy on Q->A, 3.5% on QA->R, 5.8% on Q->AR), Comment: 11 pages, 5 figures
Published: 2019

38. ODformer: Spatial–temporal transformers for long sequence Origin–Destination matrix forecasting against cross application scenario

Author: Huang, Bosong, Ruan, Ke, Yu, Weihao, Xiao, Jing, Xie, Ruzhong, and Huang, Jin
Published: 2023
Full Text: View/download PDF

39. Knowledge-Embedded Routing Network for Scene Graph Generation

Author: Chen, Tianshui, Yu, Weihao, Chen, Riquan, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: To understand a scene in depth not only involves locating/recognizing individual objects, but also requires to infer the relationships and interactions among them. However, since the distribution of real-world relationships is seriously unbalanced, existing methods perform quite poorly for the less frequent relationships. In this work, we find that the statistical correlations between object pairs and their relationships can effectively regularize semantic space and make prediction less ambiguous, and thus well address the unbalanced distribution issue. To achieve this, we incorporate these statistical correlations into deep neural networks to facilitate scene graph generation by developing a Knowledge-Embedded Routing Network. More specifically, we show that the statistical correlations between objects appearing in images and their relationships, can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions. Extensive experiments on the large-scale Visual Genome dataset demonstrate the superiority of the proposed method over current state-of-the-art competitors., Comment: Accepted by CVPR 2019
Published: 2019

40. HyperDNE: Enhanced hypergraph neural network for dynamic network embedding

Author: Huang, Jin, Lu, Tian, Zhou, Xuebin, Cheng, Bo, Hu, Zhibin, Yu, Weihao, and Xiao, Jing
Published: 2023
Full Text: View/download PDF

41. Two-Stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems

Author: Huang, Bosong, primary, Yu, Weihao, additional, Xie, Ruzhong, additional, Xiao, Jing, additional, and Huang, Jin, additional
Published: 2023
Full Text: View/download PDF

42. Local-Global Semantic Fusion Single-shot Classification Method

Author: Cai, Jianwei, primary, Fang, Kun, additional, Yu, Weihao, additional, Yang, Jie, additional, and Qiao, Yu, additional
Published: 2023
Full Text: View/download PDF

43. Interpretable Lung Cancer Diagnosis with Nodule Attribute Guidance and Online Model Debugging

Author: Zhang, Hanxiao, Chen, Liang, Zhang, Minghui, Gu, Xiao, Qin, Yulei, Yu, Weihao, Yao, Feng, Wang, Zhexin, Gu, Yun, Yang, Guang-Zhong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Reyes, Mauricio, editor, Henriques Abreu, Pedro, editor, and Cardoso, Jaime, editor
Published: 2022
Full Text: View/download PDF

44. Molecular Characteristics of Bovine Viral Diarrhea Virus Strain Isolated from Commercial Foetal Bovine Serum

Author: Pan, Juanjuan, primary, Jiang, Jianfeng, additional, Duan, Ruli, additional, Dang, Yueyi, additional, Yu, Weihao, additional, Jianaer, Nuoerdun, additional, Chen, Xintong, additional, Kuang, Ling, additional, Tong, Panpan, additional, Mi, Shijiang, additional, and Xie, Jinxin, additional
Published: 2024
Full Text: View/download PDF

45. Deep Reasoning with Knowledge Graph for Social Relationship Understanding

Author: Wang, Zhouxia, Chen, Tianshui, Ren, Jimmy, Yu, Weihao, Cheng, Hui, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Social relationships (e.g., friends, couple etc.) form the basis of the social network in our daily life. Automatically interpreting such relationships bears a great potential for the intelligent systems to understand human behavior in depth and to better interact with people at a social level. Human beings interpret the social relationships within a group not only based on the people alone, and the interplay between such social relationships and the contextual information around the people also plays a significant role. However, these additional cues are largely overlooked by the previous studies. We found that the interplay between these two factors can be effectively modeled by a novel structured knowledge graph with proper message propagation and attention. And this structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node message through the graph to explore the interaction between persons of interest and the contextual objects. Meanwhile, a graph attentional mechanism is introduced to explicitly reason about the discriminative objects to promote recognition. Extensive experiments on the public benchmarks demonstrate the superiority of our method over the existing leading competitors., Comment: Accepted at IJCAI 2018. The first work that integrates high-level knowledge graph to reason about social relationships between person pair of interest in still image
Published: 2018

46. Multi-relational knowledge graph completion method with local information fusion

Author: Huang, Jin, Lu, Tian, Zhu, Jia, Yu, Weihao, and Zhang, Tinghua
Published: 2022
Full Text: View/download PDF

47. Different Modulations of Arctic Oscillation on Wintertime Sea Surface Temperature Anomalies in the Northeast Pacific.

Author: Chen, Jiajie, Li, Ronglin, Mao, Jiongren, Yu, Weihao, Xie, Shen, Wei, Jiaqi, Huang, Hao, Liu, Qinyu, and Shi, Jian
Subjects: OCEAN temperature, MARINE heatwaves, ATMOSPHERIC circulation, LATENT heat, POLAR climate
Abstract: Persistent positive sea surface temperature anomalies (SSTAs) in the mid‐latitude Northeast Pacific (NEP), also known as "warm blob" or "marine heatwave," have substantial ecological and climate effects. This study delves into the spatiotemporal connection between Arctic Oscillation (AO) and SSTAs in the NEP. First, we conduct the lead‐lag correlation and maximum covariance analyses to disentangle the closest temporal relationship between October AO and wintertime SSTAs in the NEP. Then, we categorize the years in positive AO (pAO) phase into two groups: positive AO with warm anomaly (pAO&Blob) group and positive AO without warm anomaly (pAO&noBlob) group based on the October AO index and wintertime blob index. Results show that the positive phase of AO in October strongly influences the wintertime warm SSTAs in the NEP through local and remote pathways. The local pathway is contingent upon the longitudinal positioning of AO‐related high‐pressure anomaly (i.e., anomalous ridge) in the North Pacific. When easterly anomalies prevail at the southern flank of the anomalous ridge over the NEP, they foster warm SSTAs in the NEP. However, different locations of the high‐pressure anomalies may impede the NEP warming. Remote pathways indicate the teleconnections triggered by AO‐related precipitation increase in Greenland and decrease in East Asia, sustaining the high‐pressure anomaly and promoting the anomalous NEP warming. Hence, this study presents new evidence on polar and mid‐latitude climate connections, which may provide potential predictability for the warm SSTAs in the NEP. Plain Language Summary: Under rapid climate change, long‐lasting warm sea surface temperature anomalies (SSTAs) in the mid‐latitude Northeast Pacific (NEP) have attracted wide attention due to their substantial ecological and climate effects. Arctic Oscillation (AO) is the dominant mode of atmospheric circulation variability in mid‐to‐high latitudes of the Northern Hemisphere. 
Its positive phase is featured by a low‐pressure center in the Arctic region and high‐pressure centers in the mid‐latitude regions. Our results reveal a robust connection between the positive phase of AO in October and warm SSTAs in the NEP during winter. We identify two primary pathways through which the October AO influences the wintertime sea surface temperature (SST) warming in the NEP. Locally, a high‐pressure anomaly over the NEP is associated with easterly wind anomalies to its southern flank, which diminish oceanic latent heat loss to the atmosphere, thus fostering the SST warming in the NEP. However, other locations of the high‐pressure anomaly related to AO may not be favorable for the warming of the NEP. From a remote view, AO can impact the NEP SST through teleconnections due to increased precipitation in Greenland and decreased precipitation in East Asia, sustaining the high‐pressure anomaly over the NEP and further bolstering the NEP warming. Key Points: Temporally, Arctic Oscillation (AO) in October is closely linked to wintertime sea surface temperature (SST) anomalies in the mid‐latitude Northeast Pacific (NEP). The AO‐related anomalous ridge located over the NEP is favorable for the warming of SST locally. Remotely, enhanced rainfall in Greenland and decreased rainfall near East Asia sustain the anomalous ridge over the NEP via teleconnection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. Dehydroandrographolide facilitates M2 macrophage polarization by downregulating DUSP3 to inhibit sepsis‐associated acute kidney injury

Author: Shao, Yanyan, primary, Yu, Weihao, additional, and Cai, Hailun, additional
Published: 2024
Full Text: View/download PDF

49. Learning from Interpretable Analysis: Attention-Based Knowledge Tracing

Author: Zhu, Jia, Yu, Weihao, Zheng, Zetao, Huang, Changqin, Tang, Yong, Fung, Gabriel Pui Cheong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bittencourt, Ig Ibert, editor, Cukurova, Mutlu, editor, Muldner, Kasia, editor, Luckin, Rose, editor, and Millán, Eva, editor
Published: 2020
Full Text: View/download PDF

50. Re-thinking and Re-labeling LIDC-IDRI for Robust Pulmonary Cancer Prediction

Author: Zhang, Hanxiao, primary, Gu, Xiao, additional, Zhang, Minghui, additional, Yu, Weihao, additional, Chen, Liang, additional, Wang, Zhexin, additional, Yao, Feng, additional, Gu, Yun, additional, and Yang, Guang-Zhong, additional
Published: 2022
Full Text: View/download PDF
