Author: "Sun, Weixuan" - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Sun, Weixuan"' returned 166 results.

1. Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

Author: Qin, Zhen, Sun, Weigao, Li, Dong, Shen, Xuyang, Sun, Weixuan, and Zhong, Yiran
Subjects: Computer Science - Computation and Language
Abstract: We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption. Due to the issue with cumulative summation operations (cumsum), previous linear attention implementations cannot achieve their theoretical advantage in a causal setting. However, this issue can be effectively solved by utilizing different attention calculation strategies to compute the different parts of attention. Specifically, we split the attention calculation into intra-blocks and inter-blocks and use conventional attention computation for intra-blocks and linear attention kernel tricks for inter-blocks. This eliminates the need for cumsum in the linear attention calculation. Furthermore, a tiling technique is adopted through both forward and backward procedures to take full advantage of the GPU hardware. To enhance accuracy while preserving efficiency, we introduce TransNormerLLM (TNL), a new architecture that is tailored to our lightning attention. We conduct rigorous testing on standard and self-collected datasets with varying model sizes and sequence lengths. TNL is notably more efficient than other language models. In addition, benchmark results indicate that TNL performs on par with state-of-the-art LLMs utilizing conventional transformer structures. The source code is released at github.com/OpenNLPLab/TransnormerLLM., Comment: Accepted by ICML 2024. Yiran Zhong is the corresponding author. Code is released at github.com/OpenNLPLab/TransnormerLLM
Published: 2024
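The intra-block/inter-block split described in the Lightning Attention abstract can be illustrated with a small sketch in plain Python. It assumes unnormalized causal linear attention with no decay, feature map, or GPU tiling, so it is not the authors' Triton kernel; it only shows that a block-level key-value state plus a masked intra-block product reproduces the token-by-token cumsum result. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_causal_linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference: o_t = sum_{s <= t} (q_t . k_s) v_s, computed token by token."""
    e = len(v[0])
    out = []
    for t in range(len(q)):
        o = [0.0] * e
        for s in range(t + 1):
            w = dot(q[t], k[s])
            for j in range(e):
                o[j] += w * v[s][j]
        out.append(o)
    return out


def blocked_causal_linear_attention(q: Matrix, k: Matrix, v: Matrix, block_size: int) -> Matrix:
    """Intra-block: masked attention inside the current block.
    Inter-block: q_t times a d x e state that summarizes all earlier blocks."""
    n, d, e = len(q), len(q[0]), len(v[0])
    kv = [[0.0] * e for _ in range(d)]  # sum of k_s v_s^T over finished blocks
    out = []
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        for t in range(start, end):
            # Inter-block contribution: q_t^T KV (earlier blocks only).
            o = [sum(q[t][i] * kv[i][j] for i in range(d)) for j in range(e)]
            # Intra-block contribution: causal mask s <= t within the block.
            for s in range(start, t + 1):
                w = dot(q[t], k[s])
                for j in range(e):
                    o[j] += w * v[s][j]
            out.append(o)
        # Fold the finished block into the state once per block, not per token.
        for s in range(start, end):
            for i in range(d):
                for j in range(e):
                    kv[i][j] += k[s][i] * v[s][j]
    return out
```

Any block size gives the same output as the reference; the block size only trades intra-block (quadratic within a block) work against state updates.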

2. LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image

Author: Cui, Ruikai, Song, Xibin, Sun, Weixuan, Wang, Senbo, Liu, Weizhe, Chen, Shenzhou, Shang, Taizhang, Li, Yang, Barnes, Nick, Li, Hongdong, and Ji, Pan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images. Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data. In this work, we introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes. Our methodology begins with the development of a point-cloud-based network that effectively generates precise and meaningful latent tri-planes, laying the groundwork for accurate 3D mesh reconstruction. Building upon this, our Image-Point-Cloud Feature Alignment technique processes a single input image, aligning to the latent tri-planes to imbue image features with robust 3D information. This process not only enriches the image features but also facilitates the production of high-fidelity 3D meshes without the need for multi-view input, significantly reducing geometric distortions. Our approach achieves state-of-the-art high-fidelity 3D mesh reconstruction from a single image in just 6 seconds, and experiments on various datasets demonstrate its effectiveness., Comment: 19 pages, 10 figures
Published: 2024

3. HGRN2: Gated Linear RNNs with State Expansion

Author: Qin, Zhen, Yang, Songlin, Sun, Weixuan, Shen, Xuyang, Li, Dong, Sun, Weigao, and Zhong, Yiran
Subjects: Computer Science - Computation and Language
Abstract: The hierarchically gated linear RNN (HGRN; Qin et al., 2023) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its expressiveness. To address this issue, we introduce a simple outer product-based state expansion mechanism, which significantly enlarges the recurrent state size without introducing any additional parameters. This enhancement also provides a linear attention interpretation for HGRN2, enabling hardware-efficient training. Our extensive experiments consistently verify the advantage of HGRN2 over HGRN across different settings and show that it is competitive with other recurrent models., Comment: Accepted to COLM 2024. Yiran Zhong is the corresponding author. Zhen Qin and Songlin Yang contributed equally to this work. The source code is available at https://github.com/OpenNLPLab/HGRN2
Published: 2024
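A minimal sketch of an outer-product state expansion, in the spirit of the HGRN2 abstract. Assumptions not taken from the abstract: the forget gate `f_t` in (0, 1)^d decays the rows of a d x e matrix state, the term `1 - f_t` plays the role of a key so that no extra projection weights are needed, and the output reads the state with a query vector. Gate and query computation are omitted, and all names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def expanded_state_recurrence(
    forget: List[Vector], inputs: List[Vector], queries: List[Vector]
) -> Tuple[List[Vector], Matrix]:
    """S_t = diag(f_t) S_{t-1} + (1 - f_t) x_t^T ;  o_t = S_t^T q_t.

    A vector-state RNN keeps d numbers per step; the outer product expands
    the state to d * e numbers without adding any weights.
    """
    d, e = len(forget[0]), len(inputs[0])
    state = [[0.0] * e for _ in range(d)]
    outputs = []
    for f, x, q in zip(forget, inputs, queries):
        for i in range(d):
            for j in range(e):
                state[i][j] = f[i] * state[i][j] + (1.0 - f[i]) * x[j]
        outputs.append([sum(q[i] * state[i][j] for i in range(d)) for j in range(e)])
    return outputs, state
```

Unrolling the recurrence gives o_t^T = sum_{s <= t} q_t^T diag(f_{s+1} ⊙ ... ⊙ f_t) (1 - f_s) x_s^T, a decayed form of linear attention, which is roughly the kind of interpretation the abstract refers to.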

4. NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation

Author: Cui, Ruikai, Liu, Weizhe, Sun, Weixuan, Wang, Senbo, Shang, Taizhang, Li, Yang, Song, Xibin, Yan, Han, Wu, Zhennan, Chen, Shenzhou, Li, Hongdong, and Ji, Pan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: 3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to generate highly diverse 3D shapes that comply with the specified constraints. In this paper, we introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling. To ensure spatial coherence and reduce memory usage, we incorporate a hybrid shape representation technique that directly learns a continuous signed distance field representation of the 3D shape using orthogonal 2D planes. Additionally, we meticulously enforce spatial correspondences across distinct planes using a transformer-based autoencoder structure, promoting the preservation of spatial relationships in the generated 3D shapes. This yields an algorithm that consistently outperforms state-of-the-art 3D shape generation methods on various tasks, including unconditional shape generation, multi-modal shape completion, single-view reconstruction, and text-to-shape synthesis. Our project page is available at https://weizheliu.github.io/NeuSDFusion/ ., Comment: ECCV 2024, project page: https://weizheliu.github.io/NeuSDFusion/
Published: 2024

5. Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

Author: Yan, Han, Li, Yang, Wu, Zhennan, Chen, Shenzhou, Sun, Weixuan, Shang, Taizhang, Liu, Weizhe, Chen, Tian, Dai, Xiaqiang, Ma, Chao, Li, Hongdong, and Ji, Pan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in one single tri-plane tensor, from which multiple Signed Distance Function (SDF) fields can be decoded to represent the compositional shapes. During training, an auto-encoder compresses tri-planes into a latent space, and then the denoising diffusion process is employed to approximate the distribution of the compositional scenes. Frankenstein demonstrates promising results in generating room interiors as well as human avatars with automatically separated parts. The generated scenes facilitate many downstream applications, such as part-wise re-texturing, object rearrangement in the room or avatar cloth re-targeting. Our project page is available at: https://wolfball.github.io/frankenstein/., Comment: SIGGRAPH Asia 2024 Conference Paper
Published: 2024

6. Audio-Visual Segmentation with Semantics

Author: Zhou, Jinxing, Shen, Xuyang, Wang, Jianyuan, Zhang, Jiayi, Sun, Weixuan, Zhang, Jing, Birchfield, Stan, Guo, Dan, Kong, Lingpeng, Wang, Meng, and Zhong, Yiran
Published: 2024

7. BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

Author: Wu, Zhennan, Li, Yang, Yan, Han, Shang, Taizhang, Sun, Weixuan, Wang, Senbo, Cui, Ruikai, Liu, Weizhe, Sato, Hiroyuki, Li, Hongdong, and Ji, Pan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into the hybrid neural fields: with a tri-plane containing the geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the signed distance values. A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed. Diffusion applied to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks to overlap with the current scene and extrapolate existing latent tri-planes to populate new blocks. The extrapolation is done by conditioning the generation process with the feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that harmoniously blend with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes with unprecedented high-quality shapes in both indoor and outdoor scenarios., Comment: ACM Transactions on Graphics (SIGGRAPH'24). Code: https://yang-l1.github.io/blockfusion
Published: 2024

8. CO2: Efficient Distributed Training with Full Communication-Computation Overlap

Author: Sun, Weigao, Qin, Zhen, Sun, Weixuan, Li, Shidi, Li, Dong, Shen, Xuyang, Qiao, Yu, and Zhong, Yiran
Subjects: Computer Science - Computation and Language, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The fundamental success of large language models hinges upon the efficacious implementation of large-scale distributed training techniques. Nevertheless, building a vast, high-performance cluster featuring high-speed communication interconnectivity is prohibitively costly, and accessible only to prominent entities. In this work, we aim to lower this barrier and democratize large-scale training on limited-bandwidth clusters. We propose a new approach called CO2 that introduces local updating and asynchronous communication to distributed data-parallel training, thereby facilitating the full overlap of COmmunication with COmputation. CO2 attains high scalability even on extensive multi-node clusters constrained by very limited communication bandwidth. We further propose the staleness gap penalty and outer momentum clipping techniques together with CO2 to bolster its convergence and training stability. Besides, CO2 integrates seamlessly with well-established ZeRO-series optimizers, which reduce the memory consumed by model states in large-model training. We also provide a mathematical proof of convergence, accompanied by the establishment of a stringent upper bound. Furthermore, we validate our findings through an extensive set of practical experiments encompassing a wide range of tasks in the fields of computer vision and natural language processing. These experiments demonstrate the capabilities of CO2 in terms of convergence, generalization, and scalability when deployed across configurations comprising up to 128 A100 GPUs. The outcomes emphasize the outstanding capacity of CO2 to hugely improve scalability, whether on clusters with 800Gbps RDMA or 80Gbps TCP/IP inter-node connections., Comment: ICLR 2024 Spotlight. Yiran Zhong is the corresponding author. Code is available at: https://github.com/OpenNLPLab/CO2
Published: 2024

9. Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Author: Qin, Zhen, Sun, Weigao, Li, Dong, Shen, Xuyang, Sun, Weixuan, and Zhong, Yiran
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Linear attention is an efficient attention mechanism that has recently emerged as a promising alternative to conventional softmax attention. With its ability to process tokens in linear computational complexity, linear attention, in theory, can handle sequences of unlimited length without sacrificing speed, i.e., maintaining a constant training speed for various sequence lengths with a fixed memory consumption. However, due to the issue with cumulative summation (cumsum), current linear attention algorithms cannot demonstrate their theoretical advantage in a causal setting. In this paper, we present Lightning Attention-2, the first linear attention implementation that enables linear attention to realize its theoretical computational benefits. To achieve this, we leverage the idea of tiling, separately handling the intra-block and inter-block components in linear attention calculation. Specifically, we utilize the conventional attention computation mechanism for the intra-blocks and apply linear attention kernel tricks for the inter-blocks. A tiling technique is adopted through both forward and backward procedures to take full advantage of the GPU hardware. We implement our algorithm in Triton to make it IO-aware and hardware-friendly. Various experiments are conducted on different model sizes and sequence lengths. Lightning Attention-2 retains consistent training and inference speed regardless of input sequence length and is significantly faster than other attention mechanisms. The source code is available at https://github.com/OpenNLPLab/lightning-attention., Comment: Technical Report. Yiran Zhong is the corresponding author. The source code is available at https://github.com/OpenNLPLab/lightning-attention
Published: 2024

10. All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

Author: Sun, Weixuan, Zhang, Yanhao, Qin, Zhen, Liu, Zheyuan, Cheng, Lin, Wang, Fanyi, Zhong, Yiran, and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we propose a new transformer-based regularization to better localize objects for weakly supervised semantic segmentation (WSSS). In image-level WSSS, Class Activation Maps (CAMs) are adopted to generate object localization as pseudo segmentation labels. To address the partial activation issue of CAMs, consistency regularization is employed to maintain activation intensity invariance across various image augmentations. However, such methods ignore pair-wise relations among regions within each CAM, which capture context and should also be invariant across image views. To this end, we propose a new all-pairs consistency regularization (ACR). Given a pair of augmented views, our approach regularizes the activation intensities between the two views, while also ensuring that the affinity across regions within each view remains consistent. We adopt vision transformers because the self-attention mechanism naturally embeds pair-wise affinity. This enables us to simply regularize the distance between the attention matrices of augmented image pairs. Additionally, we introduce a novel class-wise localization method that leverages the gradients of the class token. Our method can be seamlessly integrated into existing WSSS methods using transformers without modifying the architectures. We evaluate our method on PASCAL VOC and MS COCO datasets. Our method produces noticeably better class localization maps (67.3% mIoU on PASCAL VOC train), resulting in superior WSSS performance., Comment: ICCV 2023 workshop, code released at: https://github.com/OpenNLPLab/ACR_WSSS
Published: 2023
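The key step in the ACR abstract, regularizing the distance between the attention matrices of two augmented views, can be sketched for a horizontal flip. Assumptions: the views are a patch grid and its horizontal mirror, tokens are indexed row-major, the class token is ignored, and the distance is the mean absolute difference; the paper may use a different distance. All names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def hflip_permutation(height: int, width: int) -> List[int]:
    """perm[i] = index, in the flipped view, of row-major patch i of the original view."""
    return [r * width + (width - 1 - c) for r in range(height) for c in range(width)]


def attention_consistency_loss(attn_a: Matrix, attn_b: Matrix, perm: List[int]) -> float:
    """Mean |A[i][j] - B[perm[i]][perm[j]]|: compares the affinity between the
    same pair of image regions as seen in the two views."""
    n = len(attn_a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(attn_a[i][j] - attn_b[perm[i]][perm[j]])
    return total / (n * n)
```

Re-indexing before comparing matters: without it, the loss would penalize the flip itself rather than inconsistent pair-wise affinities.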

11. TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

Author: Qin, Zhen, Li, Dong, Sun, Weigao, Sun, Weixuan, Shen, Xuyang, Han, Xiaodong, Wei, Yunshen, Lv, Baohong, Luo, Xiao, Qiao, Yu, and Zhong, Yiran
Subjects: Computer Science - Computation and Language
Abstract: We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization. Specifically, we use LRPE together with an exponential decay to avoid attention dilution issues while allowing the model to retain global interactions between tokens. Additionally, we propose Lightning Attention, a cutting-edge technique that more than doubles the runtime speed of linear attention and reduces its memory usage fourfold. To further enhance the performance of TransNormer, we leverage a gating mechanism for smooth training and a new tensor normalization scheme to accelerate the model, resulting in an impressive acceleration of over 20%. Furthermore, we develop a robust inference algorithm that ensures numerical stability and consistent inference speed, regardless of the sequence length, showcasing superior efficiency during both training and inference stages. We also implement an efficient model parallel schema for TransNormerLLM, enabling seamless deployment on large-scale clusters and facilitating expansion to even more extensive models, i.e., LLMs with 175B parameters. We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus. Benchmark results demonstrate that our models not only match the performance of state-of-the-art LLMs with Transformer but are also significantly faster. Code is released at: https://github.com/OpenNLPLab/TransnormerLLM., Comment: Technical Report. Yiran Zhong is the corresponding author. Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen contributed equally to this paper. Code is released at: https://github.com/OpenNLPLab/TransnormerLLM
Published: 2023

12. Linearized Relative Positional Encoding

Author: Qin, Zhen, Sun, Weixuan, Lu, Kaiyue, Deng, Hui, Li, Dongxu, Han, Xiaodong, Dai, Yuchao, Kong, Lingpeng, and Zhong, Yiran
Subjects: Computer Science - Computation and Language
Abstract: Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods designed for vanilla transformers are not always directly applicable to linear transformers, because the latter require a decomposition of the query and key representations into separate kernel functions. Moreover, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we put together a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encoding for various applications. Experiments show that compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. It also offers a general paradigm for designing a broader range of relative positional encoding methods applicable to linear transformers. The code is available at https://github.com/OpenNLPLab/Lrpe., Comment: Reviewed by TMLR, decision pending. Yiran Zhong is the corresponding author. Code is available at https://github.com/OpenNLPLab/Lrpe
Published: 2023
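A small check of the property the LRPE abstract builds on: if position s transforms the query by a unitary matrix W_s and position t transforms the key by W_t, with W_s^T W_t depending only on t - s, the attention score depends only on relative position while q and k are still transformed independently, which is what the kernel decomposition of linear attention needs. The sketch uses a 2D rotation (the rotary, RoPE-style case) as one example of such a transform; this is an illustrative assumption, not necessarily the variant the paper recommends, and `theta` is arbitrary.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def rotate(vec: Vec2, angle: float) -> Vec2:
    """Apply the 2x2 rotation matrix R(angle), which is orthogonal (real unitary)."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def encoded_score(q: Vec2, k: Vec2, pos_q: int, pos_k: int, theta: float = 0.1) -> float:
    """(W_{pos_q} q) . (W_{pos_k} k) with W_p = R(p * theta).

    q and k are encoded separately (compatible with phi(q)^T phi(k) factorizations),
    yet the result equals q^T R((pos_k - pos_q) * theta) k.
    """
    qr = rotate(q, pos_q * theta)
    kr = rotate(k, pos_k * theta)
    return qr[0] * kr[0] + qr[1] * kr[1]
```

Shifting both positions by the same offset leaves the score unchanged, while changing their difference changes it.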

13. Retraction Note: Curcumin inhibits proliferation, migration, invasion and promotes apoptosis of retinoblastoma cell lines through modulation of miR-99a and JAK/STAT pathway

Author: Li, Yaping, Sun, Weixuan, Han, Ning, Zou, Ying, and Yin, Dexin
Published: 2024

14. Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

Author: Liu, Zheyuan, Sun, Weixuan, Teney, Damien, and Gould, Stephen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Composed image retrieval aims to find an image that best matches a given multi-modal user query consisting of a reference image and text pair. Existing methods commonly pre-compute image embeddings over the entire corpus and compare these to a reference image embedding modified by the query text at test time. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially independent of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, for large-scale datasets the computational cost is prohibitive since pre-computation of candidate embeddings is no longer possible. We propose to combine the merits of both schemes using a two-stage model. Our first stage adopts the conventional vector distancing metric and performs a fast pruning among candidates. Meanwhile, our second stage employs a dual-encoder architecture, which effectively attends to the input triplet of reference-text-candidate and re-ranks the candidates. Both stages utilize a vision-and-language pre-trained network, which has proven beneficial for various downstream tasks. Our method consistently outperforms state-of-the-art approaches on standard benchmarks for the task. Our implementation is available at https://github.com/Cuberick-Orion/Candidate-Reranking-CIR., Comment: Accepted at TMLR, 19 pages, 8 figures
Published: 2023
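The two-stage scheme in this abstract (cheap pruning with precomputed embeddings, then an expensive scorer on a shortlist only) can be sketched independently of any model. `rerank_score` is a hypothetical stand-in for the dual multi-modal encoder that attends to a reference-text-candidate triplet; dot-product similarity for stage one and all names here are assumptions.

```python
from typing import Callable, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_stage_retrieve(
    query_emb: Sequence[float],
    candidate_embs: List[Sequence[float]],
    rerank_score: Callable[[int], float],
    shortlist_size: int,
) -> List[int]:
    """Return candidate indices, best first, after pruning and re-ranking."""
    # Stage 1: fast vector scoring against precomputed candidate embeddings.
    by_similarity = sorted(
        range(len(candidate_embs)),
        key=lambda i: dot(query_emb, candidate_embs[i]),
        reverse=True,
    )
    shortlist = by_similarity[:shortlist_size]
    # Stage 2: the costly joint scorer runs only on the shortlist.
    return sorted(shortlist, key=rerank_score, reverse=True)
```

The expensive scorer is called `shortlist_size` times instead of once per corpus image, which is where the savings over scoring every triplet come from.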

15. Toeplitz Neural Network for Sequence Modeling

Author: Qin, Zhen, Han, Xiaodong, Sun, Weixuan, He, Bowen, Li, Dong, Li, Dongxu, Dai, Yuchao, Kong, Lingpeng, and Zhong, Yiran
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Sequence modeling has important applications in natural language processing and computer vision. Recently, transformer-based models have shown strong performance on various sequence modeling tasks; they rely on attention to capture pairwise token relations and on position embedding to inject positional information. While showing good performance, transformer models scale inefficiently to long input sequences, mainly due to the quadratic space-time complexity of attention. To overcome this inefficiency, we propose to model sequences with a relative position encoded Toeplitz matrix and use a Toeplitz matrix-vector product trick to reduce the space-time complexity of sequence modeling to log-linear. A lightweight sub-network called relative position encoder is proposed to generate relative position coefficients with a fixed budget of parameters, enabling the proposed Toeplitz neural network to deal with varying sequence lengths. In addition, despite being trained on 512-token sequences, our model can extrapolate input sequence length up to 14K tokens in inference with consistent performance. Extensive experiments on autoregressive and bidirectional language modeling, image modeling, and the challenging Long-Range Arena benchmark show that our method achieves better performance than its competitors in most downstream tasks while being significantly faster. The code is available at https://github.com/OpenNLPLab/Tnn., Comment: Accepted to ICLR 2023 Spotlight. Yiran Zhong is the corresponding author. A 15B pretrained LLM with TNN will be released at https://github.com/OpenNLPLab/Tnn soon
Published: 2023
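The log-linear Toeplitz matrix-vector product mentioned in the TNN abstract is commonly implemented by embedding the n x n Toeplitz matrix in a circulant matrix of size m >= 2n - 1 and multiplying with FFTs. The sketch below shows that standard trick in dependency-free Python with a textbook radix-2 FFT. The relative-position coefficients are passed in directly rather than produced by the paper's relative position encoder, and the coefficient layout `t[n - 1 + (i - j)]` is our own convention.

```python
import cmath
from typing import List


def fft(a: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (len(a) must be a power of two); unnormalized inverse if invert=True."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def toeplitz_matvec_naive(t: List[float], x: List[float]) -> List[float]:
    """O(n^2) reference: T[i][j] = t[n - 1 + i - j]."""
    n = len(x)
    return [sum(t[n - 1 + i - j] * x[j] for j in range(n)) for i in range(n)]


def toeplitz_matvec_fft(t: List[float], x: List[float]) -> List[float]:
    """O(n log n): embed T in a circulant matrix and multiply in the frequency domain."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:
        m *= 2
    c = [0j] * m  # first column of the circulant matrix
    for d in range(n):  # t_0, t_1, ..., t_{n-1}: entries on and below the diagonal
        c[d] = t[n - 1 + d]
    for d in range(1, n):  # t_{-1}, ..., t_{-(n-1)} wrap around to the end
        c[m - d] = t[n - 1 - d]
    padded = [complex(v) for v in x] + [0j] * (m - n)
    product = [a * b for a, b in zip(fft(c), fft(padded))]
    y = fft(product, invert=True)
    return [(y[i] / m).real for i in range(n)]
```

Because a Toeplitz matrix has only 2n - 1 distinct coefficients, storage is linear and the product costs O(n log n) instead of O(n^2).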

16. An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems

Author: Sun, Weixuan, Liu, Zheyuan, Zhang, Yanhao, Zhong, Yiran, and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Segment Anything Model (SAM) has demonstrated exceptional performance and versatility, making it a promising tool for various related tasks. In this report, we explore the application of SAM in Weakly-Supervised Semantic Segmentation (WSSS). Particularly, we adapt SAM as the pseudo-label generation pipeline given only the image-level class labels. While we observed impressive results in most cases, we also identify certain limitations. Our study includes performance evaluations on PASCAL VOC and MS-COCO, where we achieved remarkable improvements over the latest state-of-the-art methods on both datasets. We anticipate that this report encourages further explorations of adopting SAM in WSSS, as well as wider real-world applications., Comment: Technical report
Published: 2023

17. Bi-directional Training for Composed Image Retrieval via Text Prompt Learning

Author: Liu, Zheyuan, Sun, Weixuan, Hong, Yicong, Teney, Damien, and Gould, Stephen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Composed image retrieval searches for a target image based on a multi-modal user query comprising a reference image and modification text describing the desired changes. Existing approaches to solving this challenging task learn a mapping from the (reference image, modification text)-pair to an image embedding that is then matched against a large image corpus. One area that has not yet been explored is the reverse direction, which asks the question: what reference image, when modified as described by the text, would produce the given target image? In this work we propose a bi-directional training scheme that leverages such reversed queries and can be applied to existing composed image retrieval architectures with minimal changes, which improves the performance of the model. To encode the bi-directional query we prepend a learnable token to the modification text that designates the direction of the query and then finetune the parameters of the text embedding module. We make no other changes to the network architecture. Experiments on two standard datasets show that our novel approach achieves improved performance over a baseline BLIP-based model that itself already achieves competitive performance. Our code is released at https://github.com/Cuberick-Orion/Bi-Blip4CIR., Comment: WACV 2024 accepted. 12 pages, 7 figures
Published: 2023

18. Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Author: Sun, Weixuan, Zhang, Jiayi, Wang, Jianyuan, Liu, Zheyuan, Zhong, Yiran, Feng, Tianpeng, Guo, Yandong, Zhang, Yanhao, and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-supervised audio-visual source localization aims to locate sound-source objects in video frames without extra annotations. Recent methods often approach this goal with the help of contrastive learning, which assumes that only the audio and visual contents from the same video are positive samples for each other. However, this assumption suffers from false negative samples in real-world training. For example, for an audio sample, treating the frames from the same audio class as negative samples may mislead the model and therefore harm the learned representations (e.g., the audio of a siren wailing may reasonably correspond to the ambulances in multiple images). Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples. Specifically, we utilize the intra-modal similarities to identify potentially similar samples and construct corresponding adjacency matrices to guide contrastive learning. Further, we propose to strengthen the role of true negative samples by explicitly leveraging the visual features of sound sources to facilitate the differentiation of authentic sounding source regions. FNAC achieves state-of-the-art performance on Flickr-SoundNet, VGG-Sound, and AVSBench, which demonstrates the effectiveness of our method in mitigating the false negative issue. The code is available at https://github.com/OpenNLPLab/FNAC_AVL., Comment: CVPR 2023
Published: 2023

19. Audio-Visual Segmentation with Semantics

Author: Zhou, Jinxing, Shen, Xuyang, Wang, Jianyuan, Zhang, Jiayi, Sun, Weixuan, Zhang, Jing, Birchfield, Stan, Guo, Dan, Kong, Lingpeng, Wang, Meng, and Zhong, Yiran
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark, i.e., AVSBench, providing pixel-wise annotations for sounding objects in audible videos. It contains three subsets: AVSBench-object (Single-source subset, Multi-sources subset) and AVSBench-semantic (Semantic-labels subset). Accordingly, three settings are studied: 1) semi-supervised audio-visual segmentation with a single sound source; 2) fully-supervised audio-visual segmentation with multiple sound sources; and 3) fully-supervised audio-visual semantic segmentation. The first two settings need to generate binary masks of sounding objects indicating pixels corresponding to the audio, while the third setting further requires generating semantic maps indicating the object category. To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage audio-visual mapping during training. Quantitative and qualitative experiments on AVSBench compare our approach to several existing methods for related tasks, demonstrating that the proposed method is promising for building a bridge between the audio and pixel-wise visual semantics. Code is available at https://github.com/OpenNLPLab/AVSBench. Online benchmark is available at http://www.avlbench.opennlplab.cn., Comment: Submitted to TPAMI as a journal extension of ECCV 2022. Jinxing Zhou, Xuyang Shen, and Jianyuan Wang contributed equally to this work. Meng Wang and Yiran Zhong are the corresponding authors. Code is available at https://github.com/OpenNLPLab/AVSBench. Online benchmark is available at http://www.avlbench.opennlplab.cn. arXiv admin note: substantial text overlap with arXiv:2207.05042
Published: 2023

20. The Devil in Linear Transformer

Author: Qin, Zhen, Han, XiaoDong, Sun, Weixuan, Li, Dongxu, Kong, Lingpeng, Barnes, Nick, and Zhong, Yiran
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Linear transformers aim to reduce the quadratic space-time complexity of vanilla transformers. However, they usually suffer from degraded performance on various tasks and corpora. In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation adversely impact the convergence of linear transformer models; 2) attention dilution, which trivially distributes attention scores over long sequences while neglecting neighbouring structures. To address these issues, we first identify that the scaling of attention matrices is the devil in unbounded gradients, and that it turns out to be unnecessary in linear attention, as we show theoretically and empirically. To this end, we propose a new linear attention that replaces the scaling operation with a normalization to stabilize gradients. For the issue of attention dilution, we leverage a diagonal attention to confine attention to only neighbouring tokens in early layers. Benefiting from the stable gradients and improved attention, our new linear transformer model, transNormer, demonstrates superior performance on text classification and language modeling tasks, as well as on the challenging Long-Range Arena benchmark, surpassing the vanilla transformer and existing linear variants by a clear margin while being significantly more space-time efficient. The code is available at https://github.com/OpenNLPLab/Transnormer., Comment: Accepted to EMNLP 2022
Published: 2022
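The two fixes described in the transNormer abstract can be illustrated with a small pure-Python sketch. It reads the "scaling operation" as the row-wise denominator of kernel-based linear attention, assumes a ReLU feature map, and uses a plain LayerNorm as the stand-in normalization; the function names are hypothetical and not taken from the released code. The first function drops the denominator and normalizes each output row instead; the second restricts softmax attention to non-overlapping blocks so each token only attends to its neighbours.

```python
import math

def relu(x):
    # Non-negative feature map phi(x) (an assumed kernel choice).
    return [max(0.0, v) for v in x]

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance (no learned affine).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def norm_linear_attention(Q, K, V):
    """Non-causal linear attention without the row-wise denominator;
    each output row is normalized instead."""
    d_k, d_v = len(K[0]), len(V[0])
    # Accumulate phi(K)^T V once: a d_k x d_v matrix, O(n * d_k * d_v).
    kv = [[0.0] * d_v for _ in range(d_k)]
    for k, v in zip(K, V):
        fk = relu(k)
        for a in range(d_k):
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = relu(q)
        row = [sum(fq[a] * kv[a][b] for a in range(d_k)) for b in range(d_v)]
        out.append(layer_norm(row))
    return out

def block_diagonal_softmax_attention(Q, K, V, block):
    """Softmax attention restricted to non-overlapping blocks of `block` tokens."""
    n, d, d_v = len(Q), len(Q[0]), len(V[0])
    out = []
    for i in range(n):
        start = (i // block) * block
        idx = list(range(start, min(start + block, n)))
        scores = [sum(a * b for a, b in zip(Q[i], K[j])) / math.sqrt(d) for j in idx]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * V[j][c] for wj, j in zip(w, idx)) / z for c in range(d_v)])
    return out
```

The abstract places the diagonal attention in early layers; the sketch leaves the choice of which function each layer uses to the caller.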

21. Linear Video Transformer with Feature Fixation

Author: Lu, Kaiyue, Liu, Zexiang, Wang, Jianyuan, Sun, Weixuan, Qin, Zhen, Li, Dong, Shen, Xuyang, Deng, Hui, Han, Xiaodong, Dai, Yuchao, and Zhong, Yiran
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Vision Transformers have achieved impressive performance in video classification, while suffering from the quadratic complexity caused by the Softmax attention mechanism. Some studies alleviate the computational costs by reducing the number of tokens in attention calculation, but the complexity is still quadratic. Another promising way is to replace Softmax attention with linear attention, which has linear complexity but shows a clear performance drop. We find that this drop in linear attention results from the lack of attention concentration on critical features. Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention. Specifically, we regard the query, key, and value as different latent representations of the input token, and learn the feature fixation ratio by aggregating Query-Key-Value information. This is beneficial for measuring the feature importance comprehensively. Furthermore, we enhance the feature fixation by neighborhood association, which leverages additional guidance from spatial and temporal neighbouring tokens. The proposed method significantly improves the linear attention baseline and achieves state-of-the-art performance among linear video Transformers on three popular video classification benchmarks. With fewer parameters and higher efficiency, our performance is even comparable to some Softmax-based quadratic Transformers.
Published: 2022
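The feature fixation idea can be sketched as a per-channel gate applied to the query and key before linear attention. This is a hypothetical illustration only: the gate here is a sigmoid of the mean of the query, key, and value channels (so value must share their dimension), and the kernel is elu(x) + 1. The paper learns the fixation ratio and adds neighborhood association, neither of which is reproduced, and all function names are illustrative.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1 (an assumed kernel choice).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def fixation_ratio(q, k, v):
    # Hypothetical per-channel importance in (0, 1) from aggregated q, k, v.
    return [sigmoid((a + b + c) / 3.0) for a, b, c in zip(q, k, v)]

def linear_attention(Q, K, V, eps=1e-6):
    """Non-causal kernel linear attention with the standard row normalization."""
    d, d_v = len(K[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d)]  # sum_j phi(k_j)^T v_j
    k_sum = [0.0] * d                     # sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = elu_plus_one(k)
        for a in range(d):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = elu_plus_one(q)
        denom = sum(x * y for x, y in zip(fq, k_sum)) + eps
        out.append([sum(fq[a] * kv[a][b] for a in range(d)) / denom for b in range(d_v)])
    return out

def fixated_linear_attention(Q, K, V):
    # Reweight query and key channels by the fixation ratio, then attend.
    Qf, Kf = [], []
    for q, k, v in zip(Q, K, V):
        r = fixation_ratio(q, k, v)
        Qf.append([x * w for x, w in zip(q, r)])
        Kf.append([x * w for x, w in zip(k, r)])
    return linear_attention(Qf, Kf, V)
```

Because the gate only rescales query and key channels, the attention weights remain positive and each output is still a (nearly) convex combination of the value vectors.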

22. Neural Architecture Search on Efficient Transformers and Beyond

Author: Liu, Zexiang, Li, Dong, Lu, Kaiyue, Qin, Zhen, Sun, Weixuan, Xu, Jiacheng, and Zhong, Yiran
Subjects: Computer Science - Computation and Language
Abstract: Recently, numerous efficient Transformers have been proposed to reduce the quadratic computational complexity of standard Transformers caused by the Softmax attention. However, most of them simply swap Softmax with an efficient attention mechanism without considering architectures customized for that efficient attention. In this paper, we argue that the handcrafted vanilla Transformer architectures for Softmax attention may not be suitable for efficient Transformers. To address this issue, we propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique. The proposed method is validated on popular machine translation and image classification tasks. We observe that the optimal architecture of the efficient Transformer requires less computation than that of the standard Transformer, but its overall accuracy is lower. This indicates that Softmax attention and efficient attention each have their own characteristics, but neither can simultaneously balance accuracy and efficiency well. This motivates us to mix the two types of attention to reduce the performance imbalance. Besides the search spaces commonly used in existing NAS Transformer approaches, we propose a new search space that allows the NAS algorithm to automatically search the attention variants along with architectures. Extensive experiments on WMT'14 En-De and CIFAR-10 demonstrate that our searched architecture maintains comparable accuracy to the standard Transformer with notably improved computational efficiency.
Published: 2022
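Mixing the two attention types is attractive because their costs scale differently. The sketch below uses the usual back-of-the-envelope multiply-accumulate (MAC) counts, about 2·n²·d for softmax attention and 2·n·d² for linear attention (projections and feed-forward layers ignored), and enumerates the per-layer assignments that fit a cost budget. It covers only the cost side of such a search space; the paper's NAS also evaluates accuracy, which is not modeled here, and the function names are hypothetical.

```python
from itertools import product

def attention_macs(kind, n, d):
    """Approximate multiply-accumulates of one attention layer (projections ignored)."""
    if kind == "softmax":
        return 2 * n * n * d  # Q K^T (n*n*d) and A V (n*n*d)
    if kind == "linear":
        return 2 * n * d * d  # K^T V (n*d*d) and Q (K^T V) (n*d*d)
    raise ValueError(f"unknown attention kind: {kind}")

def total_macs(layers, n, d):
    return sum(attention_macs(kind, n, d) for kind in layers)

def enumerate_mixed(num_layers, n, d, budget):
    """Every per-layer softmax/linear assignment whose attention cost fits the budget."""
    return [list(choice) for choice in product(("softmax", "linear"), repeat=num_layers)
            if total_macs(choice, n, d) <= budget]
```

With n = 1024 tokens and d = 64, one softmax layer costs 16 times as much as a linear one (n/d = 16), so a search space that lets only some layers keep softmax attention can trade cost against accuracy layer by layer.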

23. Audio-Visual Segmentation

Author: Zhou, Jinxing, Wang, Jianyuan, Zhang, Jiayi, Sun, Weixuan, Zhang, Jing, Birchfield, Stan, Guo, Dan, Kong, Lingpeng, Wang, Meng, and Zhong, Yiran
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised audio-visual segmentation with a single sound source and 2) fully-supervised audio-visual segmentation with multiple sound sources. To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage the audio-visual mapping during training. Quantitative and qualitative experiments on the AVSBench compare our approach to several existing methods from related tasks, demonstrating that the proposed method is promising for building a bridge between the audio and pixel-wise visual semantics. Code is available at https://github.com/OpenNLPLab/AVSBench., Comment: ECCV 2022; Code is available at https://github.com/OpenNLPLab/AVSBench
Published: 2022
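The idea of injecting audio semantics into pixel-wise visual features can be illustrated with a non-local-style interaction: each pixel feature is compared with the audio embeddings of all frames, and the matching audio is added back as a residual. This is a simplified, hypothetical sketch; it omits the learned projections and exact normalization of the paper's temporal pixel-wise audio-visual interaction module, and it assumes audio and visual features share one channel dimension. The function name is illustrative.

```python
def audio_visual_interaction(visual, audio):
    """visual: [T][P][C] pixel features per frame; audio: [T][C] per-frame audio embeddings.
    Returns features of the same shape: v + (1/T) * sum_t' (v . a_t') * a_t'."""
    T = len(audio)
    out = []
    for frame in visual:
        new_frame = []
        for v in frame:
            z = list(v)  # residual connection keeps the original visual feature
            for a in audio:
                sim = sum(x * y for x, y in zip(v, a))  # dot-product affinity
                for c in range(len(z)):
                    z[c] += sim * a[c] / T  # normalize by the number of audio positions
            new_frame.append(z)
        out.append(new_frame)
    return out
```

Pixels whose features align with the audio embedding are amplified, while unrelated pixels are left unchanged, which matches the intuition of using audio as guidance for segmentation.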

24. Vicinity Vision Transformer

Author: Sun, Weixuan, Qin, Zhen, Deng, Hui, Wang, Jianyuan, Zhang, Yi, Zhang, Kaihao, Barnes, Nick, Birchfield, Stan, Kong, Lingpeng, and Zhong, Yiran
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prohibits vision transformers from scaling up to high-resolution images, because both its computational complexity and its memory footprint are quadratic. Although linear attention was introduced in natural language processing (NLP) tasks to mitigate a similar issue, directly applying existing linear attention to vision transformers may not lead to satisfactory results. We investigate this problem and find that computer vision tasks focus more on local information than NLP tasks do. Based on this observation, we present Vicinity Attention, which introduces a locality bias to vision transformers with linear complexity. Specifically, for each image patch, we adjust its attention weight based on its 2D Manhattan distance from its neighbouring patches. As a result, neighbouring patches receive stronger attention than far-away patches. Moreover, since our Vicinity Attention requires the token length to be much larger than the feature dimension to show its efficiency advantages, we further propose a new Vicinity Vision Transformer (VVT) structure to reduce the feature dimension without degrading accuracy. We perform extensive experiments on the CIFAR-100, ImageNet-1K, and ADE20K datasets to validate the effectiveness of our method. Our method has a slower growth rate of GFLOPs than previous transformer-based and convolution-based networks as the input resolution increases. In particular, our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods., Comment: code: https://github.com/OpenNLPLab/Vicinity-Vision-Transformer
Published: 2022
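The 2D locality bias can be checked with a quadratic-cost reference: compute the Manhattan distance between every pair of patches on an H×W grid, turn it into a weight that shrinks with distance, and multiply it into a non-negative kernel attention. The linear-complexity decomposition used by Vicinity Attention is not reproduced here, and the decay 1 − d/(H + W) is an assumption chosen only so that every weight stays positive. Function names are hypothetical.

```python
def manhattan_matrix(h, w):
    # Pairwise 2D Manhattan distances between patches in row-major order.
    coords = [(r, c) for r in range(h) for c in range(w)]
    return [[abs(r1 - r2) + abs(c1 - c2) for (r2, c2) in coords] for (r1, c1) in coords]

def vicinity_weights(h, w):
    # Assumed decay: the maximum distance is h + w - 2, so weights stay in (0, 1].
    return [[1.0 - d / (h + w) for d in row] for row in manhattan_matrix(h, w)]

def vicinity_attention_reference(Q, K, V, h, w, eps=1e-6):
    """O(N^2) reference: ReLU kernel scores re-weighted by 2D distance, then normalized."""
    W = vicinity_weights(h, w)
    relu = lambda x: [max(0.0, t) for t in x]
    out = []
    for i, q in enumerate(Q):
        fq = relu(q)
        s = [sum(a * b for a, b in zip(fq, relu(k))) * W[i][j] for j, k in enumerate(K)]
        z = sum(s) + eps
        out.append([sum(s[j] * V[j][c] for j in range(len(V))) / z for c in range(len(V[0]))])
    return out
```

With identical queries and keys, the output at each patch is simply a distance-weighted average of the values, so a patch's own value and those of its neighbours dominate.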

25. In-plane and out-of-plane anisotropy in the optical and dielectric properties of YBa2Cu3O7-x superconducting film

Author: Wang, Yueming, Sun, Weixuan, Zhao, Minglin, Li, Yongfu, Wei, Mingyang, Jin, Kui, Li, Qian, Zhou, Xiang’an, Han, Yating, and Lian, Jie
Published: 2024

26. cosFormer: Rethinking Softmax in Attention

Author: Qin, Zhen, Sun, Weixuan, Deng, Hui, Li, Dongxu, Wei, Yunshen, Lv, Baohong, Yan, Junjie, Kong, Lingpeng, and Zhong, Yiran
Subjects: Computer Science - Computation and Language
Abstract: Transformers have shown great success in natural language processing, computer vision, and audio processing. As one of their core components, softmax attention helps to capture long-range dependencies, yet its quadratic space and time complexity with respect to sequence length prohibits scaling up. Kernel methods are often adopted to reduce the complexity by approximating the softmax operator. Nevertheless, due to approximation errors, their performance varies across tasks and corpora and suffers crucial drops compared with vanilla softmax attention. In this paper, we propose a linear transformer called cosFormer that can achieve comparable or better accuracy than the vanilla transformer in both causal and cross attention. cosFormer is based on two key properties of softmax attention: (i) non-negativity of the attention matrix; (ii) a non-linear re-weighting scheme that can concentrate the distribution of the attention matrix. As its linear substitute, cosFormer fulfills these properties with a linear operator and a cosine-based distance re-weighting mechanism. Extensive experiments on language modeling and text understanding tasks demonstrate the effectiveness of our method. We further examine our method on long sequences and achieve state-of-the-art performance on the Long-Range Arena benchmark. The source code is available at https://github.com/OpenNLPLab/cosFormer., Comment: Accepted to ICLR 2022. Yiran Zhong is the corresponding author. Zhen Qin, Weixuan Sun, Hui Deng contributed equally to this work
Published: 2022
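The cosine re-weighting stays linear because cos(π/2·(i − j)/M) = cos(πi/2M)·cos(πj/2M) + sin(πi/2M)·sin(πj/2M), so the positional term splits into a query-side and a key-side factor. The sketch below is a minimal non-causal version, assuming a ReLU feature map and M equal to the sequence length; it computes the same output both ways, directly in O(n²) and through the decomposition in O(n). Function names are illustrative, not the released cosFormer API.

```python
import math

def relu(x):
    return [max(0.0, v) for v in x]

def cosformer_quadratic(Q, K, V, eps=1e-6):
    """Direct O(n^2) form: ReLU(q_i).ReLU(k_j) * cos(pi/2 * (i - j) / M)."""
    n, d_v = len(Q), len(V[0])
    M = n  # with M = n, |i - j| / M < 1, so every cosine weight is positive
    out = []
    for i in range(n):
        w = [sum(a * b for a, b in zip(relu(Q[i]), relu(K[j])))
             * math.cos(math.pi / 2 * (i - j) / M) for j in range(n)]
        z = sum(w) + eps
        out.append([sum(w[j] * V[j][c] for j in range(n)) / z for c in range(d_v)])
    return out

def cosformer_linear(Q, K, V, eps=1e-6):
    """Linear form via cos(a - b) = cos(a)cos(b) + sin(a)sin(b)."""
    n, d, d_v = len(Q), len(Q[0]), len(V[0])
    M = n
    ang = [math.pi * i / (2 * M) for i in range(n)]
    # Key-side aggregates for the cos and sin branches, computed once.
    kv_cos = [[0.0] * d_v for _ in range(d)]
    kv_sin = [[0.0] * d_v for _ in range(d)]
    k_cos, k_sin = [0.0] * d, [0.0] * d
    for j in range(n):
        fk = relu(K[j])
        cj, sj = math.cos(ang[j]), math.sin(ang[j])
        for a in range(d):
            k_cos[a] += fk[a] * cj
            k_sin[a] += fk[a] * sj
            for c in range(d_v):
                kv_cos[a][c] += fk[a] * cj * V[j][c]
                kv_sin[a][c] += fk[a] * sj * V[j][c]
    out = []
    for i in range(n):
        fq = relu(Q[i])
        ci, si = math.cos(ang[i]), math.sin(ang[i])
        z = sum(fq[a] * (ci * k_cos[a] + si * k_sin[a]) for a in range(d)) + eps
        out.append([sum(fq[a] * (ci * kv_cos[a][c] + si * kv_sin[a][c]) for a in range(d)) / z
                    for c in range(d_v)])
    return out
```

A causal variant would replace the key-side sums with running prefix sums over j ≤ i.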

27. GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation

Author: Sun, Weixuan, Zhang, Jing, Liu, Zheyuan, Zhong, Yiran, and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Weakly Supervised Semantic Segmentation (WSSS) is challenging, particularly when image-level labels are used to supervise pixel-level prediction. To bridge this gap, a Class Activation Map (CAM) is usually generated to provide pixel-level pseudo labels. CAMs in Convolutional Neural Networks suffer from partial activation, i.e., only the most discriminative regions are activated. Transformer-based methods, on the other hand, are highly effective at exploring global context through long-range dependency modeling, potentially alleviating the "partial activation" issue. In this paper, we propose the first transformer-based WSSS approach and introduce the Gradient-weighted Element-wise Transformer Attention Map (GETAM). GETAM shows fine-scale activation for all feature map elements, revealing different parts of the object across transformer layers. Further, we propose an activation-aware label completion module to generate high-quality pseudo labels. Finally, we incorporate our methods into an end-to-end framework for WSSS using double backward propagation. Extensive experiments on PASCAL VOC and COCO demonstrate that our results beat the state-of-the-art end-to-end approaches by a significant margin and outperform most multi-stage methods.
Published: 2021
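A gradient-weighted element-wise attention map can be sketched as follows, assuming the attention maps and their gradients with respect to the target class score have already been extracted for every layer and head. The exact aggregation in GETAM may differ; here, as an assumption, each layer contributes the head-average of ReLU(gradient ⊙ attention) and the layers are summed. The function name is hypothetical.

```python
def gradient_weighted_attention_map(attn, grads):
    """attn, grads: nested lists shaped [layers][heads][n][n].
    Returns an [n][n] map: sum over layers of mean over heads of ReLU(grad * attn)."""
    n = len(attn[0][0])
    out = [[0.0] * n for _ in range(n)]
    for layer_attn, layer_grad in zip(attn, grads):
        heads = len(layer_attn)
        for A, G in zip(layer_attn, layer_grad):
            for i in range(n):
                for j in range(n):
                    # Element-wise product; keep only positive evidence for the class.
                    out[i][j] += max(0.0, G[i][j] * A[i][j]) / heads
    return out
```

For a ViT-style model, the class-token row over the patch columns, reshaped to the patch grid, would serve as the activation map for that class.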

28. Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

Author: Sun, Weixuan, Zhang, Jing, and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Image-level weakly supervised semantic segmentation (WSSS) relies on class activation maps (CAMs) for pseudo-label generation. As CAMs only highlight the most discriminative regions of objects, the generated pseudo labels are usually unsatisfactory to serve directly as supervision. To solve this, most existing approaches follow a multi-training pipeline to refine CAMs for better pseudo labels, which includes: 1) re-training the classification model to generate CAMs; 2) post-processing CAMs to obtain pseudo labels; and 3) training a semantic segmentation model with the obtained pseudo labels. However, this multi-training pipeline requires complicated adjustment and additional time. To address this, we propose a class-conditional inference strategy and an activation-aware mask refinement loss function to generate better pseudo labels without re-training the classifier. The class-conditional inference-time approach separately and iteratively reveals the classification network's hidden object activation to generate more complete response maps. Further, our activation-aware mask refinement loss function introduces a novel way to exploit saliency maps during segmentation training and refine the foreground object masks without suppressing background objects. Our method achieves superior WSSS results without requiring re-training of the classifier.
Published: 2021
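For reference, the class activation map that these pipelines start from is the classifier-weighted sum of the final convolutional feature maps, kept non-negative and scaled to [0, 1]; thresholding it gives a foreground pseudo-label mask. The sketch below shows only that standard CAM step, not the paper's class-conditional inference or refinement loss; the function names and the 0.4 threshold used in the check are assumptions.

```python
def class_activation_map(features, weights, cls):
    """features: [K][H][W] feature maps; weights: [num_classes][K] classifier weights."""
    K, H, W = len(features), len(features[0]), len(features[0][0])
    # Weighted sum over channels, clamped at zero.
    cam = [[max(0.0, sum(weights[cls][k] * features[k][y][x] for k in range(K)))
            for x in range(W)] for y in range(H)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]  # scale to [0, 1]
    return cam

def foreground_mask(cam, threshold):
    # 1 where the normalized activation reaches the threshold, else 0 (background).
    return [[1 if v >= threshold else 0 for v in row] for row in cam]
```

Only the strongest responses survive the threshold, which is the "most discriminative regions" limitation the abstract describes.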

29. Protamine 1 as a secreted colorectal cancer-specific antigen facilitating G1/S phase transition under nutrient stress conditions

Author: Ren, Shengnan, Yang, Dingquan, Dong, Yongli, Ni, Weidong, Wang, Meiqi, Xing, Lei, Liu, Tong, Hou, Wenjia, Sun, Weixuan, Zhang, Haolong, Yu, Zhentao, Liu, Yi, Cao, Jingrui, Yan, Hongbo, Feng, Ye, Fang, Xuedong, Wang, Quan, and Chen, Fangfang
Published: 2023

30. 3D Guided Weakly Supervised Semantic Segmentation

Author: Sun, Weixuan, Zhang, Jing, and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Clean pixel-wise annotation, which fully supervised semantic segmentation requires, is laborious and expensive to obtain. In this paper, we propose a weakly supervised 2D semantic segmentation model that combines sparse bounding box labels with available 3D information, which is much easier to obtain with advanced sensors. We manually labeled a subset of the 2D-3D Semantics (2D-3D-S) dataset with bounding boxes, and introduce our 2D-3D inference module to generate accurate pixel-wise segment proposal masks. Guided by 3D information, we first generate a point cloud of objects and calculate an objectness probability score for each point. Then we project the point cloud with objectness probabilities back to the 2D images, followed by a refinement step to obtain segment proposals, which are treated as pseudo labels to train a semantic segmentation network. Our method works recursively to gradually refine these segment proposals. Extensive experimental results on the 2D-3D-S dataset show that the proposed method can generate accurate segment proposals when bounding box labels are available on only a small subset of training images. Performance comparison with recent state-of-the-art methods further illustrates the effectiveness of our method.
Published: 2020
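The step of projecting scored 3D points back onto the image can be sketched with a pinhole camera model and a depth buffer, so that each pixel takes the objectness of the nearest visible point. The intrinsics (fx, fy, cx, cy), the camera-frame point format, and the function name are assumptions; the paper's refinement step and recursive training loop are not shown.

```python
import math

def project_objectness(points, fx, fy, cx, cy, height, width):
    """points: iterable of (X, Y, Z, p) in camera coordinates, p = objectness in [0, 1].
    Returns a [height][width] map holding the objectness of the nearest point per pixel."""
    depth = [[math.inf] * width for _ in range(height)]
    score = [[0.0] * width for _ in range(height)]
    for X, Y, Z, p in points:
        if Z <= 0:
            continue  # behind the camera
        u = int(round(fx * X / Z + cx))  # column
        v = int(round(fy * Y / Z + cy))  # row
        if 0 <= u < width and 0 <= v < height and Z < depth[v][u]:
            depth[v][u] = Z  # z-buffer: keep the closest point
            score[v][u] = p
    return score
```

Thresholding the resulting map inside each labeled bounding box would give a rough segment proposal to refine.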

31. Supervised and self-supervised learning-based cascade spatiotemporal fusion framework and its application

Author: Sun, Weixuan, Li, Jie, Jiang, Menghui, and Yuan, Qiangqiang
Published: 2023

32. A Persymmetric GLRT for Underwater Targets in Presence of Range Cell Migration

Author: Sun, Weixuan, Yan, Sheng, Hao, Chengpeng, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Wu, Meiping, editor, Niu, Yifeng, editor, Gu, Mancang, editor, and Cheng, Jin, editor
Published: 2022

33. 3D Guided Weakly Supervised Semantic Segmentation

Author: Sun, Weixuan, Zhang, Jing, Barnes, Nick, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ishikawa, Hiroshi, editor, Liu, Cheng-Lin, editor, Pajdla, Tomas, editor, and Shi, Jianbo, editor
Published: 2021

34. Audio–Visual Segmentation

Author: Zhou, Jinxing, primary, Wang, Jianyuan, additional, Zhang, Jiayi, additional, Sun, Weixuan, additional, Zhang, Jing, additional, Birchfield, Stan, additional, Guo, Dan, additional, Kong, Lingpeng, additional, Wang, Meng, additional, and Zhong, Yiran, additional
Published: 2022

35. A Persymmetric GLRT for Underwater Targets in Presence of Range Cell Migration

Author: Sun, Weixuan, primary, Yan, Sheng, additional, and Hao, Chengpeng, additional
Published: 2022

36. Inhibition of porcine pancreatic α-amylase activity by chlorogenic acid

Author: Zheng, Yuxue, Yang, Wenhan, Sun, Weixuan, Chen, Shiguo, Liu, Donghong, Kong, Xiangli, Tian, Jinhu, and Ye, Xingqian
Published: 2020

37. Bi-directional Training for Composed Image Retrieval via Text Prompt Learning

Author: Liu, Zheyuan, primary, Sun, Weixuan, additional, Hong, Yicong, additional, Teney, Damien, additional, and Gould, Stephen, additional
Published: 2024

38. In-Plane and Out-of-Plane Anisotropy in the Optical and Dielectric Properties of YBa2Cu3O7-x Superconducting Film

Author: Wang, Yueming, primary, Sun, Weixuan, additional, Zhao, Minglin, additional, Li, Yongfu, additional, Wei, Mingyang, additional, Jin, Kui, additional, Li, Qian, additional, Zhou, Xiang’an, additional, Han, Yating, additional, and Lian, Jie, additional
Published: 2024

39. Role of glutamine and its metabolite ammonia in crosstalk of cancer-associated fibroblasts and cancer cells

Author: Li, Xiao, Zhu, Hongming, Sun, Weixuan, Yang, Xingru, Nie, Qing, and Fang, Xuedong
Published: 2021

40. Controlled ultrasound treatments modify the morphology and physical properties of rice starch rather than the fine structure

Author: Yang, Wenhan, Kong, Xiangli, Zheng, Yuxue, Sun, Weixuan, Chen, Shiguo, Liu, Donghong, Zhang, Huiling, Fang, Haitian, Tian, Jinhu, and Ye, Xingqian
Published: 2019

41. 3D Guided Weakly Supervised Semantic Segmentation

Author: Sun, Weixuan, primary, Zhang, Jing, additional, and Barnes, Nick, additional
Published: 2021

42. All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

Author: Sun, Weixuan, primary, Zhang, Yanhao, additional, Qin, Zhen, additional, Liu, Zheyuan, additional, Cheng, Lin, additional, Wang, Fanyi, additional, Zhong, Yiran, additional, and Barnes, Nick, additional
Published: 2023

43. Traditional Goji Berry-Based Functional Food in Chinese History

Author: Ye, Xingqian, primary, Tian, Jinhu, additional, Zheng, Yuxue, additional, Sun, Weixuan, additional, and Yang, Wenhan, additional
Published: 2020

44. Matting Moments: A Unified Data-Driven Matting Engine for Mobile AIGC in Photo Gallery

Author: Zhang, Yanhao, primary, Wang, Fanyi, additional, Sun, Weixuan, additional, Su, Jingwen, additional, Liu, Peng, additional, Li, Yaqian, additional, Feng, Xinjie, additional, and Zou, Zhengxia, additional
Published: 2023

45. Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

Author: Sun, Weixuan, primary, Zhang, Jiayi, additional, Wang, Jianyuan, additional, Liu, Zheyuan, additional, Zhong, Yiran, additional, Feng, Tianpeng, additional, Guo, Yandong, additional, Zhang, Yanhao, additional, and Barnes, Nick, additional
Published: 2023

46. RETRACTED ARTICLE: Curcumin inhibits proliferation, migration, invasion and promotes apoptosis of retinoblastoma cell lines through modulation of miR-99a and JAK/STAT pathway

Author: Li, Yaping, Sun, Weixuan, Han, Ning, Zou, Ying, and Yin, Dexin
Published: 2018

47. Autophagy Inhibitor (LY294002) and 5-fluorouracil (5-FU) Combination-Based Nanoliposome for Enhanced Efficacy Against Esophageal Squamous Cell Carcinoma

Author: Feng, Ye, Gao, Yongjian, Wang, Dayu, Xu, Zhonghang, Sun, Weixuan, and Ren, Ping
Published: 2018

48. Vicinity Vision Transformer

Author: Sun, Weixuan, Qin, Zhen, Deng, Hui, Wang, Jianyuan, Zhang, Yi, Zhang, Kaihao, Barnes, Nick, Birchfield, Stan, Kong, Lingpeng, and Zhong, Yiran
Abstract: Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prohibits vision transformers from scaling up to high-resolution images, because both its computational complexity and its memory footprint are quadratic. Linear attention, which reorders the self-attention computation, was introduced in natural language processing (NLP) to mitigate a similar issue, but directly applying existing linear attention to vision may not lead to satisfactory results. We investigate this problem and point out that existing linear attention methods ignore an inductive bias in vision tasks, i.e., 2D locality. In this article, we propose Vicinity Attention, a type of linear attention that integrates 2D locality. Specifically, for each image patch, we adjust its attention weight based on its 2D Manhattan distance from its neighbouring patches. In this way, we achieve 2D locality with linear complexity, where neighbouring image patches receive stronger attention than far-away patches. In addition, we propose a novel Vicinity Attention Block that comprises Feature Reduction Attention (FRA) and Feature Preserving Connection (FPC) to address the computational bottleneck of linear attention approaches, including our Vicinity Attention, whose complexity grows quadratically with respect to the feature dimension. The Vicinity Attention Block computes attention in a compressed feature space with an extra skip connection to retrieve the original feature distribution. We experimentally validate that the block further reduces computation without degrading accuracy. Finally, to validate the proposed methods, we build a linear vision transformer backbone named Vicinity Vision Transformer (VVT). Targeting general vision tasks, we build VVT in a pyramid structure with progressively reduced sequence length. We perform extensive experiments on the CIFAR-100, ImageNet-1K, and ADE20K datasets to validate the effectiveness of our method. Our method has a slower growth rate in terms of computational overhead than previous transformer-based and convolution-based networks as the input resolution increases. In particular, our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous approaches.
Published: 2023
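The motivation for the Vicinity Attention Block follows from simple cost arithmetic: linear attention costs roughly 2·N·d² multiply-accumulates (MACs) for N tokens of dimension d, so computing it in a space compressed by a factor r cuts that term by r². The sketch below assumes, hypothetically, that the compression and the feature-preserving path each cost one N·d·(d/r) linear projection; the real block's layers may differ, and the function names are illustrative.

```python
def linear_attention_macs(n, d):
    # K^T V (n*d*d) plus Q (K^T V) (n*d*d)
    return 2 * n * d * d

def reduced_attention_macs(n, d, r):
    dr = d // r
    attention = linear_attention_macs(n, dr)  # attention in the compressed space
    projections = 2 * n * d * dr              # assumed d -> d/r and d/r -> d projections
    return attention + projections
```

For a hypothetical 56×56 token grid (N = 3136) with d = 64 and r = 4, this gives 25,690,112 MACs for plain linear attention versus 8,028,160 for the reduced version under these assumptions.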

49. Lnc HAGLR Promotes Colon Cancer Progression Through Sponging miR‐185‐5p and Activating CDK4 and CDK6 in vitro and in vivo

Author: Sun, Weixuan, Nie, Wenting, Wang, Zhaoyi, Zhang, Haolong, Li, Yezhou, and Fang, Xuedong
Subjects: lnc HAGLR, colon cancer, proliferation, apoptosis, CDK4/CDK6, miR-185-5p, Original Research
Abstract: Background/Aim: LncRNA plays a key role in tumor progression. HAGLR functions as an oncogene in many cancers. However, the molecular mechanism of HAGLR in colon cancer is still unclear. Methods: qRT-PCR was used to measure the expression of HAGLR and miR-185-5p in colon cancer. The expression of CDK4 and CDK6 was detected by Western blot. CCK-8 assay, EdU staining, transwell and Annexin V-FITC/PI assays were used to analyze the effect of HAGLR and miR-185-5p on cell proliferation, invasion, migration and apoptosis. Bioinformatic analysis and luciferase assays were used to analyze the target genes of HAGLR and miR-185-5p. Nude mice were used to assess tumor changes in vivo. Results: Compared with normal colon tissues and cells, the expression of HAGLR was increased in colon cancer tissues and cells. In addition, down-regulation of HAGLR inhibited the growth, migration, and invasion of colon cancer cells. MiR-185-5p was reduced in colon cancer, and CDK4 and CDK6 acted as target genes of miR-185-5p to regulate the progression of colon cancer. CDK4 and CDK6 were also predicted as downstream targets of miR-185-5p. Finally, it was demonstrated that HAGLR regulated tumor progression in vivo. Conclusion: Lnc HAGLR promoted the development of colon cancer via the miR-185-5p/CDK4/CDK6 axis, and lnc HAGLR might be a potential target for colon cancer.
Published: 2020

50. Identification of Key Genes Related With Aspartic Acid Metabolism and Corresponding Protein Expression in Human Colon Cancer With Postoperative Prognosis and the Underlying Molecular Pathways Prediction

Author: Sun, Weixuan, primary, Jia, Chaoran, additional, Zhang, Xiaojun, additional, Wang, Zhaoyi, additional, Li, Yaping, additional, and Fang, Xuedong, additional
Published: 2022
