2,021 results for "Nguyen, Anh P."
Search Results
2. Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
- Author
Nguyen, Nghia, Vu, Minh Nhat, Ta, Tung D., Huang, Baoru, Vo, Thieu, Le, Ngan, and Nguyen, Anh
- Subjects
Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Vision language models have played a key role in extracting meaningful features for various robotic applications. Among these, Contrastive Language-Image Pretraining (CLIP) is widely used in robotic tasks that require both vision and natural language understanding. However, CLIP was trained solely on static images paired with text prompts and has not yet been fully adapted for robotic tasks involving dynamic actions. In this paper, we introduce Robotic-CLIP to enhance robotic perception capabilities. We first gather and label large-scale action data, and then build our Robotic-CLIP by fine-tuning CLIP on 309,433 videos (~7.4 million frames) of action data using contrastive learning. By leveraging action data, Robotic-CLIP inherits CLIP's strong image performance while gaining the ability to understand actions in robotic contexts. Intensive experiments show that our Robotic-CLIP outperforms other CLIP-based models across various language-driven robotic tasks. Additionally, we demonstrate the practical effectiveness of Robotic-CLIP in real-world grasping applications.
Comment: 7 pages
- Published
- 2024
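The contrastive fine-tuning described in the Robotic-CLIP abstract above is, at its core, a CLIP-style symmetric InfoNCE objective over paired samples. Below is a minimal sketch under that assumption, with generic frame/text embeddings standing in for the paper's actual encoders and data pipeline.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(frame_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of (video-frame, text-prompt) pairs.

    frame_emb, text_emb: (B, D) outputs of the image and text encoders.
    Pairs sharing a batch index are positives; all others are negatives.
    """
    frame_emb = F.normalize(frame_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = frame_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Cross-entropy in both directions: frame-to-text and text-to-frame.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```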
3. GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning
- Author
Nguyen, Huy Hoang, Vuong, An, Nguyen, Anh, Reid, Ian, and Vu, Minh Nhat
- Subjects
Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these challenges. By leveraging rich visual features of the Mamba-based backbone alongside textual information, our approach effectively enhances the fusion of multimodal features. GraspMamba represents the first Mamba-based grasp detection model to extract vision and language features at multiple scales, delivering robust performance and rapid inference time. Intensive experiments show that GraspMamba outperforms recent methods by a clear margin. We validate our approach through real-world robotic experiments, highlighting its fast inference speed.
Comment: 8 pages. Project page: https://airvlab.github.io/grasp-anything/
- Published
- 2024
4. Origin of yield stress and mechanical plasticity in biological tissues
- Author
Nguyen, Anh Q., Huang, Junxiang, and Bi, Dapeng
- Subjects
Physics - Biological Physics, Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Materials Science, Condensed Matter - Soft Condensed Matter
- Abstract
During development and under normal physiological conditions, biological tissues are continuously subjected to substantial mechanical stresses. In response to large deformations, cells in a tissue must undergo multicellular rearrangements in order to maintain integrity and robustness. However, how these events are connected in time and space remains unknown. Here, using computational and theoretical modeling, we studied the mechanical plasticity of epithelial monolayers under large deformations. Our results demonstrate that the jamming-unjamming (solid-fluid) transition in tissues can vary significantly depending on the degree of deformation, implying that tissues are highly unconventional materials. Using analytical modeling, we elucidate the origins of this behavior. We also demonstrate how a tissue accommodates large deformations through a collective series of rearrangements, which behave similarly to avalanches in non-living materials. We find that these tissue avalanches are governed by stress redistribution and the spatial distribution of vulnerable spots. Finally, we propose a simple and experimentally accessible framework to predict avalanches and infer tissue mechanical stress based on static images.
- Published
- 2024
5. Provable Hyperparameter Tuning for Structured Pfaffian Settings
- Author
Balcan, Maria-Florina, Nguyen, Anh Tuan, and Sharma, Dravyansh
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
Data-driven algorithm design automatically adapts algorithms to specific application domains, achieving better performance. In the context of parameterized algorithms, this approach involves tuning the algorithm parameters using problem instances drawn from the problem distribution of the target application domain. While empirical evidence supports the effectiveness of data-driven algorithm design, providing theoretical guarantees for several parameterized families remains challenging. This is due to the intricate behaviors of their corresponding utility functions, which typically exhibit piecewise structure and discontinuities. In this work, we present refined frameworks for providing learning guarantees for parameterized data-driven algorithm design problems in both distributional and online learning settings. For the distributional learning setting, we introduce the Pfaffian GJ framework, an extension of the classical GJ framework, capable of providing learning guarantees for function classes whose computation involves Pfaffian functions. Unlike the GJ framework, which is limited to function classes whose computation is characterized by rational functions, our proposed framework can deal with function classes involving Pfaffian functions, which are much more general and widely applicable. We then show that for many parameterized algorithms of interest, the utility function possesses a refined piecewise structure, which automatically translates to learning guarantees using our proposed framework. For the online learning setting, we provide a new tool for verifying the dispersion property of a sequence of loss functions. This sufficient condition allows no-regret learning for sequences of piecewise structured loss functions whose piecewise structure involves Pfaffian transition boundaries.
- Published
- 2024
6. CathAction: A Benchmark for Endovascular Intervention Understanding
- Author
Huang, Baoru, Vo, Tuan, Kongtongvattana, Chayun, Dagnino, Giulio, Kundrat, Dennis, Chi, Wenqiang, Abdelaziz, Mohamed, Kwok, Trevor, Jianu, Tudor, Do, Tuong, Le, Hieu, Nguyen, Minh, Nguyen, Hoan, Tjiputra, Erman, Tran, Quang, Xie, Jianyang, Meng, Yanda, Bhattarai, Binod, Tan, Zhaorui, Liu, Hongbin, Gan, Hong Seng, Wang, Wei, Yang, Xi, Wang, Qiufeng, Su, Jionglong, Huang, Kaizhu, Stefanidis, Angelos, Guo, Min, Du, Bo, Tao, Rong, Vu, Minh, Zheng, Guoyan, Zheng, Yalin, Vasconcelos, Francisco, Stoyanov, Danail, Elson, Daniel, Baena, Ferdinando Rodriguez y, and Nguyen, Anh
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale dataset for catheterization understanding. Our CathAction dataset encompasses approximately 500,000 annotated frames for catheterization action understanding and collision detection, and 25,000 ground truth masks for catheter and guidewire segmentation. For each task, we benchmark recent related works in the field. We further discuss the challenges of endovascular interventions compared to traditional computer vision tasks and point out open research questions. We hope that CathAction will facilitate the development of endovascular intervention understanding methods that can be applied to real-world applications. The dataset is available at https://airvlab.github.io/cathaction/.
Comment: 10 pages. Webpage: https://airvlab.github.io/cathaction/
- Published
- 2024
7. Enhancing Material Screening at Boulby Underground Laboratory with XIA UltraLo-1800 Alpha Particle Detectors
- Author
Maouloud, Sid El Moctar Ahmed, Liu, XinRan, Nguyen, Anh, Dobson, James Edward Young, Ghag, Chamkaur, Floch, Léna Le, Meehan, Emma, Murphy, Alexander St. John, Paling, Sean, Saakyan, Ruben, Scovell, Paul Robert, and Toth, Christopher
- Subjects
Physics - Instrumentation and Detectors
- Abstract
The Boulby UnderGround Screening (BUGS) facility, located at the Boulby Underground Laboratory, has significantly advanced its material screening capabilities by installing two XIA UltraLo-1800 alpha particle detectors. This study presents a comprehensive evaluation of one of these detectors, operated 1,100 meters underground at the Boulby Underground Laboratory, which provides significant shielding from cosmic radiation and maintains a low ambient radon activity of 2.30 ± 0.03 Bq/m³. Our evaluation focuses on energy reconstruction accuracy, background radiation rates, and operational stability. The XIA UltraLo-1800 detector demonstrates remarkable stability in energy reconstruction, with less than 0.1 MeV variation over four years. Moreover, the implementation of a graphite-filled PTFE liner in the sample tray resulted in a significant reduction in background radiation levels compared to measurements with the original stainless steel tray, achieving an average activity of 0.15 ± 0.01 α/cm²/khr. Copper sample assays, performed before and after radon exposure, demonstrated the detector's ability to accurately identify and quantify ²¹⁰Po contamination. By implementing the robust cleanliness procedures and protocols described in this article, we observed a reduction in ²¹⁰Po activity from 0.504 ± 0.022 mBq to 0.336 ± 0.013 mBq, highlighting the crucial role of refined cleaning methods in minimizing background for sensitive experiments. Additionally, observations of elevated background activity levels after high-activity sample measurements illustrate the need for careful management of assay conditions and environment to maintain low background levels. These results highlight the potential of the XIA UltraLo-1800 in enhancing the precision of material assays essential for reducing background interference in rare event experiments.
- Published
- 2024
8. XMainframe: A Large Language Model for Mainframe Modernization
- Author
Dau, Anh T. V., Dao, Hieu Trung, Nguyen, Anh Tuan, Tran, Hieu Trung, Nguyen, Phong X., and Bui, Nghi D. Q.
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Mainframe operating systems, despite their inception in the 1940s, continue to support critical sectors like finance and government. However, these systems are often viewed as outdated, requiring extensive maintenance and modernization. Addressing this challenge necessitates innovative tools that can understand and interact with legacy codebases. To this end, we introduce XMainframe, a state-of-the-art large language model (LLM) specifically designed with knowledge of mainframe legacy systems and COBOL codebases. Our solution involves the creation of an extensive data collection pipeline to produce high-quality training datasets, enhancing XMainframe's performance in this specialized domain. Additionally, we present MainframeBench, a comprehensive benchmark for assessing mainframe knowledge, including multiple-choice questions, question answering, and COBOL code summarization. Our empirical evaluations demonstrate that XMainframe consistently outperforms existing state-of-the-art LLMs across these tasks. Specifically, XMainframe achieves 30% higher accuracy than DeepSeek-Coder on multiple-choice questions, doubles the BLEU score of Mixtral-Instruct 8x7B on question answering, and scores six times higher than GPT-3.5 on COBOL summarization. Our work highlights the potential of XMainframe to drive significant advancements in managing and modernizing legacy systems, thereby enhancing productivity and saving time for software developers.
- Published
- 2024
9. Exploring the Integration of the Happy School Model in Vietnamese Higher Education: Insights and Implications from the Perspectives of Tertiary EFL Teachers
- Author
Nguyen Anh Thi, Le Thanh Thao, Phuong Hoang Yen, Pham Trut Thuy, Huynh Thi Anh Thu, and Nguyen Huong Tra
- Abstract
This qualitative study explored the possibility of implementing the happy school model (HSM) in the context of Vietnamese higher education, with a focus on the socio-cultural perspectives of nine tertiary English as a foreign language (EFL) teachers at different career stages. Through semi-structured interviews, thematic analysis, and theoretical underpinning by constructivist paradigm and humanistic education theory, the study illuminated multifaceted insights. Key themes emerged, including aligning the HSM with holistic student development, recognizing challenges and potential benefits, balancing traditional Confucian values, and adapting the model to Vietnam's unique socio-cultural and economic landscape. The findings provide valuable guidance for educational innovation in Vietnam, highlighting complexities of aligning a new educational paradigm with existing practices and cultural norms. While the study's focus on a specific cultural context and limited participant pool presents certain limitations, the insights offer rich contributions to the broader global dialogue on education and human development. Future research directions and practical implications are also discussed, making this study a valuable resource for educators, policymakers, and researchers interested in the intersection of universal educational principles and specific cultural contexts like Vietnam.
- Published
- 2024
10. Language-driven Grasp Detection with Mask-guided Attention
- Author
Van Vo, Tuan, Vu, Minh Nhat, Huang, Baoru, Vuong, An, Le, Ngan, Vo, Thieu, and Nguyen, Anh
- Subjects
Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Grasp detection is an essential task in robotics with various industrial applications. However, traditional methods often struggle with occlusions and do not utilize language for grasping. Incorporating natural language into grasp detection remains challenging and largely unexplored. To address this gap, we propose a new method for language-driven grasp detection with mask-guided attention by utilizing the transformer attention mechanism with semantic segmentation features. Our approach integrates visual data, segmentation mask features, and natural language instructions, significantly improving grasp detection accuracy. Our work introduces a new framework for language-driven grasp detection, paving the way for language-driven robotic applications. Intensive experiments show that our method outperforms other recent baselines by a clear margin, with a 10.0% success score improvement. We further validate our method in real-world robotic experiments, confirming the effectiveness of our approach.
Comment: Accepted at IROS 2024
- Published
- 2024
11. Scalable Group Choreography via Variational Phase Manifold Learning
- Author
Le, Nhat, Do, Khoa, Bui, Xuan, Do, Tuong, Tjiputra, Erman, Tran, Quang D., and Nguyen, Anh
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Generating group dance motion from music is a challenging task with several industrial applications. Although several methods have been proposed to tackle this problem, most of them prioritize optimizing the fidelity of the dancing movement, constrained by predetermined dancer counts in datasets. This limitation impedes adaptability to real-world applications. Our study addresses the scalability problem in group choreography while preserving naturalness and synchronization. In particular, we propose a phase-based variational generative model for group dance generation that learns a generative manifold. Our method achieves high-fidelity group dance motion and enables generation with an unlimited number of dancers while consuming only a minimal and constant amount of memory. Intensive experiments on two public datasets show that our proposed method outperforms recent state-of-the-art approaches by a large margin and is scalable to a great number of dancers beyond the training data.
Comment: Accepted at ECCV 2024
- Published
- 2024
12. Lightweight Language-driven Grasp Detection using Conditional Consistency Model
- Author
Nguyen, Nghia, Vu, Minh Nhat, Huang, Baoru, Vuong, An, Le, Ngan, Vo, Thieu, and Nguyen, Anh
- Subjects
Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages the concept of lightweight diffusion models to achieve fast inference time. By integrating diffusion processes with grasping prompts in natural language, our method can effectively encode visual and textual information, enabling more accurate and versatile grasp positioning that aligns well with the text query. To overcome the long inference time problem in diffusion models, we leverage the image and text features as the condition in the consistency model to reduce the number of denoising timesteps during inference. The intensive experimental results show that our method outperforms other recent grasp detection methods and lightweight diffusion models by a clear margin. We further validate our method in real-world robotic experiments to demonstrate its fast inference time capability.
Comment: Accepted at IROS 2024
- Published
- 2024
13. Sentiment Reasoning for Healthcare
- Author
Le-Duc, Khai, Nguyen, Khai-Nguyen, Tat, Bach Phan, Le, Duy, Ngo, Jerry, Vo-Dang, Long, Nguyen, Anh Totti, and Hy, Truong-Son
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Transparency in AI decision-making is crucial in healthcare due to the severe consequences of errors, and it is important for building trust between AI systems and users in sentiment analysis tasks. Incorporating reasoning capabilities helps Large Language Models (LLMs) understand human emotions within broader contexts, handle nuanced and ambiguous language, and infer underlying sentiments that may not be explicitly stated. In this work, we introduce a new task - Sentiment Reasoning - for both speech and text modalities, along with our proposed multimodal multitask framework and dataset. Our study showed that rationale-augmented training enhances model performance in sentiment classification across both human transcript and ASR settings. Also, we found that the generated rationales typically exhibit different vocabularies compared to human-generated rationales, but maintain similar semantics. All code, data (English-translated and Vietnamese) and models are published online: https://github.com/leduckhai/MultiMed
Comment: Preprint, 18 pages
- Published
- 2024
14. Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++
- Author
Nguyen, Anh The, Le, Triet Huynh Minh, and Babar, M. Ali
- Subjects
Computer Science - Software Engineering, Computer Science - Cryptography and Security, Computer Science - Machine Learning
- Abstract
Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in the source code written in these languages. However, the application of these techniques in function-level SV assessment has been largely unexplored. SV assessment is increasingly crucial as it provides detailed information on the exploitability, impacts, and severity of security defects, thereby aiding in their prioritization and remediation. Aims: We conduct the first empirical study to investigate and compare the performance of ML and DL models, many of which have been used for SV detection, for function-level SV assessment in C/C++. Method: Using 9,993 vulnerable C/C++ functions, we evaluated the performance of six multi-class ML models and five multi-class DL models for the SV assessment at the function level based on the Common Vulnerability Scoring System (CVSS). We further explore multi-task learning, which can leverage common vulnerable code to predict all SV assessment outputs simultaneously in a single model, and compare the effectiveness and efficiency of this model type with those of the original multi-class models. Results: We show that ML has matching or even better performance compared to the multi-class DL models for function-level SV assessment with significantly less training time. Employing multi-task learning allows the DL models to perform significantly better, with an average of 8-22% increase in Matthews Correlation Coefficient (MCC). Conclusions: We distill the practices of using data-driven techniques for function-level SV assessment in C/C++, including the use of multi-task DL to balance efficiency and effectiveness. This can establish a strong foundation for future work in this area.
Comment: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024
- Published
- 2024
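For readers unfamiliar with the setup in the abstract above, the sketch below shows one way a multi-task model can share an encoder across several CVSS outputs and be scored with the Matthews Correlation Coefficient. The encoder, head names, and class counts are illustrative placeholders, not the study's actual configuration.

```python
import torch
import torch.nn as nn
from sklearn.metrics import matthews_corrcoef

# Hypothetical CVSS output heads (names and class counts are placeholders).
CVSS_TASKS = {"confidentiality": 3, "integrity": 3, "availability": 3, "severity": 4}

class MultiTaskSVAssessor(nn.Module):
    """One shared code encoder, one classification head per CVSS output."""
    def __init__(self, vocab_size=50_000, dim=256):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, dim)  # toy stand-in encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(dim, n) for task, n in CVSS_TASKS.items()})

    def forward(self, token_ids):                        # token_ids: (B, L)
        shared = self.encoder(token_ids)                 # (B, dim), shared by all tasks
        return {task: head(shared) for task, head in self.heads.items()}

def mcc_per_task(logits_by_task, labels_by_task):
    """MCC, the metric the study reports, computed per CVSS output."""
    return {task: matthews_corrcoef(labels_by_task[task],
                                    logits.argmax(-1).detach().numpy())
            for task, logits in logits_by_task.items()}
```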
15. Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition
- Author
Kamboj, Abhi, Nguyen, Anh Duy, and Do, Minh
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Signal Processing
- Abstract
Despite living in a multi-sensory world, most AI models are limited to textual and visual interpretations of human motion and behavior. Inertial measurement units (IMUs) provide a salient signal to understand human motion; however, they are challenging to use due to their uninterpretability and the scarcity of their data. We investigate a method to transfer knowledge between visual and inertial modalities using the structure of an informative joint representation space designed for human action recognition (HAR). We apply the resulting Fusion and Cross-modal Transfer (FACT) method to a novel setup, where the model does not have access to labeled IMU data during training and is able to perform HAR with only IMU data during testing. Extensive experiments on a wide range of RGB-IMU datasets demonstrate that FACT significantly outperforms existing methods in zero-shot cross-modal transfer.
- Published
- 2024
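As a hedged illustration of the cross-modal transfer the FACT abstract describes: an IMU encoder is trained to land in the same joint space as a visual encoder using paired RGB-IMU clips, so a classifier fit on visual features can be applied to IMU features at test time without labeled IMU data. All module shapes below are invented for the sketch, not the paper's actual networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative encoders; the paper's actual architectures are not reproduced here.
video_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(128))  # pretrained, frozen
imu_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(128))    # trained to align
classifier = nn.Linear(128, 10)                                  # fit on video features only

def alignment_loss(video_clip, imu_clip):
    """Pull paired video/IMU embeddings together in the joint space."""
    with torch.no_grad():
        v = F.normalize(video_encoder(video_clip), dim=-1)
    u = F.normalize(imu_encoder(imu_clip), dim=-1)
    return (1.0 - (v * u).sum(-1)).mean()  # mean cosine distance over pairs

def predict_from_imu_only(imu_clip):
    """Test time: HAR from IMU alone, reusing the visually trained head."""
    return classifier(imu_encoder(imu_clip)).argmax(-1)
```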
16. Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
- Author
Nguyen, Toan, Vu, Minh Nhat, Huang, Baoru, Vuong, An, Vuong, Quan, Le, Ngan, Vo, Thieu, and Nguyen, Anh
- Subjects
Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection in cluttered point clouds. We first introduce Grasp-Anything-6D, a large-scale dataset for the language-driven 6-DoF grasp detection task with 1M point cloud scenes and more than 200M language-associated 3D grasp poses. We further introduce a novel diffusion model that incorporates a new negative prompt guidance learning strategy. The proposed negative prompt strategy directs the detection process toward the desired object while steering away from unwanted ones given the language input. Our method enables an end-to-end framework where humans can command the robot to grasp desired objects in a cluttered scene using natural language. Intensive experimental results show the effectiveness of our method in both benchmarking experiments and real-world scenarios, surpassing other baselines. In addition, we demonstrate the practicality of our approach in real-world robotic applications. Our project is available at https://airvlab.github.io/grasp-anything.
Comment: Accepted at ECCV 2024
- Published
- 2024
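The negative prompt guidance in the abstract above can be pictured in the style of classifier-free guidance: each denoising step is pushed toward the embedding of the commanded object and away from that of unwanted ones. This is a schematic reading of the abstract, not the paper's exact update rule.

```python
def guided_noise(model, x_t, t, pos_emb, neg_emb, scale=5.0):
    """One step's noise estimate with a negative-prompt term (schematic).

    model(x_t, t, cond) -> predicted noise; pos_emb / neg_emb are text
    embeddings for the desired and the unwanted objects, respectively.
    """
    eps_pos = model(x_t, t, pos_emb)  # steer toward the commanded object
    eps_neg = model(x_t, t, neg_emb)  # steer away from distractors
    return eps_neg + scale * (eps_pos - eps_neg)
```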
17. LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task
- Author
Le-Duc, Khai, Zhang, Ryan, Nguyen, Ngoc Son, Pham, Tan-Hanh, Dao, Anh, Ngo, Ba Hung, Nguyen, Anh Totti, and Hy, Truong-Son
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
- Abstract
Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose a unified framework - LiteGPT - for medical imaging. We leverage multiple pre-trained visual encoders to enrich information and enhance the performance of vision-language models. To the best of our knowledge, this is the first study to utilize vision-language models for the novel task of joint localization and classification in medical images. In addition, we provide baselines for disease localization in chest X-rays. Finally, we set new state-of-the-art performance in the image classification task on the well-benchmarked VinDr-CXR dataset. All code and models are publicly available online: https://github.com/leduckhai/LiteGPT
Comment: Preprint, 19 pages
- Published
- 2024
18. GPC: Generative and General Pathology Image Classifier
- Author
Nguyen, Anh Tien and Kwak, Jin Tae
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep learning has been increasingly incorporated into various computational pathology applications to improve their efficiency, accuracy, and robustness. Although successful, most previous approaches for image classification have crucial drawbacks. There exist numerous tasks in pathology, but one needs to build a model per task, i.e., a task-specific model, thereby increasing the number of models, training resources, and cost. Moreover, transferring an arbitrary task-specific model to another task is still a challenging problem. Herein, we propose a task-agnostic generative and general pathology image classifier, called GPC, that aims at learning from diverse kinds of pathology images and conducting numerous classification tasks in a unified model. GPC, equipped with a convolutional neural network and a Transformer-based language model, maps pathology images into a high-dimensional feature space and generates pertinent class labels as texts via the image-to-text classification mechanism. We evaluate GPC on six datasets for four different pathology image classification tasks. Experimental results show that GPC holds considerable potential for developing an effective and efficient universal model for pathology image analysis.
Comment: MICCAI-MedAGI 2023 (Best Paper Honorable Mention)
- Published
- 2024
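The image-to-text classification mechanism that the GPC abstract outlines can be sketched as a CNN image encoder whose features condition a small language decoder that generates the class label as text; because labels are free-form text, one model can serve many label sets. Every module choice below is an assumption made for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GenerativeClassifier(nn.Module):
    """Toy image-to-text classifier: emits the label as a token sequence."""
    def __init__(self, vocab_size=1000, dim=256):
        super().__init__()
        self.cnn = nn.Sequential(                      # tiny stand-in image encoder
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))
        self.tok_embed = nn.Embedding(vocab_size, dim)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, image, label_tokens):
        ctx = self.cnn(image).unsqueeze(0)             # (1, B, dim) initial state
        h, _ = self.decoder(self.tok_embed(label_tokens), ctx)
        return self.lm_head(h)                         # next-token logits over label text
```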
19. CAMP: Continuous and Adaptive Learning Model in Pathology
- Author
Nguyen, Anh Tien, Byeon, Keunho, Kim, Kyungeun, Song, Boram, Chae, Seoung Wan, and Kwak, Jin Tae
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
There exist numerous diagnostic tasks in pathology. Conventional computational pathology formulates and tackles them as independent and individual image classification problems, thereby resulting in computational inefficiency and high costs. To address the challenges, we propose a generic, unified, and universal framework, called a continuous and adaptive learning model in pathology (CAMP), for pathology image classification. CAMP is a generative, efficient, and adaptive classification model that can continuously adapt to any classification task by leveraging pathology-specific prior knowledge and learning task-specific knowledge with minimal computational cost and without forgetting the knowledge from the existing tasks. We evaluated CAMP on 22 datasets, including 1,171,526 patches and 11,811 pathology slides, across 17 classification tasks. CAMP achieves state-of-the-art classification performance on a wide range of datasets and tasks at both patch- and slide-levels and reduces up to 94% of computation time and 85% of storage memory in comparison to the conventional classification models. Our results demonstrate that CAMP can offer a fundamental transformation in pathology image classification, paving the way for the fully digitized and computerized pathology practice.
Comment: Under review
- Published
- 2024
20. Adaptive Parametric Activation
- Author
Alexandridis, Konstantinos Panagiotis, Deng, Jiankang, Nguyen, Anh, and Luo, Shan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de-facto activation in balanced classification tasks; however, in imbalanced classification, it proves inappropriate due to bias towards frequent classes. In this work, we delve deeper into this phenomenon by performing a comprehensive statistical analysis in the classification and intermediate layers of both balanced and imbalanced networks, and we empirically show that aligning the activation function with the data distribution enhances the performance in both balanced and imbalanced tasks. To this end, we propose the Adaptive Parametric Activation (APA) function, a novel and versatile activation function that unifies most common activation functions under a single formula. APA can be applied in both intermediate layers and attention layers, significantly outperforming the state-of-the-art on several imbalanced benchmarks such as ImageNet-LT, iNaturalist2018, Places-LT, CIFAR100-LT and LVIS and balanced benchmarks such as ImageNet1K, COCO and V3DET. The code is available at https://github.com/kostas1515/AGLU.
Comment: ECCV 2024
- Published
- 2024
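As a hedged sketch of what a "single formula" unifying common activations can look like, the module below implements a generalized logistic gate with learnable parameters; setting lam = 1 recovers the standard sigmoid. This is an illustrative reading only, and the exact published APA formula should be taken from the authors' linked AGLU repository.

```python
import torch
import torch.nn as nn

class ParametricActivation(nn.Module):
    """Generalized logistic gate: (lam * exp(-kappa * z) + 1) ** (-1 / lam).

    lam = 1 gives the standard sigmoid; other values reshape the curve.
    Illustrative only; see https://github.com/kostas1515/AGLU for the paper's code.
    """
    def __init__(self, lam=1.0, kappa=1.0):
        super().__init__()
        self.lam = nn.Parameter(torch.tensor(float(lam)))
        self.kappa = nn.Parameter(torch.tensor(float(kappa)))

    def forward(self, z):
        lam = self.lam.clamp(min=1e-3)  # keep the exponent well defined
        return torch.pow(lam * torch.exp(-self.kappa * z) + 1.0, -1.0 / lam)
```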
21. Towards a text-based quantitative and explainable histopathology image analysis
- Author
Nguyen, Anh Tien, Vuong, Trinh Thi Le, and Kwak, Jin Tae
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Recently, vision-language pre-trained models have emerged in computational pathology. Previous works generally focused on the alignment of image-text pairs via the contrastive pre-training paradigm. Such pre-trained models have been applied to pathology image classification in zero-shot learning or transfer learning fashion. Herein, we hypothesize that the pre-trained vision-language models can be utilized for quantitative histopathology image analysis through a simple image-to-text retrieval. To this end, we propose a Text-based Quantitative and Explainable histopathology image analysis, which we call TQx. Given a set of histopathology images, we adopt a pre-trained vision-language model to retrieve a word-of-interest pool. The retrieved words are then used to quantify the histopathology images and generate understandable feature embeddings due to the direct mapping to the text description. To evaluate the proposed method, the text-based embeddings of four histopathology image datasets are utilized to perform clustering and classification tasks. The results demonstrate that TQx is able to quantify and analyze histopathology images that are comparable to the prevalent visual models in computational pathology.
Comment: MICCAI 2024 - Early acceptance (Top 11%)
- Published
- 2024
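The image-to-text retrieval underlying TQx can be sketched with any CLIP-style model: embed the image, embed a word-of-interest pool, and keep the top-scoring words as an interpretable quantification. The word pool and variable names below are placeholders, not the paper's actual vocabulary.

```python
import torch
import torch.nn.functional as F

def quantify_image(image_emb, word_embs, words, k=5):
    """Return the k pool words most similar to one image embedding.

    image_emb: (D,) and word_embs: (W, D), both from a pretrained
    vision-language model; the retrieved words double as an explainable
    feature vector for clustering or classification.
    """
    sims = F.normalize(word_embs, dim=-1) @ F.normalize(image_emb, dim=-1)
    top = torch.topk(sims, k)
    return [(words[int(i)], float(s)) for i, s in zip(top.indices, top.values)]

# Hypothetical word-of-interest pool for histopathology images.
pool = ["tumor", "stroma", "necrosis", "lymphocytes", "mucin"]
```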
22. Vision language models are blind
- Author
Rahmanzadehgervi, Pooyan, Bolton, Logan, Taesiri, Mohammad Reza, and Nguyen, Anh Totti
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
While large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering various image-text applications and scoring high on many vision-understanding benchmarks, we find that they are surprisingly still struggling with low-level vision tasks that are easy for humans. Specifically, on BlindTest, our suite of 7 very simple tasks such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting circles in an Olympic-like logo, four state-of-the-art VLMs are only 58.57% accurate on average. Claude 3.5 Sonnet performs the best at 74.94% accuracy, but this is still far from the expected human accuracy of 100%. Across different image resolutions and line widths, VLMs consistently struggle with tasks that require precise spatial information and recognizing geometric primitives that overlap or are close together. Code and data are available at: https://vlmsareblind.github.io
- Published
- 2024
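The two-circles task from the abstract above is easy to reproduce, since two circles overlap exactly when the distance between their centers is less than the sum of their radii. The sketch below generates one such stimulus and its ground-truth label; it mirrors the task description, not the authors' actual generation code.

```python
import math
import random
import matplotlib.pyplot as plt

def make_two_circle_stimulus(path="stimulus.png"):
    """Draw two random circles; return True iff they overlap (ground truth)."""
    (x1, y1), (x2, y2) = [(random.uniform(2, 8), random.uniform(2, 8)) for _ in range(2)]
    r1, r2 = random.uniform(0.5, 2.0), random.uniform(0.5, 2.0)
    overlap = math.hypot(x2 - x1, y2 - y1) < r1 + r2

    fig, ax = plt.subplots(figsize=(3, 3))
    for x, y, r in [(x1, y1, r1), (x2, y2, r2)]:
        ax.add_patch(plt.Circle((x, y), r, fill=False, linewidth=2))
    ax.set_xlim(0, 10); ax.set_ylim(0, 10)
    ax.set_aspect("equal"); ax.axis("off")
    fig.savefig(path); plt.close(fig)
    return overlap
```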
23. Multicell-Fold: geometric learning in folding multicellular life
- Author
Yang, Haiqian, Nguyen, Anh Q., Bi, Dapeng, Buehler, Markus J., and Guo, Ming
- Subjects
Condensed Matter - Soft Condensed Matter, Computer Science - Machine Learning, Physics - Biological Physics
- Abstract
During developmental processes such as embryogenesis, how a group of cells folds into specific structures is a central question in biology that defines how living organisms form. Establishing tissue-level morphology critically relies on how every single cell decides to position itself relative to its neighboring cells. Despite its importance, it remains a major challenge to understand and predict the behavior of every cell within the living tissue over time during such intricate processes. To tackle this question, we propose a geometric deep learning model that can predict multicellular folding and embryogenesis, accurately capturing the highly convoluted spatial interactions among cells. We demonstrate that multicellular data can be represented with both granular and foam-like physical pictures through a unified graph data structure, considering both cellular interactions and cell junction networks. We successfully use our model to achieve two important tasks: interpretable 4-D morphological sequence alignment, and predicting local cell rearrangements before they occur at single-cell resolution. Furthermore, using an activation map and ablation studies, we demonstrate that cell geometries and cell junction networks together regulate local cell rearrangement, which is critical for embryo morphogenesis. This approach provides a novel paradigm to study morphogenesis, highlighting a unified data structure and harnessing the power of geometric deep learning to accurately model the mechanisms and behaviors of cells during development. It offers a pathway toward creating a unified dynamic morphological atlas for a variety of developmental processes such as embryogenesis.
- Published
- 2024
24. CLIP-DR: Textual Knowledge-Guided Diabetic Retinopathy Grading with Ranking-aware Prompting
- Author
Yu, Qinkai, Xie, Jianyang, Nguyen, Anh, Zhao, He, Zhang, Jiong, Fu, Huazhu, Zhao, Yitian, Zheng, Yalin, and Meng, Yanda
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Diabetic retinopathy (DR) is a complication of diabetes and usually takes decades to reach sight-threatening levels. Accurate and robust detection of DR severity is critical for the timely management and treatment of diabetes. However, most current DR grading methods suffer from insufficient robustness to data variability (e.g., colour fundus images), posing a significant difficulty for accurate and robust grading. In this work, we propose a novel DR grading framework CLIP-DR based on three observations: 1) Recent pre-trained visual language models, such as CLIP, showcase a notable capacity for generalisation across various downstream tasks, serving as effective baseline models. 2) The grading of image-text pairs for DR often adheres to a discernible natural sequence, yet most existing DR grading methods have primarily overlooked this aspect. 3) A long-tailed distribution among DR severity levels complicates the grading process. This work proposes a novel ranking-aware prompting strategy to help the CLIP model exploit the ordinal information. Specifically, we sequentially design learnable prompts between neighbouring text-image pairs in two different ranking directions. Additionally, we introduce a Similarity Matrix Smooth module into the structure of CLIP to balance the class distribution. Finally, we perform extensive comparisons with several state-of-the-art methods on the GDRBench benchmark, demonstrating our CLIP-DR's robustness and superior performance. The implementation code is available at https://github.com/Qinkaiyu/CLIP-DR.
Comment: Accepted by MICCAI 2024
- Published
- 2024
25. Language-driven Grasp Detection
- Author
Vuong, An Dinh, Vu, Minh Nhat, Huang, Baoru, Nguyen, Nghia, Le, Hieu, Vo, Thieu, and Nguyen, Anh
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samples, over 3M objects, and upwards of 10M grasping instructions. We utilize foundation models to create a large-scale scene corpus with corresponding images and grasp prompts. We approach the language-driven grasp detection task as a conditional generation problem. Drawing on the success of diffusion models in generative tasks and given that language plays a vital role in this task, we propose a new language-driven grasp detection method based on diffusion models. Our key contribution is the contrastive training objective, which explicitly contributes to the denoising process to detect the grasp pose given the language instructions. We show that our approach is theoretically supported. The intensive experiments show that our method outperforms state-of-the-art approaches and allows real-world robotic grasping. Finally, we demonstrate that our large-scale dataset enables zero-shot grasp detection and is a challenging benchmark for future work. Project website: https://airvlab.github.io/grasp-anything/
Comment: 19 pages. Accepted to CVPR24
- Published
- 2024
26. Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning
- Author
Nguyen, Huy Hoang, Vu, Minh Nhat, Beck, Florian, Ebmer, Gerald, Nguyen, Anh, and Kugi, Andreas
- Subjects
Computer Science - Robotics
- Abstract
Combining a vision module inside a closed-loop control system for a seamless movement of a robot in a manipulation task is challenging due to the inconsistent update rates between the utilized modules. This task is even more difficult in a dynamic environment, e.g., when objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects through a closed-loop control system with real-time trajectory replanning and online 6D object pose localization. We segment an object within 0.5 s by leveraging a vision language model via language commands. Then, guided by natural language commands, a closed-loop system, including a unified pose estimation and tracking module and online trajectory planning, is utilized to continuously track this object and compute the optimal trajectory in real time. Our proposed zero-shot framework provides a smooth trajectory that avoids jerky movements and ensures the robot can grasp a non-stationary object. Experimental results exhibit the real-time capability of the proposed zero-shot modular framework, with up to 30 Hz update rates for the online 6D pose localization module and 10 Hz update rates for the receding-horizon trajectory optimization, allowing the robot to accurately and efficiently grasp moving objects. These advantages highlight the modular framework's potential applications in robotics and human-robot interaction; see the video at https://www.acin.tuwien.ac.at/en/6e64/.
Comment: 9 pages, 6 figures
- Published
- 2024
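The two update rates quoted in the abstract (30 Hz pose localization, 10 Hz receding-horizon replanning) can be pictured as two loops sharing the latest pose estimate. The skeleton below is timing only, with perception and planning passed in as user-supplied stubs; it is not the paper's implementation.

```python
import time
import threading

latest_pose = None  # most recent 6D object pose, written by the fast loop

def pose_loop(estimate_pose, hz=30):
    """Fast loop: online 6D pose localization (~30 Hz in the abstract)."""
    global latest_pose
    while True:
        latest_pose = estimate_pose()
        time.sleep(1.0 / hz)

def replanning_loop(replan, hz=10):
    """Slow loop: receding-horizon trajectory optimization (~10 Hz)."""
    while True:
        if latest_pose is not None:
            replan(latest_pose)  # recompute the optimal trajectory to the target
        time.sleep(1.0 / hz)

# Example wiring with hypothetical stubs:
# threading.Thread(target=pose_loop, args=(my_estimator,), daemon=True).start()
# replanning_loop(my_planner)
```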
27. Impact of Bystander Cardiopulmonary Resuscitation on Out-of-Hospital Cardiac Arrest Outcome in Vietnam
- Author
Dao, Co Xuan, Luong, Chinh Quoc, Manabe, Toshie, Nguyen, My Ha, Pham, Dung Thi, Ton, Tra Thanh, Hoang, Quoc Trong Ai, Nguyen, Tuan Anh, Nguyen, Anh Dat, McNally, Bryan Francis, Ong, Marcus Eng Hock, Do, Son Ngoc, and Group, The Local PAROS Investigators
- Subjects
Bystander Cardiopulmonary Resuscitation, emergency medical services, low- and middle-income countries, Out-of-hospital Cardiac Arrest, Pan-Asian Resuscitation Outcomes Study, return of spontaneous circulation, Vietnam
- Abstract
Introduction: Patients experiencing an out-of-hospital cardiac arrest (OHCA) frequently do not receive bystander cardiopulmonary resuscitation (CPR), especially in low- and middle-income countries (LMIC). In this study we sought to determine the prevalence of OHCA patients in Vietnam who received bystander CPR and its effects on survival outcomes.
Methods: We performed a multicenter, retrospective observational study of patients (≥18 years) presenting with OHCA at three major hospitals in an LMIC from February 2014–December 2018. We collected data on the hospital and patient characteristics, the cardiac arrest events, the emergency medical services (EMS) system, the therapy methods, and the outcomes and compared these data, before and after pairwise 1:1 propensity score matching, between patients who received bystander CPR and those who did not. Upon admission, we assessed factors associated with good neurological survival at hospital discharge in univariable and multivariable logistic models.
Results: Of 521 patients, 388 (74.5%) were men, and the mean age was 56.7 years (SD 17.3). Although most cardiac arrests (68.7%, 358/521) occurred at home and 78.8% (410/520) were witnessed, a low proportion (22.1%, 115/521) of these patients received bystander CPR. Only half of the patients were brought in by EMS (8.1%, 42/521) or private ambulance (42.8%, 223/521), 50.8% (133/262) of whom had resuscitation attempts. Before matching, there was a significant difference in good neurological survival between patients who received bystander CPR (12.2%, 14/115) and patients who did not (4.7%, 19/406; P < .001). After matching, good neurological survival was absent in all OHCA patients who did not receive CPR from a bystander. The multivariable analysis showed that bystander CPR (adjusted odds ratio: 3.624; 95% confidence interval 1.629–8.063) was an independent predictor of good neurological survival.
Conclusion: In our study, only 22.1% of OHCA patients received bystander CPR, which contributed significantly to the low rate of good neurological survival in Vietnam. To improve the chances of survival with good neurological function for OHCA patients, more people should be trained to perform bystander CPR and to teach others as well. A standard program for emergency first-aid training is necessary for this purpose.
- Published
- 2024
28. LXR signaling pathways link cholesterol metabolism with risk for prediabetes and diabetes
- Author
Ding, Jingzhong, Nguyen, Anh Tram, Lohman, Kurt, Hensley, Michael T, Parker, Daniel, Hou, Li, Taylor, Jackson, Voora, Deepak, Sawyer, Janet K, Boudyguina, Elena, Bancks, Michael P, Bertoni, Alain, Pankow, James S, Rotter, Jerome I, Goodarzi, Mark O, Tracy, Russell P, Murdoch, David M, Duprez, Daniel, Rich, Stephen S, Psaty, Bruce M, Siscovick, David, Newgard, Christopher B, Herrington, David, Hoeschele, Ina, Shea, Steven, Stein, James H, Patel, Manesh, Post, Wendy, Jacobs, David, Parks, John S, and Liu, Yongmei
- Subjects
Biochemistry and Cell Biology, Biomedical and Clinical Sciences, Biological Sciences, Atherosclerosis, Clinical Research, Health Disparities, Obesity, Diabetes, Genetics, Nutrition, Prevention, Cardiovascular, 2.1 Biological and endogenous factors, Metabolic and endocrine, Humans, Prediabetic State, Male, Female, Diabetes Mellitus, Type 2, Middle Aged, Liver X Receptors, Cholesterol, Aged, Signal Transduction, ATP Binding Cassette Transporter, Subfamily G, Member 1, Monocytes, Risk Factors, ATP Binding Cassette Transporter 1, Aged, 80 and over, Expression profiling, Metabolism, Medical and Health Sciences, Immunology, Biological sciences, Biomedical and clinical sciences, Health sciences
- Abstract
BACKGROUND: Preclinical studies suggest that cholesterol accumulation leads to insulin resistance. We previously reported that alterations in a monocyte cholesterol metabolism transcriptional network (CMTN) - suggestive of cellular cholesterol accumulation - were cross-sectionally associated with obesity and type 2 diabetes (T2D). Here, we sought to determine whether the CMTN alterations independently predict incident prediabetes/T2D risk, and correlate with cellular cholesterol accumulation.
METHODS: Monocyte mRNA expression of 11 CMTN genes was quantified among 934 Multi-Ethnic Study of Atherosclerosis (MESA) participants free of prediabetes/T2D; cellular cholesterol was measured in a subset of 24 monocyte samples.
RESULTS: During a median 6-year follow-up, lower expression of 3 highly correlated LXR target genes - ABCG1 and ABCA1 (cholesterol efflux) and MYLIP (cholesterol uptake suppression) - and not other CMTN genes, was significantly associated with higher risk of incident prediabetes/T2D. Lower expression of the LXR target genes correlated with higher cellular cholesterol levels (e.g., 47% of variance in cellular total cholesterol explained by ABCG1 expression). Further, adding the LXR target genes to overweight/obesity and other known predictors significantly improved prediction of incident prediabetes/T2D.
CONCLUSION: These data suggest that the aberrant LXR/ABCG1-ABCA1-MYLIP pathway (LAAMP) is a major T2D risk factor and support a potential role for aberrant LAAMP and cellular cholesterol accumulation in diabetogenesis.
FUNDING: The MESA Epigenomics and Transcriptomics Studies were funded by NIH grants 1R01HL101250, 1RF1AG054474, R01HL126477, R01DK101921, and R01HL135009. This work was supported by funding from NIDDK R01DK103531 and NHLBI R01HL119962.
- Published
- 2024
29. AI-powered Code Review with LLMs: Early Results
- Author
Rasheed, Zeeshan, Sami, Malik Abdul, Waseem, Muhammad, Kemell, Kai-Kristian, Wang, Xiaofeng, Nguyen, Anh, Systä, Kari, and Abrahamsson, Pekka
- Subjects
Computer Science - Software Engineering
- Abstract
In this paper, we present a novel approach to improving software quality and efficiency through a Large Language Model (LLM)-based model designed to review code and identify potential issues. Our proposed LLM-based AI agent model is trained on large code repositories. This training includes code reviews, bug reports, and documentation of best practices. It aims to detect code smells, identify potential bugs, provide suggestions for improvement, and optimize the code. Unlike traditional static code analysis tools, our LLM-based AI agent has the ability to predict future potential risks in the code. This supports a dual goal of improving code quality and enhancing developer education by encouraging a deeper understanding of best practices and efficient coding techniques. Furthermore, we explore the model's effectiveness in suggesting improvements that significantly reduce post-release bugs and enhance code review processes, as evidenced by an analysis of developer sentiment toward LLM feedback. For future work, we aim to assess the accuracy and efficiency of LLM-generated documentation updates in comparison to manual methods. This will involve an empirical study focusing on manually conducted code reviews to identify code smells and bugs, alongside an evaluation of best practice documentation, augmented by insights from developer discussions and code reviews. Our goal is to not only refine the accuracy of our LLM-based tool but also to underscore its potential in streamlining the software development lifecycle through proactive code improvement and education.
Comment: 8 pages
- Published
- 2024
30. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
- Author
Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Martin, Cai, Qin, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Weizhu, Chen, Yen-Chun, Chen, Yi-Ling, Cheng, Hao, Chopra, Parul, Dai, Xiyang, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Gao, Jianfeng, Gao, Mei, Gao, Min, Garg, Amit, Del Giorno, Allie, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Hu, Wenxiang, Huynh, Jamie, Iter, Dan, Jacobs, Sam Ade, Javaheripi, Mojan, Jin, Xin, Karampatziakis, Nikos, Kauffmann, Piero, Khademi, Mahoud, Kim, Dongwoo, Kim, Young Jin, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Lin, Xihui, Lin, Zeqi, Liu, Ce, Liu, Liyuan, Liu, Mengchen, Liu, Weishung, Liu, Xiaodong, Luo, Chong, Madan, Piyush, Mahmoudzadeh, Ali, Majercak, David, Mazzola, Matt, Mendes, Caio César Teodoro, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Ren, Liliang, de Rosa, Gustavo, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shen, Yelong, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Vaddamanu, Praneetha, Wang, Chunyu, Wang, Guanhua, Wang, Lijuan, Wang, Shuohang, Wang, Xin, Wang, Yu, Ward, Rachel, Wen, Wen, Witte, Philipp, Wu, Haiping, Wu, Xiaoxia, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Xue, Jilong, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Yifan, Yang, Ziyi, Yu, Donghan, Yuan, Lu, Zhang, Chenruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, and Zhou, Xiren
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.
Comment: 24 pages
- Published
- 2024
31. High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces
- Author
Huang, Baoru, Wang, Yida, Nguyen, Anh, Elson, Daniel, Vasconcelos, Francisco, and Stoyanov, Danail
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In surgical oncology, screening colonoscopy plays a pivotal role in providing diagnostic assistance, such as biopsy, and facilitating surgical navigation, particularly in polyp detection. Computer-assisted endoscopic surgery has recently gained attention and amalgamated various 3D computer vision techniques, including camera localization, depth estimation, surface reconstruction, etc. Neural Radiance Fields (NeRFs) and Neural Implicit Surfaces (NeuS) have emerged as promising methodologies for deriving accurate 3D surface models from sets of registered images, addressing the limitations of existing colon reconstruction approaches stemming from constrained camera movement. However, inadequate tissue texture representation and scale ambiguity in monocular colonoscopic image reconstruction still impede the progress of the final rendering results. In this paper, we introduce a novel method for colon section reconstruction by leveraging NeuS applied to endoscopic images, supplemented by a single frame of depth map. Notably, we are the first to explore using only one depth-map frame in photorealistic reconstruction and neural rendering applications; this single depth map can be easily obtained from existing monocular depth estimation networks, given an object scale. Through rigorous experimentation and validation on phantom imagery, our approach demonstrates exceptional accuracy in completely rendering colon sections, even capturing unseen portions of the surface. This breakthrough opens avenues for achieving stable and consistently scaled reconstructions, promising enhanced quality in cancer screening procedures and treatment interventions.
- Published
- 2024
32. Gravitational radiation of a spherically symmetric source in $f(R)$-gravitation
- Author
Van Ky, Pham, Van, Nguyen Thi Hong, and Ky, Nguyen Anh
- Subjects
General Relativity and Quantum Cosmology, High Energy Physics - Theory
- Abstract
It is shown that Birkhoff's theorem for the general theory of relativity is overcome in the f(R)-theory of gravitation. That means the f(R)-theory of gravitation, unlike Einstein's general theory of relativity, does not forbid gravitational radiation from a spherically symmetric source (whether stationary or non-stationary). As a consequence, in the f(R)-theory a spherically symmetric gravitational deformation (e.g., collapse/expansion or pulsation) could emit gravitational waves (of tensor and scalar polarization modes), a phenomenon impossible in general relativity. A test model is examined, and it turns out that the gravitational radiation is strongest when the surface of the deforming object is in the vicinity of the (modified) event horizon, even suddenly flaring up just outside the latter. In this letter, within the f(R)-theory of gravitation, a gravitational wave equation and a formula for the gravitational emission power are derived. These formulae, along with searches for signals, can be used for the experimental test of the f(R)-theory. In general, including the spherically symmetric case, gravitational radiation of both tensor and scalar polarization modes is allowed, although under some circumstances the contribution of the scalar modes is strongly suppressed.
Comment: LaTeX, 10 pages, 5 figures
- Published
- 2024
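For context on why spherically symmetric radiation becomes possible here: in the metric formalism, the standard field equations of f(R) gravity replace Einstein's equations with

```latex
% Field equations of f(R) gravity (metric formalism); f(R) = R recovers GR.
f'(R)\, R_{\mu\nu} - \tfrac{1}{2}\, f(R)\, g_{\mu\nu}
  - \left( \nabla_\mu \nabla_\nu - g_{\mu\nu} \Box \right) f'(R)
  = 8\pi G\, T_{\mu\nu}
```

whose trace,

```latex
% The trace carries a dynamical scalar degree of freedom absent in GR.
3\, \Box f'(R) + f'(R)\, R - 2 f(R) = 8\pi G\, T
```

is dynamical, unlike in general relativity, where R is fixed algebraically by T. This extra scalar degree of freedom is what evades Birkhoff's theorem and permits the monopole (breathing-mode) radiation the abstract discusses.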
33. DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness
- Author
-
Wang, Yuqi, Wang, Zeqiang, Wang, Wei, Chen, Qi, Huang, Kaizhu, Nguyen, Anh, and De, Suparna
- Subjects
Computer Science - Computation and Language - Abstract
Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multi-task learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original language models. Ablation studies validate the contribution of each augmentation method to robustness. Our best-performing model ranked 12th on faithfulness and 8th on consistency out of 32 participants. (A hedged illustrative sketch follows this entry.)
- Published
- 2024
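A minimal sketch of the two augmentation ideas named above, domain-vocabulary replacement and numeric perturbation for quantitative reasoning; the synonym map and the labeling rule are illustrative assumptions, not the authors' pipeline.

import re

BIOMED_SYNONYMS = {
    "myocardial infarction": "heart attack",   # hypothetical domain map
    "hypertension": "high blood pressure",
    "adverse events": "side effects",
}

def replace_vocabulary(text):
    """Label-preserving variant: swap domain terms for lay synonyms."""
    for term, synonym in BIOMED_SYNONYMS.items():
        text = re.sub(term, synonym, text, flags=re.IGNORECASE)
    return text

def perturb_number(statement, delta=5):
    """Label-flipping variant: shift one number so entailment becomes contradiction."""
    match = re.search(r"\d+", statement)
    if match is None:
        return statement, "entailment"
    shifted = str(int(match.group()) + delta)
    return statement[:match.start()] + shifted + statement[match.end():], "contradiction"

premise = "Group A reported 12 adverse events after treatment."
print(replace_vocabulary(premise))
print(perturb_number(premise))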
34. Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Guidewire Segmentation in Robot-Assisted Cardiovascular Catheterization
- Author
-
Omisore, Olatunji Mumini, Akinyemi, Toluwanimi, Nguyen, Anh, and Wang, Lei
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Computer Science - Robotics - Abstract
Although robot-assisted cardiovascular catheterization is commonly performed for the intervention of cardiovascular diseases, more studies are needed to support the procedure with automated tool segmentation, which can aid surgeons in tool tracking and visualization during intervention. Learning-based segmentation has recently achieved state-of-the-art performance; however, generating ground-truth signals for fully-supervised methods is labor-intensive and time-consuming for interventionists. In this study, a weakly-supervised learning method with multi-lateral pseudo labeling is proposed for tool segmentation in cardiac angiograms. The method includes a modified U-Net model with one encoder and multiple lateral-branched decoders that produce pseudo labels as supervision signals under different perturbations. The pseudo labels are self-generated through a mixed loss function and shared consistency across the decoders. We trained the model end-to-end on weakly-annotated data obtained during robotic cardiac catheterization. Experiments show that the model trained with weakly-annotated data performs close to one trained with fully-annotated data. Compared to three existing weakly-supervised methods, our approach yielded higher segmentation performance across three different cardiac angiogram datasets. An ablation study showed consistent performance under different parameters. Thus, we offer a less expensive method for real-time tool segmentation and tracking during robot-assisted cardiac catheterization. (A hedged illustrative sketch follows this entry.)
- Published
- 2024
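A minimal PyTorch sketch of the one-encoder, multi-lateral-decoder consistency idea described above; the tiny network, the dropout perturbations, and the loss are illustrative assumptions, not the authors' architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMultiDecoderNet(nn.Module):
    def __init__(self, n_decoders=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.decoders = nn.ModuleList(
            nn.Conv2d(8, 1, 3, padding=1) for _ in range(n_decoders))

    def forward(self, x):
        feats = self.encoder(x)
        # Each lateral decoder sees a differently perturbed feature map.
        return [dec(F.dropout(feats, p=0.1 * i, training=self.training))
                for i, dec in enumerate(self.decoders)]

def consistency_loss(outs):
    """Each decoder is pseudo-labeled by the mean of the others' predictions."""
    loss = 0.0
    for i, out in enumerate(outs):
        others = torch.stack([o for j, o in enumerate(outs) if j != i]).mean(0)
        loss = loss + F.mse_loss(torch.sigmoid(out), torch.sigmoid(others).detach())
    return loss / len(outs)

model = TinyMultiDecoderNet()
x = torch.randn(2, 1, 64, 64)   # stand-in for a weakly-annotated angiogram batch
consistency_loss(model(x)).backward()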
35. Allowing humans to interactively guide machines where to look does not always improve human-AI team's classification accuracy
- Author
-
Nguyen, Giang, Taesiri, Mohammad Reza, Kim, Sunnie S. Y., and Nguyen, Anh
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Human-Computer Interaction - Abstract
Across thousands of papers in Explainable AI (XAI), attention maps \cite{vaswani2017attention} and feature importance maps \cite{bansal2020sam} have been established as a common means for finding how important each input feature is to an AI's decisions. It is an interesting, unexplored question whether allowing users to edit the feature importance at test time would improve a human-AI team's accuracy on downstream tasks. In this paper, we address this question by leveraging CHM-Corr, a state-of-the-art, ante-hoc explainable classifier \cite{taesiri2022visual} that first predicts patch-wise correspondences between the input and training-set images, and then uses them to make classification decisions. We build CHM-Corr++, an interactive interface for CHM-Corr, enabling users to edit the feature importance map provided by CHM-Corr and observe updated model decisions. Via CHM-Corr++, users can gain insights into if, when, and how the model changes its outputs, improving their understanding beyond static explanations. However, our study with 18 expert users who performed 1,400 decisions finds no statistically significant evidence that our interactive approach improves user accuracy on CUB-200 bird image classification over static explanations. This challenges the hypothesis that interactivity can boost human-AI team accuracy and raises the need for future research. We open-source CHM-Corr++, an interactive tool for editing image classifier attention (see an interactive demo here: http://137.184.82.109:7080/). We release code and data on github: https://github.com/anguyen8/chm-corr-interactive., Comment: Accepted for presentation at the XAI4CV Workshop, part of the CVPR 2024 proceedings (A hedged illustrative sketch follows this entry.)
- Published
- 2024
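A minimal sketch of the interaction loop described above, assuming patch-level class evidence can be re-weighted by a user-edited importance map; the shapes and the scoring rule are placeholders, not CHM-Corr's API.

import numpy as np

def rescore(correspondence_scores, importance):
    """correspondence_scores: (n_classes, n_patches) patch-level class evidence.
    importance: (n_patches,) user-editable weights in [0, 1]."""
    return int(np.argmax(correspondence_scores @ importance))

rng = np.random.default_rng(1)
scores = rng.random((200, 49))     # e.g. CUB-200 classes over a 7x7 patch grid
importance = np.ones(49)           # the map the interface exposes for editing
print("before edit:", rescore(scores, importance))
importance[:10] = 0.0              # user zeroes out, say, background patches
print("after edit:", rescore(scores, importance))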
36. Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model
- Author
-
Zhu, Qinfeng, Cai, Yuanzhi, Fang, Yuan, Yang, Yihan, Chen, Cheng, Fan, Lei, and Nguyen, Anh
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
High-resolution remotely sensed images pose a challenge for commonly used semantic segmentation methods such as the Convolutional Neural Network (CNN) and the Vision Transformer (ViT). CNN-based methods struggle with such high-resolution images due to their limited receptive field, while ViT faces challenges in handling long sequences. Inspired by Mamba, which adopts a State Space Model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed images, named Samba. Samba utilizes an encoder-decoder architecture, with Samba blocks serving as the encoder for efficient multi-level semantic information extraction and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets, comparing its performance against top-performing CNN and ViT methods. The results reveal that Samba achieves the best performance among the compared methods on these commonly used remote sensing datasets. Samba thus demonstrates for the first time the effectiveness of SSMs in semantic segmentation of remotely sensed images, setting a performance benchmark for Mamba-based techniques in this application. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba. (A hedged illustrative sketch follows this entry.)
- Published
- 2024
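For intuition, a minimal sketch of the linear state-space scan that Mamba-style blocks build on, run over a flattened sequence of patch features; this is a simplified, non-selective SSM, not the Samba implementation.

import numpy as np

def ssm_scan(x, A, B, C):
    """Recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t over a patch sequence.
    x: (seq_len, d_in) -> returns (seq_len, d_out), computed in linear time."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
seq = rng.normal(size=(1024, 16))    # e.g. a 32x32 grid of patches, flattened
A = 0.9 * np.eye(8)                  # stable state transition
B = rng.normal(scale=0.1, size=(8, 16))
C = rng.normal(scale=0.1, size=(4, 8))
print(ssm_scan(seq, A, B, C).shape)  # (1024, 4): global context without attention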
37. Envisioning the Next-Generation AI Coding Assistants: Insights & Proposals
- Author
-
Nghiem, Khanh, Nguyen, Anh Minh, and Bui, Nghi D. Q.
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence ,Computer Science - Human-Computer Interaction - Abstract
As a research-product hybrid group in AI for Software Engineering (AI4SE), we present four key takeaways from our experience developing in-IDE AI coding assistants. AI coding assistants should set clear expectations for usage, integrate with advanced IDE capabilities and existing extensions, use extendable backend designs, and collect app data responsibly for downstream analyses. We propose open questions and challenges that academia and industry should address to realize the vision of next-generation AI coding assistants.
- Published
- 2024
38. On the Effectiveness of Heterogeneous Ensemble Methods for Re-identification
- Author
-
Klüttermann, Simon, Rutinowski, Jérôme, Nguyen, Anh, Grimme, Britta, Roidl, Moritz, and Müller, Emmanuel
- Subjects
Computer Science - Machine Learning - Abstract
In this contribution, we introduce a novel ensemble method for the re-identification of industrial entities, using images of chipwood pallets and galvanized metal plates as dataset examples. Our algorithms replace commonly used, complex Siamese neural networks with an ensemble of simplified, rudimentary models, providing wider applicability, especially in hardware-restricted scenarios. Each ensemble sub-model uses a different type of feature extracted from the given data as its input, allowing effective ensembles to be created in a fraction of the training time needed for more complex state-of-the-art models. We reach state-of-the-art performance on our task, with a Rank-1 accuracy of over 77% and a Rank-10 accuracy of over 99%. We introduce five distinct feature extraction approaches and study their combination using different ensemble methods. (A hedged illustrative sketch follows this entry.)
- Published
- 2024
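A minimal sketch of the heterogeneous-ensemble idea above: several rudimentary extractors each rank the gallery, and their min-max-normalized distances are averaged. The three extractors are illustrative stand-ins, not the paper's five approaches.

import numpy as np

def hist_features(img):
    return np.histogram(img, bins=16, range=(0, 1))[0] / img.size

def row_means(img):
    return img.mean(axis=1)

def grad_energy(img):
    return np.abs(np.diff(img, axis=0)).mean(axis=1)

EXTRACTORS = [hist_features, row_means, grad_energy]

def ensemble_distances(query, gallery):
    """Average each sub-model's min-max normalized distances to the gallery."""
    total = np.zeros(len(gallery))
    for fx in EXTRACTORS:
        q = fx(query)
        d = np.array([np.linalg.norm(q - fx(g)) for g in gallery])
        total += (d - d.min()) / (np.ptp(d) + 1e-9)
    return total / len(EXTRACTORS)

rng = np.random.default_rng(2)
gallery = [rng.random((32, 32)) for _ in range(10)]    # e.g. pallet images
query = gallery[3] + rng.normal(0, 0.01, (32, 32))     # re-observed entity
print("rank-1 match:", int(np.argmin(ensemble_distances(query, gallery))))  # -> 3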
39. ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation
- Author
-
Tran, Minh, Bounsavy, Winston, Vo, Khoa, Nguyen, Anh, Nguyen, Tri, and Le, Ngan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Amodal Instance Segmentation (AIS) presents a challenging task, as it involves predicting both visible and occluded parts of objects within images. Existing AIS methods rely on a bidirectional approach, encompassing both the transition from amodal features to visible features (amodal-to-visible) and from visible features to amodal features (visible-to-amodal). We observe that using amodal features in the amodal-to-visible transition can corrupt the visible features, because they carry extra information about occluded segments that is absent from the visible region; this in turn compromises the quality of the visible features used in the subsequent visible-to-amodal transition. To tackle this issue, we introduce ShapeFormer, a decoupled Transformer-based model with a visible-to-amodal transition. It models the relationship between the output segmentations explicitly and avoids the need for an amodal-to-visible transition. ShapeFormer comprises three key modules: (i) a Visible-Occluding Mask Head for predicting visible segmentation with occlusion awareness, (ii) a Shape-Prior Amodal Mask Head for predicting amodal and occluded masks, and (iii) a Category-Specific Shape Prior Retriever that provides shape prior knowledge. Comprehensive experiments and extensive ablation studies across various AIS benchmarks demonstrate the effectiveness of ShapeFormer. The code is available at: \url{https://github.com/UARK-AICV/ShapeFormer}, Comment: Accepted to IJCNN2024 (A hedged illustrative sketch follows this entry.)
- Published
- 2024
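A minimal PyTorch sketch of the decoupled visible-to-amodal flow described above: a visible mask is predicted first, then an amodal head consumes image features together with the visible mask and a retrieved shape prior. The modules are toy stand-ins, not the ShapeFormer architecture.

import torch
import torch.nn as nn

class ToyVisibleToAmodal(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3, padding=1)
        self.visible_head = nn.Conv2d(8, 1, 1)
        # The amodal head sees features + visible mask + category shape prior.
        self.amodal_head = nn.Conv2d(8 + 1 + 1, 1, 3, padding=1)

    def forward(self, image, shape_prior):
        feats = self.backbone(image)
        visible = torch.sigmoid(self.visible_head(feats))
        amodal_in = torch.cat([feats, visible, shape_prior], dim=1)
        amodal = torch.sigmoid(self.amodal_head(amodal_in))
        occluded = torch.clamp(amodal - visible, min=0.0)  # hidden region
        return visible, amodal, occluded

model = ToyVisibleToAmodal()
img = torch.randn(1, 3, 64, 64)
prior = torch.rand(1, 1, 64, 64)   # retrieved category-specific shape prior
v, a, o = model(img, prior)
print(v.shape, a.shape, o.shape)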
40. PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck
- Author
-
Pham, Thang M., Chen, Peijie, Nguyen, Tin, Yoon, Seunghyun, Bui, Trung, and Nguyen, Anh Totti
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
CLIP-based classifiers rely on the prompt containing a {class name} that is known to the text encoder. Therefore, they perform poorly on new classes or classes whose names rarely appear on the Internet (e.g., scientific names of birds). For fine-grained classification, we propose PEEB, an explainable and editable classifier that (1) expresses the class name as a set of text descriptors describing the visual parts of that class, and (2) matches the embeddings of the detected parts to their textual descriptors in each class to compute a logit score for classification. In a zero-shot setting where the class names are unknown, PEEB outperforms CLIP by a huge margin (~10x in top-1 accuracy). Compared to part-based classifiers, PEEB is not only the state-of-the-art (SOTA) in the supervised-learning setting (88.80% and 92.20% accuracy on CUB-200 and Dogs-120, respectively) but also the first to enable users to edit the text descriptors to form a new classifier without any re-training. Compared to concept bottleneck models, PEEB is also the SOTA in both zero-shot and supervised-learning settings., Comment: Findings of NAACL 2024 (long paper) (A hedged illustrative sketch follows this entry.)
- Published
- 2024
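A minimal sketch of the part-to-descriptor scoring described above: each class is a set of part descriptors, and its logit is the summed similarity between detected-part embeddings and that class's descriptor embeddings. The random embeddings are placeholders, not CLIP outputs.

import numpy as np

def class_logit(part_embs, descriptor_embs):
    """part_embs: (n_parts, d); descriptor_embs: (n_parts, d), aligned by part."""
    sims = (part_embs * descriptor_embs).sum(axis=1)   # per-part dot products
    return float(sims.sum())

rng = np.random.default_rng(3)
n_parts, d, n_classes = 12, 64, 200                  # e.g. 12 bird parts, CUB-200
parts = rng.normal(size=(n_parts, d))                # detected-part embeddings
classes = rng.normal(size=(n_classes, n_parts, d))   # editable text descriptors
logits = np.array([class_logit(parts, c) for c in classes])
print("predicted class:", int(np.argmax(logits)))
# Editing a class means swapping rows of its descriptor matrix (no re-training).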
41. Penetration Vision through Virtual Reality Headsets: Identifying 360-degree Videos from Head Movements
- Author
-
Nguyen, Anh, Zhang, Xiaokuan, and Yan, Zhisheng
- Subjects
Computer Science - Human-Computer Interaction - Abstract
In this paper, we present the first contactless side-channel attack for identifying 360-degree videos being viewed in a Virtual Reality (VR) Head-Mounted Display (HMD). Although the video content is displayed inside the HMD without any external exposure, we observe that user head movements are driven by the video content, which creates a unique side channel that does not exist for traditional 2D videos. By recording the user, whose vision is blocked by the HMD, with a malicious camera, an attacker can analyze the correlation between the user's head movements and the victim video to infer the video title. To exploit this new vulnerability, we present INTRUDE, a system for identifying 360-degree videos from recordings of user head movements. INTRUDE is empowered by an HMD-based head movement estimation scheme to extract a head movement trace from the recording and a video saliency-based trace-fingerprint matching framework to infer the video title. Evaluation results show that INTRUDE achieves over 96% accuracy for video identification and is robust across different recording environments. Moreover, INTRUDE maintains its effectiveness in the open-world identification scenario., Comment: Accepted to USENIX Security '24 (A hedged illustrative sketch follows this entry.)
- Published
- 2024
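A minimal sketch of trace-fingerprint matching in the spirit described above: a recorded head-movement trace is compared against per-video fingerprints by normalized correlation. The single-axis traces and feature definitions are illustrative only, not INTRUDE's pipeline.

import numpy as np

def normalized_corr(a, b):
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.dot(a, b) / len(a))

def identify_video(trace, fingerprints):
    """Return the title whose fingerprint best correlates with the trace."""
    return max(fingerprints, key=lambda t: normalized_corr(trace, fingerprints[t]))

rng = np.random.default_rng(4)
fingerprints = {f"video_{i}": rng.normal(size=300) for i in range(50)}  # yaw traces
victim = fingerprints["video_17"] + rng.normal(0, 0.3, 300)  # noisy camera estimate
print(identify_video(victim, fingerprints))                  # -> video_17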
42. Spillover Effects of US Monetary Policy on Emerging Markets Amidst Uncertainty
- Author
-
Lastauskas, Povilas and Nguyen, Anh Dinh Minh
- Subjects
Economics - General Economics - Abstract
This paper examines the impact of US monetary policy tightening on emerging markets, distinguishing between direct and indirect spillover effects using a global vector autoregression with stochastic volatility covering 32 countries. The paper demonstrates that an increase in the US interest rate significantly reduces output in emerging markets, leading to larger, more prolonged, and more persistent declines. This impact is further intensified by global trade integration, which causes a sharper output drop that nevertheless rebounds slightly faster. The spillover effects are significantly amplified when US monetary policy tightening is accompanied by an increase in monetary policy uncertainty. Finally, emerging markets exhibit considerable heterogeneity in their responses to US monetary policy shocks. (A hedged illustrative sketch follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
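For intuition only, a toy bivariate VAR(1) impulse response showing the shape of such a spillover (a US rate shock depressing emerging-market output); this is a pedagogical stand-in, not the paper's 32-country GVAR with stochastic volatility.

import numpy as np

A = np.array([[0.8,  0.0],    # US rate: persistent, unaffected by EM output
              [-0.3, 0.7]])   # EM output: dragged down by the US rate

def impulse_response(horizon=12):
    state = np.array([1.0, 0.0])          # one-off +1 s.d. US rate shock
    path = [state]
    for _ in range(horizon):
        state = A @ state
        path.append(state)
    return np.array(path)

print(np.round(impulse_response()[:, 1], 3))   # EM output: prolonged decline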
43. Adaptive multi-gradient methods for quasiconvex vector optimization and applications to multi-task learning
- Author
-
Minh, Nguyen Anh, Muu, Le Dung, and Thang, Tran Ngoc
- Subjects
Mathematics - Optimization and Control ,Computer Science - Machine Learning - Abstract
We present an adaptive step-size method that avoids line-search techniques for solving a wide class of nonconvex multiobjective programming problems on an unbounded constraint set. We also prove convergence of the general approach under modest assumptions; in particular, the objective function need not satisfy a convexity criterion. Unlike descent line-search algorithms, the method does not require an initial step size determined from a known Lipschitz constant. Its primary characteristic is a gradual step-size reduction until a predetermined condition is met. In particular, it yields a new multi-gradient projection method for optimization problems with unbounded constraint sets. Preliminary computational examples confirm the accuracy of the approach, and multi-task learning experiments demonstrate its efficacy on large-scale problems. (A hedged illustrative sketch follows this entry.)
- Published
- 2024
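A minimal sketch of a line-search-free multi-gradient step for two objectives: the common descent direction is the negated minimum-norm element of the convex hull of the gradients, and the step size is simply reduced over time. This illustrates the general idea, not the paper's exact scheme.

import numpy as np

def min_norm_direction(g1, g2):
    """Negated minimum-norm point of the segment [g1, g2] (common descent direction)."""
    diff = g2 - g1
    denom = np.dot(diff, diff)
    t = 0.0 if denom < 1e-12 else np.clip(-np.dot(g1, diff) / denom, 0.0, 1.0)
    return -(g1 + t * diff)

grad_f1 = lambda x: 2 * (x - 1.0)    # objective f1(x) = ||x - 1||^2
grad_f2 = lambda x: 2 * (x + 1.0)    # objective f2(x) = ||x + 1||^2

x, step = np.array([3.0, -2.0]), 0.5
for _ in range(200):
    d = min_norm_direction(grad_f1(x), grad_f2(x))
    if np.linalg.norm(d) < 1e-8:     # Pareto-critical point reached
        break
    x = x + step * d
    step *= 0.99                     # gradual step-size reduction, no line search
print("approximate Pareto point:", np.round(x, 3))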
44. MIRT: a simultaneous reconstruction and affine motion compensation technique for four dimensional computed tomography (4DCT)
- Author
-
Nguyen, Anh-Tuan, Renders, Jens, Iuso, Domenico, Maris, Yves, Soete, Jeroen, Wevers, Martine, Sijbers, Jan, and De Beenhouwer, Jan
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Mathematics - Optimization and Control ,65K10, 68U10, 68W01, 92C55, 94A08 - Abstract
In four-dimensional computed tomography (4DCT), 3D images of moving or deforming samples are reconstructed from a set of 2D projection images. Recent techniques for iterative motion-compensated reconstruction either necessitate a reference acquisition or alternate between image reconstruction and motion estimation steps. In these methods, the motion estimation step involves estimating either complete deformation vector fields (DVFs) or a limited set of parameters corresponding to affine motion, including rigid motion or scaling. The majority of these approaches rely on nested iterations, incurring significant computational expense. Notably, despite the direct benefits of an analytical formulation and a substantial reduction in computational complexity, parameterizing DVFs for general affine motion in CT imaging has not been explored. In this work, we propose the Motion-compensated Iterative Reconstruction Technique (MIRT), an efficient iterative reconstruction scheme that combines image reconstruction and affine motion estimation in a single update step, based on analytical gradients of the motion model with respect to both the reconstruction and the affine motion parameters. While most state-of-the-art 4DCT methods have not been tested on real data, results from simulated and real experiments show that our method outperforms state-of-the-art CT reconstruction with affine motion correction in computational feasibility and projection distance. In particular, this enables accurate reconstruction of a microscale diamond undergoing motion from practically acquired projection radiographs, which leads to a novel application of 4DCT., Comment: Submitted to the SIAM Journal on Imaging Sciences (SIIMS) (A hedged illustrative sketch follows this entry.)
- Published
- 2024
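A hedged toy of the simultaneous-update idea above: one optimizer step moves both the reconstruction and each frame's affine motion parameters, with gradients flowing through a differentiable warp-then-project forward model. The one-view row-sum "projector" and all shapes are illustrative stand-ins, not the MIRT operators or their analytical gradients.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
H = W = 32
truth = torch.zeros(1, 1, H, W)
truth[..., 12:20, 12:20] = 1.0                 # unknown moving object

def project(img, theta):
    """Warp by affine theta of shape (1, 2, 3), then sum rows: a toy one-view 'CT'."""
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False).sum(dim=2)

# Ground truth: a small horizontal translation per time frame.
true_thetas = [torch.tensor([[[1., 0., 0.1 * t], [0., 1., 0.]]]) for t in range(3)]
sinograms = [project(truth, th) for th in true_thetas]

recon = torch.zeros_like(truth, requires_grad=True)
thetas = [torch.tensor([[[1., 0., 0.], [0., 1., 0.]]], requires_grad=True)
          for _ in range(3)]
opt = torch.optim.Adam([recon] + thetas, lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = sum(F.mse_loss(project(recon, th), y) for th, y in zip(thetas, sinograms))
    loss.backward()        # gradients w.r.t. image AND motion, computed together
    opt.step()             # one simultaneous update step, no nested iterations
print("final projection distance:", float(loss))

Note the inherent gauge freedom in this toy: a global shift can be traded between the reconstruction and all motion parameters, so practical pipelines typically pin one reference frame.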
45. Chemical Modification of Cu2O Nanoparticles with Triacetoxy(Vinyl)Silane: Enhanced Dispersion, Abrasion Resistance, and Thermal Stability in Acrylic Coatings
- Author
-
Dam, Xuan Thang, Nguyen, Thuy Chinh, Nguyen, Anh Hiep, Vu, Dinh Hieu, Ly, Thi Ngoc Lien, Trinh, Hoang Nghia, Phung, Thi Lan, Nguyen, Tuan Anh, Dao, Phi Hung, and Thai, Hoang
- Published
- 2024
- Full Text
- View/download PDF
46. Multivariate polynomial interpolation based on Radon projections
- Author
-
Ngoc, Nguyen Anh, Khiem, Nguyen Van, Long, Tang Van, and Manh, Phung Van
- Published
- 2024
- Full Text
- View/download PDF
47. Mini-Dose Ready-to-Use Liquid Glucagon for Post-Bariatric Hypoglycemia Treatment in Experimental and Real-World Settings
- Author
-
Lawler, Helen Margaret, Patti, Mary Elizabeth, Krecic, Matthew R., Rowell, Jennifer, Dobs, Adrian, Nguyen, Anh, and Conoscenti, Valentina
- Published
- 2024
- Full Text
- View/download PDF
48. Utilization of shrimp heads for scaling up of production of Bacillus velezensis EB.KN15, its bioactive compounds and novel anti-fungal effect against durian pathogen fungi
- Author
-
Ngo, Van Anh, Wang, San-Lang, Nguyen, Van Bon, Phan, Tu Quy, Tran, Thi Ha Trang, Doan, Manh Dung, Nguyen, Dinh Sy, and Nguyen, Anh Dzung
- Published
- 2024
- Full Text
- View/download PDF
49. Expanded Polytetrafluoroethylene Combined with Human Acellular Dermis Matrix for the Nasal Reconstruction in Patients with Postoperative Deformities
- Author
-
Tuan, Hoang Thanh, Ai, Luu Dang, Ngoc, Nguyen Anh, Huong, Nguyen Thi Lan, Vinh, Vu Quang, and Van Anh, Tran
- Published
- 2024
- Full Text
- View/download PDF
50. Enhanced optical properties of graphite nanoflakes/polydimethylsiloxane nanocomposites induced by low-dose gamma irradiation
- Author
-
Chhetri, Suman, Nguyen, Anh Tuan, Song, Sehwan, Gaillard, Nicolas, Yoon, Sang-Hee, and Lee, Woochul
- Published
- 2024
- Full Text
- View/download PDF