142,561 results on '"An, Phan"'
Search Results
2. TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
- Author
Krause, Felix, Phan, Timy, Hu, Vincent Tao, and Ommer, Björn
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Diffusion models have emerged as the mainstream approach for visual generation. However, these models usually suffer from sample inefficiency and high training costs. This issue is particularly pronounced in the standard diffusion transformer architecture due to its quadratic complexity relative to input length. Recent works have addressed this by reducing the number of tokens processed in the model, often through masking. In contrast, this work aims to improve the training efficiency of the diffusion backbone by using predefined routes that store this information until it is reintroduced to deeper layers of the model, rather than discarding these tokens entirely. Further, we combine multiple routes and introduce an adapted auxiliary loss that accounts for all applied routes. Our method is not limited to the common transformer-based model - it can also be applied to state-space models. Unlike most current approaches, TREAD achieves this without architectural modifications. Finally, we show that our method reduces the computational cost and simultaneously boosts model performance on the standard benchmark ImageNet-1K 256 x 256 in class-conditional synthesis. Both of these benefits multiply to a convergence speedup of 9.55x at 400K training iterations compared to DiT and 25.39x compared to the best benchmark performance of DiT at 7M training iterations.
- Published
- 2025
3. AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features
- Author
Zhang, Ruochen, Choi, Hyeung-Sik, Jung, Dongwook, Anh, Phan Huy Nam, Jeong, Sang-Ki, and Zhu, Zihao
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Monocular 3D object detection is a challenging task in autonomous systems due to the lack of explicit depth information in single-view images. Existing methods often depend on external depth estimators or expensive sensors, which increase computational complexity and hinder real-time performance. To overcome these limitations, we propose AuxDepthNet, an efficient framework for real-time monocular 3D object detection that eliminates the reliance on external depth maps or pre-trained depth models. AuxDepthNet introduces two key components: the Auxiliary Depth Feature (ADF) module, which implicitly learns depth-sensitive features to improve spatial reasoning and computational efficiency, and the Depth Position Mapping (DPM) module, which embeds depth positional information directly into the detection process to enable accurate object localization and 3D bounding box regression. Leveraging the DepthFusion Transformer architecture, AuxDepthNet globally integrates visual and depth-sensitive features through depth-guided interactions, ensuring robust and efficient detection. Extensive experiments on the KITTI dataset show that AuxDepthNet achieves state-of-the-art performance, with $\text{AP}_{3D}$ scores of 24.72\% (Easy), 18.63\% (Moderate), and 15.31\% (Hard), and $\text{AP}_{\text{BEV}}$ scores of 34.11\% (Easy), 25.18\% (Moderate), and 21.90\% (Hard) at an IoU threshold of 0.7.
- Published
- 2025
4. Electroweak phase transition with the confinement scale of the strong sector or Dilaton in the minimal composite Higgs model
- Author
Phong, Vo Quoc, Van Tien, Truong, and Khiem, Phan Hong
- Subjects
High Energy Physics - Phenomenology
- Abstract
The minimal Composite Higgs model (MCHM) provides an effective trigger for the Baryogenesis scenario through the confinement scale of the strong sector ($f$) or Dilaton ($\chi$). $f$ is a parameter with mass dimension that stores the resonances of particles at high energies and has a suitable value of about 800 GeV. But when $300$ GeV $\le f \le 400$ GeV, the effective Higgs potential has a first order electroweak phase transition. Therefore, although $f$ cannot be a perfect trigger, it does suggest an effective approach that accommodates the resonances of particles. Thus the investigation of the electroweak phase transition according to $f$ has confirmed that the inclusion of Dilaton in the effective potential is reasonable. Accordingly, we derive a Dilaton potential with appropriate parameter domains and $f=800$ GeV; the mass of the Dilaton ranges from 300 GeV to 700 GeV, which gives an electroweak phase transition strength greater than 1 and less than 3, enough for a first order phase transition. This is a direct and clear proof of the triggers for the first order EWPT in MCHM., Comment: 27 pages, 4 figures
- Published
- 2025
5. LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging
- Author
Singh, Shubhr, Benetos, Emmanouil, Phan, Huy, and Stowell, Dan
- Subjects
Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph Neural Network (LHGNN), a graph-based model that enhances feature understanding by integrating local neighbourhood information with higher-order data from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio relationships. Evaluation of the model on three publicly available audio datasets shows that it outperforms Transformer-based models across all benchmarks while operating with substantially fewer parameters. Moreover, LHGNN demonstrates a distinct advantage in scenarios lacking ImageNet pretraining, establishing its effectiveness and efficiency in environments where extensive pretraining data is unavailable.
- Published
- 2025
6. ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer
- Author
Qi, Xuyin, Zhang, Zeyu, Handoko, Aaron Berliano, Zheng, Huazhan, Chen, Mingxi, Huy, Ta Duc, Phan, Vu Minh Hieu, Zhang, Lei, Cheng, Linqi, Jiang, Shiyu, Zhang, Zhiwei, Liao, Zhibin, Zhao, Yang, and To, Minh-Son
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Prostate cancer, a growing global health concern, necessitates precise diagnostic tools, with Magnetic Resonance Imaging (MRI) offering high-resolution soft tissue imaging that significantly enhances diagnostic accuracy. Recent advancements in explainable AI and representation learning have significantly improved prostate cancer diagnosis by enabling automated and precise lesion classification. However, existing explainable AI methods, particularly those based on frameworks like generative adversarial networks (GANs), are predominantly developed for natural image generation, and their application to medical imaging often leads to suboptimal performance due to the unique characteristics and complexity of medical images. To address these challenges, our paper introduces three key contributions. First, we propose ProjectedEx, a generative framework that provides interpretable, multi-attribute explanations, effectively linking medical image features to classifier decisions. Second, we enhance the encoder module by incorporating feature pyramids, which enables multiscale feedback to refine the latent space and improves the quality of generated explanations. Additionally, we conduct comprehensive experiments on both the generator and classifier, demonstrating the clinical relevance and effectiveness of ProjectedEx in enhancing interpretability and supporting the adoption of AI in medical settings. Code will be released at https://github.com/Richardqiyi/ProjectedEx
- Published
- 2025
7. Lieb--Thirring inequalities for large quantum systems with inverse nearest-neighbor interactions
- Author
Duong, G. K. and Nam, Phan Thành
- Subjects
Mathematical Physics, Mathematics - Functional Analysis, Mathematics - Spectral Theory, 81Q10, 35R11, 46B70, 46E35
- Abstract
We prove an analogue of the Lieb--Thirring inequality for many-body quantum systems with the kinetic operator $\sum_i (-\Delta_i)^s$ and the interaction potential of the form $\sum_i \delta_i^{-2s}$ where $\delta_i$ is the nearest-neighbor distance to the point $x_i$. Our result extends the standard Lieb--Thirring inequality for fermions and applies to quantum systems without the anti-symmetry assumption on the wave functions. Additionally, we derive similar results for the Hardy--Lieb--Thirring inequality and obtain the asymptotic behavior of the optimal constants in the strong coupling limit., Comment: 27 pages
- Published
- 2025
8. Personalized Large Vision-Language Models
- Author
Pham, Chau, Phan, Hoang, Doermann, David, and Tian, Yunjie
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The personalization model has gained significant attention in image generation yet remains underexplored for large vision-language models (LVLMs). Beyond generic ones, with personalization, LVLMs handle interactive dialogues using referential concepts (e.g., ``Mike and Susan are talking.'') instead of the generic form (e.g., ``a boy and a girl are talking.''), making the conversation more customizable and referentially friendly. In addition, PLVM is equipped to continuously add new concepts during a dialogue without incurring additional costs, which significantly enhances the practicality. PLVM proposes Aligner, a pre-trained visual encoder to align referential concepts with the queried images. During the dialogues, it extracts features of reference images with these corresponding concepts and recognizes them in the queried image, enabling personalization. We note that the computational cost and parameter count of the Aligner are negligible within the entire framework. With comprehensive qualitative and quantitative analyses, we reveal the effectiveness and superiority of PLVM., Comment: A simple way to personalize your LLM
- Published
- 2024
9. Causal Composition Diffusion Model for Closed-loop Traffic Generation
- Author
Lin, Haohong, Huang, Xin, Phan-Minh, Tung, Hayden, David S., Zhang, Huan, Zhao, Ding, Srinivasa, Siddhartha, Wolff, Eric M., and Chen, Hongge
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics
- Abstract
Simulation is critical for safety evaluation in autonomous driving, particularly in capturing complex interactive behaviors. However, generating realistic and controllable traffic scenarios in long-tail situations remains a significant challenge. Existing generative models suffer from the conflicting objective between user-defined controllability and realism constraints, which is amplified in safety-critical contexts. In this work, we introduce the Causal Compositional Diffusion Model (CCDiff), a structure-guided diffusion framework to address these challenges. We first formulate the learning of controllable and realistic closed-loop simulation as a constrained optimization problem. Then, CCDiff maximizes controllability while adhering to realism by automatically identifying and injecting causal structures directly into the diffusion process, providing structured guidance to enhance both realism and controllability. Through rigorous evaluations on benchmark datasets and in a closed-loop simulator, CCDiff demonstrates substantial gains over state-of-the-art approaches in generating realistic and user-preferred trajectories. Our results show CCDiff's effectiveness in extracting and leveraging causal structures, showing improved closed-loop performance based on key metrics such as collision rate, off-road rate, FDE, and comfort.
- Published
- 2024
10. Hybrid Network- and User-Centric Scalable Cell-Free Massive MIMO for Fronthaul Signaling Minimization
- Author
Lai, Phu, Xiang, Wei, Lukito, William Damario, Phan, Khoa Tran, Cheng, Peng, Liu, Chang, and Mao, Guoqiang
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
Cell-free massive multiple-input multiple-output (CFmMIMO) coordinates a great number of distributed access points (APs) with central processing units (CPUs), effectively reducing interference and ensuring uniform service quality for user equipment (UEs). However, its cooperative nature can result in intense fronthaul signaling between CPUs in large-scale networks. To reduce the inter-CPU fronthaul signaling for systems with limited fronthaul capacity, we propose a low-complexity online UE-AP association approach for scalable CFmMIMO that combines network- and user-centric clustering methodologies, relies on local channel information only, and can handle dynamic UE arrivals. Numerical results demonstrate that compared to the state-of-the-art method on fronthaul signaling minimization, our approach can save up to 94% of the fronthaul signaling load and 83% of the CPU processing power at the cost of only up to 8.6% spectral efficiency loss, or no loss in some cases., Comment: This article has been accepted for publication by IEEE Transactions on Vehicular Technology
- Published
- 2024
11. DriveGPT: Scaling Autoregressive Behavior Models for Driving
- Author
Huang, Xin, Wolff, Eric M., Vernaza, Paul, Phan-Minh, Tung, Chen, Hongge, Hayden, David S., Edmonds, Mark, Pierce, Brian, Chen, Xinxin, Jacob, Pratik Elias, Chen, Xiaobai, Tairbekov, Chingiz, Agarwal, Pratik, Gao, Tianshi, Chai, Yuning, and Srinivasa, Siddhartha
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
- Abstract
We present DriveGPT, a scalable behavior model for autonomous driving. We model driving as a sequential decision making task, and learn a transformer model to predict future agent states as tokens in an autoregressive fashion. We scale up our model parameters and training data by multiple orders of magnitude, enabling us to explore the scaling properties in terms of dataset size, model parameters, and compute. We evaluate DriveGPT across different scales in a planning task, through both quantitative metrics and qualitative examples including closed-loop driving in complex real-world scenarios. In a separate prediction task, DriveGPT outperforms a state-of-the-art baseline and exhibits improved performance by pretraining on a large-scale dataset, further validating the benefits of data scaling., Comment: 14 pages, 16 figures, 9 tables, and 1 video link
- Published
- 2024
12. Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs
- Author
Restrepo, David, Wu, Chenwei, Tang, Zhengxu, Shuai, Zitao, Phan, Thao Nguyen Minh, Ding, Jun-En, Dao, Cong-Tinh, Gallifant, Jack, Dychiao, Robyn Gayle, Artiaga, Jose Carlo, Bando, André Hiroshi, Gracitelli, Carolina Pelegrini Barbosa, Ferrer, Vincenz, Celi, Leo Anthony, Bitterman, Danielle, Morley, Michael G, and Nakayama, Luis Filipe
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Current ophthalmology clinical workflows are plagued by over-referrals, long waits, and complex and heterogeneous medical records. Large language models (LLMs) present a promising solution to automate various procedures such as triaging, preliminary tests like visual acuity assessment, and report summaries. However, LLMs have demonstrated significantly varied performance across different languages in natural language question-answering tasks, potentially exacerbating healthcare disparities in Low and Middle-Income Countries (LMICs). This study introduces the first multilingual ophthalmological question-answering benchmark with manually curated questions parallel across languages, allowing for direct cross-lingual comparisons. Our evaluation of 6 popular LLMs across 7 different languages reveals substantial bias across different languages, highlighting risks for clinical deployment of LLMs in LMICs. Existing debiasing methods such as Translation Chain-of-Thought or Retrieval-augmented generation (RAG) by themselves fall short of closing this performance gap, often failing to improve performance across all languages and lacking specificity for the medical domain. To address this issue, we propose CLARA (Cross-Lingual Reflective Agentic system), a novel inference time de-biasing method leveraging retrieval augmented generation and self-verification. Our approach not only improves performance across all languages but also significantly reduces the multilingual bias gap, facilitating equitable LLM application across the globe., Comment: Accepted at the AAAI 2025 Artificial Intelligence for Social Impact Track (AAAI-AISI 2025)
- Published
- 2024
13. ZTF SN Ia DR2: Properties of the low-mass host galaxies of Type Ia supernovae in a volume-limited sample
- Author
Burgaz, U., Maguire, K., Dimitriadis, G., Smith, M., Sollerman, J., Galbany, L., Rigault, M., Goobar, A., Johansson, J., Kim, Y. -L., Alburai, A., Amenouche, M., Deckers, M., Ginolin, M., Harvey, L., Muller-Bravo, T. E., Nordin, J., Phan, K., Rosnet, P., Nugent, P. E., Terwel, J. H., Graham, M., Hale, D., Kasliwal, M. M., Laher, R. R., Neill, J. D., Purdum, J., and Rusholme, B.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Astrophysics of Galaxies
- Abstract
In this study, we explore the characteristics of `low-mass' ($\log(M_{\star}/M_{\odot}) \leq 8$) and `intermediate-mass' ($8 \lt \log(M_{\star}/M_{\odot}) \leq 10$) host galaxies of Type Ia supernovae (SNe Ia) from the second data release (DR2) of the Zwicky Transient Facility survey and investigate their correlations with different sub-types of SNe Ia. We use the photospheric velocities measured from the Si II $\lambda$6355 feature, SALT2 light-curve stretch ($x_1$) and host-galaxy properties of SNe Ia to re-investigate the existing relationship between host galaxy mass and Si II $\lambda$6355 velocities. We also investigate sub-type preferences for host populations and show that while the more energetic and brighter 91T-like SNe Ia tend to populate the younger host populations, 91bg-like SNe Ia populate the older populations. Our findings suggest that High Velocity SNe Ia (HV SNe Ia) come not only from the older populations but also from young populations. Therefore, while our findings can partially provide support for HV SNe Ia relating to single degenerate progenitor models, they indicate that HV SNe Ia, rather than being a different population, might be a continued distribution with different explosion mechanisms. We lastly investigate the specific rate of SNe Ia in the volume-limited SN Ia sample of DR2 and compare with other surveys., Comment: 13 pages, 8 figures, accepted for publication in Astronomy & Astrophysics
- Published
- 2024
14. The networked input-output economic problem
- Author
Trinh, Minh Hoang, Le-Phan, Nhat-Minh, and Ahn, Hyo-Sung
- Subjects
Electrical Engineering and Systems Science - Systems and Control, Mathematics - Optimization and Control
- Abstract
In this paper, we formulate an input-output economic model with multiple interactive economic systems. The model captures the multi-dimensional nature of the economic sectors or industries in each economic system, the interdependencies among industries within an economic system and across different economic systems, and the influence of demand. To determine the equilibrium price structure of the model, a matrix-weighted updating algorithm is proposed. We prove that the equilibrium price structure can be globally asymptotically achieved given that certain joint conditions on the matrix-weighted graph and the input-output matrices are satisfied. The theoretical results are then supported by numerical simulations., Comment: 14 pages, 3 figures, preprint
- Published
- 2024
15. Entropy approach for a generalization of Frankl's conjecture
- Author
Phan, Veronica
- Subjects
Mathematics - Combinatorics, 05D40
- Abstract
In this paper, we will use the entropy approach to derive a necessary and sufficient condition for the existence of an element that belongs to at least half of the sets in a finite family of sets., Comment: 4 pages, 0 figures
- Published
- 2024
16. VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction
- Author
Tran, Khai Phan, Hua, Wen, and Li, Xue
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. However, most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. To tackle this challenge, we propose a novel data augmentation approach using generative models to enhance data from the embedding space. Our method leverages the Variational Autoencoder (VAE) architecture to capture all relation-wise distributions formed by entity pair representations and augment data for underrepresented relations. To better capture the multi-label nature of DocRE, we parameterize the VAE's latent space with a Diffusion Model. Additionally, we introduce a hierarchical training framework to integrate the proposed VAE-based augmentation module into DocRE systems. Experiments on two benchmark datasets demonstrate that our method outperforms state-of-the-art models, effectively addressing the long-tail distribution problem in DocRE., Comment: COLING 2025
- Published
- 2024
17. Wilson Loop and Topological Properties in 3D Woodpile Photonic Crystal
- Author
Phan, Huyen Thanh, Takahashi, Shun, Iwamoto, Satoshi, and Wakabayashi, Katsunori
- Subjects
Physics - Optics, Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
We numerically study the first and the second order topological states of electromagnetic (EM) wave in the three-dimensional (3D) woodpile photonic crystal (PhC). The recent studies on 3D PhCs have mainly focused on the observation of the topological states. Here, we not only focus on finding the topological states but also propose a numerical calculation method for topological invariants, which is based on the Wilson loop. For the 3D woodpile PhC, the topological states emerge due to the finite difference in the winding number or partial Chern number. The selection rule for the emergence of topological hinge states is also pointed out based on the topological invariants. Our numerical calculation results are essential and put a step toward the experimental realization of topological waveguide in 3D PhCs., Comment: 10 pages, 6 figures
- Published
- 2024
18. An Interoperable Machine Learning Pipeline for Pediatric Obesity Risk Estimation
- Author
Fayyaz, Hamed, Gupta, Mehak, Ramirez, Alejandra Perez, Jurkovitz, Claudine, Bunnell, H. Timothy, Phan, Thao-Ly T., and Beheshti, Rahmatollah
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Reliable prediction of pediatric obesity can offer a valuable resource to providers, helping them engage in timely preventive interventions before the disease is established. Many efforts have been made to develop ML-based predictive models of obesity, and some studies have reported high predictive performances. However, no commonly used clinical decision support tool based on existing ML models currently exists. This study presents a novel end-to-end pipeline specifically designed for pediatric obesity prediction, which supports the entire process of data extraction, inference, and communication via an API or a user interface. While focusing only on routinely recorded data in pediatric electronic health records (EHRs), our pipeline uses a diverse expert-curated list of medical concepts to predict the 1-3 years risk of developing obesity. Furthermore, by using the Fast Healthcare Interoperability Resources (FHIR) standard in our design procedure, we specifically target facilitating low-effort integration of our pipeline with different EHR systems. In our experiments, we report the effectiveness of the predictive model as well as its alignment with the feedback from various stakeholders, including ML scientists, providers, health IT personnel, health administration representatives, and patient group representatives.
- Published
- 2024
19. Minimizing sequences of Sobolev inequalities revisited
- Author
Dietze, Charlotte and Nam, Phan Thành
- Subjects
Mathematics - Analysis of PDEs, Mathematical Physics
- Abstract
We give a new proof of the compactness of minimizing sequences of the Sobolev inequalities in the critical case. Our approach relies on a simplified version of the concentration-compactness principle, which does not require any refinement of the Sobolev embedding theorem., Comment: 12 pages
- Published
- 2024
20. DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization
- Author
Chau, Phan Phuong Mai, Bakkali, Souhail, and Doucet, Antoine
- Subjects
Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Abstractive summarization has made significant strides in condensing and rephrasing large volumes of text into coherent summaries. However, summarizing administrative documents presents unique challenges due to domain-specific terminology, OCR-generated errors, and the scarcity of annotated datasets for model fine-tuning. Existing models often struggle to adapt to the intricate structure and specialized content of such documents. To address these limitations, we introduce DocSum, a domain-adaptive abstractive summarization framework tailored for administrative documents. Leveraging pre-training on OCR-transcribed text and fine-tuning with an innovative integration of question-answer pairs, DocSum enhances summary accuracy and relevance. This approach tackles the complexities inherent in administrative content, ensuring outputs that align with real-world business needs. To evaluate its capabilities, we define a novel downstream task setting, Document Abstractive Summarization, which reflects the practical requirements of business and organizational settings. Comprehensive experiments demonstrate DocSum's effectiveness in producing high-quality summaries, showcasing its potential to improve decision-making and operational workflows across the public and private sectors.
- Published
- 2024
21. The weak Lefschetz properties of artinian monomial algebras associated to certain tadpole graphs
- Author
Hung, Phan Minh, Phuoc, Nguyen Duy, and Son, Tran Nguyen Thanh
- Subjects
Mathematics - Commutative Algebra, Mathematics - Combinatorics
- Abstract
Given a simple graph $G$, the artinian monomial algebra associated to $G$, denoted by $A(G)$, is defined by the edge ideal of $G$ and the squares of the variables. In this article, we classify some tadpole graphs $G$ for which $A(G)$ has or fails the weak Lefschetz property., Comment: 12 pages. arXiv admin note: substantial text overlap with arXiv:2310.14368 by other authors
- Published
- 2024
22. Top observables as precise probes of the ALP
- Author
Phan, Anh Vu
- Subjects
High Energy Physics - Phenomenology, High Energy Physics - Experiment
- Abstract
Measurements of the top quark by the ATLAS and CMS experiments go beyond testing the Standard Model (SM) with high precision. Axion-like particles (ALPs), a potential SM extension involving new pseudoscalar particles, exhibit strong interactions with heavy SM fermions. Consequently, they can significantly affect the kinematic distributions of top quarks in top-antitop pair production. Moreover, such strong interactions can induce other ALP couplings at low energies, leading to a rich phenomenology. We summarize recent developments in probing the ALP-top coupling and use LHC data from run 2 to constrain the ALP parameter space., Comment: 7 pages, 4 figures. Talk at the 17th International Workshop on Top Quark Physics (Top2024), 22-27 September 2024
- Published
- 2024
23. Gentle robustness implies Generalization
- Author
Than, Khoat, Phan, Dat, and Vu, Giang
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Robustness and generalization ability of machine learning models are of utmost importance in various application domains. There is a wide interest in efficient ways to analyze those properties. One important direction is to analyze the connection between those two properties. Prior theories suggest that a robust learning algorithm can produce trained models with a high generalization ability. However, we show in this work that the existing error bounds are vacuous for the Bayes optimal classifier, which is the best among all measurable classifiers for a classification problem with overlapping classes. Those bounds cannot converge to the true error of this ideal classifier. This is undesirable, surprising, and never known before. We then present a class of novel bounds, which are model-dependent and provably tighter than the existing robustness-based ones. Unlike prior ones, our bounds are guaranteed to converge to the true error of the best classifier, as the number of samples increases. We further provide an extensive experiment and find that two of our bounds are often non-vacuous for a large class of deep neural networks, pretrained from ImageNet.
- Published
- 2024
24. Relativistic Electron Acceleration and the 'Ankle' Spectral Feature in Earth's Magnetotail Reconnection
- Author
Sun, Weijie, Oka, Mitsuo, Øieroset, Marit, Turner, Drew L., Phan, Tai, Cohen, Ian J., Li, Xiaocan, Huang, Jia, Smith, Andy, Slavin, James A., Poh, Gangkai, Genestreti, Kevin J., Gershman, Dan, Dokgo, Kyunghwan, Le, Guan, Nakamura, Rumi, and Burch, James L.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Solar and Stellar Astrophysics, Physics - Space Physics
- Abstract
Electrons are accelerated to high, non-thermal energies during explosive energy-release events in space, such as magnetic reconnection. However, the properties and acceleration mechanisms of relativistic electrons directly associated with reconnection X-line are not well understood. This study utilizes Magnetospheric Multiscale (MMS) measurements to analyze the flux and spectral features of sub-relativistic to relativistic (~ 80 to 560 keV) electrons during a magnetic reconnection event in Earth's magnetotail. This event provided a unique opportunity to measure the electrons directly energized by X-line as MMS stayed in the separatrix layer, where the magnetic field directly connects to the X-line, for approximately half of the observation period. Our analysis revealed that the fluxes of relativistic electrons were clearly enhanced within the separatrix layer, and the highest flux was directed away from the X-line, which suggested that these electrons originated directly from the X-line. Spectral analysis showed that these relativistic electrons deviated from the main plasma sheet population and exhibited an "ankle" feature similar to that observed in galactic cosmic rays. The contribution of "ankle" electrons to the total electron energy density increased from 0.1% to 1% in the separatrix layer, though the spectral slopes did not exhibit clear variations. Further analysis indicated that while these relativistic electrons originated from the X-line, they experienced a non-negligible degree of scattering during transport. These findings provide clear evidence that magnetic reconnection in Earth's magnetotail can efficiently energize relativistic electrons directly at the X-line, providing new insights into the complex processes governing electron dynamics during magnetic reconnection., Comment: 23 pages, 5 figures
- Published
- 2024
25. A new Time-decay Radiomics Integrated Network (TRINet) for short-term breast cancer risk prediction
- Author
-
Yeoh, Hong Hui, Strand, Fredrik, Phan, Raphaël, Rahmat, Kartini, and Tan, Maxine
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing - Abstract
To facilitate early detection of breast cancer, there is a need to develop short-term risk prediction schemes that can prescribe personalized/individualized screening mammography regimens for women. In this study, we propose a new deep learning architecture called TRINet that implements time-decay attention to focus on recent mammographic screenings, as current models do not account for the relevance of newer images. We integrate radiomic features with an Attention-based Multiple Instance Learning (AMIL) framework to weigh and combine multiple views for better risk estimation. In addition, we introduce a continual learning approach with a new label assignment strategy based on bilateral asymmetry to make the model more adaptable to asymmetrical cancer indicators. Finally, we add a time-embedded additive hazard layer to perform dynamic, multi-year risk forecasting based on individualized screening intervals. We used two public datasets, namely 8,528 patients from the American EMBED dataset and 8,723 patients from the Swedish CSAW dataset in our experiments. Evaluation results on the EMBED test set show that our approach significantly outperforms state-of-the-art models, achieving AUC scores of 0.851, 0.811, 0.796, 0.793, and 0.789 across 1- to 5-year intervals, respectively. Our results underscore the importance of integrating temporal attention, radiomic features, time embeddings, bilateral asymmetry, and continual learning strategies, providing a more adaptive and precise tool for short-term breast cancer risk prediction.
- Published
- 2024
26. Ion exchange synthesizes layered polymorphs of MgZrN$_2$ and MgHfN$_2$, two metastable semiconductors
- Author
-
Rom, Christopher L., Jankousky, Matthew, Phan, Maxwell Q., O'Donnell, Shaun, Regier, Corlyn, Neilson, James R., Stevanovic, Vladan, and Zakutayev, Andriy
- Subjects
Condensed Matter - Materials Science - Abstract
The synthesis of ternary nitrides is uniquely difficult, in large part because elemental N$_2$ is relatively inert. However, lithium reacts readily with other metals and N$_2$, making Li-M-N the most numerous sub-set of ternary nitrides. Here, we use Li$_2$ZrN$_2$, a ternary with a simple synthesis recipe, as a precursor for ion exchange reactions towards AZrN$_2$ (A = Mg, Fe, Cu, Zn). In situ synchrotron powder X-ray diffraction studies show that Li$^+$ and Mg$^{2+}$ undergo ion exchange topochemically, preserving the layers of octahedral [ZrN$_6$] to yield a metastable layered polymorph of MgZrN$_2$ (space group $R\overline{3}m$) rather than the calculated ground state structure ($I4_1/amd$). UV-vis measurements show an optical absorption onset near 2.0 eV, consistent with the calculated bandgap for this polymorph. Our experimental attempts to extend this ion exchange method towards FeZrN$_2$, CuZrN$_2$, and ZnZrN$_2$ resulted in decomposition products (A + ZrN + 1/6 N$_2$), an outcome that our computational results explain via the higher metastability of these phases. We successfully extended this ion exchange method to other Li-M-N precursors by synthesizing MgHfN$_2$ from Li$_2$HfN$_2$. In addition to the discovery of metastable $R\overline{3}m$ MgZrN$_2$ and MgHfN$_2$, this work highlights the potential of the 63 unique Li-M-N phases as precursors to synthesize new ternary nitrides.
- Published
- 2024
27. Unveiling Concept Attribution in Diffusion Models
- Author
-
Nguyen, Quang H., Phan, Hoang, and Doan, Khoa D.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Diffusion models have shown remarkable abilities in generating realistic and high-quality images from text prompts. However, a trained model remains a black box; little is known about the role of its components in exhibiting a concept such as objects or styles. Recent works employ causal tracing to localize layers storing knowledge in generative models without showing how those layers contribute to the target concept. In this work, we approach the model interpretability problem from a more general perspective and pose a question: "How do model components work jointly to demonstrate knowledge?". We adapt component attribution to decompose diffusion models, unveiling how a component contributes to a concept. Our framework allows effective model editing; in particular, we can erase a concept from diffusion models by removing positive components while retaining knowledge of other concepts. Surprisingly, we also show there exist components that contribute negatively to a concept, which has not been discovered in the knowledge localization approach. Experimental results confirm the role of positive and negative components pinpointed by our framework, depicting a complete view of interpreting generative models. Our code is available at https://github.com/mail-research/CAD-attribution4diffusion
- Published
- 2024
28. Fast ground-to-air transition with avian-inspired multifunctional legs
- Author
-
Shin, Won Dong, Phan, Hoang-Vu, Daley, Monica A., Ijspeert, Auke J., and Floreano, Dario
- Subjects
Computer Science - Robotics ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Most birds can navigate seamlessly between aerial and terrestrial environments. Whereas the forelimbs evolved into wings primarily for flight, the hindlimbs serve diverse functions such as walking, hopping, leaping, and jumping take-off for transitions into flight. These capabilities have inspired engineers to aim for similar multi-modality in aerial robots, expanding their range of applications across diverse environments. However, challenges remain in reproducing multi-modal locomotion across gaits with distinct kinematics and propulsive characteristics, such as walking and jumping, while preserving lightweight mass for flight. This tradeoff between mechanical complexity and versatility limits most existing aerial robots to only one additional locomotor mode. Here, we overcome the complexity-versatility tradeoff with RAVEN (Robotic Avian-inspired Vehicle for multiple ENvironments), which uses its bird-inspired multi-functional legs to jump rapidly into flight, walk on ground, and hop over obstacles and gaps, similar to the multi-modal locomotion of birds. We show that jumping for take-off contributes substantially to initial flight take-off speed and, remarkably, that it is more energy-efficient than solely propeller-based take-off. Our analysis suggests an important tradeoff in mass distribution between legs and body among birds adapted for different locomotor strategies, with greater investment in leg mass among terrestrial birds with multi-modal gait demands. Multi-functional robot legs expand opportunities to deploy traditional fixed-wing aircraft in complex terrains through autonomous take-offs and multi-modal gaits.
- Published
- 2024
29. Nonnegative Tensor Decomposition Via Collaborative Neurodynamic Optimization
- Author
-
Ahmadi-Asl, Salman, Leplat, Valentin, Phan, Anh-Huy, and Cichocki, Andrzej
- Subjects
Mathematics - Numerical Analysis - Abstract
This paper introduces a novel collaborative neurodynamic model for computing nonnegative Canonical Polyadic Decomposition (CPD). The model relies on a system of recurrent neural networks to solve the underlying nonconvex optimization problem associated with nonnegative CPD. Additionally, a discrete-time version of the continuous neural network is developed. To enhance the chances of reaching a potential global minimum, the recurrent neural networks are allowed to communicate and exchange information through particle swarm optimization (PSO). Convergence and stability analyses of both the continuous and discrete neurodynamic models are thoroughly examined. Experimental evaluations are conducted on random and real-world datasets to demonstrate the effectiveness of the proposed approach.
- Published
- 2024
30. Explainable deep learning improves human mental models of self-driving cars
- Author
-
Kenny, Eoin M., Dharmavaram, Akshay, Lee, Sang Uk, Phan-Minh, Tung, Rajesh, Shreyas, Hu, Yunqing, Major, Laura, Tomov, Momchil S., and Shah, Julie A.
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. However, the opacity of such black-box motion planners makes it challenging for the human behind the wheel to accurately anticipate when they will fail, with potentially catastrophic consequences. Here, we introduce concept-wrapper network (i.e., CW-Net), a method for explaining the behavior of black-box motion planners by grounding their reasoning in human-interpretable concepts. We deploy CW-Net on a real self-driving car and show that the resulting explanations refine the human driver's mental model of the car, allowing them to better predict its behavior and adjust their own behavior accordingly. Unlike previous work using toy domains or simulations, our study presents the first real-world demonstration of how to build authentic autonomous vehicles (AVs) that give interpretable, causally faithful explanations for their decisions, without sacrificing performance. We anticipate our method could be applied to other safety-critical systems with a human in the loop, such as autonomous drones and robotic surgeons. Overall, our study suggests a pathway to explainability for autonomous agents as a whole, which can help make them more transparent, their deployment safer, and their usage more ethical., Comment: * - equal contribution
- Published
- 2024
31. Superparamagnetic Superparticles for Magnetic Hyperthermia Therapy: Overcoming the Particle Size Limit
- Author
-
Attanayake, Supun B., Nguyen, Minh Dang, Chanda, Amit, Alonso, Javier, Orue, Inaki, Lee, T. Randall, Srikanth, Hariharan, and Phan, Manh-Huong
- Subjects
Physics - Applied Physics ,Condensed Matter - Materials Science - Abstract
Iron oxide (e.g., Fe$_3$O$_4$ or Fe$_2$O$_3$) nanoparticles are promising candidates for a variety of biomedical applications ranging from magnetic hyperthermia therapy to drug delivery and bio-detection, due to their superparamagnetism, non-toxicity, and biodegradability. While particles of small size (below a critical size, ~20 nm) display superparamagnetic behavior at room temperature, these particles tend to penetrate highly sensitive areas of the body such as the Blood-Brain Barrier (BBB), leading to undesired effects. In addition, these particles possess a high probability of retention, which can lead to genotoxicity and biochemical toxicity. Increasing particle size is a means for addressing these problems but also suppresses the superparamagnetism. We have overcome this particle size limit by synthesizing unique polycrystalline iron oxide nanoparticles composed of multiple nanocrystals of 10 to 15 nm size while tuning particle size from 160 to 400 nm. These so-called superparticles preserve superparamagnetic characteristics and exhibit excellent hyperthermia responses. The specific absorption rates (SAR) exceed 250 W/g (HAC = 800 Oe, f = 310 kHz) at a low concentration of 0.5 mg/mL, indicating their capability in cancer treatment with minimum dose. Our study underscores the potential of size-tunable polycrystalline iron oxide superparticles with superparamagnetic properties for advanced biomedical applications and sensing technologies.
- Published
- 2024
32. An Attempt to Develop a Neural Parser based on Simplified Head-Driven Phrase Structure Grammar on Vietnamese
- Author
-
Nguyen, Duc-Vu, Phan, Thang Chau, Nguyen, Quoc-Nam, Van Nguyen, Kiet, and Nguyen, Ngan Luu-Thuy
- Subjects
Computer Science - Computation and Language - Abstract
In this paper, we aimed to develop a neural parser for Vietnamese based on simplified Head-Driven Phrase Structure Grammar (HPSG). The existing corpora, VietTreebank and VnDT, had around 15% of constituency and dependency tree pairs that did not adhere to simplified HPSG rules. To address this issue, we randomly permuted samples from the training and development sets to make them compliant with simplified HPSG. We then modified the first simplified HPSG Neural Parser for the Penn Treebank by replacing it with the PhoBERT or XLM-RoBERTa models, which can encode Vietnamese texts. We conducted experiments on our modified VietTreebank and VnDT corpora. Our extensive experiments showed that the simplified HPSG Neural Parser achieved a new state-of-the-art F-score of 82% for constituency parsing when using the same predicted part-of-speech (POS) tags as the self-attentive constituency parser. Additionally, it outperformed previous studies in dependency parsing with a higher Unlabeled Attachment Score (UAS). However, our parser obtained lower Labeled Attachment Scores (LAS), likely due to our focus on arc permutation without changing the original labels, as we did not consult with a linguistic expert. Lastly, the research findings of this paper suggest that simplified HPSG should be given more attention by linguistic experts when developing treebanks for Vietnamese natural language processing., Comment: Accepted at SoICT 2024
- Published
- 2024
33. Derivation of recursive formulas for integrals of Hermite polynomial products and their applications
- Author
-
Son, Phan Quang, Anh-Tai, Tran Duong, Khang, Le Minh, Vy, Nguyen Duy, and Pham, Vinh N. T.
- Subjects
Quantum Physics - Abstract
In this work, we derive three recursive formulas for the integrals of products of Hermite polynomials. The derivation is notably straightforward, relying solely on the well-established properties of Hermite polynomials and the technique of integration by parts. These results hold broad relevance across various fields of physics and mathematics. Specifically, they can be applied to accurately compute two- and three-body matrix elements in ab initio simulations of one-dimensional few-body systems confined in harmonic traps. Additionally, we provide a numerical subroutine that implements these recursive formulas, which accompanies this work., Comment: 12 pages, comments are welcome
- Published
- 2024
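As a rough illustration of the kind of "well-established properties" the abstract of entry 33 refers to, the sketch below builds physicists' Hermite polynomials from the standard three-term recurrence and evaluates Gaussian-weighted product integrals exactly through even moments. It does not reproduce the paper's three recursive formulas, which the abstract does not state; the function names `hermite_coeffs`, `gaussian_moment`, and `overlap` are hypothetical.

```python
import math


def hermite_coeffs(n):
    """Coefficients (lowest degree first) of the physicists' Hermite polynomial H_n."""
    prev, curr = [1], [0, 2]  # H_0 = 1, H_1 = 2x
    if n == 0:
        return prev
    for k in range(1, n):
        # Standard recurrence: H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)
        nxt = [0] + [2 * c for c in curr]
        for i, c in enumerate(prev):
            nxt[i] -= 2 * k * c
        prev, curr = curr, nxt
    return curr


def gaussian_moment(k):
    """Integral of x^k * exp(-x^2) over the real line."""
    # Odd moments vanish by symmetry; even moments equal Gamma((k + 1) / 2).
    return 0.0 if k % 2 else math.gamma((k + 1) / 2)


def overlap(m, n):
    """Integral of H_m(x) * H_n(x) * exp(-x^2) over the real line."""
    a, b = hermite_coeffs(m), hermite_coeffs(n)
    return sum(a[i] * b[j] * gaussian_moment(i + j)
               for i in range(len(a)) for j in range(len(b)))
```

For m = n the result should match the textbook orthogonality value sqrt(pi) * 2^n * n!, and for m != n it should vanish, which gives a quick sanity check for any recursive scheme built on the same properties.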
34. FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation
- Author
-
Pham, Trong Thang, Ho, Ngoc-Vuong, Bui, Nhat-Tan, Phan, Thinh, Brijesh, Patel, Adjeroh, Donald, Doretto, Gianfranco, Nguyen, Anh, Wu, Carol C., Nguyen, Hien, and Le, Ngan
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Developing an interpretable system for generating reports in chest X-ray (CXR) analysis is becoming increasingly crucial in Computer-aided Diagnosis (CAD) systems, enabling radiologists to comprehend the decisions made by these systems. Despite the growth of diverse datasets and methods focusing on report generation, there remains a notable gap in how closely these models' generated reports align with the interpretations of real radiologists. In this study, we tackle this challenge by first introducing the Fine-Grained CXR (FG-CXR) dataset, which provides fine-grained paired information between the captions generated by radiologists and the corresponding gaze attention heatmaps for each anatomy. Unlike existing datasets that include a raw sequence of gaze alongside a report, with significant misalignment between gaze location and report content, our FG-CXR dataset offers a finer-grained alignment between gaze attention and diagnosis transcript. Furthermore, our analysis reveals that simply applying black-box image captioning methods to generate reports cannot adequately explain which information in the CXR is utilized and how long it needs to be attended to in order to accurately generate reports. Consequently, we propose a novel explainable radiologist's attention generator network (Gen-XAI) that mimics the diagnosis process of radiologists, explicitly constraining its output to closely align with both radiologist's gaze attention and transcript. Finally, we perform extensive experiments to illustrate the effectiveness of our method. Our datasets and checkpoint are available at https://github.com/UARK-AICV/FG-CXR., Comment: ACCV 2024
- Published
- 2024
35. Dynamics of an LPAA model for Tribolium Growth: Insights into Population Chaos
- Author
-
Brozak, Samantha J., Peralta, Sophia, Phan, Tin, Nagy, John D., and Kuang, Yang
- Subjects
Quantitative Biology - Populations and Evolution ,Mathematics - Dynamical Systems ,37N25, 92B05 - Abstract
Flour beetles (genus Tribolium) have long been used as a model organism to understand population dynamics in ecological research. A rich and rigorous body of work has cemented flour beetles' place in the field of mathematical biology. One of the most interesting results using flour beetles is the induction of chaos in a laboratory beetle population, in which the well-established LPA (larvae-pupae-adult) model was used to inform the experimental factors which would lead to chaos. However, whether chaos is an intrinsic property of flour beetles remains an open question. Inspired by new experimental data, we extend the LPA model by stratifying the adult population into newly emerged and mature adults and considering cannibalism as a function of mature adults. We fit the model to longitudinal data of larvae, pupae, and adult beetle populations to demonstrate the model's ability to recapitulate the transient dynamics of flour beetles. We present local and global stability results for the trivial and positive steady states and explore bifurcations and limit cycles numerically. Our results suggest that while chaos is a possibility, it is a rare phenomenon within realistic ranges of the parameters obtained from our experiment, and is likely induced by environmental changes connected to media changes and population censusing., Comment: 22 pages, 10 figures
- Published
- 2024
36. SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model
- Author
-
Nguyen, Christopher, Nguyen, William, Suzuki, Atsushi, Oku, Daisuke, Phan, Hong An, Dinh, Sang, Nguyen, Zooey, Ha, Anh, Raghavan, Shruti, Vo, Huy, Nguyen, Thang, Nguyen, Lan, and Hirayama, Yoshikuni
- Subjects
Computer Science - Computation and Language - Abstract
Large Language Models (LLMs) have demonstrated the potential to address some issues within the semiconductor industry. However, they are often general-purpose models that lack the specialized knowledge needed to tackle the unique challenges of this sector, such as the intricate physics and chemistry of semiconductor devices and processes. SemiKong, the first industry-specific LLM for the semiconductor domain, provides a foundation that can be used to develop tailored proprietary models. With SemiKong 1.0, we aim to develop a foundational model capable of understanding etching problems at an expert level. Our key contributions include (a) curating a comprehensive corpus of semiconductor-related texts, (b) creating a foundational model with in-depth semiconductor knowledge, and (c) introducing a framework for integrating expert knowledge, thereby advancing the evaluation process of domain-specific AI models. Through fine-tuning a pre-trained LLM using our curated dataset, we have shown that SemiKong outperforms larger, general-purpose LLMs in various semiconductor manufacturing and design tasks. Our extensive experiments underscore the importance of developing domain-specific LLMs as a foundation for company- or tool-specific proprietary models, paving the way for further research and applications in the semiconductor domain. Code and dataset will be available at https://github.com/aitomatic/semikong, Comment: On-going work
- Published
- 2024
37. On a generalized derivative nonlinear Schrödinger equation
- Author
-
van Tin, Phan
- Subjects
Mathematics - Analysis of PDEs - Abstract
We consider a generalized derivative nonlinear Schrödinger equation. We prove the existence of the wave operator under an explicit smallness condition on the given asymptotic states. Our method is based on studying the associated system used in \cite{Tinpaper4}. Moreover, we show that if the initial data is small enough in $H^2(\mathbb{R})$, then the associated solution scatters up to a gauge transformation.
- Published
- 2024
38. A Survey of Medical Vision-and-Language Applications and Their Techniques
- Author
-
Chen, Qi, Zhao, Ruoshan, Wang, Sinuo, Phan, Vu Minh Hieu, Hengel, Anton van den, Verjans, Johan, Liao, Zhibin, To, Minh-Son, Xia, Yong, Chen, Jian, Xie, Yutong, and Wu, Qi
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Their applications are versatile and have the potential to improve diagnostic accuracy and decision-making for individual patients while also contributing to enhanced public health monitoring, disease surveillance, and policy-making through more efficient analysis of large data sets. MVLMs integrate natural language processing with medical images to enable a more comprehensive and contextual understanding of medical images alongside their corresponding textual information. Unlike general vision-and-language models trained on diverse, non-specialized datasets, MVLMs are purpose-built for the medical domain, automatically extracting and interpreting critical information from medical images and textual reports to support clinical decision-making. Popular clinical applications of MVLMs include automated medical report generation, medical visual question answering, medical multimodal segmentation, diagnosis and prognosis, and medical image-text retrieval. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We conduct a detailed analysis of various vision-and-language model architectures, focusing on their distinct strategies for cross-modal integration/exploitation of medical visual and textual features. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics. Furthermore, we highlight potential challenges and summarize future research trends and directions. The full collection of papers and codes is available at: https://github.com/YtongXie/Medical-Vision-and-Language-Tasks-and-Methodologies-A-Survey.
- Published
- 2024
39. Toward a Better Understanding of the Photothermal Heating of High-Entropy-Alloy Nanoparticles
- Author
-
Que, Ngo T., Nga, Do T., Phan, Anh D., and Tu, Le M.
- Subjects
Condensed Matter - Materials Science ,Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Other Condensed Matter ,Physics - Computational Physics - Abstract
We present a theoretical approach, for the first time, to investigate optical and photothermal properties of high-entropy alloy nanoparticles with a focus on FeCoNi-based alloys. We systematically analyze the absorption spectra of spherical nanoparticles composed of pure metals and alloys in various surrounding media. Through comparison with experimental data, we select appropriate dielectric data for the constituent elements to accurately compute absorption spectra for FeCoNi-based high-entropy-alloy nanoparticles. Then, we predict the temperature rise over time within a substrate comprised of Fe nanoparticles exposed to solar irradiation and find quantitative agreement with experimental data for FeCoNi nanoparticles reported in previous studies. The striking similarity between the optical and photothermal behaviors of FeCoNi nanoparticles and their pure iron counterparts suggests that iron nanoparticles can effectively serve as a model for understanding the optical and thermal response of FeCoNi-based alloy nanoparticles. These findings offer a simplified approach for theoretical modeling of complex high-entropy alloys and provide valuable insights into their nanoscale optical behavior., Comment: 11 pages, 9 figures, the manuscript has been accepted for publication in Materials Today Communications
- Published
- 2024
40. Real-time stress detection on social network posts using big data technology
- Author
-
Nguyen, Hai-Yen Phan, Ly, Phi-Lan, Le, Duc-Manh, and Do, Trong-Hop
- Subjects
Computer Science - Machine Learning - Abstract
In the context of modern life, particularly in Industry 4.0 within the online space, emotions and moods are frequently conveyed through social media posts. The trend of sharing stories, thoughts, and feelings on these platforms generates a vast and promising data source for Big Data. This creates both a challenge and an opportunity for research in applying technology to develop more automated and accurate methods for detecting stress in social media users. In this study, we developed a real-time system for stress detection in online posts, using the "Dreaddit: A Reddit Dataset for Stress Analysis in Social Media," which comprises 187,444 posts across five different Reddit domains. Each domain contains texts with both stressful and non-stressful content, showcasing various expressions of stress. A labeled dataset of 3,553 lines was created for training. Apache Kafka, PySpark, and Airflow were utilized to build and deploy the model. Logistic Regression yielded the best results for new streaming data, achieving an accuracy of 69.39% and an F1-score of 68.97., Comment: 6 pages, 4 figures
- Published
- 2024
41. Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis
- Author
-
Phan, Trong-Nhan, Nguyen, Hoang-Hai, Ha, Thi-Thu-Hien, Thai, Huy-Tan, and Le, Kim-Hung
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Visual inspections of bridges are critical to ensure their safety and identify potential failures early. This inspection process can be rapidly and accurately automated by using unmanned aerial vehicles (UAVs) integrated with deep learning models. However, choosing an appropriate model that is lightweight enough to integrate into the UAV and fulfills the strict requirements for inference time and accuracy is challenging. Therefore, our work contributes to the advancement of this model selection process by conducting a benchmark of 23 models belonging to the four newest YOLO variants (YOLOv5, YOLOv6, YOLOv7, YOLOv8) on COCO-Bridge-2021+, a dataset for bridge details detection. Through comprehensive benchmarking, we identify YOLOv8n, YOLOv7tiny, YOLOv6m, and YOLOv6m6 as the models offering an optimal balance between accuracy and processing speed, with mAP@50 scores of 0.803, 0.837, 0.853, and 0.872, and inference times of 5.3ms, 7.5ms, 14.06ms, and 39.33ms, respectively. Our findings accelerate the model selection process for UAVs, enabling more efficient and reliable bridge inspections.
- Published
- 2024
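The accuracy and latency figures quoted in the abstract of entry 41 can be turned into a simple selection rule. The sketch below is only an illustration: the numbers are copied from the abstract (paired in the order given by "respectively"), while the `BENCHMARK` table layout, the `best_under_budget` helper, and the idea of a fixed per-frame latency budget are assumptions made here, not part of the paper.

```python
# (mAP@50, inference time in ms) as reported in the abstract of entry 41.
BENCHMARK = {
    "YOLOv8n": (0.803, 5.3),
    "YOLOv7tiny": (0.837, 7.5),
    "YOLOv6m": (0.853, 14.06),
    "YOLOv6m6": (0.872, 39.33),
}


def best_under_budget(budget_ms):
    """Return the model with the highest mAP@50 whose latency fits the budget, or None."""
    candidates = {name: scores for name, scores in BENCHMARK.items()
                  if scores[1] <= budget_ms}
    if not candidates:
        return None
    # Rank the remaining candidates by accuracy only.
    return max(candidates, key=lambda name: candidates[name][0])
```

With a hypothetical 10 ms budget the rule picks YOLOv7tiny; relaxing the budget to 40 ms admits all four models and picks YOLOv6m6.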
42. DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
- Author
-
Phung, Hao, Dao, Quan, Dao, Trung, Phan, Hoang, Metaxas, Dimitris, and Tran, Anh
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space networks, including Mamba, a revolutionary advancement in recurrent neural networks, typically scan input sequences from left to right, they face difficulties in designing effective scanning strategies, especially in the processing of image data. Our method demonstrates that integrating wavelet transformation into Mamba enhances the local structure awareness of visual inputs and better captures long-range relations of frequencies by disentangling them into wavelet subbands, representing both low- and high-frequency components. These wavelet-based outputs are then processed and seamlessly fused with the original Mamba outputs through a cross-attention fusion layer, combining both spatial and frequency information to optimize the order awareness of state-space models which is essential for the details and overall quality of image generation. Besides, we introduce a globally-shared transformer to supercharge the performance of Mamba, harnessing its exceptional power to capture global relationships. Through extensive experiments on standard benchmarks, our method demonstrates superior results compared to DiT and DIFFUSSM, achieving faster training convergence and delivering high-quality outputs. The codes and pretrained models are released at https://github.com/VinAIResearch/DiMSUM.git., Comment: Accepted to NeurIPS 2024. Project page: https://vinairesearch.github.io/DiMSUM/
- Published
- 2024
43. Rational Design Heterobilayers Photocatalysts for Efficient Water Splitting Based on 2D Transition-Metal Dichalcogenide and Their Janus
- Author
-
Bao, Nguyen Tran Gia, Trang, Ton Nu Quynh, Thoai, Nam, Thang, Phan Bach, Thu, Vu Thi Hanh, and Hung, Nguyen Tuan
- Subjects
Condensed Matter - Materials Science - Abstract
Direct Z-scheme heterostructures with enhanced redox potential are increasingly regarded as promising materials for solar-driven water splitting. This potential arises from the synergistic interaction between the intrinsic dipoles in Janus materials and the interfacial electric fields across the layers. In this study, we explore the photocatalytic potential of 20 two-dimensional (2D) Janus transition metal dichalcogenide (TMDC) heterobilayers for efficient water splitting. Utilizing density functional theory (DFT) calculations, we first screen these materials based on key properties such as band gaps and the magnitude of intrinsic electric fields to identify promising candidates. We then evaluate additional critical factors, including carrier mobility and surface chemical reactions, to fully assess their performance. The intrinsic dipole moments in Janus materials generate built-in electric fields that enhance charge separation and reduce carrier recombination, thereby improving photocatalytic efficiency. Furthermore, we employ the Fröhlich interaction model to quantify the mobility contributions from the longitudinal optical phonon mode, providing detailed insights into how carrier mobility, influenced by phonon scattering, affects photocatalytic performance. Our results reveal that several Janus-TMDC heterobilayers, including WSe$_2$-SWSe, WSe$_2$-TeWSe, and WS$_2$-SMoSe, exhibit strong absorption in the visible spectrum and achieve solar-to-hydrogen (STH) conversion efficiencies of up to 33.24%. These findings demonstrate the potential of Janus-based Z-scheme systems to overcome existing limitations in photocatalytic water splitting by optimizing the electronic and structural properties of 2D materials. This research highlights a viable pathway for advancing clean energy generation through enhanced photocatalytic processes., Comment: 30 pages, 6 figures, 4 tables
- Published
- 2024
44. CIT: Rethinking Class-incremental Semantic Segmentation with a Class Independent Transformation
- Author
-
Ge, Jinchao, Zhang, Bowen, Liu, Akide, Phan, Minh Hieu, Chen, Qi, Shu, Yangyang, and Zhao, Yang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Class-incremental semantic segmentation (CSS) requires that a model learn to segment new classes without forgetting how to segment previous ones: this is typically achieved by distilling the current knowledge and incorporating the latest data. However, bypassing iterative distillation by directly transferring outputs of initial classes to the current learning task is not supported in existing class-specific CSS methods. Via Softmax, they enforce dependency between classes and adjust the output distribution at each learning step, resulting in a large probability distribution gap between initial and current tasks. We introduce a simple, yet effective Class Independent Transformation (CIT) that converts the outputs of existing semantic segmentation models into class-independent forms with negligible cost or performance loss. By utilizing class-independent predictions facilitated by CIT, we establish an accumulative distillation framework, ensuring equitable incorporation of all class information. We conduct extensive experiments on various segmentation architectures, including DeepLabV3, Mask2Former, and SegViTv2. Results from these experiments show minimal task forgetting across different datasets, with less than 5% for ADE20K in the most challenging 11 task configurations and less than 1% across all configurations for the PASCAL VOC 2012 dataset., Comment: 11 pages, 5 figures
- Published
- 2024
45. Multiplicity of powers of squarefree monomial ideals
- Author
-
Thuy, Phan Thi and Vu, Thanh
- Subjects
Mathematics - Commutative Algebra, 13H15, 05E40, 13F55
- Abstract
Let $I$ be an arbitrary nonzero squarefree monomial ideal of dimension $d$ in a polynomial ring $S = \mathrm{k}[x_1,\ldots,x_n]$. Let $\mu$ be the number of associated primes of $S/I$ of dimension $d$. We prove that the multiplicity of powers of $I$ is given by $$e_0(S/I^s) = \mu \binom{n-d+s-1}{s-1},$$ for all $s \ge 1$. Consequently, we compute the multiplicity of all powers of path ideals of cycles.
- Published
- 2024
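The closed formula quoted in the abstract above can be evaluated directly. The following is a minimal Python sketch, not code from the paper: the function name `multiplicity_of_power` and its argument names are hypothetical, and the caller is assumed to already know `n` (number of variables), `d` (dimension of the ideal), and `mu` (number of associated primes of dimension `d`).

```python
from math import comb


def multiplicity_of_power(n: int, d: int, mu: int, s: int) -> int:
    """Evaluate e_0(S/I^s) = mu * C(n - d + s - 1, s - 1) for s >= 1.

    n, d, and mu are taken as given inputs (hypothetical interface);
    this function does not compute them from the ideal itself.
    """
    if s < 1:
        raise ValueError("the formula is stated for s >= 1")
    if not 0 <= d <= n:
        raise ValueError("expected 0 <= d <= n")
    return mu * comb(n - d + s - 1, s - 1)


# Illustration: I = (x1*x2) in k[x1, x2] has n = 2, d = 1, and two
# associated primes of dimension 1, namely (x1) and (x2), so mu = 2.
values = [multiplicity_of_power(2, 1, 2, s) for s in range(1, 5)]
```

For `s = 1` the binomial factor equals 1, so the formula reduces to `mu`, the multiplicity of `S/I` itself.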
46. Scalable AI Framework for Defect Detection in Metal Additive Manufacturing
- Author
-
Phan, Duy Nhat, Jha, Sushant, Mavo, James P., Lanigan, Erin L., Nguyen, Linh, Poudel, Lokendra, and Bhowmik, Rahul
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Signal Processing
- Abstract
Additive Manufacturing (AM) is transforming the manufacturing sector by enabling efficient production of intricately designed products and small-batch components. However, metal parts produced via AM can include flaws that cause inferior mechanical properties, including reduced fatigue response, yield strength, and fracture toughness. To address this issue, we leverage convolutional neural networks (CNN) to analyze thermal images of printed layers, automatically identifying anomalies that impact these properties. We also investigate various synthetic data generation techniques to address limited and imbalanced AM training data. Our models' defect detection capabilities were assessed using images of Nickel alloy 718 layers produced on a laser powder bed fusion AM machine and synthetic datasets with and without added noise. Our results show significant accuracy improvements with synthetic data, emphasizing the importance of expanding training sets for reliable defect detection. Specifically, Generative Adversarial Networks (GAN)-generated datasets streamlined data preparation by eliminating human intervention while maintaining high performance, thereby enhancing defect detection capabilities. Additionally, our denoising approach effectively improves image quality, ensuring reliable defect detection. Finally, our work integrates these models in the CLoud ADditive MAnufacturing (CLADMA) module, a user-friendly interface, to enhance their accessibility and practicality for AM applications. This integration supports broader adoption and practical implementation of advanced defect detection in AM processes.
Comment: 29 pages
- Published
- 2024
47. MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
- Author
-
Le-Duc, Khai, Phan, Phuc, Pham, Tan-Hanh, Tat, Bach Phan, Ngo, Minh-Huong, and Hy, Truong-Son
- Subjects
Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, the first multilingual medical ASR dataset, along with the first collection of small-to-large end-to-end medical ASR models, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese. To the best of our knowledge, MultiMed stands as the world's largest medical ASR dataset across all major benchmarks: total duration, number of recording conditions, number of accents, and number of speaking roles. Furthermore, we present the first multilinguality study for medical ASR, which includes reproducible empirical baselines, a monolinguality-multilinguality analysis, an Attention Encoder Decoder (AED) vs Hybrid comparative study, a layer-wise ablation study for the AED, and a linguistic analysis for multilingual medical ASR. All code, data, and models are available online: https://github.com/leduckhai/MultiMed/tree/master/MultiMed
Comment: Preprint, 38 pages
- Published
- 2024
48. Rx Strategist: Prescription Verification using LLM Agents System
- Author
-
Van, Phuc Phan, Minh, Dat Nguyen, Ngoc, An Dinh, and Thanh, Huy Phan
- Subjects
Computer Science - Computation and Language
- Abstract
To protect patient safety, modern pharmaceutical complexity demands strict prescription verification. We offer a new approach - Rx Strategist - that makes use of knowledge graphs and different search strategies to enhance the power of Large Language Models (LLMs) inside an agentic framework. This multifaceted technique allows for a multi-stage LLM pipeline and reliable information retrieval from a custom-built active ingredient database. Different facets of prescription verification, such as indication, dose, and possible drug interactions, are covered in each stage of the pipeline. We alleviate the drawbacks of monolithic LLM techniques by spreading reasoning over these stages, improving correctness and reliability while reducing memory demands. Our findings demonstrate that Rx Strategist surpasses many current LLMs, achieving performance comparable to that of a highly experienced clinical pharmacist. In the complicated world of modern medications, this combination of LLMs with organized knowledge and sophisticated search methods presents a viable avenue for reducing prescription errors and enhancing patient outcomes.
Comment: 17 Pages, 6 Figures, Under Review
- Published
- 2024
49. Learning to Predict Program Execution by Modeling Dynamic Dependency on Code Graphs
- Author
-
Le, Cuong Chi, Phan, Hoang Nhat, Phan, Huy Nhat, Nguyen, Tien N., and Bui, Nghi D. Q.
- Subjects
Computer Science - Software Engineering
- Abstract
Predicting program behavior without execution is a crucial and challenging task in software engineering. Traditional models often struggle to capture the dynamic dependencies and interactions within code. This paper introduces a novel machine learning-based framework called CodeFlow, designed to predict code coverage and detect runtime errors through Dynamic Dependencies Learning. By utilizing control flow graphs (CFGs), CodeFlow represents all possible execution paths and the relationships between different statements, providing a comprehensive understanding of program behavior. CodeFlow constructs CFGs to depict execution paths and learns vector representations for CFG nodes, capturing static control-flow dependencies. Additionally, it learns dynamic dependencies through execution traces, which reflect the impacts among statements during execution. This approach enables accurate prediction of code coverage and effective identification of runtime errors. Empirical evaluations demonstrate significant improvements in code coverage prediction accuracy and effective localization of runtime errors, outperforming existing models.
- Published
- 2024
50. Theoretical Constructs that Explain and Enhance Learning: A Longitudinal Examination
- Author
-
Phan, Huy P.
- Abstract
One important line of inquiry in educational psychology involves the study of change of individuals' cognitive-motivational processes. The conjunctive use of longitudinal data with latent growth curve modeling procedures has, for example, allowed researchers to identify initial levels and to trace trajectories of theoretical variables such as self-efficacy over time. The study reported in this article proposed a conceptual model that depicted relations between a deep-learning approach, mastery goals and self-efficacy over time. A final sample of 195 second-year university students (100 females, 95 males) took part in this three-wave panel study. We used various inventories to test the initial states and rates of change of the three aforementioned constructs. As an a posteriori analysis, we included prior academic achievement as a possible predictor of change. The results ascertained from our analyses indicate an increase in growth of a deep-learning approach, mastery goals and self-efficacy across the two-year period. Importantly, a posteriori results accentuated the role of prior academic achievement as a predictor of the initial level of personal self-efficacy.
- Published
- 2024