1,421,822 results on '"Tan, A"'
Search Results
2. BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
- Author
-
Lu, Xudong, Chen, Yinghao, Chen, Cheng, Tan, Hui, Chen, Boheng, Xie, Yina, Hu, Rui, Tan, Guanxin, Wu, Renshou, Hu, Yan, Zeng, Yi, Wu, Lei, Bian, Liuyang, Wang, Zhaoxiong, Liu, Long, Yang, Yanzhou, Xiao, Han, Zhou, Aojun, Wen, Yafei, Chen, Xiaoxin, Ren, Shuai, and Li, Hongsheng
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
The emergence and growing popularity of multimodal large language models (MLLMs) have significant potential to enhance various aspects of daily life, from improving communication to facilitating learning and problem-solving. Mobile phones, as essential daily companions, represent the most effective and accessible deployment platform for MLLMs, enabling seamless integration into everyday tasks. However, deploying MLLMs on mobile phones presents challenges due to limitations in memory size and computational capability, making it difficult to achieve smooth and real-time processing without extensive optimization. In this paper, we present BlueLM-V-3B, an algorithm and system co-design approach specifically tailored for the efficient deployment of MLLMs on mobile platforms. To be specific, we redesign the dynamic resolution scheme adopted by mainstream MLLMs and implement system optimization for hardware-aware deployment to optimize model inference on mobile phones. BlueLM-V-3B boasts the following key highlights: (1) Small Size: BlueLM-V-3B features a language model with 2.7B parameters and a vision encoder with 400M parameters. (2) Fast Speed: BlueLM-V-3B achieves a generation speed of 24.4 token/s on the MediaTek Dimensity 9300 processor with 4-bit LLM weight quantization. (3) Strong Performance: BlueLM-V-3B has attained the highest average score of 66.1 on the OpenCompass benchmark among models with $\leq$ 4B parameters and surpassed a series of models with much larger parameter sizes (e.g., MiniCPM-V-2.6, InternVL2-8B).
Comment: 21 pages
- Published
- 2024
3. Evaluating the Generation of Spatial Relations in Text and Image Generative Models
- Author
-
Sim, Shang Hong, Lee, Clarence, Tan, Alvin, and Tan, Cheston
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Understanding spatial relations is a crucial cognitive ability for both humans and AI. While current research has predominantly focused on the benchmarking of text-to-image (T2I) models, we propose a more comprehensive evaluation that includes both T2I and Large Language Models (LLMs). As spatial relations are naturally understood in a visuo-spatial manner, we develop an approach to convert LLM outputs into an image, thereby allowing us to evaluate both T2I models and LLMs visually. We examined the spatial relation understanding of 8 prominent generative models (3 T2I models and 5 LLMs) on a set of 10 common prepositions, and assessed the feasibility of automatic evaluation methods. Surprisingly, we found that T2I models only achieve subpar performance despite their impressive general image-generation abilities. Even more surprisingly, our results show that LLMs are significantly more accurate than T2I models in generating spatial relations, despite being primarily trained on textual data. We examined reasons for model failures and highlighted gaps that can be filled to enable more spatially faithful generations.
- Published
- 2024
4. Personalize to generalize: Towards a universal medical multi-modality generalization through personalization
- Author
-
Tan, Zhaorui, Yang, Xi, Pan, Tan, Liu, Tianyi, Jiang, Chen, Guo, Xin, Wang, Qiufeng, Nguyen, Anh, Qi, Yuan, Huang, Kaizhu, and Cheng, Yuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
The differences among medical imaging modalities, driven by distinct underlying principles, pose significant challenges for generalization in multi-modal medical tasks. Beyond modality gaps, individual variations, such as differences in organ size and metabolic rate, further impede a model's ability to generalize effectively across both modalities and diverse populations. Despite the importance of personalization, existing approaches to multi-modal generalization often neglect individual differences, focusing solely on common anatomical features. This limitation may result in weakened generalization in various medical tasks. In this paper, we unveil that personalization is critical for multi-modal generalization. Specifically, we propose an approach to achieve personalized generalization through approximating the underlying personalized invariant representation ${X}_h$ across various modalities by leveraging individual-level constraints and a learnable biological prior. We validate the feasibility and benefits of learning a personalized ${X}_h$, showing that this representation is highly generalizable and transferable across various multi-modal medical tasks. Extensive experimental results consistently show that the additionally incorporated personalization significantly improves performance and generalization across diverse scenarios, confirming its effectiveness.
- Published
- 2024
5. PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption
- Author
-
Tan, Yifan, Tan, Cheng, Mi, Zeyu, and Chen, Haibo
- Subjects
Computer Science - Cryptography and Security, Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Confidential computing on GPUs, like NVIDIA H100, mitigates the security risks of outsourced Large Language Models (LLMs) by implementing strong isolation and data encryption. Nonetheless, this encryption incurs a significant performance overhead, reaching up to 52.8 percent and 88.2 percent throughput drop when serving OPT-30B and OPT-66B, respectively. To address this challenge, we introduce PipeLLM, a user-transparent runtime system. PipeLLM removes the overhead by overlapping the encryption and GPU computation through pipelining - an idea inspired by the CPU instruction pipelining - thereby effectively concealing the latency increase caused by encryption. The primary technical challenge is that, unlike CPUs, the encryption module lacks prior knowledge of the specific data needing encryption until it is requested by the GPUs. To this end, we propose speculative pipelined encryption to predict the data requiring encryption by analyzing the serving patterns of LLMs. Further, we have developed an efficient, low-cost pipeline relinquishing approach for instances of incorrect predictions. Our experiments on NVIDIA H100 GPU show that compared with vanilla systems without confidential computing (e.g., vLLM, PEFT, and FlexGen), PipeLLM incurs modest overhead (less than 19.6 percent in throughput) across various LLM sizes, from 13B to 175B.
Comment: To appear in ASPLOS 2025
- Published
- 2024
6. Can Personalized Medicine Coexist with Health Equity? Examining the Cost Barrier and Ethical Implications
- Author
-
Francisco, Kishi Kobe Yee, Apuhin, Andrane Estelle Carnicer, Tan, Myles Joshua Toledo, Byers, Mickael Cavanaugh, Maravilla, Nicholle Mae Amor Tan, Karim, Hezerul Abdul, and AlDahoul, Nouar
- Subjects
Computer Science - Computers and Society
- Abstract
Personalized medicine (PM) promises to transform healthcare by providing treatments tailored to individual genetic, environmental, and lifestyle factors. However, its high costs and infrastructure demands raise concerns about exacerbating health disparities, especially between high-income countries (HICs) and low- and middle-income countries (LMICs). While HICs benefit from advanced PM applications through AI and genomics, LMICs often lack the resources necessary to adopt these innovations, leading to a widening healthcare divide. This paper explores the financial and ethical challenges of PM implementation, with a focus on ensuring equitable access. It proposes strategies for global collaboration, infrastructure development, and ethical frameworks to support LMICs in adopting PM, aiming to prevent further disparities in healthcare accessibility and outcomes.
Comment: 30 pages, 1 figure
- Published
- 2024
7. All-optical autoencoder machine learning framework using diffractive processors
- Author
-
Feng, Peijie, Tan, Yong, Chong, Mingzhe, Li, Lintao, Zhang, Zongkun, Liu, Fubei, Tan, Yunhua, and Wen, Yongzheng
- Subjects
Physics - Applied Physics, Physics - Optics
- Abstract
Diffractive deep neural network (D2NN), known for its high speed, low power consumption, and strong parallelism, has been widely applied across various fields, including pattern recognition, image processing, and image transmission. However, existing network architectures primarily focus on data representation within the original domain, with limited exploration of the latent space, thereby restricting the information mining capabilities and multifunctional integration of D2NNs. Here, we propose an all-optical autoencoder (OAE) framework that can encode the input wavefield into a prior shape distribution in the latent space and decode the encoded pattern back to the original wavefield. By leveraging the non-reciprocal property of D2NN, the OAE models function as encoders in one direction of wave propagation and as decoders in the opposite direction. We further apply the models to three key areas: image denoising, noise-resistant reconfigurable image classification, and image generation. Proof-of-concept experiments have been conducted to validate numerical simulations. Our OAE framework fully exploits the potential of latent space representations, enabling a single set of diffractive processors to simultaneously achieve image reconstruction, representation, and generation. It can be viewed as both a counterpart and an extension of the electronic autoencoder model. This work not only offers fresh insights into the design of optical generative models but also paves the way for developing and applying multifunctional, highly integrated, and general optical intelligent systems.
Comment: 21 pages, 7 figures
- Published
- 2024
8. Online 4D Ultrasound-Guided Robotic Tracking Enables 3D Ultrasound Localisation Microscopy with Large Tissue Displacements
- Author
-
Yan, Jipeng, Kawara, Shusei, Tan, Qingyuan, Zhu, Jingwen, Wang, Bingxue, Toulemonde, Matthieu, Liu, Honghai, Tan, Ying, and Tang, Meng-Xing
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Electrical Engineering and Systems Science - Systems and Control
- Abstract
Super-Resolution Ultrasound (SRUS) imaging through localising and tracking microbubbles, also known as Ultrasound Localisation Microscopy (ULM), has demonstrated significant potential for reconstructing microvasculature and flows with sub-diffraction resolution in clinical diagnostics. However, imaging organs with large tissue movements, such as those caused by respiration, presents substantial challenges. Existing methods often require breath holding to maintain accumulation accuracy, which limits data acquisition time and ULM image saturation. To improve image quality in the presence of large tissue movements, this study introduces an approach integrating high-frame-rate ultrasound with online precise robotic probe control. Tested on a microvasculature phantom with translation motions up to 20 mm, twice the aperture size of the matrix array used, our method achieved real-time tracking of the moving phantom and an imaging volume rate of 85 Hz, keeping the majority of the target volume in the imaging field of view. ULM images of the moving cross channels in the phantom were successfully reconstructed in post-processing, demonstrating the feasibility of super-resolution imaging under large tissue motions. This represents a significant step towards ULM imaging of organs with large motion.
- Published
- 2024
9. Ordinal Learning: Longitudinal Attention Alignment Model for Predicting Time to Future Breast Cancer Events from Mammograms
- Author
-
Wang, Xin, Tan, Tao, Gao, Yuan, Marcus, Eric, Han, Luyi, Portaluri, Antonio, Zhang, Tianyu, Lu, Chunyao, Liang, Xinglong, Beets-Tan, Regina, Teuwen, Jonas, and Mann, Ritse
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Precision breast cancer (BC) risk assessment is crucial for developing individualized screening and prevention. Despite the promising potential of recent mammogram (MG) based deep learning models in predicting BC risk, they mostly overlook the 'time-to-future-event' ordering among patients and exhibit limited exploration of how they track history changes in breast tissue, thereby limiting their clinical application. In this work, we propose a novel method, named OA-BreaCR, to precisely model the ordinal relationship of the time to and between BC events while incorporating longitudinal breast tissue changes in a more explainable manner. We validate our method on public EMBED and in-house datasets, comparing with existing BC risk prediction and time prediction methods. Our ordinal learning method OA-BreaCR outperforms existing methods in both BC risk and time-to-future-event prediction tasks. Additionally, ordinal heatmap visualizations show the model's attention over time. Our findings underscore the importance of interpretable and precise risk assessment for enhancing BC screening and prevention efforts. The code will be accessible to the public.
- Published
- 2024
10. ChatGPT versus a Customized AI Chatbot (Anatbuddy) for Anatomy Education: A Comparative Pilot Study
- Author
-
Gautham Arun, Vivek Perumal, Francis Paul John Bato Urias, Yan En Ler, Bryan Wen Tao Tan, Ranganath Vallabhajosyula, Emmanuel Tan, Olivia Ng, Kian Bee Ng, and Sreenivasulu Reddy Mogali
- Abstract
Large Language Models (LLMs) have the potential to improve education by personalizing learning. However, ChatGPT-generated content has been criticized for sometimes producing false, biased, and/or hallucinatory information. To evaluate AI's ability to return clear and accurate anatomy information, this study generated a custom interactive and intelligent chatbot (Anatbuddy) through an Open AI Application Programming Interface (API) that enables seamless AI-driven interactions within a secured cloud infrastructure. Anatbuddy was programmed through a Retrieval Augmented Generation (RAG) method to provide context-aware responses to user queries based on a predetermined knowledge base. To compare their outputs, various queries (i.e., prompts) on thoracic anatomy (n = 18) were fed into Anatbuddy and ChatGPT 3.5. A panel comprising three experienced anatomists evaluated both tools' responses for factual accuracy, relevance, completeness, coherence, and fluency on a 5-point Likert scale. These ratings were reviewed by a third party blinded to the study, who revised and finalized scores as needed. Anatbuddy's factual accuracy (mean ± SD = 4.78/5.00 ± 0.43; median = 5.00) was rated significantly higher (U = 84, p = 0.01) than ChatGPT's accuracy (4.11 ± 0.83; median = 4.00). No statistically significant differences were detected between the chatbots for the other variables. Given ChatGPT's current content knowledge limitations, we strongly recommend the anatomy profession develop a custom AI chatbot for anatomy education utilizing a carefully curated knowledge base to ensure accuracy. Further research is needed to determine students' acceptance of custom chatbots for anatomy education and their influence on learning experiences and outcomes.
- Published
- 2024
11. Bibliography
- Author
-
Tan, Amy G.
- Published
- 2022
12. Index
- Author
-
Tan, Amy G.
- Published
- 2022
13. 10. The paradigm of the 'pastor-author' beyond Bernard
- Author
-
Tan, Amy G.
- Published
- 2022
14. 6. A bit of parish trouble and a manual on giving: self-representation to insiders and outsiders
- Author
-
Tan, Amy G.
- Published
- 2022
15. 9. 'That all the Lord's people could prophesy': innovating in the reference genre (and turning against episcopacy?)
- Author
-
Tan, Amy G.
- Published
- 2022
16. 7. A trial, a guide for jurors, and an allegory: one experience inspiring generically divergent publications
- Author
-
Tan, Amy G.
- Published
- 2022
17. 8. A puritan pastor-author in the 1630s: tailoring the presentation of theological content
- Author
-
Tan, Amy G.
- Published
- 2022
18. 5. Different audiences, different messages: explication and implication in anti-Catholic publications
- Author
-
Tan, Amy G.
- Published
- 2022
19. Part III: Innovation: adapting content, genre, and format
- Author
-
Tan, Amy G.
- Published
- 2022
20. Abbreviations
- Author
-
Tan, Amy G.
- Published
- 2022
21. Part II: Audiences: imagining and fostering relationships with readers
- Author
-
Tan, Amy G.
- Published
- 2022
22. 4. If you learn nothing else: catechisms and the question of the fundamentals of the faith
- Author
-
Tan, Amy G.
- Published
- 2022
23. 3. The call to preach and the question of printed sermons
- Author
-
Tan, Amy G.
- Published
- 2022
24. 2. The making of a pastor-author
- Author
-
Tan, Amy G.
- Published
- 2022
25. Part I: Religious goals: pastoral approaches to devotion, vocation, and print
- Author
-
Tan, Amy G.
- Published
- 2022
26. 1. The ubiquity of 'the devotional'
- Author
-
Tan, Amy G.
- Published
- 2022
27. Select chronology: Richard Bernard's life and career
- Author
-
Tan, Amy G.
- Published
- 2022
28. Figures
- Author
-
Tan, Amy G.
- Published
- 2022
29. Contents
- Author
-
Tan, Amy G.
- Published
- 2022
30. Half title page, Series page, Title page, Copyright page, Dedication
- Author
-
Tan, Amy G.
- Published
- 2022
31. Impact of a Multidisciplinary Supportive Care Model Using Distress Screening at an Asian Ambulatory Cancer Center: A Cluster Randomized Controlled Trial
- Author
-
Ke, Yu, Neo, Patricia Soek Hui, Yang, Grace Meijuan, Neo, Shirlyn Hui-Shan, Tan, Yung Ying, Tan, Yee Pin, Ramalingam, Mothi Babu, Loh, Kiley Wei-Jen, Quah, Daniel Song Chiek, Chew, Lita, Hui, Phebe En, Chan, Raymond Javan, Hwang, William Ying Khee, and Chan, Alexandre
- Subjects
Biomedical and Clinical Sciences, Oncology and Carcinogenesis, Cancer, Clinical Trials and Supportive Activities, Comparative Effectiveness Research, Women's Health, Health Services, Behavioral and Social Science, Prevention, Clinical Research, Rehabilitation, Social Determinants of Health, 7.1 Individual care needs, Good Health and Well Being, Humans, Female, Middle Aged, Male, Quality of Life, Adult, Aged, Cancer Survivors, Psychological Distress, Singapore, Stress, Psychological
- Abstract
Purpose: The Accessible Cancer Care to Enable Support for Cancer Survivors (ACCESS) program adopts a multidisciplinary supportive care model with routine distress screening to triage newly diagnosed cancer survivors for additional support on the basis of distress levels. This study aimed to evaluate the clinical impact of ACCESS over 1 year.
Methods: We performed cluster random assignment at the oncologist level in a 1:1 ratio to receive ACCESS or usual care. Participants 21 years and older, newly diagnosed with breast or gynecologic cancer, and receiving care at National Cancer Centre Singapore were included. Outcomes assessed every 3 months for 1 year included quality of life (QoL) (primary), functioning, physical and psychological symptom burden, and activity levels. Data were analyzed using mixed-effects models.
Results: Participants from 16 clusters (control = 90, intervention = 83) were analyzed. The ACCESS program did not significantly improve QoL (primary outcome). However, compared with usual care recipients, ACCESS recipients reported higher physical functioning (P = .017), role functioning (P = .001), and activity levels (P < .001) at 9 months and lower psychological distress (P = .025) at 12 months. ACCESS recipients screened with high distress had poorer QoL, lower role and social functioning, and higher physical symptom distress at 3 months but had comparable scores with ACCESS recipients without high distress after 12 months.
Conclusion: Compared with usual care, participation in the ACCESS program did not yield QoL improvement but showed earlier functioning recovery related to activities of daily living and reduced psychological distress. Routine distress screening is a promising mechanism to identify survivors with poorer health for more intensive supportive care.
- Published
- 2024
32. Molecular Precision Engineering for Efficient Binary Organic Photovoltaics through Energy Level and Fibrillar Structure Modulation
- Author
-
Zeng, Rui, Xu, Shengjie, Deng, Jiawei, Tan, Senke, Zhou, Guanqing, Zhang, Ming, Zhu, Lei, Han, Fei, Xue, Xiaonan, Zhang, Anyang, Tan, Hongtao, Zhang, Lingjie, Zhu, Chenhui, Wang, Cheng, Wu, Xuefei, Fink, Zachary, Russell, Thomas P, Zhang, Yongming, and Liu, Feng
- Subjects
Engineering, Macromolecular and Materials Chemistry, Materials Engineering, Chemical Sciences, Affordable and Clean Energy, Interdisciplinary Engineering, Macromolecular and materials chemistry, Materials engineering
- Abstract
Adjusting the energy levels and fibrillar morphology is paramount to enhancing the power conversion efficiency (PCE) of organic solar cells (OSCs). In the present study, an increase in the open-circuit voltage (VOC) is facilitated through the elongation of the alkyl chain within AQx (namely AQx-8), aiming to decrease the free volume ratio (FVR). This reduction in FVR attenuates electron-phonon coupling, thereby augmenting emission efficiency and diminishing the non-radiative energy loss (ΔEnr). To further refine the energy levels and morphological characteristics, the external undecyl chain of AQx-8 is substituted with a shorter carbon chain and cyclohexane noted for its considerable steric hindrance (AQx-H). This alteration significantly mitigates intermolecular aggregation, expands the bandgap, and elevates the lowest unoccupied molecular orbital (LUMO) energy level, culminating in an elevated VOC of 0.923 V in devices based on AQx-H. Morphological analysis reveals that blends based on AQx-H exhibit an enhanced multi-length-scale fibrillar structure, which is conducive to exciton dissociation and charge transport, thereby contributing to a high fill factor (FF) nearing 80%. Consequently, this study reports one of the highest binary PCEs documented, standing at 19.5% (with certification at 19.0%).
- Published
- 2024
33. OminiControl: Minimal and Universal Control for Diffusion Transformer
- Author
-
Tan, Zhenxiong, Liu, Songhua, Yang, Xingyi, Xue, Qiaochu, and Wang, Xinchao
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
In this paper, we introduce OminiControl, a highly versatile and parameter-efficient framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models. At its core, OminiControl leverages a parameter reuse mechanism, enabling the DiT to encode image conditions using itself as a powerful backbone and process them with its flexible multi-modal attention processors. Unlike existing methods, which rely heavily on additional encoder modules with complex architectures, OminiControl (1) effectively and efficiently incorporates injected image conditions with only ~0.1% additional parameters, and (2) addresses a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially-aligned conditions such as edges, depth, and more. Remarkably, these capabilities are achieved by training on images generated by the DiT itself, which is particularly beneficial for subject-driven generation. Extensive evaluations demonstrate that OminiControl outperforms existing UNet-based and DiT-adapted models in both subject-driven and spatially-aligned conditional generation. Additionally, we release our training dataset, Subjects200K, a diverse collection of over 200,000 identity-consistent images, along with an efficient data synthesis pipeline to advance research in subject-consistent generation.
- Published
- 2024
34. OSPtrack: A Labeled Dataset Targeting Simulated Open-Source Package Execution
- Author
-
Tan, Zhuoran, Anagnostopoulos, Christos, and Singer, Jeremy
- Subjects
Computer Science - Cryptography and Security
- Abstract
Open-source software is a fundamental part of the internet and the cyber supply chain, but its exploitation has become more frequent. While vulnerability detection in OSS has advanced, previous work mainly focuses on static code analysis, neglecting runtime indicators. To address this, we created a dataset spanning multiple ecosystems, capturing features generated during the execution of packages and libraries in isolated environments. The dataset includes 9,461 package reports (1,962 malicious), with static and dynamic features such as files, sockets, commands, and DNS records. Labeled with verified information and detailed sub-labels for attack types, this dataset helps identify malicious indicators, especially when source code access is limited, and supports efficient detection methods during runtime.
- Published
- 2024
35. An Attention-based Framework for Fair Contrastive Learning
- Author
-
Nielsen, Stefan K. and Nguyen, Tan M.
- Subjects
Computer Science - Machine Learning
- Abstract
Contrastive learning has proven instrumental in learning unbiased representations of data, especially in complex environments characterized by high-cardinality and high-dimensional sensitive information. However, existing approaches within this setting require predefined modelling assumptions of bias-causing interactions that limit the model's ability to learn debiased representations. In this work, we propose a new method for fair contrastive learning that employs an attention mechanism to model bias-causing interactions, enabling the learning of a fairer and semantically richer embedding space. In particular, our attention mechanism avoids bias-causing samples that confound the model and focuses on bias-reducing samples that help learn semantically meaningful representations. We verify the advantages of our method against existing baselines in fair contrastive learning and show that our approach can significantly boost bias removal from learned representations without compromising downstream accuracy.
- Published
- 2024
36. TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior
- Author
-
Yang, Sen, Jiang, Minyue, Fan, Ziwei, Xie, Xiaolu, Tan, Xiao, Li, Yingying, Ding, Errui, Wang, Liang, and Wang, Jingdong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics
- Abstract
Recent advances in autonomous driving systems have shifted towards reducing reliance on high-definition maps (HDMaps) due to the huge costs of annotation and maintenance. Instead, researchers are focusing on online vectorized HDMap construction using on-board sensors. However, sensor-only approaches still face challenges in long-range perception due to the restricted views imposed by the mounting angles of onboard cameras, just as human drivers also rely on bird's-eye-view navigation maps for a comprehensive understanding of road structures. To address these issues, we propose to train the perception model to "see" standard definition maps (SDMaps). We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information to improve the bird's eye view (BEV) feature for lane geometry and topology decoding. Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology. To further enhance the ability of geometry prediction and topology reasoning, we also use a topology-guided decoder to refine the predictions by exploiting the mutual relationships between topological and geometric features. We perform extensive experiments on OpenLane-V2 datasets to validate the proposed method. The results show that our model outperforms state-of-the-art methods by a large margin, with gains of +6.7 and +9.1 on the mAP and topology metrics. Our analysis also reveals that models trained with SDMap noise augmentation exhibit enhanced robustness.
Comment: 17 pages, 7 figures, and 7 tables
- Published
- 2024
37. Fourier dimension of Gaussian multiplicative chaos
- Author
-
Lin, Zhaofeng, Qiu, Yanqi, and Tan, Mingjie
- Subjects
Mathematics - Probability, Mathematical Physics, Mathematics - Dynamical Systems, Mathematics - Functional Analysis
- Abstract
We obtain the precise Fourier dimension of the Gaussian multiplicative chaos on the unit interval. Our main result confirms a conjecture of Garban-Vargas.
Comment: This is the first version of our work on Fourier dimension of GMC. New version with more comprehensive and simpler proof, together with illustrative pictures and applications, generalizations of the main result will be updated soon
- Published
- 2024
38. Image Compression Using Novel View Synthesis Priors
- Author
-
Peng, Luyuan, Chitre, Mandar, Vishnu, Hari, Too, Yuen Min, Kalyan, Bharath, Mishra, Rajat, and Tan, Soo Pieng
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
- Abstract
Real-time visual feedback is essential for tetherless control of remotely operated vehicles, particularly during inspection and manipulation tasks. Though acoustic communication is the preferred choice for medium-range communication underwater, its limited bandwidth renders it impractical to transmit images or videos in real-time. To address this, we propose a model-based image compression technique that leverages prior mission information. Our approach employs trained machine-learning based novel view synthesis models, and uses gradient descent optimization to refine latent representations to help generate compressible differences between camera images and rendered images. We evaluate the proposed compression technique using a dataset from an artificial ocean basin, demonstrating superior compression ratios and image quality over existing techniques. Moreover, our method exhibits robustness to introduction of new objects within the scene, highlighting its potential for advancing tetherless remotely operated vehicle operations.
Comment: Preprint submitted to Ocean Engineering
- Published
- 2024
39. ALKPU: an active learning method for the DeePMD model with Kalman filter
- Author
-
Li, Haibo, Wu, Xingxing, Liu, Liping, Wang, Lin-Wang, Wang, Long, Tan, Guangming, and Jia, Weile
- Subjects
Physics - Computational Physics
- Abstract
Neural network force field models such as DeePMD have enabled highly efficient large-scale molecular dynamics simulations with ab initio accuracy. However, building such models depends heavily on training data obtained by costly electronic structure calculations, so it is crucial to carefully select and label the most representative configurations during model training to improve both extrapolation capability and training efficiency. To address this challenge, based on the Kalman filter theory we propose the Kalman Prediction Uncertainty (KPU) to quantify the uncertainty of the model's prediction. With KPU we design the Active Learning by KPU (ALKPU) method, which can efficiently select representative configurations that should be labelled during model training. We prove that ALKPU locally leads to the fastest reduction of the model's uncertainty, which reveals its rationality as a general active learning method. We test the ALKPU method using various physical system simulations and demonstrate that it can efficiently cover the system's configuration space. Our work demonstrates the benefits of ALKPU as a novel active learning method, enhancing training efficiency and reducing computational resource demands.
- Published
- 2024
40. Achieving computational gains with quantum error correction primitives: Generation of long-range entanglement enhanced by error detection
- Author
-
Liao, Haoran, Hartnett, Gavin S., Kakkar, Ashish, Tan, Adrian, Hush, Michael, Mundada, Pranav S., Biercuk, Michael J., and Baum, Yuval
- Subjects
Quantum Physics
- Abstract
The resource overhead required to achieve net computational benefits from quantum error correction (QEC) limits its utility while current systems remain constrained in size, despite exceptional progress in experimental demonstrations. In this paper, we demonstrate that the strategic application of QEC primitives without logical encoding can yield significant advantages on superconducting processors--relative to any alternative error-reduction strategy--while only requiring modest overhead. We first present a novel protocol for implementing long-range CNOT gates that relies on a unitarily-prepared Greenberger-Horne-Zeilinger (GHZ) state as well as a unitary disentangling step; the protocol natively introduces an error-detection process using the disentangled qubits as flags. We demonstrate that it achieves state-of-the-art gate fidelities of over 85% across up to 40 lattice sites, significantly and consistently outperforming the best alternative measurement-based protocol without introducing any additional ancilla qubits. We then apply sparse stabilizer measurements to generate large GHZ states by detecting bit-flip and amplitude-damping errors. Employing this technique in combination with deterministic error suppression, we generate a 75-qubit GHZ state exhibiting genuine multipartite entanglement, the largest reported to date. The generation requires no more than 9 ancilla qubits and the fraction of samples discarded due to errors grows no higher than 78%, far lower than previous discard fractions required for tests using comparable numbers of fully encoded qubits. This work in total represents compelling evidence that adopting QEC primitives on current-generation devices can deliver substantial net benefits., Comment: 8 pages, 5 figures (main text) + 9 pages, 10 figures (supplementary)
- Published
- 2024
41. Multi-objective Bayesian Optimisation of Spinodoid Cellular Structures for Crush Energy Absorption
- Author
-
Kansara, Hirak, Khosroshahi, Siamak F., Guo, Leo, Bessa, Miguel A., and Tan, Wei
- Subjects
Condensed Matter - Materials Science, Computer Science - Computational Engineering, Finance, and Science
- Abstract
In the pursuit of designing safer and more efficient energy-absorbing structures, engineers must tackle the challenge of improving crush performance while balancing multiple conflicting objectives, such as maximising energy absorption and minimising peak impact forces. Accurately simulating real-world conditions necessitates the use of complex material models to replicate the non-linear behaviour of materials under impact, which comes at a significant computational cost. This study addresses these challenges by introducing a multi-objective Bayesian optimisation framework specifically developed to optimise spinodoid structures for crush energy absorption. Spinodoid structures, characterised by their scalable, non-periodic topologies and efficient stress distribution, offer a promising direction for advanced structural design. However, optimising design parameters to enhance crush performance is far from straightforward, particularly under realistic conditions. Conventional optimisation methods, although effective, often require a large number of costly simulations to identify suitable solutions, making the process both time-consuming and resource intensive. In this context, multi-objective Bayesian optimisation provides a clear advantage by intelligently navigating the design space, learning from each evaluation to reduce the number of simulations required, and efficiently addressing the complexities of non-linear material behaviour. By integrating finite element analysis with Bayesian optimisation, the framework developed in this study tackles the dual challenge of improving energy absorption and reducing peak force, particularly in scenarios where plastic deformation plays a critical role. The use of scalarisation and hypervolume-based techniques enables the identification of Pareto-optimal solutions, balancing these conflicting objectives.
- Published
- 2024
42. Global Challenge for Safe and Secure LLMs Track 1
- Author
-
Jia, Xiaojun, Huang, Yihao, Liu, Yang, Tan, Peng Yan, Yau, Weng Kuan, Mak, Mun-Thye, Sim, Xin Ming, Ng, Wee Siong, Ng, See Kiong, Liu, Hanqing, Zhou, Lifeng, Yan, Huanqian, Sun, Xiaobing, Liu, Wei, Wang, Long, Qian, Yiming, Liu, Yong, Yang, Junxiao, Zhang, Zhexin, Lei, Leqi, Chen, Renmiao, Lu, Yida, Cui, Shiyao, Wang, Zizhou, Li, Shaohua, Wang, Yan, Goh, Rick Siow Mong, Zhen, Liangli, Zhang, Yingjie, and Zhao, Zhe
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
- Abstract
This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks. With the increasing integration of LLMs in critical sectors such as healthcare, finance, and public administration, ensuring these models are resilient to adversarial attacks is vital for preventing misuse and upholding ethical standards. This competition focused on two distinct tracks designed to evaluate and enhance the robustness of LLM security frameworks. Track 1 tasked participants with developing automated methods to probe LLM vulnerabilities by eliciting undesirable responses, effectively testing the limits of existing safety protocols within LLMs. Participants were challenged to devise techniques that could bypass content safeguards across a diverse array of scenarios, from offensive language to misinformation and illegal activities. Through this process, Track 1 aimed to deepen the understanding of LLM vulnerabilities and provide insights for creating more resilient models.
- Published
- 2024
43. Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification
- Author
-
Liu, Junhua, Tan, Yong Keat, Fu, Bin, and Lim, Kwan Hui
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Generating large-scale, domain-specific, multilingual multi-turn dialogue datasets remains a significant hurdle for training effective Multi-Turn Intent Classification models in chatbot systems. In this paper, we introduce Chain-of-Intent, a novel mechanism that combines Hidden Markov Models with Large Language Models (LLMs) to generate contextually aware, intent-driven conversations through self-play. By extracting domain-specific knowledge from e-commerce chat logs, we estimate conversation turns and intent transitions, which guide the generation of coherent dialogues. Leveraging LLMs to enhance emission probabilities, our approach produces natural and contextually consistent questions and answers. We also propose MINT-CL, a framework for multi-turn intent classification using multi-task contrastive learning, improving classification accuracy without the need for extensive annotated data. Evaluations show that our methods outperform baselines in dialogue quality and intent classification accuracy, especially in multilingual settings, while significantly reducing data generation efforts. Furthermore, we release MINT-E, a multilingual, intent-aware multi-turn e-commerce dialogue corpus to support future research in this area.
- Published
- 2024
44. Planets Around Solar Twins/Analogs (PASTA) I.: High precision stellar chemical abundance for 17 planet-hosting stars and the condensation temperature trend
- Author
-
Sun, Qinghui, Wang, Sharon Xuesong, Gan, Tianjun, Ji, Chenyang, Lin, Zitao, Ting, Yuan-Sen, Teske, Johanna, Li, Haining, Liu, Fan, Hua, Xinyan, Tang, Jiaxin, Yu, Jie, Zhang, Jiayue, Badenas-Agusti, Mariona, Vanderburg, Andrew, Ricker, George R., Vanderspek, Roland, Latham, David W., Seager, Sara, Jenkins, Jon M., Schwarz, Richard P., Guillot, Tristan, Tan, Thiam-Guan, Conti, Dennis M., Collins, Kevin I., Srdoc, Gregor, Stockdale, Chris, Suarez, Olga, Zambelli, Roberto, Radford, Don, Barkaoui, Khalid, Evans, Phil, and Bieryla, Allyson
- Subjects
Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Earth and Planetary Astrophysics
- Abstract
The Sun is depleted in refractory elements compared to nearby solar twins, which may be linked to the formation of giant or terrestrial planets. Here we present high-resolution, high signal-to-noise spectroscopic data for 17 solar-like stars hosting planets, obtained with Magellan II/MIKE, to investigate whether this depletion is related to planet formation. We derive stellar parameters, including stellar atmosphere, age, radius, mass, and chemical abundances for 22 elements from carbon to europium through line-by-line differential analysis. Our uncertainties range from 0.01 dex for Fe and Si to 0.08 dex for Sr, Y, and Eu. By comparing the solar abundances to those of the 17 stars, we investigate the differential abundance ([X/Fe]$_{\rm solar}$ - [X/Fe]$_{\rm star}$) versus condensation temperature ($T_c$) trend. In particular, we apply Galactic chemical evolution corrections to five solar twins within the full sample. Our results agree with previous studies in finding that the Sun is relatively depleted in refractory compared to volatile elements. For both the five solar twins and the remaining solar-like stars, we find that all stars hosting known gas giant planets exhibit negative $T_c$ trend slopes, suggesting that the Sun is relatively depleted in refractory elements compared to similar giant-planet-host stars. Additionally, we find no correlation between $T_c$ trend slopes and the total mass of detected terrestrial planets in each system, suggesting that terrestrial planet formation may not be the cause of refractory element depletion in the Sun., Comment: 26 pages, 10 figures, 7 tables; accepted for publication in ApJ
- Published
- 2024
45. Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
- Author
-
Jaiswal, Shantanu, Roy, Debaditya, Fernando, Basura, and Tan, Cheston
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Complex visual reasoning and question answering (VQA) is a challenging task that requires compositional multi-step processing and higher-level reasoning capabilities beyond the immediate recognition and localization of objects and events. Here, we introduce a fully neural Iterative and Parallel Reasoning Mechanism (IPRM) that combines two distinct forms of computation -- iterative and parallel -- to better address complex VQA scenarios. Specifically, IPRM's "iterative" computation facilitates compositional step-by-step reasoning for scenarios wherein individual operations need to be computed, stored, and recalled dynamically (e.g. when computing the query "determine the color of pen to the left of the child in red t-shirt sitting at the white table"). Meanwhile, its "parallel" computation allows for the simultaneous exploration of different reasoning paths and supports more robust and efficient execution of operations that are mutually independent (e.g. when counting individual colors for the query: "determine the maximum occurring color amongst all t-shirts"). We design IPRM as a lightweight and fully-differentiable neural module that can be conveniently applied to both transformer and non-transformer vision-language backbones. It notably outperforms prior task-specific methods and transformer-based attention modules across various image and video VQA benchmarks testing distinct complex reasoning capabilities such as compositional spatiotemporal reasoning (AGQA), situational reasoning (STAR), multi-hop reasoning generalization (CLEVR-Humans) and causal event linking (CLEVRER-Humans). Further, IPRM's internal computations can be visualized across reasoning steps, aiding interpretability and diagnosis of its errors., Comment: NeurIPS 2024 camera ready; source code to be released at: https://github.com/shantanuj/IPRM_Iterative_and_Parallel_Reasoning_Mechanism
- Published
- 2024
46. Generating 3D-Consistent Videos from Unposed Internet Photos
- Author
-
Chou, Gene, Zhang, Kai, Bi, Sai, Tan, Hao, Xu, Zexiang, Luan, Fujun, Hariharan, Bharath, and Snavely, Noah
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We address the problem of generating videos from unposed internet photos. A handful of input images serve as keyframes, and our model interpolates between them to simulate a path moving between the cameras. Given random images, a model's ability to capture underlying geometry, recognize scene identity, and relate frames in terms of camera position and orientation reflects a fundamental understanding of 3D structure and scene layout. However, existing video models such as Luma Dream Machine fail at this task. We design a self-supervised method that takes advantage of the consistency of videos and variability of multiview internet photos to train a scalable, 3D-aware video model without any 3D annotations such as camera parameters. We validate that our method outperforms all baselines in terms of geometric and appearance consistency. We also show our model benefits applications that enable camera control, such as 3D Gaussian Splatting. Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
- Published
- 2024
47. SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs
- Author
-
Kokane, Shirley, Zhu, Ming, Awalgaonkar, Tulika, Zhang, Jianguo, Hoang, Thai, Prabhakar, Akshara, Liu, Zuxin, Lan, Tian, Yang, Liangwei, Tan, Juntao, Murthy, Rithesh, Yao, Weiran, Liu, Zhiwei, Niebles, Juan Carlos, Wang, Huan, Heinecke, Shelby, Xiong, Caiming, and Savarese, Silvio
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence
- Abstract
Evaluating the output of Large Language Models (LLMs) is one of the most critical aspects of building a performant compound AI system. Since the output from LLMs propagates to downstream steps, identifying LLM errors is crucial to system performance. A common task for LLMs in AI systems is tool use. While there are several benchmark environments for evaluating LLMs on this task, they typically only give a success rate without any explanation of the failure cases. To solve this problem, we introduce SpecTool, a new benchmark to identify error patterns in LLM output on tool-use tasks. Our benchmark data set comprises queries from diverse environments that can be used to test for the presence of seven newly characterized error patterns. Using SPECTOOL, we show that even the most prominent LLMs exhibit these error patterns in their outputs. Researchers can use the analysis and insights from SPECTOOL to guide their error mitigation strategies.
- Published
- 2024
48. Multicomponent cat states with sub-Planck structures and their optomechanical analogues
- Author
-
Hailin, Tan, Akhtar, Naeem, and Xianlong, Gao
- Subjects
Quantum Physics
- Abstract
We investigate the superposition of coherent states, emphasizing quantum states with distinct Wigner phase-space features relevant to quantum information applications. In this study, we introduce generalized versions of the compass state, which display enhanced phase-space characteristics compared to the conventional compass state, typically a superposition of four coherent states. Our findings reveal that, unlike sub-Planck structures and phase-space sensitivity of the compass state, these generalized states produce isotropic sub-Planck structures and sensitivity to phase-space displacements. We demonstrate that these desirable phase-space characteristics are maintained in superpositions comprising at least six distinct coherent states. Furthermore, we show that increasing the number of coherent states in the superposition preserves these characteristics, provided the number remains even. Finally, we examine an optomechanical system capable of generating the proposed quantum states, resulting in optomechanical counterparts with nearly identical phase-space structures, thereby suggesting the feasibility of physically realizing these generalized compass states., Comment: 13 pages, 12 figures
- Published
- 2024
49. Fact-Level Confidence Calibration and Self-Correction
- Author
-
Yuan, Yige, Xu, Bingbing, Tan, Hexiang, Sun, Fei, Xiao, Teng, Li, Wei, Shen, Huawei, and Cheng, Xueqi
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Confidence calibration in LLMs, i.e., aligning their self-assessed confidence with the actual accuracy of their responses, enables them to self-evaluate the correctness of their outputs. However, current calibration methods for LLMs typically estimate two scalars to represent overall response confidence and correctness, which is inadequate for long-form generation where the response includes multiple atomic facts and may be partially confident and correct. These methods also overlook the relevance of each fact to the query. To address these challenges, we propose a Fact-Level Calibration framework that operates at a finer granularity, calibrating confidence to relevance-weighted correctness at the fact level. Furthermore, comprehensive analysis under the framework inspired the development of Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones. Extensive experiments across four datasets and six models demonstrate that ConFix effectively mitigates hallucinations without requiring external knowledge sources such as retrieval systems., Comment: Code is available at https://github.com/yuanyige/fact-calibration
- Published
- 2024
50. On the $L_{\mathrm{YJ}}(\xi, \eta, X)$ constant for the Banaś-Frączek space
- Author
-
Wang, Yuxin, Liu, Qi, Chen, Linhui, Tan, Xiewei, and Sarfraz, Muhammad
- Subjects
Mathematics - Functional Analysis, 46B20, F.2.2, I.2.7
- Abstract
In this paper, for any $\lambda \geq 1$, $R_\lambda^2$ denotes the Banaś-Frączek space. We calculate the exact value of $L_{\mathrm{YJ}}(\xi, \eta, X)$ for this space. Specifically, careful computation yields $L_{\mathrm{YJ}}\left(\xi, \eta, R_\lambda^2\right)=1+\frac{2 \xi \eta}{\xi^2+\eta^2}\left(1-\frac{1}{\lambda^2}\right)$., Comment: for associated mpeg file, see http://myhost.domain/file.mpg
- Published
- 2024