1,371 results for "Cao Meng"
Search Results
2. Crystal structure of poly[μ2-dichlorido-(μ2-1-[(2,4-dimethyl-1H-triazole-1-yl)methyl]-1H-benzotriazole-κ2N:N′)cadmium(II)], C11H12CdN6Cl2
- Author
-
Zhou Tan-Peng, Cao Meng-Meng, Du Jun-Ya, Yang Jing, and Wang Xia
- Subjects
2351582, Physics, QC1-999, Crystallography, QD901-999
- Abstract
C11H12CdN6Cl2, triclinic, P1̄ (no. 2), a = 6.8644(8) Å, b = 10.4577(8) Å, c = 10.6304(12) Å, α = 112.218(9)°, β = 94.638(9)°, γ = 93.289(8)°, Z = 2, V = 700.93(13) Å³, R_gt(F) = 0.0427, wR_ref(F²) = 0.1137, T = 293(2) K.
- Published
- 2024
- Full Text
- View/download PDF
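The reported unit-cell volume can be checked against the other lattice parameters using the standard triclinic cell-volume formula. This is only a consistency sketch; all numerical values are taken from the abstract above.

```python
import math

# Reported triclinic lattice parameters for C11H12CdN6Cl2 (from the abstract).
a, b, c = 6.8644, 10.4577, 10.6304            # cell edges, in angstroms
alpha, beta, gamma = 112.218, 94.638, 93.289  # cell angles, in degrees

ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))

# Standard triclinic volume:
# V = abc * sqrt(1 - cos^2 a - cos^2 b - cos^2 g + 2 cos(a) cos(b) cos(g))
V = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
print(round(V, 1))  # agrees with the reported V = 700.93(13) cubic angstroms
```

The recomputed volume lands within the stated uncertainty of the published value, which is a quick sanity check on the extracted numbers.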
3. Prediction and simulation of wearable sensor devices for sports injury prevention based on BP neural network
- Author
-
Jungang Yang, Cao Meng, and Li Ling
- Subjects
BP neural network, Wearable devices, Action recognition system, Injury prevention, Electric apparatus and materials. Electric circuits. Electric networks, TK452-454.4
- Abstract
In research on sports injury prevention, the recognition of sports actions plays an important role in both the action recognition model and the prediction evaluation model. In view of these problems, this paper constructs a new mathematical model based on the idea of a BP neural network. The model combines wearable technology and can improve the recognition accuracy of sports actions. It uses a BP network classifier in processes such as data feature extraction, improving efficiency and reliability. In this paper, algorithm simulations are carried out for three movements: two running actions and a static state. The results show that for the recognition of the running actions, the BP neural network classifier performs best when the hidden layer has 11 nodes. For static motion, the recognition performance of each classifier is essentially the same. This paper analyzes the wearable sports action recognition system, including the perception, application and service layers, to recognize and classify sports actions, predict actions in advance and prevent sports injuries. Finally, this paper analyzes the causes of sports injury, puts forward specific measures to prevent sports injury, and further reduces sports injury events through wearable devices.
- Published
- 2024
- Full Text
- View/download PDF
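The BP (backpropagation) classifier the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-feature "sensor" vectors, learning rate, and training loop are illustrative assumptions; only the choice of 11 hidden nodes follows the abstract.

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPClassifier:
    """One-hidden-layer network trained with backpropagation."""
    def __init__(self, n_in, n_hidden, lr=0.5):
        self.lr = lr
        self.w1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)]  # +1 bias
                   for _ in range(n_hidden)]
        self.w2 = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]                                    # bias input
        self.h = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w1]
        return sigmoid(sum(w * v for w, v in zip(self.w2, self.h + [1.0])))

    def train(self, x, y):
        o = self.forward(x)
        xb = x + [1.0]
        d_o = (o - y) * o * (1 - o)                       # output error term
        for j, h in enumerate(self.h):
            d_h = d_o * self.w2[j] * h * (1 - h)          # hidden error term
            self.w2[j] -= self.lr * d_o * h
            for i, v in enumerate(xb):
                self.w1[j][i] -= self.lr * d_h * v
        self.w2[-1] -= self.lr * d_o                      # output bias update

# Toy "wearable sensor" feature vectors: class 1 = running, class 0 = static.
data = [([0.9, 0.8, 0.7], 1), ([0.1, 0.2, 0.1], 0),
        ([0.8, 0.9, 0.9], 1), ([0.2, 0.1, 0.3], 0)]
net = BPClassifier(n_in=3, n_hidden=11)  # 11 hidden nodes, as in the abstract
for _ in range(2000):
    for x, y in data:
        net.train(x, y)
print([round(net.forward(x)) for x, _ in data])
```

After training, the network separates the toy "running" and "static" samples; real use would feed windows of accelerometer features rather than hand-made vectors.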
4. Application of multi-scale feature fusion algorithm based on motion wearable sensors in feature extraction of sports images
- Author
-
Jungang Yang, Cao Meng, and Li Ling
- Subjects
Multi-scale feature fusion, Sports, Image features, Feature recognition, Electric apparatus and materials. Electric circuits. Electric networks, TK452-454.4
- Abstract
The utilization of moving image feature extraction in sports teaching has garnered increasing attention. However, traditional feature extraction algorithms often struggle to meet the diverse and complex demands of moving images. To address this challenge, this paper proposes a multi-scale feature fusion algorithm aimed at improving feature extraction in moving images. The algorithm begins by decomposing the moving image into multiple scales, followed by extracting features from each scale using a feature extraction network. To obtain a more comprehensive and accurate feature representation, feature fusion technology is employed to merge the features from different scales. The proposed algorithm, based on multi-scale feature fusion, exhibits a significant improvement in both accuracy and stability when compared to traditional feature extraction algorithms. By accurately extracting and representing the crucial features within moving images, the algorithm contributes to an improved understanding of athletes' movements, enabling instructors to provide more targeted and insightful feedback. This algorithm effectively captures key features within the moving images, providing robust support for tasks such as movement analysis and skill evaluation in sports teaching.
- Published
- 2024
- Full Text
- View/download PDF
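The decompose-extract-fuse pipeline described in this abstract can be sketched in a few lines. This is a hypothetical 1-D analogue, not the paper's method: average pooling stands in for multi-scale decomposition, simple statistics stand in for the feature extraction network, and fusion is plain concatenation.

```python
def downsample(signal, factor):
    """Average-pool the signal by the given factor (one 'scale')."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]

def extract_features(signal):
    """Toy per-scale features: mean and peak-to-peak range."""
    return [sum(signal) / len(signal), max(signal) - min(signal)]

def multi_scale_fuse(signal, scales=(1, 2, 4)):
    """Decompose into scales, extract features per scale, fuse by concatenation."""
    fused = []
    for s in scales:
        fused.extend(extract_features(downsample(signal, s)))
    return fused

sig = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # toy "motion" signal
feats = multi_scale_fuse(sig)
print(len(feats))  # 2 features per scale x 3 scales = 6
```

Note how the fast oscillation survives only at the finest scale (range 1.0) and averages away at coarser ones, which is exactly the kind of complementary information multi-scale fusion is meant to retain.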
5. ZNF3 regulates proliferation, migration and invasion through MMP1 and TWIST in colorectal cancer
- Author
-
Du Le, Liu Ning, Jin Jianfeng, Cao Meng, Sun Yuantian, Gao Xinzheng, Ruan Banzhan, Yang Shangfeng, Ge Dongsheng, Ye Yingzhuan, Zhou Yinxi, Chen Erfei, and Yang Jin
- Subjects
ZNF3, colorectal cancer, proliferation, migration, invasion, Biochemistry, QD415-436, Genetics, QH426-470
- Abstract
Colorectal cancer (CRC) is a malignant tumor with high incidence and mortality worldwide. Currently, the underlying molecular mechanisms of CRC are still unclear. Zinc finger protein 3 (ZNF3) is a zinc-finger transcription factor that has been reported as a candidate prognostic marker for breast cancer, suggesting its involvement in the regulation of tumorigenesis. However, the association between ZNF3 and CRC remains unknown. To investigate the role of ZNF3 in CRC, we first analyzed the correlation between ZNF3 expression and CRC; the results demonstrate that ZNF3 is highly expressed in CRC tissues and cells and is associated with the age of CRC patients. In vitro studies show that ZNF3 overexpression promotes CRC cell migration. Compared to control cells, knockdown of ZNF3 markedly suppresses CRC cell proliferation, migration and invasion and promotes G0/G1 phase cell cycle arrest. Expression of the EMT-related markers TWIST and MMP1 is significantly decreased when ZNF3 is silenced. Additionally, overexpression of MMP1 and TWIST exacerbates CRC cell proliferation, accelerates S-phase cell cycle progression in ZNF3-knockdown SW480 cells, and increases cell migration and invasion in Transwell chamber assays. These data suggest that ZNF3 is involved in cellular proliferation, migration and invasion by regulating MMP1 and TWIST in CRC cells.
- Published
- 2022
- Full Text
- View/download PDF
6. Effects of school-based high-intensity interval training on body composition, cardiorespiratory fitness and cardiometabolic markers in adolescent boys with obesity: a randomized controlled trial
- Author
-
Cao Meng, Tang Yucheng, Li Shu, and Zou Yu
- Subjects
Pediatrics ,RJ1-570 - Abstract
Background: Accumulating evidence suggests that cardiovascular disease (CVD) has its origins in childhood obesity. The purpose of this study was to determine the effect of a real-world school-based high-intensity interval training intervention on body composition, cardiorespiratory fitness and cardiometabolic markers in obese boys aged 10 to 13 years. Methods: Forty-five adolescent boys with obesity (age = 11.2 ± 0.7 years, BMI = 24.2 ± 1.0 kg/m²) were randomized to a high-intensity interval training group (HIIT, n = 15), a moderate-intensity continuous training group (MICT, n = 15), or a control group (CON, n = 15). The intervention groups performed three weekly exercise sessions over 12 weeks. The HIIT group performed two sets of eight 15-s runs at high intensity [90~100% maximal aerobic speed (MAS)], each separated by a 15-s recovery run at low intensity (50% MAS); the MICT group performed 30-min runs at moderate intensity (60~70% MAS); and the CON group was instructed to continue its normal behaviors. All participants had indices of body composition, cardiorespiratory fitness (CRF) and cardiometabolic markers measured at baseline and post-intervention. Statistical differences between and within groups were determined by two-way analysis of variance (ANOVA) with repeated measures. Results: Following the school-based training program, BMI and body fat mass decreased (BMI: −1.8 kg/m² vs. −1.2 kg/m², P
- Published
- 2022
- Full Text
- View/download PDF
7. Effect of graphene nanosheets on interlaminar mechanical properties of carbon fiber reinforced metal laminates
- Author
-
ZHAO Changbao, CAO Meng, XUE Hongqian, HU Zonghao, ZHOU Zhiqiang, MENG Qingshi, and WANG Shuo
- Subjects
carbon-reinforced aluminum laminates ,graphene platelets ,interface properties ,mechanical properties ,strengthening mechanism ,Motor vehicles. Aeronautics. Astronautics ,TL1-4050 - Abstract
Aiming at the problem of weak bond strength between the metal/resin/fiber layers of carbon fiber reinforced aluminum laminates (CARALL), this paper proposes a new preparation method to improve the interlaminar bond strength. In this method, graphene platelets (GnPs) at different mass fractions (0%, 0.1%, 0.3%, 0.5% and 1.0%) are uniformly dispersed in epoxy resin by ultrasonic dispersion, and the CARALL laminates are fabricated by the wet layup method. Mode I fracture toughness tests were carried out to explore the influence of GnPs on interlaminar performance, and tensile and flexural tests were conducted to study the influence of GnPs on the mechanical properties of CARALL. The strengthening mechanism of GnPs and the failure modes of CARALL specimens were observed by SEM and optical imaging. The results show that CARALL has the best interlaminar strength and mechanical properties at a GnP content of 0.5%: the mode I fracture toughness is increased by 79%; the tensile strength, Young's modulus, and strain at break are increased by 14.5%, 11.0%, and 15.5%, respectively; and the flexural strength and flexural strain are increased by 20.5% and 89.7%, respectively. This is because GnPs added to the epoxy resin disperse the load carried by CARALL and absorb energy through their own fracture, pull-out and debonding mechanisms, further improving the interlaminar mechanical properties of CARALL.
- Published
- 2022
- Full Text
- View/download PDF
8. Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models
- Author
-
Cao, Meng, Liu, Yuyang, Liu, Yingfei, Wang, Tiancai, Dong, Jiahua, Ding, Henghui, Zhang, Xiangyu, Reid, Ian, and Liang, Xiaodan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Instruction tuning constitutes a prevalent technique for tailoring Large Vision Language Models (LVLMs) to meet individual task requirements. To date, most of the existing approaches are confined to single-task adaptation, whereas the requirements in real-world scenarios are inherently varied and continually evolving. Thus an ideal LVLM should sustain continual instruction tuning in the face of stream-task distributions (i.e., different domains, emerging capabilities, and new datasets) while minimizing the forgetting of previously acquired knowledge. To achieve this, we propose a new benchmark for COntinuAl inStruction Tuning on LVLMs (COAST), which encompasses the aforementioned domain-incremental, capability-incremental, and dataset-incremental configurations. In terms of methodology, we propose Continual LLaVA, a rehearsal-free method tailored for continual instruction tuning in LVLMs. To circumvent the additional overhead associated with experience replay, we freeze LVLMs and construct the dual increment embeddings for each input instruction to facilitate parameter-efficient tuning. Specifically, the increment embeddings can be decomposed into two principal components: 1) intrinsic increment embeddings to encode task-specific characteristics. To achieve this, we set up a low-rank pool containing candidate embeddings, from which we select the relevant ones based on their similarity with the user instructions; 2) contextual increment embeddings to investigate the inter-dependencies across tasks. In this regard, the low-rank embeddings chosen in the previous tasks are aggregated via learnable weighted sum to provide complementary hints. Extensive experiments indicate that the proposed Continual LLaVA outperforms previous methods by significantly reducing the forgetting during the continual instruction tuning process.
- Published
- 2024
9. Synth4Seg -- Learning Defect Data Synthesis for Defect Segmentation using Bi-level Optimization
- Author
-
Mou, Shancong, Vemulapalli, Raviteja, Li, Shiyu, Liu, Yuxuan, Thomas, C, Cao, Meng, Bai, Haoping, Tuzel, Oncel, Huang, Ping, Shan, Jiulong, and Shi, Jianjun
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Defect segmentation is crucial for quality control in advanced manufacturing, yet data scarcity poses challenges for state-of-the-art supervised deep learning. Synthetic defect data generation is a popular approach for mitigating data challenges. However, many current methods simply generate defects following a fixed set of rules, which may not directly relate to downstream task performance. This can lead to suboptimal performance and may even hinder the downstream task. To solve this problem, we leverage a novel bi-level optimization-based synthetic defect data generation framework. We use an online synthetic defect generation module grounded in the commonly-used Cut&Paste framework, and adopt an efficient gradient-based optimization algorithm to solve the bi-level optimization problem. We achieve simultaneous training of the defect segmentation network, and learn various parameters of the data synthesis module by maximizing the validation performance of the trained defect segmentation network. Our experimental results on benchmark datasets under limited data settings show that the proposed bi-level optimization method can be used for learning the most effective locations for pasting synthetic defects, thereby improving the segmentation performance by up to 18.3% when compared to pasting defects at random locations. We also demonstrate up to 2.6% performance gain by learning the importance weights for different augmentation-specific defect data sources when compared to giving equal importance to all the data sources.
- Published
- 2024
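The core bi-level loop in the Synth4Seg abstract — learn where to paste synthetic defects by maximizing the validation performance of the downstream model — can be caricatured in 1-D. This gradient-free grid search over a single paste-location parameter is a stand-in for the paper's gradient-based solver; the "trained detector" and the data are toy assumptions.

```python
import random

random.seed(0)

def train_and_validate(paste_pos, val_set):
    """Inner level (stand-in for training): a detector trained on defects
    pasted at paste_pos only looks near that location; score it on real
    validation samples."""
    hits = 0
    for img, true_pos in val_set:
        predicted = max(range(len(img)),
                        key=lambda i: img[i] if abs(i - paste_pos) <= 2 else -1.0)
        hits += predicted == true_pos
    return hits / len(val_set)

# Real defects cluster around index 7; the validation set reflects that.
val_set = []
for _ in range(50):
    img = [random.uniform(0.0, 0.3) for _ in range(10)]
    pos = random.choice([6, 7, 8])
    img[pos] = 0.9  # the real defect (a bright spike)
    val_set.append((img, pos))

# Outer level: pick the paste location that maximizes validation performance.
best_pos = max(range(10), key=lambda p: train_and_validate(p, val_set))
print(best_pos)  # lands in the region where real defects actually occur
```

The point of the sketch is the nesting: the synthesis parameter is chosen by how well the *downstream* model validates, not by any fixed pasting rule.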
10. How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?
- Author
-
Dong, Jiahua, Liang, Wenqi, Li, Hongliu, Zhang, Duzhen, Cao, Meng, Ding, Henghui, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Custom diffusion models (CDMs) have attracted widespread attention due to their astonishing generative ability for personalized concepts. However, most existing CDMs unreasonably assume that personalized concepts are fixed and cannot change over time. Moreover, they heavily suffer from catastrophic forgetting and concept neglect on old personalized concepts when continually learning a series of new concepts. To address these challenges, we propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM), which can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner. Specifically, to surmount the catastrophic forgetting of old concepts, we develop a concept consolidation loss and an elastic weight aggregation module. They can explore task-specific and task-shared knowledge during training, and aggregate all low-rank weights of old concepts based on their contributions during inference. Moreover, in order to address concept neglect, we devise a context-controllable synthesis strategy that leverages expressive region features and noise estimation to control the contexts of generated images according to user conditions. Experiments validate that our CIDM surpasses existing custom diffusion models. The source codes are available at https://github.com/JiahuaDong/CIFC., Comment: Accepted to NeurIPS2024
- Published
- 2024
11. ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
- Author
-
Zhang, Haoran, Guo, Hangyu, Guo, Shuyue, Cao, Meng, Huang, Wenhao, Liu, Jiaheng, and Zhang, Ge
- Subjects
Computer Science - Computation and Language
- Abstract
As multimodal large language models (MLLMs) continue to demonstrate increasingly competitive performance across a broad spectrum of tasks, more intricate and comprehensive benchmarks have been developed to assess these cutting-edge models. These benchmarks introduce new challenges to core capabilities such as perception, reasoning, and planning. However, existing multimodal benchmarks fall short in providing a focused evaluation of multi-step planning based on spatial relationships in images. To bridge this gap, we present ING-VP, the first INteractive Game-based Vision Planning benchmark, specifically designed to evaluate the spatial imagination and multi-step reasoning abilities of MLLMs. ING-VP features 6 distinct games, encompassing 300 levels, each with 6 unique configurations. A single model engages in over 60,000 rounds of interaction. The benchmark framework allows for multiple comparison settings, including image-text vs. text-only inputs, single-step vs. multi-step reasoning, and with-history vs. without-history conditions, offering valuable insights into the model's capabilities. We evaluated numerous state-of-the-art MLLMs, with the highest-performing model, Claude-3.5 Sonnet, achieving an average accuracy of only 3.37%, far below the anticipated standard. This work aims to provide a specialized evaluation framework to drive advancements in MLLMs' capacity for complex spatial reasoning and planning. The code is publicly available at https://github.com/Thisisus7/ING-VP.git., Comment: 49 pages, 12 figures
- Published
- 2024
12. TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
- Author
-
Liu, Aiwei, Bai, Haoping, Lu, Zhiyun, Sun, Yanchao, Kong, Xiang, Wang, Simon, Shan, Jiulong, Jose, Albin Madappally, Liu, Xiaojiang, Wen, Lijie, Yu, Philip S., and Cao, Meng
- Subjects
Computer Science - Computation and Language, 68T50, I.2.7
- Abstract
Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the optimal dataset is unavailable in practice, we propose using the original dataset for importance sampling to achieve unbiased optimization. Accordingly, we propose a token-level importance sampling DPO objective named TIS-DPO that assigns importance weights to each token based on its reward. Inspired by previous works, we estimate the token importance weights using the difference in prediction probabilities from a pair of contrastive LLMs. We explore three methods to construct these contrastive LLMs: (1) guiding the original LLM with contrastive prompts, (2) training two separate LLMs using winning and losing responses, and (3) performing forward and reverse DPO training with winning and losing responses. Experiments show that TIS-DPO significantly outperforms various baseline methods on harmlessness and helpfulness alignment and summarization tasks. We also visualize the estimated weights, demonstrating their ability to identify key token positions., Comment: 27 pages, 7 figures, 2 tables
- Published
- 2024
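The token-weighting idea in the TIS-DPO abstract — importance weights estimated from the prediction-probability gap between a pair of contrastive LLMs — can be sketched numerically. The per-token log-probabilities, the clipping range, and the exponentiated-difference weight here are illustrative assumptions, not the paper's exact estimator.

```python
import math

def token_weights(logp_pos, logp_neg, lo=0.5, hi=2.0):
    """Per-token importance from the probability gap between a pair of
    contrastive models, clipped to a stable range."""
    return [min(hi, max(lo, math.exp(lp - ln)))
            for lp, ln in zip(logp_pos, logp_neg)]

def weighted_sequence_reward(logp_policy, logp_ref, weights):
    """Token-level importance-weighted log-ratio (the quantity a DPO-style
    objective compares between winning and losing responses)."""
    return sum(w * (lp - lr)
               for w, lp, lr in zip(weights, logp_policy, logp_ref))

# Hypothetical per-token log-probs for a 4-token response.
logp_pos = [-1.0, -0.5, -2.0, -0.8]  # contrastive model favouring the response
logp_neg = [-1.2, -1.5, -2.1, -0.9]  # contrastive model disfavouring it
w = token_weights(logp_pos, logp_neg)

logp_policy = [-0.9, -0.4, -1.9, -0.7]
logp_ref = [-1.0, -0.6, -2.0, -0.8]
r = weighted_sequence_reward(logp_policy, logp_ref, w)
print([round(x, 2) for x in w], round(r, 3))
```

Token 2, where the two contrastive models disagree most, receives the largest (clipped) weight, so its log-ratio contributes most to the sequence-level comparison.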
13. Contrastive Localized Language-Image Pre-Training
- Author
-
Chen, Hong-You, Lai, Zhengfeng, Zhang, Haotian, Wang, Xinze, Eichner, Marcin, You, Keen, Cao, Meng, Zhang, Bowen, Yang, Yinfei, and Gan, Zhe
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Contrastive Language-Image Pre-training (CLIP) has been a celebrated method for training vision encoders to generate image/text representations facilitating various applications. Recently, CLIP has been widely adopted as the vision backbone of multimodal large language models (MLLMs) to connect image inputs for language interactions. The success of CLIP as a vision-language foundation model relies on aligning web-crawled noisy text annotations at image levels. Nevertheless, such criteria may become insufficient for downstream tasks in need of fine-grained vision representations, especially when region-level understanding is demanding for MLLMs. In this paper, we improve the localization capability of CLIP with several advances. We propose a pre-training method called Contrastive Localized Language-Image Pre-training (CLOC) by complementing CLIP with region-text contrastive loss and modules. We formulate a new concept, promptable embeddings, of which the encoder produces image embeddings easy to transform into region representations given spatial hints. To support large-scale pre-training, we design a visually-enriched and spatially-localized captioning framework to effectively generate region-text pseudo-labels at scale. By scaling up to billions of annotated images, CLOC enables high-quality regional embeddings for image region recognition and retrieval tasks, and can be a drop-in replacement of CLIP to enhance MLLMs, especially on referring and grounding tasks., Comment: Preprint
- Published
- 2024
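CLOC complements CLIP's image-level objective with region-text contrastive terms. As background, the standard CLIP-style symmetric contrastive (InfoNCE) loss that such objectives build on can be sketched as follows; the similarity values are made up and temperature scaling is omitted.

```python
import math

def contrastive_loss(sim):
    """Symmetric InfoNCE loss over an image-text similarity matrix;
    matched pairs lie on the diagonal."""
    n = len(sim)
    def ce_rows(m):
        total = 0.0
        for i in range(n):
            z = sum(math.exp(v) for v in m[i])          # softmax normalizer
            total += -math.log(math.exp(m[i][i]) / z)   # cross-entropy on diagonal
        return total / n
    cols = [[sim[j][i] for j in range(n)] for i in range(n)]  # transpose
    return 0.5 * (ce_rows(sim) + ce_rows(cols))  # image->text + text->image

# Hypothetical similarities: matched pairs (diagonal) score higher.
sim = [[5.0, 1.0, 0.0],
       [0.5, 4.0, 1.0],
       [0.0, 1.0, 6.0]]
print(round(contrastive_loss(sim), 3))  # small, since matched pairs dominate
```

CLOC's addition is to apply the same kind of loss between region embeddings (produced from its promptable image embeddings plus spatial hints) and region-level text, rather than only whole images and whole captions.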
14. Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
- Author
-
Lai, Zhengfeng, Saveris, Vasileios, Chen, Chen, Chen, Hong-You, Zhang, Haotian, Zhang, Bowen, Tebar, Juan Lao, Hu, Wenze, Gan, Zhe, Grasch, Peter, Cao, Meng, and Yang, Yinfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. For example, while synthetic captions often provide superior quality and image-text alignment, it is not clear whether they can fully replace AltTexts: the role of synthetic captions and their interaction with original web-crawled AltTexts in pre-training is still not well understood. Moreover, different multimodal foundation models may have unique preferences for specific caption formats, but efforts to identify the optimal captions for each model remain limited. In this work, we propose a novel, controllable, and scalable captioning pipeline designed to generate diverse caption formats tailored to various multimodal models. By examining Short Synthetic Captions (SSC) towards Dense Synthetic Captions (DSC+) as case studies, we systematically explore their effects and interactions with AltTexts across models such as CLIP, multimodal LLMs, and diffusion models. Our findings reveal that a hybrid approach that keeps both synthetic captions and AltTexts can outperform the use of synthetic captions alone, improving both alignment and performance, with each model demonstrating preferences for particular caption formats. This comprehensive analysis provides valuable insights into optimizing captioning strategies, thereby advancing the pre-training of multimodal foundation models., Comment: CV/ML
- Published
- 2024
15. Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI
- Author
-
Zhang, Liang, Lin, Jionghao, Sabatini, John, Borchers, Conrad, Weitekamp, Daniel, Cao, Meng, Hollander, John, Hu, Xiangen, and Graesser, Arthur C.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Learning performance data describe correct and incorrect answers or problem-solving attempts in adaptive learning, such as in intelligent tutoring systems (ITSs). Learning performance data tend to be highly sparse (80%–90% missing observations) in most real-world applications due to adaptive item selection. This data sparsity presents challenges to using learner models to effectively predict future performance and explore new hypotheses about learning. This article proposes a systematic framework for augmenting learner data to address data sparsity in learning performance data. First, learning performance is represented as a three-dimensional tensor of learners' questions, answers, and attempts, capturing longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation on knowledge tracing tasks that predict missing performance values based on real observations. Third, a module for generating simulated patterns of learning is used. This study contrasts two forms of generative Artificial Intelligence (AI), Generative Adversarial Networks (GANs) and Generative Pre-trained Transformers (GPT), to generate data associated with different clusters of learner data. We tested this approach on an adult literacy dataset from AutoTutor lessons developed for Adult Reading Comprehension (ARC). We found that: (1) tensor factorization improved the performance in tracing and predicting knowledge mastery compared with other knowledge tracing techniques without data augmentation, showing higher relative fidelity for this imputation method, and (2) the GAN-based simulation showed greater overall stability and less statistical bias based on a divergence evaluation with varying simulation sample sizes compared to GPT.
- Published
- 2024
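The imputation step in this abstract can be sketched with a 2-D (learner × item) simplification of the paper's 3-D tensor: fit a low-rank factorization to the observed cells only, then read predictions off the reconstruction. The rank, learning rate, and toy score matrix are all assumptions for illustration.

```python
import random

random.seed(1)

def factorize_impute(M, rank=2, steps=3000, lr=0.05):
    """Fill missing entries (None) of a learner-by-item score matrix via
    gradient descent on a low-rank factorization, fitted only on observed
    cells (a 2-D analogue of tensor-factorization imputation)."""
    n, m = len(M), len(M[0])
    U = [[random.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[random.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if M[i][j] is None:
                    continue                      # skip unobserved cells
                err = sum(U[i][k] * V[j][k] for k in range(rank)) - M[i][j]
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    U[i][k] -= lr * err * v       # squared-error gradient steps
                    V[j][k] -= lr * err * u
    return [[sum(U[i][k] * V[j][k] for k in range(rank)) for j in range(m)]
            for i in range(n)]

# 1 = correct, 0 = incorrect, None = unobserved (sparse performance data).
scores = [[1, 1, None], [1, 1, 1], [0, None, 0]]
filled = factorize_impute(scores)
print(round(filled[1][2], 2), round(filled[0][2], 2))
```

Observed cells are reconstructed closely, and missing cells inherit predictions from learners with similar answer patterns, which is what grounds the imputation in knowledge-tracing-style prediction.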
16. Effect of picture naming training on naming ability of patients with semantic dementia: a preliminary study
- Author
-
WANG Rui, QIAO Yu-chen, YANG Xuan, YOU Jing, CAO Meng-ge, and CHANG Hong
- Subjects
aphasia, primary progressive, frontotemporal lobar degeneration, language therapy, Neurology. Diseases of the nervous system, RC346-429
- Abstract
Objective: To explore the feasibility of picture naming training for improving the naming ability of patients with semantic dementia (SD). Methods: A total of 14 patients with semantic dementia whose first symptom was semantic memory impairment received picture naming training (30 min per session, twice a day, for 5 days) from October 2018 to December 2020 at Xuanwu Hospital, Capital Medical University. One day before and one day after the end of training, the Picture Naming Test and Boston Naming Test (BNT) were used to evaluate naming ability. Results: After training, the Picture Naming Test score (65.79 ± 34.54 vs. 49.79 ± 30.85; t = −3.297, P = 0.001) and BNT score [6.50 (4.75, 14.00) vs. 5.00 (3.00, 11.50); Z = −2.007, P = 0.045] were higher than before training. Conclusions: Picture naming training can effectively improve the naming ability of patients with semantic dementia.
- Published
- 2021
- Full Text
- View/download PDF
17. MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
- Author
-
Tang, Haoran, Cao, Meng, Huang, Jinfa, Liu, Ruyang, Jin, Peng, Li, Ge, and Liang, Xiaodan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries. Most existing TVR methods are based on large-scale pre-trained vision-language models (e.g., CLIP). However, due to the inherent plain structure of CLIP, few TVR methods explore the multi-scale representations which offer richer contextual information for a more thorough understanding. To this end, we propose MUSE, a multi-scale mamba with linear computational complexity for efficient cross-resolution modeling. Specifically, the multi-scale representations are generated by applying a feature pyramid on the last single-scale feature map. Then, we employ the Mamba structure as an efficient multi-scale learner to jointly learn scale-wise representations. Furthermore, we conduct comprehensive studies to investigate different model structures and designs. Extensive results on three popular benchmarks have validated the superiority of MUSE., Comment: 8 pages
- Published
- 2024
18. $\pi$ mode lasing in the non-Hermitian Floquet topological system
- Author
-
Shen, Shuang, Kartashov, Yaroslav V., Li, Yongdong, Cao, Meng, and Zhang, Yiqi
- Subjects
Physics - Optics, Nonlinear Sciences - Pattern Formation and Solitons
- Abstract
$\pi$ modes are unique topological edge states appearing in Floquet systems with periodic modulations of the underlying lattice structure in evolution variable, such as dynamically modulated Su-Schrieffer-Heeger (SSH) lattices. These edge states are anomalous states usually appearing between Floquet replicas of the same band, even if standard topological index remains zero for this band. While linear and nonlinear $\pi$ modes were observed in conservative systems, they have never been studied in nonlinear regime in the non-Hermitian systems with structured gain and losses. Here we show that SSH waveguide array with periodically oscillating waveguide positions in propagation direction and with parity-time symmetric refractive index landscape, can support $\pi$ modes that are damped or amplified at different ends of the array. By including nonlinearity and nonlinear absorption into our continuous system, we achieve stable lasing in $\pi$ mode at one end of the array. The representative feature of this system is that lasing in it is thresholdless and it occurs even at low gain-loss amplitudes. The degree of localization of lasing $\pi$ modes can be flexibly controlled by the amplitude of transverse waveguide oscillations. This work therefore introduces a new type of topological Floquet laser and a route to manipulation of $\pi$ modes by structured gain and losses., Comment: 9 pages, 3 figures, to appear in APL Photonics
- Published
- 2024
- Full Text
- View/download PDF
19. Apple Intelligence Foundation Language Models
- Author
-
Gunter, Tom, Wang, Zirui, Wang, Chong, Pang, Ruoming, Narayanan, Andy, Zhang, Aonan, Zhang, Bowen, Chen, Chen, Chiu, Chung-Cheng, Qiu, David, Gopinath, Deepak, Yap, Dian Ang, Yin, Dong, Nan, Feng, Weers, Floris, Yin, Guoli, Huang, Haoshuo, Wang, Jianyu, Lu, Jiarui, Peebles, John, Ye, Ke, Lee, Mark, Du, Nan, Chen, Qibin, Keunebroek, Quentin, Wiseman, Sam, Evans, Syd, Lei, Tao, Rathod, Vivek, Kong, Xiang, Du, Xianzhi, Li, Yanghao, Wang, Yongqiang, Gao, Yuan, Ahmed, Zaid, Xu, Zhaoyang, Lu, Zhiyun, Rashid, Al, Jose, Albin Madappally, Doane, Alec, Bencomo, Alfredo, Vanderby, Allison, Hansen, Andrew, Jain, Ankur, Anupama, Anupama Mann, Kamal, Areeba, Wu, Bugu, Brum, Carolina, Maalouf, Charlie, Erdenebileg, Chinguun, Dulhanty, Chris, Moritz, Dominik, Kang, Doug, Jimenez, Eduardo, Ladd, Evan, Shi, Fangping, Bai, Felix, Chu, Frank, Hohman, Fred, Kotek, Hadas, Coleman, Hannah Gillis, Li, Jane, Bigham, Jeffrey, Cao, Jeffery, Lai, Jeff, Cheung, Jessica, Shan, Jiulong, Zhou, Joe, Li, John, Qin, Jun, Singh, Karanjeet, Vega, Karla, Zou, Kelvin, Heckman, Laura, Gardiner, Lauren, Bowler, Margit, Cordell, Maria, Cao, Meng, Hay, Nicole, Shahdadpuri, Nilesh, Godwin, Otto, Dighe, Pranay, Rachapudi, Pushyami, Tantawi, Ramsey, Frigg, Roman, Davarnia, Sam, Shah, Sanskruti, Guha, Saptarshi, Sirovica, Sasha, Ma, Shen, Ma, Shuang, Wang, Simon, Kim, Sulgi, Jayaram, Suma, Shankar, Vaishaal, Paidi, Varsha, Kumar, Vivek, Wang, Xin, Zheng, Xin, Cheng, Walker, Shrager, Yael, Ye, Yang, Tanaka, Yasu, Guo, Yihao, Meng, Yunsong, Luo, Zhao Tang, Ouyang, Zhi, Aygar, Alp, Wan, Alvin, Walkingshaw, Andrew, Lin, Antonie, Farooq, Arsalan, Ramerth, Brent, Reed, Colorado, Bartels, Chris, Chaney, Chris, Riazati, David, Yang, Eric Liang, Feldman, Erin, Hochstrasser, Gabriel, Seguin, Guillaume, Belousova, Irina, Pelemans, Joris, Yang, Karen, Vahid, Keivan Alizadeh, Cao, Liangliang, Najibi, Mahyar, Zuliani, Marco, Horton, Max, Cho, Minsik, Bhendawade, Nikhil, Dong, Patrick, Maj, Piotr, Agrawal, Pulkit, Shan, Qi, 
Fu, Qichen, Poston, Regan, Xu, Sam, Liu, Shuangning, Rao, Sushma, Heeramun, Tashweena, Merth, Thomas, Rayala, Uday, Cui, Victor, Sridhar, Vivek Rangarajan, Zhang, Wencong, Zhang, Wenqi, Wu, Wentao, Zhou, Xingyu, Liu, Xinwen, Zhao, Yang, Xia, Yin, Ren, Zhile, and Ren, Zhongzheng
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.
- Published
- 2024
20. MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
- Author
-
Yin, Guoli, Bai, Haoping, Ma, Shuang, Nan, Feng, Sun, Yanchao, Xu, Zhaoyang, Ma, Shen, Lu, Jiarui, Kong, Xiang, Zhang, Aonan, Yap, Dian Ang, Zhang, Yizhe, Ahnert, Karsten, Kamath, Vik, Berglund, Mathias, Walsh, Dominic, Gindele, Tobias, Wiest, Juergen, Lai, Zhengfeng, Wang, Xiaoming, Shan, Jiulong, Cao, Meng, Pang, Ruoming, and Wang, Zirui
- Subjects
Computer Science - Artificial Intelligence - Abstract
Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern where failures stem from. Additionally, setting up these environments requires considerable effort, and issues of unreliability and reproducibility sometimes arise, especially in interactive tasks. To address these limitations, we introduce the Massive Multitask Agent Understanding (MMAU) benchmark, featuring comprehensive offline tasks that eliminate the need for complex environment setups. It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics, and covers five essential capabilities: Understanding, Reasoning, Planning, Problem-solving, and Self-correction. With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents. By testing 18 representative models on MMAU, we provide deep and insightful analyses. Ultimately, MMAU not only sheds light on the capabilities and limitations of LLM agents but also enhances the interpretability of their performance. Datasets and evaluation scripts of MMAU are released at https://github.com/apple/axlearn/tree/main/docs/research/mmau.
- Published
- 2024
21. SLRL: Structured Latent Representation Learning for Multi-view Clustering
- Author
-
Xiong, Zhangci and Cao, Meng
- Subjects
Computer Science - Machine Learning - Abstract
In recent years, Multi-View Clustering (MVC) has attracted increasing attention for its potential to reduce the annotation burden associated with large datasets. The aim of MVC is to exploit the inherent consistency and complementarity among different views, thereby integrating information from multiple perspectives to improve clustering outcomes. Despite extensive research in MVC, most existing methods focus predominantly on harnessing complementary information across views to enhance clustering effectiveness, often neglecting the structural information among samples, which is crucial for exploring sample correlations. To address this gap, we introduce a novel framework, termed Structured Latent Representation Learning based Multi-View Clustering method (SLRL). SLRL leverages both the complementary and structural information. Initially, it learns a common latent representation for all views. Subsequently, to exploit the structural information among samples, a k-nearest neighbor graph is constructed from this common latent representation. This graph facilitates enhanced sample interaction through graph learning techniques, leading to a structured latent representation optimized for clustering. Extensive experiments demonstrate that SLRL not only competes well with existing methods but also sets new benchmarks in various multi-view datasets.
- Published
- 2024
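The pipeline in the SLRL abstract (shared latent representation, then a k-nearest-neighbor graph over samples, then graph-based refinement) can be illustrated with a minimal sketch of the k-NN graph step. This is a generic illustration of the technique, not SLRL's actual implementation; `knn_graph` is a hypothetical helper.

```python
import numpy as np

def knn_graph(Z, k=3):
    """Build a symmetric k-nearest-neighbor adjacency matrix from
    row-wise sample embeddings Z (n_samples x n_features)."""
    n = Z.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-loops
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]      # k closest other samples
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)             # symmetrize

# Two well-separated blobs standing in for a "common latent representation".
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0, 0.1, (5, 4)), rng.normal(5, 0.1, (5, 4))])
A = knn_graph(Z, k=2)
```

With well-separated clusters, the resulting graph has no cross-cluster edges, which is exactly the structural information a downstream graph-learning step can exploit.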
22. $\mathcal{PT}$-symmetric photonic lattices with type-II Dirac cones
- Author
-
Tang, Qian, Belić, Milivoj R., Zhong, Hua, Cao, Meng, Li, Yongdong, and Zhang, Yiqi
- Subjects
Physics - Optics - Abstract
The type-II Dirac cone is a special feature of the band structure, whose Fermi level is represented by a pair of crossing lines. It has been demonstrated that such a structure is useful for investigating topological edge solitons, and more specifically, for mimicking Klein tunneling. However, it is still not clear what the interplay between type-II Dirac cones and the non-Hermiticity mechanism will result in. Here, this question is addressed; in particular, we report $\mathcal{PT}$-symmetric photonic lattices with type-II Dirac cones for the first time. We identify a slope-exceptional ring and name it the type-II exceptional ring. We display the restoration of the $\mathcal{PT}$ symmetry of the lattice by reducing the separation between the sites in the unit cell. Curiously, the amplitude of the beam during propagation in the non-Hermitian lattice with $\mathcal{PT}$ symmetry decays only because of diffraction, whereas in the $\mathcal{PT}$ symmetry-broken lattice it will be amplified, even though the beam still diffracts. This work establishes the link between the non-Hermiticity mechanism and the violation of Lorentz invariance in these physical systems., Comment: 5 pages, 4 figures, to appear in Optics Letters. Comments are welcome
- Published
- 2024
- Full Text
- View/download PDF
23. Integrating Attentional Factors and Spacing in Logistic Knowledge Tracing Models to Explore the Impact of Training Sequences on Category Learning
- Author
-
Cao, Meng, Pavlik Jr., Philip I., Chu, Wei, and Zhang, Liang
- Subjects
Computer Science - Computers and Society ,Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
In category learning, a growing body of literature has increasingly focused on exploring the impacts of interleaving in contrast to blocking. The sequential attention hypothesis posits that interleaving draws attention to the differences between categories while blocking directs attention toward similarities within categories. Although a recent study underscores the joint influence of memory and attentional factors on sequencing effects, there remains a scarcity of effective computational models integrating both attentional and memory considerations to comprehensively understand the effect of training sequences on students' performance. This study introduces a novel integration of attentional factors and spacing into the logistic knowledge tracing (LKT) models to monitor students' performance across different training sequences (interleaving and blocking). Attentional factors were incorporated by recording the counts of comparisons between adjacent trials, considering whether they belong to the same or different category. Several features were employed to account for temporal spacing. We used cross-validations to test the model fit and predictions on the learning session and posttest. Our findings reveal that incorporating both attentional factors and spacing features in the Additive Factors Model (AFM) significantly enhances its capacity to capture the effects of interleaving and blocking and demonstrates superior predictive accuracy for students' learning outcomes. By bridging the gap between attentional factors and memory processes, our computational approach offers a more comprehensive framework for understanding and predicting category learning outcomes in educational settings., Comment: 7 pages, 3 figures, Educational Data Mining 2024
- Published
- 2024
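The attentional features described in the abstract above, i.e. cumulative counts of same-category versus different-category comparisons between adjacent trials, can be sketched in a few lines. `comparison_counts` is a hypothetical helper for illustration, not code from the paper.

```python
def comparison_counts(categories):
    """For each trial, return cumulative counts of adjacent-trial
    comparisons seen so far: (same-category, different-category).
    A 'comparison' is the transition from trial t-1 to trial t."""
    same = diff = 0
    feats = []
    for t, c in enumerate(categories):
        if t > 0:
            if c == categories[t - 1]:
                same += 1
            else:
                diff += 1
        feats.append((same, diff))
    return feats

# Blocked sequence AAABBB: mostly same-category comparisons.
blocked = comparison_counts(list("AAABBB"))
# Interleaved sequence ABABAB: only different-category comparisons.
interleaved = comparison_counts(list("ABABAB"))
```

Features like these can then enter a logistic knowledge tracing model (e.g. AFM) as additional predictors alongside spacing terms.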
24. Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps
- Author
-
Chen, Jian, Zhou, Peilin, Hua, Yining, Chong, Dading, Cao, Meng, Li, Yaowei, Yuan, Zixuan, Zhu, Bing, and Liang, Junwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, thereby introducing a more precise and automated solution. Leveraging Vision-Language Models (VLM) to simultaneously process visual and textual data, we offer an effective aid to enhance the analysis process of weather heatmaps. Our initial assessment of general-purpose VLMs (e.g., GPT-4-Vision) on EWED revealed poor performance, characterized by low accuracy and frequent hallucinations due to inadequate color differentiation and insufficient meteorological knowledge. To address these challenges, we introduce ClimateIQA, the first meteorological VQA dataset, which includes 8,760 wind gust heatmaps and 254,040 question-answer pairs covering four question types, both generated from the latest climate reanalysis data. We also propose Sparse Position and Outline Tracking (SPOT), an innovative technique that leverages OpenCV and K-Means clustering to capture and depict color contours in heatmaps, providing ClimateIQA with more accurate color spatial location information. Finally, we present Climate-Zoo, the first meteorological VLM collection, which adapts VLMs to meteorological applications using the ClimateIQA dataset. Experiment results demonstrate that models from Climate-Zoo substantially outperform state-of-the-art general VLMs, achieving an accuracy increase from 0% to over 90% in EWED verification. The datasets and models in this study are publicly available for future climate science research: https://github.com/AlexJJJChen/Climate-Zoo.
- Published
- 2024
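The clustering step the abstract attributes to SPOT, grouping heatmap pixels by color with K-Means before tracing contours, can be approximated with a plain Lloyd's iteration. This is a generic numpy sketch under stated assumptions (deterministic initialization, toy RGB data), not the paper's OpenCV-based implementation.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain Lloyd's K-Means: a stand-in for the clustering step that
    groups heatmap pixels by color before contour extraction."""
    # Evenly spaced initial centers keep this sketch deterministic.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

# Toy "heatmap": half the pixels red-ish, half blue-ish (RGB rows).
pixels = np.vstack([
    np.tile([1.0, 0.0, 0.0], (50, 1)),
    np.tile([0.0, 0.0, 1.0], (50, 1)),
])
labels, centers = kmeans(pixels, k=2)
```

Each cluster's pixel coordinates could then be summarized into the sparse position/outline description that the dataset pairs with its question-answer annotations.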
25. Textual Inversion and Self-supervised Refinement for Radiology Report Generation
- Author
-
Luo, Yuanjiang, Li, Hongxiang, Wu, Xuan, Cao, Meng, Huang, Xiaoshuang, Zhu, Zhihong, Liao, Peixi, Chen, Hu, and Zhang, Yi
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we propose Textual Inversion and Self-supervised Refinement (TISR) to address these two issues. Specifically, textual inversion projects text and images into the same space by representing images as pseudo words, eliminating the cross-modal gap. Subsequently, self-supervised refinement refines these pseudo words through contrastive loss computation between images and texts, enhancing the fidelity of generated reports to images. Notably, TISR is orthogonal to most existing methods and is plug-and-play. We conduct experiments on two widely-used public datasets and achieve significant improvements on various baselines, which demonstrates the effectiveness and generalization of TISR. The code will be available soon., Comment: This paper has been early accepted by MICCAI 2024!
- Published
- 2024
26. Uncertainty-aware sign language video retrieval with probability distribution modeling
- Author
-
Wu, Xuan, Li, Hongxiang, Luo, Yuanjiang, Cheng, Xuxin, Zhuang, Xianwei, Cao, Meng, and Fu, Keren
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Information Retrieval - Abstract
Sign language video retrieval plays a key role in facilitating information access for the deaf community. Despite significant advances in video-text retrieval, the complexity and inherent uncertainty of sign language preclude the direct application of these techniques. Previous methods achieve the mapping between sign language video and text through fine-grained modal alignment. However, due to the scarcity of fine-grained annotation, the uncertainty inherent in sign language video is underestimated, limiting the further development of sign language retrieval tasks. To address this challenge, we propose a novel Uncertainty-aware Probability Distribution Retrieval (UPRet), which conceptualizes the mapping process of sign language video and text in terms of probability distributions, explores their potential interrelationships, and enables flexible mappings. Experiments on three benchmarks demonstrate the effectiveness of our method, which achieves state-of-the-art results on How2Sign (59.1%), PHOENIX-2014T (72.0%), and CSL-Daily (78.4%).
- Published
- 2024
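One common way to realize the distributional matching the UPRet abstract describes is to embed each video and text as a diagonal Gaussian and compare them with the closed-form 2-Wasserstein distance. This is a standard modeling choice shown for illustration, not necessarily the paper's exact formulation; all values below are hypothetical.

```python
import numpy as np

def w2_diag_gauss(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between Gaussians with diagonal
    covariance (sigma* are std-dev vectors):
    W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2."""
    mu1, sigma1, mu2, sigma2 = map(np.asarray, (mu1, sigma1, mu2, sigma2))
    return np.sqrt(((mu1 - mu2) ** 2).sum() + ((sigma1 - sigma2) ** 2).sum())

# A "video" distribution compared against two candidate "text" distributions.
d_match = w2_diag_gauss([0.0, 1.0], [0.5, 0.5], [0.1, 1.0], [0.5, 0.6])
d_other = w2_diag_gauss([0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [0.1, 0.1])
```

Ranking candidates by this distance instead of a point-embedding cosine score is what makes the mapping "flexible": the learned variances express how uncertain each sign segment is.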
27. RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
- Author
-
Cao, Meng, Tang, Haoran, Huang, Jinfa, Jin, Peng, Zhang, Can, Liu, Ruyang, Chen, Long, Liang, Xiaodan, Yuan, Li, and Li, Ge
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods perform image-to-video transfer learning based on large-scale pre-trained vision-language models (e.g., CLIP). However, fully fine-tuning these pre-trained models for TVR incurs prohibitively expensive computation costs. To this end, we propose to conduct efficient text-video Retrieval with a sparse-and-correlated AdaPter (RAP), i.e., fine-tuning the pre-trained model with a few parameterized layers. To accommodate the text-video scenario, we equip our RAP with two indispensable characteristics: temporal sparsity and correlation. Specifically, we propose a low-rank modulation module to refine the per-image features from the frozen CLIP backbone, which accentuates salient frames within the video features while alleviating temporal redundancy. Besides, we introduce an asynchronous self-attention mechanism that first selects the top responsive visual patches and augments the correlation modeling between them with learnable temporal and patch offsets. Extensive experiments on four TVR datasets demonstrate that RAP achieves superior or comparable performance compared to the fully fine-tuned counterpart and other parameter-efficient fine-tuning methods., Comment: Accepted by ACL 2024 Findings
- Published
- 2024
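The parameter-efficient idea underlying adapters like RAP, a frozen backbone weight plus a small trainable low-rank residual, can be sketched in a few lines. The names and shapes below are hypothetical and this is the generic low-rank adapter pattern, not RAP's actual module.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2

W_frozen = rng.normal(size=(d_in, d_out))   # pre-trained weight, kept fixed
A = rng.normal(size=(d_in, rank)) * 0.01    # trainable down-projection
B = np.zeros((rank, d_out))                 # trainable up-projection, zero-init

def adapted_forward(x, scale=1.0):
    """Frozen path plus a low-rank residual; only A and B are trained,
    so the adapter adds rank * (d_in + d_out) parameters, not d_in * d_out."""
    return x @ W_frozen + scale * (x @ A @ B)

x = rng.normal(size=(4, d_in))
y0 = adapted_forward(x)   # B is zero-initialized, so this equals the frozen path
```

Zero-initializing `B` is the usual trick that makes training start exactly from the pre-trained model's behavior.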
28. The $\sigma$ hulls of matrix-product codes and related entanglement-assisted quantum error-correcting codes
- Author
-
Cao, Meng
- Subjects
Computer Science - Information Theory - Abstract
Let $\mathrm{SLAut}(\mathbb{F}_{q}^{n})$ denote the group of all semilinear isometries on $\mathbb{F}_{q}^{n}$, where $q=p^{e}$ is a prime power. Matrix-product (MP) codes are a class of long classical codes generated by combining several commensurate classical codes with a defining matrix. We give an explicit formula for calculating the dimension of the $\sigma$ hull of a MP code. As a result, we give necessary and sufficient conditions for the MP codes to be $\sigma$ dual-containing and $\sigma$ self-orthogonal. We prove that $\mathrm{dim}_{\mathbb{F}_{q}}(\mathrm{Hull}_{\sigma}(\mathcal{C}))=\mathrm{dim}_{\mathbb{F}_{q}}(\mathrm{Hull}_{\sigma}(\mathcal{C}^{\bot_{\sigma}}))$. We prove that for any integer $h$ with $\mathrm{max}\{0,k_{1}-k_{2}\}\leq h\leq \mathrm{dim}_{\mathbb{F}_{q}}(\mathcal{C}_{1}\cap\mathcal{C}_{2}^{\bot_{\sigma}})$, there exists a linear code $\mathcal{C}_{2,h}$ monomially equivalent to $\mathcal{C}_{2}$ such that $\mathrm{dim}_{\mathbb{F}_{q}}(\mathcal{C}_{1}\cap\mathcal{C}_{2,h}^{\bot_{\sigma}})=h$, where $\mathcal{C}_{i}$ is an $[n,k_{i}]_{q}$ linear code for $i=1,2$. We show that given an $[n,k,d]_{q}$ linear code $\mathcal{C}$, there exists a monomially equivalent $[n,k,d]_{q}$ linear code $\mathcal{C}_{h}$, whose $\sigma$ dual code has minimum distance $d'$, such that there exist an $[[n,k-h,d;n-k-h]]_{q}$ EAQECC and an $[[n,n-k-h,d';k-h]]_{q}$ EAQECC for every integer $h$ with $0\leq h\leq \mathrm{dim}_{\mathbb{F}_{q}}(\mathrm{Hull}_{\sigma}(\mathcal{C}))$. Based on this result, we present a general construction method for deriving EAQECCs with flexible parameters from MP codes related to $\sigma$ hulls.
- Published
- 2024
29. Special matrices over finite fields and their applications to quantum error-correcting codes
- Author
-
Cao, Meng
- Subjects
Computer Science - Information Theory - Abstract
The matrix-product (MP) code $\mathcal{C}_{A,k}:=[\mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{k}]\cdot A$ with a non-singular by column (NSC) matrix $A$ plays an important role in constructing good quantum error-correcting codes. In this paper, we study the MP code when the defining matrix $A$ satisfies the condition that $AA^{\dag}$ is $(D,\tau)$-monomial. We give an explicit formula for calculating the dimension of the Hermitian hull of a MP code. We provide the necessary and sufficient conditions that a MP code is Hermitian dual-containing (HDC), almost Hermitian dual-containing (AHDC), Hermitian self-orthogonal (HSO), almost Hermitian self-orthogonal (AHSO), and Hermitian LCD, respectively. We theoretically determine the number of all possible ways involving the relationships among the constituent codes to yield a MP code with these properties, respectively. We give alternative necessary and sufficient conditions for a MP code to be AHDC and AHSO, respectively, and show several cases where a MP code is not AHDC or AHSO. We provide the construction methods of HDC and AHDC MP codes, including those with optimal minimum distance lower bounds.
- Published
- 2024
30. Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
- Author
-
Huang, Xiaoshuang, Li, Hongxiang, Cao, Meng, Chen, Long, You, Chenyu, and An, Dong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent developments underscore the potential of textual information in enhancing learning models for a deeper understanding of medical visual semantics. However, language-guided medical image segmentation still faces a challenging issue. Previous works employ implicit and ambiguous architectures to embed textual information. This leads to segmentation results that are inconsistent with the semantics represented by the language, sometimes even diverging significantly. To this end, we propose a novel cross-modal conditioned Reconstruction for Language-guided Medical Image Segmentation (RecLMIS) to explicitly capture cross-modal interactions, which assumes that well-aligned medical visual features and medical notes can effectively reconstruct each other. We introduce conditioned interaction to adaptively predict patches and words of interest. Subsequently, they are utilized as conditioning factors for mutual reconstruction to align with regions described in the medical notes. Extensive experiments demonstrate the superiority of our RecLMIS, surpassing LViT by 3.74% mIoU on the publicly available MosMedData+ dataset and achieving an average increase of 1.89% mIoU for cross-domain tests on our QATA-CoV19 dataset. Simultaneously, we achieve a relative reduction of 20.2% in parameter count and a 55.5% decrease in computational load. The code will be available at https://github.com/ShashankHuang/RecLMIS.
- Published
- 2024
31. Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
- Author
-
Yu, Lei, Cao, Meng, Cheung, Jackie Chi Kit, and Dong, Yue
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. To explore the mechanistic causes of these hallucinations, we create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations. We discover two general and distinct mechanistic causes of hallucinations shared across LMs (Llama-2, Pythia, GPT-J): 1) knowledge enrichment hallucinations: insufficient subject attribute knowledge in lower layer MLPs, and 2) answer extraction hallucinations: failure to select the correct object attribute in upper layer attention heads. We also found these two internal mechanistic causes of hallucinations are reflected in external manifestations. Based on insights from our mechanistic analysis, we propose a novel hallucination mitigation method through targeted restoration of the LM's internal fact recall pipeline, demonstrating superior performance compared to baselines.
- Published
- 2024
32. depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
- Author
-
You, Kaichao, Bai, Runsheng, Cao, Meng, Wang, Jianmin, Stoica, Ion, and Long, Mingsheng
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Programming Languages - Abstract
PyTorch \texttt{2.x} introduces a compiler designed to accelerate deep learning programs. However, for machine learning researchers, using the PyTorch compiler to its full potential can be challenging. The compiler operates at the Python bytecode level, making it appear as an opaque box. To address this, we introduce \texttt{depyf}, a tool designed to demystify the inner workings of the PyTorch compiler. \texttt{depyf} decompiles bytecode generated by PyTorch back into equivalent source code, and establishes connections between in-memory code objects and their on-disk source code counterparts. This feature enables users to step through the source code line by line using debuggers, thus enhancing their understanding of the underlying processes. Notably, \texttt{depyf} is non-intrusive and user-friendly, primarily relying on two convenient context managers for its core functionality. The project is \href{https://github.com/thuml/depyf}{ openly available} and is recognized as a \href{https://pytorch.org/ecosystem/}{PyTorch ecosystem project}., Comment: 16 pages, 2 figures
- Published
- 2024
33. Predicting Learning Performance with Large Language Models: A Study in Adult Literacy
- Author
-
Zhang, Liang, Lin, Jionghao, Borchers, Conrad, Sabatini, John, Hollander, John, Cao, Meng, and Hu, Xiangen
- Subjects
Computer Science - Computers and Society ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Intelligent Tutoring Systems (ITSs) have significantly enhanced adult literacy training, a key factor for societal participation, employment opportunities, and lifelong learning. Our study investigates the application of advanced AI models, including Large Language Models (LLMs) like GPT-4, for predicting learning performance in adult literacy programs in ITSs. This research is motivated by the potential of LLMs to predict learning performance based on their inherent reasoning and computational capabilities. By using reading comprehension datasets from the ITS, AutoTutor, we evaluate the predictive capabilities of GPT-4 versus traditional machine learning methods in predicting learning performance through five-fold cross-validation techniques. Our findings show that GPT-4 presents competitive predictive abilities compared with traditional machine learning methods such as Bayesian Knowledge Tracing, Performance Factor Analysis, Sparse Factor Analysis Lite (SPARFA-Lite), tensor factorization and eXtreme Gradient Boosting (XGBoost). While XGBoost (trained on local machine) outperforms GPT-4 in predictive accuracy, GPT-4-selected XGBoost and its subsequent tuning on the GPT-4 platform demonstrate superior performance compared to local machine execution. Moreover, our investigation into hyper-parameter tuning by GPT-4 versus grid-search suggests comparable performance, albeit with less stability in the automated approach, using XGBoost as the case study. Our study contributes to the field by highlighting the potential of integrating LLMs with traditional machine learning models to enhance predictive accuracy and personalize adult literacy education, setting a foundation for future research in applying LLMs within ITSs., Comment: 26TH International Conference on Human-Computer Interaction
- Published
- 2024
34. Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering
- Author
-
Wang, Yu, Yao, Xinjie, Zhu, Pengfei, Li, Weihao, Cao, Meng, and Hu, Qinghua
- Published
- 2024
- Full Text
- View/download PDF
35. Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation
- Author
-
Liu, Aiwei, Bai, Haoping, Lu, Zhiyun, Kong, Xiang, Wang, Simon, Shan, Jiulong, Cao, Meng, and Wen, Lijie
- Subjects
Computer Science - Computation and Language ,68T50 ,I.2.7 - Abstract
Aligning large language models (LLMs) with human expectations without human-annotated preference data is an important problem. In this paper, we propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs, which could achieve better performance on LLaMA2-7B and LLaMA2-13B compared to RLAIF. Based on this, we propose an automatic alignment method, Direct Large Model Alignment (DLMA). First, we use contrastive prompt pairs to automatically generate preference data. Then, we continue to evaluate the generated preference data using contrastive prompt pairs and calculate a self-rewarding score. Finally, we use the DPO algorithm to effectively align LLMs by combining this self-rewarding score. In the experimental stage, our DLMA method could surpass the \texttt{RLHF} method without relying on human-annotated preference data., Comment: 24 pages, 5 figures
- Published
- 2024
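The self-rewarding score described above, contrasting a response's likelihood under a pair of contrastive prompts, can be illustrated with toy log-probabilities. The helper names and all numeric values below are hypothetical; a real implementation would read per-token log-probs from the model itself.

```python
def response_logprob(token_logprobs):
    """Sum of per-token log-probabilities of a response under one prompt."""
    return sum(token_logprobs)

def self_reward(lp_under_pos, lp_under_neg):
    """Contrast a response's log-likelihood under a 'positive' prompt
    (e.g. 'give a helpful, harmless answer') against a 'negative' one.
    Higher means the response looks more like positive-prompt text."""
    return response_logprob(lp_under_pos) - response_logprob(lp_under_neg)

# Hypothetical token log-probs for two candidate responses.
score_a = self_reward([-0.2, -0.3, -0.1], [-1.0, -1.2, -0.9])
score_b = self_reward([-0.9, -1.1, -0.8], [-0.4, -0.3, -0.5])
preferred = "A" if score_a > score_b else "B"
```

Pairs ranked this way can then feed directly into a DPO-style objective, since DPO only needs which response of a pair is preferred.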
36. Dynamical quantum state tomography with time-dependent channels
- Author
-
Cao, Meng and Wang, Yu
- Subjects
Quantum Physics - Abstract
In this paper, we establish a dynamical quantum state tomography framework. Under this framework, it is feasible to obtain complete knowledge of any unknown state of a $d$-level system via only an arbitrary operator of certain types of IC-POVMs in dimension $d$. We show that under the time-dependent average channel, we can acquire a collection of projective operators that is informationally complete (IC) and thus obtain the corresponding IC-POVMs. We show that under a certain condition, it is possible to obtain infinite families of projective operators that are IC, and to obtain infinite families of corresponding IC-POVMs; otherwise, Zauner's conjecture is incorrect. We also show how to simulate a SIC-POVM on any unknown quantum state by using the time-dependent average channel., Comment: 23 pages, 1 table
- Published
- 2024
37. Recommendation Fairness in Social Networks Over Time
- Author
-
Cao, Meng, Hussain, Hussain, Sikdar, Sandipan, Helic, Denis, Strohmaier, Markus, and Kern, Roman
- Subjects
Computer Science - Social and Information Networks ,Computer Science - Computers and Society ,Computer Science - Information Retrieval - Abstract
In social recommender systems, it is crucial that the recommendation models provide equitable visibility for different demographic groups, such as gender or race. Most existing research has addressed this problem by only studying individual static snapshots of networks that typically change over time. To address this gap, we study the evolution of recommendation fairness over time and its relation to dynamic network properties. We examine three real-world dynamic networks by evaluating the fairness of six recommendation algorithms and analyzing the association between fairness and network properties over time. We further study how interventions on network properties influence fairness by examining counterfactual scenarios with alternative evolution outcomes and differing network properties. Our results on empirical datasets suggest that recommendation fairness improves over time, regardless of the recommendation method. We also find that two network properties, the minority ratio and the homophily ratio, exhibit stable correlations with fairness over time. Our counterfactual study further suggests that an extreme homophily ratio potentially contributes to unfair recommendations even with a balanced minority ratio. Our work provides insights into the evolution of fairness within dynamic networks in social science. We believe that our findings will help system operators and policymakers to better comprehend the implications of temporal changes and interventions targeting fairness in social networks.
- Published
- 2024
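A simple visibility-style fairness measure of the kind such studies track across network snapshots can be sketched as follows. The metric and data below are illustrative assumptions, not the paper's exact definition.

```python
def minority_visibility(recommendations, minority, k=3):
    """Fraction of top-k recommendation slots occupied by minority-group
    nodes, averaged over all users -- one simple visibility metric."""
    shares = []
    for recs in recommendations.values():
        top = recs[:k]
        shares.append(sum(1 for r in top if r in minority) / len(top))
    return sum(shares) / len(shares)

minority = {"u4", "u5"}
# Hypothetical top-k lists at two snapshots of a growing network.
snapshot_t0 = {"u1": ["u2", "u3", "u4"], "u2": ["u1", "u3", "u6"]}
snapshot_t1 = {"u1": ["u4", "u2", "u5"], "u2": ["u5", "u1", "u4"]}
vis_t0 = minority_visibility(snapshot_t0, minority)
vis_t1 = minority_visibility(snapshot_t1, minority)
```

Computing such a score per snapshot, and correlating it with the minority and homophily ratios of the same snapshot, is the shape of analysis the abstract describes.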
38. 3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems
- Author
-
Zhang, Liang, Lin, Jionghao, Borchers, Conrad, Cao, Meng, and Hu, Xiangen
- Subjects
Computer Science - Computers and Society ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models, including Generative Adversarial Network (GAN) and Generative Pre-trained Transformer (GPT), for enhanced data imputation and augmentation. The framework operates by first representing the data as a three-dimensional tensor, capturing dimensions of learners, questions, and attempts. It then densifies the data through tensor factorization and augments it using Generative AI models, tailored to individual learning patterns identified via clustering. Applied to data from an AutoTutor lesson by the Center for the Study of Adult Literacy (CSAL), the 3DG framework effectively generated scalable, personalized simulations of learning performance. Comparative analysis revealed GAN's superior reliability over GPT-4 in this context, underscoring its potential in addressing data sparsity challenges in ITSs and contributing to the advancement of personalized educational technology.
- Published
- 2024
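The densification step the 3DG abstract describes, fitting a low-rank model to the observed cells of a learners × questions × attempts tensor and reading predictions off the reconstruction, can be sketched with a plain CP factorization trained by gradient descent. This is a generic illustration under assumed shapes and hyperparameters, not the 3DG implementation.

```python
import numpy as np

def cp_densify(T, mask, rank=2, steps=1500, lr=0.05, seed=0):
    """Fit a rank-`rank` CP model to the observed cells (mask == 1) of a
    learners x questions x attempts tensor; return the dense reconstruction."""
    rng = np.random.default_rng(seed)
    L, Q, A = T.shape
    U = rng.normal(0, 0.1, (L, rank))
    V = rng.normal(0, 0.1, (Q, rank))
    W = rng.normal(0, 0.1, (A, rank))
    for _ in range(steps):
        E = (np.einsum('ir,jr,kr->ijk', U, V, W) - T) * mask  # observed error
        U -= lr * np.einsum('ijk,jr,kr->ir', E, V, W)
        V -= lr * np.einsum('ijk,ir,kr->jr', E, U, W)
        W -= lr * np.einsum('ijk,ir,jr->kr', E, U, V)
    return np.einsum('ir,jr,kr->ijk', U, V, W)

rng = np.random.default_rng(1)
# Hypothetical rank-1 "performance" tensor with ~30% of cells unobserved.
u, v, w = rng.random(4), rng.random(5), rng.random(3)
truth = np.einsum('i,j,k->ijk', u, v, w)
mask = (rng.random(truth.shape) > 0.3).astype(float)
dense = cp_densify(truth * mask, mask)
obs_mse = lambda X: float((((X - truth) * mask) ** 2).mean())
```

In the 3DG framing, a generative model (GAN or GPT) would then augment this densified tensor per learner cluster; the factorization alone already fills the missing cells.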
39. Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
- Author
-
Cao, Meng, Shu, Lei, Yu, Lei, Zhu, Yun, Wichers, Nevan, Liu, Yinxiao, and Meng, Lei
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only a single reward for an entire output. This sparsity of rewards can lead to inefficient and unstable learning. To address this challenge, our paper introduces a novel framework that utilizes the critique capability of Large Language Models (LLMs) to produce intermediate-step rewards during RL training. Our method involves coupling a policy model with a critic language model, which is responsible for providing comprehensive feedback on each part of the output. This feedback is then translated into token or span-level rewards that can be used to guide the RL training process. We investigate this approach under two different settings: one where the policy model is smaller and is paired with a more powerful critic model, and another where a single language model fulfills both roles. We assess our approach on three text generation tasks: sentiment control, language model detoxification, and summarization. Experimental results show that incorporating artificial intrinsic rewards significantly improves both sample efficiency and the overall performance of the policy model, supported by both automatic and human evaluation.
- Published
- 2024
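Translating span-level critic feedback into per-token rewards, as described above, might look like the following sketch. The splitting rule (uniform within each span, sequence-level reward on the final token) is an assumption for illustration, not the paper's exact scheme.

```python
def span_rewards(num_tokens, span_feedback, terminal_reward=0.0):
    """Spread critic feedback over tokens: each (start, end, score) span
    (end exclusive) contributes score/length to every token it covers;
    any sequence-level reward is added to the final token."""
    rewards = [0.0] * num_tokens
    for start, end, score in span_feedback:
        length = end - start
        for t in range(start, end):
            rewards[t] += score / length
    rewards[-1] += terminal_reward
    return rewards

# Hypothetical critique: tokens 0-2 praised (+1.0), tokens 3-4 flagged (-0.5),
# plus a small sequence-level reward.
r = span_rewards(6, [(0, 3, 1.0), (3, 5, -0.5)], terminal_reward=0.2)
```

Dense per-token rewards like `r` then replace the single end-of-sequence reward in the policy-gradient update, which is what makes learning less sparse.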
40. Responsible AI Considerations in Text Summarization Research: A Review of Current Practices
- Author
-
Liu, Yu Lu, Cao, Meng, Blodgett, Su Lin, Cheung, Jackie Chi Kit, Olteanu, Alexandra, and Trischler, Adam
- Subjects
Computer Science - Computation and Language - Abstract
AI and NLP publication venues have increasingly encouraged researchers to reflect on possible ethical considerations, adverse impacts, and other responsible AI issues their work might engender. However, for specific NLP tasks our understanding of how prevalent such issues are, or when and why these issues are likely to arise, remains limited. Focusing on text summarization -- a common NLP task largely overlooked by the responsible AI community -- we examine research and reporting practices in the current literature. We conduct a multi-round qualitative analysis of 333 summarization papers from the ACL Anthology published between 2020 and 2022. We focus on how, which, and when responsible AI issues are covered, which relevant stakeholders are considered, and mismatches between stated and realized research goals. We also discuss current evaluation practices and consider how authors discuss the limitations of both prior work and their own work. Overall, we find that relatively few papers engage with possible stakeholders or contexts of use, which limits their consideration of potential downstream adverse impacts or other responsible AI issues. Based on our findings, we make recommendations on concrete practices and research directions.
- Published
- 2023
41. Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study
- Author
-
Zhou, Peilin, Cao, Meng, Huang, You-Liang, Ye, Qichen, Zhang, Peiyan, Liu, Junling, Xie, Yueqi, Hua, Yining, and Kim, Jaeboum
- Subjects
Computer Science - Information Retrieval ,Computer Science - Computation and Language - Abstract
Large Multimodal Models (LMMs) have demonstrated impressive performance across various vision and language tasks, yet their potential applications in recommendation tasks with visual assistance remain unexplored. To bridge this gap, we present a preliminary case study investigating the recommendation capabilities of GPT-4V(ision), a recently released LMM by OpenAI. We construct a series of qualitative test samples spanning multiple domains and employ these samples to assess the quality of GPT-4V's responses within recommendation scenarios. Evaluation results on these test samples prove that GPT-4V has remarkable zero-shot recommendation abilities across diverse domains, thanks to its robust visual-text comprehension capabilities and extensive general knowledge. However, we have also identified some limitations in using GPT-4V for recommendations, including a tendency to provide similar responses when given similar inputs. This report concludes with an in-depth discussion of the challenges and research opportunities associated with utilizing GPT-4V in recommendation scenarios. Our objective is to explore the potential of extending LMMs from vision and language tasks to recommendation tasks. We hope to inspire further research into next-generation multimodal generative recommendation models, which can enhance user experiences by offering greater diversity and interactivity. All images and prompts used in this report will be accessible at https://github.com/PALIN2018/Evaluate_GPT-4V_Rec., Comment: In Progress
- Published
- 2023
42. Successor Features for Efficient Multisubject Controlled Text Generation
- Author
-
Cao, Meng, Fatemi, Mehdi, Cheung, Jackie Chi Kit, and Shabanian, Samira
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
While large language models (LLMs) have achieved impressive performance in generating fluent and realistic text, controlling the generated text so that it exhibits properties such as safety, factuality, and non-toxicity remains challenging. Existing decoding-based methods, such as DExperts, GeDi, and rectification, are static in terms of the dimension of control; if the target subject is changed, they require new training. Moreover, it can quickly become prohibitive to concurrently control multiple subjects. In this work, we introduce SF-GEN, which is grounded in two primary concepts: successor features (SFs) to decouple the LLM's dynamics from task-specific rewards, and language model rectification to proportionally adjust the probability of selecting a token based on the likelihood that the finished text becomes undesired. SF-GEN seamlessly integrates the two to enable dynamic steering of text generation with no need to alter the LLM's parameters. Thanks to the decoupling effect induced by successor features, our method proves to be memory- and computation-efficient for training as well as decoding, especially when dealing with multiple target subjects. To the best of our knowledge, our research represents the first application of successor features in text generation. In addition to its computational efficiency, the resultant language produced by our method is comparable to the SOTA (and outperforms baselines) in both control measures and language quality, which we demonstrate through a series of experiments in various controllable text generation tasks.
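The rectification idea described above can be sketched as a decoding-time reweighting step. In this minimal illustration, each candidate token's probability is scaled by the estimated chance that choosing it does not lead to an undesired completion; combining multiple subjects by a product over per-subject estimates is an assumption made here for illustration, not the paper's exact rule.

```python
import numpy as np

def rectify(base_logits, undesired_probs):
    """base_logits: (vocab,) next-token logits from the LLM.
    undesired_probs: (num_subjects, vocab) estimated probability, per
    controlled subject, that picking each token yields undesired text.
    Returns a renormalized next-token distribution."""
    p = np.exp(base_logits - np.max(base_logits))  # stable softmax
    p /= p.sum()
    # keep[t] = probability that token t is acceptable for every subject
    keep = np.prod(1.0 - np.asarray(undesired_probs), axis=0)
    out = p * keep
    return out / out.sum()

# Uniform base distribution over 3 tokens; token 0 is flagged as likely
# to lead to undesired text under a single controlled subject.
probs = rectify(np.zeros(3), [[0.9, 0.0, 0.0]])
```

Because the reweighting happens purely at decoding time, the LLM's parameters are untouched, matching the abstract's claim of dynamic steering without retraining.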
- Published
- 2023
43. Video Referring Expression Comprehension via Transformer with Content-conditioned Query
- Author
-
Jiang, Ji, Cao, Meng, Song, Tengtao, Chen, Long, Wang, Yi, and Zou, Yuexian
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language. Recent improvements in video REC have been made using Transformer-based methods with learnable queries. However, we contend that this naive query design is not ideal given the open-world nature of video REC brought by text supervision. With numerous potential semantic categories, relying on only a few slow-updated queries is insufficient to characterize them. Our solution to this problem is to create dynamic queries that are conditioned on both the input video and language to model the diverse objects referred to. Specifically, we place a fixed number of learnable bounding boxes throughout the frame and use corresponding region features to provide prior information. Also, we noticed that current query features overlook the importance of cross-modal alignment. To address this, we align specific phrases in the sentence with semantically relevant visual areas, annotating them in existing video datasets (VID-Sentence and VidSTG). By incorporating these two designs, our proposed model (called ConFormer) outperforms other models on widely benchmarked datasets. For example, in the testing split of VID-Sentence dataset, ConFormer achieves 8.75% absolute improvement on Accu.@0.6 compared to the previous state-of-the-art model., Comment: Accepted to ACM International Conference on Multimedia Workshop (ACM MM), 2023. arXiv admin note: substantial text overlap with arXiv:2210.02953
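The content-conditioned query design described above can be sketched simply: rather than static learnable embeddings, each query is projected from a region feature concatenated with the sentence feature, so queries depend on both the input video and the language. The shapes and single linear projection below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def content_queries(region_feats, lang_feat, W):
    """region_feats: (N, Dr) features of N prior boxes in a frame.
    lang_feat: (Dl,) sentence-level language feature.
    W: (Dr + Dl, Dq) projection matrix.
    Returns (N, Dq) input-conditioned queries."""
    N = region_feats.shape[0]
    lang = np.tile(lang_feat, (N, 1))            # broadcast language to each box
    return np.concatenate([region_feats, lang], axis=1) @ W

# 4 boxes with 8-d features, a 6-d language feature, 16-d queries.
q = content_queries(np.ones((4, 8)), np.ones(6), np.ones((14, 16)))
```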
- Published
- 2023
44. Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
- Author
-
Ye, Qichen, Liu, Junling, Chong, Dading, Zhou, Peilin, Hua, Yining, Liu, Fenglin, Cao, Meng, Wang, Ziming, Cheng, Xuxin, Lei, Zhu, and Guo, Zhenhua
- Subjects
Computer Science - Computation and Language - Abstract
Integrating large language models (LLMs) into healthcare holds great potential but faces challenges. Pre-training LLMs from scratch for domains like medicine is resource-heavy and often unfeasible. On the other hand, sole reliance on Supervised Fine-tuning (SFT) can result in overconfident predictions and may not tap into domain-specific insights. In response, we present a multi-stage training method combining Domain-specific Continued Pre-training (DCPT), SFT, and Direct Preference Optimization (DPO). In addition, we publish a 3 GB Chinese Medicine (ChiMed) dataset, encompassing medical question answering, plain texts, knowledge graphs, and dialogues, segmented into three training stages. The medical LLM trained with our pipeline, Qilin-Med, shows substantial performance improvement. In the CPT and SFT phases, Qilin-Med achieved 38.4% and 40.0% accuracy on the CMExam test set, respectively. It outperformed the base model Baichuan-7B (accuracy: 33.5%) by 7.5%. In the DPO phase, it scored 16.66 in BLEU-1 and 27.44 in ROUGE-1 on the Huatuo-26M test set, bringing further improvement over the SFT phase (12.69 in BLEU-1 and 24.21 in ROUGE-1). Additionally, we have further enhanced the model's performance through the Retrieval Augmented Generation (RAG) approach. Experiments demonstrate that Qilin-Med-RAG achieves an accuracy rate of 42.8% on CMExam. These results highlight the contribution of our novel training approach in building LLMs for medical applications.
- Published
- 2023
45. VeCLIP: Improving CLIP Training via Visual-enriched Captions
- Author
-
Lai, Zhengfeng, Zhang, Haotian, Zhang, Bowen, Wu, Wentao, Bai, Haoping, Timofeev, Aleksei, Du, Xianzhi, Gan, Zhe, Shan, Jiulong, Chuah, Chen-Nee, Yang, Yinfei, and Cao, Meng
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Large-scale web-crawled datasets are fundamental for the success of pre-training vision-language models, such as CLIP. However, the inherent noise and potential irrelevance of web-crawled AltTexts pose challenges in achieving precise image-text alignment. Existing methods utilizing large language models (LLMs) for caption rewriting have shown promise on small, curated datasets like CC3M and CC12M. This study introduces a scalable pipeline for noisy caption rewriting. Unlike recent LLM rewriting techniques, we emphasize the incorporation of visual concepts into captions, termed Visual-enriched Captions (VeCap). To ensure data diversity, we propose a novel mixed training scheme that optimizes the utilization of AltTexts alongside newly generated VeCap. We showcase the adaptation of this method for training CLIP on large-scale web-crawled datasets, termed VeCLIP. Employing this cost-effective pipeline, we effortlessly scale our dataset up to 300 million samples, named the VeCap dataset. Our results show significant advantages in image-text alignment and overall model performance. For example, VeCLIP achieves up to +25.2% gain in COCO and Flickr30k retrieval tasks under the 12M setting. For data efficiency, VeCLIP achieves +3% gain while only using 14% of the data employed in the vanilla CLIP and 11% in ALIGN. We also note that the VeCap data is complementary to other well-curated datasets and benefits zero-shot classification tasks. When combining VeCap and DFN, our model can achieve strong performance on both image-text retrieval and zero-shot classification, e.g., 83.1% accuracy@1 on ImageNet zero-shot for a H/14 model. We release the pre-trained models at https://github.com/apple/ml-veclip., Comment: CV/ML
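The mixed training scheme described above amounts to a per-sample choice between caption sources. The sketch below assumes a simple Bernoulli mix; the actual ratio or schedule used in VeCLIP is not stated in the abstract, so the 50/50 default is purely illustrative.

```python
import random

def pick_caption(alt_text, vecap, p_vecap=0.5, rng=random):
    """Per-sample caption choice for a mixed training scheme: with
    probability p_vecap use the visual-enriched rewrite (VeCap),
    otherwise the raw AltText, so training sees both noisy-but-diverse
    and cleaner captions. The mixing ratio is an assumption."""
    return vecap if rng.random() < p_vecap else alt_text

caption = pick_caption("a photo", "a photo of a red barn at sunset")
```

In a CLIP-style training loop, this choice would be made fresh each epoch, so every image is eventually paired with both caption variants.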
- Published
- 2023
46. Construction of $\varepsilon_{d}$-ASIC-POVMs via $2$-to-$1$ PN functions and the Li bound
- Author
-
Cao, Meng and Deng, Xiantao
- Subjects
Quantum Physics - Abstract
Symmetric informationally complete positive operator-valued measures (SIC-POVMs) in finite dimension $d$ are a particularly attractive case of informationally complete POVMs (IC-POVMs), which consist of $d^{2}$ subnormalized projectors with equal pairwise fidelity. However, it is difficult to construct SIC-POVMs, and it is not even clear whether there exists an infinite family of SIC-POVMs. To realize some possible applications in quantum information processing, Klappenecker et al. [37] introduced an approximate version of SIC-POVMs called approximately symmetric informationally complete POVMs (ASIC-POVMs). In this paper, we construct a class of $\varepsilon_{d}$-ASIC-POVMs in dimension $d=q$ and a class of $\varepsilon_{d}$-ASIC-POVMs in dimension $d=q+1$, respectively, where $q$ is a prime power. We prove that all $2$-to-$1$ perfect nonlinear (PN) functions can be used for constructing $\varepsilon_{q}$-ASIC-POVMs. We show that the set of vectors corresponding to the $\varepsilon_{q}$-ASIC-POVM forms a biangular frame. The construction of $\varepsilon_{q+1}$-ASIC-POVMs is based on a multiplicative character sum estimate called the Li bound. We show that the set of vectors corresponding to the $\varepsilon_{q+1}$-ASIC-POVM forms an asymptotically optimal codebook. We characterize "how close" the $\varepsilon_{q}$-ASIC-POVMs (resp. $\varepsilon_{q+1}$-ASIC-POVMs) are to being SIC-POVMs of dimension $q$ (resp. dimension $q+1$). Finally, we explain the significance of constructing $\varepsilon_{d}$-ASIC-POVMs.
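For context (this definition is standard, not quoted from the abstract): a SIC-POVM in dimension $d$ consists of elements $\Pi_{i}=\frac{1}{d}|\psi_{i}\rangle\langle\psi_{i}|$ for $i=1,\ldots,d^{2}$, where the unit vectors satisfy $|\langle\psi_{i}|\psi_{j}\rangle|^{2}=\frac{1}{d+1}$ for all $i\neq j$. An $\varepsilon_{d}$-ASIC-POVM relaxes this by requiring the pairwise fidelities to hold only approximately, within a tolerance controlled by $\varepsilon_{d}$ (see Klappenecker et al. for the precise definition).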
- Published
- 2023
47. Low-speed impact damage analysis of aviation composite material structure
- Author
-
He Jun, Cao Meng, Wang Zhishu, and Cong Fanglin
- Subjects
composite laminates ,low-speed damage ,abaqus ,Environmental sciences ,GE1-350 - Abstract
Carbon fiber reinforced composites offer high specific strength and stiffness, design versatility, corrosion resistance, and other excellent properties, but the impact resistance of composite structures is poor. Low-speed impact damage analysis of composite laminates is therefore of practical importance. Based on three-dimensional cumulative damage theory, a finite element model of laminates subjected to low-velocity impact is established in the commercial finite element analysis software ABAQUS. The agreement between the numerical results and the test results shows that the model used in this article is reasonable and accurate, and the numerical simulation method is verified to be feasible. Finally, through numerical simulation of the low-speed impact damage process, the damage characteristics and damage mechanisms of the laminates at different times are analyzed, and the causes and propagation rules of the main damage modes, fiber damage and matrix damage, are revealed.
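Cumulative damage models of this kind evaluate a failure criterion at each increment. The abstract does not name the criterion used, so as a hedged example the sketch below implements Hashin's fiber tensile failure index, a common choice in progressive damage analyses of laminates; the strength values in the usage example are arbitrary placeholders.

```python
def hashin_fiber_tension(s11, s12, Xt, S12):
    """Hashin fiber tensile failure index.
    s11: longitudinal normal stress, s12: in-plane shear stress,
    Xt: longitudinal tensile strength, S12: in-plane shear strength.
    Failure is predicted when the returned index reaches 1."""
    if s11 < 0:
        return 0.0  # the tensile criterion does not apply in compression
    return (s11 / Xt) ** 2 + (s12 / S12) ** 2

# Placeholder strengths; a ply loaded exactly to its tensile strength.
index = hashin_fiber_tension(2280.0, 0.0, Xt=2280.0, S12=79.0)
```

In an ABAQUS user subroutine, indices like this would trigger stiffness degradation of the failed ply, producing the progressive damage growth the abstract describes.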
- Published
- 2021
- Full Text
- View/download PDF
48. Two SrII coordination compounds based on tetrazole-carboxylate ligands
- Author
-
Cao Meng-Jie, Miao Li-Li, Guo Meng-Yue, Yang Gao-Wen, and Li Qiao-Yun
- Subjects
crystal structure ,hatzp ,h2dtzpha ,sr(ii) ,tetrazole-carboxylate ligands ,Chemistry ,QD1-999 - Abstract
Two novel coordination compounds, [Sr(atzp)2(H2O)2]·CH3OH (1) and [Sr(dtzpha)(H2O)3]·4H2O (2) [Hatzp=5-aminotetrazole-1-propionic acid, H2dtzpha=1,3-di(tetrazol-5-yl)benzene-N2,N2′-diacetic acid], have been generated by reacting 5-aminotetrazole-1-propionic acid and 1,3-di(tetrazol-5-yl)benzene-N2,N2′-diacetic acid with strontium salts, respectively. X-ray diffraction analysis reveals that the carboxylic groups of the two ligands show the same coordination mode (the μ1,1,3-COO coordination mode); compound 1 displays a 1D structure while compound 2 shows a 2D structure, reflecting the influence of the number of carboxylic acid groups. Luminescence properties of 1 and 2 were investigated at room temperature in the solid state.
- Published
- 2016
- Full Text
- View/download PDF
49. The influence of secondary wrong wiring of voltage transformer on measurement error
- Author
-
Li Suya, Sun Zhao, Li Lin, Cao Meng, Wei Jufang, and Zhao Cong
- Subjects
Environmental sciences ,GE1-350 - Abstract
In this paper, the measurement error introduced when the standard secondary line at the end of the test bench and the tested secondary line are connected in mutual opposition during voltage transformer error testing is analyzed, and corresponding solutions for this wrong wiring are proposed.
- Published
- 2020
- Full Text
- View/download PDF
50. G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
- Author
-
Li, Hongxiang, Cao, Meng, Cheng, Xuxin, Li, Yaowei, Zhu, Zhihong, and Zou, Yuexian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The recent video grounding works attempt to introduce vanilla contrastive learning into video grounding. However, we claim that this naive solution is suboptimal. Contrastive learning requires two key properties: (1) alignment of features of similar samples, and (2) uniformity of the induced distribution of the normalized features on the hypersphere. Due to two annoying issues in video grounding: (1) the co-existence of some visual entities in both the ground truth and other moments, i.e., semantic overlap; (2) only a few moments in the video being annotated, i.e., the sparse annotation dilemma, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learns inconsistent video representations. Both characteristics make vanilla contrastive learning unsuitable for video grounding. In this paper, we introduce Geodesic and Game Localization (G2L), a semantically aligned and uniform video grounding framework via geodesic and game theory. We quantify the correlations among moments leveraging the geodesic distance that guides the model to learn the correct cross-modal representations. Furthermore, from the novel perspective of game theory, we propose semantic Shapley interaction based on geodesic distance sampling to learn fine-grained semantic alignment in similar moments. Experiments on three benchmarks demonstrate the effectiveness of our method., Comment: ICCV2023 oral, release the code
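The geodesic distance referred to above has a standard closed form for normalized features. The sketch below shows that quantity on the unit hypersphere; how G2L weights moment pairs with it is not detailed in the abstract, so only the distance itself is illustrated.

```python
import numpy as np

def geodesic(u, v):
    """Geodesic (great-circle) distance between feature vectors after
    L2 normalization onto the unit hypersphere: arccos of their cosine
    similarity, clipped for numerical safety."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return float(np.arccos(np.clip(u @ v, -1.0, 1.0)))
```

Unlike raw cosine similarity, this distance is a proper metric on the hypersphere, which is what makes it usable for quantifying correlations between temporally distant moments.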
- Published
- 2023