2,966 results on '"Wang, Chenglong"'
Search Results
2. Data Analysis in the Era of Generative AI
- Author
-
Inala, Jeevana Priya, Wang, Chenglong, Drucker, Steven, Ramos, Gonzalo, Dibia, Victor, Riche, Nathalie, Brown, Dave, Marshall, Dan, and Gao, Jianfeng
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
- Abstract
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges. We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of the data analysis workflow by translating high-level user intentions into executable code, charts, and insights. We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps. Finally, we discuss the research challenges that impede the development of these AI-based systems, such as enhancing model capabilities, evaluation and benchmarking, and understanding end-user needs.
- Published
- 2024
3. Contextualized Data-Wrangling Code Generation in Computational Notebooks
- Author
-
Huang, Junjie, Guo, Daya, Wang, Chenglong, Gu, Jiazhen, Lu, Shuai, Inala, Jeevana Priya, Yan, Cong, Gao, Jianfeng, Duan, Nan, and Lyu, Michael R.
- Subjects
Computer Science - Software Engineering, Computer Science - Computation and Language, Computer Science - Databases
- Abstract
Data wrangling, the process of preparing raw data for further analysis in computational notebooks, is a crucial yet time-consuming step in data science. Code generation has the potential to automate the data wrangling process, reducing analysts' overhead by translating user intents into executable code. Precisely generating data-wrangling code necessitates a comprehensive consideration of the rich context present in notebooks, including textual context, code context, and data context. However, notebooks often interleave multiple non-linear analysis tasks into a linear sequence of code blocks, where the contextual dependencies are not clearly reflected. Directly training models on source code blocks fails to fully exploit these contexts for accurate wrangling code generation. To bridge the gap, we aim to construct a high-quality dataset with clear and rich contexts to help train models for data-wrangling code generation tasks. In this work, we first propose an automated approach, CoCoMine, to mine data-wrangling code generation examples with clear multi-modal contextual dependencies. It first adopts data-flow analysis to identify the code blocks containing data-wrangling code. Then, CoCoMine extracts the contextualized data-wrangling code examples by tracing and replaying notebooks. With CoCoMine, we construct CoCoNote, a dataset containing 58,221 examples for Contextualized Data-wrangling Code generation in Notebooks. To demonstrate the effectiveness of our dataset, we finetune a range of pretrained code models and prompt various large language models on our task. Furthermore, we also propose DataCoder, which encodes the data context and the code and textual contexts separately to enhance code generation. Experimental results demonstrate the significance of incorporating data context in data-wrangling code generation and the effectiveness of our model. We release code and data at url..., Comment: To appear at ASE 2024
- Published
- 2024
4. NDP: Next Distribution Prediction as a More Broad Target
- Author
-
Ruan, Junhao, Abudula, Abudukeyumu, Liu, Xinyu, Li, Bei, Li, Yinqiao, Wang, Chenglong, Fan, Yuchun, Ge, Yuan, Xiao, Tong, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Large language models (LLMs) trained on the next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm contains several limitations, particularly related to planned task complications and error propagation during inference. In our work, we extend the critique of NTP, highlighting that its limitations also stem from training with a narrow objective: the prediction of a sub-optimal one-hot distribution. To support this critique, we conducted a pre-experiment treating the output distribution from powerful LLMs as efficient world data compression. By evaluating the similarity between the $n$-gram distribution and the one-hot distribution with LLMs, we observed that the $n$-gram distributions align more closely with the output distribution of LLMs. Based on this insight, we introduce Next Distribution Prediction (NDP), which uses $n$-gram distributions to replace the one-hot targets, enhancing learning without extra online training time. We conducted experiments across translation, general tasks, language transfer, and medical domain adaptation. Compared to NTP, NDP achieves up to a +2.97 COMET improvement in translation tasks, a +0.61 average improvement in general tasks, and a remarkable +10.75 average improvement in the medical domain. This demonstrates the concrete benefits of addressing the target-narrowing problem and points to a new direction for future work on improving NTP., Comment: 8 pages, 5 figures
- Published
- 2024
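The core mechanic of NDP described above — replacing the one-hot next-token target with an empirical $n$-gram distribution — can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the function names, the plain count-based estimator, and the absence of smoothing are all assumptions.

```python
import math
from collections import Counter, defaultdict

def ngram_targets(tokens, n=2):
    """Estimate next-token target distributions from n-gram counts.

    For each (n-1)-token context, the target is the empirical
    distribution over observed next tokens -- a soft target that
    replaces the one-hot label of standard next-token prediction.
    """
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return {
        ctx: {tok: c / sum(cnt.values()) for tok, c in cnt.items()}
        for ctx, cnt in counts.items()
    }

def soft_cross_entropy(target, model_probs):
    """H(p, q) = -sum_t p(t) * log q(t): the per-position training
    objective, with p the n-gram target instead of a one-hot label."""
    return -sum(p * math.log(model_probs[tok]) for tok, p in target.items())
```

For the toy corpus `a b a c a b`, the context `('a',)` yields the soft target `{b: 2/3, c: 1/3}`, which the model is trained to match instead of a single one-hot token.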
5. Utilizing Speaker Profiles for Impersonation Audio Detection
- Author
-
Gu, Hao, Yi, JiangYan, Wang, Chenglong, Ren, Yong, Tao, Jianhua, Yan, Xinrui, Chen, Yujie, and Zhang, Xiaohui
- Subjects
Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Fake audio detection is an emerging active topic. A growing body of literature aims to detect fake utterances, which are mostly generated by text-to-speech (TTS) or voice conversion (VC). However, countermeasures against impersonation remain an underexplored area. Impersonation is a type of fakery in which an imitator replicates the specific traits and speech style of a target speaker. Unlike TTS and VC, which often leave digital traces or signal artifacts, impersonation involves live human beings producing entirely natural speech, rendering the detection of impersonation audio a challenging task. Thus, we propose a novel method that integrates speaker profiles into the process of impersonation audio detection. Speaker profiles are inherent characteristics that are challenging for impersonators to mimic accurately, such as a speaker's age and job. We aim to leverage these features to extract discriminative information for detecting impersonation audio. Moreover, there are no large impersonated speech corpora available for the quantitative study of impersonation impacts. To address this gap, we further design the first large-scale, diverse-speaker Chinese impersonation dataset, named ImPersonation Audio Detection (IPAD), to advance the community's research on impersonation audio detection. We evaluate several existing fake audio detection methods on our proposed IPAD dataset, demonstrating its necessity and the challenges it poses. Additionally, our findings reveal that incorporating speaker profiles can significantly enhance the model's performance in detecting impersonation audio., Comment: Accepted by ACM MM2024
- Published
- 2024
6. Data Formulator 2: Iteratively Creating Rich Visualizations with AI
- Author
-
Wang, Chenglong, Lee, Bongshin, Drucker, Steven, Marshall, Dan, and Gao, Jianfeng
- Subjects
Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence
- Abstract
To create rich visualizations, data analysts often need to iterate back and forth between data processing and chart specification to achieve their goals. This requires not only proficiency in data transformation and visualization tools but also effort to manage a branching history consisting of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved visualization authoring experiences, for example by mitigating manual data transformation barriers via LLMs' code generation ability. However, these systems do not work well for iterative visualization authoring, because they often require analysts to provide, in a single turn, a text-only prompt that fully describes the complex visualization task to be performed, which is unrealistic for both users and models in many cases. In this paper, we present Data Formulator 2, an LLM-powered visualization system that addresses these challenges. With Data Formulator 2, users describe their visualization intent with blended UI and natural language inputs, and data transformation is delegated to AI. To support iteration, Data Formulator 2 lets users navigate their iteration history and reuse previous designs toward new ones so that they don't need to start from scratch every time. In a user study with eight participants, we observed that Data Formulator 2 allows participants to develop their own iteration strategies to complete challenging data exploration sessions.
- Published
- 2024
7. RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
- Author
-
Wang, Chenglong, Gan, Yang, Huo, Yifu, Mu, Yongyu, Yang, Murun, He, Qiaozhi, Xiao, Tong, Zhang, Chunliang, Liu, Tongran, Du, Quan, Yang, Di, and Zhu, Jingbo
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face a difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue this line of research. We present a Robust Visual Reward Model (RoVRM), which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization.
- Published
- 2024
8. ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
- Author
-
Yi, Jiangyan, Zhang, Chu Yuan, Tao, Jianhua, Wang, Chenglong, Yan, Xinrui, Ren, Yong, Gu, Hao, and Zhou, Junzuo
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
The growing prominence of the field of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from potential fraud and other malicious activities, prompting the need for greater attention and research in this area. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as identifying manipulated intervals in partially fake audio and determining the source responsible for generating any fake audio, both with real-life implications, notably in audio forensics, law enforcement, and the construction of reliable and trustworthy evidence. To further foster research in this area, in this article we describe the dataset used in the fake game, manipulation region location, and deepfake algorithm recognition tracks of the challenge. We also analyze the technical methodologies of the top-performing participants in each task and note the commonalities and differences in their approaches. Finally, we discuss the current technical limitations identified through this analysis and provide a roadmap for future research directions. The dataset is available for download at http://addchallenge.cn/downloadADD2023., Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
- Published
- 2024
9. Cross-layer Attention Sharing for Large Language Models
- Author
-
Mu, Yongyu, Wu, Yuzhang, Fan, Yuchun, Wang, Chenglong, Li, Hengyu, He, Qiaozhi, Yang, Murun, Xiao, Tong, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language
- Abstract
As large language models (LLMs) evolve, the increase in model depth and parameter count leads to substantial redundancy. To enhance the efficiency of the attention mechanism, previous works primarily compress the KV cache or group attention heads, while largely overlooking redundancy between layers. Our comprehensive analyses across various LLMs show that highly similar attention patterns persist within most layers. It is intuitive to save computation by sharing attention weights across layers. However, further analysis reveals two challenges: (1) directly sharing the weight matrix without carefully rearranging the attention heads proves to be ineffective; (2) shallow layers are vulnerable to small deviations in attention weights. Driven by these insights, we introduce LiSA, a lightweight substitute for self-attention in well-trained LLMs. LiSA employs tiny feed-forward networks to align attention heads between adjacent layers and low-rank matrices to approximate differences in layer-wise attention weights. Evaluations encompassing 13 typical benchmarks demonstrate that LiSA maintains high response quality in terms of accuracy and perplexity while reducing redundant attention calculations within 53-84% of the total layers. Our implementations of LiSA achieve a 6X compression of Q and K, with maximum throughput improvements of 19.5% for LLaMA3-8B and 32.3% for LLaMA2-7B., Comment: Work in progress
- Published
- 2024
10. Hybrid Alignment Training for Large Language Models
- Author
-
Wang, Chenglong, Zhou, Hang, Chang, Kaiyan, Li, Bei, Mu, Yongyu, Xiao, Tong, Liu, Tongran, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language
- Abstract
Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed in two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and the LLMs cannot be guaranteed to align well with both the instructions and human preferences simultaneously. In response, we propose a Hybrid Alignment Training (Hbat) approach, based on alternating alignment and modified elastic weight consolidation methods. The basic idea is to alternate between the different objectives during alignment training, so that better collaboration can be achieved between the two alignment tasks. We experiment with Hbat on summarization and dialogue tasks. Experimental results show that the proposed Hbat can significantly outperform all baselines. Notably, Hbat yields consistent performance gains over traditional two-stage alignment training when using both proximal policy optimization and direct preference optimization., Comment: accepted by ACL (Findings) 2024
- Published
- 2024
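The two ingredients the Hbat abstract names — alternating between objectives and an elastic weight consolidation (EWC) pull toward the other objective's solution — can be sketched abstractly. All names, the diagonal-Fisher form of the penalty, and the plain-SGD update below are assumptions for illustration, not the paper's exact method.

```python
def ewc_penalty(params, anchor, fisher, lam=1.0):
    """EWC term: a quadratic pull toward the parameters learned on
    the *other* alignment objective, weighted per-parameter by an
    importance estimate (diagonal Fisher information)."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher)
    )

def hybrid_step(params, task_grads, anchor, fisher, lr=0.1, lam=1.0):
    """One update on the current objective plus the EWC gradient.
    An alternating scheme swaps which objective supplies `task_grads`
    and which supplies `anchor` from phase to phase."""
    return [
        p - lr * (g + lam * f * (p - a))  # d/dp of loss + penalty
        for p, g, a, f in zip(params, task_grads, anchor, fisher)
    ]
```

The penalty is what keeps the alternation from oscillating: each objective's update is damped toward the parameters the other objective last settled on.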
11. MDeRainNet: An Efficient Neural Network for Rain Streak Removal from Macro-pixel Images
- Author
-
Yan, Tao, He, Weijiang, Wang, Chenglong, Zhu, Xiangjie, Wang, Yinghui, and Lau, Rynson W. H.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Since rainy weather degrades image quality and poses significant challenges to most computer vision-based intelligent systems, image de-raining has been a hot research topic. Fortunately, in a rainy light field (LF) image, background obscured by rain streaks in one sub-view may be visible in the other sub-views, and implicit depth information and recorded 4D structural information may benefit rain streak detection and removal. However, existing LF image rain removal methods either do not fully exploit the global correlations of 4D LF data or only utilize partial sub-views, resulting in sub-optimal rain removal performance and unequal quality across de-rained sub-views. In this paper, we propose an efficient network, called MDeRainNet, for rain streak removal from LF images. The proposed network adopts a multi-scale encoder-decoder architecture, which works directly on macro-pixel images (MPIs) to improve rain removal performance. To fully model the global correlation between the spatial and the angular information, we propose an Extended Spatial-Angular Interaction (ESAI) module to merge them, in which a simple and effective Transformer-based Spatial-Angular Interaction Attention (SAIA) block is also proposed for modeling long-range geometric correlations and making full use of the angular information. Furthermore, to improve the generalization performance of our network on real-world rainy scenes, we propose a novel semi-supervised learning framework for MDeRainNet, which utilizes a multi-level KL loss to bridge the domain gap between features of synthetic and real-world rain streaks and introduces colored-residue-image-guided contrastive regularization to reconstruct rain-free images. Extensive experiments conducted on synthetic and real-world LF images demonstrate that our method outperforms state-of-the-art methods both quantitatively and qualitatively., Comment: 13 pages, 13 figures, 4 tables
- Published
- 2024
12. RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
- Author
-
Chen, Yujie, Yi, Jiangyan, Xue, Jun, Wang, Chenglong, Zhang, Xiaohui, Dong, Shunbo, Zeng, Siding, Tao, Jianhua, Zhao, Lv, and Fan, Cunhang
- Subjects
Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use a sinc layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba to address Mamba's unidirectional modelling problem and further capture long-range feature information. Moreover, we develop a bidirectional fusion module to integrate embeddings, enhancing audio context representation and combining short- and long-range information. The results show that our proposed RawBMamba achieves a 34.1% improvement over Rawformer on the ASVspoof 2021 LA dataset, and demonstrates competitive performance on other datasets., Comment: Accepted by Interspeech 2024
- Published
- 2024
13. Efficient Prompting Methods for Large Language Models: A Survey
- Author
-
Chang, Kaiyan, Xu, Songcheng, Wang, Chenglong, Luo, Yingfeng, Xiao, Tong, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language
- Abstract
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks. While this approach opens the door to in-context learning with LLMs, it brings the additional computational burden of model inference and the human effort of manually designing prompts, particularly when using lengthy and complex prompts to guide and control the behavior of LLMs. As a result, the LLM field has seen a remarkable surge in efficient prompting methods. In this paper, we present a comprehensive overview of these methods. At a high level, efficient prompting methods can broadly be categorized into two approaches: prompting with efficient computation and prompting with efficient design. The former involves various ways of compressing prompts, and the latter employs techniques for automatic prompt optimization. We present the basic concepts of prompting, review the advances in efficient prompting, and highlight future research directions.
- Published
- 2024
14. Prior Constraints-based Reward Model Training for Aligning Large Language Models
- Author
-
Zhou, Hang, Wang, Chenglong, Hu, Yimin, Xiao, Tong, Zhang, Chunliang, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language
- Abstract
Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model, typically using a ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-based Reward Model (PCRM) training method to mitigate this problem. PCRM incorporates prior constraints, specifically the length ratio and cosine similarity between outputs of each comparison pair, during reward model training to regulate optimization magnitude and control score margins. We comprehensively evaluate PCRM by examining its rank correlation with human preferences and its effectiveness in aligning LLMs via RL. Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling. As an added bonus, our method is easily integrated into arbitrary rank-based alignment methods, such as direct preference optimization, and can yield consistent improvement., Comment: Accepted by CCL 2024
- Published
- 2024
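A minimal sketch of a prior-constrained ranking loss in the spirit of PCRM: the margin between chosen and rejected rewards is modulated by the length ratio and cosine similarity of the comparison pair. The exact margin formula below is an assumption for illustration; the paper's formulation may differ.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def pcrm_loss(r_chosen, r_rejected, len_chosen, len_rejected,
              emb_chosen, emb_rejected, alpha=1.0):
    """Pairwise ranking loss with a prior-derived margin: pairs that
    are dissimilar (low cosine similarity, unbalanced lengths) are
    required to be separated by a larger reward gap, while similar
    pairs need only a small one, capping how far raw scores scale."""
    length_ratio = min(len_chosen, len_rejected) / max(len_chosen, len_rejected)
    sim = cosine_similarity(emb_chosen, emb_rejected)
    margin = alpha * (1.0 - 0.5 * (length_ratio + sim))  # assumed formula
    # Standard -log sigmoid(r_c - r_r) ranking loss, shifted by the margin.
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected - margin))))
```

With identical outputs the margin collapses to zero and the loss reduces to the ordinary ranking loss; dissimilar pairs incur a higher loss at the same reward gap.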
15. Large Language Models are Parallel Multilingual Learners
- Author
-
Mu, Yongyu, Feng, Peinan, Cao, Zhiquan, Wu, Yuzhang, Li, Bei, Wang, Chenglong, Xiao, Tong, Song, Kai, Liu, Tongran, Zhang, Chunliang, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language
- Abstract
In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input into several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities. To test this capability, we design extensive experiments encompassing 8 typical datasets, 7 languages, and 8 state-of-the-art multilingual LLMs. Experimental results show that (1) incorporating more languages helps PiM surpass conventional ICL further; (2) even combining translations that are inferior to the baseline performance can also help. Moreover, by examining the activated neurons in LLMs, we discover a counterintuitive but interesting phenomenon. Contrary to the common assumption that PiM would activate more neurons than monolingual input to leverage knowledge learned from diverse languages, PiM actually inhibits neurons and promotes more precise neuron activation, especially when more languages are added. This phenomenon aligns with the neuroscience insight about synaptic pruning, which removes less-used neural connections, strengthens the remainder, and then enhances brain intelligence., Comment: Work in progress
- Published
- 2024
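Assembling a PiM-style input is mostly string plumbing: the same question is presented in several languages before a single answer slot. The template below is a hypothetical illustration; the paper's exact prompt format is not reproduced here.

```python
def pim_prompt(question, translations):
    """Build a Parallel Input in Multiple Languages (PiM) prompt:
    the question rendered in several languages, then one answer slot.
    (Hypothetical template; tags and wording are assumptions.)"""
    lines = ["The same question is given in multiple languages:"]
    for lang, text in translations.items():
        lines.append(f"[{lang}] {text}")
    lines.append(f"[en] {question}")
    lines.append("Answer:")
    return "\n".join(lines)
```

The abstract's finding is that adding such parallel renditions helps even when some translations are of poor quality.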
16. ContrastDiagnosis: Enhancing Interpretability in Lung Nodule Diagnosis Using Contrastive Learning
- Author
-
Wang, Chenglong, Yi, Yinqiao, Wang, Yida, Zhang, Chengxiu, Liu, Yun, Mori, Kensaku, Yuan, Mei, and Yang, Guang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
With the ongoing development of deep learning, an increasing number of AI models have surpassed the performance levels of human clinical practitioners. However, the prevalence of AI diagnostic products in actual clinical practice remains significantly lower than desired. One crucial reason for this gap is the so-called `black box' nature of AI models. Clinicians' distrust of black-box models has directly hindered the clinical deployment of AI products. To address this challenge, we propose ContrastDiagnosis, a straightforward yet effective interpretable diagnosis framework. This framework is designed to introduce inherent transparency and provide extensive post-hoc explainability for deep learning models, making them more suitable for clinical medical diagnosis. ContrastDiagnosis incorporates a contrastive learning mechanism to provide a case-based reasoning diagnostic rationale, enhancing the model's transparency, and also offers post-hoc interpretability by highlighting similar areas. High diagnostic accuracy was achieved, with an AUC of 0.977, while maintaining high transparency and explainability.
- Published
- 2024
17. Revisiting Differentially Private Hyper-parameter Tuning
- Author
-
Xiang, Zihang, Wang, Tianhao, Wang, Chenglong, and Wang, Di
- Subjects
Computer Science - Machine Learning, Computer Science - Cryptography and Security
- Abstract
We study the application of differential privacy in hyper-parameter tuning, a crucial process in machine learning that involves selecting the best hyper-parameter from several candidates. Unlike many private learning algorithms, including the prevalent DP-SGD, the privacy implications of tuning remain insufficiently understood or are often totally ignored. Recent works propose a generic private selection solution for the tuning process, yet a fundamental question persists: is this privacy bound tight? This paper provides an in-depth examination of this question. Initially, we provide studies affirming that the current privacy analysis for private selection is indeed tight in general. However, when we specifically study the hyper-parameter tuning problem in a white-box setting, such tightness no longer holds. This is first demonstrated by applying a privacy audit to the tuning process. Our findings underscore a substantial gap between the current theoretical privacy bound and the empirical bound derived even under strong audit setups. This gap motivates our subsequent investigations. Our further study provides improved privacy results for private hyper-parameter tuning owing to its distinct properties. Our results demonstrate broader applicability compared to prior analyses, which are limited to specific parameter configurations.
- Published
- 2024
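The generic private selection scheme the abstract refers to runs the private base algorithm a random number of times and releases only the best candidate, so the repetition count itself does not leak. A minimal sketch, assuming a `run_candidate` interface that returns a (score, model) pair; the geometric stopping rule shown is one common instantiation, not necessarily the variant analyzed in the paper.

```python
import random

def private_selection(run_candidate, candidates, stop_prob=0.5, rng=None):
    """Generic private selection: repeatedly run one private training
    attempt on a uniformly chosen hyper-parameter candidate, stopping
    after each round with probability `stop_prob` (so the number of
    repetitions is geometric), and release only the best
    (score, model, candidate) triple observed."""
    rng = rng or random.Random()
    best = None
    while True:
        cand = rng.choice(candidates)
        score, model = run_candidate(cand)
        if best is None or score > best[0]:
            best = (score, model, cand)
        if rng.random() < stop_prob:
            return best
```

The paper's white-box audit probes whether the privacy cost of this wrapper is as large in practice as the generic analysis suggests.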
18. Double random number encryption blind watermarking technique based on DWT-DCT domain
- Author
-
Wang, Chenglong, Ma, Yi, and Wang, Xia
- Published
- 2024
19. Research on distribution position of chip-split groove of discrete-edge end mills based on structural dynamic stability
- Author
-
Fu, Xiangfu, Wang, Chenglong, Zheng, Minli, Li, Shuo, and Chen, Enyi
- Published
- 2024
20. Maize smart-canopy architecture enhances yield at high densities
- Author
-
Tian, Jinge, Wang, Chenglong, Chen, Fengyi, Qin, Wenchao, Yang, Hong, Zhao, Sihang, Xia, Jinliang, Du, Xian, Zhu, Yifan, Wu, Lishuan, Cao, Yan, Li, Hong, Zhuang, Junhong, Chen, Shaojiang, Zhang, Huayuan, Chen, Qiuyue, Zhang, Mingcai, Deng, Xing Wang, Deng, Dezhi, Li, Jigang, and Tian, Feng
- Published
- 2024
21. Global patterns of organic carbon transfer and accumulation across the land–ocean continuum constrained by radiocarbon data
- Author
-
Wang, Chenglong, Qiu, Yifei, Hao, Zhe, Wang, Junjie, Zhang, Chuchu, Middelburg, Jack J., Wang, Yaping, and Zou, Xinqing
- Published
- 2024
22. Traditional Chinese Medicine Clerodendrum japonicum (C. japonicum) Ameliorates the Pulmonary Fibrosis through Inhibiting the TGF-β/Smad3 Signaling Pathway
- Author
-
Wei, Jiangcun, Wang, Chenglong, Zhou, Jianlong, Tang, Yunli, Deng, Qingmei, Lei, Hong, Qin, Liping, and Qin, Zujie
- Published
- 2024
23. DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing
- Author
-
Vaithilingam, Priyan, Glassman, Elena L., Inala, Jeevana Priya, and Wang, Chenglong
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
Users often rely on GUIs to edit and interact with visualizations - a daunting task due to the large space of editing options. As a result, users are either overwhelmed by a complex UI or constrained by a custom UI with a tailored, fixed subset of options and limited editing flexibility. Natural Language Interfaces (NLIs) are emerging as a feasible alternative for users to specify edits. However, NLIs forgo the advantages of traditional GUIs: the ability to explore and repeat edits and see instant visual feedback. We introduce DynaVis, which blends natural language and dynamically synthesized UI widgets. As the user describes an editing task in natural language, DynaVis performs the edit and synthesizes a persistent widget that the user can interact with to make further modifications. Study participants (n=24) preferred DynaVis over the NLI-only interface, citing ease of further edits and editing confidence due to immediate visual feedback.
- Published
- 2024
24. PhotoScout: Synthesis-Powered Multi-Modal Image Search
- Author
-
Barnaby, Celeste, Chen, Qiaochu, Wang, Chenglong, and Dillig, Isil
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
Due to the availability of increasingly large amounts of visual data, there is a growing need for tools that can help users find relevant images. While existing tools can perform image retrieval based on similarity or metadata, they fall short in scenarios that necessitate semantic reasoning about the content of the image. This paper explores a new multi-modal image search approach that allows users to conveniently specify and perform semantic image search tasks. With our tool, PhotoScout, the user interactively provides natural language descriptions, positive and negative examples, and object tags to specify their search tasks. Under the hood, PhotoScout is powered by a program synthesis engine that generates visual queries in a domain-specific language and executes the synthesized program to retrieve the desired images. In a study with 25 participants, we observed that PhotoScout allows users to perform image retrieval tasks more accurately and with less manual effort.
- Published
- 2024
25. ITEACH-Net: Inverted Teacher-studEnt seArCH Network for Emotion Recognition in Conversation
- Author
-
Sun, Haiyang, Lian, Zheng, Wang, Chenglong, Chen, Kang, Sun, Licai, Liu, Bin, and Tao, Jianhua
- Subjects
Computer Science - Multimedia
- Abstract
Two critical challenges hinder the development of emotion recognition in conversation (ERC). First, there is a lack of exploration into mining deeper insights from the data itself for conversational emotion tasks. Second, such systems exhibit vulnerability to random modality feature missing, which is a common occurrence in realistic settings. Focusing on these two key challenges, we propose a novel framework for incomplete multimodal learning in ERC, called "Inverted Teacher-studEnt seArCH Network (ITEACH-Net)." ITEACH-Net comprises two novel components: the Emotion Context Changing Encoder (ECCE) and the Inverted Teacher-Student (ITS) framework. Specifically, leveraging the tendency of emotional states to exhibit local stability within conversational contexts, ECCE captures these patterns and further perceives their evolution over time. Recognizing the varying challenges of handling incomplete versus complete data, ITS employs a teacher-student framework to decouple the respective computations. Subsequently, through neural architecture search, the student model develops enhanced computational capabilities for handling incomplete data compared to the teacher model. During testing, we design a novel evaluation method that tests the model's performance under different missing-rate conditions without altering the model weights. We conduct experiments on three benchmark ERC datasets, and the results demonstrate that our ITEACH-Net outperforms existing methods in incomplete multimodal ERC. We believe ITEACH-Net can inspire relevant research on the intrinsic nature of emotions within conversation scenarios and pave a more robust route for incomplete learning techniques. Code will be made available.
- Published
- 2023
26. What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
- Author
-
Zhang, Xiaohui, Yi, Jiangyan, Wang, Chenglong, Zhang, Chuyuan, Zeng, Siding, and Tao, Jianhua
- Subjects
Computer Science - Sound ,Computer Science - Cryptography and Security ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms. Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types. Continual learning has emerged as an effective way to address this challenge. In this paper, we propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection. The fundamental concept underlying RWM involves categorizing all classes into two groups: those with compact feature distributions across tasks, such as genuine audio, and those with more spread-out distributions, like various types of fake audio. These distinctions are quantified by means of the in-class cosine distance, which subsequently serves as the basis for RWM to introduce a trainable gradient modification direction for distinct data types. Experimental evaluations against mainstream continual learning methods reveal the superiority of RWM in terms of knowledge acquisition and mitigating forgetting in audio deepfake detection. Furthermore, RWM's applicability extends beyond audio deepfake detection, demonstrating its potential significance in diverse machine learning domains such as image recognition., Comment: Accepted by the main track of The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)
- Published
- 2023
27. A two-way trust routing scheme to improve security in fog computing environment
- Author
-
Wang, Jun, Luo, Ze, and Wang, Chenglong
- Published
- 2024
- Full Text
- View/download PDF
28. Mechanism of Endogenous Peptide PDYBX1 and Precursor Protein YBX1 in Hirschsprung’s Disease
- Author
-
Sun, Qiaochu, Zhi, Zhengke, Wang, Chenglong, Du, Chunxia, Tang, Jie, Li, Hongxing, and Tang, Weibing
- Published
- 2024
- Full Text
- View/download PDF
29. Study on Hydration and Hardening Performance of Coal Gangue-Steel Slag-Cement Composite Cementitious Material
- Author
-
Zhao, Xiaozhi, Wang, Liang, Wang, Chenglong, Xu, Jian, Hu, Wei, Li, Qi, and Wang, Hao
- Published
- 2024
- Full Text
- View/download PDF
30. How Do Analysts Understand and Verify AI-Assisted Data Analyses?
- Author
-
Gu, Ken, Shang, Ruoxi, Althoff, Tim, Wang, Chenglong, and Drucker, Steven M.
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to incorrect conclusions. Therefore, validating AI assistance is crucial and challenging. Here, we explore how analysts understand and verify the correctness of AI-generated analyses. To observe analysts in diverse verification approaches, we develop a design probe equipped with natural language explanations, code, visualizations, and interactive data tables with common data operations. Through a qualitative user study (n=22) using this probe, we uncover common behaviors within verification workflows and how analysts' programming, analysis, and tool backgrounds reflect these behaviors. Additionally, we provide recommendations for analysts and highlight opportunities for designers to improve future AI-assistant experiences., Comment: Accepted to CHI 2024
- Published
- 2023
31. Data Formulator: AI-powered Concept-driven Visualization Authoring
- Author
-
Wang, Chenglong, Thompson, John, and Lee, Bongshin
- Subjects
Computer Science - Human-Computer Interaction ,Computer Science - Artificial Intelligence - Abstract
With most modern visualization tools, authors need to transform their data into tidy formats to create the visualizations they want. Because this requires experience with programming or separate data processing tools, data transformation remains a barrier in visualization authoring. To address this challenge, we present a new visualization paradigm, concept binding, that separates high-level visualization intents from low-level data transformation steps, leveraging an AI agent. We realize this paradigm in Data Formulator, an interactive visualization authoring tool. With Data Formulator, authors first define data concepts they plan to visualize using natural language or examples, and then bind them to visual channels. Data Formulator then dispatches its AI agent to automatically transform the input data to surface these concepts and generate the desired visualizations. When presenting the results (transformed table and output visualizations) from the AI agent, Data Formulator provides feedback to help authors inspect and understand them. A user study with 10 participants shows that participants could learn and use Data Formulator to create visualizations involving challenging data transformations, and suggests interesting future research directions.
- Published
- 2023
32. Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
- Author
-
Zhang, Chu Yuan, Yi, Jiangyan, Tao, Jianhua, Wang, Chenglong, and Yan, Xinrui
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Recent strides in neural speech synthesis technologies, while enjoying widespread applications, have nonetheless introduced a series of challenges, spurring interest in the defence against the threat of misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area has certain limitations in scope. To address the gaps, we present our findings concerning the identification of the sources of synthesized speech in this paper. We investigate the existence of speech synthesis model fingerprints in the generated speech waveforms, with a focus on the acoustic model and the vocoder, and study the influence of each component on the fingerprint in the overall speech waveforms. Our research, conducted using the multi-speaker LibriTTS dataset, demonstrates two key insights: (1) vocoders and acoustic models impart distinct, model-specific fingerprints on the waveforms they generate, and (2) vocoder fingerprints are the more dominant of the two, and may mask the fingerprints from the acoustic model. These findings strongly suggest the existence of model-specific fingerprints for both the acoustic model and the vocoder, highlighting their potential utility in source identification applications., Comment: Accepted by CCL 2024
- Published
- 2023
33. Audio Deepfake Detection: A Survey
- Author
-
Yi, Jiangyan, Wang, Chenglong, Tao, Jianhua, Zhang, Xiaohui, Zhang, Chu Yuan, and Zhao, Yan
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Audio deepfake detection is an emerging, active topic. A growing body of literature has studied deepfake detection algorithms and achieved effective performance, yet the problem is far from solved. Although some review articles exist, there has been no comprehensive survey that provides researchers with a systematic overview of these developments under a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classifications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on the ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection. The survey shows that future research should address the lack of large-scale in-the-wild datasets, the poor generalization of existing detection methods to unknown fake attacks, and the interpretability of detection results.
- Published
- 2023
34. Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection
- Author
-
Fan, Cunhang, Xue, Jun, Tao, Jianhua, Yi, Jiangyan, Wang, Chenglong, Zheng, Chengshi, and Lv, Zhao
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
The rhythm of bonafide speech is often difficult to replicate, which causes the fundamental frequency (F0) of synthetic speech to differ significantly from that of real speech. The F0 feature is therefore expected to contain discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to effectively model the F0 subband and thereby improve FSD performance, we propose the spatial reconstructed local attention Res2Net (SR-LA Res2Net). Specifically, Res2Net is used as a backbone network to obtain multiscale information, and is enhanced with a spatial reconstruction mechanism to avoid losing important information as channel groups are repeatedly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that our proposed method obtains an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, achieving state-of-the-art performance among all single systems., Comment: Accepted by Neural Networks
- Published
- 2023
- Full Text
- View/download PDF
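Several entries in this listing report equal error rate (EER): the operating point at which the false-accept rate (spoof accepted as bonafide) equals the false-reject rate (bonafide rejected). As a point of reference, here is a minimal NumPy sketch of computing EER from detection scores; it is a generic implementation, not the evaluation code used in any of the papers above.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate: the threshold where false-accept and
    false-reject rates coincide (higher score = more bonafide-like)."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(spoof_scores >= t)    # spoof accepted as bonafide
        frr = np.mean(bonafide_scores < t)  # bonafide rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Well-separated score distributions yield an EER of 0, while overlapping distributions push it toward 0.5 (chance level).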
35. Learning Evaluation Models from Large Language Models for Sequence Generation
- Author
-
Wang, Chenglong, Zhou, Hang, Chang, Kaiyan, Liu, Tongran, Zhang, Chunliang, Du, Quan, Xiao, Tong, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language - Abstract
Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters, which poses a computational challenge when applying their evaluation capability at scale. To overcome this challenge, we propose ECT, an evaluation capability transfer method, to transfer the evaluation capability from LLMs to relatively lightweight language models. Based on the proposed ECT, we learn various evaluation models from ChatGPT and employ them as reward models to improve sequence generation models via reinforcement learning and reranking approaches. Experimental results on machine translation, text style transfer, and summarization tasks demonstrate the effectiveness of ECT. Notably, applying the learned evaluation models to sequence generation models yields better generated sequences as evaluated by commonly used metrics and ChatGPT.
- Published
- 2023
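The transfer idea above — fit a lightweight model to the scores a large evaluator assigns — can be approximated by plain score distillation. The sketch below simulates the teacher with a noisy linear scorer and trains a small linear student on its outputs; this is an illustrative stand-in, not ECT's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setup: each candidate sequence is summarized by a feature
# vector, and a "teacher" (standing in for an LLM evaluator) scores it.
X = rng.normal(size=(512, 8))
teacher_scores = X @ rng.normal(size=8) + 0.1 * rng.normal(size=512)

# Lightweight student: a linear model trained with MSE on teacher scores.
w = np.zeros(8)
lr = 0.05
for _ in range(200):
    pred = X @ w
    grad = X.T @ (pred - teacher_scores) / len(X)  # gradient of MSE loss
    w -= lr * grad

mse = np.mean((X @ w - teacher_scores) ** 2)  # should approach the noise floor
```

The trained student can then score candidates cheaply, e.g. as a reward model for reranking, which is the role the abstract describes.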
36. Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection
- Author
-
Zhang, Xiaohui, Yi, Jiangyan, Tao, Jianhua, Wang, Chenglong, and Zhang, Chuyuan
- Subjects
Computer Science - Sound ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Current fake audio detection algorithms have achieved promising performance on most datasets. However, their performance may degrade significantly when dealing with audio from a different dataset. The orthogonal weight modification used to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets. To overcome this limitation, we propose a continual learning algorithm for fake audio detection called Regularized Adaptive Weight Modification (RAWM). When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine to fake utterances. The adaptive modification direction ensures the network can effectively detect fake audio on the new dataset while preserving the knowledge of the old model, thus mitigating catastrophic forgetting. In addition, genuine audio collected under quite different acoustic conditions may have a skewed feature distribution, so we introduce a regularization constraint to force the network to remember the old distribution in this regard. Our method can easily be generalized to related fields, like speech emotion recognition. We also evaluate our approach across multiple datasets and obtain a significant performance improvement in cross-dataset experiments., Comment: 40th International Conference on Machine Learning (ICML 2023)
- Published
- 2023
37. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation
- Author
-
Wang, Chenglong, Zhou, Hang, Hu, Yimin, Huo, Yifu, Li, Bei, Liu, Tongran, Xiao, Tong, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language - Abstract
Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences. This is computationally challenging in practice for sequence generation problems, such as machine translation, where we often deal with a large action space (e.g., a vocabulary) and long action sequences (e.g., translations). In this work, we introduce two-stage sampling and dynamic sampling approaches to improve sampling efficiency when training sequence generation models via RL. We experiment with our approaches on traditional sequence generation tasks, including machine translation and abstractive summarization. Furthermore, we evaluate our approaches in RL from human feedback (RLHF) by training a large language model using the reward model. Experimental results show that the efficient sampling-based RL, referred to as ESRL, can outperform all baselines in terms of both training efficiency and memory consumption. Notably, ESRL yields consistent performance gains over the strong REINFORCE, minimum risk training, and proximal policy optimization methods.
- Published
- 2023
38. Inverse design of subwavelength gratings-assisted ultracompact 1.55/2 μm wavelength diplexer based on a bullet-shaped structure
- Author
-
Wen, Jin, Pan, Yu, Wu, Zhengwei, Ma, Chengju, Fan, Wei, Zhang, Ying, Zhang, Hui, Wang, Qian, Yu, Huimin, Qu, Shuangchao, Wang, Chenglong, and Yin, Lan
- Published
- 2024
- Full Text
- View/download PDF
39. Preparation and Tribological Behavior of N-doped Graphene Oxide Quantum Dots with MoS2 and Al2O3 Nanocomposites as Lubricant Additive in Aqueous Glycerol
- Author
-
Xiong, Sang, He, Jiaqi, and Wang, Chenglong
- Published
- 2024
- Full Text
- View/download PDF
40. Study on the Independent and Joint Effects of Physical Activity and Sleep on Low Back Pain in Middle-aged and Elderly Adults
- Author
-
LI Mingzhe, TIAN Yichuan, WANG Chenglong, WANG Jingjing
- Subjects
low back pain ,physical activity ,sleep duration ,middle-aged and elderly adults ,root cause analysis ,Medicine - Abstract
Background Low back pain (LBP) in middle-aged and elderly adults has become a significant public health issue worldwide. Physical activity and sleep are two core components of the 24-hour lifecycle, and maintaining adequate physical activity and good sleep is crucial for health; both are associated with LBP. Objective To investigate the prevalence of LBP in middle-aged and elderly adults in China, analyze the independent and combined effects of physical activity and sleep on its occurrence, and provide scientific evidence for behavioral health. Methods Based on the 2018 China Health and Retirement Longitudinal Study, participants without demographic, physical activity, sleep, and LBP data were excluded. A total of 13 496 eligible individuals aged 45 to 69 were included, and their demographic and behavioral information was collected. Binary logistic regression and multiple linear regression were used to examine the relationship between physical activity, sleep duration, and LBP, and a mediation model was constructed to analyze the mediating effect of sleep duration on the association between physical activity and LBP. Results The prevalence of LBP among the 13 496 participants was 39.0% (n=5 269). Inadequate sleep was associated with LBP (1.96), and the path from physical activity to LBP was not significant (β=0.105, P>0.05), suggesting a complete mediating effect of sleep duration on the association between physical activity and LBP. Conclusion Over one-third of middle-aged and elderly adults in China suffer from LBP. Higher levels of physical activity or shorter sleep duration are associated with increased risk of LBP. Sleep duration plays a complete mediating role in the association between physical activity and LBP: the increased risk of LBP associated with high-intensity physical activity is transmitted entirely through reduced sleep duration. Adequate sleep duration therefore plays an important role in reducing the risk of LBP associated with high-intensity physical activity. This study suggests that older adults should adjust their exercise intensity according to their own conditions and maintain adequate sleep duration to reduce the risk of LBP.
- Published
- 2024
- Full Text
- View/download PDF
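The mediation analysis described above (physical activity → sleep duration → LBP) follows a standard decomposition into a direct path and an indirect path through the mediator. Below is a sketch of that decomposition on synthetic data using linear models; the study itself used logistic and multiple linear regression, and all variable names and effect sizes here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic full-mediation scenario: activity affects sleep,
# and sleep (not activity directly) affects pain.
activity = rng.normal(size=n)
sleep = -0.5 * activity + rng.normal(size=n)   # more activity -> less sleep
pain = -0.7 * sleep + rng.normal(size=n)       # less sleep -> more pain

def ols(y, *cols):
    """Least-squares slopes with an intercept column (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a, = ols(sleep, activity)                  # path: activity -> sleep
c_prime, b = ols(pain, activity, sleep)    # direct path and sleep -> pain
indirect = a * b                           # mediated effect, ~= 0.35 here
```

Under full mediation the direct coefficient `c_prime` is near zero while the indirect product `a * b` carries the whole effect, mirroring the paper's conclusion about sleep duration.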
41. $\mathrm{SAM^{Med}}$: A medical image annotation framework based on large vision model
- Author
-
Wang, Chenglong, Li, Dexuan, Wang, Sucheng, Zhang, Chengxiu, Wang, Yida, Liu, Yun, and Yang, Guang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recently, the large vision model Segment Anything Model (SAM) has revolutionized the computer vision field, especially image segmentation. SAM presented a new promptable segmentation paradigm that exhibits remarkable zero-shot generalization ability. Extensive research has explored the potential and limits of SAM in various downstream tasks. In this study, we present $\mathrm{SAM^{Med}}$, an enhanced framework for medical image annotation that leverages the capabilities of SAM. The $\mathrm{SAM^{Med}}$ framework consists of two submodules, namely $\mathrm{SAM^{assist}}$ and $\mathrm{SAM^{auto}}$. $\mathrm{SAM^{assist}}$ demonstrates the generalization ability of SAM to downstream medical segmentation tasks using a prompt-learning approach; results show a significant improvement in segmentation accuracy with only approximately 5 input points. The $\mathrm{SAM^{auto}}$ model aims to accelerate the annotation process by automatically generating input prompts. The proposed SAP-Net model achieves superior segmentation performance with only five annotated slices, achieving average Dice coefficients of 0.80 and 0.82 for kidney and liver segmentation, respectively. Overall, $\mathrm{SAM^{Med}}$ demonstrates promising results in medical image annotation. These findings highlight the potential of leveraging large-scale vision models in medical image annotation tasks.
- Published
- 2023
42. Is Self-Repair a Silver Bullet for Code Generation?
- Author
-
Olausson, Theo X., Inala, Jeevana Priya, Wang, Chenglong, Gao, Jianfeng, and Solar-Lezama, Armando
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Programming Languages ,Computer Science - Software Engineering - Abstract
Large language models have shown remarkable aptitude in code generation, but still struggle to perform complex tasks. Self-repair -- in which the model debugs and repairs its own code -- has recently become a popular way to boost performance in these settings. However, despite its increasing popularity, existing studies of self-repair have been limited in scope; in many settings, its efficacy thus remains poorly understood. In this paper, we analyze Code Llama, GPT-3.5 and GPT-4's ability to perform self-repair on problems taken from HumanEval and APPS. We find that when the cost of carrying out repair is taken into account, performance gains are often modest, vary a lot between subsets of the data, and are sometimes not present at all. We hypothesize that this is because self-repair is bottlenecked by the model's ability to provide feedback on its own code; using a stronger model to artificially boost the quality of the feedback, we observe substantially larger performance gains. Similarly, a small-scale study in which we provide GPT-4 with feedback from human participants suggests that even for the strongest models, self-repair still lags far behind what can be achieved with human-level debugging., Comment: Accepted to ICLR 2024. Added additional Code Llama experiments and fixed a data processing error harming Code Llama's reported self-repair performance on HumanEval
- Published
- 2023
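The generate-test-repair cycle studied in this paper can be written as a small generic loop. The callable names below (`generate`, `run_tests`, `repair`) are hypothetical stand-ins for model and test-harness calls, not the paper's actual interface; the point is only the control flow, including the repair budget whose cost the paper argues must be counted.

```python
def self_repair(generate, run_tests, repair, max_repairs=3):
    """Generic self-repair loop: draw an initial program, then repeatedly
    ask the model to repair it using test feedback. All three arguments
    are caller-supplied callables (hypothetical names for illustration)."""
    program = generate()
    for _ in range(max_repairs):
        ok, feedback = run_tests(program)
        if ok:
            return program, True
        program = repair(program, feedback)
    ok, _ = run_tests(program)
    return program, ok
```

A toy usage: `generate` emits a buggy `add` that subtracts, `run_tests` executes it against `add(1, 2) == 3`, and `repair` patches the operator, so the loop succeeds on the first repair round.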
43. Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection
- Author
-
Wang, Chenglong, Yi, Jiangyan, Zhang, Xiaohui, Tao, Jianhua, Xu, Le, and Fu, Ruibo
- Subjects
Computer Science - Sound ,Computer Science - Computation and Language ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Self-supervised speech models are a rapidly developing research topic in fake audio detection. Many pre-trained models can serve as feature extractors, learning richer and higher-level speech features. However, when fine-tuning pre-trained models, training times are often excessively long and memory consumption high, and complete fine-tuning is also very expensive. To alleviate this problem, we apply low-rank adaptation (LoRA) to the wav2vec2 model, freezing the pre-trained model weights and injecting a trainable rank-decomposition matrix into each layer of the transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared with fine-tuning with Adam on the wav2vec2 model containing 317M training parameters, LoRA achieved similar performance while reducing the number of trainable parameters by a factor of 198., Comment: 6 pages
- Published
- 2023
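The LoRA recipe this abstract describes — freeze the pretrained weight and train only a rank-r factorization added to it — can be sketched in a few lines of NumPy. The matrix sizes, scaling factor, and zero initialization below follow the standard LoRA formulation and are illustrative; they are not wav2vec2's actual dimensions or the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 256, 256, 4           # illustrative sizes, not wav2vec2's

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # zero init => the adapter starts as a no-op
alpha = 8                              # LoRA scaling hyperparameter

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B train.
    return x @ (W + (alpha / r) * (B @ A)).T

full = W.size           # parameters touched by full fine-tuning: 65,536
lora = A.size + B.size  # trainable parameters under LoRA: 2,048
```

Even at this toy scale the trainable-parameter count drops 32x; the paper's 198x reduction comes from applying the same decomposition across all transformer layers of a much larger model.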
44. ADD 2023: the Second Audio Deepfake Detection Challenge
- Author
-
Yi, Jiangyan, Tao, Jianhua, Fu, Ruibo, Yan, Xinrui, Wang, Chenglong, Wang, Tao, Zhang, Chu Yuan, Zhang, Xiaohui, Zhao, Yan, Ren, Yong, Xu, Le, Zhou, Junzuo, Gu, Hao, Wen, Zhengqi, Liang, Shan, Lian, Zheng, Nie, Shuai, and Li, Haizhou
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on surpassing the constraints of binary real/fake classification, and actually localizing the manipulated intervals in a partially fake speech as well as pinpointing the source responsible for generating any fake audio. Furthermore, ADD 2023 includes more rounds of evaluation for the fake audio game sub-challenge. The ADD 2023 challenge includes three subchallenges: audio fake game (FG), manipulation region location (RL) and deepfake algorithm recognition (AR). This paper describes the datasets, evaluation metrics, and protocols. Some findings are also reported in audio deepfake detection tasks.
- Published
- 2023
45. TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection
- Author
-
Wang, Chenglong, Yi, Jiangyan, Tao, Jianhua, Zhang, Chuyuan, Zhang, Shuai, Fu, Ruibo, and Chen, Xun
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Current fake audio detection relies on hand-crafted features, which lose information during extraction. To overcome this, recent studies extract features directly from raw audio signals; RawNet is one of the representative works in end-to-end fake audio detection. However, existing work on RawNet does not optimize the parameters of the Sinc-conv during training, which limits its performance. In this paper, we propose to incorporate orthogonal convolution into RawNet, which reduces the correlation between filters when optimizing the parameters of the Sinc-conv, thus improving discriminability. Additionally, we introduce temporal convolutional networks (TCN) to capture long-term dependencies in speech signals. Experiments on ASVspoof 2019 show that our TO-RawNet system can relatively reduce EER by 66.09% in the logical access scenario compared with RawNet, demonstrating its effectiveness in detecting fake audio attacks., Comment: Interspeech 2023
- Published
- 2023
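Filter decorrelation of the kind described above is commonly implemented as a soft orthogonality penalty added to the training loss: drive the Gram matrix of the filter bank toward the identity. The function below is that generic penalty, not necessarily the exact regularizer used in TO-RawNet.

```python
import numpy as np

def orth_penalty(W):
    """Soft orthogonality penalty ||W^T W - I||_F^2 over a filter bank
    whose columns are filters; zero iff the columns are orthonormal."""
    gram = W.T @ W
    return np.sum((gram - np.eye(W.shape[1])) ** 2)
```

In training, a weighted `orth_penalty` term would be added to the task loss so that gradient descent pushes correlated filters apart while the main objective is optimized.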
46. Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features
- Author
-
Wang, Chenglong, Yi, Jiangyan, Tao, Jianhua, Zhang, Chuyuan, Zhang, Shuai, and Chen, Xun
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Existing fake audio detection systems perform well in in-domain testing, but still face many challenges in out-of-domain testing. This is due to the mismatch between the training and test data, as well as the poor generalizability of features extracted from limited views. To address this, we propose multi-view features for fake audio detection, which aim to capture more generalized features from prosodic, pronunciation, and wav2vec dimensions. Specifically, the phoneme duration features are extracted from a pre-trained model based on a large amount of speech data. For the pronunciation features, a Conformer-based phoneme recognition model is first trained, keeping the acoustic encoder part as a deeply embedded feature extractor. Furthermore, the prosodic and pronunciation features are fused with wav2vec features based on an attention mechanism to improve the generalization of fake audio detection models. Results show that the proposed approach achieves significant performance gains in several cross-dataset experiments., Comment: Interspeech 2023
- Published
- 2023
47. Construction of store-operated calcium entry-related gene signature for predicting prognosis and indicates immune microenvironment infiltration in stomach adenocarcinomas
- Author
-
Zhang, Zichao, Wang, Chenglong, Shi, Wenzheng, Wang, Zhihui, and Fu, Weihua
- Published
- 2024
- Full Text
- View/download PDF
48. Tetraspanin 3 promotes NSCLC cell proliferation via regulation of β1 integrin intracellular recycling
- Author
-
Zhang, Yao, Wang, Chenglong, Xu, Yitong, and Su, Hongbo
- Published
- 2024
- Full Text
- View/download PDF
49. Development and validation of a nomogram integrating marital status for 5-year overall survival of chondrosarcoma: a population-based study
- Author
-
Xie, Chengxin, Jiang, Ruiyuan, Wang, Chenglong, Lei, Xinhuan, Lu, Kaicheng, and Luo, Hua
- Published
- 2024
- Full Text
- View/download PDF
50. SPOCK2 modulates neuropathic pain by interacting with MT1-MMP to regulate astrocytic MMP-2 activation in rats with chronic constriction injury
- Author
-
Wang, Chenglong, Xu, Yitong, Xu, Miao, Sun, Cong, Zhang, Xiaojiao, Tao, Xueshu, and Song, Tao
- Published
- 2024
- Full Text
- View/download PDF