Author: "Fang, Yuwei" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Fang, Yuwei"' showing total 202 results

1. VIMI: Grounding Video Generation through Multi-modal Instruction

Author: Fang, Yuwei, Menapace, Willi, Siarohin, Aliaksandr, Chen, Tsai-Shien, Wang, Kuan-Chien, Skorokhodov, Ivan, Neubig, Graham, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt video datasets, resulting in a lack of visual grounding and restricting their versatility and application in multimodal integration. To address this, we construct a large-scale multimodal prompt dataset by employing retrieval methods to pair in-context examples with the given text prompts and then utilize a two-stage training strategy to enable diverse video generation tasks within the same model. In the first stage, we propose a multimodal conditional video generation framework for pretraining on these augmented datasets, establishing a foundational model for grounded video generation. In the second stage, we finetune the model from the first stage on three video generation tasks, incorporating multi-modal instructions. This process further refines the model's ability to handle diverse inputs and tasks, ensuring seamless integration of multi-modal information. After this two-stage training process, VIMI demonstrates multimodal understanding capabilities, producing contextually rich and personalized videos grounded in the provided inputs, as shown in Figure 1. Compared to previous visually grounded video generation methods, VIMI can synthesize consistent and temporally coherent videos with large motion while retaining the semantic control. Lastly, VIMI also achieves state-of-the-art text-to-video generation results on the UCF101 benchmark.
Published: 2024

2. VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

Author: Gu, Jing, Fang, Yuwei, Skorokhodov, Ivan, Wonka, Peter, Du, Xinya, Tulyakov, Sergey, and Wang, Xin Eric
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia
Abstract: Video editing is a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemporal Video Adaptation framework for global and local video editing, pushing the limits of consistently editing minute-long videos. First, to ensure local consistency within individual frames, we design test-time editing adaptation to adapt a pre-trained image editing model for improving consistency between potential editing directions and the text instruction, and adapt masked latent variables for precise local control. Furthermore, to maintain global consistency over the video sequence, we introduce spatiotemporal adaptation that recursively gathers consistent attention variables in key frames and strategically applies them across the whole sequence to realize the editing effects. Extensive experiments demonstrate that, compared to baseline methods, our VIA approach produces edits that are more faithful to the source videos, more coherent in the spatiotemporal context, and more precise in local control. More importantly, we show that VIA can achieve consistent long video editing in minutes, unlocking the potential for advanced video editing tasks over long video sequences. Comment: 19 pages, 14 figures
Published: 2024

3. MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

Author: Wang, Kuan-Chieh, Ostashev, Daniil, Fang, Yuwei, Tulyakov, Sergey, and Aberman, Kfir
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA is designed to retain the original model's prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch. A novel routing mechanism manages the distribution of pixels in each layer across these branches to optimize the blend of personalized and generic content creation. Once trained, MoA facilitates the creation of high-quality, personalized images featuring multiple subjects with compositions and interactions as diverse as those generated by the original model. Crucially, MoA enhances the distinction between the model's pre-existing capability and the newly augmented personalized intervention, thereby offering a more disentangled subject-context control that was previously unattainable. Project page: https://snap-research.github.io/mixture-of-attention
Published: 2024

4. Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Author: Chen, Tsai-Shien, Siarohin, Aliaksandr, Menapace, Willi, Deyneka, Ekaterina, Chao, Hsiang-wei, Jeon, Byung Eun, Fang, Yuwei, Lee, Hsin-Ying, Ren, Jian, Yang, Ming-Hsuan, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, and showing multiple actions. Accordingly, to establish a video dataset with high-quality captions, we propose an automatic approach leveraging multimodal inputs, such as textual video description, subtitles, and individual video frames. Specifically, we curate 3.8M high-resolution videos from the publicly available HD-VILA-100M dataset. We then split them into semantically consistent video clips, and apply multiple cross-modality teacher models to obtain captions for each video. Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation. In this way, we get 70M videos paired with high-quality text captions. We dub the dataset as Panda-70M. We show the value of the proposed dataset on three downstream tasks: video captioning, video and text retrieval, and text-driven video generation. The models trained on the proposed data score substantially better on the majority of metrics across all the tasks. Comment: CVPR 2024. Project Page: https://snap-research.github.io/Panda-70M
Published: 2024

5. Evaluating Very Long-Term Conversational Memory of LLM Agents

Author: Maharana, Adyasha, Lee, Dong-Ho, Tulyakov, Sergey, Bansal, Mohit, Barbieri, Francesco, and Fang, Yuwei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Existing works on long-term open-domain dialogues focus on evaluating model responses within contexts spanning no more than five chat sessions. Despite advancements in long-context large language models (LLMs) and retrieval augmented generation (RAG) techniques, their efficacy in very long-term dialogues remains unexplored. To address this research gap, we introduce a machine-human pipeline to generate high-quality, very long-term dialogues by leveraging LLM-based agent architectures and grounding their dialogues on personas and temporal event graphs. Moreover, we equip each agent with the capability of sharing and reacting to images. The generated conversations are verified and edited by human annotators for long-range consistency and grounding to the event graphs. Using this pipeline, we collect LoCoMo, a dataset of very long-term conversations, each encompassing 300 turns and 9K tokens on average, over up to 35 sessions. Based on LoCoMo, we present a comprehensive evaluation benchmark to measure long-term memory in models, encompassing question answering, event summarization, and multi-modal dialogue generation tasks. Our experimental results indicate that LLMs exhibit challenges in understanding lengthy conversations and comprehending long-range temporal and causal dynamics within dialogues. Employing strategies like long-context LLMs or RAG can offer improvements, but these models still substantially lag behind human performance. Comment: 19 pages; Project page: https://snap-research.github.io/locomo/
Published: 2024

6. Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Author: Menapace, Willi, Siarohin, Aliaksandr, Skorokhodov, Ivan, Deyneka, Ekaterina, Chen, Tsai-Shien, Kag, Anil, Fang, Yuwei, Stoliar, Aleksei, Ricci, Elisa, Ren, Jian, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity and visual quality and impairs scalability. In this work, we build Snap Video, a video-first model that systematically addresses these challenges. To do that, we first extend the EDM framework to take into account spatially and temporally redundant pixels and naturally support video generation. Second, we show that a U-Net, a workhorse behind image generation, scales poorly when generating videos, requiring significant computational overhead. Hence, we propose a new transformer-based architecture that trains 3.31 times faster than U-Nets (and is ~4.5 times faster at inference). This allows us to efficiently train a text-to-video model with billions of parameters for the first time, reach state-of-the-art results on a number of benchmarks, and generate videos with substantially higher quality, temporal consistency, and motion complexity. The user studies showed that our model was favored by a large margin over the most recent methods. See our website at https://snap-research.github.io/snapvideo/.
Published: 2024

7. AToM: Amortized Text-to-Mesh using 2D Diffusion

Author: Qian, Guocheng, Cao, Junli, Siarohin, Aliaksandr, Kant, Yash, Wang, Chaoyang, Vasilkovsky, Michael, Lee, Hsin-Ying, Fang, Yuwei, Skorokhodov, Ivan, Zhuang, Peiye, Gilitschenski, Igor, Ren, Jian, Ghanem, Bernard, Aberman, Kfir, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization and commonly output representations other than polygonal meshes, AToM directly generates high-quality textured meshes in less than 1 second with around a 10-fold reduction in the training cost, and generalizes to unseen prompts. Our key idea is a novel triplane-based text-to-mesh architecture with a two-stage amortized optimization strategy that ensures stable training and enables scalability. Through extensive experiments on various prompt benchmarks, AToM significantly outperforms state-of-the-art amortized approaches with over 4 times higher accuracy (on the DF415 dataset) and produces more distinguishable and higher-quality 3D outputs. AToM demonstrates strong generalizability, offering fine-grained 3D assets for unseen interpolated prompts without further optimization during inference, unlike per-prompt solutions. Comment: 19 pages with appendix and references. Webpage: https://snap-research.github.io/AToM/
Published: 2024

8. PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning

Author: Zhang, Zhihan, Lee, Dong-Ho, Fang, Yuwei, Yu, Wenhao, Jia, Mengzhao, Jiang, Meng, and Barbieri, Francesco
Subjects: Computer Science - Computation and Language
Abstract: Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions. Despite the success in high-resource languages, its application in lower-resource ones faces challenges due to the imbalanced foundational abilities of LLMs across different languages, stemming from the uneven language distribution in their pre-training data. To tackle this issue, we propose pivot language guided generation (PLUG), an approach that utilizes a high-resource language, primarily English, as the pivot to enhance instruction tuning in lower-resource languages. It trains the model to first process instructions in the pivot language, and then produce responses in the target language. To evaluate our approach, we introduce a benchmark, X-AlpacaEval, of instructions in 4 languages (Chinese, Korean, Italian, and Spanish), each annotated by professional translators. Our approach demonstrates a significant improvement in the instruction-following abilities of LLMs by 29% on average, compared to directly responding in the target language alone. Further experiments validate the versatility of our approach by employing alternative pivot languages beyond English to assist languages where LLMs exhibit lower proficiency. Our code and data are available at https://github.com/ytyz1307zzh/PLUG.
Published: 2023

9. i-Code Studio: A Configurable and Composable Framework for Integrative AI

Author: Fang, Yuwei, Khademi, Mahmoud, Zhu, Chenguang, Yang, Ziyi, Pryzant, Reid, Xu, Yichong, Qian, Yao, Yoshioka, Takuya, Yuan, Lu, Zeng, Michael, and Huang, Xuedong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities. Integrative AI is one important direction to approach AGI, through combining multiple models to tackle complex multimodal tasks. However, there is a lack of a flexible and composable platform to facilitate efficient and effective model composition and coordination. In this paper, we propose the i-Code Studio, a configurable and composable framework for Integrative AI. The i-Code Studio orchestrates multiple pre-trained models in a finetuning-free fashion to conduct complex multimodal tasks. Instead of simple model composition, the i-Code Studio provides an integrative, flexible, and composable setting for developers to quickly and easily compose cutting-edge services and technologies tailored to their specific requirements. The i-Code Studio achieves impressive results on a variety of zero-shot multimodal tasks, such as video-to-text retrieval, speech-to-speech translation, and visual question answering. We also demonstrate how to quickly build a multimodal agent based on the i-Code Studio that can communicate and personalize for users.
Published: 2023

10. i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

Author: Yang, Ziyi, Khademi, Mahmoud, Xu, Yichong, Pryzant, Reid, Fang, Yuwei, Zhu, Chenguang, Chen, Dongdong, Qian, Yao, Gao, Mei, Chen, Yi-Ling, Gmyr, Robert, Kanda, Naoyuki, Codella, Noel, Xiao, Bin, Shi, Yu, Yuan, Lu, Yoshioka, Takuya, Zeng, Michael, and Huang, Xuedong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals.
Published: 2023

11. Unifying Vision, Text, and Layout for Universal Document Processing

Author: Tang, Zineng, Yang, Ziyi, Wang, Guoxin, Fang, Yuwei, Liu, Yang, Zhu, Chenguang, Zeng, Michael, Zhang, Cha, and Bansal, Mohit
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained on both large-scale unlabeled document corpora using innovative self-supervised objectives and diverse labeled data. UDOP also learns to generate document images from text and layout modalities via masked image reconstruction. To the best of our knowledge, this is the first time in the field of document AI that one model simultaneously achieves high-quality neural document editing and content customization. Our method sets the state-of-the-art on 8 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites. UDOP ranks first on the leaderboard of the Document Understanding Benchmark. Comment: CVPR 2023
Published: 2022

12. MACSum: Controllable Summarization with Mixed Attributes

Author: Zhang, Yusen, Liu, Yang, Yang, Ziyi, Fang, Yuwei, Chen, Yulong, Radev, Dragomir, Zhu, Chenguang, Zeng, Michael, and Zhang, Rui
Subjects: Computer Science - Computation and Language
Abstract: Controllable summarization allows users to generate customized summaries with specified attributes. However, due to the lack of designated annotations of controlled summaries, existing works have to craft pseudo datasets by adapting generic summarization benchmarks. Furthermore, most research focuses on controlling single attributes individually (e.g., a short summary or a highly abstractive summary) rather than controlling a mix of attributes together (e.g., a short and highly abstractive summary). In this paper, we propose MACSum, the first human-annotated summarization dataset for controlling mixed attributes. It contains source texts from two domains, news articles and dialogues, with human-annotated summaries controlled by five designed attributes (Length, Extractiveness, Specificity, Topic, and Speaker). We propose two simple and effective parameter-efficient approaches for the new task of mixed controllable summarization based on hard prompt tuning and soft prefix tuning. Results and analysis demonstrate that hard prompt models yield the best performance on all metrics and human evaluations. However, mixed-attribute control is still challenging for summarization tasks. Our dataset and code are available at https://github.com/psunlpgroup/MACSum. Comment: TACL 2023
Published: 2022

13. Retrieval Augmentation for Commonsense Reasoning: A Unified Approach

Author: Yu, Wenhao, Zhu, Chenguang, Zhang, Zhihan, Wang, Shuohang, Zhang, Zhuosheng, Fang, Yuwei, and Jiang, Meng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: A common thread of retrieval-augmented methods in the existing literature focuses on retrieving encyclopedic knowledge, such as Wikipedia, which facilitates well-defined entity and relation spaces that can be modeled. However, applying such methods to commonsense reasoning tasks faces two unique challenges, i.e., the lack of a general large-scale corpus for retrieval and a corresponding effective commonsense retriever. In this paper, we systematically investigate how to leverage commonsense knowledge retrieval to improve commonsense reasoning tasks. We proposed a unified framework of retrieval-augmented commonsense reasoning (called RACo), including a newly constructed commonsense corpus with over 20 million documents and novel strategies for training a commonsense retriever. We conducted experiments on four different commonsense reasoning tasks. Extensive evaluation results showed that our proposed RACo can significantly outperform other knowledge-enhanced method counterparts, achieving new SoTA performance on the CommonGen and CREAK leaderboards. Comment: EMNLP 2022 (main)
Published: 2022

14. Task Compass: Scaling Multi-task Pre-training with Task Prefix

Author: Zhang, Zhuosheng, Wang, Shuohang, Xu, Yichong, Fang, Yuwei, Yu, Wenhao, Liu, Yang, Zhao, Hai, Zhu, Chenguang, and Zeng, Michael
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Leveraging task-aware annotated data as supervised signals to assist with self-supervised learning on large-scale unlabeled data has become a new trend in pre-training language models. Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks. To tackle the challenge, we propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks. We conduct extensive experiments on 40 datasets, which show that our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships. The task relationships reflected by the prefixes align with transfer learning performance between tasks. They also suggest directions for data augmentation with complementary tasks, which help our model achieve human-parity results on commonsense reasoning leaderboards. Code is available at https://github.com/cooelf/CompassMTL. Comment: Findings of EMNLP 2022
Published: 2022

15. i-Code: An Integrative and Composable Multimodal Learning Framework

Author: Yang, Ziyi, Fang, Yuwei, Zhu, Chenguang, Pryzant, Reid, Chen, Dongdong, Shi, Yu, Xu, Yichong, Qian, Yao, Gao, Mei, Chen, Yi-Ling, Lu, Liyang, Xie, Yujia, Gmyr, Robert, Codella, Noel, Kanda, Naoyuki, Xiao, Bin, Yuan, Lu, Yoshioka, Takuya, Zeng, Michael, and Huang, Xuedong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.
Published: 2022

16. Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data

Author: Wang, Shuohang, Xu, Yichong, Fang, Yuwei, Liu, Yang, Sun, Siqi, Xu, Ruochen, Zhu, Chenguang, and Zeng, Michael
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Retrieval-based methods have been shown to be effective in NLP tasks via introducing external knowledge. However, the indexing and retrieving of large-scale corpora bring considerable computational cost. Surprisingly, we found that REtrieving from the traINing datA (REINA) alone can lead to significant gains on multiple NLG and NLU tasks. We retrieve the labeled training instances most similar to the input text and then concatenate them with the input to feed into the model to generate the output. Experimental results show that this simple method can achieve significantly better performance on a variety of NLU and NLG tasks, including summarization, machine translation, language modeling, and question answering tasks. For instance, our proposed method achieved state-of-the-art results on XSum, BigPatent, and CommonsenseQA. Our code is released at https://github.com/microsoft/REINA. Comment: Accepted to the ACL 2022 main conference
Published: 2022

17. Leveraging Knowledge in Multilingual Commonsense Reasoning

Author: Fang, Yuwei, Wang, Shuohang, Xu, Yichong, Xu, Ruochen, Sun, Siqi, Zhu, Chenguang, and Zeng, Michael
Subjects: Computer Science - Computation and Language
Abstract: Commonsense reasoning (CSR) requires the model to be equipped with general world knowledge. While CSR is a language-agnostic process, most comprehensive knowledge sources are in a few popular languages, especially English. Thus, it remains unclear how to effectively conduct multilingual commonsense reasoning (XCSR) for various languages. In this work, we propose to utilize English knowledge sources via a translate-retrieve-translate (TRT) strategy. For multilingual commonsense questions and choices, we collect related knowledge via translation and retrieval from the knowledge sources. The retrieved knowledge is then translated into the target language and integrated into a pre-trained multilingual language model via visible knowledge attention. Then we utilize a diverse set of 4 English knowledge sources to provide more comprehensive coverage of knowledge in different formats. Extensive results on the XCSR benchmark demonstrate that TRT with external knowledge can significantly improve multilingual commonsense reasoning in both zero-shot and translate-train settings, outperforming the previous state of the art by 3.3 and 3.6 points on the XCSR benchmark datasets (X-CSQA and X-CODAH). Comment: First place in XCSR Leaderboard: https://inklab.usc.edu//XCSR/leaderboard. Work in progress
Published: 2021

18. Dict-BERT: Enhancing Language Model Pre-training with Dictionary

Author: Yu, Wenhao, Zhu, Chenguang, Fang, Yuwei, Yu, Donghan, Wang, Shuohang, Xu, Yichong, Zeng, Michael, and Jiang, Meng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora. Since PLMs capture word semantics in different contexts, the quality of word representations highly depends on word frequency, which usually follows a heavy-tailed distribution in the pre-training corpus. Therefore, the embeddings of rare words on the tail are usually poorly optimized. In this work, we focus on enhancing language model pre-training by leveraging definitions of the rare words in dictionaries (e.g., Wiktionary). To incorporate a rare word definition as a part of input, we fetch its definition from the dictionary and append it to the end of the input text sequence. In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word and sentence-level alignment between input text sequence and rare word definitions to enhance language modeling representation with dictionary. We evaluate the proposed Dict-BERT model on the language understanding benchmark GLUE and eight specialized domain benchmark datasets. Extensive experiments demonstrate that Dict-BERT can significantly improve the understanding of rare words and boost model performance on various NLP downstream tasks. Comment: ACL 2022 (Findings)
Published: 2021

19. KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering

Author: Yu, Donghan, Zhu, Chenguang, Fang, Yuwei, Yu, Wenhao, Wang, Shuohang, Xu, Yichong, Ren, Xiang, Yang, Yiming, and Zeng, Michael
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The current Open-Domain Question Answering (ODQA) model paradigm often contains a retrieving module and a reading module. Given an input question, the reading module predicts the answer from the relevant passages which are retrieved by the retriever. The recently proposed Fusion-in-Decoder (FiD), which is built on top of the pretrained generative model T5, achieves the state-of-the-art performance in the reading module. Although effective, it remains constrained by inefficient attention on all retrieved passages which contain a lot of noise. In this work, we propose a novel method KG-FiD, which filters noisy passages by leveraging the structural relationship among the retrieved passages with a knowledge graph. We initiate the passage node embedding from the FiD encoder and then use graph neural network (GNN) to update the representation for reranking. To improve the efficiency, we build the GNN on top of the intermediate layer output of the FiD encoder and only pass a few top reranked passages into the higher layers of encoder and decoder for answer generation. We also apply the proposed GNN based reranking method to enhance the passage retrieval results in the retrieving module. Extensive experiments on common ODQA benchmark datasets (Natural Questions and TriviaQA) demonstrate that KG-FiD can improve vanilla FiD by up to 1.5% on answer exact match score and achieve comparable performance with FiD with only 40% of computation cost. Comment: Accepted by ACL 2022
Published: 2021

20. Does Knowledge Help General NLU? An Empirical Study

Author: Xu, Ruochen, Fang, Yuwei, Zhu, Chenguang, and Zeng, Michael
Subjects: Computer Science - Computation and Language
Abstract: It is often observed in knowledge-centric tasks (e.g., common sense question and answering, relation classification) that the integration of external knowledge such as entity representation into language models can help provide useful information to boost the performance. However, it is still unclear whether this benefit can extend to general natural language understanding (NLU) tasks. In this work, we empirically investigated the contribution of external knowledge by measuring the end-to-end performance of language models with various knowledge integration methods. We find that the introduction of knowledge can significantly improve the results on certain tasks while having no adverse effects on other tasks. We then employ mutual information to reflect the difference brought by knowledge and a neural interpretation model to reveal how a language model utilizes external knowledge. Our study provides valuable insights and guidance for practitioners to equip NLP models with knowledge., Comment: Work in Progress
Published: 2021

21. RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling

Author: Zhang, Yizhe, Sun, Siqi, Gao, Xiang, Fang, Yuwei, Brockett, Chris, Galley, Michel, Gao, Jianfeng, and Dolan, Bill
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent advances in large-scale pre-training such as GPT-3 allow seemingly high quality text to be generated from a given prompt. However, such generation systems often suffer from problems of hallucinated facts, and are not inherently designed to incorporate useful external information. Grounded generation models appear to offer remedies, but their training typically relies on rarely-available parallel data where information-relevant documents are provided for context. We propose a framework that alleviates this data constraint by jointly training a grounded generator and document retriever on the language model signal. The model learns to reward retrieval of the documents with the highest utility in generation, and attentively combines them using a Mixture-of-Experts (MoE) ensemble to generate follow-on text. We demonstrate that both generator and retriever can take advantage of this joint training and work synergistically to produce more informative and relevant text in both prose and dialogue generation., Comment: accepted by AAAI-22, camera ready version
Published: 2021

22. LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

Author: Sun, Siqi, Chen, Yen-Chun, Li, Linjie, Wang, Shuohang, Fang, Yuwei, and Liu, Jingjing
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal pre-training has propelled great advancement in vision-and-language research. These large-scale pre-trained models, although successful, fatefully suffer from slow inference speed due to enormous computation cost mainly from cross-modal attention in Transformer architecture. When applied to real-life applications, such latency and computation demand severely deter the practical use of pre-trained models. In this paper, we study Image-text retrieval (ITR), the most mature scenario of V+L application, which has been widely studied even prior to the emergence of recent pre-trained models. We propose a simple yet highly effective approach, LightningDOT that accelerates the inference time of ITR by thousands of times, without sacrificing accuracy. LightningDOT removes the time-consuming cross-modal attention by pre-training on three novel learning objectives, extracting feature indexes offline, and employing instant dot-product matching with further re-ranking, which significantly speeds up retrieval process. In fact, LightningDOT achieves new state of the art across multiple ITR benchmarks such as Flickr30k, COCO and Multi30K, outperforming existing pre-trained models that consume 1000x magnitude of computational hours. Code and pre-training checkpoints are available at https://github.com/intersun/LightningDOT., Comment: NAACL 2021
Published: 2021

23. Cross-Thought for Sentence Encoder Pre-training

Author: Wang, Shuohang, Fang, Yuwei, Sun, Siqi, Gan, Zhe, Cheng, Yu, Jiang, Jing, and Liu, Jingjing
Subjects: Computer Science - Computation and Language
Abstract: In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering. Instead of using the original signals of full sentences, we train a Transformer-based sequence encoder over a large set of short sequences, which allows the model to automatically select the most useful information for predicting masked words. Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders trained with continuous sentence signals as well as traditional masked language modeling baselines. Our proposed approach also achieves new state of the art on HotpotQA (full-wiki setting) by improving intermediate information retrieval performance., Comment: Accepted by EMNLP 2020
Published: 2020

24. Contrastive Distillation on Intermediate Representations for Language Model Compression

Author: Sun, Siqi, Gan, Zhe, Cheng, Yu, Fang, Yuwei, Wang, Shuohang, and Liu, Jingjing
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one. Although widely used, this objective by design assumes that all the dimensions of hidden representations are independent, failing to capture important structural knowledge in the intermediate layers of the teacher network. To achieve better distillation efficacy, we propose Contrastive Distillation on Intermediate Representations (CoDIR), a principled knowledge distillation framework where the student is trained to distill knowledge through intermediate layers of the teacher via a contrastive objective. By learning to distinguish positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of rich information in teacher's hidden layers. CoDIR can be readily applied to compress large-scale language models in both pre-training and finetuning stages, and achieves superb performance on the GLUE benchmark, outperforming state-of-the-art compression methods., Comment: Accepted by EMNLP 2020
Published: 2020

25. Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

Author: Wang, Shuohang, Zhou, Luowei, Gan, Zhe, Chen, Yen-Chun, Fang, Yuwei, Sun, Siqi, Cheng, Yu, and Liu, Jingjing
Subjects: Computer Science - Computation and Language
Abstract: Transformer has become ubiquitous in the deep learning field. One of the key ingredients that destined its success is the self-attention mechanism, which allows fully-connected contextual encoding over input tokens. However, despite its effectiveness in modeling short sequences, self-attention suffers when handling inputs with extreme long-range dependencies, as its complexity grows quadratically with respect to the sequence length. Therefore, long sequences are often encoded by Transformer in chunks using a sliding window. In this paper, we propose Cluster-Former, a novel clustering-based sparse Transformer to perform attention across chunked sequences. The proposed framework is pivoted on two unique types of Transformer layer: Sliding-Window Layer and Cluster-Former Layer, which encode local sequence information and global context jointly and iteratively. This new design allows information integration beyond local windows, which is especially beneficial for question answering (QA) tasks that rely on long-range dependencies. Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks., Comment: ACL Findings 2021, 11 pages
Published: 2020

26. Accelerating Real-Time Question Answering via Question Generation

Author: Fang, Yuwei, Wang, Shuohang, Gan, Zhe, Sun, Siqi, Liu, Jingjing, and Zhu, Chenguang
Subjects: Computer Science - Computation and Language
Abstract: Although deep neural networks have achieved tremendous success for question answering (QA), they are still suffering from heavy computational and energy cost for real product deployment. Further, existing QA systems are bottlenecked by the encoding time of real-time questions with neural networks, thus suffering from detectable latency in deployment for large-volume traffic. To reduce the computational cost and accelerate real-time question answering (RTQA) for practical usage, we propose to remove all the neural networks from online QA systems, and present Ocean-Q (an Ocean of Questions), which introduces a new question generation (QG) model to generate a large pool of QA pairs offline, then in real time matches an input question with the candidate QA pool to predict the answer without question encoding. Ocean-Q can be readily deployed in existing distributed database systems or search engine for large-scale query usage, and much greener with no additional cost for maintaining large neural networks. Experiments on SQuAD(-open) and HotpotQA benchmarks demonstrate that Ocean-Q is able to accelerate the fastest state-of-the-art RTQA system by 4X, with only a 3+% accuracy drop.
Published: 2020

27. FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

Author: Fang, Yuwei, Wang, Shuohang, Gan, Zhe, Sun, Siqi, and Liu, Jingjing
Subjects: Computer Science - Computation and Language
Abstract: Large-scale cross-lingual language models (LM), such as mBERT, Unicoder and XLM, have achieved great success in cross-lingual representation learning. However, when applied to zero-shot cross-lingual transfer tasks, most existing methods use only single-language input for LM finetuning, without leveraging the intrinsic cross-lingual alignment between different languages that proves essential for multilingual tasks. In this paper, we propose FILTER, an enhanced fusion method that takes cross-lingual data as input for XLM finetuning. Specifically, FILTER first encodes text input in the source language and its translation in the target language independently in the shallow layers, then performs cross-language fusion to extract multilingual knowledge in the intermediate layers, and finally performs further language-specific encoding. During inference, the model makes predictions based on the text input in the target language and its translation in the source language. For simple tasks such as classification, translated text in the target language shares the same label as the source language. However, this shared label becomes less accurate or even unavailable for more complex tasks such as question answering, NER and POS tagging. To tackle this issue, we further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language. Extensive experiments demonstrate that FILTER achieves new state of the art on two challenging multilingual multi-task benchmarks, XTREME and XGLUE., Comment: Accepted to AAAI 2021; Top-1 Performance on XTREME (https://sites.research.google/xtreme, September 8, 2020) and XGLUE (https://microsoft.github.io/XGLUE, September 14, 2020) benchmark
Published: 2020

28. Tunable random laser in capillary with Nile red solution and TiO2 nanoparticles

Author: Fang, Yuwei, Hu, Jigang, and Huang, Chan
Published: 2023
Full Text: View/download PDF

29. Hierarchical Graph Network for Multi-hop Question Answering

Author: Fang, Yuwei, Sun, Siqi, Gan, Zhe, Pillai, Rohit, Wang, Shuohang, and Liu, Jingjing
Subjects: Computer Science - Computation and Language
Abstract: In this paper, we present Hierarchical Graph Network (HGN) for multi-hop question answering. To aggregate clues from scattered texts across multiple paragraphs, a hierarchical graph is created by constructing nodes on different levels of granularity (questions, paragraphs, sentences, entities), the representations of which are initialized with pre-trained contextual encoders. Given this hierarchical graph, the initial node representations are updated through graph propagation, and multi-hop reasoning is performed via traversing through the graph edges for each subsequent sub-task (e.g., paragraph selection, supporting facts extraction, answer prediction). By weaving heterogeneous nodes into an integral unified graph, this hierarchical differentiation of node granularity enables HGN to support different question answering sub-tasks simultaneously. Experiments on the HotpotQA benchmark demonstrate that the proposed model achieves new state of the art, outperforming existing multi-hop QA approaches., Comment: Accepted to EMNLP 2020
Published: 2019

30. A Novel Chinese Sarcasm Detection Model Based on Retrospective Reader

Author: Zhang, Lei, Zhao, Xiaoming, Song, Xueqiang, Fang, Yuwei, Li, Dong, Wang, Haizhou, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Þór Jónsson, Björn, editor, Gurrin, Cathal, editor, Tran, Minh-Triet, editor, Dang-Nguyen, Duc-Tien, editor, Hu, Anita Min-Chun, editor, Huynh Thi Thanh, Binh, editor, and Huet, Benoit, editor
Published: 2022
Full Text: View/download PDF

31. Stochastic Answer Networks for SQuAD 2.0

Author: Liu, Xiaodong, Li, Wei, Fang, Yuwei, Kim, Aerin, Duh, Kevin, and Gao, Jianfeng
Subjects: Computer Science - Computation and Language
Abstract: This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not. The extended SAN contains two components: a span detector and a binary classifier for judging whether the question is unanswerable, and both components are jointly optimized. Experiments show that SAN achieves the results competitive to the state-of-the-art on Stanford Question Answering Dataset (SQuAD) 2.0. To facilitate the research on this field, we release our code: https://github.com/kevinduh/san_mrc., Comment: 6 pages, 2 figures and 2 tables
Published: 2018

32. Bandwidth function matrix-based spectral deconvolution with alternate minimization method

Author: Huang, Chan, Wu, Su, Chang, Yuyang, Fang, Yuwei, and Qiu, Huaili
Published: 2022
Full Text: View/download PDF

33. Research on 3D Geological and Numerical Unified Model of in Mining Slope Based on Multi-Source Data.

Author: Huang, Juehao, Fang, Yuwei, Wang, Chao, Zhang, Zhihui, and Li, Yinan
Subjects: MINING engineering, ENGINEERING models, GEOLOGICAL modeling, NUMERICAL calculations, GEOLOGICAL research, THREE-dimensional modeling
Abstract: As mining engineering progresses into the deep excavation phase, the intensification of high pressure, high temperature, strong disturbances, and complex geological conditions becomes increasingly prominent. Researchers perform stability analysis on the excavation area to reduce potential safety hazards during the extraction process. Developing a detailed numerical calculation model that accurately reflects the true geological structure is essential for numerical simulation analysis in mining engineering. Based on the excellent 3D geological modeling capabilities of 3D Mine software, this paper introduces a new 3D geological and numerical unified modeling method (3DMine-Rhino-HyperMesh) involving multi-software coupling and details the specific steps and concepts of this modeling approach. Subsequently, using a certain open-pit mine in Panzhihua as a backdrop, a detailed geological and numerical unified model is established, reflecting the true geological structure of the mining area, and the potential failure mechanisms of the mine slope are analyzed. The results indicate that the modeling method aligns well with the actual geological conditions, enhancing the grid quality of the numerical model and offering a new modeling approach for simulating and analyzing large complex geological entities in mining operations. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. A Novel Chinese Sarcasm Detection Model Based on Retrospective Reader

Author: Zhang, Lei, primary, Zhao, Xiaoming, additional, Song, Xueqiang, additional, Fang, Yuwei, additional, Li, Dong, additional, and Wang, Haizhou, additional
Published: 2022
Full Text: View/download PDF

35. Speckle reduction in laser projection based on a rotating ball lens

Author: Deng, Linxiao, Dong, Tianhao, Fang, Yuwei, Yang, Yuhua, Gu, Chun, Ming, Hai, and Xu, Lixin
Published: 2021
Full Text: View/download PDF

36. Enhancing Architectural Education through Artificial Intelligence: A Case Study of an AI-Assisted Architectural Programming and Design Course.

Author: Jin, Shitao, Tu, Huijun, Li, Jiangfeng, Fang, Yuwei, Qu, Zhang, Xu, Fan, Liu, Kun, and Lin, Yiquan
Subjects: ARCHITECTURAL education, ARCHITECTURAL design, ARTIFICIAL intelligence, TEACHING models, TECHNOLOGY education, ARCHITECTURAL designs
Abstract: This study addresses the current lack of research on the effectiveness assessment of Artificial Intelligence (AI) technology in architectural education. Our aim is to evaluate the impact of AI-assisted architectural teaching on student learning. To achieve this, we developed an AI-embedded teaching model. A total of 24 students from different countries participated in this 9-week course, completing a comprehensive analysis of architectural programming and design using AI technologies. This study conducted questionnaire surveys with students at both midterm and final stages of the course, followed by structured interviews after the course completion, to explore the effectiveness and application status of the teaching model. The results indicate that the AI-embedded teaching model positively and effectively influenced student learning. The "innovative capability" and "work efficiency" of AI technologies were identified as key factors affecting the effectiveness of the teaching model. Furthermore, the study revealed a close integration of AI technologies with architectural programming but identified challenges in the uncontrollable expression of architectural design outcomes. Student utilization of AI technologies appeared fragmented, lacking a systematic approach. Lastly, the study provides targeted optimization suggestions based on the current application status of AI technologies among students. This research offers theoretical and practical support for the further integration of AI technologies in architectural education. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Dynamic Numerical Simulation and Transfer Learning-Based Rapid Rock Identification during Measurement While Drilling (MWD).

Author: Fang, Yuwei, Wu, Zhenjun, Jiang, Lianghua, Tang, Hua, Fu, Xiaodong, and Shen, Junxin
Subjects: DYNAMIC simulation, ARTIFICIAL neural networks, COMPUTER simulation, TRANSFER of training
Abstract: In constructing rapid rock identification models for measurement while drilling (MWD) via neural network methods, collecting actual drilling data to train the model is extremely time-consuming and labor-intensive. This requires extensive drilling experiments in various rock types, resulting in limited neural network training data for rock identification that covers a limited range of rock types. To suitably address this issue, a dynamic numerical simulation model for rock drilling is established that generates extensive drilling data. The input parameters for the simulations include torque, drill bit rotation speed, and drilling speed. A neural network model is then developed for rock classification using large datasets from dynamic numerical simulations, specifically those of granite, limestone, and sandstone. Building upon this model, transfer learning is appropriately applied to store the knowledge obtained in the rock identification based on the neural network model. Further training through transfer learning is conducted with smaller datasets obtained during actual drilling, making the model suitable for practical rock identification and prediction in the drilling processes. The neural network rock classification model, incorporating dynamic numerical simulation and transfer learning, achieves a prediction accuracy of 99.36% for granite, 99.53% for sandstone, and 99.82% for limestone. This reveals an enhancement in prediction accuracy of up to 22.94% compared to the models without transfer learning. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Strong coupling of the guided modes with BIC generation in graphene-based one-dimensional dielectric gratings

Author: Wang, Yuchen, primary, Qiu, Wei, additional, Chen, Lu, additional, Bi, Chengchen, additional, Jiang, Xiangyun, additional, Zhou, Leiming, additional, Fang, Yuwei, additional, and Hu, Jigang, additional
Published: 2023
Full Text: View/download PDF

39. Polarization-sensitive optical absorption modulation driven by toroidal-dipole resonance in asymmetric dielectric tetramer metasurface

Author: Fei, Wu, primary, Chen, Lu, additional, Qiu, Wei, additional, Dai, LiangKun, additional, Jiang, XiaoYun, additional, Fang, Yuwei, additional, Yu, Youlong, additional, Zhan, Qiwen, additional, and Hu, Jigang, additional
Published: 2023
Full Text: View/download PDF

40. Permanent displacement characteristics of landslides taking into consideration the frequency and energy behaviour of historical seismic ground motions

Author: Tang, Hua, primary, Fang, Yuwei, additional, Cheng, Xu, additional, Wu, Zhenjun, additional, and Qin, Yuqiao, additional
Published: 2023
Full Text: View/download PDF

41. i-Code: An Integrative and Composable Multimodal Learning Framework

Author: Yang, Ziyi, primary, Fang, Yuwei, additional, Zhu, Chenguang, additional, Pryzant, Reid, additional, Chen, DongDong, additional, Shi, Yu, additional, Xu, Yichong, additional, Qian, Yao, additional, Gao, Mei, additional, Chen, Yi-Ling, additional, Lu, Liyang, additional, Xie, Yujia, additional, Gmyr, Robert, additional, Codella, Noel, additional, Kanda, Naoyuki, additional, Xiao, Bin, additional, Yuan, Lu, additional, Yoshioka, Takuya, additional, Zeng, Michael, additional, and Huang, Xuedong, additional
Published: 2023
Full Text: View/download PDF

42. Unifying Vision, Text, and Layout for Universal Document Processing

Author: Tang, Zineng, primary, Yang, Ziyi, additional, Wang, Guoxin, additional, Fang, Yuwei, additional, Liu, Yang, additional, Zhu, Chenguang, additional, Zeng, Michael, additional, Zhang, Cha, additional, and Bansal, Mohit, additional
Published: 2023
Full Text: View/download PDF

43. Discovery of Anti-Hypercholesterolemia Agents Targeting LXRα from Marine Microorganism-Derived Natural Products.

Author: Fang, Yuwei, She, Jianglian, Zhang, Xi, Gu, Tanwei, Xie, Danni, Luo, Xiaowei, Yi, Xiangxi, Gao, Chenghai, Liu, Yonghong, Zhang, Cuixian, Tang, Lan, and Zhou, Xuefeng
Published: 2024
Full Text: View/download PDF

44. Depression Detection in Social Media using XLNet with Topic Distributions

Author: Wang Gao, Baoping Yang, Yuwei Wang, and Yuan Fang
Subjects: General Computer Science
Abstract: Due to the complexity of depressive diseases, detecting depressed users on social media platforms is a challenging task. In recent years, with an increasing number of users of social media sites, this field of research has begun to develop rapidly. To improve the detection performance of traditional methods, two challenges need to be overcome. The first challenge is that textual content posted on social media platforms suffers from serious data sparseness. The second one is how to effectively use emotions, user information, and behavior characteristics to predict potentially depressed users. In this paper, we propose a novel model called the Topic-enriched Depression Detection Model (TDDM), which combines topic information and user behavior to predict depressed users on social media platforms. TDDM first employs a Conditional Random Field Regularized Topic Model (CRFTM) to extract the topic knowledge of user posts. XLNet is used to encode posts to further expand the semantic features of short texts. Finally, we integrate user behavior features into TDDM to improve the detection performance of the model. The experimental results on a real-world Twitter dataset demonstrate that the proposed model performs better than baseline models in detecting depressed users at both pseudo-document level and user level.
Published: 2022

45. Tunable random laser in capillary with Nile red solution and TiO2 nanoparticles

Author: Fang, Yuwei, primary, Hu, Jigang, additional, and Huang, Chan, additional
Published: 2023
Full Text: View/download PDF

46. Polarization-selective narrow band dual-toroidal-dipole resonances in a symmetry-broken dielectric tetramer metamaterial

Author: Fei, Wu, primary, Jiang, Xiaoyun, additional, Dai, Liangkun, additional, Qiu, Wei, additional, Fang, Yuwei, additional, Li, Dongmei, additional, Hu, Jigang, additional, and Zhan, Qiwen, additional
Published: 2023
Full Text: View/download PDF

47. Research on the Restrictive Factors of Vigorous Promotion of Prefabricated Buildings in Yancheng under the Background of “Double Carbon”

Author: Sun, Houchao, primary, Fang, Yuwei, additional, Yin, Minggan, additional, and Shi, Feiting, additional
Published: 2023
Full Text: View/download PDF

48. High-resolution XPS and DFT investigations into Al-modified Phillips CrOx/SiO2 catalysts

Author: Ma, Yue, Wang, Lisong, Liu, Zhen, Cheng, Ruihua, Zhong, Lei, Yang, Yun, He, Xuelian, Fang, Yuwei, Terano, Minoru, and Liu, Boping
Published: 2015
Full Text: View/download PDF

49. The Application of Improved DEA Model in Evaluation of China’s Production Comprehensive Efficiency

Author: Cui, Yuquan, Shi, Lejun, Fang, Yuwei, and Zhang, Tianbiao, editor
Published: 2012
Full Text: View/download PDF

50. Strong coupling of the guided modes with BIC generation in graphene-based one-dimensional dielectric gratings

Author: Fang, Zheyu, Tanaka, Takuo, Wang, Yuchen, Qiu, Wei, Chen, Lu, Bi, Chengchen, Jiang, Xiaoyun, Zhou, Leiming, Fang, Yuwei, and Hu, Jigang
Published: 2023
Full Text: View/download PDF
