Author: "Ji, Ziwei" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ji, Ziwei"' showing total 169 results

Start Over Author "Ji, Ziwei"

169 results on '"Ji, Ziwei"'

1. Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

Author: Bae, Sangmin, Fisch, Adam, Harutyunyan, Hrayr, Ji, Ziwei, Kim, Seungyeon, and Schuster, Tal
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large language models (LLMs) are expensive to deploy. Parameter sharing offers a possible path towards reducing their size and cost, but its effectiveness in modern LLMs remains fairly limited. In this work, we revisit "layer tying" as form of parameter sharing in Transformers, and introduce novel methods for converting existing LLMs into smaller "Recursive Transformers" that share parameters across layers, with minimal loss of performance. Here, our Recursive Transformers are efficiently initialized from standard pretrained Transformers, but only use a single block of unique layers that is then repeated multiple times in a loop. We further improve performance by introducing Relaxed Recursive Transformers that add flexibility to the layer tying constraint via depth-wise low-rank adaptation (LoRA) modules, yet still preserve the compactness of the overall model. We show that our recursive models (e.g., recursive Gemma 1B) outperform both similar-sized vanilla pretrained models (such as TinyLlama 1.1B and Pythia 1B) and knowledge distillation baselines -- and can even recover most of the performance of the original "full-size" model (e.g., Gemma 2B with no shared parameters). Finally, we propose Continuous Depth-wise Batching, a promising new inference paradigm enabled by the Recursive Transformer when paired with early exiting. In a theoretical analysis, we show that this has the potential to lead to significant (2-3x) gains in inference throughput., Comment: 48 pages, 17 figures, 17 tables
Published: 2024

2. ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Author: Gu, Yuzhe, Ji, Ziwei, Zhang, Wenwei, Lyu, Chengqi, Lin, Dahua, and Chen, Kai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domains and sizes, which struggle to scale due to prohibitive labor costs and insufficient reliability of existing hallucination annotators. To facilitate the scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset and improves the accuracy of the hallucination annotator. Based on the Expectation Maximization (EM) algorithm, in each iteration, the framework first applies a hallucination annotation pipeline to annotate a scaled dataset and then trains a more accurate hallucination annotator on the dataset. This new hallucination annotator is adopted in the hallucination annotation pipeline used for the next iteration. Extensive experimental results demonstrate that the finally obtained hallucination annotator with only 7B parameters surpasses the performance of GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA by zero-shot inference. Such an annotator can not only evaluate the hallucination levels of various LLMs on the large-scale dataset but also help to mitigate the hallucination of LLMs generations, with the Natural Language Inference (NLI) metric increasing from 25% to 37% on HaluEval., Comment: 9 pages
Published: 2024

3. LLM Internal States Reveal Hallucination Risk Faced With a Query

Author: Ji, Ziwei, Chen, Delong, Ishii, Etsuko, Cahyawijaya, Samuel, Bang, Yejin, Wilie, Bryan, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before response generation. We analyze the internal mechanisms of LLMs broadly both in terms of training data sources and across 15 diverse Natural Language Generation (NLG) tasks, spanning over 700 datasets. Our empirical analysis reveals two key insights: (1) LLM internal states indicate whether they have seen the query in training data or not; and (2) LLM internal states show they are likely to hallucinate or not regarding the query. Our study explores particular neurons, activation layers, and tokens that play a crucial role in the LLM perception of uncertainty and hallucination risk. By a probing estimator, we leverage LLM self-assessment, achieving an average hallucination estimation accuracy of 84.32\% at run time.
Published: 2024

4. Efficient Document Ranking with Learnable Late Interactions

Author: Ji, Ziwei, Jain, Himanshu, Veit, Andreas, Reddi, Sashank J., Jayasumana, Sadeep, Rawat, Ankit Singh, Menon, Aditya Krishna, Yu, Felix, and Kumar, Sanjiv
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer based on query and document token embeddings. However, these lightweight scorers are often hand-crafted, and there is no understanding of their approximation power; further, such scorers require access to individual document token embeddings, which imposes an increased latency and storage burden. In this paper, we propose novel learnable late-interaction models (LITE) that resolve these issues. Theoretically, we prove that LITE is a universal approximator of continuous scoring functions, even for relatively small embedding dimension. Empirically, LITE outperforms previous late-interaction models such as ColBERT on both in-domain and zero-shot re-ranking tasks. For instance, experiments on MS MARCO passage re-ranking show that LITE not only yields a model with better generalization, but also lowers latency and requires 0.25x storage compared to ColBERT.
Published: 2024

5. ANAH: Analytical Annotation of Hallucinations in Large Language Models

Author: Ji, Ziwei, Gu, Yuzhe, Zhang, Wenwei, Lyu, Chengqi, Lin, Dahua, and Chen, Kai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Reducing the `$\textit{hallucination}$' problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of the hallucination is the first key step for the governance of this issue but is under-explored in the community. Thus, we present $\textbf{ANAH}$, a bilingual dataset that offers $\textbf{AN}$alytical $\textbf{A}$nnotation of $\textbf{H}$allucinations in LLMs within Generative Question Answering. Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content. ANAH consists of ~12k sentence-level annotations for ~4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline. Thanks to the fine granularity of the hallucination annotations, we can quantitatively confirm that the hallucinations of LLMs progressively accumulate in the answer and use ANAH to train and evaluate hallucination annotators. We conduct extensive experiments on studying generative and discriminative annotators and show that, although current open-source LLMs have difficulties in fine-grained hallucination annotation, the generative annotator trained with ANAH can surpass all open-source LLMs and GPT-3.5, obtain performance competitive with GPT-4, and exhibits better generalization ability on unseen questions., Comment: Accepted by ACL 2024
Published: 2024

6. High-Dimension Human Value Representation in Large Language Models

Author: Cahyawijaya, Samuel, Chen, Delong, Bang, Yejin, Khalatbari, Leila, Wilie, Bryan, Ji, Ziwei, Ishii, Etsuko, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The widespread application of Large Language Models (LLMs) across various tasks and fields has necessitated the alignment of these models with human values and preferences. Given various approaches of human value alignment, ranging from Reinforcement Learning with Human Feedback (RLHF), to constitutional learning, etc. there is an urgent need to understand the scope and nature of human values injected into these models before their release. There is also a need for model alignment without a costly large scale human annotation effort. We propose UniVaR, a high-dimensional representation of human value distributions in LLMs, orthogonal to model architecture and training data. Trained from the value-relevant output of eight multilingual LLMs and tested on the output from four multilingual LLMs, namely LlaMA2, ChatGPT, JAIS and Yi, we show that UniVaR is a powerful tool to compare the distribution of human values embedded in different LLMs with different langauge sources. Through UniVaR, we explore how different LLMs prioritize various values in different languages and cultures, shedding light on the complex interplay between human values and language modeling.
Published: 2024

7. Conventional versus rubber band traction-assisted endoscopic submucosal dissection for rectal neuroendocrine tumors: a single-center retrospective study (with video)

Author: Peng, Jinbang, Lin, Jiajia, Fang, Lina, Zhou, Jingjing, Song, Yaqi, Yang, Chaoyu, Zhang, Yu, Gu, Binbin, Ji, Ziwei, Lu, Yandi, Mao, Xinli, and Yan, Lingling
Published: 2024
Full Text: View/download PDF

8. Contrastive Learning for Inference in Dialogue

Author: Ishii, Etsuko, Xu, Yan, Wilie, Bryan, Ji, Ziwei, Lovenia, Holy, Chung, Willy, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Inference, especially those derived from inductive processes, is a crucial component in our conversation to complement the information implicitly or explicitly conveyed by a speaker. While recent large language models show remarkable advances in inference tasks, their performance in inductive reasoning, where not all information is present in the context, is far behind deductive reasoning. In this paper, we analyze the behavior of the models based on the task difficulty defined by the semantic information gap -- which distinguishes inductive and deductive reasoning (Johnson-Laird, 1988, 1993). Our analysis reveals that the disparity in information between dialogue contexts and desired inferences poses a significant challenge to the inductive inference process. To mitigate this information gap, we investigate a contrastive learning approach by feeding negative samples. Our experiments suggest negative samples help models understand what is wrong and improve their inference generations., Comment: Accepted to EMNLP2023
Published: 2023

9. Towards Mitigating Hallucination in Large Language Models via Self-Reflection

Author: Ji, Ziwei, Yu, Tiezheng, Xu, Yan, Lee, Nayeon, Ishii, Etsuko, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks. However, the practical deployment still faces challenges, notably the issue of "hallucination", where models generate plausible-sounding but unfaithful or nonsensical information. This issue becomes particularly critical in the medical domain due to the uncommon professional concepts and potential social risks involved. This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets. Our investigation centers on the identification and comprehension of common problematic answers, with a specific emphasis on hallucination. To tackle this challenge, we present an interactive self-reflection methodology that incorporates knowledge acquisition and answer generation. Through this feedback process, our approach steadily enhances the factuality, consistency, and entailment of the generated answers. Consequently, we harness the interactivity and multitasking ability of LLMs and produce progressively more precise and accurate answers. Experimental results on both automatic and human evaluation demonstrate the superiority of our approach in hallucination reduction compared to baselines., Comment: Accepted by the findings of EMNLP 2023
Published: 2023

10. Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

Author: Lovenia, Holy, Dai, Wenliang, Cahyawijaya, Samuel, Ji, Ziwei, and Fung, Pascale
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects. However, the absence of a general measurement for evaluating object hallucination in VL models has hindered our understanding and ability to mitigate this issue. In this work, we present NOPE (Negative Object Presence Evaluation), a novel benchmark designed to assess object hallucination in VL models through visual question answering (VQA). We propose a cost-effective and scalable approach utilizing large language models to generate 29.5k synthetic negative pronoun (NegP) data of high quality for NOPE. We extensively investigate the performance of 10 state-of-the-art VL models in discerning the non-existence of objects in visual questions, where the ground truth answers are denoted as NegP (e.g., "none"). Additionally, we evaluate their standard performance on visual questions on 9 other VQA datasets. Through our experiments, we demonstrate that no VL model is immune to the vulnerability of object hallucination, as all models achieve accuracy below 10\% on NegP. Furthermore, we uncover that lexically diverse visual questions, question types with large scopes, and scene-relevant objects capitalize the risk of object hallucination in VL models., Comment: Published in ALVR Workshop at ACL 2024
Published: 2023

11. Think before you speak: Training Language Models With Pause Tokens

Author: Goyal, Sachin, Ji, Ziwei, Rawat, Ankit Singh, Menon, Aditya Krishna, Kumar, Sanjiv, and Nagarajan, Vaishnavh
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing training and inference on language models with a (learnable) $\textit{pause}$ token, a sequence of which is appended to the input prefix. We then delay extracting the model's outputs until the last pause token is seen, thereby allowing the model to process extra computation before committing to an answer. We empirically evaluate $\textit{pause-training}$ on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. Our main finding is that inference-time delays show gains when the model is both pre-trained and finetuned with delays. For the 1B model, we witness gains on 8 of 9 tasks, most prominently, a gain of $18\%$ EM score on the QA task of SQuAD, $8\%$ on CommonSenseQA and $1\%$ accuracy on the reasoning task of GSM8k. Our work raises a range of conceptual and practical future research questions on making delayed next-token prediction a widely applicable new paradigm., Comment: Published at ICLR 2024
Published: 2023

12. Improving Query-Focused Meeting Summarization with Query-Relevant Knowledge

Author: Yu, Tiezheng, Ji, Ziwei, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Query-Focused Meeting Summarization (QFMS) aims to generate a summary of a given meeting transcript conditioned upon a query. The main challenges for QFMS are the long input text length and sparse query-relevant information in the meeting transcript. In this paper, we propose a knowledge-enhanced two-stage framework called Knowledge-Aware Summarizer (KAS) to tackle the challenges. In the first stage, we introduce knowledge-aware scores to improve the query-relevant segment extraction. In the second stage, we incorporate query-relevant knowledge in the summary generation. Experimental results on the QMSum dataset show that our approach achieves state-of-the-art performance. Further analysis proves the competency of our methods in generating relevant and faithful summaries., Comment: AACL 2023 Findings
Published: 2023

13. Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

Author: Xu, Yan, Kong, Deqian, Xu, Dehong, Ji, Ziwei, Pang, Bo, Fung, Pascale, and Wu, Ying Nian
Subjects: Computer Science - Computation and Language
Abstract: The capability to generate responses with diversity and faithfulness using factual knowledge is paramount for creating a human-like, trustworthy dialogue system. Common strategies either adopt a two-step paradigm, which optimizes knowledge selection and response generation separately, and may overlook the inherent correlation between these two tasks, or leverage conditional variational method to jointly optimize knowledge selection and response generation by employing an inference network. In this paper, we present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues by approximately sampling from the posterior distribution. Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution. This straightforward and intuitive inference procedure of SPI directly queries the response generation model, allowing for accurate knowledge selection and generation of faithful responses. In addition to modeling contributions, our experimental results on two common dialogue datasets (Wizard of Wikipedia and Holl-E) demonstrate that SPI outperforms previous strong baselines according to both automatic and human evaluation metrics., Comment: Accepted to ICML 2023
Published: 2023

14. Inhibition of thioredoxin-1 enhances the toxicity of glycolysis inhibitor 2-deoxyglucose by downregulating SLC1A5 expression in colorectal cancer cells

Author: Tang, Tianbin, Fang, Daoquan, Ji, Ziwei, Zhong, Zuyue, Zhou, Baojian, Ye, Lechi, Jiang, Lei, and Sun, Xuecheng
Published: 2024
Full Text: View/download PDF

15. Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

Author: Jelassi, Samy, Hanin, Boris, Ji, Ziwei, Reddi, Sashank J., Bhojanapalli, Srinadh, and Kumar, Sanjiv
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization. Our purpose is to study the dependence on $n$ and $L$ of the maximal update ($\mu$P) learning rate, the largest learning rate for which the mean squared change in pre-activations after one step of gradient descent remains uniformly bounded at large $n,L$. As in prior work on $\mu$P of Yang et. al., we find that this maximal update learning rate is independent of $n$ for all but the first and last layer weights. However, we find that it has a non-trivial dependence of $L$, scaling like $L^{-3/2}.$
Published: 2023

16. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Author: Bang, Yejin, Cahyawijaya, Samuel, Lee, Nayeon, Dai, Wenliang, Su, Dan, Wilie, Bryan, Lovenia, Holy, Ji, Ziwei, Yu, Tiezheng, Chung, Willy, Do, Quyet V., Xu, Yan, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks. We find that it is better at understanding non-Latin script languages than generating them. It is able to generate multimodal content from textual prompts, via an intermediate code generation step. Moreover, we find that ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning, hence making it an unreliable reasoner. It is, for example, better at deductive than inductive reasoning. ChatGPT suffers from hallucination problems like other LLMs and it generates more extrinsic hallucinations from its parametric memory as it does not have access to an external knowledge base. Finally, the interactive feature of ChatGPT enables human collaboration with the underlying LLM to improve its performance, i.e, 8% ROUGE-1 on summarization and 2% ChrF++ on machine translation, in a multi-turn "prompt engineering" fashion. We also release codebase for evaluation set extraction., Comment: 45 pages, AACL 2022
Published: 2023

17. NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Author: Cahyawijaya, Samuel, Lovenia, Holy, Aji, Alham Fikri, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Wibisono, Christian, Romadhony, Ade, Vincentio, Karissa, Koto, Fajri, Santoso, Jennifer, Moeljadi, David, Wirawan, Cahya, Hudi, Frederikus, Parmonangan, Ivan Halim, Alfina, Ika, Wicaksono, Muhammad Satrio, Putra, Ilham Firdausi, Rahmadani, Samsul, Oenang, Yulianti, Septiandri, Ali Akbar, Jaya, James, Dhole, Kaustubh D., Suryani, Arie Ardiyanti, Putri, Rifki Afina, Su, Dan, Stevens, Keith, Nityasya, Made Nindyatama, Adilazuarda, Muhammad Farid, Ignatius, Ryan, Diandaru, Ryandito, Yu, Tiezheng, Ghifari, Vito, Dai, Wenliang, Xu, Yan, Damapuspita, Dyah, Tho, Cuk, Karo, Ichwanul Muslim Karo, Fatyanosa, Tirana Noor, Ji, Ziwei, Fung, Pascale, Neubig, Graham, Baldwin, Timothy, Ruder, Sebastian, Sujaini, Herry, Sakti, Sakriani, and Purwarianti, Ayu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
Published: 2022

18. RHO ($\rho$): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding

Author: Ji, Ziwei, Liu, Zihan, Lee, Nayeon, Yu, Tiezheng, Wilie, Bryan, Zeng, Min, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Dialogue systems can leverage large pre-trained language models and knowledge to generate fluent and informative responses. However, these models are still prone to produce hallucinated responses not supported by the input source, which greatly hinders their application. The heterogeneity between external knowledge and dialogue context challenges representation learning and source integration, and further contributes to unfaithfulness. To handle this challenge and generate more faithful responses, this paper presents RHO ($\rho$) utilizing the representations of linked entities and relation predicates from a knowledge graph (KG). We propose (1) local knowledge grounding to combine textual embeddings with the corresponding KG embeddings; and (2) global knowledge grounding to equip RHO with multi-hop reasoning abilities via the attention mechanism. In addition, we devise a response re-ranking technique based on walks over KG sub-graphs for better conversational reasoning. Experimental results on OpenDialKG show that our approach significantly outperforms state-of-the-art methods on both automatic and human evaluation by a large margin, especially in hallucination reduction (17.54% in FeQA)., Comment: accepted by ACL 2023 Findings
Published: 2022

19. Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Author: Dai, Wenliang, Liu, Zihan, Ji, Ziwei, Su, Dan, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information. In this paper, we systematically study the object hallucination problem from three aspects. First, we examine recent state-of-the-art VLP models, showing that they still hallucinate frequently, and models achieving better scores on standard metrics (e.g., CIDEr) could be more unfaithful. Second, we investigate how different types of image encoding in VLP influence hallucination, including region-based, grid-based, and patch-based. Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination. Third, we decouple various VLP objectives and demonstrate that token-level image-text alignment and controlled generation are crucial to reducing hallucination. Based on that, we propose a simple yet effective VLP loss named ObjMLM to further mitigate object hallucination. Results show that it reduces object hallucination by up to 17.4% when tested on two benchmarks (COCO Caption for in-domain and NoCaps for out-of-domain evaluation)., Comment: Accepted at EACL 2023
Published: 2022

20. VScript: Controllable Script Generation with Visual Presentation

Author: Ji, Ziwei, Xu, Yan, Cheng, I-Tsun, Cahyawijaya, Samuel, Frieske, Rita, Ishii, Etsuko, Zeng, Min, Madotto, Andrea, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: In order to offer a customized script tool and inspire professional scriptwriters, we present VScript. It is a controllable pipeline that generates complete scripts, including dialogues and scene descriptions, as well as presents visually using video retrieval. With an interactive interface, our system allows users to select genres and input starting words that control the theme and development of the generated script. We adopt a hierarchical structure, which first generates the plot, then the script and its visual presentation. A novel approach is also introduced to plot-guided dialogue generation by treating it as an inverse dialogue summarization. The experiment results show that our approach outperforms the baselines on both automatic and human evaluations, especially in genre control.
Published: 2022

21. Reproducibility in Optimization: Theoretical Framework and Limits

Author: Ahn, Kwangjun, Jain, Prateek, Ji, Ziwei, Kale, Satyen, Netrapalli, Praneeth, and Shamir, Gil I.
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We initiate a formal study of reproducibility in optimization. We define a quantitative measure of reproducibility of optimization procedures in the face of noisy or error-prone operations such as inexact or stochastic gradient computations or inexact initialization. We then analyze several convex optimization settings of interest such as smooth, non-smooth, and strongly-convex objective functions and establish tight bounds on the limits of reproducibility in each setting. Our analysis reveals a fundamental trade-off between computation and reproducibility: more computation is necessary (and sufficient) for better reproducibility., Comment: 45 Pages; Accepted to NeurIPS 2022
Published: 2022

22. Survey of Hallucination in Natural Language Generation

Author: Ji, Ziwei, Lee, Nayeon, Frieske, Rita, Yu, Tiezheng, Su, Dan, Xu, Yan, Ishii, Etsuko, Bang, Yejin, Chen, Delong, Dai, Wenliang, Chan, Ho Shu, Madotto, Andrea, and Fung, Pascale
Subjects: Computer Science - Computation and Language, A.1
Abstract: Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation; and (3) hallucinations in large language models (LLMs). This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
Published: 2022
Full Text: View/download PDF

23. Agnostic Learnability of Halfspaces via Logistic Loss

Author: Ji, Ziwei, Ahn, Kwangjun, Awasthi, Pranjal, Kale, Satyen, and Karp, Stefani
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: We investigate approximation guarantees provided by logistic regression for the fundamental problem of agnostic learning of homogeneous halfspaces. Previously, for a certain broad class of "well-behaved" distributions on the examples, Diakonikolas et al. (2020) proved an $\tilde{\Omega}(\textrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\textrm{OPT}})$ upper bound, where $\textrm{OPT}$ denotes the best zero-one/misclassification risk of a homogeneous halfspace. In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021). On the other hand, we also show that if we impose a radial-Lipschitzness condition in addition to well-behaved-ness on the distribution, logistic regression on a ball of bounded radius reaches $\tilde{O}(\textrm{OPT})$ misclassification risk. Our techniques also show for any well-behaved distribution, regardless of radial Lipschitzness, we can overcome the $\Omega(\sqrt{\textrm{OPT}})$ lower bound for logistic loss simply at the cost of one additional convex optimization step involving the hinge loss and attain $\tilde{O}(\textrm{OPT})$ misclassification risk. This two-step convex optimization algorithm is simpler than previous methods obtaining this guarantee, all of which require solving $O(\log(1/\textrm{OPT}))$ minimization problems.
Published: 2022

24. Actor-critic is implicitly biased towards high entropy optimal policies

Author: Hu, Yuzheng, Ji, Ziwei, and Telgarsky, Matus
Subjects: Computer Science - Machine Learning
Abstract: We show that the simplest actor-critic method -- a linear softmax policy updated with TD through interaction with a linear MDP, but featuring no explicit regularization or exploration -- does not merely find an optimal policy, but moreover prefers high entropy optimal policies. To demonstrate the strength of this bias, the algorithm not only has no regularization, no projections, and no exploration like $\epsilon$-greedy, but is moreover trained on a single trajectory with no resets. The key consequence of the high entropy bias is that uniform mixing assumptions on the MDP, which exist in some form in all prior work, can be dropped: the implicit regularization of the high entropy bias is enough to ensure that all chains mix and an optimal policy is reached with high probability. As auxiliary contributions, this work decouples concerns between the actor and critic by writing the actor update as an explicit mirror descent, provides tools to uniformly bound mixing times within KL balls of policy space, and provides a projection-free TD analysis with its own implicit bias which can be run from an unmixed starting distribution., Comment: v2 primarily improved the proofs, with minimal changes to the body
Published: 2021

25. Fast Margin Maximization via Dual Acceleration

Author: Ji, Ziwei, Srebro, Nathan, and Telgarsky, Matus
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent, and $\mathcal{O}(1/t)$ for normalized gradient descent. This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual, which manages to result in a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables., Comment: ICML 2021
Published: 2021

26. Early-stopped neural networks are consistent

Author: Ji, Ziwei, Li, Justin D., and Telgarsky, Matus
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This work studies the behavior of shallow ReLU networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general, and the (optimal) Bayes risk is not necessarily zero. In this setting, it is shown that gradient descent with early stopping achieves population risk arbitrarily close to optimal in terms of not just logistic and misclassification losses, but also in terms of calibration, meaning the sigmoid mapping of its outputs approximates the true underlying conditional distribution arbitrarily finely. Moreover, the necessary iteration, sample, and architectural complexities of this analysis all scale naturally with a certain complexity measure of the true conditional model. Lastly, while it is not shown that early stopping is necessary, it is shown that any univariate classifier satisfying a local interpolation property is inconsistent.
Published: 2021

27. Generalization bounds via distillation

Author: Hsu, Daniel, Ji, Ziwei, Telgarsky, Matus, and Wang, Lan
Subjects: Computer Science - Machine Learning
Abstract: This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds. The main contribution is an analysis showing that the original network inherits this good generalization bound from its distillation, assuming the use of well-behaved data augmentation. This bound is presented both in an abstract and in a concrete form, the latter complemented by a reduction technique to handle modern computation graphs featuring convolutional layers, fully-connected layers, and skip connections, to name a few. To round out the story, a (looser) classical uniform convergence analysis of compression is also presented, as well as a variety of experiments on cifar and mnist demonstrating similar generalization performance between the original network and its distillation., Comment: To appear, ICLR 2021
Published: 2021

28. Model Generalization on COVID-19 Fake News Detection

Author: Bang, Yejin, Ishii, Etsuko, Cahyawijaya, Samuel, Ji, Ziwei, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Amid the pandemic COVID-19, the world is facing unprecedented infodemic with the proliferation of both fake and real information. Considering the problematic consequences that the COVID-19 fake-news have brought, the scientific community has put effort to tackle it. To contribute to this fight against the infodemic, we aim to achieve a robust model for the COVID-19 fake-news detection task proposed at CONSTRAINT 2021 (FakeNews-19) by taking two separate approaches: 1) fine-tuning transformers based language models with robust loss functions and 2) removing harmful training instances through influence calculation. We further evaluate the robustness of our models by evaluating on different COVID-19 misinformation test set (Tweets-19) to understand model generalization ability. With the first approach, we achieve 98.13% for weighted F1 score (W-F1) for the shared task, whereas 38.18% W-F1 on the Tweets-19 highest. On the contrary, by performing influence data cleansing, our model with 99% cleansing percentage can achieve 54.33% W-F1 score on Tweets-19 with a trade-off. By evaluating our models on two COVID-19 fake-news test sets, we suggest the importance of model generalization ability in this task to step forward to tackle the COVID-19 fake-news problem in online social media platforms., Comment: CONSTRAINT Workshop 2021 (Camera Ready Version)
Published: 2021

29. CrossNER: Evaluating Cross-Domain Named Entity Recognition

Author: Liu, Zihan, Xu, Yan, Yu, Tiezheng, Dai, Wenliang, Ji, Ziwei, Cahyawijaya, Samuel, Madotto, Andrea, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains. However, most of the existing NER benchmarks lack domain-specialized entity types or do not focus on a certain domain, leading to a less effective cross-domain evaluation. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully-labeled collection of NER data spanning over five diverse domains with specialized entity categories for different domains. Additionally, we also provide a domain-related corpus since using it to continue pre-training language models (domain-adaptive pre-training) is effective for the domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and pre-training strategies to do domain-adaptive pre-training for the cross-domain task. Results show that focusing on the fractional corpus containing domain-specialized entities and utilizing a more challenging pre-training strategy in domain-adaptive pre-training are beneficial for the NER domain adaptation, and our proposed method can consistently outperform existing cross-domain NER baselines. Nevertheless, experiments also illustrate the challenge of this cross-domain NER task. We hope that our dataset and baselines will catalyze research in the NER domain adaptation area. The code and data are available at https://github.com/zliucr/CrossNER., Comment: Accepted in AAAI-2021
Published: 2020

30. Multi-hop Question Generation with Graph Convolutional Network

Author: Su, Dan, Xu, Yan, Dai, Wenliang, Ji, Ziwei, Yu, Tiezheng, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered evidence from different paragraphs. It is a more challenging yet under-explored task compared to conventional single-hop QG, where the questions are generated from the sentence containing the answer or nearby sentences in the same paragraph without complex reasoning. To address the additional challenges in multi-hop QG, we propose Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which does context encoding in multiple hops with Graph Convolutional Network and encoding fusion via an Encoder Reasoning Gate. To the best of our knowledge, we are the first to tackle the challenge of multi-hop reasoning over paragraphs without any sentence-level information. Empirical results on HotpotQA dataset demonstrate the effectiveness of our method, in comparison with baselines on automatic evaluation metrics. Moreover, from the human evaluation, our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation. The code is publicly available at https://github.com/HLTCHKUST/MulQG}{https://github.com/HLTCHKUST/MulQG ., Comment: Findings of EMNLP 2020
Published: 2020
Full Text: View/download PDF

31. Gradient descent follows the regularization path for general losses

Author: Ji, Ziwei, Dudík, Miroslav, Schapire, Robert E., and Telgarsky, Matus
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with arbitrary convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the algorithm-independent regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): while this convergence to a direction for exponentially-tailed losses is necessarily to the maximum-margin direction, other losses such as polynomially-tailed losses may induce convergence to a direction with a poor margin., Comment: To appear, COLT 2020
Published: 2020

32. Directional convergence and alignment in deep learning

Author: Ji, Ziwei and Telgarsky, Matus
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: In this paper, we show that although the minimizers of cross-entropy and related classification losses are off at infinity, network weights learned by gradient flow converge in direction, with an immediate corollary that network predictions, training errors, and the margin distribution also converge. This proof holds for deep homogeneous networks -- a broad class of networks allowing for ReLU, max-pooling, linear, and convolutional layers -- and we additionally provide empirical support not just close to the theory (e.g., the AlexNet), but also on non-homogeneous networks (e.g., the DenseNet). If the network further has locally Lipschitz gradients, we show that these gradients also converge in direction, and asymptotically align with the gradient flow path, with consequences on margin maximization, convergence of saliency maps, and a few other settings. Our analysis complements and is distinct from the well-known neural tangent and mean-field theories, and in particular makes no requirements on network width and initialization, instead merely requiring perfect classification accuracy. The proof proceeds by developing a theory of unbounded nonsmooth Kurdyka-{\L}ojasiewicz inequalities for functions definable in an o-minimal structure, and is also applicable outside deep learning., Comment: To appear, NeuRIPS 2020
Published: 2020

33. Neural tangent kernels, transportation mappings, and universal approximation

Author: Ji, Ziwei, Telgarsky, Matus, and Xian, Ruicheng
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This paper establishes rates of universal approximation for the shallow neural tangent kernel (NTK): network weights are only allowed microscopic changes from random initialization, which entails that activations are mostly unchanged, and the network is nearly equivalent to its linearization. Concretely, the paper has two main contributions: a generic scheme to approximate functions with the NTK by sampling from transport mappings between the initial weights and their desired values, and the construction of transport mappings via Fourier transforms. Regarding the first contribution, the proof scheme provides another perspective on how the NTK regime arises from rescaling: redundancy in the weights due to resampling allows individual weights to be scaled down. Regarding the second contribution, the most notable transport mapping asserts that roughly $1 / \delta^{10d}$ nodes are sufficient to approximate continuous functions, where $\delta$ depends on the continuity properties of the target function. By contrast, nearly the same proof yields a bound of $1 / \delta^{2d}$ for shallow ReLU networks; this gap suggests a tantalizing direction for future work, separating shallow ReLU networks and their linearization.
Published: 2019

34. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

Author: Ji, Ziwei and Telgarsky, Matus
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Recent theoretical work has guaranteed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error. The required width, however, is always polynomial in at least one of the sample size $n$, the (inverse) target error $1/\epsilon$, and the (inverse) failure probability $1/\delta$. This work shows that $\widetilde{\Theta}(1/\epsilon)$ iterations of gradient descent with $\widetilde{\Omega}(1/\epsilon^2)$ training examples on two-layer ReLU networks of any width exceeding $\mathrm{polylog}(n,1/\epsilon,1/\delta)$ suffice to achieve a test misclassification error of $\epsilon$. We also prove that stochastic gradient descent can achieve $\epsilon$ test error with polylogarithmic width and $\widetilde{\Theta}(1/\epsilon)$ samples. The analysis relies upon the separation margin of the limiting kernel, which is guaranteed positive, can distinguish between true labels and random labels, and can give a tight sample-complexity analysis in the infinite-width setting
Published: 2019

35. Approximation power of random neural networks

Author: Bailey, Bolton, Ji, Ziwei, Telgarsky, Matus, and Xian, Ruicheng
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This paper investigates the approximation power of three types of random neural networks: (a) infinite width networks, with weights following an arbitrary distribution; (b) finite width networks obtained by subsampling the preceding infinite width networks; (c) finite width networks obtained by starting with standard Gaussian initialization, and then adding a vanishingly small correction to the weights. The primary result is a fully quantified bound on the rate of approximation of general general continuous functions: in all three cases, a function $f$ can be approximated with complexity $\|f\|_1 (d/\delta)^{\mathcal{O}(d)}$, where $\delta$ depends on continuity properties of $f$ and the complexity measure depends on the weight magnitudes and/or cardinalities. Along the way, a variety of ancillary results are developed: an exact construction of Gaussian densities with infinite width networks, an elementary stand-alone proof scheme for approximation via convolutions of radial basis functions, subsampling rates for infinite width networks, and depth separation for corrected networks., Comment: This submission constitutes a poor approach to the problem, and has no scientific purpose. A superior (different) approach (and stronger final result, also treating the NTK) has appeared in arXiv:1910.06956 ; please see that work instead
Published: 2019

36. Characterizing the implicit bias via a primal-dual analysis

Author: Ji, Ziwei and Telgarsky, Matus
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: This paper shows that the implicit bias of gradient descent on linearly separable data is exactly characterized by the optimal solution of a dual optimization problem given by a smoothed margin, even for general losses. This is in contrast to prior results, which are often tailored to exponentially-tailed losses. For the exponential loss specifically, with $n$ training examples and $t$ gradient descent steps, our dual analysis further allows us to prove an $O(\ln(n)/\ln(t))$ convergence rate to the $\ell_2$ maximum margin direction, when a constant step size is used. This rate is tight in both $n$ and $t$, which has not been presented by prior work. On the other hand, with a properly chosen but aggressive step size schedule, we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias, whereas prior work (including all first-order methods for the general hard-margin linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$ margin rates to a suboptimal margin, with an implied (slower) bias rate. Our key observations include that gradient descent on the primal variable naturally induces a mirror descent update on the dual variable, and that the dual objective in this setting is smooth enough to give a faster rate.
Published: 2019

37. Gradient descent aligns the layers of deep linear networks

Author: Ji, Ziwei and Telgarsky, Matus
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: This paper establishes risk convergence and asymptotic weight matrix alignment --- a form of implicit regularization --- of gradient flow and gradient descent when applied to deep linear networks on linearly separable data. In more detail, for gradient flow applied to strictly decreasing loss functions (with similar results for gradient descent with particular decreasing step sizes): (i) the risk converges to 0; (ii) the normalized i-th weight matrix asymptotically equals its rank-1 approximation $u_iv_i^{\top}$; (iii) these rank-1 matrices are aligned across layers, meaning $|v_{i+1}^{\top}u_i|\to1$. In the case of the logistic loss (binary cross entropy), more can be said: the linear function induced by the network --- the product of its weight matrices --- converges to the same direction as the maximum margin solution. This last property was identified in prior work, but only under assumptions on gradient descent which here are implied by the alignment phenomenon.
Published: 2018

38. Risk and parameter convergence of logistic regression

Author: Ji, Ziwei and Telgarsky, Matus
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Gradient descent, when applied to the task of logistic regression, outputs iterates which are biased to follow a unique ray defined by the data. The direction of this ray is the maximum margin predictor of a maximal linearly separable subset of the data; the gradient descent iterates converge to this ray in direction at the rate $\mathcal{O}(\ln\ln t / \ln t)$. The ray does not pass through the origin in general, and its offset is the bounded global optimum of the risk over the remaining data; gradient descent recovers this offset at a rate $\mathcal{O}((\ln t)^2 / \sqrt{t})$., Comment: Appears in COLT 2019 with the title "The implicit bias of gradient descent on nonseparable data" (and no other changes)
Published: 2018

39. Wikidata Vandalism Detection - The Loganberry Vandalism Detector at WSDM Cup 2017

Author: Zhu, Qi, Ng, Hongwei, Liu, Liyuan, Ji, Ziwei, Jiang, Bingjie, Shen, Jiaming, and Gui, Huan
Subjects: Computer Science - Information Retrieval, H.3
Abstract: Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. As it can be edited by anyone, entries frequently get vandalized, leading to the possibility that it might spread of falsified information if such posts are not detected. The WSDM 2017 Wiki Vandalism Detection Challenge requires us to solve this problem by computing a vandalism score denoting the likelihood that a revision corresponds to an act of vandalism and performance is measured using the ROC-AUC obtained on a held-out test set. This paper provides the details of our submission that obtained an ROC-AUC score of 0.91976 in the final evaluation., Comment: Vandalism Detector at WSDM Cup 2017, see arXiv:1712.05956
Published: 2017

40. Social welfare and profit maximization from revealed preferences

Author: Ji, Ziwei, Mehta, Ruta, and Telgarsky, Matus
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning
Abstract: Consider the seller's problem of finding optimal prices for her $n$ (divisible) goods when faced with a set of $m$ consumers, given that she can only observe their purchased bundles at posted prices, i.e., revealed preferences. We study both social welfare and profit maximization with revealed preferences. Although social welfare maximization is a seemingly non-convex optimization problem in prices, we show that (i) it can be reduced to a dual convex optimization problem in prices, and (ii) the revealed preferences can be interpreted as supergradients of the concave conjugate of valuation, with which subgradients of the dual function can be computed. We thereby obtain a simple subgradient-based algorithm for strongly concave valuations and convex cost, with query complexity $O(m^2/\epsilon^2)$, where $\epsilon$ is the additive difference between the social welfare induced by our algorithm and the optimum social welfare. We also study social welfare maximization under the online setting, specifically the random permutation model, where consumers arrive one-by-one in a random order. For the case where consumer valuations can be arbitrary continuous functions, we propose a price posting mechanism that achieves an expected social welfare up to an additive factor of $O(\sqrt{mn})$ from the maximum social welfare. Finally, for profit maximization (which may be non-convex in simple cases), we give nearly matching upper and lower bounds on the query complexity for separable valuations and cost (i.e., each good can be treated independently).
Published: 2017

41. Does aspirin reduce the incidence, recurrence, and mortality of colorectal cancer? A meta-analysis of randomized clinical trials

Author: Ma, Shaodi, Han, Tiantian, Sun, Chenyu, Cheng, Ce, Zhang, Huimei, Qu, Guangbo, Bhan, Chandur, Yang, Hongru, Guo, Zhichun, Yan, Yue, Cao, Chenyu, Ji, Ziwei, and Zhou, Qin
Published: 2021
Full Text: View/download PDF

42. The protective effect of magnesium phosphate cement on steel corrosion

Author: Tang, Hao, Qian, Jueshi, Ji, Ziwei, Dai, Xiaobing, and Li, Zhen
Published: 2020
Full Text: View/download PDF

43. Correction: c-Myc-activated long non-coding RNA LINC01050 promotes gastric cancer growth and metastasis by sponging miR-7161-3p to regulate SPZ1 expression

Author: Ji, Ziwei, primary, Tang, Tianbin, additional, Chen, Mengxia, additional, Dong, Buyuan, additional, Sun, Wenjing, additional, Wu, Nan, additional, Chen, Hao, additional, Feng, Qian, additional, Yang, Xingyi, additional, Jin, Rong, additional, and Jiang, Lei, additional
Published: 2024
Full Text: View/download PDF

44. C-Myc-activated long non-coding RNA LINC01050 promotes gastric cancer growth and metastasis by sponging miR-7161-3p to regulate SPZ1 expression

Author: Ji, Ziwei, Tang, Tianbin, Chen, Mengxia, Dong, Buyuan, Sun, Wenjing, Wu, Nan, Chen, Hao, Feng, Qian, Yang, Xingyi, Jin, Rong, and Jiang, Lei
Published: 2021
Full Text: View/download PDF

45. Social Welfare and Profit Maximization from Revealed Preferences

Author: Ji, Ziwei, Mehta, Ruta, Telgarsky, Matus, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Christodoulou, George, editor, and Harks, Tobias, editor
Published: 2018
Full Text: View/download PDF

46. Model Generalization on COVID-19 Fake News Detection

Author: Bang, Yejin, primary, Ishii, Etsuko, additional, Cahyawijaya, Samuel, additional, Ji, Ziwei, additional, and Fung, Pascale, additional
Published: 2021
Full Text: View/download PDF

47. Effect of ettringite seed crystals on the properties of calcium sulphoaluminate cement

Author: Yu, Jincheng, Qian, Jueshi, Tang, Jiangyu, Ji, Ziwei, and Fan, Yingru
Published: 2019
Full Text: View/download PDF

48. The Beachcombers’ Problem: Walking and Searching from an Inner Point of a Line

Author: Chen, Yu, Deng, Xiaotie, Ji, Ziwei, Liao, Chao, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Dediu, Adrian-Horia, editor, Janoušek, Jan, editor, Martín-Vide, Carlos, editor, and Truthe, Bianca, editor
Published: 2016
Full Text: View/download PDF

49. NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Author: Cahyawijaya, Samuel, Lovenia, Holy, Wilie, Bryan, Su, Dan, Yu, Tiezheng, Dai, Wenliang, Xu, Yan, Ji, Ziwei, Fung, Pascale Ngan, Cahyawijaya, Samuel, Lovenia, Holy, Wilie, Bryan, Su, Dan, Yu, Tiezheng, Dai, Wenliang, Xu, Yan, Ji, Ziwei, and Fung, Pascale Ngan
Abstract: We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are underrepresented despite being widely spoken. © 2023 Association for Computational Linguistics.
Published: 2023

50. Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Author: Dai, Wenliang, Liu, Zihan, Ji, Ziwei, Su, Dan, Fung, Pascale Ngan, Dai, Wenliang, Liu, Zihan, Ji, Ziwei, Su, Dan, and Fung, Pascale Ngan
Abstract: Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information. In this paper, we systematically study the object hallucination problem from three aspects. First, we examine recent state-of-the-art VLP models, showing that they still hallucinate frequently and models achieving better scores on standard metrics (e.g., CIDEr) could be more unfaithful. Second, we investigate how different types of image encoding in VLP influence hallucination, including region-based, grid-based, and patch-based. Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination. Third, we decouple various VLP objectives and demonstrate that token-level image-text alignment and controlled generation are crucial to reducing hallucination. Based on that, we propose a simple yet effective VLP loss named ObjMLM to further mitigate object hallucination. Results show that it reduces object hallucination by up to 17.4% when tested on two benchmarks (COCO Caption for in-domain and NoCaps for out-of-domain evaluation).
Published: 2023

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

169 results on '"Ji, Ziwei"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources