Author: "Lee, Hongrae" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Lee, Hongrae" returned 34 results

1. PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions

Author: Chen, Anthony, Pasupat, Panupong, Singh, Sameer, Lee, Hongrae, and Guu, Kelvin
Subjects: Computer Science - Computation and Language
Abstract: The remarkable capabilities of large language models have been accompanied by a persistent drawback: the generation of false and unsubstantiated claims commonly known as "hallucinations". To combat this issue, recent research has introduced approaches that involve editing and attributing the outputs of language models, particularly through prompt-based editing. However, the inference cost and speed of using large language models for editing currently bottleneck prompt-based methods. These bottlenecks motivate the training of compact editors, which is challenging due to the scarcity of training data for this purpose. To overcome these challenges, we exploit the power of large language models to introduce corruptions (i.e., noise) into text and subsequently fine-tune compact editors to denoise the corruptions by incorporating relevant evidence. Our methodology is entirely unsupervised and provides us with faux hallucinations for training in any domain. Our Petite Unsupervised Research and Revision model, PURR, not only improves attribution over existing editing methods based on fine-tuning and prompting, but also achieves faster execution times by orders of magnitude.
Published: 2023

2. RARR: Researching and Revising What Language Models Say, Using Language Models

Author: Gao, Luyu, Dai, Zhuyun, Pasupat, Panupong, Chen, Anthony, Chaganty, Arun Tejasvi, Fan, Yicheng, Zhao, Vincent Y., Lao, Ni, Lee, Hongrae, Juan, Da-Cheng, and Guu, Kelvin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.
Comment: ACL 2023
Published: 2022

3. LaMDA: Language Models for Dialog Applications

Author: Thoppilan, Romal, De Freitas, Daniel, Hall, Jamie, Shazeer, Noam, Kulshreshtha, Apoorv, Cheng, Heng-Tze, Jin, Alicia, Bos, Taylor, Baker, Leslie, Du, Yu, Li, YaGuang, Lee, Hongrae, Zheng, Huaixiu Steven, Ghafouri, Amin, Menegali, Marcelo, Huang, Yanping, Krikun, Maxim, Lepikhin, Dmitry, Qin, James, Chen, Dehao, Xu, Yuanzhong, Chen, Zhifeng, Roberts, Adam, Bosma, Maarten, Zhao, Vincent, Zhou, Yanqi, Chang, Chung-Ching, Krivokon, Igor, Rusch, Will, Pickett, Marc, Srinivasan, Pranesh, Man, Laichee, Meier-Hellstern, Kathleen, Morris, Meredith Ringel, Doshi, Tulsee, Santos, Renelito Delos, Duke, Toju, Soraker, Johnny, Zevenbergen, Ben, Prabhakaran, Vinodkumar, Diaz, Mark, Hutchinson, Ben, Olson, Kristen, Molina, Alejandra, Hoffman-John, Erin, Lee, Josh, Aroyo, Lora, Rajakumar, Ravi, Butryna, Alena, Lamm, Matthew, Kuzmina, Viktoriya, Fenton, Joe, Cohen, Aaron, Bernstein, Rachel, Kurzweil, Ray, Aguera-Arcas, Blaise, Cui, Claire, Croak, Marian, Chi, Ed, and Le, Quoc
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvements on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.
Published: 2022

4. Generating Titles for Web Tables

Author: Hancock, Braden, Lee, Hongrae, and Yu, Cong
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Descriptive titles provide crucial context for interpreting tables that are extracted from web pages and are a key component of table-based web applications. Prior approaches have attempted to produce titles by selecting existing text snippets associated with the table. These approaches, however, are limited by their dependence on suitable titles existing a priori. In our user study, we observe that the relevant information for the title tends to be scattered across the page, and often--more than 80% of the time--does not appear verbatim anywhere in the page. We propose instead the application of a sequence-to-sequence neural network model as a more generalizable means of generating high-quality titles. This is accomplished by extracting many text snippets that have potentially relevant information to the table, encoding them into an input sequence, and using both copy and generation mechanisms in the decoder to balance relevance and readability of the generated title. We validate this approach with human evaluation on sample web tables and report that while sequence models with only a copy mechanism or only a generation mechanism are easily outperformed by simple selection-based baselines, the model with both capabilities outperforms them all, approaching the quality of crowdsourced titles while training on fewer than ten thousand examples. To the best of our knowledge, the proposed technique is the first to consider text generation methods for table titles and establishes a new state of the art.
Comment: WWW 2019
Published: 2018

5. Learning to Skim Text

Author: Yu, Adams Wei, Lee, Hongrae, and Le, Quoc V.
Subjects: Computer Science - Computation and Language, Computer Science - Learning
Abstract: Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering. Despite their promise, many recurrent models have to read the whole text word by word, making it slow to handle long documents. For example, it is difficult to use a recurrent network to read a book and answer questions about it. In this paper, we present an approach of reading text while skipping irrelevant information if needed. The underlying model is a recurrent network that learns how far to jump after reading a few words of the input text. We employ a standard policy gradient method to train the model to make discrete jumping decisions. In our benchmarks on four different tasks, including number prediction, sentiment analysis, news article classification and automatic Q&A, our proposed model, a modified LSTM with jumping, is up to 6 times faster than the standard sequential LSTM, while maintaining the same or even better accuracy.
Published: 2017

6. Recognition Pattern Restoration Algorithm Based on QR Code Data Loss Rate

Author: Lee, Doyoung, primary, Shin, Hyoseung, additional, Byun, Jiyun, additional, and Lee, Hongrae, additional
Published: 2024
Full Text: View/download PDF

7. String Joins with Synonyms

Author: Song, Gwangho, Lee, Hongrae, Shim, Kyuseok, Park, Yoonjae, Kim, Wooyeol, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nah, Yunmook, editor, Cui, Bin, editor, Lee, Sang-Won, editor, Yu, Jeffrey Xu, editor, Moon, Yang-Sae, editor, and Whang, Steven Euijong, editor
Published: 2020
Full Text: View/download PDF

8. Similarity Join Size Estimation using Locality Sensitive Hashing

Author: Lee, Hongrae, Ng, Raymond T., and Shim, Kyuseok
Subjects: Computer Science - Databases, Computer Science - Data Structures and Algorithms
Abstract: Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generalization of the previously studied set similarity join size estimation (SSJ) problem and can handle more interesting cases such as TF-IDF vectors. One of the key challenges in similarity join size estimation is that the join size can change dramatically depending on the input similarity threshold. We propose a sampling based algorithm that uses the Locality-Sensitive-Hashing (LSH) scheme. The proposed algorithm LSH-SS uses an LSH index to enable effective sampling even at high thresholds. We compare the proposed technique with random sampling and the state-of-the-art technique for SSJ (adapted to VSJ) and demonstrate LSH-SS offers more accurate estimates at both high and low similarity thresholds and small variance using real-world data sets.
Comment: VLDB 2011
Published: 2011

9. String Joins with Synonyms

Author: Song, Gwangho, primary, Lee, Hongrae, additional, Shim, Kyuseok, additional, Park, Yoonjae, additional, and Kim, Wooyeol, additional
Published: 2020
Full Text: View/download PDF

10. Google Fusion Tables

Author: Balakrishnan, Sreeram, Jacqmin-Adams, Karen, Lee, Hongrae, McChesney, Rod, Shapley, Rebecca, Shekhar, Shashi, editor, Xiong, Hui, editor, and Zhou, Xun, editor
Published: 2017
Full Text: View/download PDF

11. Power-law based estimation of set similarity join size

Author: Lee, Hongrae, Ng, Raymond T., and Shim, Kyuseok
Abstract: We propose a novel technique for estimating the size of set similarity join. The proposed technique relies on a succinct representation of sets using Min-Hash signatures. We exploit frequent patterns in the signatures for the Set Similarity Join (SSJoin) size estimation by counting their support. However, there are overlaps among the counts of signature patterns and we need to use the set Inclusion-Exclusion (IE) principle. We develop a novel lattice-based counting method for efficiently evaluating the IE principle. The proposed counting technique is linear in the lattice size. To make the mining process very light-weight, we exploit a recently discovered Power-law relationship of pattern count and frequency. Extensive experimental evaluations show the proposed technique is capable of accurate and efficient estimation.
Published: 2024
Full Text: View/download PDF

12. RARR: Researching and Revising What Language Models Say, Using Language Models

Author: Gao, Luyu, primary, Dai, Zhuyun, additional, Pasupat, Panupong, additional, Chen, Anthony, additional, Chaganty, Arun Tejasvi, additional, Fan, Yicheng, additional, Zhao, Vincent, additional, Lao, Ni, additional, Lee, Hongrae, additional, Juan, Da-Cheng, additional, and Guu, Kelvin, additional
Published: 2023
Full Text: View/download PDF

13. Google Fusion Tables

Author: Balakrishnan, Sreeram, primary, Jacqmin-Adams, Karen, additional, Lee, Hongrae, additional, McChesney, Rod, additional, and Lee, Rebecca, additional
Published: 2016
Full Text: View/download PDF

14. DARNN-Based Prediction Model for COVID-19 Diffusion for Each Administrative District in Seoul

Author: Park, Yeonjae, primary, Jun, Young Pyo, additional, Lee, Hongrae, additional, and Cho, Young-Rae, additional
Published: 2021
Full Text: View/download PDF

15. Substring Similarity Search with Synonyms

Author: Song, Gwangho, primary, Shim, Kyuseok, additional, and Lee, Hongrae, additional
Published: 2021
Full Text: View/download PDF

16. Natural language to SQL

Author: Kim, Hyeonji, primary, So, Byeong-Hoon, additional, Han, Wook-Shin, additional, and Lee, Hongrae, additional
Published: 2020
Full Text: View/download PDF

17. Generating Titles for Web Tables

Author: Hancock, Braden, primary, Lee, Hongrae, additional, and Yu, Cong, additional
Published: 2019
Full Text: View/download PDF

18. Ten years of webtables

Author: Cafarella, Michael, primary, Halevy, Alon, additional, Lee, Hongrae, additional, Madhavan, Jayant, additional, Yu, Cong, additional, Wang, Daisy Zhe, additional, and Wu, Eugene, additional
Published: 2018
Full Text: View/download PDF

19. Learning to Skim Text

Author: Yu, Adams Wei, primary, Lee, Hongrae, additional, and Le, Quoc, additional
Published: 2017
Full Text: View/download PDF

20. Using SSDs to scale up Google Fusion Tables, a database-in-the-cloud

Author: Bu, Yingyi, primary, Halim, Felix, additional, Kim, Changkyu, additional, Lee, Hongrae, additional, and Madhavan, Jayant, additional
Published: 2016
Full Text: View/download PDF

21. Mining Subjective Properties on the Web

Author: Trummer, Immanuel, primary, Halevy, Alon, additional, Lee, Hongrae, additional, Sarawagi, Sunita, additional, and Gupta, Rahul, additional
Published: 2015
Full Text: View/download PDF

22. Selectivity estimation of approximate predicates on text

Author: Lee, Hongrae
Abstract: This dissertation studies selectivity estimation of approximate predicates on text. Intuitively, we aim to count the number of strings that are similar to a given query string. This type of problem is crucial in handling text in RDBMSs in an error-tolerant way. A common difficulty in handling textual data is that they may contain typographical errors, or use similar but different textual representations for the same real-world entity. To handle such data in databases, approximate text processing has gained extensive interest and commercial databases have begun to incorporate such functionalities. One of the key components in successful integration of approximate text processing in RDBMSs is the selectivity estimation module, which is central in optimizing queries involving such predicates. However, these developments are relatively new and ad-hoc approaches, e.g., using a constant, have been employed. This dissertation studies reliable selectivity estimation techniques for approximate predicates on text. Among many possible predicates, we focus on two types of predicates which are fundamental building blocks of SQL queries: selections and joins. We study two different semantics for each type of operator. We propose a set of related summary structures and algorithms to estimate selectivity of selection and join operators with approximate matching. A common challenge is that there can be a huge number of variants to consider. The proposed data structures enable efficient counting by considering a group of similar variants together rather than each and every one separately. A lattice-based framework is proposed to consider overlapping counts among the groups. We performed extensive evaluation of proposed techniques using real-world and synthetic data sets. Our techniques support popular similarity measures including edit distance, Jaccard similarity and cosine similarity and show how to extend the techniques to other measures. Proposed solutions are compared with state-of-the-arts and baseline methods. Experimental results show that the proposed techniques are able to deliver accurate estimates with small space overhead.
Published: 2010
Full Text: View/download PDF

23. Consistent thinning of large geographical data for map visualization

Author: Sarma, Anish Das, primary, Lee, Hongrae, additional, Gonzalez, Hector, additional, Madhavan, Jayant, additional, and Halevy, Alon, additional
Published: 2013
Full Text: View/download PDF

24. Comparing SSD-placement strategies to scale a database-in-the-cloud

Author: Bu, Yingyi, primary, Lee, Hongrae, additional, and Madhavan, Jayant, additional
Published: 2013
Full Text: View/download PDF

25. Finding related tables

Author: Das Sarma, Anish, primary, Fang, Lujun, additional, Gupta, Nitin, additional, Halevy, Alon, additional, Lee, Hongrae, additional, Wu, Fei, additional, Xin, Reynold, additional, and Yu, Cong, additional
Published: 2012
Full Text: View/download PDF

26. CloudRAMSort

Author: Kim, Changkyu, primary, Park, Jongsoo, additional, Satish, Nadathur, additional, Lee, Hongrae, additional, Dubey, Pradeep, additional, and Chhugani, Jatin, additional
Published: 2012
Full Text: View/download PDF

27. Efficient spatial sampling of large geographical tables

Author: Das Sarma, Anish, primary, Lee, Hongrae, additional, Gonzalez, Hector, additional, Madhavan, Jayant, additional, and Halevy, Alon, additional
Published: 2012
Full Text: View/download PDF

28. Efficient Exact Similarity Searches Using Multiple Token Orderings

Author: Kim, Jongik, primary and Lee, Hongrae, additional
Published: 2012
Full Text: View/download PDF

29. Similarity join size estimation using locality sensitive hashing

Author: Lee, Hongrae, primary, Ng, Raymond T., additional, and Shim, Kyuseok, additional
Published: 2011
Full Text: View/download PDF

30. Variance aware optimization of parameterized queries

Author: Chaudhuri, Surajit, primary, Lee, Hongrae, additional, and Narasayya, Vivek R., additional
Published: 2010
Full Text: View/download PDF

31. Power-law based estimation of set similarity join size

Author: Lee, Hongrae, primary, Ng, Raymond T., additional, and Shim, Kyuseok, additional
Published: 2009
Full Text: View/download PDF

32. Approximate substring selectivity estimation

Author: Lee, Hongrae, primary, Ng, Raymond T., additional, and Shim, Kyuseok, additional
Published: 2009
Full Text: View/download PDF

33. Recent progress towards an ecosystem of structured data on the Web.

Author: Gupta, Nitin, Halevy, Alon Y., Harb, Boulos, Lam, Heidi, Lee, Hongrae, Madhavan, Jayant, Wu, Fei, and Yu, Cong
Published: 2013
Full Text: View/download PDF

34. EIC Editorial.

Author: Pei, Jian, Akoglu, Leman, Lee, Hongrae, Levandoski, Justin, Li, Xuelong, Meo, Rosa, Ordonez, Carlos, Phillips, Jeff, Poblete, Barbara, Candan, K. Selcuk, Wang, Meng, Wen, Ji-Rong, Xiong, Li, and Zhang, Wenjie
Subjects: AKOGLU, Leman, LEVANDOSKI, Justin, ORDONEZ, Carlos
Abstract: The article offers brief profiles of editorial board Leman Akoglu, Justin Levandoski, and Carlos Ordonez, who contributed to the writing of the journal.
Published: 2016
Full Text: View/download PDF
