Author: "Saad-Falcon, Jon" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Saad-Falcon, Jon"' showing total 8 results

1. UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

Author: Saad-Falcon, Jon, Khattab, Omar, Santhanam, Keshav, Florian, Radu, Franz, Martin, Roukos, Salim, Sil, Avirup, Sultan, Md Arafat, and Potts, Christopher
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive one is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero-shot accuracy in long-tail domains, even where only 2K synthetic queries are used for fine-tuning, and that it achieves substantially lower latency than standard reranking methods. We make our end-to-end approach, including our synthetic datasets and replication code, publicly available on GitHub: https://github.com/primeqa/primeqa.
Published: 2023

2. Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

Author: Santhanam, Keshav, Saad-Falcon, Jon, Franz, Martin, Khattab, Omar, Sil, Avirup, Florian, Radu, Sultan, Md Arafat, Roukos, Salim, Zaharia, Matei, and Potts, Christopher
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.
Published: 2022

3. Evaluation of Argo Scholar with Observational Study

Author: Li, Kevin, Yang, Haoyang, Montoya, Evan, Upadhayay, Anish, Zhou, Zhiyan, Saad-Falcon, Jon, and Chau, Duen Horng
Subjects: FOS: Computer and information sciences, Computer Science - Human-Computer Interaction, Human-Computer Interaction (cs.HC)
Abstract: Discovering and making sense of relevant literature is fundamental in any scientific field. Node-link diagram-based visualization tools can aid this process; however, existing tools have been evaluated only on small scales. This paper evaluates Argo Scholar, an open-source visualization tool designed for interactive exploration of literature and easy sharing of exploration results. A large-scale user study of 122 participants from diverse backgrounds and experiences showed that Argo Scholar is effective at helping users find related work and understand paper connections, and incremental graph-based exploration is effective across diverse disciplines. Based on the user study and user feedback, we provide design considerations and feature suggestions for future work.
Comment: VIS IEEE 22
Published: 2022

4. Embedding Recycling for Language Models

Author: Saad-Falcon, Jon, Singh, Amanpreet, Soldaini, Luca, D'Arcy, Mike, Cohan, Arman, and Downey, Doug
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Real-world applications of neural language models often involve running many different models over the same corpus. The high computational cost of these runs has led to interest in techniques that can reuse the contextualized embeddings produced in previous runs to speed training and inference of future ones. We refer to this approach as embedding recycling (ER). While multiple ER techniques have been proposed, their practical effectiveness is still unknown because existing evaluations consider very few models and do not adequately account for overhead costs. We perform an extensive evaluation of ER across eight different models (17 to 900 million parameters) and fourteen tasks in English. We show how a simple ER technique that caches activations from an intermediate layer of a pretrained model, and learns task-specific adapters on the later layers, is broadly effective. For the best-performing baseline in our experiments (DeBERTa-v2 XL), adding a precomputed cache results in a >90% speedup during training and 87-91% speedup for inference, with negligible impact on accuracy. Our analysis reveals important areas of future work.
Comment: EACL Findings 2023
Published: 2022

5. Quantifying the Impact of Human Capital, Job History, and Language Factors on Job Seniority with a Large-scale Analysis of Resumes

Author: Wright, Austin P, Ziems, Caleb, Park, Haekyu, Saad-Falcon, Jon, Chau, Duen Horng, Yang, Diyi, and Tomprou, Maria
Subjects: FOS: Economics and business, FOS: Computer and information sciences, General Economics (econ.GN), Information Retrieval (cs.IR), Economics - General Economics, Computer Science - Information Retrieval
Abstract: As job markets worldwide have become more competitive and applicant selection criteria have become more opaque, and different (and sometimes contradictory) information and advice is available for job seekers wishing to progress in their careers, it has never been more difficult to determine which factors in a résumé most effectively help career progression. In this work we present a novel, large scale dataset of over half a million résumés with preliminary analysis to begin to answer empirically which factors help or hurt people wishing to transition to more senior roles as they progress in their career. We find that previous experience forms the most important factor, outweighing other aspects of human capital, and find which language factors in a résumé have significant effects. This lays the groundwork for future inquiry in career trajectories using large scale data analysis and natural language processing techniques.
Comment: 9 Pages, 5 Figures, 8 Tables
Published: 2021

6. Argo Scholar: Interactive Visual Exploration of Literature in Browsers

Author: Li, Kevin, Yang, Haoyang, Upadhayay, Anish, Zhou, Zhiyan, Saad-Falcon, Jon, and Chau, Duen Horng
Subjects: FOS: Computer and information sciences, Computer Science - Human-Computer Interaction, Human-Computer Interaction (cs.HC)
Abstract: Discovering and making sense of relevant research literature is fundamental to becoming knowledgeable in any scientific discipline. Visualization can aid this process; however, existing tools' adoption and impact have often been constrained, such as by their reliance on small curated paper datasets that quickly become outdated or a lack of support for personalized exploration. We introduce Argo Scholar, an open-source, web-based visualization tool for interactive exploration of literature and easy sharing of exploration results. Argo Scholar queries and visualizes Semantic Scholar's live data of almost 200 million papers, enabling users to generate personalized literature exploration results in real-time through flexible, incremental exploration, a common and effective method for researchers to discover relevant work. Our tool allows users to easily share their literature exploration results as a URL or web-embedded IFrame application. Argo Scholar is open-sourced and available at https://poloclub.github.io/argo-scholar/.
Comment: IEEE VIS 2021
Published: 2021

7. Mapping Researchers with PeopleMap

Author: Saad-Falcon, Jon, Shaikh, Omar, Wang, Zijie J., Wright, Austin P., Richardson, Sasha, and Chau, Duen Horng
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction, Digital Libraries (cs.DL), Computer Science - Digital Libraries, Computation and Language (cs.CL), Human-Computer Interaction (cs.HC)
Abstract: Discovering research expertise at universities can be a difficult task. Directories routinely become outdated, and few help in visually summarizing researchers' work or supporting the exploration of shared interests among researchers. This results in lost opportunities for both internal and external entities to discover new connections, nurture research collaboration, and explore the diversity of research. To address this problem, at Georgia Tech, we have been developing PeopleMap, an open-source interactive web-based tool that uses natural language processing (NLP) to create visual maps for researchers based on their research interests and publications. Requiring only the researchers' Google Scholar profiles as input, PeopleMap generates and visualizes embeddings for the researchers, significantly reducing the need for manual curation of publication information. To encourage and facilitate easy adoption and extension of PeopleMap, we have open-sourced it under the permissive MIT license at https://github.com/poloclub/people-map. PeopleMap has received positive feedback and enthusiasm for expanding its adoption across Georgia Tech.
Comment: 2020 IEEE Visualization
Published: 2020

8. PeopleMap: Visualization Tool for Mapping Out Researchers using Natural Language Processing

Author: Saad-Falcon, Jon, Shaikh, Omar, Wang, Zijie J., Wright, Austin P., Richardson, Sasha, and Chau, Duen Horng
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction, Digital Libraries (cs.DL), Computer Science - Digital Libraries, Computation and Language (cs.CL), Human-Computer Interaction (cs.HC)
Abstract: Discovering research expertise at institutions can be a difficult task. Manually curated university directories easily become out of date and they often lack the information necessary for understanding a researcher's interests and past work, making it harder to explore the diversity of research at an institution and identify research talents. This results in lost opportunities for both internal and external entities to discover new connections and nurture research collaboration. To solve this problem, we have developed PeopleMap, the first interactive, open-source, web-based tool that visually "maps out" researchers based on their research interests and publications by leveraging embeddings generated by natural language processing (NLP) techniques. PeopleMap provides a new engaging way for institutions to summarize their research talents and for people to discover new connections. The platform is developed with ease-of-use and sustainability in mind. Using only researchers' Google Scholar profiles as input, PeopleMap can be readily adopted by any institution using its publicly-accessible repository and detailed documentation.
Comment: 7 pages, 3 figures, submission to the 29th ACM International Conference on Information and Knowledge Management (CIKM '20), October 19-23, 2020, Galway, Ireland
Published: 2020
