Author: "Authur, Russell" - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Authur, Russell" returned 6 results

1. OLMo: Accelerating the Science of Language Models

Author: Groeneveld, Dirk, Beltagy, Iz, Walsh, Pete, Bhagia, Akshita, Kinney, Rodney, Tafjord, Oyvind, Jha, Ananya Harsh, Ivison, Hamish, Magnusson, Ian, Wang, Yizhong, Arora, Shane, Atkinson, David, Authur, Russell, Chandu, Khyathi Raghavi, Cohan, Arman, Dumas, Jennifer, Elazar, Yanai, Gu, Yuling, Hessel, Jack, Khot, Tushar, Merrill, William, Morrison, Jacob, Muennighoff, Niklas, Naik, Aakanksha, Nam, Crystal, Peters, Matthew E., Pyatkin, Valentina, Ravichander, Abhilasha, Schwenk, Dustin, Shah, Saurabh, Smith, Will, Strubell, Emma, Subramani, Nishant, Wortsman, Mitchell, Dasigi, Pradeep, Lambert, Nathan, Richardson, Kyle, Zettlemoyer, Luke, Dodge, Jesse, Lo, Kyle, Soldaini, Luca, Smith, Noah A., and Hajishirzi, Hannaneh
Subjects: Computer Science - Computation and Language
Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.
Published: 2024

2. Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Author: Soldaini, Luca, Kinney, Rodney, Bhagia, Akshita, Schwenk, Dustin, Atkinson, David, Authur, Russell, Bogin, Ben, Chandu, Khyathi, Dumas, Jennifer, Elazar, Yanai, Hofmann, Valentin, Jha, Ananya Harsh, Kumar, Sachin, Lucy, Li, Lyu, Xinxi, Lambert, Nathan, Magnusson, Ian, Morrison, Jacob, Muennighoff, Niklas, Naik, Aakanksha, Nam, Crystal, Peters, Matthew E., Ravichander, Abhilasha, Richardson, Kyle, Shen, Zejiang, Strubell, Emma, Subramani, Nishant, Tafjord, Oyvind, Walsh, Pete, Zettlemoyer, Luke, Smith, Noah A., Hajishirzi, Hannaneh, Beltagy, Iz, Groeneveld, Dirk, Dodge, Jesse, and Lo, Kyle
Subjects: Computer Science - Computation and Language
Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations. To facilitate scientific research on language model pretraining, we curate and release Dolma, a three-trillion-token English corpus, built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. We extensively document Dolma, including its design principles, details about its construction, and a summary of its contents. We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices. Finally, we open-source our data curation toolkit to enable reproduction of our work as well as support further research in large-scale data curation.
Comment: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma
Published: 2024

3. The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

Author: Lo, Kyle, Chang, Joseph Chee, Head, Andrew, Bragg, Jonathan, Zhang, Amy X., Trier, Cassidy, Anastasiades, Chloe, August, Tal, Authur, Russell, Bragg, Danielle, Bransom, Erin, Cachola, Isabel, Candra, Stefan, Chandrasekhar, Yoganand, Chen, Yen-Sung, Cheng, Evie Yu-Yen, Chou, Yvonne, Downey, Doug, Evans, Rob, Fok, Raymond, Hu, Fangzhou, Huff, Regan, Kang, Dongyeop, Kim, Tae Soo, Kinney, Rodney, Kittur, Aniket, Kang, Hyeonsu, Klevak, Egor, Kuehl, Bailey, Langan, Michael, Latzke, Matt, Lochner, Jaron, MacMillan, Kelsey, Marsh, Eric, Murray, Tyler, Naik, Aakanksha, Nguyen, Ngoc-Uyen, Palani, Srishti, Park, Soya, Paulic, Caroline, Rachatasumrit, Napol, Rao, Smita, Sayre, Paul, Shen, Zejiang, Siangliulue, Pao, Soldaini, Luca, Tran, Huy, van Zuylen, Madeleine, Wang, Lucy Lu, Wilhelm, Christopher, Wu, Caroline, Yang, Jiangjiang, Zamarron, Angele, Hearst, Marti A., and Weld, Daniel S.
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges.
Published: 2023

4. The Semantic Scholar Open Data Platform

Author: Kinney, Rodney, Anastasiades, Chloe, Authur, Russell, Beltagy, Iz, Bragg, Jonathan, Buraczynski, Alexandra, Cachola, Isabel, Candra, Stefan, Chandrasekhar, Yoganand, Cohan, Arman, Crawford, Miles, Downey, Doug, Dunkelberger, Jason, Etzioni, Oren, Evans, Rob, Feldman, Sergey, Gorney, Joseph, Graham, David, Hu, Fangzhou, Huff, Regan, King, Daniel, Kohlmeier, Sebastian, Kuehl, Bailey, Langan, Michael, Lin, Daniel, Liu, Haokun, Lo, Kyle, Lochner, Jaron, MacMillan, Kelsey, Murray, Tyler, Newell, Chris, Rao, Smita, Rohatgi, Shaurya, Sayre, Paul, Shen, Zejiang, Singh, Amanpreet, Soldaini, Luca, Subramanian, Shivashankar, Tanaka, Amber, Wade, Alex D., Wagner, Linda, Wang, Lucy Lu, Wilhelm, Chris, Wu, Caroline, Yang, Jiangjiang, Zamarron, Angele, Van Zuylen, Madeleine, and Weld, Daniel S.
Subjects: Computer Science - Digital Libraries, Computer Science - Computation and Language
Abstract: The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction to build the Semantic Scholar Academic Graph, the largest open scientific literature graph to-date, with 200M+ papers, 80M+ authors, 550M+ paper-authorship edges, and 2.4B+ citation edges. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings. In this paper, we describe the components of the S2 data processing pipeline and the associated APIs offered by the platform. We will update this living document to reflect changes as we add new data offerings and improve existing services.
Comment: 8 pages, 6 figures
Published: 2023

5. The Semantic Reader Project.

Author: Lo, Kyle, Chang, Joseph Chee, Head, Andrew, Bragg, Jonathan, Zhang, Amy X., Trier, Cassidy, Anastasiades, Chloe, August, Tal, Authur, Russell, Bragg, Danielle, Bransom, Erin, Cachola, Isabel, Candra, Stefan, Chandrasekhar, Yoganand, Chen, Yen-Sung, Cheng, Evie Yu-Yen, Chou, Yvonne, Downey, Doug, Evans, Rob, and Fok, Raymond
Subjects: USER interfaces, OPEN source software, ARTIFICIAL intelligence, HUMAN-computer interaction, READING, OPEN scholarship
Abstract: The article offers information on the Semantic Reader Project, a free interactive interface for reading research papers. It discusses the development and evaluation of user interfaces powered by artificial intelligence (AI) to support scholars reading research papers and improve their reading experience.
Published: 2024

6. PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents

Author: Lo, Kyle, Shen, Zejiang, Newman, Benjamin, Chang, Joseph, Authur, Russell, Bransom, Erin, Candra, Stefan, Chandrasekhar, Yoganand, Huff, Regan, Kuehl, Bailey, Singh, Amanpreet, Wilhelm, Chris, Zamarron, Angele, Hearst, Marti A., Weld, Daniel, Downey, Doug, and Soldaini, Luca
Published: 2023
