Author: "Ide, Nancy" / Publication Type: Electronic Resources - Searchworks@Jio Institute Digital Library Search Results

1. Infrastructure for Semantic Annotation in the Genomics Domain

Author: El-Haj, Mahmoud, Rutherford, Nathan, Coole, Matthew, Ezeani, Ignatius, Prentice, Sheryl, Ide, Nancy, Knight, Jo, Piao, Scott, Mariani, John, Rayson, Paul, and Suderman, Keith
Abstract: We describe a novel super-infrastructure for biomedical text mining which incorporates an end-to-end pipeline for the collection, annotation, storage, retrieval and analysis of biomedical and life sciences literature, combining NLP and corpus linguistics methods. The infrastructure permits extreme-scale research on the open access PubMed Central archive. It combines an updatable Gene Ontology Semantic Tagger (GOST) for entity identification and semantic markup in the literature with an NLP pipeline scheduler (Buster) to collect and process the corpus, and a bespoke columnar corpus database (LexiDB) for indexing. The corpus database is distributed to permit fast indexing, and provides a simple web front-end with corpus linguistics methods for sub-corpus comparison and retrieval. GOST is also connected as a service in the Language Application (LAPPS) Grid, where it is interoperable with other NLP tools and data in the Grid and can be combined with them in more complex workflows. In a literature-based discovery setting, we have created an annotated corpus of 9,776 papers with 5,481,543 words.
Published: 2020

2. Infrastructure for Semantic Annotation in the Genomics Domain

Author: El-Haj, Mahmoud, Rutherford, Nathan, Coole, Matthew, Ezeani, Ignatius, Prentice, Sheryl, Ide, Nancy, Knight, Jo, Piao, Scott, Mariani, John, Rayson, Paul, and Suderman, Keith
Abstract: We describe a novel super-infrastructure for biomedical text mining which incorporates an end-to-end pipeline for the collection, annotation, storage, retrieval and analysis of biomedical and life sciences literature, combining NLP and corpus linguistics methods. The infrastructure permits extreme-scale research on the open access PubMed Central archive. It combines an updatable Gene Ontology Semantic Tagger (GOST) for entity identification and semantic markup in the literature with an NLP pipeline scheduler (Buster) to collect and process the corpus, and a bespoke columnar corpus database (LexiDB) for indexing. The corpus database is distributed to permit fast indexing, and provides a simple web front-end with corpus linguistics methods for sub-corpus comparison and retrieval. GOST is also connected as a service in the Language Application (LAPPS) Grid, where it is interoperable with other NLP tools and data in the Grid and can be combined with them in more complex workflows. In a literature-based discovery setting, we have created an annotated corpus of 9,776 papers with 5,481,543 words.
Published: 2020

3. Community Standards for Linguistically-Annotated Resources

Author: Ide, Nancy, Calzolari, Nicoletta, Eckle-Kohler, Judith, Gibbon, Dafydd, Hellmann, Sebastian, Lee, Kiyong, Nivre, Joakim, and Romary, Laurent
Published: 2017

4. Erratum to: Replicability and reproducibility of research results for human language technology: introducing an LRE special section

Author: Branco, António, Cohen, Kevin Bretonnel, Vossen, Piek, Ide, Nancy, and Calzolari, Nicoletta
Published: 2017

5. Replicability and reproducibility of research results for human language technology: introducing an LRE special section

Author: Branco, António, Cohen, Kevin Bretonnel, Vossen, Piek, Ide, Nancy, and Calzolari, Nicoletta
Published: 2017

6. Manually Annotated Sub-Corpus Third Release

Author: Ide, Nancy, Suderman, Keith, Baker, Collin, Passonneau, Rebecca, and Fellbaum, Christiane
Abstract: The Manually Annotated Sub-Corpus (MASC) Third Release was developed as part of the American National Corpus project and consists of approximately 500,000 words of contemporary American English written and spoken data annotated for a wide variety of linguistic phenomena. The MASC project was established to address, to the extent possible, many of the obstacles to the creation of large-scale, robust, multiply-annotated corpora of English covering a wide range of genres of written and spoken language data. The project provides appropriate data and annotations to serve as the base for a community-wide annotation effort, together with an infrastructure that enables the incorporation of contributed annotations into a single, usable format that can then be analyzed as is or transduced to any of a variety of other formats. The aim is to offset some of the high costs of producing high-quality linguistic annotations via a distribution of effort, and to solve some of the usability problems for annotations produced at different sites by harmonizing their representation formats. It also provides data from a much wider variety of genres than are often present in existing multiply-annotated corpora of English, and all of the data in the corpus are drawn from current American English so as to be most useful for natural language processing applications used in the web-based environment. Further information about the project is available at the MASC website. The source texts were drawn from the open portion of the American National Corpus Second Release, which includes written texts and spoken transcripts of American English from a broad range of genres produced since 1990, and from the Language Understanding Annotation Corpus, a collection of various genres including broadcast, newswire, email, and telephone speech annotated for committed belief, event and entity coreference, dialog acts and temporal relations.
MASC Third Release includes the contents of MASC First
Published: 2013

7. Manually Annotated Sub-Corpus First Release

Author: Ide, Nancy, Suderman, Keith, Baker, Collin, Passonneau, Rebecca, and Fellbaum, Christiane
Abstract: The Manually Annotated Sub-Corpus First Release (MASC I), Linguistic Data Consortium (LDC) catalog number LDC2010T22 and ISBN 1-58563-569-3, is the first of three releases making up the 500,000 words of MASC data developed as part of the American National Corpus (ANC) project. MASC I consists of approximately 80,000 words of contemporary spoken and written American English annotated for a variety of linguistic phenomena. The MASC project is sponsored by the National Science Foundation and was established to address, to the extent possible, many of the obstacles to the creation of large-scale, robust, multiply-annotated corpora of English covering a wide range of genres of written and spoken language data. Researchers from Vassar College, Columbia University, and the International Computer Science Institute, University of California at Berkeley, are the principal participants; the WordNet project provides consulting. The source texts in MASC I are drawn from the open portion of the ANC Second Release (LDC2005T35), which includes written texts and spoken transcripts of American English from a broad range of genres produced since 1990, and from the Language Understanding Annotation Corpus (LU Corpus, LDC2009T09), a collection of various genres including broadcast, newswire, email and telephone speech annotated for committed belief, event and entity coreference, dialog acts and temporal relations. All of the data in MASC I have validated annotations for token, part of speech, sentence boundary, noun chunks, verb chunks, named entities and Penn Treebank syntax. Full-text FrameNet annotations are available for seventeen texts, and WordNet word sense annotations are available for 1,000 occurrences of each of fifty-three words. Annotations of all or portions of the sub-corpus for a wide variety of other linguistic phenomena have been contributed by other projects.
Software and services available from the ANC project website enable transducti
Published: 2010

8. A road map for interoperable language resource metadata

Author: Cieri, Christopher, Choukri, Khalid, Calzolari, Nicoletta, Langendoen, D. Terence, Leveling, Johannes, Palmer, Martha, Ide, Nancy, and Pustejovsky, James
Abstract: Language resources (LRs) remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the human language technology (HLT) researcher must search, with their different formats, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.
Published: 2010

9. A road map for interoperable language resource metadata

Author: Cieri, Christopher, Choukri, Khalid, Calzolari, Nicoletta, Langendoen, D. Terence, Leveling, Johannes, Palmer, Martha, Ide, Nancy, and Pustejovsky, James
Abstract: Language resources (LRs) remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the human language technology (HLT) researcher must search, with their different formats, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.
Published: 2010

10. A road map for interoperable language resource metadata

Author: Cieri, Christopher, Choukri, Khalid, Calzolari, Nicoletta, Langendoen, D. Terence, Leveling, Johannes, Palmer, Martha, Ide, Nancy, and Pustejovsky, James
Abstract: Language resources (LRs) remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the human language technology (HLT) researcher must search, with their different formats, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.
Published: 2010

11. A road map for interoperable language resource metadata

Author: Cieri, Christopher, Choukri, Khalid, Calzolari, Nicoletta, Langendoen, D. Terence, Leveling, Johannes, Palmer, Martha, Ide, Nancy, and Pustejovsky, James
Abstract: Language resources (LRs) remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the human language technology (HLT) researcher must search, with their different formats, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.
Published: 2010

12. A road map for interoperable language resource metadata

Author: Cieri, Christopher, Choukri, Khalid, Calzolari, Nicoletta, Langendoen, D. Terence, Leveling, Johannes, Palmer, Martha, Ide, Nancy, and Pustejovsky, James
Abstract: Language resources (LRs) remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the human language technology (HLT) researcher must search, with their different formats, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.
Published: 2010

13. Standards for Language Resources

Author: Ide, Nancy, and Romary, Laurent
Abstract: The goal of this paper is twofold: to present an abstract data model for linguistic annotations and its implementation using XML, RDF and related standards, and to outline the work of a newly formed committee of the International Organization for Standardization (ISO), ISO/TC 37/SC 4 Language Resource Management, which will use this work as its starting point. Comment: International conference with proceedings and peer review.
Published: 2009

14. Standards for Language Resources

Author: Ide, Nancy, and Romary, Laurent
Abstract: This paper presents an abstract data model for linguistic annotations and its implementation using XML, RDF and related standards, and outlines the work of a newly formed committee of the International Organization for Standardization (ISO), ISO/TC 37/SC 4 Language Resource Management, which will use this work as its starting point. The primary motive for presenting the latter is to solicit members of the research community to contribute to the work of the committee. Comment: International conference with proceedings and peer review.
Published: 2009

15. A Common XML-based Framework for Syntactic Annotations

Author: Ide, Nancy, Romary, Laurent, and Erjavec, Tomaz
Abstract: It is widely recognized that the proliferation of annotation schemes runs counter to the need to re-use language resources, and that standards for linguistic annotation are becoming increasingly mandatory. To answer this need, we have developed a framework composed of an abstract model for a variety of different annotation types (e.g., morpho-syntactic tagging, syntactic annotation, co-reference annotation), which can be instantiated in different ways depending on the annotator's approach and goals. In this paper we provide an overview of the framework, demonstrate its applicability to syntactic annotation, and show how it can contribute to comparative evaluation of parser output and diverse syntactic annotation schemes. Comment: International conference with proceedings and peer review.
Published: 2009

16. Marking-up multiple views of a Text: Discourse and Reference

Author: Cristea, Dan, Ide, Nancy, and Romary, Laurent
Abstract: We describe an encoding scheme for discourse structure and reference, based on the TEI Guidelines and the recommendations of the Corpus Encoding Specification (CES). A central feature of the scheme is a CES-based data architecture enabling the encoding of and access to multiple views of a marked-up document. We describe a tool architecture that supports the encoding scheme, and then show how we have used the encoding scheme and the tools to perform a discourse analytic task in support of a model of global discourse cohesion called Veins Theory (Cristea & Ide, 1998).
Published: 2009

17. A Formal Model of Dictionary Structure and Content

Author: Romary, Laurent, Ide, Nancy, and Kilgarriff, Adam
Abstract: We show that a general model of lexical information conforms to an abstract model that reflects the hierarchy of information found in a typical dictionary entry. We show that this model can be mapped into a well-formed XML document, and how the XSL transformation language can be used to implement a semantics defined over the abstract model to enable extraction and manipulation of the information in any format.
Published: 2007

18. International Standard for a Linguistic Annotation Framework

Author: Romary, Laurent, and Ide, Nancy
Abstract: This paper describes the Linguistic Annotation Framework under development within ISO TC37 SC4 WG1. The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.
Published: 2007

19. Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets

Author: Tufis, Dan, Ion, Radu, and Ide, Nancy
Abstract: The paper presents a method for word sense disambiguation (WSD) based on parallel corpora. The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents, and is supported by available aligned wordnets for the languages in the corpus. The wordnets are aligned to the Princeton WordNet, according to the principles established by EuroWordNet. The evaluation of the WSD system implementing the method described herein showed very encouraging results. Used in validation mode, the same system can check and spot alignment errors in multilingually aligned wordnets such as BalkaNet and EuroWordNet. Comment: 7 pages, in Proc. of COLING 2005
Published: 2005

19 results on '"Ide, Nancy"'