551. The Acquisition and Sentence Alignment for Academic Bilingual Resources Based on Web Paper Libraries
- Authors
Rui Men, Yueheng Sun, and Weijie Ni
- Subjects
Information retrieval, Parsing, Computer science, business.industry, computer.internet_protocol, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, computer.software_genre, Rule-based machine translation, Web page, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Artificial intelligence, Computational linguistics, Precision and recall, business, Web crawler, computer, Sentence, Natural language processing, XML
- Abstract
This paper presents an approach for acquiring academic bilingual resources from web paper libraries. By analyzing the structural information of the web pages, we first implement a customized crawler to download pages containing paper details, and then use a parser to convert them into XML format. Building on the classic statistical method for sentence alignment, we propose an improved approach for aligning the initial bilingual resources that introduces two factors: bilingual keyword pairs and matching patterns. Experimental results show that our sentence aligner, supported by the new approach, improves both precision and recall by 7%.
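The abstract does not name the "classic statistical method" it builds on. A widely used one is Gale and Church's length-based aligner, so the sketch below assumes that model and shows one possible way a bilingual keyword-pair factor could be folded into its dynamic-programming cost. It is an illustration, not the authors' method: the names `length_cost`, `keyword_matches`, `align`, and `KEYWORD_BONUS` (and its value) are hypothetical, the character-ratio parameter `C = 1.0` is Gale and Church's estimate for European language pairs and would need re-estimating for a Chinese-English corpus, and the paper's "matching patterns" factor is not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Bead priors as commonly quoted from Gale & Church (1993).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0   # expected target/source character ratio (assumed; depends on language pair)
S2 = 6.8  # variance of that ratio (Gale & Church's estimate)
KEYWORD_BONUS = 2.0  # hypothetical reward per matched bilingual keyword pair


def length_cost(l1: int, l2: int) -> float:
    """Return -log P(match | lengths) under the Gale-Church length model."""
    mean = (l1 + l2 / C) / 2
    if mean == 0:
        return 0.0
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    # Two-tailed probability 2 * (1 - Phi(|delta|)) equals erfc(|delta| / sqrt(2)).
    p = math.erfc(abs(delta) / math.sqrt(2))
    return -math.log(max(p, 1e-12))  # floor avoids log(0) for extreme mismatches


def keyword_matches(src: str, tgt: str, pairs: Dict[str, str]) -> int:
    """Hypothetical factor: count keyword pairs present on both sides of a bead."""
    return sum(1 for s, t in pairs.items() if s in src and t in tgt)


def align(src: Sequence[str], tgt: Sequence[str],
          pairs: Dict[str, str]) -> List[Tuple[List[int], List[int]]]:
    """Minimum-cost alignment into beads of (source indices, target indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: Dict[Tuple[int, int], Tuple[int, int]] = {}
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor of (i, j) is final before (i, j).
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = " ".join(src[i:ni])
                t = " ".join(tgt[j:nj])
                step = (-math.log(prior) + length_cost(len(s), len(t))
                        - KEYWORD_BONUS * keyword_matches(s, t, pairs))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[(ni, nj)] = (i, j)
    # Trace the best path back from (n, m) to (0, 0).
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[(i, j)]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

Subtracting the keyword bonus makes beads that share known term translations cheaper, which only changes the outcome when the length model alone leaves several alignments with similar costs.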
- Published
2009