
Evaluating a Machine Learning Approach to Identifying Expressive Content at Page Level in HathiTrust

Authors :
PARULIAN, NIKOLAUS
Publication Year :
2020
Publisher :
Humanities Commons, 2020.

Abstract

HathiTrust currently provides metadata, scanned images, and full text for all public domain volumes. However, the front matter of most volumes, regardless of rights status, likely contains content that is of interest to scholars and free from restriction. For example, the title page or table of contents may contain information that is likely non-expressive and useful for understanding the content’s structure and subject matter. Some volumes likely also include expressive or creative content in the first 20 pages, so front matter cannot be made open for all volumes without first understanding which types of content occur most frequently within those pages. Exploring this entirely by hand is prohibitively time-consuming, so we evaluate a machine learning approach to the task.

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi...........09875bd00ea2aba2b6b3237dc9a76f1a
Full Text :
https://doi.org/10.17613/3nfw-tx25