SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles
- Source :
- PLoS ONE, Vol 5, Iss 6, p e10729 (2010)
- Publication Year :
- 2010
- Publisher :
- Public Library of Science, 2010.
Abstract
- Background: Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to. Methodology: Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts. Conclusions: Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
- Subjects :
- Vocabulary
China
Science
Word processing
Social Sciences
Biology
Lexicon
Lexical decision task
Humans
Language
Neuroscience/Cognitive Neuroscience
ENGLISH
Multidisciplinary
Neuroscience/Behavioral Neuroscience
Syntax
Neuroscience/Experimental Psychology
NORMS
Word lists by frequency
Neuroscience/Psychology
Word recognition
Medicine
Artificial intelligence
Word (computer architecture)
Natural language processing
Research Article
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 5
- Issue :
- 6
- Database :
- OpenAIRE
- Journal :
- PLoS ONE
- Accession number :
- edsair.doi.dedup.....29bbf687a8c13a42a6c9d65e3f67a1d5