Empirical analysis of Zipf's law, power law, and lognormal distributions in medical discharge reports.
- Source :
- International journal of medical informatics [Int J Med Inform] 2021 Jan; Vol. 145, pp. 104324. Date of Electronic Publication: 2020 Nov 02.
- Publication Year :
- 2021
- Abstract :
- Background: Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions.
  Objective: This paper empirically analyses whether text in medical discharge reports follows Zipf's law, a commonly assumed statistical property of language where word frequency follows a discrete power-law distribution.
  Method: We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power-law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power-law) provided superior fits to the data.
  Result: Discharge reports are best fit by the truncated power-law and lognormal distributions. Discharge reports appear to be near-Zipfian, in that the truncated power-law provides a superior fit over a pure power-law.
  Conclusion: Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power-law and lognormal probability priors and non-parametric models that capture power-law behavior.
  (Copyright © 2020 Elsevier B.V. All rights reserved.)
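The first steps named in the Method (tokenizing, counting token frequency, fitting a power law) can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not state their tooling, so the function names `tokenize`, `token_frequencies`, and `powerlaw_alpha`, the regex tokenizer, and the toy reports are all assumptions made for illustration. The exponent is estimated with a common closed-form approximation to the discrete maximum-likelihood estimator, which is rough when `xmin` is small. The comparison against lognormal, exponential, stretched exponential, and truncated power-law alternatives is not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_frequencies(reports):
    """Count how often each token occurs across all reports."""
    counts = Counter()
    for report in reports:
        counts.update(tokenize(report))
    return counts


def powerlaw_alpha(frequencies, xmin=1):
    """Approximate MLE of the exponent of a discrete power law.

    Uses alpha ~= 1 + n / sum(ln(x / (xmin - 0.5))) over the values x >= xmin.
    """
    tail = [x for x in frequencies if x >= xmin]
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


# Toy stand-in text, not MIMIC-III data.
reports = [
    "Patient admitted with chest pain. Patient discharged home.",
    "Patient stable. Discharged with follow up in two weeks.",
]
counts = token_frequencies(reports)
alpha = powerlaw_alpha(counts.values(), xmin=1)
print(counts.most_common(3))
print(round(alpha, 2))
```

On a real corpus, the fitted exponent alone does not establish Zipfian behavior. As the abstract notes, the power-law fit has to be tested against alternative distributions before drawing that conclusion.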
- Subjects :
- Bayes Theorem
Humans
Language
Models, Theoretical
Patient Discharge
Details
- Language :
- English
- ISSN :
- 1872-8243
- Volume :
- 145
- Database :
- MEDLINE
- Journal :
- International journal of medical informatics
- Publication Type :
- Academic Journal
- Accession number :
- 33181446
- Full Text :
- https://doi.org/10.1016/j.ijmedinf.2020.104324