Empirical analysis of Zipf's law, power law, and lognormal distributions in medical discharge reports.

Authors :
Quiroz, Juan C
Laranjo, Liliana
Tufanaru, Catalin
Kocaballi, Ahmet Baki
Rezazadegan, Dana
Berkovsky, Shlomo
Coiera, Enrico
Source :
International Journal of Medical Informatics. Jan 2021, Vol. 145, pN.PAG-N.PAG. 1p.
Publication Year :
2021

Abstract

Background: Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions.
Objective: This paper empirically analyses whether text in medical discharge reports follows Zipf's law, a commonly assumed statistical property of language in which word frequency follows a discrete power-law distribution.
Method: We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power-law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power-law) provided superior fits to the data.
Result: Discharge reports are best fit by the truncated power-law and lognormal distributions. Discharge reports appear to be near-Zipfian, in that the truncated power-law provides a better fit than a pure power-law.
Conclusion: Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power-law and lognormal probability priors and non-parametric models that capture power-law behavior. [ABSTRACT FROM AUTHOR]
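
The first steps named in the Method (tokenizing, counting token frequency, and fitting a power law) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and not the authors' code: the function names, the regex tokenizer, and the toy reports are hypothetical, and the exponent is estimated with the standard approximate maximum-likelihood formula for a discrete power law, alpha ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)). The abstract does not say which tools or estimator the authors used, and this sketch does not cover the comparison against lognormal, exponential, stretched exponential, or truncated power-law fits.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def token_frequencies(documents):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def estimate_alpha(frequencies, xmin):
    """Approximate MLE of a discrete power-law exponent for values >= xmin.

    Uses alpha ~= 1 + n / sum(ln(x / (xmin - 0.5))). The approximation is
    known to be rough for very small xmin, so treat results as indicative only.
    """
    tail = [x for x in frequencies if x >= xmin]
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


# Toy stand-ins for discharge reports (hypothetical text, not MIMIC-III data).
reports = [
    "The patient was admitted with chest pain.",
    "The patient was discharged in stable condition.",
]
counts = token_frequencies(reports)
print(counts.most_common(3))
print(estimate_alpha(list(counts.values()), xmin=1))
```

On a real corpus, the token counts would be passed to a dedicated fitting routine that also selects x_min and runs likelihood-ratio tests between candidate distributions; that step is deliberately left out here.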

Details

Language :
English
ISSN :
13865056
Volume :
145
Database :
Academic Search Index
Journal :
International Journal of Medical Informatics
Publication Type :
Academic Journal
Accession number :
147508747
Full Text :
https://doi.org/10.1016/j.ijmedinf.2020.104324