Varga, Tibor V., Niss, Kristoffer, Estampador, Angela C., Collin, Catherine B., and Moseley, Pope L.
Aims: Appropriate analysis of big data is fundamental to precision medicine. While statistical analyses often uncover numerous associations, associations themselves do not convey predictive value. Confusion between association and prediction harms clinicians, scientists, and ultimately, patients. We analyzed published papers in the field of diabetes that refer to “prediction” in their titles and assessed whether these articles report metrics relevant to prediction.

Methods: A systematic search was undertaken using NCBI PubMed. Articles containing the terms “diabetes” and “prediction” were selected. All abstracts of original research articles within the field of diabetes epidemiology were searched for metrics pertaining to predictive statistics. Simulated data were generated to visually convey the differences between association and prediction.

Results: The search yielded 2,182 results. After discarding irrelevant articles, 1,910 abstracts were evaluated. Of these, 39% (n = 745) reported metrics of predictive statistics, while 61% (n = 1,165) did not. The most frequently reported metrics of prediction were ROC AUC, sensitivity, and specificity. Using the simulated data, we demonstrated that biomarkers with large effect sizes and low P values can still offer poor discriminative utility.

Conclusions: We demonstrate a landscape of confused reporting within the field of diabetes epidemiology, where the term “prediction” is often incorrectly used to refer to association statistics. We propose guidelines for future reporting, and two major routes forward in terms of main analytic procedures and research goals: the explanatory route, which contributes to precision medicine, and the prediction route, which contributes to personalized medicine.
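The distinction between association and prediction can be illustrated with a small simulation. The sketch below is not the authors' simulation; it is a minimal example under stated assumptions: a single hypothetical biomarker that is normally distributed with unit variance in both groups, a mean shift of 0.5 standard deviations in cases, and 2,000 participants per group. The function names (`simulate`, `two_sided_p`, `roc_auc`) and all parameter values are chosen for illustration only. The P value uses a normal approximation to the two-sample test, and the ROC AUC is computed from the Mann-Whitney rank sum.

```python
import math
import random
import statistics


def simulate(n_per_group=2000, mean_shift=0.5, seed=0):
    """Draw a hypothetical biomarker for controls and cases (assumed values)."""
    rng = random.Random(seed)
    controls = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    cases = [rng.gauss(mean_shift, 1.0) for _ in range(n_per_group)]
    return controls, cases


def two_sided_p(controls, cases):
    """Two-sided P value for the mean difference (normal approximation)."""
    se = math.sqrt(
        statistics.variance(controls) / len(controls)
        + statistics.variance(cases) / len(cases)
    )
    z = (statistics.fmean(cases) - statistics.fmean(controls)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def roc_auc(controls, cases):
    """ROC AUC via the Mann-Whitney U statistic (assumes no tied values)."""
    ranked = sorted([(x, 0) for x in controls] + [(x, 1) for x in cases])
    rank_sum_cases = sum(
        rank for rank, (_, label) in enumerate(ranked, start=1) if label == 1
    )
    n1, n0 = len(cases), len(controls)
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


controls, cases = simulate()
p_value = two_sided_p(controls, cases)
auc = roc_auc(controls, cases)
print(f"P value: {p_value:.3g}")
print(f"ROC AUC: {auc:.3f}")
```

Under these assumptions the group difference is overwhelmingly significant, yet the two distributions overlap heavily, so the AUC stays well below the level usually expected of a clinically useful classifier. Increasing `n_per_group` drives the P value lower still without improving discrimination, because the AUC depends on the separation between groups and not on the sample size.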