Yang Jin, Lichuan Zhang, Jinming Yu, Linian Huang, Tao Xu, Lin Tong, Charles A. Powell, Yu Shang, Fei Tan, Yuanlin Song, Jie Liu, Jiwei Wang, Maosong Ye, Jianping Zhao, Yong Zhang, Hongqing Zhao, Dawei Yang, Chaomin Wu, Li Bai, Lin Zhao, Yaoli Wang, Kui Xiao, Chunling Dong, Chunxue Bai, Deng Chen, Jing Li, Hai Yu, Chunhua Du, Zhenju Song, Hong Chen, Yu Xu, Jian Zhou, Xun Wang, and Ziqiang Zhang
Background: Coronavirus disease 2019 (COVID-19) has become a global pandemic. It is an acute infectious disease characterized by high contagiousness and the possibility of asymptomatic carriers. It can cause acute respiratory distress syndrome, and mortality is high when pneumonia is involved. Currently, it is difficult to quickly identify asymptomatic cases or COVID-19 patients with pneumonia because access to reverse transcription-polymerase chain reaction (RT-PCR) nucleic acid tests and CT scans is limited. This facilitates the spread of the disease at the community level and contributes to the overwhelming of medical resources in intensive care units.
Goal: This study aimed to develop a scientific and rigorous clinical diagnostic tool for the rapid prediction of COVID-19 cases based on a COVID-19 clinical case database in China, and to help frontline doctors worldwide efficiently and precisely diagnose asymptomatic COVID-19 patients and cases with a false-negative RT-PCR test result.
Methods: With online consent and the approval of the ethics committee of Zhongshan Hospital Fudan University (approval number B2020-032R) to ensure that patient privacy was protected, clinical information was uploaded in real time through the New Coronavirus Intelligent Auto-diagnostic Assistant Application of cloud plus terminal (nCapp) by doctors from different cities (Wuhan, Shanghai, Harbin, Dalian, Wuxi, Qingdao, Rizhao, and Bengbu) during the COVID-19 outbreak in China. After quality control and data anonymization on the platform, a total of 3,249 cases from COVID-19 high-risk groups were collected. These patients had SARS-CoV-2 RT-PCR test results and chest CT scans, both of which were used as the gold standard for the diagnosis of COVID-19 and COVID-19 pneumonia.
In particular, the dataset included 137 indeterminate cases who initially did not have RT-PCR tests and subsequently had positive RT-PCR results, 62 suspected cases who initially had false-negative RT-PCR test results and subsequently had positive RT-PCR results, and 122 asymptomatic cases who had positive RT-PCR test results, amongst whom 31 cases were diagnosed. We also integrated a survey function into nCapp to collect user feedback from frontline doctors.
Findings: We applied a multi-factor regression model to the training dataset (1,624 cases) and developed a prediction model for COVID-19 with 9 clinical indicators that are fast to obtain and readily accessible: ‘Residing or visiting history in epidemic regions’, ‘Exposure history to COVID-19 patient’, ‘Dry cough’, ‘Fatigue’, ‘Breathlessness’, ‘No body temperature decrease after antibiotic treatment’, ‘Fingertip blood oxygen saturation ≤93%’, ‘Lymphopenia’, and ‘C-reactive protein (CRP) increased’. The area under the receiver operating characteristic (ROC) curve (AUC) for the model was 0.88 (95% CI: 0.86, 0.89) in the training dataset and 0.84 (95% CI: 0.82, 0.86) in the validation dataset (1,625 cases). To prioritize the sensitivity of the model, we used a cutoff value of 0.09. At this cutoff, the sensitivity and specificity of the model were 98.0% (95% CI: 96.9%, 99.1%) and 17.3% (95% CI: 15.0%, 19.6%), respectively, in the training dataset, and 96.5% (95% CI: 95.1%, 98.0%) and 18.8% (95% CI: 16.4%, 21.2%), respectively, in the validation dataset. In the subset of the 137 indeterminate cases who initially did not have RT-PCR tests and subsequently had positive RT-PCR results, the model predicted 132 cases, accounting for 96.4% (95% CI: 91.7%, 98.8%) of the cases. In the subset of the 62 suspected cases who initially had false-negative RT-PCR test results and subsequently had positive RT-PCR results, the model predicted 59 cases, accounting for 95.2% (95% CI: 86.5%, 99.0%) of the cases.
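The subset proportions above (132 of 137 and 59 of 62) come with asymmetric confidence intervals. The abstract does not state which interval method was used; as a minimal sketch, assuming an exact (Clopper-Pearson) binomial interval, the bounds can be recomputed from the counts alone. The function names below are illustrative and are not part of nCapp.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a decreasing function f on [lo, hi], with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes out of n.

    Assumption: this is one common choice of interval, not necessarily
    the method the authors used.
    """
    # Lower bound: the p at which P(X >= k | p) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
    )
    # Upper bound: the p at which P(X <= k | p) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2
    )
    return lower, upper


# Counts taken from the Findings: (cases predicted by the model, subset size).
for k, n in [(132, 137), (59, 62)]:
    lo, hi = clopper_pearson(k, n)
    print(f"{k}/{n} = {k / n:.1%}, 95% CI: ({lo:.1%}, {hi:.1%})")
```

Comparing the printed bounds with the intervals quoted above is a quick way to check whether an exact interval is consistent with the reported figures.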
To take the specificity of the model into account, we also used a cutoff value of 0.32. At this cutoff, the sensitivity and specificity of the model were 83.5% (95% CI: 80.5%, 86.4%) and 83.2% (95% CI: 80.9%, 85.5%), respectively, in the training dataset, and 79.6% (95% CI: 76.4%, 82.8%) and 81.3% (95% CI: 78.9%, 83.7%), respectively, in the validation dataset, which is very close to the performance of the published AI model. The results of the online survey ‘Questionnaire Star’ showed that 90.9% of nCapp users in WeChat mini programs were ‘satisfied’ or ‘very satisfied’ with the tool. The WeChat mini program received a significantly higher satisfaction rate than other platforms, especially for ‘availability and sharing convenience of the App’ and ‘fast speed of log-in and data entry’.
Discussion: With the assistance of nCapp, a mobile-based diagnostic tool developed from a large database that we collected from COVID-19 high-risk groups in China, frontline doctors can rapidly identify asymptomatic patients and avoid misdiagnoses of cases with false-negative RT-PCR results. These patients require timely isolation or close medical supervision. By applying the model, medical resources can be allocated more reasonably, and missed diagnoses can be reduced. In addition, further education and interaction among medical professionals can improve the diagnostic efficiency for COVID-19, thus avoiding the transmission of the disease from asymptomatic patients at the community level.
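The trade-off between the two cutoffs reported in the Findings (0.09 and 0.32) can be illustrated with a short sketch. It assumes that a case is called positive when its predicted probability is at or above the cutoff; the abstract does not specify the exact decision rule. The probabilities and labels below are synthetic toy values, not study data.

```python
def sensitivity_specificity(probs, labels, cutoff):
    """Sensitivity and specificity when 'prob >= cutoff' is called positive.

    Assumption: the >= rule is illustrative; the source does not state it.
    """
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= cutoff
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic example: 5 confirmed cases (label 1) and 5 non-cases (label 0).
probs = [0.12, 0.25, 0.40, 0.70, 0.90, 0.05, 0.10, 0.20, 0.31, 0.50]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for cutoff in (0.09, 0.32):
    sens, spec = sensitivity_specificity(probs, labels, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data the lower cutoff catches every case at the cost of many false positives, while the higher cutoff misses some cases but rejects most non-cases, which mirrors the direction of the trade-off reported for the model.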