Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, Ashrafinia S, Bakas S, Beukinga RJ, Boellaard R, Bogowicz M, Boldrini L, Buvat I, Cook GJR, Davatzikos C, Depeursinge A, Desseroit MC, Dinapoli N, Dinh CV, Echegaray S, El Naqa I, Fedorov AY, Gatta R, Gillies RJ, Goh V, Götz M, Guckenberger M, Ha SM, Hatt M, Isensee F, Lambin P, Leger S, Leijenaar RTH, Lenkowicz J, Lippert F, Losnegård A, Maier-Hein KH, Morin O, Müller H, Napel S, Nioche C, Orlhac F, Pati S, Pfaehler EAG, Rahmim A, Rao AUK, Scherer J, Siddique MM, Sijtsema NM, Socarras Fernandez J, Spezi E, Steenbakkers RJHM, Tanadini-Lang S, Thorwarth D, Troost EGC, Upadhaya T, Valentini V, van Dijk LV, van Griethuysen J, van Velden FHP, Whybra P, Richter C, and Löck S
Background Radiomic features may quantify characteristics present in medical imaging. However, the lack of standardized definitions and validated reference values have hampered clinical use. Purpose To standardize a set of 174 radiomic features. Materials and Methods Radiomic features were assessed in three phases. In phase I, 487 features were derived from the basic set of 174 features. Twenty-five research teams with unique radiomics software implementations computed feature values directly from a digital phantom, without any additional image processing. In phase II, 15 teams computed values for 1347 derived features using a CT image of a patient with lung cancer and predefined image processing configurations. In both phases, consensus among the teams on the validity of tentative reference values was measured through the frequency of the modal value and classified as follows: less than three matches, weak; three to five matches, moderate; six to nine matches, strong; 10 or more matches, very strong. In the final phase (phase III), a public data set of multimodality images (CT, fluorine 18 fluorodeoxyglucose PET, and T1-weighted MRI) from 51 patients with soft-tissue sarcoma was used to prospectively assess reproducibility of standardized features. Results Consensus on reference values was initially weak for 232 of 302 features (76.8%) at phase I and 703 of 1075 features (65.4%) at phase II. At the final iteration, weak consensus remained for only two of 487 features (0.4%) at phase I and 19 of 1347 features (1.4%) at phase II. Strong or better consensus was achieved for 463 of 487 features (95.1%) at phase I and 1220 of 1347 features (90.6%) at phase II. Overall, 169 of 174 features were standardized in the first two phases. In the final validation phase (phase III), most of the 169 standardized features could be excellently reproduced (166 with CT; 164 with PET; and 164 with MRI). Conclusion A set of 169 radiomics features was standardized, which enabled verification and calibration of different radiomics software. © RSNA, 2020 Online supplemental material is available for this article. See also the editorial by Kuhl and Truhn in this issue.