Meyer C, Huger S, Bruand M, Leroy T, Palisson J, Rétif P, Sarrade T, Barateau A, Renard S, Jolnerovski M, Demogeot N, Marcel J, Martz N, Stefani A, Sellami S, Jacques J, Agnoux E, Gehin W, Trampetti I, Margulies A, Golfier C, Khattabi Y, Olivier C, Alizée R, Py JF, and Faivre JC
Introduction: The delineation of organs-at-risk (OARs) and lymph node areas is a crucial step in radiotherapy, but it is time-consuming and subject to substantial user-dependent variability in contouring. Artificial intelligence (AI) appears to be a solution to facilitate and standardize this work. The objective of this study is to compare eight available AI software programs in terms of technical aspects and accuracy in contouring OARs and lymph node areas against current international contouring recommendations., Material and Methods: From January to July 2023, we performed a blinded study in which 20 radiation oncologists scored the contours of OARs and lymph node areas produced by eight auto-contouring AI programs. It was a single-center study conducted in the radiation department at the Lorraine Cancer Institute. A qualitative analysis of the technical characteristics of the different AI programs was also performed. Six anonymized whole-body CT scans were obtained from three adults (two women and one man) and three children (one girl and two boys), along with two additional adult brain MRI scans. Using a scoring scale from 1 to 3 (best score), radiation oncologists blindly assessed the quality of the contours of OARs and lymph node areas produced by the eight AI programs on all CT scans and MRI data. We defined a high-performing AI software as one with an average score equal to or greater than 2, meaning an AI requiring minimal to moderate corrections but usable in clinical routine., Results: For adult CT scans, two AI programs had an overall average quality score (that is, across all areas tested for OARs and lymph nodes) higher than 2.0: Limbus (overall average score = 2.03 (0.16)) and MVision (overall average score = 2.13 (0.19)). Considering only OARs for adults, only Limbus, TheraPanacea, MVision and Radformation had an average score above 2.
For children's CT scans, MVision was the only program with an overall average score higher than 2 (overall average score = 2.07 (0.19)). Considering only OARs for children, only Limbus and MVision had an average score above 2. For brain MRIs, TheraPanacea was the only program with an average score over 2, for both brain delineation (2.75 (0.35)) and OARs (2.09 (0.19)). The comparative analysis of the technical aspects highlights the similarities and differences between the software programs. There was no difference between senior radiation oncologists and residents for OAR contouring., Conclusion: For adult CT scans, two AI programs on the market, MVision and Limbus, delineate most OARs and lymph node areas that are useful in clinical routine. For children's CT scans, only one AI program, MVision, is efficient. For adult brain MRI, only one AI program, TheraPanacea, is efficient., Trial Registration: CNIL-MR0004 Number HDH434., Competing Interests: Declarations. Ethics approval and consent to participate: This study was conducted in accordance with the ethical standards of the Declaration of Helsinki (as revised in 2013). It was approved by the ethics committee, the French National Commission of Informatics and Liberty (CNIL) (CNIL-MR0004 Number HDH434). The present study has been approved by the French Health Data Institute (Health Data Hub) under the number HDH301. All methods were carried out in accordance with relevant guidelines and regulations. All participants signed informed consent to the use of their data for research purposes. Consent for publication: Not applicable. Competing interests: The authors have declared no conflicts of interest. The Lorraine Cancer Institute used MVision AI software until January 1, 2023., (© 2024. The Author(s).)