1. Active learning for extracting surgomic features in robot-assisted minimally invasive esophagectomy: a prospective annotation study
- Author
Brandenburg, Johanna M., Jenke, Alexander C., Stern, Antonia, Daum, Marie T. J., Schulze, André, Younis, Rayan, Petrynowski, Philipp, Davitashvili, Tornike, Vanat, Vincent, Bhasker, Nithya, Schneider, Sophia, Mündermann, Lars, Reinke, Annika, Kolbinger, Fiona R., Jörns, Vanessa, Fritz-Kebede, Fleur, Dugas, Martin, Maier-Hein, Lena, Klotz, Rosa, Distler, Marius, Weitz, Jürgen, Müller-Stich, Beat P., Speidel, Stefanie, Bodenstedt, Sebastian, and Wagner, Martin
- Abstract
Background: With Surgomics, we aim for personalized prediction of the patient's surgical outcome using machine learning (ML) on multimodal intraoperative data to extract surgomic features as surgical process characteristics. As high-quality annotations by medical experts are crucial but still a bottleneck, we prospectively investigate active learning (AL) to reduce annotation effort and present automatic recognition of surgomic features.
Methods: To establish a process for the development of surgomic features, ten video-based features related to bleeding, a highly relevant intraoperative complication, were chosen. They comprise the amount of blood and smoke in the surgical field, six instruments, and two anatomic structures. Selected frames from robot-assisted minimally invasive esophagectomies were annotated by at least three independent medical experts. To test whether AL reduces annotation effort, we performed a prospective annotation study comparing AL with equidistant sampling (EQS) for frame selection. Multiple Bayesian ResNet18 architectures were trained on a multicentric dataset consisting of 22 videos from two centers.
Results: In total, 14,004 frames were tag-annotated. A mean F1-score of 0.75 ± 0.16 was achieved across all features. The highest F1-score was achieved for the instruments (mean 0.80 ± 0.17). This result is also reflected in the inter-rater agreement (1-rater-kappa > 0.82). Compared to EQS, AL showed better recognition results for the instruments, with a significant difference in the McNemar test comparing the correctness of predictions. Moreover, in contrast to EQS, AL selected more frames of the four less common instruments (1512 vs. 607 frames) and achieved higher F1-scores for common instruments while requiring fewer training frames.
Conclusion: We presented ten surgomic features relevant for bleeding events in esophageal surgery that were automatically extracted from surgical video using ML. AL showed the potential to reduce annotation effort while keeping ML performance high for selected features. The source code and the trained models have been published as open source.
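The abstract contrasts two frame-selection strategies and checks the difference with McNemar's test. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' published code. Equidistant sampling simply picks evenly spaced frames. For AL, the abstract does not name the acquisition function, so the sketch assumes uncertainty sampling: each frame is scored by the summed binary entropy of the mean tag probabilities from several stochastic forward passes of a Bayesian model (for example Monte Carlo dropout), and the most uncertain unlabeled frames go to the annotators. The McNemar part uses the chi-square form with continuity correction, which is one common variant. All function names (`equidistant_sampling`, `uncertainty_sampling`, `mcnemar`) are hypothetical.

```python
import math
from typing import AbstractSet, List, Sequence, Tuple


def equidistant_sampling(num_frames: int, budget: int) -> List[int]:
    """EQS: pick `budget` evenly spaced frame indices from one video."""
    budget = min(budget, num_frames)
    if budget <= 0:
        return []
    step = num_frames / budget  # step >= 1, so the indices are distinct
    return [int(i * step) for i in range(budget)]


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of one present/absent tag with probability p."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def uncertainty_sampling(
    mc_probs: Sequence[Sequence[Sequence[float]]],
    budget: int,
    labeled: AbstractSet[int] = frozenset(),
) -> List[int]:
    """Hypothetical AL acquisition step (an assumption, not from the paper).

    mc_probs[frame][pass][feature] is the sigmoid output for one tag
    (e.g. an instrument) from one stochastic forward pass of a Bayesian
    model. Frames are ranked by the summed entropy of the mean tag
    probabilities; the `budget` most uncertain unlabeled frames are returned.
    """
    scored: List[Tuple[float, int]] = []
    for idx, passes in enumerate(mc_probs):
        if idx in labeled:
            continue
        n_passes, n_features = len(passes), len(passes[0])
        mean = [sum(p[f] for p in passes) / n_passes for f in range(n_features)]
        scored.append((sum(binary_entropy(m) for m in mean), idx))
    scored.sort(key=lambda t: (-t[0], t[1]))  # most uncertain first, ties by index
    return [idx for _, idx in scored[:budget]]


def mcnemar(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> Tuple[float, float]:
    """McNemar's test on paired per-frame correctness of two models.

    Chi-square statistic with continuity correction (one common variant;
    which variant the study used is an assumption here).
    Returns (statistic, two-sided p-value).
    """
    b = sum(a and not o for a, o in zip(correct_a, correct_b))  # only model A correct
    c = sum(o and not a for a, o in zip(correct_a, correct_b))  # only model B correct
    if b + c == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, `equidistant_sampling(100, 4)` returns `[0, 25, 50, 75]` regardless of content, whereas `uncertainty_sampling` depends on the current model's outputs, so its picks can concentrate on frames the model still finds hard. Only frames where the two models disagree in correctness enter the McNemar statistic.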
- Published
- 2023