
PSOR03 (Presentation Time: 11:40 AM): Reinforcement Learning Algorithm for Catheter Optimization in HDR Prostate Brachytherapy.

Authors:
Chatigny, Philippe
Bélanger, Cédric
Poulin, Éric
Beaulieu, Luc
Source:
Brachytherapy. Nov 2024 Supplement, Vol. 23, Issue 6, p. S80-S81. 2p.
Publication Year:
2024

Abstract

In high-dose-rate (HDR) brachytherapy, catheter positions are crucial to treatment delivery. For prostate cancer, the most common approach currently uses a template to guide the catheters. New approaches that are not bound to such templates and offer a wider variety of insertion patterns are being developed, for example robotic insertion or personalized 3D-printed templates. Many algorithms have been developed to tackle the catheter insertion problem, such as GOMEA or centroidal Voronoi tessellation (CVT). The principal trade-off between algorithms is speed versus quality. We use a deep learning approach to be more efficient, aiming to be both fast and to produce good catheter configurations. The approach chosen is reinforcement learning (RL), which has famously surpassed the best traditional algorithms in games such as chess and Go. An advantage of RL is that it does not need a dataset for training, which in turn can yield original solutions to known problems.

The RL algorithm is implemented with Stable Baselines 3 (PyTorch), using the off-policy Soft Actor-Critic (SAC) model. The task is to insert a specified number of catheters into the prostate anatomy. The input is the projected 2D anatomy of the prostate and urethra (as would be used for a CVT algorithm), and the output is the catheter positions. The reward is negative when the model places catheters inside (or too close to) the urethra, or fully outside the prostate; when all catheters are inserted correctly, the reward is positive. To evaluate the reward, the catheter insertion produced by our in-house CVT is first used as a baseline for comparison. Second, our in-house multicriteria optimization (gMCO) is run on the RL-predicted catheters, and the value of each objective for all plans at the end of the optimization is compared to that obtained with CVT. The objectives cover prescription dose to the target and protection of organs at risk (OARs); the lower the objective values relative to CVT, the higher the reward. The training set and the test set each consist of 30 different prostate anatomies. The model is trained over 500,000 steps, and whenever the insertions are valid, gMCO is used to generate 500 plans. For now, a single model is trained to insert 16 catheters.

The model takes 3 seconds to load, generate the catheters, and send the configuration to gMCO, compared with close to 10 seconds for CVT on the same task. After training, the model achieved a similar or better mean objective than CVT in 15 of the 30 training cases (50%). The model struggled with the test set: in 3 cases it still found lower objectives for at least one plan out of 500, but not enough for the mean of any single case to fall below CVT's. For a case where the model performs better than CVT (Table 1), CVT does not achieve a urethra V125% [%] of 0 while simultaneously reaching a target V100% [%] above 95 in any of its 500 generated plans; likewise, its urethra D10% [Gy] is higher. In general, CVT achieves a higher target V100% [%] than our model, but in those cases the OAR doses are also higher. Although the results are preliminary, the catheter patterns produced by the model are as good as or better than CVT's for a few cases.
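For readers who want to experiment with a comparable setup, the following is a minimal, self-contained sketch, not the authors' code: the grid resolution, reward constants, and placeholder anatomy are all illustrative assumptions. Only the overall pattern (a custom environment with the penalty structure described above, trained with Stable Baselines 3's SAC for 500,000 steps) follows the abstract.

# Hypothetical sketch of the setup described in the abstract: a custom
# Gymnasium environment in which one action places all catheters on a
# projected 2D anatomy, trained with Stable Baselines 3's SAC.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC

N_CATHETERS = 16   # the abstract's single model inserts 16 catheters
GRID = 64          # assumed resolution of the projected 2D anatomy

class CatheterEnv(gym.Env):
    """One episode = one anatomy; the action proposes all (x, y) positions."""

    def __init__(self):
        super().__init__()
        # Observation: two binary masks (prostate, urethra) projected to 2D.
        self.observation_space = spaces.Box(0.0, 1.0, (2, GRID, GRID), np.float32)
        # Action: normalized (x, y) coordinates for every catheter at once.
        self.action_space = spaces.Box(-1.0, 1.0, (2 * N_CATHETERS,), np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.obs = self._toy_anatomy()
        return self.obs, {}

    def step(self, action):
        # Map actions from [-1, 1] to integer grid coordinates.
        xy = ((action.reshape(N_CATHETERS, 2) + 1.0) / 2.0 * (GRID - 1)).astype(int)
        prostate, urethra = self.obs
        penalty = 0.0
        for x, y in xy:
            if urethra[y, x] > 0:       # inside the urethra (the "too close"
                penalty -= 1.0          # margin is omitted for brevity)
            elif prostate[y, x] == 0:   # fully outside the prostate
                penalty -= 1.0
        # All catheters valid -> positive reward. In the abstract this is
        # where the gMCO objectives, compared against the CVT baseline,
        # would refine the reward.
        reward = penalty if penalty < 0.0 else 1.0
        return self.obs, reward, True, False, {}

    def _toy_anatomy(self):
        # Placeholder geometry: a disc-shaped prostate with a central urethra.
        yy, xx = np.mgrid[0:GRID, 0:GRID]
        r2 = (xx - GRID / 2) ** 2 + (yy - GRID / 2) ** 2
        prostate = (r2 < (GRID / 3) ** 2).astype(np.float32)
        urethra = (r2 < (GRID / 16) ** 2).astype(np.float32)
        return np.stack([prostate, urethra])

model = SAC("MlpPolicy", CatheterEnv(), verbose=0)
model.learn(total_timesteps=500_000)   # the abstract trains for 500,000 steps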
To achieve better results on the test set, we aim to use a greater number of patients, which should help the model generalize. The next phase will involve 3D images to take the bladder and rectum anatomy into account. Another approach would be to modify the reward signal to use DVH indices instead of gMCO's objective function. [ABSTRACT FROM AUTHOR]
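The DVH indices quoted in the results (target V100%, urethra V125% and D10%) and proposed above as an alternative reward signal have simple definitions; a reward based on them could be computed as in the sketch below. The per-voxel dose arrays and the 15 Gy prescription are toy values assumed only for illustration; none of them come from the abstract.

# Hedged sketch of the DVH indices quoted in the abstract, computed
# from per-voxel dose arrays.
import numpy as np

def v_percent(dose, prescription, level):
    """Vx% [%]: fraction of the structure receiving >= x% of the prescription."""
    return float(np.mean(dose >= level / 100.0 * prescription) * 100.0)

def d_percent(dose, x):
    """Dx% [Gy]: minimum dose received by the hottest x% of the structure."""
    return float(np.percentile(dose, 100.0 - x))

rng = np.random.default_rng(0)
target_dose = rng.normal(16.0, 2.0, 10_000)    # toy per-voxel doses [Gy]
urethra_dose = rng.normal(13.0, 1.5, 2_000)
rx = 15.0                                      # assumed prescription [Gy]

print(v_percent(target_dose, rx, 100))   # target V100% [%]
print(v_percent(urethra_dose, rx, 125))  # urethra V125% [%]
print(d_percent(urethra_dose, 10))       # urethra D10% [Gy]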

Details

Language:
English
ISSN:
1538-4721
Volume:
23
Issue:
6
Database:
Academic Search Index
Journal:
Brachytherapy
Publication Type:
Academic Journal
Accession Number:
180495233
Full Text:
https://doi.org/10.1016/j.brachy.2024.08.113