
Basal knowledge in the field of pediatric nephrology and its enhancement following specific training of ChatGPT-4 “omni” and Gemini 1.5 Flash.

Authors:
Mondillo, Gianluca
Frattolillo, Vittoria
Colosimo, Simone
Perrotta, Alessandra
Di Sessa, Anna
Guarino, Stefano
Miraglia del Giudice, Emanuele
Marzuillo, Pierluigi
Source:
Pediatric Nephrology, August 2024, pp. 1–7.
Publication Year:
2024

Abstract

Background: We aimed to evaluate the baseline performance, and its improvement after specific training, of ChatGPT-4 “omni” (ChatGPT-4o) and Gemini 1.5 Flash (Gemini 1.5) in answering multiple-choice questions on pediatric nephrology.

Methods: Using questions from the “Educational Review” articles published in Pediatric Nephrology between January 2014 and April 2024, the models were tested before and after specific training with Portable Document Format (PDF) and plain-text (TXT) versions of the articles, from which the last page, containing the correct answers, had been removed with a Python script. The number of correct answers was recorded.

Results: Before training, ChatGPT-4o correctly answered 75.2% of the 1395 questions, outperforming Gemini 1.5, which answered 64.9% correctly (p < 0.001). After training with PDF files, ChatGPT-4o’s accuracy increased to 77.8%, while Gemini 1.5 improved significantly to 84.7% (p < 0.001). Training with TXT files showed similar results: ChatGPT-4o maintained 77.8% accuracy, while Gemini 1.5 improved further to 87.6% (p < 0.001).

Conclusions: While ChatGPT-4o has strong baseline performance, specific training does not significantly enhance its accuracy. Conversely, Gemini 1.5, despite its lower initial performance, shows substantial improvement with training, particularly with TXT files. These findings suggest that Gemini 1.5 is better at storing and retrieving supplied information, making it potentially more effective in clinical applications, albeit dependent on additional data for optimal performance.

Graphical Abstract: A higher-resolution version of the Graphical abstract is available as Supplementary information. [ABSTRACT FROM AUTHOR]
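
The Methods state that the answer key occupied the last page of each Educational Review file and was removed with a Python script, but the script itself is not part of this record. A minimal sketch of that step, assuming the third-party pypdf library and hypothetical educational_reviews/ and training_set/ directories, could look like this:

    from pathlib import Path

    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf


    def strip_answer_page(src: Path, dst: Path) -> None:
        # Copy every page except the last one, which holds the answer key.
        reader = PdfReader(src)
        writer = PdfWriter()
        for page in reader.pages[:-1]:
            writer.add_page(page)
        with dst.open("wb") as fh:
            writer.write(fh)


    if __name__ == "__main__":
        out_dir = Path("training_set")  # hypothetical output directory
        out_dir.mkdir(exist_ok=True)
        for pdf in Path("educational_reviews").glob("*.pdf"):  # hypothetical input directory
            strip_answer_page(pdf, out_dir / pdf.name)

Each stripped file could then be supplied to the models as training material, as described in the Methods.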

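The abstract reports p < 0.001 for each comparison but does not name the statistical test used. As an illustration only, a chi-square test on the 2×2 correct/incorrect table reconstructed from the reported baseline percentages (an assumption, since per-question results are not given here) yields a p-value far below 0.001:

    from scipy.stats import chi2_contingency

    n = 1395                           # questions, per the abstract
    gpt_correct = round(0.752 * n)     # ChatGPT-4o baseline: 75.2% -> 1049
    gemini_correct = round(0.649 * n)  # Gemini 1.5 baseline: 64.9% -> 905

    # 2x2 contingency table: rows = model, columns = (correct, incorrect)
    table = [
        [gpt_correct, n - gpt_correct],
        [gemini_correct, n - gemini_correct],
    ]
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.1e}")  # p is far below 0.001
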
Details

Language:
English
ISSN:
0931-041X
Database:
Academic Search Index
Journal:
Pediatric Nephrology
Publication Type:
Academic Journal
Accession Number:
179017467
Full Text:
https://doi.org/10.1007/s00467-024-06486-3