Evaluating the accuracy of ChatGPT addressing urological questions: A pilot study

Authors

Sagir S.

DOI:

https://doi.org/10.5281/zenodo.6686078

Keywords:

Artificial intelligence, ChatGPT, Urology

Abstract

Objective: This research aimed to assess the accuracy of the ChatGPT 3.5 model in providing information related to various urological diseases.

Materials and methods: One hundred twelve questions regarding urological diseases were presented to ChatGPT in December 2022. Responses were recorded and subsequently cross-referenced with the European Association of Urology (EAU) guidelines to determine their correctness. Diseases were categorized into subgroups: Urolithiasis, Bladder cancer, Urethroplasty, Renal cancer, and Andrology. Accuracy percentages were calculated for each disease subgroup and for the total dataset.

Results: For Urolithiasis, out of 25 responses, 10 (40%) were true and 15 (60%) were false. Bladder cancer had an even distribution, with 50% of the responses (10 out of 20) being true and the remaining 50% being false. Renal cancer showed a higher proportion of true responses, with 14 out of 22 responses (approximately 63.6%) being true and 8 (approximately 36.4%) being false. In the case of Urethroplasty, out of 25 responses, 13 (52%) were true while 12 (48%) were false.
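
The subgroup accuracies above are simple proportions of true responses among the responses evaluated. As an illustration only (not part of the original study), the short Python sketch below reproduces the reported percentages from the counts stated in the Results; the Andrology subgroup is omitted because its counts are not reported here.

```python
# Illustrative sketch: recompute the reported subgroup accuracies from the
# stated counts of true responses out of total responses (taken from Results).
counts = {
    "Urolithiasis": (10, 25),
    "Bladder cancer": (10, 20),
    "Renal cancer": (14, 22),
    "Urethroplasty": (13, 25),
}

for disease, (n_true, n_total) in counts.items():
    # Accuracy is the percentage of responses judged true for each subgroup.
    print(f"{disease}: {n_true}/{n_total} true ({100 * n_true / n_total:.1f}%)")

# Overall accuracy across the subgroups reported above (Andrology not included).
total_true = sum(t for t, _ in counts.values())
total_n = sum(n for _, n in counts.values())
print(f"Overall (reported subgroups): {total_true}/{total_n} true "
      f"({100 * total_true / total_n:.1f}%)")
```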

Conclusions: ChatGPT showcased varying degrees of accuracy across different urological disease subgroups. While it demonstrates potential utility as a supportive tool for urological questions, the observed accuracy levels highlight the need for cautious interpretation. Sole reliance on the AI model for medical decisions, absent human oversight, is not recommended at this juncture.

Published

2022-12-30

How to Cite

Sagir S. Evaluating the accuracy of ChatGPT addressing urological questions: A pilot study. J Clin Trials Exp Investig [Internet]. 2022 Dec 30 [cited 2024 Apr 27];1(3):119-23. Available from: https://jctei.com/index.php/jctei/article/view/20