Artificial intelligence in Otorhinolaryngology practice: Comparative performance of ChatGPT and Gemini AI

Authors

DOI:

https://doi.org/10.5281/zenodo.14617672

Keywords:

Artificial intelligence, ChatGPT, Gemini, otorhinolaryngology

Abstract

Objective: This study aims to evaluate the accuracy of ChatGPT and Gemini AI in the field of otorhinolaryngology.

Materials and methods: This study evaluated the performance of ChatGPT 4.0 and Gemini AI on 150 multiple-choice questions evenly distributed across the otorhinolaryngology domains of ear, nose, and throat. Both models were tested under standardized conditions, and each response was scored as correct or incorrect against an answer key.

Results: For ear-related questions, ChatGPT correctly answered 34 (68%), while Gemini AI correctly answered 33 (66%) (p=0.832). For nose-related questions, both models achieved identical results: 34 correct answers (68%) and 16 incorrect answers (32%) (p=1.000). For throat-related questions, ChatGPT provided 34 correct answers (68%) compared to Gemini AI's 38 correct answers (76%) (p=0.373). Overall, ChatGPT achieved 102 correct answers (68%) and Gemini AI achieved 105 (70%), with no statistically significant difference between the models (p=0.708). The total correct answers across all topics were 207 (69%), and incorrect answers were 91 (31%). Binary logistic regression showed no significant differences in performance between the AI models or topics, confirming their comparable accuracy in otorhinolaryngology question sets.
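The overall comparison (ChatGPT 102/150 vs. Gemini AI 105/150 correct) can be checked with a two-proportion z-test, which is equivalent to the 2×2 chi-square test without continuity correction. The abstract does not state which test produced the pairwise p-values, so this is a sketch under that assumption; it reproduces the reported p=0.708 for the overall comparison:

```python
from math import sqrt, erfc

def two_prop_z_test(x1, n1, x2, n2):
    """Two-proportion z-test (equivalent to a 2x2 chi-square
    test of independence without continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # pooled success proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))        # two-sided p from the normal tail
    return z, p_value

# Overall accuracy: ChatGPT 102/150 vs. Gemini AI 105/150 correct
z, p = two_prop_z_test(102, 150, 105, 150)
print(round(p, 3))  # 0.708, matching the reported overall p-value
```

The same function applied to the ear (34/50 vs. 33/50) and throat (34/50 vs. 38/50) subsets yields p-values consistent with those reported above.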

Conclusion: ChatGPT 4.0 and Gemini AI demonstrated comparable performance in answering otorhinolaryngology questions, with no statistically significant differences observed across ear, nose, and throat topics. Both models achieved high accuracy rates (ChatGPT: 68%, Gemini AI: 70%), suggesting their potential applicability in clinical decision-making and supporting otorhinolaryngology-related diagnostics.

References

Xie Q, Chen Q, Chen A, Peng C, Hu Y, Lin F, et al. Medical Foundation Large Language Models for Comprehensive Text Analysis and Beyond. Res Sq [Preprint]. 2024:rs.3.rs-5456223.

Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595.

Liu J, Wang C, Liu S. Utility of ChatGPT in Clinical Practice. J Med Internet Res. 2023;25:e48568.

Alkaissi H, McFarlane SI. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus. 2023;15(2):e35179.

Alhaidry HM, Fatani B, Alrayes JO, Almana AM, Alfhaed NK. ChatGPT in Dentistry: A Comprehensive Review. Cureus. 2023;15(4):e38317.

van Dis EAM, Bollen J, Zuidema W, van Rooij R, Bockting CL. ChatGPT: five priorities for research. Nature. 2023;614(7947):224-6.

Mokmin NAM, Ibrahim NA. The evaluation of chatbot as a tool for health literacy education among undergraduate students. Educ Inf Technol (Dordr). 2021;26(5):6033-49.

Kitamura FC. ChatGPT Is Shaping the Future of Medical Writing But Still Requires Human Judgment. Radiology. 2023;307(2):e230171.

Milne-Ives M, de Cock C, Lim E, Shehadeh MH, de Pennington N, Mole G, et al. The Effectiveness of Artificial Intelligence Conversational Agents in Health Care: Systematic Review. J Med Internet Res. 2020;22(10):e20346.

Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language models are few-shot learners. arXiv. 2020. doi: 10.48550/arXiv.2005.14165.

Bhattacharya K, Bhattacharya A, Bhattacharya N, Yagnik VD, Garg P, Kumar S. ChatGPT in surgical practice—a new kid on the block. Indian J Surg. 2023;22:1–4.

Grünebaum A, Chervenak J, Pollet SL, Katz A, Chervenak FA. The exciting potential for ChatGPT in obstetrics and gynecology. Am J Obstet Gynecol. 2023;228(6):696-705.

Ray PP. Broadening the horizon: a call for extensive exploration of ChatGPT's potential in obstetrics and gynecology. Am J Obstet Gynecol. 2023;229(6):706.

Ray PP. Bridging the gap: integrating ChatGPT into obstetrics and gynecology research-a call to action. Arch Gynecol Obstet. 2024;309(3):1111-3.

Guerra GA, Hofmann H, Sobhani S, Hofmann G, Gomez D, Soroudi D, et al. GPT-4 artificial intelligence model outperforms ChatGPT, medical students, and neurosurgery residents on neurosurgery written board-like questions. World Neurosurg. 2023;179:e160–e165.

Huang RS, Lu KJQ, Meaney C, Kemppainen J, Punnett A, Leung FH. Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study. JMIR Med Educ. 2023;9:e50514.

Waldock WJ, Zhang J, Guni A, Nabeel A, Darzi A, Ashrafian H. The Accuracy and Capability of Artificial Intelligence Solutions in Health Care Examinations and Certificates: Systematic Review and Meta-Analysis. J Med Internet Res. 2024;26:e56532.

Durmaz Engin C, Karatas E, Ozturk T. Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity. Children (Basel). 2024;11(6):750.

Lee Y, Shin T, Tessier L, Javidan A, Jung J, Hong D, et al. Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations. Surg Obes Relat Dis. 2024;20(7):603-8.

Carlà MM, Gambini G, Baldascino A, Boselli F, Giannuzzi F, Margollicci F, et al. Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison. Graefes Arch Clin Exp Ophthalmol. 2024;262(9):2945-59.

Azizoglu M, Aydogdu B. How does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative study. Acad J Health Sci. 2024;39(1):23-6.

Ulus SA. How does ChatGPT perform on the European Board of Orthopedics and Traumatology examination? A comparative study. Acad J Health Sci. 2023;38(6):43-6.

Demir S. Evaluation of Responses to Questions About Keratoconus Using ChatGPT-4.0, Google Gemini and Microsoft Copilot: A Comparative Study of Large Language Models on Keratoconus. Eye Contact Lens. 2024 Dec 4. doi: 10.1097/ICL.0000000000001158. Epub ahead of print.

Teixeira-Marques F, Medeiros N, Nazaré F, Alves S, Lima N, Ribeiro L, et al. Exploring the role of ChatGPT in clinical decision-making in otorhinolaryngology: a ChatGPT designed study. Eur Arch Otorhinolaryngol. 2024;281(4):2023-30.

Hoch CC, Wollenberg B, Lüers JC, Knoedler S, Knoedler L, Frank K, et al. ChatGPT's quiz skills in different otolaryngology subspecialties: an analysis of 2576 single-choice and multiple-choice board certification preparation questions. Eur Arch Otorhinolaryngol. 2023;280(9):4271-8.

Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, et al. Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis. Otolaryngol Head Neck Surg. 2024;170(6):1492-503.

Qu RW, Qureshi U, Petersen G, Lee SC. Diagnostic and Management Applications of ChatGPT in Structured Otolaryngology Clinical Scenarios. OTO Open. 2023;7(3):e67.

Published

2024-12-30

How to Cite

Celik A. Artificial intelligence in Otorhinolaryngology practice: Comparative performance of ChatGPT and Gemini AI. J Clin Trials Exp Investig [Internet]. 2024 Dec. 30 [cited 2025 Jan. 18];3(4):156-62. Available from: https://jctei.com/index.php/jctei/article/view/154