Endoscopy
DOI: 10.1055/a-2289-5732
Innovations and brief communications

Comparative evaluation of a language model and human specialists in the application of European guidelines for the management of inflammatory bowel diseases and malignancies

Itai Ghersin (1), Roni Weisshof (1), Eduard Koifman (1), Haggai Bar-Yoseph (1, 2), Dana Ben Hur (1), Itay Maza (1), Erez Hasnis (1), Roni Nasser (1), Baruch Ovadia (3), Dikla Dror Zur (4), Matti Waterman (1, 2), Yuri Gorelik (1)

1   Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel (Ringgold ID: RIN58878)
2   Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel
3   Department of Gastroenterology and Hepatology, Hillel Yaffe Medical Center, Hadera, Israel
4   Department of Gastroenterology, Galilee Medical Center, Nahariya, Israel


Abstract

Background Society guidelines on colorectal dysplasia screening, surveillance, and endoscopic management in inflammatory bowel disease (IBD) are complex, and physician adherence to them is suboptimal. We aimed to evaluate whether ChatGPT, a large language model, can generate accurate recommendations for colorectal dysplasia screening, surveillance, and endoscopic management in IBD in line with European Crohn’s and Colitis Organization (ECCO) guidelines.

Methods Thirty free-text clinical scenarios were prepared and presented to three separate sessions of ChatGPT and to eight gastroenterologists (four IBD specialists and four non-IBD gastroenterologists). Two additional IBD specialists then assessed all responses from ChatGPT and the eight gastroenterologists, judging their accuracy against ECCO guidelines.

Results ChatGPT had a mean correct response rate of 87.8%. Among the eight gastroenterologists, the mean correct response rates were 85.8% for the IBD specialists and 89.2% for the non-IBD gastroenterologists. No statistically significant differences in accuracy were observed between ChatGPT and all gastroenterologists combined (P=0.95), or when ChatGPT was compared with the IBD specialists and the non-IBD gastroenterologists as separate groups (P=0.82).

Conclusions This study highlights the potential of language models in enhancing guideline adherence regarding colorectal dysplasia in IBD. Further investigation of additional resources and prospective evaluation in real-world settings are warranted.


Publication History

Received: 15 June 2023

Accepted after revision: 18 March 2024

Accepted Manuscript online: 18 March 2024

Article published online: 18 April 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
