CC BY 4.0 · Yearb Med Inform 2024; 33(01): 229-240
DOI: 10.1055/s-0044-1800750
Section 10: Natural Language Processing
Survey

Natural Language Processing for Digital Health in the Era of Large Language Models

Abeed Sarker
1   Emory University, Atlanta, GA, USA
,
Rui Zhang
2   University of Minnesota, Minneapolis, MN, USA
,
Yanshan Wang
3   University of Pittsburgh, Pittsburgh, PA, USA
,
Yunyu Xiao
4   Cornell University, New York, NY, USA
,
Sudeshna Das
1   Emory University, Atlanta, GA, USA
,
Dalton Schutte
2   University of Minnesota, Minneapolis, MN, USA
,
David Oniani
3   University of Pittsburgh, Pittsburgh, PA, USA
,
Qianqian Xie
5   Yale University, New Haven, CT, USA
,
Hua Xu
5   Yale University, New Haven, CT, USA

Summary

Objectives: Large language models (LLMs) are revolutionizing the natural language processing (NLP) landscape within healthcare, prompting the need to synthesize the latest advancements and their diverse medical applications. We attempt to summarize the current state of research in this rapidly evolving space.

Methods: We conducted a review of the most recent studies on biomedical NLP facilitated by LLMs, sourcing literature from PubMed, the Association for Computational Linguistics Anthology, IEEE Xplore, and Google Scholar (the latter particularly for preprints). Given the ongoing exponential growth in LLM-related publications, our survey was inherently selective. We attempted to abstract key findings in terms of (i) LLMs customized for medical texts, and (ii) the type of medical text being leveraged by LLMs, namely medical literature, electronic health records (EHRs), and social media. In addition to technical details, we touch upon topics such as privacy, bias, interpretability, and equitability.
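As an aside on how a PubMed search of the kind described above might be scripted, the sketch below builds a request URL for the NCBI E-utilities `esearch` endpoint; the query terms are illustrative only and are not the actual search strategy used in this survey.

```python
from urllib.parse import urlencode

# Sketch: constructing a PubMed search request via the NCBI E-utilities
# `esearch` endpoint. The query below is a hypothetical example, not the
# survey's actual search string.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(query: str, max_results: int = 100) -> str:
    """Return an esearch URL for the given PubMed query string."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": query,          # boolean query, e.g. with [tiab] field tags
        "retmax": max_results,  # cap on the number of returned PMIDs
        "retmode": "json",      # machine-readable response format
    }
    return f"{EUTILS_BASE}?{urlencode(params)}"

url = pubmed_search_url('"large language model"[tiab] AND clinical[tiab]')
```

Fetching the resulting URL returns a JSON list of PubMed IDs, which can then be passed to the `efetch` endpoint to retrieve abstracts.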

Results: We observed that while general-purpose LLMs (e.g., GPT-4) are most popular, there is a growing trend in training or customizing open-source LLMs for specific biomedical texts and tasks. Several promising open-source LLMs are currently available, and applications involving EHRs and biomedical literature are more prominent relative to noisier data sources such as social media. For supervised classification and named entity recognition tasks, traditional (encoder-only) transformer-based models still outperform new-age LLMs, and the latter are typically suited for few-shot settings and generative tasks such as summarization. There is still a paucity of research on evaluation, bias, privacy, reproducibility, and equitability of LLMs.
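To make the few-shot setting mentioned above concrete, a generative LLM can be given a handful of labeled examples directly in its input rather than being fine-tuned. The sketch below assembles such a prompt for a hypothetical adverse-drug-event classification task on social media posts; the labels, example posts, and prompt wording are illustrative and not drawn from any surveyed study.

```python
# Minimal sketch of few-shot prompt construction for a generative LLM.
# The assembled prompt would be sent to a model (e.g., an open-source
# biomedical LLM); the examples and label set are hypothetical.

FEW_SHOT_EXAMPLES = [
    ("Started lisinopril last week and the dry cough is driving me crazy.",
     "ADE"),
    ("Picked up my metformin refill at the pharmacy today.",
     "NoADE"),
]

def build_few_shot_prompt(post: str) -> str:
    """Assemble a few-shot classification prompt from labeled examples."""
    header = ("Classify each social media post as ADE (mentions an adverse "
              "drug event) or NoADE.\n\n")
    shots = "".join(f"Post: {text}\nLabel: {label}\n\n"
                    for text, label in FEW_SHOT_EXAMPLES)
    # The trailing "Label:" cues the model to generate the class token.
    return f"{header}{shots}Post: {post}\nLabel:"

prompt = build_few_shot_prompt("New statin gave me terrible muscle pain.")
```

By contrast, the encoder-only pipeline that still dominates supervised classification and NER fine-tunes a BERT-style model on the full labeled training set instead of embedding examples in a prompt.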

Conclusions: LLMs have the potential to transform NLP tasks within the broader medical domain. While technical progress continues, biomedical application-focused research must prioritize aspects not necessarily tied to raw performance, such as task-oriented evaluation, bias, and equitable use.



Publication History

Article published online:
April 8, 2025

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 