PrecisionHealthLLM

Abstract

Medicine today is imprecise. For the top 20 drugs in the U.S., up to 80% of patients are non-responders. The goal of precision health is to provide the right intervention for the right people at the right time. The key to realizing this dream is to develop a data-driven learning system that can instantly incorporate new health information to optimize care delivery and accelerate biomedical discovery. In reality, however, the health ecosystem is mired in overwhelming unstructured data and excruciating manual processing. For example, in cancer, standard of care often fails, and clinical trials are the last hope. Yet fewer than 3% of patients can find a matching trial, whereas 40% of trial failures simply stem from insufficient recruitment. Discovery is painfully slow: a new drug may take billions of dollars and over a decade to develop.

In this tutorial, we will explore how large language models (LLMs) can serve as a universal structuring tool to democratize biomedical knowledge work and usher in an intelligence revolution in precision health. We first review the background for precision health and give a broad overview of the AI revolution that culminated in the development of large language models, highlighting key technical innovations and prominent trends such as the consolidation of AI methods across modalities. We then give an in-depth review of biomedical LLMs and precision health applications, with a particular focus on scaling real-world evidence generation and drug discovery. To conclude, we discuss key technical challenges (e.g., bias, hallucination, cost), societal ramifications (e.g., privacy, regulation), and exciting research frontiers such as prompt programming, knowledge distillation, multi-modal learning, and causal discovery.

Resources

We provide a non-exhaustive list of papers and other resources that we referred to during the tutorial. We broadly group them into five categories aligned with the structure of the tutorial: Precision Health, The Intelligence Revolution, LLMs for Precision Health, Application Challenges, and Research Frontiers.

Precision Health

LLMs for Precision Health

GPT-4 in Medicine

Biomedical LLMs

GPT-4: The AI Revolution in Medicine
Med-PaLM: Large Language Models Encode Clinical Knowledge
Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models
BioGPT: BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
PubMedGPT / BioMedLM: Stanford CRFM Introduces PubMedGPT 2.7B
BioMedGPT: BioMedGPT: Open Multimodal Generative Pre-trained Transformer for Biomedicine
BioMegatron: BioMegatron: Larger Biomedical Domain Language Model
GatorTronGPT: A Study of Generative Large Language Model for Medical Research and Healthcare
Galactica: Galactica: A Large Language Model for Science
PubMedBERT: Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
ClinicalBERT: Publicly Available Clinical BERT Embeddings
BioBERT: BioBERT: a pre-trained biomedical language representation model for biomedical text mining
BioLinkBERT: LinkBERT: Pretraining Language Models with Document Links
SciBERT: SciBERT: A Pretrained Language Model for Scientific Text
DoT5: Compositional Zero-Shot Domain Transfer with Text-to-Text Models
SciFive: SciFive: a text-to-text transformer model for biomedical literature

LLMs for Real-World Evidence

LLMs for Drug Discovery

Application Challenges

Bias

Hallucinations

Research Frontiers

Prompt Programming

Retrieval-Augmented Generation (RAG)

Knowledge Distillation

Multi-modal learning

Causal Discovery

BibTeX


        @inproceedings{10.1145/3580305.3599568,
            author = {Poon, Hoifung and Naumann, Tristan and Zhang, Sheng and Gonz\'{a}lez Hern\'{a}ndez, Javier},
            title = {Precision Health in the Age of Large Language Models},
            year = {2023},
            isbn = {9798400701030},
            publisher = {Association for Computing Machinery},
            address = {New York, NY, USA},
            url = {https://doi.org/10.1145/3580305.3599568},
            doi = {10.1145/3580305.3599568},
            booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
            pages = {5825--5826},
            numpages = {2},
            keywords = {artificial intelligence, large language model, precision health, machine learning},
            location = {Long Beach, CA, USA},
            series = {KDD '23}
        }

Precision Health in the Age of LLMs

KDD 2023 Tutorial LS-21 | Thursday, August 10

Precision Health in the Age of Large Language Models (LLMs) was presented as tutorial LS-21 at KDD 2023 on Thursday, August 10, from 10am to 1pm. We provide the materials presented, as well as additional resources for those interested in this topic.
