Key Takeaways:
- An AI-based oncologist’s assistant, powered by a large language model given a crash course in precision oncology, shows the potential of AI to help oncologists keep pace with the rapid approval of precision cancer medicines.
- More research is needed before such tools can be safely used in clinical practice.
There are over 100 precision medicines approved for the treatment of cancer. These therapies work to shut down the cancer-driving effects of specific mutations.
Matching a patient to a medicine — the practice of precision oncology — is complex and advancing rapidly. However, there is no easy, standardized way for oncologists to stay informed of advances.
“The clinicians we engaged with said that catching up with FDA approvals is far from seamless,” says Helena Jun, PhD, a computational biologist working in the lab of Eliezer Van Allen, MD, chief of the Division of Population Sciences at Dana-Farber.
To make the process simpler, Jun and Van Allen created an AI-based oncologist’s assistant to help clinicians identify approved therapies that might be appropriate for a given patient. To build it, Jun gave the large language model (LLM) that powers ChatGPT a crash course in precision oncology. In a study published in Cancer Cell, the resulting tool was 93% accurate when tested with about 100 realistic queries submitted by physicians.
The model the team created is currently intended for research only. More research is required before any AI-based oncologist’s assistant is used in clinical practice.
“We wanted to see if we could create an assistant that can be helpful to an oncologist without taking away the autonomy, decision making, and relationship between the patient and provider, and we’ve learned both that it is possible and that there is more work to be done,” says Van Allen. “Dana-Farber is leading research at the intersection of clinical oncology and AI with the intent of making practical tools that advance the mission of improving the lives of patients with cancer everywhere.”
Eliezer Van Allen, MD, chief of the Division of Population Sciences at Dana-Farber
Evolution of the model
Jun first tested commercially available LLMs to see how well they identified up-to-date approved treatments in response to several forms of prompts crafted by the team. The model behind ChatGPT, GPT-4o, proved most successful, with accuracy ranging from 85.9% to 89.3% depending on the complexity of the prompt.
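The article doesn’t spell out how these accuracy figures were computed. As a rough illustration of this kind of benchmark, the Python sketch below runs each physician-style prompt through a model and scores the answers against an expert-curated key; the test cases and the stub model are hypothetical stand-ins, not the study’s actual prompts or grading rules.

```python
# Minimal sketch of an LLM accuracy benchmark. The prompts, answer key,
# and grading rules used in the study are not public here; everything
# below is a hypothetical illustration.

def accuracy(cases, ask_model) -> float:
    """Fraction of prompts whose response mentions the expected therapy."""
    correct = sum(
        expected.lower() in ask_model(prompt).lower()
        for prompt, expected in cases
    )
    return correct / len(cases)

# Hypothetical examples; the study used roughly 100 such queries.
TEST_CASES = [
    ("What is approved for BRAF V600E melanoma?", "dabrafenib"),
    ("Approved targeted therapy for EGFR L858R lung adenocarcinoma?",
     "osimertinib"),
]

def fake_model(prompt: str) -> str:
    """Stub standing in for a call to GPT-4o or another LLM under test."""
    return "Consider dabrafenib plus trametinib." if "BRAF" in prompt else "Unknown."

print(f"accuracy: {accuracy(TEST_CASES, fake_model):.1%}")  # -> 50.0%
```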
“We then wondered, what if you take these LLMs and augment them with the years of very niche precision medicine knowledge we’ve been accumulating?” says Van Allen.
He had the perfect set of reading materials in hand. Five years ago, Van Allen’s team created the Molecular Oncology Almanac (MOAlmanac), a highly curated database of information about targeted medicines approved for different types of cancer and molecular biomarkers, which is paired with a companion algorithm for ranking molecular events within a patient’s tumor data. The tool is updated by experts on a quarterly or monthly basis as needed.
The challenge oncologists face when accessing MOAlmanac directly, however, is that its interface is rigid, and the data returned needs to be digested and interpreted.
“There are fundamental shifts happening in the way people interact with technology in every aspect of our lives. People are expecting thoughtful responses and tools they can converse with,” says Van Allen. “We recognize this and wanted to know if these types of approaches could work in the domain we care about, which is precision oncology.”
To augment the LLM with MOAlmanac, Jun developed a retrieval-augmented generation LLM (RAG-LLM), which retrieves the database entries most relevant to a query and supplies them to the model as context before it answers. RAG-LLMs have been used to create specialized LLMs in other fields, such as law. On the team’s test prompts, the RAG-LLM outperformed the LLM alone.
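The team’s actual pipeline and MOAlmanac’s real schema aren’t described in this article, but the general RAG pattern can be sketched in a few lines of Python: look up the curated records most relevant to a query, then prepend them to the prompt so the model answers from that evidence. The records, retrieve(), and build_prompt() below are hypothetical illustrations only.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern.
# All names and records here are hypothetical, not MOAlmanac's schema.

KNOWLEDGE_BASE = [
    {"gene": "BRAF", "variant": "V600E", "cancer": "melanoma",
     "therapy": "dabrafenib + trametinib", "region": "US"},
    {"gene": "EGFR", "variant": "L858R", "cancer": "NSCLC",
     "therapy": "osimertinib", "region": "US"},
]

def retrieve(query: str, k: int = 3) -> list[dict]:
    """Rank curated records by naive keyword overlap with the query.
    A production system would use vector embeddings and an index."""
    words = set(query.lower().split())
    def overlap(record: dict) -> int:
        text = " ".join(str(v) for v in record.values()).lower()
        return len(words & set(text.split()))
    return sorted(KNOWLEDGE_BASE, key=overlap, reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved records so the LLM answers from curated
    evidence rather than from its training data alone."""
    context = "\n".join(str(r) for r in retrieve(query))
    return (
        "Answer using ONLY the curated records below. If no record "
        "matches, state that no approved option was found.\n"
        f"Records:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("What is approved for BRAF V600E melanoma?"))
```

Keyword overlap is used here only to keep the sketch self-contained; the key design point is that the model is instructed to answer from retrieved, expert-curated records rather than from memory.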
Helena Jun, PhD, a computational biologist working in the lab of Eliezer Van Allen, MD.
Tuning the model
Jun then tested the RAG-LLM with real-world prompts. She reached out to oncologists at Dana-Farber and other Boston-area hospitals and asked for realistic prompts reflecting the kinds of queries they would actually pose.
Using these queries, the model had a 93% accuracy rate. Good, but not perfect.
The team reviewed the prompts that drew erroneous responses and noticed patterns. For instance, when the AI found no approved options for a patient, it would sometimes hallucinate one.
“LLMs tend to be trained and tuned to please you, so they want to give you an answer,” says Van Allen.
Jun tuned the LLM to be less creative, an option available under the hood to people building AI-based tools. This forced the model to report plainly when no approved options were found. More work remains to understand the other incorrect answers.
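In most LLM APIs, “less creative” maps to a lower sampling temperature. The study’s exact configuration isn’t given in this article; under that assumption, the snippet below shows the general knob using the OpenAI Python SDK, paired with an instruction to admit when nothing matches.

```python
# Hedged illustration: lowering sampling temperature so the model
# answers deterministically and is less prone to inventing options.
# The study's actual settings and prompts are not described here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # deterministic, least "creative" sampling
    messages=[
        {"role": "system",
         "content": "If no approved therapy matches the case, say so "
                    "explicitly. Do not guess."},
        {"role": "user",
         "content": "Approved options for KRAS G12C non-small cell lung cancer?"},
    ],
)
print(response.choices[0].message.content)
```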
The final model is available online as a research tool. It has several options, including the ability to choose a region, such as the European Union or United States, because those regions have different regulatory bodies and approvals.
Meanwhile, the team is working toward building an AI-based model that combines augmentation with other ideas they are experimenting with. The goal is to create something that is ready to be tested in a clinical trial.
“The field would benefit from proper clinical trials assessing the utility of these kinds of decision support tools in this AI era,” says Van Allen. “We need to make sure they’re safe and effective and useful for providers and for patients.”
