The artificial intelligence chatbot ChatGPT has wowed users with its seemingly human-like ability to generate and respond to language, and worried others by that very same ability. Like virtually every technological innovation that has come before — from the telephone to refrigeration to plastics — ChatGPT is sure to find uses in medicine. People are asking it for information about their health problems. Physicians are seeing whether it can help draft letters to insurance companies or respond to patients’ questions. The results are sometimes impressive, occasionally perplexing, and periodically laughable.
To explore some of the potential benefits and shortcomings of ChatGPT in medicine, we spoke with two Dana-Farber physicians, Benjamin Schlechter, MD, and Douglas Rubinson, MD, PhD, who specialize in gastrointestinal cancers and have experimented with ChatGPT to gauge its strengths and weaknesses.
Let’s start on a positive note. What are some of the ways ChatGPT can be useful in medicine – to doctors as well as patients and the general public?
Rubinson: It does a good job in an area that doctors sometimes struggle with: explaining complicated medical topics at a level patients can grasp. In response to a question from a patient, I can ask it to provide an answer at the level of a fifth grader, a tenth grader, a college graduate, a PhD in biochemistry, or anything in between. I don’t think ChatGPT is at the stage where we can copy its response directly into a communication with a patient, but it can provide a first pass that a physician can edit and correct before sending it on.
Schlechter: It helps provide a framework for writing. If I ask it for a paragraph on a particular topic, the text it generates may be 50% accurate, but it’s set forth very logically. I find that in some ways, it’s most useful for the cadence with which it communicates. It’s very clear, concise, and organized – it puts things in a way that’s very linear. That may not be how human beings think, but it’s how they learn. When I ask ChatGPT to write something, I’m very often impressed by the organization of it, not necessarily by the data it provides.
It can help start the writing process. If I have writer’s block, for example, I may ask it to write an introductory paragraph. What it comes up with may not be quite right, but I can edit and correct it. It can provide a direction for what I want to write.
Can you give an example of a situation where ChatGPT can help patients understand a complicated medical issue?
Rubinson: For patients who undergo genetic testing, it’s important to understand the difference between germline testing — which looks for inherited genetic abnormalities that could potentially be passed on to one’s children — and somatic testing, which can identify specific genetic abnormalities in tumor tissue. That’s an area where I think ChatGPT does a really nice job of providing an explanation that’s easy for patients to follow and share with their family members.
As physicians, you can judge whether the information provided by ChatGPT is correct. What do you advise patients about using it to learn about the results of research into their cancer?
Schlechter: One of the biggest problems is that ChatGPT’s responses are largely based on retrospective data. That is, they’re skewed toward data published over many years, and less toward new data. ChatGPT doesn’t necessarily provide the most accurate information; it provides the most popular or prevalent information on the Web. There’s always going to be a lot more information on the Web about older research than new research. ChatGPT’s responses are going to reflect that, so the information it gives is apt to be somewhat out of date. For several weeks after Queen Elizabeth died, if you asked ChatGPT for information on her, it would tell you she’s alive.
ChatGPT also doesn’t handle nuance very well. If new findings create a subtle change in what’s understood about a disease or its treatment, that change could be de-emphasized by ChatGPT.
Rubinson: One of the things we talk to our patients about is being very careful about the use of any online information source in understanding how best to treat their cancer. It’s wonderful that we’ve democratized medical information — that it’s so freely available, and patients can come in with wonderfully incisive questions about their treatment and about clinical trials. At the same time, there’s a great deal of specificity to cancer care that requires a really nuanced understanding of their diagnosis: What stage of cancer do they have? What subtype? How aggressive is it? What genomic features does it have? What previous treatments have patients had? What other health conditions are they dealing with? There’s a great deal that goes into clinical decision-making that is hard to capture in a question placed to ChatGPT.
We often hear that ChatGPT can give false, even fictional information. Have you found that to be the case?
Schlechter: In some cases, yes. I once asked it to provide statistics on a certain type of cancer, and it literally made up an equation. It even gave it a name. It was an equation that does nothing, but it looked very convincing. In a way, it’s like talking to children: they start making up a story and continue the more you ask them about it. In this case, ChatGPT was adding detail after detail, none of it real, because I asked it to elaborate. It’s very self-confident, for a computer.
So it’s not a substitute for a conversation with a physician.
Rubinson: Right. In caring for patients, we have multidisciplinary tumor boards: medical oncologists, surgical oncologists, radiation oncologists, pathologists, and radiologists meet to discuss patients’ treatment. That level of discussion and consultation and collaboration — and the nuanced understanding needed to make clinical decisions — can’t be replicated by a non-sentient, text-prediction model like ChatGPT.
Schlechter: Before we start using something like this in patient care, we should subject it to a clinical trial, just as we would for anything used in the clinic. Whether it’s a new type of heart monitor, a new drug, or a surgical technique, the benefits and risks need to be formally evaluated. The same should be true for ChatGPT.