Updates on AI in Dermatology 

Dr. Roxana Daneshjou discusses artificial intelligence (AI) in dermatology, including the potential and problems with image- and language-based diagnostic tools.

Roxana Daneshjou, MD, PhD, Assistant Professor, Departments of Dermatology and Biomedical Data Science, Stanford University, Stanford, California.   

“I talked about some of the basic things I think every dermatologist needs to know about artificial intelligence (AI), [including] two different types of AI—computer vision, which is image processing, and large language models, such as ChatGPT,” said Roxana Daneshjou, MD, PhD, who presented “AI in Dermatology: Gamechanger or Existential Threat?” at the 75th Annual Meeting of the Pacific Dermatologic Association in San Francisco, California.  

According to Dr. Daneshjou, as of her talk and interview with The Dermatology Digest on August 21, 2023, the FDA had not yet approved any image-based AI devices for dermatology.

Several companies are interested in obtaining FDA approval for image analysis, she said, and many published models claim performance on par with dermatologists on retrospective image data of skin disease.

“However, we now know that performance on retrospective image data does not always translate to the real world.” 

Another major issue in image-based AI for dermatology is that many of these algorithms are not developed using diverse skin tones, she said.  

“The majority of published algorithms have only been trained on images of disease on white skin.” 

Dr. Daneshjou cited a paper she authored, published in Science Advances,1 showing that several algorithms perform significantly worse when asked to identify disease in brown or Black skin tones (Fitzpatrick types V and VI), she said.

“Bias is a major issue that we worry about when it comes to image-based diagnostic AI.  Another issue is understanding what real-world performance is—not just on retrospective data.”  
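The kind of performance gap described in the Science Advances study is typically measured by stratifying a model’s results by Fitzpatrick skin type and comparing a metric such as sensitivity across groups. Below is a minimal sketch of that sort of audit; the records, labels, and groupings are entirely hypothetical and are not drawn from the study.

```python
# Hypothetical audit: compare a classifier's sensitivity across Fitzpatrick groups.
# All data below is made up for illustration; it does not come from any real study.
from collections import defaultdict

# Each record: (fitzpatrick_group, true_label, predicted_label); 1 = malignant, 0 = benign
predictions = [
    ("I-II", 1, 1), ("I-II", 1, 1), ("I-II", 0, 0), ("I-II", 1, 0),
    ("V-VI", 1, 0), ("V-VI", 1, 1), ("V-VI", 0, 0), ("V-VI", 1, 0),
]

true_positives = defaultdict(int)
positives = defaultdict(int)

for group, truth, pred in predictions:
    if truth == 1:
        positives[group] += 1
        if pred == 1:
            true_positives[group] += 1

for group in sorted(positives):
    sensitivity = true_positives[group] / positives[group]
    print(f"Fitzpatrick {group}: sensitivity = {sensitivity:.2f}")
```

A gap between the two printed sensitivities is the kind of disparity the study reports; a real audit would of course use a large, curated image set rather than toy records.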

Large Language Models 

Almost everyone has either heard of or tried ChatGPT (OpenAI), which is a large language model, said Dr. Daneshjou.

“These models are trained on incredibly large datasets of text—think of textbooks, possibly Wikipedia, the internet. The problem is nobody knows what has gone into this ‘sausage’ because none of the companies that have actually built these models have told us what text they use to develop them.” 

These models are trained to produce human-like language, she said.  

“The first step of this training process is simply having the model try to predict the next word in a sentence. By doing that training across a large amount of text, the model begins to learn how words…are connected.” 
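To make next-word prediction concrete, here is a minimal sketch that asks a small open model (GPT-2, accessed through the Hugging Face transformers library) for its most likely next token. This is only an illustration with an openly available model; it is not the model behind ChatGPT.

```python
# Minimal illustration of next-word prediction with a small open model (GPT-2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Sunscreen should be reapplied every"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# The logits at the last position score every token in the vocabulary
# as a candidate for the next word; argmax picks the single most likely one.
next_token_id = logits[0, -1].argmax()
print(prompt, tokenizer.decode(int(next_token_id)))
```

Repeating this step, one predicted word at a time over enormous amounts of text, is how the model learns how words are connected.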

Many of the models have a second training step, in which the model gives responses to human questions and humans provide feedback on whether or not the answers are appropriate, said Dr. Daneshjou.

There is no way to assess factual accuracy in the first step of predicting the next word in a sentence, but the human feedback in the second step is an opportunity to improve accuracy, she said.
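In practice, that second step is often implemented by training a separate “reward model” on pairs of responses that human raters preferred or rejected, and then using it to steer the language model. The toy sketch below uses the standard pairwise preference loss; the tiny linear scorer and the random feature vectors are purely illustrative assumptions, not how production systems are built.

```python
# Toy illustration of learning from human feedback: a reward model is trained
# so that responses humans preferred score higher than responses they rejected.
# The pairwise loss, -log(sigmoid(r_chosen - r_rejected)), is the standard
# preference objective; the tiny linear model and fake features are illustrative.
import torch

torch.manual_seed(0)

# Pretend feature vectors for (human-preferred, human-rejected) response pairs.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

reward_model = torch.nn.Linear(16, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=0.05)

for step in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final preference loss: {loss.item():.3f}")
```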

“However, as many people who have played with ChatGPT can tell you, that doesn’t necessarily make it accurate all the time. One of the examples that I like to show is that I asked ChatGPT to make a handout on sunscreen. It gave me a very convincing and mostly factually correct handout on sunscreen. However, the citations it provided were completely made up.”

When Dr. Daneshjou asked ChatGPT to write her bio, it delivered something that sounded correct, but included medical schools and degrees of study that weren’t true. 

“When you read it, it sounds very convincing. So that is actually where the danger comes in. It’s that it can sound correct but actually be dead wrong.”  

Dr. Daneshjou demonstrated during her talk that large language models can have biases, including gender and racial biases.  

“If you ask it to complete a series of sentences that are about a doctor, a CEO, and an engineer, the model calls all those individuals male. But if you have sentence completion about a parent, it completes it as a mom or a woman.”

“We have done research that has shown that these large language models encode and might perpetuate false race-based medicine—things that have been thoroughly debunked, like differences in eGFR between races, which have been shown not to be true.”

This research has not yet been published, said Dr. Daneshjou.

Practical Pearls 

Large language models are not HIPAA compliant, nor are they secure, she said.

“Do not ever enter protected health information into any of these models.” 

Also, don’t believe everything they say, said Dr. Daneshjou.  

“Be aware that they make up very convincing sounding ‘facts.’ Be aware that there are likely biases.”  

“Most importantly, be aware of automation bias, which is when we trust technology and machines more than our own good judgment,” said Dr. Daneshjou.

An example of automation bias is the news about tourists in Hawaii driving into a body of water because of a GPS error, she said. 

“These people were so used to trusting their GPS system that they were not looking with their own eyes to avoid the water. Similar things can happen with AI technology. If you build such a trust with the system, it could push you in the wrong direction. But because of your trust and automation bias, you may not realize that you are going down a road that you should not be going down.”  

Reference

  1. Daneshjou R, Vodrahalli K, Novoa RA, et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci Adv. 2022;8(32):eabq6147. doi:10.1126/sciadv.abq6147.