Illustrated by Kim Sung-kyu.

OpenAI’s ChatGPT-4 outperformed doctors in clinical reasoning, a study showed.

When ChatGPT-4 and 39 doctors were given the same patient cases and asked to make diagnoses, the AI scored higher on accuracy, efficiency, and reasoning, according to a research letter in JAMA Internal Medicine.

“AI can improve the quality of healthcare for patients,” said Adam Rodman, MD, of Beth Israel Deaconess Medical Center (BIDMC) in Boston, who led the study.

The adoption of AI in healthcare has surged. AI is applied in robotic surgery for enhanced precision and is increasingly used to analyze medical data and develop new drugs. Recently, AI has ventured into diagnosing patients, long considered a challenging area requiring doctors’ experience and know-how. Some believe that with the growing number of studies demonstrating the strength of generative AI, such as ChatGPT, in primary diagnosis, “AI doctors” will soon emerge.

The BIDMC researchers conducted a study in which one of 20 clinical cases was randomly assigned to each of the 39 physicians, while all 20 cases were presented to ChatGPT-4 for diagnosis. Assessments were made on a scale of 1 to 10, considering factors such as reviewing patient history, interpreting the main problems, and suggesting alternative diagnoses. GPT-4 scored an average of 10 across the 20 cases, compared to 9 for the 21 internal medicine doctors and 8 for the 18 resident doctors. However, the chatbot produced completely incorrect diagnoses more often than the residents did, revealing some limitations; cases that could not be evaluated were excluded from the scores. “Generative AI such as ChatGPT can provide more ‘plausible diagnoses’ for refined cases,” said Kim Joong-hee, a professor of emergency medicine at Seoul National University Bundang Hospital. Nevertheless, Kim, who is working on developing medical AI models in Korea, emphasized that AI may struggle to match physicians’ abilities in real-world scenarios.

With a steady stream of research showcasing the strengths of AI in healthcare, global companies are investing in the market. Google is set to officially launch Med-PaLM 2, a generative AI for healthcare, later this year. Med-PaLM has demonstrated its capabilities by achieving an 80% correct answer rate on the U.S. Medical Licensing Examination (USMLE). Meanwhile, firms such as India’s Qure.ai, Israel’s Aidoc, and the Netherlands’ ScreenPoint Medical aim to develop AI assistants for physicians. These services analyze brain CT scans, X-ray data, and ultrasounds to help doctors make accurate diagnoses.

Korean firms are also venturing into the AI medical field. Lunit, which offers AI-based data analysis services for lung disease and breast cancer, is actively pursuing entry into the U.S., the world’s largest medical market. Late last year, the company acquired Volpara Health, a provider used by over 40 percent of U.S. breast screening facilities, and commenced local sales this year. Local AI medical firms VUNO and JLK are also gearing up for FDA approval this year, highlighting Korea’s growing presence in the AI healthcare sector.

However, stakeholders in the industry agree that it is premature for AI to diagnose patients without the guidance and supervision of medical professionals. While concerns about the accuracy of AI diagnoses persist, more complex issues, such as safeguarding sensitive medical information and assigning responsibility for diagnoses, need to be addressed first. The Economist published an article titled “AI doctors will see you... eventually” in its latest issue, outlining various challenges facing AI in healthcare. The Economist nonetheless noted that, if implemented correctly, AI could save between $200 billion and $360 billion of the U.S.’s current annual medical spending of $4.5 trillion, which represents 17% of GDP, underscoring the financial incentive to overcome these challenges.