The artificial intelligence chatbot ChatGPT has once again fallen short in the complex realm of medical diagnostics. In a recent study published in JAMA Pediatrics, the fourth iteration of the AI language model posted an alarming 83% error rate when diagnosing pediatric cases.
Last year, ChatGPT performed poorly at diagnosing challenging medical cases, achieving only 39% accuracy. This latest study reveals that its proficiency dips even lower on pediatric cases, with the AI bot correctly diagnosing only 17% of them. This underwhelming performance emphasizes the irreplaceable value of human clinical experience and puts to rest premature concerns about AI replacing pediatricians.
The findings not only highlight the critical weaknesses in ChatGPT’s diagnostic capabilities but also shed light on potential avenues for transforming it into a more effective tool in clinical settings. Despite these shortcomings, the integration of AI chatbots into healthcare is seen as an inevitability by many physicians, given the significant interest in and experimentation with such technologies.
AI has found a foothold in various aspects of the medical field, from automating administrative tasks to assisting in the interpretation of chest scans and retinal images. However, it has also been associated with notable failures, including perpetuating algorithmic racial bias. AI's potential for problem-solving has sparked considerable interest in harnessing it for complex diagnostics, no quirky diagnostic genius required.
The study, conducted by researchers at Cohen Children’s Medical Center in New York, underscored that ChatGPT-4 isn’t primed for pediatric diagnoses. Pediatric cases demand a keen consideration of the patient’s age—an aspect often overlooked by the AI. Moreover, diagnosing conditions in infants and small children who cannot articulate their symptoms poses an additional challenge.
The researchers tested ChatGPT against 100 pediatric case challenges published in JAMA Pediatrics and NEJM from 2013 to 2023. These cases, presented as quizzes or challenges, invite physicians to diagnose complex or unusual conditions based on the attending doctors’ information at the time.
In the test, the researchers fed the relevant text from each medical case into ChatGPT, and two qualified physician-researchers scored the AI-generated responses. The results were disheartening: ChatGPT correctly diagnosed only 17 of the 100 cases, gave incorrect diagnoses for 72, and did not fully capture the diagnosis in the remaining 11.
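The study did not publish its code, but the workflow it describes, feeding each case's text to the model and having physicians grade the answer, can be sketched in a few lines of Python. Everything below is an illustrative assumption rather than the researchers' actual setup: the model string, the prompt wording, the grading labels, and the helper names are all hypothetical.

```python
# Illustrative sketch only: send a case's text to a chat model, collect the
# answer for physician grading, then tally the grades into percentages.
# Model string, prompts, and labels are assumptions, not details from the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def get_candidate_diagnosis(case_text: str) -> str:
    """Ask the model for a differential and a final diagnosis for one case."""
    response = client.chat.completions.create(
        model="gpt-4",  # stand-in for whatever ChatGPT-4 version the study used
        messages=[
            {"role": "system",
             "content": "You are assisting with a pediatric diagnostic case."},
            {"role": "user",
             "content": f"{case_text}\n\nGive a differential diagnosis and a final diagnosis."},
        ],
    )
    return response.choices[0].message.content


def summarize(grades: list[str]) -> dict[str, float]:
    """Turn per-case physician grades into percentages across all cases."""
    total = len(grades)
    return {label: 100 * grades.count(label) / total for label in set(grades)}
```

Grading 100 cases this way with labels such as "correct", "incorrect", and "did not fully capture the diagnosis" would reproduce the split reported above: 17 correct answers out of 100 gives the 17% accuracy rate, and the remaining 83 cases account for the 83% error rate.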
Among the misdiagnosed cases, the AI bot often identified a clinically related but overly broad or unspecific condition. For instance, it diagnosed one child's condition as a branchial cleft cyst when the correct diagnosis was branchio-oto-renal syndrome, a genetic condition that causes abnormal development of neck tissue along with malformations of the ears and kidneys.
The study also revealed that ChatGPT struggled to recognize known relationships between conditions, something an experienced physician would typically pick up. It failed, for example, to connect autism with scurvy (vitamin C deficiency), despite the well-documented link between neuropsychiatric conditions like autism, restrictive diets, and the vitamin deficiencies that can follow.
In conclusion, while AI has immense potential in healthcare, ChatGPT’s high error rate in diagnosing pediatric cases underscores the importance of human expertise in complex medical diagnostics.