Microsoft just published their “Path to Medical Superintelligence”

Benchmarked against real-world case records published each week in the New England Journal of Medicine, we show that the Microsoft AI Diagnostic Orchestrator (MAI-DxO) correctly diagnoses up to 85% of NEJM case proceedings, a rate more than four times higher than a group of experienced physicians. MAI-DxO also gets to the correct diagnosis more cost-effectively than physicians.

To practice medicine in the United States, physicians need to pass the United States Medical Licensing Examination (USMLE), a rigorous and standardized assessment of clinical knowledge and decision making. USMLE questions were among the earliest benchmarks used to evaluate AI systems in medicine, offering a structured way to compare model performance – both against each other and against human clinicians. In just three years, generative AI has advanced to the point of scoring near-perfect scores on the USMLE and similar exams. But these tests primarily rely on multiple-choice questions, which favor memorization over deep understanding. By reducing medicine to one-shot answers on multiple-choice questions, such benchmarks overstate the apparent competence of AI systems and obscure their limitations.

At Microsoft AI, we’re working to advance and evaluate clinical reasoning capabilities. To move beyond the limitations of multiple-choice questions, we’ve focused on sequential diagnosis, a cornerstone of real-world medical decision making. In this process, a clinician begins with an initial patient presentation and then iteratively selects questions and diagnostic tests to arrive at a final diagnosis. For example, a patient presenting with cough and fever may lead the clinician to order and review blood tests and a chest X-ray before they feel confident about diagnosing pneumonia.

Each week, the New England Journal of Medicine (NEJM) – one of the world’s leading medical journals – publishes a Case Record of the Massachusetts General Hospital, presenting a patient’s care journey in a detailed, narrative format. These cases are among the most diagnostically complex and intellectually demanding in clinical medicine, often requiring multiple specialists and diagnostic tests to reach a definitive diagnosis.

How does AI perform? To answer this, we created interactive case challenges drawn from the NEJM case series – what we call the Sequential Diagnosis Benchmark (SD Bench). This benchmark transforms 304 recent NEJM cases into stepwise diagnostic encounters where models – or human physicians – can iteratively ask questions and order tests. As new information becomes available, the model or clinician updates their reasoning, gradually narrowing toward a final diagnosis. This diagnosis can then be compared to the gold-standard outcome published in the NEJM. Each requested investigation also incurs a (virtual) cost, reflecting real-world healthcare expenditures. This allows us to evaluate performance across two key dimensions: diagnostic accuracy and resource expenditure.
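
To make that setup concrete, here is a rough sketch of how a single stepwise encounter could be scored on both dimensions. Everything in it is my own illustration, not Microsoft's code: the `Case` and `Encounter` names, the test prices, and the exact-string match against the gold-standard diagnosis are all assumptions (a real benchmark would presumably need a more forgiving comparison than string equality).

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """A toy case (hypothetical): hidden findings, test prices, and a gold diagnosis."""
    presentation: str
    findings: dict[str, str]   # test name -> result revealed when ordered
    prices: dict[str, float]   # test name -> virtual cost (made-up numbers)
    gold_diagnosis: str


@dataclass
class Encounter:
    """Tracks what has been ordered so far and what it has cost."""
    case: Case
    total_cost: float = 0.0
    revealed: dict[str, str] = field(default_factory=dict)

    def order(self, test: str) -> str:
        # Each requested investigation adds its virtual price to the running total.
        self.total_cost += self.case.prices.get(test, 0.0)
        result = self.case.findings.get(test, "not available")
        self.revealed[test] = result
        return result

    def submit(self, diagnosis: str) -> tuple[bool, float]:
        # Score on both dimensions: correctness and resources spent.
        # Assumption: case-insensitive exact match stands in for real grading.
        correct = diagnosis.strip().lower() == self.case.gold_diagnosis.lower()
        return correct, self.total_cost


# The cough-and-fever example from the passage above, with invented prices.
case = Case(
    presentation="cough and fever",
    findings={
        "blood tests": "elevated white cell count",
        "chest x-ray": "right lower lobe consolidation",
    },
    prices={"blood tests": 50.0, "chest x-ray": 120.0},
    gold_diagnosis="pneumonia",
)
enc = Encounter(case)
enc.order("blood tests")
enc.order("chest x-ray")
print(enc.submit("Pneumonia"))  # (True, 170.0)
```

The point of returning a pair rather than a single score is that a correct answer reached by ordering every test on the menu is not the same achievement as a correct answer reached with two.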

So Microsoft’s new medical AI can crack some of the world’s trickiest medical cases. Good. It probably will out-diagnose the average harried, overworked human doctor strangled by the public health system.

It’s the potential puppeteers behind the curtain I don’t trust. When your doctor is a neural network owned by a mega-corp, funded by a billionaire who styles himself humanity’s pharmacist-in-chief, what levers are they itching to pull? Yes, Gates has sold most of his shares in Microsoft now, but I’m guessing his influence lingers. He’s tight with the leadership. Satya Nadella (current CEO) is his ideological heir in some ways: a true believer in Big Tech as a global “solution machine.” Gates isn’t in the basement writing code for the Diagnostic Orchestrator. But he built the corporate empire, and his fingerprints are all over the broader push to fuse Big Tech, Big Health, and “philanthropic” control. The world should have learned during the scamdemic how tangled those worlds are.

The last few years should have taught us that once a system this powerful exists, it will be manipulated at some point. The tech might be brilliant. But put it in the hands of the same clique that pushed toxic jabs and locked down the world “for your safety” while consolidating control, and you’re signing up for a future where your diagnosis is just another business model or worse. I’m not worried that the AI can’t give a better diagnosis than my GP. I’m concerned that, under certain circumstances, it’ll do exactly what it’s told.

Too paranoid? Maybe. Watch this space.

All that said, I’d still let it diagnose me. After waiting over two years just to see a neurologist, I’m painfully (literally) aware of how broken the system already is. How many people die on waiting lists, never mind from misdiagnoses?

That’s why AI will be inevitable. Not because it’s perfect, or trustworthy, or free from manipulation — but because the alternative, for most people, is nothing at all.

“What could possibly go wrong?” gets easier to ignore the longer you’re left waiting.

