GPT-5 was launched today, 7 August 2025 (it rolls out to everyone gradually over the next few days, and it's free):

As these graphs show, OpenAI’s new GPT-5 outperforms top students on elite math contests and approaches PhD-level accuracy on complex, cross-disciplinary science questions, making it a powerful research assistant. Scientists like Dr. Derya Unutmaz already treat it as a trusted collaborator—capable of interpreting data, generating hypotheses, and accelerating experimental design by weeks or even months. GPT-5 still can’t surpass human math professors in constructing rigorous proofs, which often require sustained abstract thinking and long-term creative work. However, OpenAI researchers say they’re developing ways for GPT-5 to run continuously for weeks to tackle the most complex scientific challenges. In the meantime, working alongside human thinkers, GPT-5 will reshape how science is done.

Oh, and it hallucinates way less:

And it's more honest:

Alongside improved factuality, GPT‑5 (with thinking) more honestly communicates its actions and capabilities to the user—especially for tasks which are impossible, underspecified, or missing key tools. In order to achieve a high reward during training, reasoning models may learn to lie about successfully completing a task or be overly confident about an uncertain answer. For example, to test this, we removed all the images from the prompts of the multimodal benchmark CharXiv, and found that OpenAI o3 still gave confident answers about non-existent images 86.7% of the time, compared to just 9% for GPT‑5. When reasoning, GPT‑5 more accurately recognizes when tasks can’t be completed and communicates its limits clearly. We evaluated deception rates on settings involving impossible coding tasks and missing multimodal assets, and found that GPT‑5 (with thinking) is less deceptive than o3 across the board. On a large set of conversations representative of real production ChatGPT traffic, we’ve reduced rates of deception from 4.8% for o3 to 2.1% of GPT‑5 reasoning responses. While this represents a meaningful improvement for users, more work remains to be done, and we’re continuing research into improving the factuality and honesty of our models.

As you will know if you have ever used ChatGPT, when older models faced a task they couldn't do, such as answering a question with missing information, they often made things up and pretended they could do it. For example, when OpenAI's researchers removed all the images from a benchmark (CharXiv) and then asked questions about them, OpenAI o3 still confidently described the missing images 86.7% of the time. GPT-5 did so only 9% of the time, roughly a tenfold reduction.

In general, GPT-5 is better at recognising when something is impossible or unclear, and it tells the user instead of faking an answer. That’s a massive improvement.

It makes the model more trustworthy, especially in real-world situations. But it’s still not perfect. Hopefully, before it starts running the government, the AI will be honest 100% of the time — and never hallucinate. What are the chances of that happening?

Incidentally, 2 million U.S. federal workers now have access to GPT-5. How many of them will be replaced before the end of next year? How many white-collar jobs will follow? And after that, how long until the robots come for the rest?

What will that world look like?

