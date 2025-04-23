The new Virology Capabilities Test (VCT) paper presents a highly specialised benchmark designed to assess AI systems (LLMs such as o3) on complex, practical virology lab work:

VCT paper (PDF, 4.98 MB)

VCT consists of 322 multimodal questions covering fundamental, tacit, and visual knowledge that is essential for practical work in virology laboratories. The questions were developed and tested by externally recruited scientists who had either obtained or were in the process of obtaining a PhD in virology.

Here are some examples of the test questions:

OpenAI's o3 model scored 43.8%, outperforming 94% of expert virologists, even when those experts were answering questions within their subfield. Human experts scored just 22.1% on average.

I’ve been working with o3, so this wasn’t a surprise to me. It is a powerful tool, and in the wrong hands it could have disastrous consequences.

The authors highlight:

Dual-use risk: These models could be used to assist malicious actors with sensitive virology methods.

Governance gap: There's currently no robust regulatory framework for evaluating or restricting LLM capabilities in biosciences.

Tacit knowledge leakage: The LLMs appear to be absorbing forms of knowledge (like visual interpretation and lab-specific heuristics) that were once thought to be inaccessible without hands-on experience.

We’ve already seen what bat ladies and their alphabet agency collaborators can unleash. Now imagine what a superintelligence trained on the entire biomedical literature might do, especially one unshackled by law (to be fair, the law doesn’t stop humans either; see Fauci).

And let’s not pretend the affiliations are neutral. The benchmark was developed with support from the RAND Corporation and other key actors in the biodefense space, which, in practice, often overlaps with the military-industrial complex.

Interesting times.


Further reading: