TL;DR: I discuss a new paper that introduces ‘The AI Scientist’, an AI framework designed to conduct scientific research autonomously from start to finish. It can generate research ideas, run experiments, analyse the results, and write and review complete scientific papers. Its creators have applied it to three machine learning subfields (diffusion modelling, transformer-based language modelling and learning dynamics) at a reported cost of less than $15 per paper.

As far as I know, no one has yet invented an artificial general intelligence that can take over the work of a great scientist; however, we do have some very capable large language models (LLMs), which can be harnessed to do a good deal of the essential donkey work of human scientists. In a new paper, researchers from Sakana AI, the University of Oxford and the University of British Columbia describe the creation of a ‘fully automated open-ended scientific discovery’ AI built on LLMs, which they call ‘The AI Scientist’. They claim that it has already researched and written several papers related to machine learning:

One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models (LLMs) to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion and add them to a growing archive of knowledge, acting like the human scientific community. We demonstrate the versatility of this approach by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a meager cost of less than $15 per paper, illustrating the potential for our framework to democratize research and significantly accelerate scientific progress. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer.
This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world’s most challenging problems. Our code is open-sourced at https://github.com/SakanaAI/AI-Scientist

The AI Scientist is able to generate its own scientific ideas and hypotheses, as well as a plan for testing them with experiments. Next, The AI Scientist implements plan-directed code-level changes to the experiment “template” using the state-of-the-art coding assistant Aider (Gauthier, 2024), and executes experiments to collect a set of computational results, which are in turn used to draft a scientific paper. The AI Scientist then performs an automated paper-reviewing process using guidelines from a standard machine learning conference. Finally, The AI Scientist adds the completed ideas and reviewer feedback to its archive of scientific findings, and the process repeats. Crucially, the generated paper and experimental artifacts The AI Scientist produces allow us to easily interpret and judge its findings post-hoc, allowing human scientists to also benefit from what is learned. Our contributions are summarized as follows:
1. We introduce the first end-to-end framework for fully automated scientific discovery in Machine Learning research, enabled by frontier LLMs (Section 3). This fully automated process includes idea generation, experiment design, execution, and visualizing and writing up the results into a full manuscript.
2. To assess the quality of the generated papers, we introduce a foundation model-based reviewing process in Section 4. This process achieves near-human-level performance across multiple evaluation metrics (e.g. 65% vs. 66% balanced accuracy) when evaluated on ICLR 2022 OpenReview data. The reviews further enable The AI Scientist to select the best ideas for “publication” to an ever-growing archive of scientific discoveries, and the process can be repeated to build on these discoveries, just as in the human scientific community.
3. The AI Scientist can generate hundreds of interesting, medium-quality papers over the course of a week. In this report, we focus on a subset of these papers, highlighting novel insights in diffusion modeling, language modeling, and grokking. We perform an in-depth case study into one selected paper in Section 5, and present aggregate results in Section 6.
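To make the loop described in the quoted passage more concrete, here is a minimal Python sketch of its control flow: propose an idea, run an experiment, write it up, review it, add it to the archive, repeat. It is not the authors' code (that is in the GitHub repository linked above). Every stage that the real system delegates to an LLM or to Aider is replaced by a hypothetical toy stand-in (`generate_idea`, `run_experiment`, `write_paper`, `review`), and the 1 to 10 score scale and the acceptance threshold of 6.0 are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Paper:
    idea: str
    results: dict
    text: str
    score: float
    feedback: str


@dataclass
class Archive:
    papers: list = field(default_factory=list)

    def published(self, threshold):
        # Papers whose automated review score clears the (assumed) acceptance threshold.
        return [p for p in self.papers if p.score >= threshold]


# --- Hypothetical toy stand-ins for the LLM-driven stages (not the authors' API) ---

def generate_idea(archive, rng):
    # Real system: an LLM proposes an idea given the template and past ideas.
    past = {p.idea for p in archive.papers}
    while True:
        idea = f"idea-{rng.randint(0, 10_000)}"
        if idea not in past:  # crude novelty check against the archive
            return idea


def run_experiment(idea, rng):
    # Real system: Aider edits the experiment template, then the code is executed.
    return {"metric": rng.random()}


def write_paper(idea, results):
    # Real system: an LLM drafts a full manuscript from the results.
    return f"{idea}: metric = {results['metric']:.3f}"


def review(text, results):
    # Real system: an LLM reviewer following conference guidelines.
    # Toy rule: map the metric in [0, 1) onto a score in [1, 10).
    score = 1 + 9 * results["metric"]
    return score, f"score {score:.1f}"


def run_ai_scientist(n_iterations, seed=0):
    rng = random.Random(seed)
    archive = Archive()
    for _ in range(n_iterations):
        idea = generate_idea(archive, rng)
        results = run_experiment(idea, rng)
        text = write_paper(idea, results)
        score, feedback = review(text, results)
        # Each idea and its reviewer feedback is archived so later ideation can build on it.
        archive.papers.append(Paper(idea, results, text, score, feedback))
    return archive


archive = run_ai_scientist(n_iterations=20, seed=42)
accepted = archive.published(threshold=6.0)  # assumed threshold on a 1-10 scale
print(f"{len(accepted)} of {len(archive.papers)} papers cleared the threshold")
```

Keeping every attempt in the archive, and treating only high-scoring ones as "published", is one way to reconcile the two descriptions in the quote (all ideas plus feedback are archived; reviews select the best for "publication"). The real system's bookkeeping may differ.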
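The "65% vs. 66% balanced accuracy" figure quoted above compares the automated reviewer's accept/reject decisions with those of human reviewers. Balanced accuracy is the mean of the recall on each class, so for a binary decision it is (true positive rate + true negative rate) / 2. This matters because most conference submissions are rejected: a reviewer that rejects everything gets high plain accuracy but only 50% balanced accuracy. A minimal sketch for binary labels follows; the label lists are made-up toy data, not the paper's. (scikit-learn's `balanced_accuracy_score` computes the same quantity.)

```python
def balanced_accuracy(y_true, y_pred, positive="accept"):
    """Mean of recall on the positive class (TPR) and on the negative class (TNR)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Split predictions by the true class.
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    tpr = sum(p == positive for p in pos) / len(pos)  # accepted papers correctly accepted
    tnr = sum(p != positive for p in neg) / len(neg)  # rejected papers correctly rejected
    return (tpr + tnr) / 2


# Toy example: 3 accepted and 7 rejected papers, judged by a reviewer that always rejects.
human = ["accept"] * 3 + ["reject"] * 7
always_reject = ["reject"] * 10
print(balanced_accuracy(human, always_reject))  # 0.5, although plain accuracy is 0.7
```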

In this paper, we introduced The AI Scientist, the first framework designed to fully automate the scientific discovery process, and, as a first demonstration of its capabilities, applied it to machine learning itself. This end-to-end system leverages LLMs to autonomously generate research ideas, implement and execute experiments, search for related works, and produce comprehensive research papers. By integrating stages of ideation, experimentation, and iterative refinement, The AI Scientist aims to replicate the human scientific process in an automated and scalable manner.

Why does writing papers matter? Given our overarching goal to automate scientific discovery, why are we also motivated to have The AI Scientist write papers, like human scientists? For example, previous AI-enabled systems such as FunSearch (Romera-Paredes et al., 2024) and GNoME (Pyzer-Knapp et al., 2022) also conduct impressive scientific discovery in restricted domains, but they do not write papers. There are several reasons why we believe it is fundamentally important for The AI Scientist to write scientific papers to communicate its discoveries. First, writing papers offers a highly interpretable method for humans to benefit from what has been learned. Second, reviewing written papers within the framework of existing machine learning conferences enables us to standardize evaluation. Third, the scientific paper has been the primary medium for disseminating research findings since the dawn of modern science. Since a paper can use natural language, and include plots and code, it can flexibly describe any type of scientific study and discovery. Almost any other conceivable format is locked into a certain kind of data or type of science. Until a superior alternative emerges (or possibly invented by AI), we believe that training The AI Scientist to produce scientific papers is essential for its integration into the broader scientific community.

Costs. Our framework is remarkably versatile and effectively conducts research across various subfields of machine learning, including transformer-based language modeling, neural network learning dynamics, and diffusion modeling. The cost-effectiveness of the system, producing papers with potential conference relevance at an approximate cost of $15 per paper, highlights its ability to democratize research (increase its accessibility) and accelerate scientific progress. Preliminary qualitative analysis, for example in Section 5, suggests that the generated papers can be broadly informative and novel, or at least contain ideas worthy of future study. The actual compute we allocated for The AI Scientist to conduct its experiments in this work is also incredibly light by today’s standards. Notably, our experiments generating hundreds of papers were largely run only using a single 8×NVIDIA H100 node over the course of a week. Massively scaling the search and filtering would likely result in significantly higher-quality papers. In this project, the bulk of the cost for running The AI Scientist is associated with the LLM API costs for coding and paper writing. In contrast, the costs associated with running the LLM reviewer, as well as the computational expenses for conducting experiments, are negligible due to the constraints we’ve imposed to keep overall costs down. However, this cost breakdown may change in the future if The AI Scientist is applied to other scientific fields or used for larger-scale computational experiments.

Future Directions. Direct enhancements to The AI Scientist could include integrating vision capabilities for better plot and figure handling, incorporating human feedback and interaction to refine the AI’s outputs, and enabling The AI Scientist to automatically expand the scope of its experiments by pulling in new data and models from the internet, provided this can be done safely. Additionally, The AI Scientist could follow up on its best ideas or even perform research directly on its own code in a self-referential manner. Indeed, significant portions of the code for this project were written by Aider. Expanding the framework to other scientific domains could further amplify its impact, paving the way for a new era of automated scientific discovery. For example, by integrating these technologies with cloud robotics and automation in physical lab spaces (Arnold, 2022; Kehoe et al., 2015; Zucchelli et al., 2021) provided it can be done safely, The AI Scientist could perform experiments for biology, chemistry, and material sciences. Crucially, future work should address the reliability and hallucination concerns, potentially through a more in-depth automatic verification of the reported results. This could be done by directly linking code and experiments, or by seeing if an automated verifier can independently reproduce the results.

Conclusion. The introduction of The AI Scientist marks a significant step towards realizing the full potential of AI in scientific research. By automating the discovery process and incorporating an AI-driven review system, we open the door to endless possibilities for innovation and problem-solving in the most challenging areas of science and technology. Ultimately, we envision a fully AI-driven scientific ecosystem including not only AI-driven researchers but also reviewers, area chairs, and entire conferences. However, we do not believe the role of a human scientist will be diminished. We expect the role of scientists will change as we adapt to new technology, and move up the food chain. While the current iteration of The AI Scientist demonstrates a strong ability to innovate on top of well-established ideas, such as Diffusion Modeling or Transformers, it is an open question whether such systems can ultimately propose genuinely paradigm-shifting ideas. Will future versions of The AI Scientist be capable of proposing ideas as impactful as Diffusion Modeling, or come up with the next Transformer architecture? Will machines ultimately be able to invent concepts as fundamental as the artificial neural network, or information theory? We believe The AI Scientist will make a great companion to human scientists, but only time will tell to the extent to which the nature of human creativity and our moments of serendipitous innovation (Stanley and Lehman, 2015) can be replicated by an open-ended discovery process conducted by artificial agents.

Crucially, the generated paper and experimental artifacts The AI Scientist produces allow us to easily interpret and judge its findings post-hoc, allowing human scientists to also benefit from what is learned.

As I’ve reported previously, AI has made many breakthroughs in science, from the discovery of novel antibiotics to computationally predicting the structure and interactions of all of life’s molecules. However, what we also need are breakthroughs in the explainability, or interpretability, of AI’s work: to be most useful, it can’t just be a ‘black box’. This new ‘AI Scientist’, which reports its methods and results in papers that humans can read and check, is one step along that road.

However, we don’t simply want intelligent machines that churn out millions of mediocre papers. I would argue that the priority should be to create AI capable of accessing all current scientific papers and accurately extracting the information scientists need to advance their fields. Additionally, if we want to achieve greater scientific progress, we must employ AI at scale to read hundreds of millions of papers and detect plagiarism and fake data, a very real scourge in contemporary science. The obstacles here are not only political and commercial (paywalls) but also technological (reading papers accurately and avoiding hallucinations). Who knows what important data might be out there, waiting for a brilliant scientist to connect the dots across fields and, with the help of AI’s search power, generate genuinely paradigm-shifting ideas?

