
Conference: Scientific discovery in the age of AI

Experts from academia, industry and government discussed the growing utility of generative AI in science and what’s coming next.


The second Conference on Foundation Models and AI Agents for Science (SciFM25), organized by the Michigan Institute for Computational Discovery & Engineering (MICDE), brought together more than 500 scientists and engineers from U-M and beyond at the end of May. Attendees traded ideas on the foundations of generative AI, methods that can be applied now, avenues for improvement, ways to integrate AI with other computational tools, and the future of AI in scientific discovery.

“This conference has become the premier venue for in-depth discussions around the theory and applications of generative AI to scientific problems, with a focus on foundation models and agentic intelligence,” said Karthik Duraisamy, the Samir and Puja Kaul Director of MICDE and organizer of the conference. 

“I am particularly happy with the deep discussions on understanding and characterizing how AI works, and the many gaps that remain before these tools become reliable co-scientists. Also, these types of developments require large scale investments, resources and collaborations, and so this conference also serves as a venue to envision a national ecosystem for AI-augmented science,” he added.

Participants hailed from a variety of universities, national laboratories, government institutions and industry, and while many came to the field from computational science and AI backgrounds, others included practitioners who use computational tools to solve problems in their domains. Many speakers and panelists are highly influential in AI for science.

The panel sits at a long table, in front of which is a banner with the MICDE logo. Duraisamy stands, speaking.
Panelists from the left: Paul Kearns (Argonne), Bill Dally (NVIDIA), Helena Fu (DOE), Thomas Mason (LANL), Tanya Das (Bipartisan Policy Center). Duraisamy, behind the panel, moderates. PHOTO: Leisa Thompson, Michigan Photography

The growing utility of AI tools

The first day of the conference was dedicated to workshops and tutorials on how to leverage modern generative AI tools for mathematical reasoning, molecular design and computer-aided engineering. The workshops also covered the important topic of adapting large reasoning models and AI agents for specific scientific domains.

Venkat Viswanathan, an associate professor of aerospace engineering at U-M, presented a molecular foundation model that his team built for the purpose of identifying better battery electrolytes. He ran a live demo of that model linked up to a conventional large language model, asking it to design an electrolyte using molecules associated with scent. 

“Following the demo at the conference, there has been significant follow-on interest from industry in using our molecular foundation models for their use-cases,” said Viswanathan.

Over the next two days, presenters discussed using AI and machine learning tools to design experiments, develop mathematical proofs, predict weather, invent recipes, design medicines and other small molecules and proteins, and interpret genetic sequences to reveal encoded proteins and other biomolecules.

They also looked ahead to agentic AI, which is just beginning to take off. Just as data drove the first wave of generative AI, advances in reasoning are driving this phase, said Rick Stevens, associate laboratory director at Argonne National Laboratory. Newer large language models (LLMs) can cite sources for their facts, access tools like calculators and correct their hallucinations, among other developments.

A man stands with his arm upraised, a bar chart displayed on the large screen behind him, and auto transcription on a smaller screen.
Rick Stevens (Argonne) gives an overview of the state of AI technologies. PHOTO: Leisa Thompson, Michigan Photography

Stevens also reported that the first fully AI-generated study has been accepted to a conference workshop. In this study, a program called “The AI Scientist” invented a deep learning experiment, ran it, analyzed the results, and drafted and revised the paper. The conference had agreed to receive an all-AI paper and used a double-blind review process to ensure that the human reviewers wouldn’t be biased against the algorithmic author. While the approach to improving neural networks that The AI Scientist explored wasn’t successful, humans deemed the study worthy of discussion.

Agentic AI in science

Many speakers presented the concept of an AI orchestrator that would oversee the work of different agents tasked with the day-to-day of running experiments. In this situation, the human scientist becomes more akin to a research program manager, overseeing the work of multiple AI-powered groups.

The team of AI agents would include roles like experiment planning, scheduling, running experiments and analyzing the data. Because the needs are fairly universal, Ian Foster, director of Argonne National Laboratory’s Data Science and Learning Division, identified the design of AI agents that can be used across different disciplines as a future direction for the field.

Agents alone aren’t enough: to be useful, they need access to relevant data. Helena Fu, director of the Department of Energy’s Office of Critical and Emerging Technologies, said that a government priority is putting the enormous catalogues of data held within federal agencies into forms that foundation models can train on.

Likewise, Foster spoke of efforts to create a ‘living data fabric’ that provides one-stop access to all of the available data within Argonne, set up to be ingestible by foundation models and regularly updated. Of course, not all data is public, nor even accessible to all within a given national lab. Foster suggested a data fabric that limits access according to the user’s privileges. 
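The privilege-limited data fabric Foster described can be pictured as a single catalog that filters what each user sees by clearance level. The sketch below is a toy illustration of that idea only; the class, the numeric privilege scheme and the dataset names are assumptions for this example, not Argonne's actual design.

```python
class DataFabric:
    """Toy catalog that limits query results by a user's privilege level."""

    def __init__(self):
        self._records = []  # list of (min_privilege, dataset_name)

    def add(self, dataset_name, min_privilege=0):
        # Register a dataset with the minimum privilege needed to see it.
        self._records.append((min_privilege, dataset_name))

    def query(self, user_privilege):
        # Return only the datasets this user is cleared to access.
        return [name for level, name in self._records
                if user_privilege >= level]
```

A public user would see only open datasets, while a user with a higher privilege level would see the restricted ones as well, all through the same one-stop interface.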

For leveraging AI on classified nuclear data, Thomas Mason, director of Los Alamos National Laboratory, reported that the lab has installed an NVIDIA Grace Hopper machine loaded with OpenAI’s raw model weights. On this contained supercomputer, they are training the model to make predictions about nuclear physics.

Some speakers were bullish on the agentic AI future. Stevens predicted that the scientific workforce could be cut by 99% in a decade while maintaining today’s output, drawing an analogy with the mechanization of agriculture. Others were more doubtful that today’s generative AI is up to the task, with Surya Ganguli, senior fellow at the Stanford Institute for Human-Centered AI, noting that five specialized idiots don’t necessarily make a useful tool.

Duraisamy argued that while he believes the nature of reality, and hence science, is entirely computational, the community is very far from developing the right way to abstract that computation and carry it out affordably. Given the evolution of AI technology, he sketched out a plan for how to structure an AI system for scientific discovery that includes:

  • Reasoning capabilities, powered by a reasoning LLM
  • Domain-specific foundation models, such as chemical or protein models
  • Dynamic knowledge graphs, or regularly updated webs of facts and the relationships between them
  • Verification, covering data sources and verification methods, with input from human scientists
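The four components above can be sketched as a minimal pipeline. Everything below is a hypothetical illustration of how the pieces might fit together, with stubbed toy logic; none of the class or function names come from an actual framework.

```python
class ReasoningLLM:
    """Stands in for a reasoning model that proposes a hypothesis."""
    def propose(self, question):
        return f"hypothesis for: {question}"

class DomainModel:
    """Stands in for a domain-specific foundation model (e.g. chemistry)."""
    def evaluate(self, hypothesis):
        # Toy scoring; a real model would run simulation or inference here.
        return {"hypothesis": hypothesis, "score": 0.9}

class KnowledgeGraph:
    """A regularly updated web of facts (reduced here to a dict)."""
    def __init__(self):
        self.facts = {}
    def update(self, key, value):
        self.facts[key] = value

def verify(result, human_approved):
    """Verification gate: a quality check plus a human sign-off."""
    return result["score"] > 0.5 and human_approved

def discovery_step(question, llm, domain_model, graph, human_approved=True):
    hypothesis = llm.propose(question)           # reasoning LLM
    result = domain_model.evaluate(hypothesis)   # domain foundation model
    if verify(result, human_approved):           # verification + human input
        graph.update(question, result)           # knowledge graph update
        return result
    return None
```

The key design point the list implies is the verification gate: nothing enters the knowledge graph without passing both an automated check and the human scientist's approval.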

However long it takes for agentic AI to reach prime time, most presenters didn’t express concern about job losses in science.

“We’ve not found that people are out of work—we have a lot of work to do. But people are much more productive,” said Bill Dally, chief scientist at NVIDIA.

People mostly fill the amphitheatre, which has a capacity of 227. Near the front are speakers Markus Buehler (MIT), Surya Ganguli (Stanford), Lav Varshney (UIUC), and Venkat Viswanathan (U-M).
Speakers and attendees listen in the Rackham Amphitheatre. PHOTO: Leisa Thompson, Michigan Photography

Creativity, emergence and general intelligence

The speakers also dug into some of the deeper questions underpinning AI: How can machines trained to predict the average achieve creativity? Where do emergent abilities come from, such as translating between languages that were in an LLM’s training data but never directly compared? How do we get to AI that can truly think?

Lav Varshney, an associate professor of electrical and computer engineering at the University of Illinois Urbana-Champaign, presented a way of quantifying creativity. He noted a trade-off between the novelty of a proposed solution and its quality. For instance, he said that creative cuisine expands our idea of what food can be, but there are many combinations that don’t work well together.

Ganguli pointed out that to get the most useful creativity out of a foundation model, designers should tune it according to the number of guesses it is allowed. A model with more guesses can sample a wider variety of results, with more variation between them, whereas one with fewer guesses should keep its answers near the result it is most confident about.
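The guesses-versus-diversity trade-off can be illustrated with a toy sampler: one guess takes the single most probable answer, while a larger guess budget samples at a higher temperature to spread the answers out. The temperature schedule below is an assumption made for illustration, not Ganguli's actual prescription.

```python
import math
import random

def sample_answers(probs, n_guesses, rng):
    """Toy sampler over a dict of answer -> probability.

    With one guess, return the most probable answer. With more guesses,
    flatten the distribution (higher 'temperature') so the returned set
    is more diverse. The schedule 1 + 0.5*ln(n) is purely illustrative.
    """
    answers = list(probs)
    if n_guesses == 1:
        return [max(answers, key=probs.get)]
    temperature = 1.0 + 0.5 * math.log(n_guesses)  # widen with budget
    weights = [probs[a] ** (1.0 / temperature) for a in answers]
    return rng.choices(answers, weights=weights, k=n_guesses)
```

Raising the exponent's denominator flattens the weights, so lower-probability answers are drawn more often; this is the same mechanism as temperature scaling in LLM decoding.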

Machine learning models have always had the problem of being “black boxes” that don’t give away how they come to their predictions. Mario Krenn, a research group leader at the Max Planck Institute for the Science of Light, described using machine learning to come up with a design for a new experiment that eventually resulted in a Nature Photonics study.

Two women and a man discuss the project "Developing a foundation model for predicting material failure."
Poster session. PHOTO: Leisa Thompson, Michigan Photography

Krenn’s algorithm couldn’t explain the new conceptual link it had exploited to develop the experiment—that was something that he and his human collaborators had to work out while studying and talking over the experimental design. Researchers are now looking ahead to generative AI that uses symbolic reasoning, reflecting an understanding rather than brute-force number crunching to arrive at a probable answer.

This is part of why researchers are so interested in AI’s mathematical abilities, as several panelists raised, because applying mathematical principles requires symbolic reasoning. Conventional LLMs are basically giving math problems their best guess, and they can be linked up with calculators as external tools, but so far, they haven’t derived math for themselves. 

As LLMs with reasoning capabilities improve at math problems, some researchers even wonder if they’re truly grasping math or taking a toddler’s step toward a theory of mind. 

“One big question was whether they were being told not to use a calculator but doing it anyway and getting more sophisticated about lying,” said Rebecca Willett, a professor of statistics and computer science at the University of Chicago.

Some AI researchers are exploring mathematical proofs as well. Google DeepMind has been experimenting with writing in Lean, the language of theorem-verifying software, with the tools AlphaProof and AlphaGeometry 2. Taking on problems set for the International Mathematical Olympiad, the AI’s results were at the level of a silver medalist.

Role for public science

Even with this progress, how to arrive at symbolic reasoning remains an open question. Given the way new capabilities emerge as models grow, building larger models could eventually enable it. However, government funding agencies would be hard pressed to outstrip the billions that private companies are pouring into the scaling strategy. Even if they could, most panelists agreed that this is not the course the scientific community should take. For one, we don’t know how big models will need to get before the next step change in capability arises.

Two men and a woman discuss the project "Deep learning for pulsar parameter estimation from light curves."
Poster session. PHOTO: Dieu-Nalio Chery, Michigan Photography

For another, they recognized that AI research is currently in a bubble. Qiaozhu Mei, associate dean for research and innovation at U-M’s School of Information, pointed out that while many studies are being published, they’re all clustered around the architectures of the foundation models of today rather than imagining different architectures that could bring the next big advance.

“If I had a trillion dollars, I would run a grant program that allows people to do anything except what everyone is doing right now,” said Ganguli.

The more limited funding from taxpayers should go toward exploring the areas of the landscape that are currently ignored in the race for bigger machines, panelists said. Varshney noted that biology hasn’t settled on a single brain architecture. He called on public institutions to support as wide a variety of ideas as possible, ensuring that, “a thousand flowers can bloom.”

After two initial years at U-M, the Conference on Foundation Models and AI Agents for Science will be hosted by the University of Chicago and Argonne National Laboratory next year.

SciFM25 was organized by MICDE and sponsored by the Los Alamos National Laboratory, the Trillion Parameter Consortium, Amazon Web Services, NVIDIA, OpenAI, Geminus and Donaldson Filtration Solutions.

Geminus was founded by Duraisamy. He and U-M have a financial interest in the company.