A headshot of a man in a blazer standing, smiling outside of a brick, academic building.

Large language models and research progress: A Q&A with Ricardo Vinuesa

Guidelines for responsible LLM use to help, not hinder, research progress.

The rapid expansion of the capabilities of large language models (LLMs), including web search, code execution, data analysis and even hypothesis generation and experimental design, is outpacing critical reflection on how the technology fits into academic research.

Ricardo Vinuesa, an associate professor of aerospace engineering at U-M, shares his perspectives on ways to ethically integrate large language models (LLMs) into the research process. Credit: Ricardo Vinuesa personal archive.

This is the argument that Ricardo Vinuesa, an associate professor of aerospace engineering at U-M, and his co-authors recently made in the journal The Innovation.

The authors acknowledge that LLMs are helpful for generating a speedy first draft, but they argue that adopting LLMs into every stage of the research process without proper guardrails creates a risk of misconduct, such as data fabrication or biased experimental design. In a 2024 Nature survey, 81% of researchers reported they had used LLMs in their work, highlighting a need for common guidelines for responsible use.

Vinuesa shares his perspectives on the best practices for LLM use in research—acknowledging both LLMs’ strengths and current limitations.

How are LLMs helping the research process?

Systems that produce text are accelerating the production of initial versions of research proposals, methods documentation or progress reports. They can also help automate repetitive tasks, like producing a computational mesh for a particular simulation. 

More recently, AI co-scientists, which are multi-agent systems meant to act as virtual collaborators, are starting to be helpful in deciding which experiments should be conducted to investigate certain phenomena. For example, they proved helpful in recent studies about gene transfer mechanisms and targeted drugs for liver fibrosis.

What is the perception of LLMs within the scientific community?

There’s a very broad range of opinions at the moment, and it seems to change almost by the day. On one extreme, some people really don’t trust these systems, and on the other, people over-trust them.

Over-trust happens when people assign human properties to LLMs, assuming the output is like a person thinking and telling you their opinion. In reality, that isn’t how LLMs work. They are just predicting the next token, meaning the next piece of a sentence.

The biggest benefit lies in the middle of these two extremes. Users should understand that LLMs are tools that can accelerate certain tasks at the moment, but recognize that there is a lot of room for improvement.
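As a rough illustration of the next-token prediction described above, the sketch below builds a toy word-level model that counts which word most often follows another. This is only an assumption-laden analogy: real LLMs use neural networks over subword tokens and sample from a probability distribution, and the function names and example corpus here are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, chosen only to make the counts easy to follow.
corpus = "the model predicts the next token and the next token follows the prompt"
model = train_bigram(corpus)
print(predict_next(model, "next"))  # prints "token"
```

The toy model has no notion of opinion or intent. It returns whichever continuation was most frequent in its training text, which is the point being made about over-trust.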

Could LLMs help with scientific breakthroughs?

A lot of great breakthroughs in physics, like relativity and quantum physics, were huge departures from existing ways of thinking. People who were thinking outside of the box patiently developed these ideas over years until the community accepted them.

LLMs are not trained to make leaps like this. Instead, they are programmed to be compliant with what the human will expect. If you want creative thinking that could lead to breakthroughs, you need to think differently. Future systems could have this capability, but it’s not there yet. 

While LLMs can be helpful with certain tasks, we should still promote creativity and novel thinking. Otherwise, scientific progress may stagnate. 

What safeguards can help scientists use LLMs responsibly?

In the context of LLMs, there’s an interesting question about whether the prompting interaction between the human and the AI system should also be preserved as part of a published data set as evidence of how the interaction happened. 

It’s not a common practice yet, but it seems like a good one for transparency. For example, if an LLM was used to write the discussion section of a paper, the preserved prompts help others understand how much human input there was in the prompting stage to get the end result. Publishing LLM interactions used for research would align with FAIR data principles, which emphasize findability, accessibility, interoperability, and reuse of digital assets. These principles promote open-access, reproducible data sets that accelerate research progress.
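One way such a record might look in practice is sketched below: each prompt and response is stored with the model name, the purpose and a timestamp, then exported as JSON alongside a data set. The field names and helper functions are hypothetical and are not drawn from any journal or FAIR specification. This is a minimal sketch, not a recommended schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class PromptRecord:
    """One human-LLM exchange; all field names are illustrative assumptions."""
    model_name: str
    purpose: str
    prompt: str
    response: str
    timestamp: str


def make_record(model_name, purpose, prompt, response):
    """Create a record stamped with the current UTC time in ISO 8601 format."""
    stamp = datetime.now(timezone.utc).isoformat()
    return PromptRecord(model_name, purpose, prompt, response, stamp)


def to_json(records):
    """Serialize a list of records so they can be archived with a data set."""
    return json.dumps([asdict(r) for r in records], indent=2)


# Hypothetical example values, for illustration only.
record = make_record(
    model_name="example-llm",
    purpose="draft discussion section",
    prompt="Summarize the main trends in the attached results.",
    response="(model output would be stored here)",
)
print(to_json([record]))
```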

When submitting and evaluating grant proposals, we should not remove humans from the loop. While LLMs can help speed up the writing process or summarize ideas during review, we want to avoid a situation where LLMs alone write and evaluate proposals. This would create a cycle where we are stuck with the same information being fed back and forth between LLMs.

Another key idea is explainability. Tools for true explainability are critical for users to trust outputs and detect any possible external manipulation. If users can understand how LLMs are doing what they’re doing, they can align the model with preferred behaviors, like ways of reasoning or depth and breadth of literature exploration.

How should LLMs be credited in research papers?

Many journals claim that to be a credited author of a study, you need to be accountable for any ethical implications the article may have. For example, if data was falsified, the authors are responsible. Because an LLM system is not an ethical entity, it cannot be listed as a co-author of a study.

Many journals are beginning to include a section where authors can outline which LLMs were used and how. Were LLMs used to polish the grammar or to generate full sections of text? If I knew the results were produced entirely by a human but the discussion section was written by an LLM alone, I would assume there was not a lot of creative and intuitive interpretation of the results at that stage. It’s important for your reader to have the context.

Who should be involved in creating best practice guidelines?

Because LLMs will affect everyone in science, creating these guidelines requires a global dialogue. There should be significant input from the scientific community. In particular, machine learning researchers should be a part of the process as they know best how the systems they developed work. The scientific community will be producing proposals, papers and peer reviews, and their input on how we should and shouldn’t use LLMs will help ensure the quality and innovation of new developments.

What LLM improvements are you looking forward to?

There is a lot of potential for improving LLMs if we are able to ground them in physical principles. If you’re working with an agentic AI system in the long term, it would help to have physical laws be a part of how models work instead of statistically drawing tokens from a database. That way, the causal links are preserved when you are using the output of an LLM for a discovery.

Overall, we should not remove the human from the loop, because creativity is key for the new ideas and breakthrough thinking that we need.