
Large Language Models Outperform Students in Scientific Informatics Course

New study raises concerns over the future of student assessment


William Hersh, M.D., who has taught generations of medical and clinical informatics students at Oregon Health & Science University, found himself curious about the growing influence of artificial intelligence. He wondered how AI would perform in his own class.

So, he decided to try an experiment.

He tested six generative large language AI models, such as ChatGPT, in an online version of his popular introductory course in biomedical and health informatics to see how they performed compared with living, thinking students. A study published in the journal npj Digital Medicine revealed the answer: better than as many as three-quarters of his human students.

“This does raise concern about cheating, but there is a larger issue here,” Hersh said. “How do we know that our students are actually learning and mastering the knowledge and skills they need for their future professional work?”

As a professor of medical informatics and clinical epidemiology in the OHSU School of Medicine, Hersh is especially attuned to new technologies. The role of technology in education is nothing new, Hersh said, recalling his own experience as a high school student in the 1970s during the transition from slide rules to calculators.

Yet, the shift to generative AI represents an exponential leap forward.

“Clearly, everyone should have some kind of foundation of knowledge in their field,” Hersh said. “What is the foundation of knowledge you expect people to have to be able to think critically?”

Large-language models

Hersh and co-author Kate Fultz Hollis, an OHSU informatician, pulled the knowledge assessment scores of 139 students who took the introductory course in biomedical and health informatics in 2023. They prompted six generative AI large language models with student assessment materials from the course. Depending on the model, the AI scored in the 50th to 75th percentile on the multiple-choice questions used in quizzes and on a final exam that required short written responses.

“The results of this study raise significant questions for the future of student assessment in most, if not all, academic disciplines,” the authors write.

The study is the first to compare large language models with students across a full academic course in the biomedical field. Hersh and Fultz Hollis noted that a knowledge-based course such as this one may be especially susceptible to generative large language models, in contrast to more participatory academic courses that help students develop more complex skills and abilities.

Hersh remembers his experience in medical school.

“When I was a medical student, one of my attending physicians told me I needed to have all the knowledge in my head,” he said. “Even in the 1980s, that was a stretch. The knowledge base of medicine has long surpassed the capacity of the human brain to memorize it all.”

Maintaining the human touch

Yet, he believes there’s a fine line between making sensible use of technical resources to advance learning and over-reliance to the point that it inhibits learning. Ultimately, the goal of an academic health center like OHSU is to educate health care professionals capable of caring for patients and optimizing the use of data and information about them in the real world.

In that sense, he said, medicine will always require the human touch.

“There are a lot of things that health care professionals do that are pretty straightforward, but there are those instances where it gets more complicated and you have to make judgment calls,” he said. “That’s when it helps to have that broader perspective, without necessarily needing to have every last fact in your brain.”

With fall classes starting soon, Hersh said he’s not worried about cheating.

“I update the course each year,” he said. “In any scientific field, there are new advancements all the time, and large language models aren’t necessarily up to date on all of it. This just means we’ll have to look at newer or more nuanced tests where you won’t get the answer out of ChatGPT.”

Note: This news release was originally published on the Oregon Health & Science University website. As it has been republished, it may deviate from our style guide.

About the Author

  • Erik Robinson is the senior media relations specialist for the Oregon Health & Science University.
