AI-Generated Academic Science Writing Can Be Identified with High Accuracy

The debut of the artificial intelligence chatbot ChatGPT has set the world abuzz with its ability to churn out human-like text and conversations. Still, many telltale signs can help us distinguish AI chatbots from humans, according to a study published in the journal Cell Reports Physical Science. Based on these signs, the researchers developed a tool to identify AI-generated academic science writing with over 99 percent accuracy.

“We tried hard to create an accessible method so that with little guidance, even high school students could build an AI detector for different types of writing,” says first author Heather Desaire, a professor at the University of Kansas. “There is a need to address AI writing, and people don’t need a computer science degree to contribute to this field.”

“Right now, there are some pretty glaring problems with AI writing,” says Desaire. “One of the biggest problems is that it assembles text from many sources and there isn’t any kind of accuracy check; it’s kind of like the game Two Truths and a Lie.”

Although many AI text detectors are available online and perform fairly well, they weren’t built specifically for academic writing. To fill the gap, the team set out to build a better-performing tool for precisely this purpose. They focused on a type of article called a perspective, in which scientists provide an overview of a specific research topic. The team selected 64 perspectives and created 128 ChatGPT-generated articles on the same research topics to train the model. When they compared the articles, they found an indicator of AI writing: predictability.

Compared with AI, humans write with more complex paragraph structures, varying the number of sentences and total words per paragraph, and their sentence lengths fluctuate more. Preferences in punctuation marks and vocabulary are also a giveaway. For example, scientists gravitate towards words like "however," "but," and "although," while ChatGPT often uses "others" and "researchers" in writing. In all, the team tallied 20 characteristics for the model to look out for.
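To make these signals concrete, here is a minimal Python sketch of how a few of them might be measured for a single paragraph. It is an illustration only, not the team's actual feature set or code: the function names, the naive sentence splitter, the choice of punctuation marks, and the two word lists (taken solely from the examples quoted above) are all assumptions.

```python
import re
import statistics

# Hypothetical word lists, built only from the examples quoted in the article.
HUMAN_LEANING = ("however", "but", "although")
AI_LEANING = ("others", "researchers")


def split_sentences(paragraph):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def paragraph_features(paragraph):
    """Return a few illustrative stylistic features for one paragraph."""
    sentences = split_sentences(paragraph)
    lengths = [len(s.split()) for s in sentences]  # words per sentence
    words = re.findall(r"[a-z']+", paragraph.lower())
    return {
        # Paragraph complexity: sentence and word counts.
        "n_sentences": len(sentences),
        "n_words": len(words),
        # Fluctuation in sentence length (population standard deviation).
        "sentence_length_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        # Punctuation preference; which marks to count is an assumption.
        "n_semicolon_colon_question": sum(paragraph.count(ch) for ch in ";:?"),
        # Vocabulary cues from the article's examples.
        "has_human_leaning_word": any(w in HUMAN_LEANING for w in words),
        "has_ai_leaning_word": any(w in AI_LEANING for w in words),
    }


if __name__ == "__main__":
    sample = (
        "However, the results vary. We saw a large effect in one small "
        "cohort, although it faded quickly. Why?"
    )
    for name, value in paragraph_features(sample).items():
        print(f"{name}: {value}")
```

Feature values like these, computed for labeled human and ChatGPT paragraphs, could then be passed to an off-the-shelf classifier; which classifier to use is left open here.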

When tested, the model achieved 100 percent accuracy at separating AI-generated full perspective articles from those written by humans. For identifying individual paragraphs within an article, the model had an accuracy rate of 92 percent. The team's model also outperformed an AI text detector already on the market by a wide margin on similar tests.

Next, the team plans to determine the scope of the model's applicability. They want to test it on more extensive datasets and across different types of academic science writing. As AI chatbots advance and become more sophisticated, the researchers also want to know whether their model will hold up.

"The first thing people want to know when they hear about the research is 'Can I use this to tell if my students actually wrote their paper?'" says Desaire. While the model is highly skilled at distinguishing between AI and scientists, Desaire says it was not designed to help educators catch AI-generated student essays. However, she notes that people can easily replicate the team's methods to build models for their own purposes.