Parallel Corpora - AI-Generated versus Human-Written Text

Student Data Science Project

ChatGPT has reached the point where some people argue that A.I.-generated text is indistinguishable from human-written text. A group of UBC Master of Data Science Computational Linguistics students (Jialiang (Justin) Ren, Francesco Strafforello, Mingjia Mao, and Yu Tian Shen) wanted to use their Advanced Corpus Linguistics project to show that there is a difference between the two.

For this project, the students focused on formal English text divided into two types: human-written scripts and A.I.-generated scripts. The human-written scripts are plain-text passages, at least one paragraph long, taken from Wikipedia articles on a variety of topics contributed by real people. The A.I.-generated scripts were produced by having ChatGPT rephrase those same paragraphs.

The team built a web interface for browsing their corpus. Its corpus browser let users view the human and A.I. scripts side by side. On the same page, a comparison function checked the two texts at every index and highlighted where they differed.
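As an illustration only, an index-by-index comparison like the one described above can be sketched as follows. This assumes the texts are compared word by word on whitespace-separated tokens and that differences are marked by wrapping them in `**`; the function name `highlight_differences` and the marker are hypothetical, since the source does not describe how the team tokenized or rendered the highlights.

```python
from itertools import zip_longest


def highlight_differences(human: str, ai: str, marker: str = "**") -> tuple[str, str]:
    """Compare two texts word by word at each index and wrap mismatches in `marker`.

    Assumption: words are whitespace-separated tokens; the project's actual
    tokenization and highlighting style are not described in the source.
    """
    human_words = human.split()
    ai_words = ai.split()
    human_out: list[str] = []
    ai_out: list[str] = []
    # zip_longest pads the shorter text with None, so extra words are flagged too
    for h, a in zip_longest(human_words, ai_words):
        if h == a:
            human_out.append(h)
            ai_out.append(a)
            continue
        if h is not None:
            human_out.append(f"{marker}{h}{marker}")
        if a is not None:
            ai_out.append(f"{marker}{a}{marker}")
    return " ".join(human_out), " ".join(ai_out)


if __name__ == "__main__":
    human_text = "The river flows north through the valley"
    ai_text = "The river runs north across the valley"
    left, right = highlight_differences(human_text, ai_text)
    print(left)   # The river **flows** north **through** the valley
    print(right)  # The river **runs** north **across** the valley
```

A strictly positional comparison is simple, but a single inserted word shifts every later index and flags the rest of the paragraph; an alignment-based approach such as Python's `difflib.SequenceMatcher` would avoid that, at the cost of no longer comparing "at every index".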

Figure: AI versus Human Dashboard (MDS Computational Linguistics)

Another feature is a guessing game that randomly selects a text from the corpus and displays it to the user. The user reads the text and decides whether it was written by A.I. or a human. After clicking the “A.I.” or “Human” button, the system tells them whether they were correct.
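The game loop can be sketched roughly like this: pick a random entry, hide its source, compare the user's button choice against that source, and tally accuracy over several rounds. This is a minimal sketch under stated assumptions: the corpus is modeled as a hypothetical list of `Entry` records labeled `"A.I."` or `"Human"`, and the helper names `draw_entry`, `check_guess`, and `accuracy` are illustrative, not taken from the team's code.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    """One corpus item (hypothetical structure): the text plus its hidden source label."""
    text: str
    source: str  # assumed to be either "A.I." or "Human"


def draw_entry(corpus: list[Entry], rng: random.Random) -> Entry:
    """Randomly pick one entry to show the user; only `text` would be displayed."""
    return rng.choice(corpus)


def check_guess(entry: Entry, guess: str) -> bool:
    """Return True if the clicked button ("A.I." or "Human") matches the hidden source."""
    return guess == entry.source


def accuracy(results: list[bool]) -> float:
    """Share of correct guesses, e.g. 10 correct out of 20 gives 0.5."""
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Placeholder corpus; the real corpus pairs Wikipedia paragraphs with ChatGPT rephrasings.
    corpus = [
        Entry("Original Wikipedia paragraph (placeholder)", "Human"),
        Entry("ChatGPT rephrasing of the same paragraph (placeholder)", "A.I."),
    ]
    rng = random.Random(0)
    entry = draw_entry(corpus, rng)
    print(entry.text)
    guess = "Human"  # stands in for the user clicking the "Human" button
    print("Correct!" if check_guess(entry, guess) else "Incorrect.")
```

Keeping the source label on the entry but displaying only the text mirrors the game's design of hiding where each paragraph came from until the user commits to an answer.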

The team used a prototype of this game to draw 20 paragraphs from the corpus with their sources hidden. They checked their answers at the end of each paragraph to see how accurate they could be. Group members with a computer science background achieved 50% accuracy, while the team member with a linguistics background reached a 30% accuracy rate.

This got the group thinking about whether a person’s background influences how they judge a text as human- or A.I.-written. They also found that A.I. is as confident as humans on long syntactic dependencies, and sometimes even more confident than humans.
