Thursday, September 26, 2024

New study reveals impact of ChatGPT on public knowledge sharing

ChatGPT led to a 25% drop in activity on Stack Overflow. There are major implications for AI's future, according to the study's authors

Complexity Science Hub

An extended time series of weekly posts to Stack Overflow. The figure marks the release of ChatGPT and the end of the data used in the statistical analyses. After May 2023, the decline in posting activity continues, albeit at a slower rate.
Credit: Maria del Rio-Chanona, Nadzeya Laurentsyeva, Johannes Wachs

[Vienna, September 25, 2024] A new study published in PNAS Nexus reveals that the widespread adoption of large language models (LLMs), such as ChatGPT, has led to a significant decline in public knowledge sharing on platforms like Stack Overflow. The study highlights a 25% reduction in user activity on the popular programming Q&A site within six months of ChatGPT's release, relative to similar platforms where access to ChatGPT is restricted.

“LLMs are so powerful, have such a high value, and make a huge impact on the world. One begins to wonder about their future,” says first author Maria del Rio-Chanona, an associate faculty member at the Complexity Science Hub (CSH).

“Our study hypothesized that instead of posting questions and receiving answers on public platforms like Stack Overflow, where everybody can see them and learn from them, people are asking privately on ChatGPT instead. However, LLMs like ChatGPT are also trained on this open and public data, which they are replacing in some way. So what's going to happen?” adds Del Rio-Chanona, who is also an assistant professor at University College London and an associate researcher at the Institute for New Economic Thinking at the Oxford Martin School and the Bennett Institute for Public Policy, University of Cambridge.

Implications are Major

“In our findings, we noticed less and less questions and answers on Stack Overflow after ChatGPT was released. This has quite big implications. This means there may not be enough public data to train models in the future,” warns Del Rio-Chanona. In this study, she worked with Nadzeya Laurentsyeva of Ludwig Maximilian University of Munich and Johannes Wachs, a faculty member at CSH and professor at Corvinus University in Budapest.

“Stack Overflow is an immensely valuable knowledge database accessible to anyone with an internet connection. People all over the world learn from questions and answers that other people post,” says Wachs. In fact, even AI models like ChatGPT are trained on human-generated content like Stack Overflow posts. Ironically, the displacement of human content creation by AI will make it more difficult to train future AI models. Models trained on AI-generated data are generally thought to perform poorly, a process likened to making a photocopy of a photocopy.

A Shift from Public to Private

The findings also point to consequences that go beyond technological change and touch the fabric of our economic and social structures. Users may become less inclined to contribute to open knowledge platforms as they interact more with LLMs like ChatGPT, resulting in valuable data being transferred from public repositories to privately owned AI systems, explain Del Rio-Chanona and colleagues.

“This represents a significant shift of knowledge from public to private domains,” argue the researchers. According to them, this could also deepen the competitive advantage of early movers in AI, further concentrating knowledge and economic power.

All experience and quality levels

Del Rio-Chanona and her colleagues found that the decline in content creation on Stack Overflow affected users of all experience levels, from novices to experts. They also observed that the quality of posts did not decrease significantly, as measured by user feedback, indicating that both low and high quality contributions are being displaced by LLMs.

In addition, the study showed that posting activity in some programming languages, such as Python and JavaScript, dropped significantly more than the platform’s average. “The results suggest that people are indeed asking questions about Python and JavaScript, two of the most commonly used programming languages, on ChatGPT rather than Stack Overflow,” says Del Rio-Chanona.

About the Study

This research, titled "Large Language Models Reduce Public Knowledge Sharing on Online Q&A Platforms," by R. Maria del Rio-Chanona, Nadzeya Laurentsyeva, and Johannes Wachs, was published in PNAS Nexus and is available online.

About CSH

The Complexity Science Hub (CSH) is Europe’s research center for the study of complex systems. We derive meaning from data from a range of disciplines — economics, medicine, ecology, and the social sciences — as a basis for actionable solutions for a better world. Established in 2015, we have grown to over 70 researchers, driven by the increasing demand to gain a genuine understanding of the networks that underlie society, from healthcare to supply chains. Through our complexity science approaches linking physics, mathematics, and computational modeling with data and network science, we develop the capacity to address today’s and tomorrow’s challenges.

Journal: PNAS Nexus

DOI: 10.1093/pnasnexus/pgae400

Method of Research: Data/statistical analysis

Subject of Research: People

Article Title: Large language models reduce public knowledge sharing on online Q&A platforms
