BAD A.I.
On the trail of deepfakes, Drexel researchers identify ‘fingerprints’ of AI-generated video
Machine-learning approach could be key to mitigating AI-driven misinformation
In February, OpenAI released videos created by its generative artificial intelligence program Sora. The strikingly realistic content, produced via simple text prompts, is the latest breakthrough for companies demonstrating the capabilities of AI technology. It also raised concerns about generative AI’s potential to enable the creation of misleading and deceiving content on a massive scale. According to new research from Drexel University, current methods for detecting manipulated digital media will not be effective against AI-generated video; but a machine-learning approach could be the key to unmasking these synthetic creations.
In a paper accepted for presentation at the IEEE Computer Vision and Pattern Recognition Conference in June, researchers from the Multimedia and Information Security Lab (MISL) in Drexel’s College of Engineering explained that while existing synthetic image detection technology has so far failed to spot AI-generated video, they’ve had success with a machine learning algorithm that can be trained to extract and recognize digital “fingerprints” of many different video generators, such as Stable Video Diffusion, VideoCrafter and CogVideo. Additionally, they have shown that this algorithm can learn to detect new AI generators after studying just a few examples of their videos.
“It’s more than a bit unnerving that this video technology could be released before there is a good system for detecting fakes created by bad actors,” said Matthew Stamm, PhD, an associate professor in Drexel’s College of Engineering and director of the MISL. “Responsible companies will do their best to embed identifiers and watermarks, but once the technology is publicly available, people who want to use it for deception will find a way. That’s why we’re working to stay ahead of them by developing the technology to identify synthetic videos from patterns and traits that are endemic to the media.”
Deepfake Detectives
Stamm’s lab has been active in efforts to flag digitally manipulated images and videos for more than a decade, but the group has been particularly busy in the last year, as editing technology is being used to spread political misinformation.
Until recently, these manipulations have been the product of photo and video editing programs that add, remove or shift pixels; or slow, speed up or clip out video frames. Each of these edits leaves a unique digital breadcrumb trail and Stamm’s lab has developed a suite of tools calibrated to find and follow them.
The lab’s tools use a sophisticated machine learning program called a constrained neural network. This algorithm can learn, in ways similar to the human brain, what is “normal” and what is “unusual” at the sub-pixel level of images and videos, rather than searching for specific predetermined identifiers of manipulation from the outset. This makes the program adept both at identifying deepfakes from known sources and at spotting those created by a previously unknown program.
The neural network is typically trained on hundreds or thousands of examples to get a very good feel for the difference between unedited media and something that has been manipulated — this can be anything from variation between adjacent pixels, to the order of spacing of frames in a video, to the size and compression of the files themselves.
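To make the idea of learning what is “normal” between adjacent pixels more concrete, here is a minimal sketch of one way a constrained filter can work. The article does not spell out the constraint, so the sketch assumes a prediction-error formulation: the filter's center weight is fixed at -1 and the surrounding weights are scaled to sum to 1, so the output is the difference between a pixel and a weighted prediction from its neighbors. The function names `constrain` and `residual` are hypothetical, and this is an illustration rather than the lab's actual code.

```python
from typing import List

Kernel = List[List[float]]
Image = List[List[float]]


def constrain(kernel: Kernel) -> Kernel:
    """Project a square, odd-sized kernel onto the assumed constraint:
    center weight = -1, all other weights rescaled to sum to 1."""
    k = len(kernel)
    c = k // 2
    off_center_sum = sum(
        kernel[i][j] for i in range(k) for j in range(k) if (i, j) != (c, c)
    )
    if off_center_sum == 0:
        raise ValueError("off-center weights must not sum to zero")
    out = [[kernel[i][j] / off_center_sum for j in range(k)] for i in range(k)]
    out[c][c] = -1.0
    return out


def residual(image: Image, kernel: Kernel) -> Image:
    """Slide the kernel over the image (no padding) and return, for each
    position, the neighbor-based prediction minus the actual center pixel."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(k) for b in range(k))
            for j in range(width - k + 1)
        ]
        for i in range(height - k + 1)
    ]


if __name__ == "__main__":
    # Arbitrary starting weights; in a real network these would be learned.
    kernel = constrain([[1, 2, 1], [2, 9, 2], [1, 2, 1]])
    flat = [[7.0] * 5 for _ in range(5)]
    print(residual(flat, kernel))  # values at or very near zero
```

Because the filter removes whatever its neighbors can predict, smooth scene content largely cancels out and what remains is the fine-grained residual where processing traces would be expected to show up.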
A New Challenge
“When you make an image, the physical and algorithmic processing in your camera introduces relationships between various pixel values that are very different than the pixel values if you photoshop or AI-generate an image,” Stamm said. “But recently we’ve seen text-to-video generators, like Sora, that can make some pretty impressive videos. And those pose a completely new challenge because they have not been produced by a camera or photoshopped.”
Last year, a campaign ad circulating in support of Florida Gov. Ron DeSantis that appeared to show former President Donald Trump embracing and kissing Anthony Fauci was the first to use generative-AI technology. This means the video was not edited or spliced together from others; rather, it was created whole-cloth by an AI program.
And if there is no editing, Stamm notes, then the standard clues do not exist — which poses a unique problem for detection.
“Until now, forensic detection programs have been effective against edited videos by simply treating them as a series of images and applying the same detection process,” Stamm said. “But with AI-generated video, there is no evidence of image manipulation frame-to-frame, so for a detection program to be effective it will need to be able to identify new traces left behind by the way generative-AI programs construct their videos.”
In the study, the team tested 11 publicly available synthetic image detectors. Each of these programs was highly effective, with at least 90% accuracy, at identifying manipulated images. But their performance dropped by 20% to 30% when they were asked to identify videos created by the publicly available AI generators Luma, VideoCrafter-v1, CogVideo and Stable Video Diffusion.
“These results clearly show that synthetic image detectors experience substantial difficulty detecting synthetic videos,” they wrote. “This finding holds consistent across multiple different detector architectures, as well as when detectors are pretrained by others or retrained using our dataset.”
A Trusted Approach
The team speculated that convolutional neural network-based detectors, like its MISLnet algorithm, could be successful against synthetic video because the program is designed to constantly shift its learning as it encounters new examples. By doing this, it’s possible to recognize new forensic traces as they evolve. Over the last several years, the team has demonstrated MISLnet’s acuity at spotting images that had been manipulated using new editing programs, including AI tools — so testing it against synthetic video was a natural step.
“We’ve used CNN algorithms to detect manipulated images and video and audio deepfakes with reliable success,” said Tai D. Nguyen, a doctoral student in MISL, who was a coauthor of the paper. “Due to their ability to adapt with small amounts of new information we thought they could be an effective solution for identifying AI-generated synthetic videos as well.”
For the test, the group trained eight CNN detectors, including MISLnet, with the same test dataset used to train the image detectors, which included real videos and AI-generated videos produced by the four publicly available programs. Then they tested the programs against a set of videos that included a number created by generative AI programs that are not yet publicly available: Sora, Pika and VideoCrafter-v2.
By analyzing a small portion (a patch) of a single frame from each video, the CNN detectors were able to learn what a synthetic video looks like at a granular level and apply that knowledge to the new set of videos. Each program was more than 93% effective at identifying the synthetic videos, with MISLnet performing the best, at 98.3%.
The programs were slightly more effective when analyzing the entire video, by pulling out a random sampling of a few dozen patches from various frames of the video and using those as a mini training set to learn the characteristics of the new video. Using a set of 80 patches, the programs were between 95% and 98% accurate.
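The step from single-patch to whole-video analysis can be sketched roughly as follows. This is a simplified illustration, not the paper's procedure: it assumes that each patch is scored independently and that the scores are simply averaged, whereas the article describes the patches as a "mini training set." The names `sample_patches`, `video_score` and `toy_detector` are hypothetical, and `toy_detector` is only a placeholder for a trained CNN.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Frame = List[List[float]]   # one grayscale frame as rows of pixel values
Patch = List[List[float]]


def sample_patches(frames: Sequence[Frame], n_patches: int, size: int,
                   rng: random.Random) -> List[Patch]:
    """Cut square patches at random positions from randomly chosen frames."""
    patches = []
    for _ in range(n_patches):
        frame = rng.choice(frames)
        top = rng.randrange(len(frame) - size + 1)
        left = rng.randrange(len(frame[0]) - size + 1)
        patches.append([row[left:left + size] for row in frame[top:top + size]])
    return patches


def video_score(frames: Sequence[Frame],
                patch_detector: Callable[[Patch], float],
                n_patches: int = 80, size: int = 8, seed: int = 0) -> float:
    """Average the per-patch scores (assumed fusion rule) into one video score."""
    rng = random.Random(seed)
    patches = sample_patches(frames, n_patches, size, rng)
    return mean([patch_detector(p) for p in patches])


def toy_detector(patch: Patch) -> float:
    """Placeholder for a trained detector: the fraction of horizontally
    adjacent pixel pairs that are exactly equal. Not a real forensic cue."""
    pairs = [(row[i], row[i + 1]) for row in patch for i in range(len(row) - 1)]
    return sum(1 for a, b in pairs if a == b) / len(pairs)


if __name__ == "__main__":
    noise = random.Random(1)
    noisy_video = [[[noise.random() for _ in range(16)] for _ in range(16)]
                   for _ in range(3)]
    flat_video = [[[0.5] * 16 for _ in range(16)] for _ in range(3)]
    print(video_score(noisy_video, toy_detector))
    print(video_score(flat_video, toy_detector))
```

Pooling many patches tends to smooth out the occasional misleading patch, which is consistent with the modest accuracy gain the researchers reported for whole-video analysis.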
With a bit of additional training, the programs were also more than 90% accurate at identifying the program that was used to create the videos, which the team suggests is because of the unique, proprietary approach each program uses to produce a video.
“Videos are generated using a wide variety of strategies and generator architectures,” the researchers wrote. “Since each technique imparts significant traces, this makes it much easier for networks to accurately discriminate between each generator.”
A Quick Study
The programs struggled to detect a completely new generator when they had not previously been exposed to at least a small amount of video from it. With a small amount of fine-tuning, however, MISLnet could quickly learn to make the identification at 98% accuracy. This strategy, called “few-shot learning,” is an important capability because new AI technology is being created every day, so detection programs must be agile enough to adapt with minimal training.
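As a rough illustration of few-shot adaptation, the sketch below fine-tunes a tiny logistic classifier on a handful of labeled examples. Everything in it is an assumption made for illustration: each video is reduced to a hypothetical two-number feature vector (one "trace" per generator family), and only a linear decision layer is updated, whereas the researchers fine-tuned a full CNN. The functions `predict` and `fine_tune` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, 1 = synthetic, 0 = real)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability that x is synthetic under a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(weights: Sequence[float], bias: float, examples: Sequence[Example],
              lr: float = 0.5, epochs: int = 100) -> Tuple[List[float], float]:
    """Continue training from existing weights with plain stochastic
    gradient descent on the logistic loss, using only a few examples."""
    w, b = list(weights), bias
    for _ in range(epochs):
        for x, y in examples:
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


if __name__ == "__main__":
    # Hypothetical "pretrained" detector: reacts only to feature 0,
    # the trace left by generators it has already seen.
    w0, b0 = [4.0, 0.0], -2.0
    new_generator_video = [0.0, 1.0]  # its trace shows up in feature 1 instead
    print("before:", predict(w0, b0, new_generator_video))

    few_shot: List[Example] = [
        ([0.0, 1.0], 1), ([0.1, 0.9], 1),   # two videos from the new generator
        ([0.0, 0.0], 0), ([0.1, 0.1], 0),   # two real videos
        ([1.0, 0.0], 1),                    # one video from a known generator
    ]
    w1, b1 = fine_tune(w0, b0, few_shot)
    print("after:", predict(w1, b1, new_generator_video))
```

Starting from existing weights rather than from scratch is what lets so few examples suffice; including a known-generator example in the small set is one simple way to reduce the risk of forgetting earlier traces.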
“We’ve already seen AI-generated video being used to create misinformation,” Stamm said. “As these programs become more ubiquitous and easier to use, we can reasonably expect to be inundated with synthetic videos. While detection programs shouldn’t be the only line of defense against misinformation — information literacy efforts are key — having the technological ability to verify the authenticity of digital media is certainly an important step.”
METHOD OF RESEARCH
Computational simulation/modeling
ARTICLE TITLE
Beyond Synthetic Images: Detecting AI-Generated Synthetic Videos
Computer game in school made students better at detecting fake news
UPPSALA UNIVERSITY
A computer game helped upper secondary school students become better at distinguishing between reliable and misleading news. This is shown by a study conducted by researchers at Uppsala University and elsewhere.
“This is an important step towards equipping young people with the tools they need to navigate in a world full of disinformation. We all need to become better at identifying manipulative strategies – prebunking, as it is known – since it is virtually impossible to discern deep fakes, for example, and other AI-generated disinformation with the naked eye,” says Thomas Nygren, Professor of Education at Uppsala University.
Along with three other researchers, he conducted a study involving 516 Swedish upper secondary school students in different programmes at four schools. The study, published in the Journal of Research on Technology in Education, investigated the effect of the game Bad News in a classroom setting; this is the first time the game has been scientifically tested in a normal classroom. The game was created for research and teaching, and participants assume the role of a spreader of misleading news. The students in the study played the game either individually, in pairs or in whole-class groups with a shared scorecard, and all three methods had positive effects. This surprised the researchers, who believed students would learn more by working at the computer together.
“The students improved their ability to identify manipulative techniques in social media posts and to distinguish between reliable and misleading news,” Nygren comments.
The study also showed that students who already had a positive attitude towards trustworthy news sources were better at distinguishing disinformation, and this attitude became significantly more positive after playing the game. Moreover, many students improved their assessments of credibility and were able to explain how they could identify manipulative techniques in a more sophisticated way.
The researchers noted that competitive elements in the game made for greater interest and enhanced its benefit. They therefore conclude that the study contributes insights for teachers into how serious games can be used in formal instruction to promote media and information literacy.
“Some people believe that gamification can enhance learning in school. However, our results show that more gamification in the form of competitive elements does not necessarily mean that students learn more – though it can be perceived as more fun and interesting,” Nygren says.
Participating researchers: Carl-Anton Werner Axelsson (Mälardalen and Uppsala), Thomas Nygren (Uppsala), Jon Roozenbeek (Cambridge) and Sander van der Linden (Cambridge).
JOURNAL
Journal of Research on Technology in Education
METHOD OF RESEARCH
Observational study
SUBJECT OF RESEARCH
People
ARTICLE TITLE
Bad News in the civics classroom: How serious gameplay fosters teenagers’ ability to discern misinformation techniques
ARTICLE PUBLICATION DATE
19-Apr-2024