Leaner large language models could enable efficient local use on phones and laptops
Princeton University, Engineering School
Large language models (LLMs) are increasingly automating tasks like translation, text classification and customer service. But tapping into an LLM’s power typically requires users to send their requests to a centralized server — a process that’s expensive, energy-intensive and often slow.
Now, researchers have introduced a technique for compressing an LLM’s reams of data, which could increase privacy, save energy and lower costs.
The new algorithm, developed by engineers at Princeton and Stanford Engineering, works by trimming redundancies and reducing the precision of an LLM’s layers of information. This type of leaner LLM could be stored and accessed locally on a device like a phone or laptop and could provide performance nearly as accurate and nuanced as an uncompressed version.
“Any time you can reduce the computational complexity, storage and bandwidth requirements of using AI models, you can enable AI on devices and systems that otherwise couldn’t handle such compute- and memory-intensive tasks,” said study coauthor Andrea Goldsmith, dean of Princeton’s School of Engineering and Applied Science and Arthur LeGrand Doty Professor of Electrical and Computer Engineering.
“When you use ChatGPT, whatever request you give it goes to the back-end servers of OpenAI, which process all of that data, and that is very expensive,” said coauthor Rajarshi Saha, a Stanford Engineering Ph.D. student. “So, you want to be able to do this LLM inference using consumer GPUs [graphics processing units], and the way to do that is by compressing these LLMs.” Saha’s graduate work is coadvised by Goldsmith and coauthor Mert Pilanci, an assistant professor at Stanford Engineering.
The researchers will present their new algorithm CALDERA, which stands for Calibration Aware Low precision DEcomposition with low Rank Adaptation, at the Conference on Neural Information Processing Systems (NeurIPS) in December. Saha and colleagues began this compression research not with LLMs themselves, but with the large collections of information that are used to train LLMs and other complex AI models, such as those used for image classification. This technique, a forerunner to the new LLM compression approach, was published in 2023.
Training data sets and AI models are both composed of matrices, or grids of numbers that are used to store data. In the case of LLMs, these are called weight matrices, which are numerical representations of word patterns learned from large swaths of text.
“We proposed a generic algorithm for compressing large data sets or large matrices,” said Saha. “And then we realized that nowadays, it’s not just the data sets that are large, but the models being deployed are also getting large. So, we could also use our algorithm to compress these models.”
While the team’s algorithm is not the first to compress LLMs, its novelty lies in an innovative combination of two properties, one called “low-precision,” the other “low-rank.” As digital computers store and process information as bits (zeros and ones), “low-precision” representation reduces the number of bits, speeding up storage and processing while improving energy efficiency. On the other hand, “low-rank” refers to reducing redundancies in the LLM weight matrices.
“Using both of these properties together, we are able to get much more compression than either of these techniques can achieve individually,” said Saha.
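To make the idea concrete, here is a minimal sketch of combining the two properties. It is not the authors' CALDERA algorithm (which is calibration-aware and uses a more sophisticated quantizer); it simply illustrates, with made-up bit-width and rank values, how a weight matrix can be approximated as a coarse low-precision matrix plus a low-rank correction of the quantization error, and how the combination reduces error relative to low precision alone.

```python
import numpy as np

def quantize(mat, n_bits=2):
    """Uniform quantization to n_bits per entry (a simple stand-in for the
    low-precision step; real LLM quantizers are more elaborate)."""
    levels = 2 ** n_bits
    lo, hi = mat.min(), mat.max()
    scale = (hi - lo) / (levels - 1)
    codes = np.round((mat - lo) / scale)      # integer codes in [0, levels - 1]
    return codes * scale + lo                 # dequantized low-precision matrix

def low_rank_plus_low_precision(W, n_bits=2, rank=16):
    """Approximate W as Q + L @ R: a low-precision matrix Q plus a low-rank
    correction of the residual, obtained here from a truncated SVD."""
    Q = quantize(W, n_bits)
    U, S, Vt = np.linalg.svd(W - Q, full_matrices=False)
    L = U[:, :rank] * S[:rank]                # left factor, shape (m, rank)
    R = Vt[:rank, :]                          # right factor, shape (rank, n)
    return Q, L, R

# Toy demonstration on a random stand-in for a weight matrix
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
Q, L, R = low_rank_plus_low_precision(W, n_bits=2, rank=16)
err_q = np.linalg.norm(W - Q) / np.linalg.norm(W)
err_both = np.linalg.norm(W - (Q + L @ R)) / np.linalg.norm(W)
print(f"relative error, low-precision only:       {err_q:.3f}")
print(f"relative error, low-precision + low-rank: {err_both:.3f}")
```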
The team tested their technique using Llama 2 and Llama 3, open-source large language models released by Meta AI, and found that their method, which uses low-rank and low-precision components in tandem, can improve on methods that rely on low precision alone. The improvement can be up to 5%, which is significant for metrics that measure uncertainty in predicting word sequences.
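One widely used metric of this kind is perplexity, the exponential of the average negative log-likelihood the model assigns to each token: lower values mean the model is less "surprised" by the text. As a rough illustration (the numbers below are invented, not results from the study), a small drop in perplexity for a compressed model signals that little predictive ability was lost:

```python
import numpy as np

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    return float(np.exp(-np.mean(token_log_probs)))

# Hypothetical per-token probabilities assigned to the same text
log_probs_full = np.log([0.30, 0.25, 0.40, 0.20])        # uncompressed model
log_probs_compressed = np.log([0.28, 0.24, 0.38, 0.19])  # compressed model
print(perplexity(log_probs_full), perplexity(log_probs_compressed))
```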
They evaluated the performance of the compressed language models using several sets of benchmark tasks for LLMs. The tasks included determining the logical order of two statements, or answering questions involving physical reasoning, such as how to separate an egg white from a yolk or how to make a cup of tea.
“I think it’s encouraging and a bit surprising that we were able to get such good performance in this compression scheme,” said Goldsmith, who moved to Princeton from Stanford Engineering in 2020. “By taking advantage of the weight matrix rather than just using a generic compression algorithm for the bits that are representing the weight matrix, we were able to do much better.”
An LLM compressed in this way could be suitable for situations that don’t require the highest possible precision. Moreover, the ability to fine-tune compressed LLMs on edge devices like a smartphone or laptop enhances privacy by allowing organizations and individuals to adapt models to their specific needs without sharing sensitive data with third-party providers. This reduces the risk of data breaches or unauthorized access to confidential information during the training process. To enable this, the LLMs must initially be compressed enough to fit on consumer-grade GPUs.
Saha also cautioned that running LLMs on a smartphone or laptop could hog the device’s memory for a period of time. “You won’t be happy if you are running an LLM and your phone drains out of charge in an hour,” said Saha. Low-precision computation can help reduce power consumption, he added. “But I wouldn’t say that there’s one single technique that solves all the problems. What we propose in this paper is one technique that is used in combination with techniques proposed in prior works. And I think this combination will enable us to use LLMs on mobile devices more efficiently and get more accurate results.”
The paper, “Compressing Large Language Models using Low Rank and Low Precision Decomposition,” will be presented at the Conference on Neural Information Processing Systems (NeurIPS) in December 2024. In addition to Goldsmith, Saha and Pilanci, coauthors include Stanford Engineering researchers Naomi Sagan and Varun Srivastava. This work was supported in part by the U.S. National Science Foundation, the U.S. Army Research Office, and the Office of Naval Research.
Method of Research: Experimental study
Subject of Research: Not applicable
Article Title: Compressing Large Language Models using Low Rank and Low Precision Decomposition
The Conversation
November 18, 2024
Think back to a time when you needed a quick answer, maybe for a recipe or a DIY project. A few years ago, most people’s first instinct was to “Google it.” Today, however, many people are more likely to reach for ChatGPT, OpenAI’s conversational AI, which is changing the way people look for information.
Rather than simply providing lists of websites, ChatGPT gives more direct, conversational responses. But can ChatGPT do more than just answer straightforward questions? Can it actually help people be more creative?
I study new technologies and consumer interaction with social media. My colleague Byung Lee and I set out to explore this question: Can ChatGPT genuinely assist people in creatively solving problems, and does it perform better at this than traditional search engines like Google?
Across a series of experiments in a study published in the journal Nature Human Behaviour, we found that ChatGPT does boost creativity, especially in everyday, practical tasks. Here’s what we learned about how this technology is changing the way people solve problems, brainstorm ideas and think creatively.
ChatGPT and creative tasks
Imagine you’re searching for a creative gift idea for a teenage niece. Previously, you might have googled “creative gifts for teens” and then browsed articles until something clicked. Now, if you ask ChatGPT, it generates a direct response based on its analysis of patterns across the web. It might suggest a custom DIY project or a unique experience, crafting the idea in real time.
To explore whether ChatGPT surpasses Google in creative thinking tasks, we conducted five experiments where participants tackled various creative tasks. For example, we randomly assigned participants to either use ChatGPT for assistance, use Google search, or generate ideas on their own. Once the ideas were collected, external judges, unaware of the participants’ assigned conditions, rated each idea for creativity. We averaged the judges’ scores to provide an overall creativity rating.
One task involved brainstorming ways to repurpose everyday items, such as turning an old tennis racket and a garden hose into something new. Another asked participants to design an innovative dining table. The goal was to test whether ChatGPT could help people come up with more creative solutions compared with using a web search engine or just their own imagination.
ChatGPT did well with the task of suggesting creative ideas for reusing household items. Simon Ritzmann/DigitalVision via Getty Images
The results were clear: Judges rated ideas generated with ChatGPT’s assistance as more creative than those generated with Google searches or without any assistance. Interestingly, ideas generated with ChatGPT – even without any human modification – scored higher in creativity than those generated with Google.
One notable finding was ChatGPT’s ability to generate incrementally creative ideas: those that improve or build on what already exists. While truly radical ideas might still be challenging for AI, ChatGPT excelled at suggesting practical yet innovative approaches. In the toy-design experiment, for example, participants using ChatGPT came up with imaginative designs, such as turning a leftover fan and a paper bag into a wind-powered craft.
Limits of AI creativity
ChatGPT’s strength lies in its ability to combine unrelated concepts into a cohesive response. Unlike Google, which requires users to sift through links and piece together information, ChatGPT offers an integrated answer that helps users articulate and refine ideas in a polished format. This makes ChatGPT promising as a creativity tool, especially for tasks that connect disparate ideas or generate new concepts.
It’s important to note, however, that ChatGPT doesn’t generate truly novel ideas. It recognizes and combines linguistic patterns from its training data, subsequently generating outputs with the most probable sequences based on its training. If you’re looking for a way to make an existing idea better or adapt it in a new way, ChatGPT can be a helpful resource. For something groundbreaking, though, human ingenuity and imagination are still essential.
Additionally, while ChatGPT can generate creative suggestions, these aren’t always practical or scalable without expert input. Steps such as screening, feasibility checks, fact-checking and market validation require human expertise. Given that ChatGPT’s responses may reflect biases in its training data, people should exercise caution in sensitive contexts such as those involving race or gender.
We also tested whether ChatGPT could assist with tasks often seen as requiring empathy, such as repurposing items cherished by a loved one. Surprisingly, ChatGPT enhanced creativity even in these scenarios, generating ideas that users found relevant and thoughtful. This result challenges the belief that AI cannot assist with emotionally driven tasks.
Future of AI and creativity
As ChatGPT and similar AI tools become more accessible, they open up new possibilities for creative tasks. Whether in the workplace or at home, AI could assist in brainstorming, problem-solving and enhancing creative projects. However, our research also points to the need for caution: While ChatGPT can augment human creativity, it doesn’t replace the unique human capacity for truly radical, out-of-the-box thinking.
This shift from Googling to asking ChatGPT represents more than just a new way to access information. It marks a transformation in how people collaborate with technology to think, create and innovate.
Jaeyeon Chung, Assistant Professor of Business, Rice University
This article is republished from The Conversation under a Creative Commons license. Read the original article.