Unlocking the ‘black box’: scientists reveal AI’s hidden thoughts
Researchers introduce a new method to assess how deep neural networks interpret information, helping to ensure their reliability and robustness for real-world applications
Kyushu University
Fukuoka, Japan— Deep neural networks are a type of artificial intelligence (AI) that imitate how human brains process information, but understanding how these networks “think” has long been a challenge. Now, researchers at Kyushu University have developed a new method to understand how deep neural networks interpret information and sort it into groups. Published in IEEE Transactions on Neural Networks and Learning Systems, the study addresses the important need to ensure AI systems are accurate and robust and can meet the standards required for safe use.
Deep neural networks process information in many layers, similarly to humans solving a puzzle step by step. The first layer, known as the input layer, brings in the raw data. The subsequent layers, called hidden layers, analyze the information. Early hidden layers focus on basic features, such as detecting edges or textures—like examining individual puzzle pieces. Deeper hidden layers combine these features to recognize more complex patterns, such as identifying a cat or a dog—similar to connecting puzzle pieces to reveal the bigger picture.
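For readers who want a concrete picture, the sketch below is a toy illustration of that layered flow; it is not one of the networks studied in the paper, and all sizes and weights are arbitrary placeholders. Raw inputs enter an input layer, pass through successive hidden layers, and emerge as class scores.

```python
# Toy sketch of the layered processing described above (illustrative only,
# not the paper's models). Random weights stand in for trained ones.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical sizes: 64 input features, two hidden layers, 2 output classes.
W1, b1 = rng.normal(size=(64, 32)), np.zeros(32)  # early hidden layer: simple features
W2, b2 = rng.normal(size=(32, 16)), np.zeros(16)  # deeper hidden layer: combined patterns
W3, b3 = rng.normal(size=(16, 2)), np.zeros(2)    # output layer: e.g., "cat" vs. "dog" scores

def forward(x):
    h1 = relu(x @ W1 + b1)   # first hidden layer
    h2 = relu(h1 @ W2 + b2)  # second hidden layer (a "latent space" of the kind the study examines)
    return h2 @ W3 + b3      # raw class scores

print(forward(rng.normal(size=(1, 64))))
```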
“However, these hidden layers are like a locked black box: we see the input and output, but what is happening inside is not clear,” says Danilo Vasconcellos Vargas, Associate Professor from the Faculty of Information Science and Electrical Engineering at Kyushu University. “This lack of transparency becomes a serious problem when AI makes mistakes, sometimes triggered by something as small as changing a single pixel. AI might seem smart, but understanding how it comes to its decision is key to ensuring it’s trustworthy.”
Currently, methods for visualizing how AI organizes information rely on simplifying high-dimensional data into 2D or 3D representations. These methods let researchers observe how AI categorizes data points—for example, grouping images of cats close to other cats while separating them from dogs. However, this simplification comes with critical limitations.
“When we simplify high-dimensional information into fewer dimensions, it’s like flattening a 3D object into 2D—we lose important details and fail to see the whole picture. Additionally, this method of visualizing how the data is grouped makes it difficult to compare between different neural networks or data classes,” explains Vargas.
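For context, the minimal sketch below shows the kind of 2D projection such visualization methods produce. The use of PCA here is an assumption made for illustration (the release does not name a specific dimensionality-reduction technique), and the data are synthetic placeholders.

```python
# Conventional approach described above: flatten high-dimensional features to 2D
# for plotting. PCA is used here purely as an illustrative choice.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 128))  # hypothetical 128-dim hidden-layer features
labels = rng.integers(0, 2, size=200)   # e.g., 0 = cat, 1 = dog (used to color the plot)

coords_2d = PCA(n_components=2).fit_transform(features)  # 128 dims squeezed into 2
print(coords_2d.shape)  # (200, 2): easy to scatter-plot, but most detail is discarded
```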
In this study, the researchers developed a new method, called the k* distribution method, that more clearly visualizes and assesses how well deep neural networks categorize related items together.
The method works by assigning each input data point a “k* value,” which indicates its distance to the nearest unrelated data point. A high k* value means the data point is well separated (e.g., a cat far from any dogs), while a low k* value suggests potential overlap (e.g., a cat with a dog nearer to it than other cats are). Looking at all the data points within a class, such as cats, this approach produces a distribution of k* values that gives a detailed picture of how the data is organized.
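As an illustration, the sketch below computes a k*-style score under one plausible reading of this idea: rather than using raw distance, it counts how many of a sample’s nearest neighbors share its class before the first unrelated sample appears, which captures “how close is the nearest unrelated point” in a scale-free way. The exact formula in the published paper may differ; this is only meant to show the local-neighborhood idea.

```python
# Illustrative k*-style local-neighborhood score (an interpretation of the text
# above, not necessarily the paper's exact definition).
import numpy as np

def k_star_values(features, labels):
    """features: (n, d) latent vectors; labels: (n,) integer class ids."""
    features, labels = np.asarray(features), np.asarray(labels)
    n = len(features)
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    k_star = np.empty(n, dtype=int)
    for i in range(n):
        order = np.argsort(dists[i])[1:]        # neighbors of sample i, nearest first (skip self)
        unrelated = labels[order] != labels[i]  # True where a neighbor belongs to another class
        # Rank of the first unrelated neighbor: high = well separated, low = potential overlap.
        k_star[i] = int(np.argmax(unrelated)) if unrelated.any() else n - 1
    return k_star

# Tiny synthetic example: two well-separated classes should yield high k* values.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(8, 1, (20, 8))])
labs = np.array([0] * 20 + [1] * 20)
print(k_star_values(feats, labs)[:5])  # roughly 19 for each point: all same-class neighbors come first
```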
“Our method retains the higher dimensional space, so no information is lost. It’s the first and only model that can give an accurate view of the ‘local neighborhood’ around each data point,” emphasizes Vargas.
Using their method, the researchers revealed that deep neural networks sort data into clustered, fractured, or overlapping arrangements. In a clustered arrangement, similar items (e.g., cats) are grouped closely together while unrelated items (e.g., dogs) are clearly separated, meaning the AI sorts the data well. In a fractured arrangement, similar items are scattered across a wide space; in an overlapping arrangement, unrelated items occupy the same space. Both of these arrangements make classification errors more likely.
Vargas compares this to a warehouse system: “In a well-organized warehouse, similar items are stored together, making retrieval easy and efficient. If items are intermixed, they become harder to find, increasing the risk of selecting the wrong item.”
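The heuristic below is purely illustrative: the cut-offs are assumptions made for this sketch, not the criteria used in the paper. It only shows how a class’s k* distribution (such as the one computed in the earlier sketch) could be summarized as clustered, fractured, or overlapping.

```python
# Illustrative summary of one class's k* distribution; the thresholds are
# invented for this sketch and are not taken from the paper.
import numpy as np

def describe_arrangement(k_star, class_size):
    k_star = np.asarray(k_star)
    deep_inside = np.mean(k_star > 0.5 * class_size)   # far from any other class
    near_border = np.mean(k_star < 0.05 * class_size)  # another class is very close
    if deep_inside > 0.8:
        return "clustered"    # most samples sit well inside their own class
    if near_border > 0.5:
        return "overlapping"  # many samples mix with another class
    return "fractured"        # the class is split into scattered pockets

print(describe_arrangement(np.full(20, 19), class_size=20))  # -> "clustered"
```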
AI is increasingly used in critical systems such as autonomous vehicles and medical diagnostics, where accuracy and reliability are essential. The k* distribution method helps researchers, and even lawmakers, evaluate how AI organizes and classifies information, pinpointing potential weaknesses or errors. This not only supports the regulatory and legal processes needed to safely integrate AI into daily life but also offers valuable insights into how AI “thinks”. By identifying the root causes of errors, researchers can refine AI systems to make them not only accurate but also robust: capable of handling blurry or incomplete data and adapting to unexpected conditions.
“Our ultimate goal is to create AI systems that maintain precision and reliability, even when faced with the challenges of real-world scenarios,” concludes Vargas.
Written by Science Communicator Intern, Negar Khalili
###
For more information about this research, see “k* Distribution: Evaluating the Latent Space of Deep Neural Networks Using Local Neighborhood Analysis,” Shashank Kotyan; Tatsuya Ueda; Danilo Vasconcellos Vargas, IEEE Transactions on Neural Networks and Learning Systems, https://doi.org/10.1109/TNNLS.2024.3446509
About Kyushu University
Founded in 1911, Kyushu University is one of Japan's leading research-oriented institutes of higher education, consistently ranking as one of the top ten Japanese universities in the Times Higher Education World University Rankings and the QS World Rankings. The university is one of the seven national universities in Japan, located in Fukuoka, on the island of Kyushu—the most southwestern of Japan’s four main islands with a population and land size slightly larger than Belgium. Kyushu U’s multiple campuses—home to around 19,000 students and 8000 faculty and staff—are located around Fukuoka City, a coastal metropolis that is frequently ranked among the world's most livable cities and historically known as Japan's gateway to Asia. Through its VISION 2030, Kyushu U will “drive social change with integrative knowledge.” By fusing the spectrum of knowledge, from the humanities and arts to engineering and medical sciences, Kyushu U will strengthen its research in the key areas of decarbonization, medicine and health, and environment and food, to tackle society’s most pressing issues.
Journal
IEEE Transactions on Neural Networks and Learning Systems
Method of Research
Data/statistical analysis
Subject of Research
Not applicable
Article Title
k* Distribution: Evaluating the Latent Space of Deep Neural Networks Using Local Neighborhood Analysis
Researchers demonstrate new technique for stealing AI models
Researchers have demonstrated the ability to steal an artificial intelligence (AI) model without hacking into the device where the model was running. The technique is novel in that it works even when the thief has no prior knowledge of the software or architecture that supports the AI.
“AI models are valuable; we don’t want people to steal them,” says Aydin Aysu, co-author of a paper on the work and an associate professor of electrical and computer engineering at North Carolina State University. “Building a model is expensive and requires significant computing resources. But just as importantly, when a model is leaked, or stolen, the model also becomes more vulnerable to attacks – because third parties can study the model and identify any weaknesses.”
“As we note in the paper, model stealing attacks on AI and machine learning devices undermine intellectual property rights, compromise the competitive advantage of the model’s developers, and can expose sensitive data embedded in the model’s behavior,” says Ashley Kurian, first author of the paper and a Ph.D. student at NC State.
In this work, the researchers stole the hyperparameters of an AI model that was running on a Google Edge Tensor Processing Unit (TPU).
“In practical terms, that means we were able to determine the architecture and specific characteristics – known as layer details – we would need to make a copy of the AI model,” says Kurian.
“Because we stole the architecture and layer details, we were able to recreate the high-level features of the AI,” Aysu says. “We then used that information to recreate the functional AI model, or a very close surrogate of that model.”
The researchers used the Google Edge TPU for this demonstration because it is a commercially available chip that is widely used to run AI models on edge devices – meaning devices utilized by end users in the field, as opposed to AI systems that are used for database applications.
“This technique could be used to steal AI models running on many different devices,” Kurian says. “As long as the attacker knows the device they want to steal from, can access the device while it is running an AI model, and has access to another device with the same specifications, this technique should work.”
The technique used in this demonstration relies on monitoring electromagnetic signals. Specifically, the researchers placed an electromagnetic probe on top of a TPU chip. The probe provides real-time data on changes in the electromagnetic field of the TPU during AI processing.
“The electromagnetic data from the sensor essentially gives us a ‘signature’ of the AI processing behavior,” Kurian says. “That’s the easy part.”
To determine the AI model’s architecture and layer details, the researchers compare the electromagnetic signature of the model to a database of other AI model signatures made on an identical device – meaning another Google Edge TPU, in this case.
How can the researchers “steal” an AI model for which they don’t already have a signature? That’s where things get tricky.
The researchers have a technique that allows them to estimate the number of layers in the targeted AI model. Layers are a series of sequential operations that the AI model performs, with the result of each operation informing the following operation. Most AI models have 50 to 242 layers.
“Rather than trying to recreate a model’s entire electromagnetic signature, which would be computationally overwhelming, we break it down by layer,” Kurian says. “We already have a collection of 5,000 first-layer signatures from other AI models. So we compare the stolen first layer signature to the first layer signatures in our database to see which one matches most closely.
“Once we’ve reverse-engineered the first layer, that informs which 5,000 signatures we select to compare with the second layer,” Kurian says. “And this process continues until we’ve reverse-engineered all of the layers and have effectively made a copy of the AI model.”
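The sketch below captures the layer-by-layer matching idea as described in the quotes above; it is not the TPUXtract implementation. The data structures, the distance metric, and the way candidates are narrowed after each recovered layer are assumptions made for illustration, and the layer names are hypothetical.

```python
# Hedged sketch of layer-by-layer signature matching (illustration only, not the
# authors' code). Each captured per-layer electromagnetic trace is compared with
# a set of candidate layer signatures, and the closest match is locked in before
# moving on to the next layer.
import numpy as np

def closest_match(trace, candidate_signatures):
    """Index of the candidate signature nearest to the captured trace."""
    dists = [np.linalg.norm(trace - sig) for sig in candidate_signatures]
    return int(np.argmin(dists))

def reconstruct_layers(per_layer_traces, candidates_for_layer):
    """
    per_layer_traces: captured traces, one per layer of the target model.
    candidates_for_layer(i, recovered): returns (layer_config, signature) pairs for
        layer i, narrowed by the layers recovered so far (as the researchers describe).
    """
    recovered = []
    for i, trace in enumerate(per_layer_traces):
        configs, signatures = zip(*candidates_for_layer(i, recovered))
        recovered.append(configs[closest_match(trace, signatures)])  # lock in this layer
    return recovered

# Tiny synthetic check: two "layers", two hypothetical candidate configs each.
db = {
    0: [("conv3x3", np.array([1.0, 0.0])), ("conv5x5", np.array([0.0, 1.0]))],
    1: [("dense128", np.array([1.0, 1.0])), ("dense256", np.array([2.0, 2.0]))],
}
traces = [np.array([0.9, 0.1]), np.array([2.1, 1.9])]
print(reconstruct_layers(traces, lambda i, recovered: db[i]))  # ['conv3x3', 'dense256']
```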
In their demonstration, the researchers showed that this technique was able to recreate a stolen AI model with 99.91% accuracy.
“Now that we’ve defined and demonstrated this vulnerability, the next step is to develop and implement countermeasures to protect against it,” says Aysu.
The paper, “TPUXtract: An Exhaustive Hyperparameter Extraction Framework,” is published online by the Conference on Cryptographic Hardware and Embedded Systems. The paper was co-authored by Anuj Dubey, a former Ph.D. student at NC State, and Ferhat Yaman, a former graduate student at NC State. The work was done with support from the National Science Foundation, under grant number 1943245.
The researchers disclosed the vulnerability they identified to Google.
Journal
IACR Transactions on Cryptographic Hardware and Embedded Systems
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
TPUXtract: An Exhaustive Hyperparameter Extraction Framework
Article Publication Date
12-Dec-2024