Scientists discover new way the brain learns
Research reveals insights into how the brain forms habits and why they are so hard to break
Sainsbury Wellcome Centre
image:
Image shows the two regions of the brain that were inactivated during the task – the dorsomedial striatum (DMS) and the tail of the striatum (TS). Credit: Hernando Martinez Vergara
Neuroscientists at the Sainsbury Wellcome Centre (SWC) at UCL have discovered that the brain uses a dual system for learning through trial and error. This is the first time a second learning system has been identified, which could help explain how habits are formed and provide a scientific basis for new strategies to address conditions related to habitual learning, such as addictions and compulsions. Published today in Nature, the study in mice could also have implications for developing therapeutics for Parkinson’s.
“Essentially, we have found a mechanism that we think is responsible for habits. Once you have developed a preference for a certain action, then you can bypass your value-based system and just rely on your default policy of what you’ve done in the past. This might then allow you to free up cognitive resources to make value-based decisions about something else,” explained Dr Marcus Stephenson-Jones, Group Leader at SWC and lead author of the study.
The researchers uncovered a dopamine signal in the brain that acts as a different kind of teaching signal to the one previously known. Dopamine signals in the brain were already understood to encode reward prediction errors (RPE), signalling to the animal whether an actual outcome is better or worse than expected. In this new study, the scientists discovered that, in parallel to RPE, there is an additional dopamine signal, called action prediction error (APE), which updates how often an action is performed. These two teaching signals give animals two different ways of learning to make a choice: learning to choose either the most valuable option or the most frequent option.
“Imagine going to your local sandwich shop. The first time you go, you might take your time choosing a sandwich and, depending on which you pick, you may or may not like it. But if you go back to the shop on many occasions, you no longer spend time wondering which sandwich to select and instead start picking one you like by default. We think it is the APE dopamine signal in the brain that is allowing you to store this default policy,” explained Dr Stephenson-Jones.
The newly discovered learning system provides a much simpler way of storing information than having to directly compare the value of different options. This might free up the brain to multi-task. For example, once you have learned to drive, you can also hold a conversation with someone during your journey. While your default system is doing all the repetitive tasks to drive the car, your value-based system can decide what to talk about.
Previous research found that the dopamine neurons needed for learning reside in three areas of the midbrain: the ventral tegmental area, the substantia nigra pars compacta, and the substantia nigra pars lateralis. While some studies showed that these neurons were involved in coding for reward, earlier work found that half of them code for movement, but the reason remained a mystery.
RPE neurons project to all areas of the striatum apart from one, called the tail of the striatum, whereas the movement-specific neurons project to all areas apart from the nucleus accumbens. This means that the nucleus accumbens exclusively signals reward, and the tail of the striatum exclusively signals movement.
By investigating the tail of the striatum, the team were able to isolate the movement neurons and discover their function. To test this, the researchers used an auditory discrimination task in mice, originally developed by scientists at Cold Spring Harbor Laboratory. Co-first authors Dr Francesca Greenstreet, Dr Hernando Martinez Vergara and Dr Yvonne Johansson used a genetically encoded dopamine sensor, which showed that dopamine release in this area was related not to reward but to movement.
“When we lesioned the tail of the striatum, we found a very characteristic pattern. We observed that lesioned mice and control mice initially learn in the same way, but once they get to about 60-70% performance, i.e. once they develop a preference (for example, go left for a high tone and right for a low tone), the control mice rapidly learn and develop expert performance, whereas the lesioned mice continue to learn only in a linear fashion. This is because the lesioned mice can only use RPE, whereas the control mice have two learning systems, RPE and APE, which contribute to the choice,” explained Dr Stephenson-Jones.
To further understand this, the team silenced the tail of the striatum in expert mice and found that this had a catastrophic effect on their performance in the task. This showed that while in early learning animals form a preference using the value-based system based on RPE, in late learning they switch to exclusively use APE in the tail of the striatum to store these stable associations and drive their choice. The team also used extensive computational modelling, led by Dr Claudia Clopath, to understand how the two systems, RPE and APE, learn together.
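As a rough illustration of how two such teaching signals might work together, here is a hedged toy sketch in Python. It is not the authors' actual model; the learning rates, the 50/50 mixing of the two systems, and all names are illustrative assumptions. One term tracks action values via a reward prediction error, the other tracks choice frequencies via an action prediction error:

```python
# Toy sketch of a dual learning system (illustrative only, not the
# published model). Two actions; action 0 is rewarded more often.
import random

def simulate(trials=1000, alpha=0.1, reward_prob=(0.8, 0.2), seed=0):
    rng = random.Random(seed)
    q = [0.0, 0.0]   # action values, updated by reward prediction error (RPE)
    h = [0.5, 0.5]   # choice frequencies ("habit"), updated by action prediction error (APE)
    for _ in range(trials):
        # Combine value-based and frequency-based tendencies into one preference
        prefs = [0.5 * q[a] + 0.5 * h[a] for a in range(2)]
        action = 0 if prefs[0] >= prefs[1] else 1
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        # RPE update: was the outcome better or worse than expected?
        q[action] += alpha * (reward - q[action])
        # APE update: was this action taken more or less often than expected?
        for a in range(2):
            taken = 1.0 if a == action else 0.0
            h[a] += alpha * (taken - h[a])
    return q, h
```

In a sketch like this, the value term learns which option pays off, while the frequency term gradually saturates for the repeatedly chosen action, so the preference for it persists even when value estimates fluctuate, loosely mirroring the shift from value-based to habit-like choice described in the study.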
These findings hint at why it is so hard to break bad habits and why replacing an action with something else may be the best strategy. If you replace an action consistently enough, such as chewing on nicotine gum instead of smoking, the APE system may be able to take over and form a new habit on top of the other one.
“Now that we know this second learning system exists in the brain, we have a scientific basis for developing new strategies to break bad habits. Up until now, most research on addictions and compulsions has focused on the nucleus accumbens. Our research has opened up a new place to look in the brain for potential therapeutic targets,” commented Dr Stephenson-Jones.
This research also has potential implications for Parkinson’s, which is known to be caused by the death of midbrain dopamine neurons, specifically in the substantia nigra pars compacta. The cells that have been shown to die are movement-related dopamine neurons, which may be responsible for coding APE. This may explain why people with Parkinson’s experience deficits in habitual behaviours such as walking, but not in more flexible behaviours such as ice skating.
“Suddenly, we now have a theory for paradoxical movement in Parkinson’s. The movement-related neurons that die are the ones that drive habitual behaviour. And so, movement that uses the habitual system is compromised, but movement that uses your value-based flexible system is fine. This gives us a new place to look in the brain and a new way of thinking about Parkinson’s,” concluded Dr Stephenson-Jones.
The research team is now testing whether APE is really needed for habits. They are also exploring what exactly is being learned in each system and how the two work together. This research was funded by an EMBO Long-Term Fellowship (ALTF 827-2018), a Swedish Research Council International Postdoc Grant (2020-06365), the Sainsbury Wellcome Centre Core Grant from the Gatsby Charitable Foundation and Wellcome (219627/Z/19/Z), the Sainsbury Wellcome Centre PhD Programme, and a European Research Council grant (Starting #557533).
Fluorescent images showing the locations in the brain that the scientists recorded from – the tail of the striatum (TS) and ventral striatum (VS). Credit: Francesca Greenstreet
Reward and action prediction error coding dopamine neurons project to distinct areas of the striatum to reinforce different types of associations. Credit: Sainsbury Wellcome Centre
Dual dopaminergic teaching signals are used to learn value-based or frequency-based decision-making strategies. Reward prediction errors are used to update the value of options allowing animals to choose the most valuable option. Action prediction errors are used to update how frequently an option has been chosen allowing animals to choose the most common option. Credit: Sainsbury Wellcome Centre
Source:
Read the full paper in Nature: ‘Dopaminergic action prediction errors serve as a value-free teaching signal’ DOI: 10.1038/s41586-025-09008-9
Media contact:
April Cashin-Garbutt, Head of Research Communications and Engagement, Sainsbury Wellcome Centre
E: a.cashin-garbutt@ucl.ac.uk T: +44 (0)20 3108 8028
Journal
Nature
Method of Research
Experimental study
Subject of Research
Animals
Article Title
Dopaminergic action prediction errors serve as a value-free teaching signal
Article Publication Date
14-May-2025
How the brain allows us to infer emotions
RIKEN
image:
Cartoon showing an example of how inferred emotions are learned. A child often watches wasps fly in and out of their nest in the woods near her house. One day she is stung by one of the wasps for the first time. Afterward, seeing the wasp nest alone makes her feel anxious. In this study, neurons in the medial prefrontal cortex of the brain, and their connection to the amygdala, were found to be essential for this type of learning to occur.
Credit: RIKEN
Xiaowei Gu and Joshua Johansen at the RIKEN Center for Brain Science in Japan have discovered key circuitry in the rat brain that allows the learning of inferred emotions. The study reveals how the frontal part of the brain coordinates with the amygdala—a brain region important for simple forms of emotional learning—to make this higher-order emotional ability possible. Published in the scientific journal Nature on May 14, this breakthrough study is the first to show how the brain codes human-like internal models of emotion.
What are inferred emotions? Consider a child who often watches a wasp fly in and out of its nest in the woods near her house. One day the child is stung by the wasp for the first time, a frightening experience that changes her emotional response to this creature but also to the nest itself. Afterward, seeing the wasp’s nest alone makes the child feel anxious and become alert and cautious. In this scenario, the child has built an internal model that links the negative experience with the visual representation of the nest – even though the nest was not there at the time of the fearful event. Johansen and Gu were interested in understanding the neural mechanisms that allow this type of higher-order emotional processing through inference.
To do so, they created a similar situation in animals; rats learned a neutral association between a noise and an image, then later experienced unpleasantness while seeing the image—a process called aversive conditioning. The next day, they were tested to see if they could infer from hearing the noise alone that something unpleasant might happen. Under these conditions, rats did indeed freeze when they heard the noise, indicating their fear and showing that they too can learn inferred emotions.
Once they had a successful animal model, the researchers used a combination of calcium imaging and optogenetics to examine changes in neuron activity within a part of the brain called the medial prefrontal cortex (mPFC). As hypothesized, experiments indicated that the mPFC is the basis of emotional inference.
Before undergoing aversive conditioning, neurons in the mPFC responded similarly to both the image and the noise, whether the stimuli had been paired or not. But after aversive learning, calcium imaging showed that the number of noise-responsive and noise/image co-responsive neurons rose sharply, provided the noise and image had initially been paired when presented to the animals. Further testing showed that this was possible because the initial sensory pairing “tagged” co-responsive neurons, priming them to activate during aversive conditioning. Optogenetically blocking the mPFC during the aversive learning stage prevented rats from making the later inference; in this case, the image and the unpleasant experience, and indirectly the noise, could not properly become linked in the rats’ minds.
Blocking the output from the mPFC to the amygdala during the test phase also prevented rats from responding to the noise with fear. However, in this case it was because they could not properly recall the inferred memory, even though the association had been made the day before. At the same time, the rats had no problem freezing in response to the image, indicating that only the higher-order ability of inference is rooted in the physical changes that take place in mPFC neurons.
As Johansen explains, “decades of studying aversive learning in rodents have revealed that the amygdala is a critical site for storing simple emotional memories involving directly experienced associations. However, our new findings indicate that the mPFC is a central brain region for higher-order human-like emotions, which involve internal models and inference.”
“The value of our study,” says Johansen, “is that it opens the door for researchers everywhere to examine the neural mechanisms that mediate higher order emotions, which are more relevant to human psychiatric conditions like anxiety or trauma-related disorders.”
Journal
Nature
Schematic showing how paired preconditioning affects the amygdala after aversive learning. Aversive learning after paired preconditioning resulted in more neurons that responded to the noise (red dots), responded to both the noise and the image (mixed red and blue dots) and responded to the unpleasant experience (orange ring).
Credit
RIKEN