AI is good at weather forecasting. Can it predict freak weather events?
UChicago-led study tests neural networks’ ability to handle ‘gray swan’ events
University of Chicago
Increasingly powerful AI models can make short-term weather forecasts with surprising accuracy. But neural networks only predict based on patterns from the past—what happens when the weather does something that’s unprecedented in recorded history? A new study led by scientists from the University of Chicago, in collaboration with New York University and the University of California Santa Cruz, is testing the limits of AI-powered weather prediction. In research published May 21 in Proceedings of the National Academy of Sciences, they found that neural networks cannot forecast weather events beyond the scope of existing training data—which might leave out events like 200-year floods, unprecedented heat waves or massive hurricanes.
This limitation is particularly important as researchers incorporate neural networks into operational weather forecasting, early warning systems, and long-term risk assessments, the authors said. But they also said there are ways to address the problem by integrating more math and physics into the AI tools.
“AI weather models are one of the biggest achievements in AI in science. What we found is that they are remarkable, but not magical,” said Pedram Hassanzadeh, an associate professor of geophysical sciences at UChicago and a corresponding author on the study. “We’ve only had these models for a few years, so there’s a lot of room for innovation.”
Gray swan events
Weather forecasting AIs work in a similar way to other neural networks that many people now interact with, such as ChatGPT.
Essentially, a model is “trained” by feeding it a large collection of text or images and asking it to look for patterns. Then, when a user presents the model with a question, it draws on what it has previously seen to predict an answer.
In the case of weather forecasts, scientists train neural networks by feeding them decades’ worth of weather data. Then a user can input data about the current weather conditions and ask the model to predict the weather for the next several days.
The AI models are very good at this. Generally, they can achieve the same accuracy as a top-of-the-line, supercomputer-based weather model that uses 10,000 to 100,000 times more time and energy, Hassanzadeh said.
“These models do really, really well for day-to-day weather,” he said. “But what if next week there’s a freak weather event?”
The concern is that the neural network is only working off the weather data we currently have, which goes back about 40 years. But that’s not the full range of possible weather.
“The floods caused by Hurricane Harvey in 2017 were considered a once-in-a-2,000-year event, for example,” Hassanzadeh said. “They can happen.”
Scientists sometimes refer to these events as “gray swan” events. They’re not quite all the way to a black swan event—something like the asteroid that killed the dinosaurs—but they are locally devastating.
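To see why a 40-year record can easily miss such an event, consider a rough calculation. The sketch below assumes the standard reading of a return period (each year independently has a 1-in-T chance of the event); that assumption and the function name are illustrative and do not come from the study. Only the 40-year and 2,000-year figures are taken from the text above.

```python
def chance_event_in_record(return_period_years, record_years):
    """Probability that at least one event with the given return period
    occurs within a record of the given length.

    Assumes independent years with a constant annual probability of
    1 / return_period_years (an illustrative simplification).
    """
    annual_probability = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_probability) ** record_years


# A once-in-2,000-year event versus roughly 40 years of training data.
p = chance_event_in_record(2000, 40)
print(f"Chance the record contains such an event: {p:.1%}")  # roughly 2%
```

Under these assumptions, a given location's 40-year record has only about a 2 percent chance of containing a once-in-2,000-year event, so a model trained on that record will most likely never have seen one.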
The team decided to test the limits of the AI models using hurricanes as an example. They trained a neural network using decades of weather data, but removed all the hurricanes stronger than a Category 2. Then they fed it an atmospheric condition that leads to a Category 5 hurricane in a few days. Could the model extrapolate to predict the strength of the hurricane?
The answer was no.
“It always underestimated the event. The model knows something is coming, but it always predicts it’ll only be a Category 2 hurricane,” said Yongqiang Sun, research scientist at UChicago and the other corresponding author on the study.
This kind of error, known as a false negative, is a big deal in weather forecasting. If a forecast tells you a storm will be a Category 5 hurricane and it only turns out to be a Category 2, that means people evacuated who may not have needed to, which is not ideal. But if a forecast underestimates a hurricane that turns out to be a Category 5, the consequences would be far worse.
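The shape of the censoring experiment described above can be mimicked with a toy model. The sketch below is not the neural network used in the study: it is a simple nearest-neighbor predictor on made-up data, where a single hypothetical "precursor" number stands in for the atmospheric state and the linear wind relationship is invented for illustration. A predictor of this kind averages training targets, so it cannot output anything stronger than what it was trained on. Neural networks are not bound in exactly this way, but the study reports a similar ceiling in practice. The wind thresholds are the Saffir-Simpson category boundaries in knots.

```python
def saffir_simpson_category(wind_kt):
    """Saffir-Simpson category for a sustained wind in knots (0 = below hurricane)."""
    thresholds = [64, 83, 96, 113, 137]  # lower bounds of Categories 1 to 5
    return sum(wind_kt >= t for t in thresholds)


def true_peak_wind(precursor):
    """Made-up relationship between a precursor index in [0, 1] and peak wind (kt)."""
    return 40.0 + 120.0 * precursor


def knn_predict(train, query, k=5):
    """Predict peak wind as the mean of the k training cases closest to the query."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    return sum(wind for _, wind in nearest) / len(nearest)


# Full synthetic record, and a censored copy with everything above Category 2 removed.
full = [(i / 200, true_peak_wind(i / 200)) for i in range(201)]
censored = [(x, w) for x, w in full if saffir_simpson_category(w) <= 2]

query = 0.9  # conditions that, in this toy world, lead to a Category 5 storm
truth = true_peak_wind(query)
pred_censored = knn_predict(censored, query)
pred_full = knn_predict(full, query)

print("truth:", saffir_simpson_category(truth))
print("trained without strong storms:", saffir_simpson_category(pred_censored))
print("trained on everything:", saffir_simpson_category(pred_full))
```

In this toy setting the censored model sees that the query is unlike anything weak, yet its forecast saturates at the strongest storms it was shown, while the same model trained on the full record recovers the Category 5 outcome.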
Hurricane warnings and why physics matters
The big difference between neural networks and traditional weather models is that traditional models “understand” physics. Scientists design them to incorporate our understanding of the math and physics that govern atmospheric dynamics, jet streams and other phenomena.
The neural networks aren’t doing any of that. Like ChatGPT, which is essentially a predictive text machine, they simply look at weather patterns and suggest what comes next, based on what has happened in the past.
No major service is currently using only AI models for forecasting. But as their use expands, this tendency will need to be factored in, Hassanzadeh said.
Researchers, from meteorologists to economists, are beginning to use AI for long-term risk assessments. For example, they might ask an AI to generate many examples of weather patterns, so that we can see the most extreme events that might happen in each region in the future. But if an AI cannot predict anything stronger than what it’s seen before, its usefulness would be limited for this critical task.
However, the team found the model could predict stronger hurricanes if there was any precedent, even elsewhere in the world, in its training data. For example, if the researchers deleted all the evidence of Atlantic hurricanes but left in Pacific hurricanes, the model could extrapolate to predict Atlantic hurricanes.
“This was a surprising and encouraging finding: it means that the models can forecast an event that was unprecedented in one region but occurred once in a while in another region,” Hassanzadeh said.
Merging approaches
The solution, the researchers suggested, is to begin incorporating mathematical tools and the principles of atmospheric physics into AI-based models.
“The hope is that if AI models can really learn atmospheric dynamics, they will be able to figure out how to forecast gray swans,” Hassanzadeh said.
How to do this is a hot area of research. One promising approach the team is pursuing is called active learning—where AI helps guide traditional physics-based weather models to create more examples of extreme events, which can then be used to improve the AI’s training.
“Longer simulated or observed datasets aren't going to work. We need to think about smarter ways to generate data,” said Jonathan Weare, professor at the Courant Institute of Mathematical Sciences at New York University and study co-author. “In this case, that means answering the question 'where should I place my training data to achieve better performance on extremes?' Fortunately, we think AI weather models themselves, when paired with the right mathematical tools, can help answer this question.”
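The general idea of letting a learned model decide where to run an expensive simulator next can be illustrated with a toy loop. This is a minimal sketch and not the team's method, which this article does not describe in detail. The `physics_model` function, the nearest-neighbor surrogate, and the scoring rule (predicted intensity plus distance from known cases) are all hypothetical stand-ins chosen for illustration.

```python
import random


def physics_model(x):
    """Hypothetical stand-in for an expensive physics-based simulation (wind in kt)."""
    return 40.0 + 120.0 * x


def surrogate(train, x, k=3):
    """Toy 'AI' surrogate: mean wind of the k training cases nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(wind for _, wind in nearest) / len(nearest)


def acquisition(train, x, novelty_weight=100.0):
    """Score a candidate: predicted intensity plus distance from any known case."""
    novelty = min(abs(x - tx) for tx, _ in train)
    return surrogate(train, x) + novelty_weight * novelty


rng = random.Random(0)
# Initial training data covers only weak cases (x between 0 and 0.4).
train = [(i / 50, physics_model(i / 50)) for i in range(21)]
pool = [rng.random() for _ in range(500)]  # candidate initial conditions

query = 0.9
before = surrogate(train, query)

for _ in range(3):
    # Let the surrogate pick the most promising candidates...
    pool.sort(key=lambda x: acquisition(train, x), reverse=True)
    chosen, pool = pool[:5], pool[5:]
    # ...and spend the expensive simulations only on those.
    train.extend((x, physics_model(x)) for x in chosen)

after = surrogate(train, query)
print(f"forecast for the extreme case before: {before:.1f} kt, after: {after:.1f} kt")
```

With only 15 targeted simulations, the toy surrogate's forecast for the extreme case moves well above its original ceiling, which is the intuition behind placing training data deliberately rather than simply collecting longer records.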
University of Chicago Prof. Dorian Abbot and computational scientist Mohsen Zand were also co-authors on the study, as well as Ashesh Chattopadhyay of the University of California Santa Cruz.
The study used resources maintained by the University of Chicago Research Computing Center.
Citation: “Can AI weather models predict out-of-distribution gray swan tropical cyclones?” Sun et al, Proceedings of the National Academy of Sciences, May 21, 2025.
Funding: Office of Naval Research, Army Research Office, National Science Foundation.
Journal
Proceedings of the National Academy of Sciences
Method of Research
Data/statistical analysis
Subject of Research
Not applicable
Article Title
Can AI weather models predict out-of-distribution gray swan tropical cyclones?
Article Publication Date
20-May-2025
Breakthrough AI model could transform how we prepare for natural disasters
Machine learning at the core
‘Aurora uses state-of-the-art machine learning techniques to deliver superior forecasts for key environmental systems—air quality, weather, ocean waves, and tropical cyclones,’ explains Max Welling, machine learning expert at the University of Amsterdam and one of the researchers behind the model. Unlike conventional methods, Aurora requires far less computational power, making high-quality forecasting more accessible and scalable—especially in regions that lack expensive infrastructure.
Trained on a million hours of earth data
Aurora is built on a 1.3-billion-parameter foundation model, trained on more than one million hours of Earth system data. It has been fine-tuned to excel in a range of forecasting tasks:
- Air quality: Outperforms traditional models in 74% of cases
- Ocean waves: Exceeds numerical simulations on 86% of targets
- Tropical cyclones: Beats seven operational forecasting centres in 100% of tests
- High-resolution weather: Surpasses leading models in 92% of scenarios, especially during extreme events
Forecasting that’s fast, accurate, and inclusive
As climate volatility increases, rapid and reliable forecasts are crucial for disaster preparedness, emergency response, and climate adaptation. The researchers believe Aurora can help by making advanced forecasting more accessible.
‘Development cycles that once took years can now be completed in just weeks by small engineering teams,’ notes AI researcher Ana Lucic, also of the University of Amsterdam. ‘This could be especially valuable for countries in the Global South, smaller weather services, and research groups focused on localised climate risks.’ ‘Importantly, this acceleration builds on decades of foundational research and the vast datasets made available through traditional forecasting methods,’ Welling adds.
Aurora is available freely online for anyone to use. If someone wants to fine-tune it for a specific task, they will need to provide data for that task. ‘But the “initial” training is done, we don’t need these vast datasets anymore, all the information from them is baked into Aurora already’, Lucic explains.
A future-proof forecasting tool
Although current research focuses on the four applications mentioned above, the researchers say Aurora is flexible and can be used for a wide range of future scenarios. These could include forecasting flood risks, wildfire spread, seasonal weather trends, agricultural yields, and renewable energy output. ‘Its ability to process diverse data types makes it a powerful and future-ready tool’, states Welling.
As the world faces more extreme weather, from heatwaves to hurricanes, innovative models like Aurora could shift the global approach from reactive crisis response to proactive climate resilience, the study concludes.
Article details
Bodnar et al., 2025, A Foundation Model for the Earth System, Nature, 10.1038/s41586-025-09005-y
Amsterdam and Cambridge
Aurora was developed by a core team primarily based at Microsoft Research AI for Science in Amsterdam and Cambridge, UK. It was a collaboration between machine learning researchers and domain experts in meteorology and Earth system modelling.
Journal
Nature
Method of Research
Data/statistical analysis
Subject of Research
Not applicable
Article Title
A Foundation Model for the Earth System
Article Publication Date
21-May-2025