CAUSALITY
Revealing causal links in complex systems
MIT engineers’ algorithm may have wide impact, from forecasting climate to projecting population growth to designing efficient aircraft
Massachusetts Institute of Technology
Getting to the heart of causality is central to understanding the world around us. What causes one variable — be it a biological species, a voting region, a company stock, or a local climate — to shift from one state to another can inform how we might shape that variable in the future.
But tracing an effect to its root cause can quickly become intractable in real-world systems, where many variables can converge, confound, and cloud over any causal links.
Now, a team of MIT engineers hopes to provide some clarity in the pursuit of causality. They developed a method that can be applied to a wide range of situations to identify those variables that likely influence other variables in a complex system.
The method, in the form of an algorithm, takes in data that have been collected over time, such as the changing populations of different species in a marine environment. From those data, the method measures the interactions between every variable in a system and estimates the degree to which a change in one variable (say, the number of sardines in a region over time) can predict the state of another (such as the population of anchovy in the same region).
The engineers then generate a “causality map” that links variables that likely have some sort of cause-and-effect relationship. The algorithm determines the specific nature of that relationship, such as whether two variables are synergistic — meaning one variable only influences another if it is paired with a second variable — or redundant, such that a change in one variable can have exactly the same, and therefore redundant, effect as another variable.
The new algorithm can also make an estimate of “causal leakage,” or the degree to which a system’s behavior cannot be explained through the variables that are available; some unknown influence must be at play, and therefore, more variables must be considered.
“The significance of our method lies in its versatility across disciplines,” says Álvaro Martínez-Sánchez, a graduate student in MIT’s Department of Aeronautics and Astronautics (AeroAstro). “It can be applied to better understand the evolution of species in an ecosystem, the communication of neurons in the brain, and the interplay of climatological variables between regions, to name a few examples.”
For their part, the engineers plan to use the algorithm to help solve problems in aerospace, such as identifying features in aircraft design that can reduce a plane’s fuel consumption.
“We hope by embedding causality into models, it will help us better understand the relationship between design variables of an aircraft and how it relates to efficiency,” says Adrián Lozano-Durán, an associate professor in AeroAstro.
The engineers, along with MIT postdoc Gonzalo Arranz, have published their results in a study appearing in Nature Communications.
Seeing connections
In recent years, a number of computational methods have been developed to take in data about complex systems and identify causal links between variables in the system, based on certain mathematical descriptions that should represent causality.
“Different methods use different mathematical definitions to determine causality,” Lozano-Durán notes. “There are many possible definitions that all sound ok, but they may fail under some conditions.”
In particular, he says that existing methods are not designed to tell the difference between certain types of causality. Namely, they don’t distinguish a “unique” causality, in which one variable has a unique effect on another, apart from every other variable, from a “synergistic” or a “redundant” link. An example of a synergistic causality would be if one variable (say, the action of drug A) had no effect on another variable (a person’s blood pressure), unless the first variable was paired with a second (drug B).
An example of redundant causality would be if one variable (a student’s work habits) affects another variable (their chance of getting good grades), but a different variable (the amount of sleep the student gets) has the same effect.
“Other methods rely on the intensity of the variables to measure causality,” adds Arranz. “Therefore, they may miss links between variables whose intensity is not strong yet they are important.”
Messaging rates
In their new approach, the engineers took a page from information theory — the science of how messages are communicated through a network, based on a theory formulated by the late MIT professor emeritus Claude Shannon. The team developed an algorithm to evaluate any complex system of variables as a messaging network.
“We treat the system as a network, and variables transfer information to each other in a way that can be measured,” Lozano-Durán explains. “If one variable is sending messages to another, that implies it must have some influence. That’s the idea of using information propagation to measure causality.”
The new algorithm evaluates multiple variables simultaneously, rather than taking on one pair of variables at a time, as other methods do. The algorithm defines information as the likelihood that a change in one variable will be accompanied by a change in another. This likelihood, and therefore the information exchanged between variables, can get stronger or weaker as the algorithm evaluates more of the system’s data over time.
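The idea of measuring how much one variable's present says about another's future can be illustrated with a small Python sketch. This is not SURD itself: it is a plain pairwise, time-lagged mutual information estimate, simpler than the multivariate evaluation described above. The function names and the synthetic `driver` and `follower` series are assumptions made up for the illustration.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def lagged_information(source, target, lag=1):
    """Information shared by `source` at time t and `target` at time t + lag."""
    return mutual_information(source[:-lag], target[lag:])

random.seed(0)
# Hypothetical data: `driver` is a random binary signal.
driver = [random.randint(0, 1) for _ in range(5000)]
# `follower` copies the driver one step later, flipped 10% of the time.
follower = [0] + [d if random.random() > 0.1 else 1 - d for d in driver[:-1]]

forward = lagged_information(driver, follower)   # driver's present vs. follower's future
backward = lagged_information(follower, driver)  # follower's present vs. driver's future
print(f"driver -> follower: {forward:.3f} bits")
print(f"follower -> driver: {backward:.3f} bits")
```

In this toy setup the forward estimate is clearly positive while the backward one stays near zero, which is the asymmetry a "messaging network" view relies on. Real measurements are usually continuous and would need binning or another estimator first.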
In the end, the method generates a map of causality that shows which variables in the network are strongly linked. From the rate and pattern of these links, the researchers can then distinguish which variables have a unique, synergistic, or redundant relationship. By this same approach, the algorithm can also estimate the amount of “causal leakage” in the system, meaning the degree to which a system’s behavior cannot be predicted based on the information available.
“Part of our method detects if there’s something missing,” Lozano-Durán says. “We don’t know what is missing, but we know we need to include more variables to explain what is happening.”
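A minimal Python sketch of how a "something is missing" signal might be computed, assuming discrete data: the share of a target's uncertainty that remains after conditioning on the observed variables. The `leak_fraction` helper and the toy variables are hypothetical names for this illustration, not the team's implementation.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy in bits of a list of discrete samples."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def leak_fraction(observed, target):
    """H(target | observed) / H(target): share of uncertainty left unexplained."""
    h_target = entropy(target)
    # Chain rule: H(target | observed) = H(observed, target) - H(observed)
    h_conditional = entropy(list(zip(observed, target))) - entropy(observed)
    return h_conditional / h_target

# Toy system (an assumption): the target is 1 only when both inputs are 1.
a      = [0, 0, 1, 1]
hidden = [0, 1, 0, 1]
target = [x & h for x, h in zip(a, hidden)]

partial = leak_fraction(a, target)                    # `hidden` not measured
full = leak_fraction(list(zip(a, hidden)), target)    # both inputs measured
print(f"leak with only a observed: {partial:.2f}")
print(f"leak with both observed:   {full:.2f}")
```

When only `a` is observed, a large fraction of the target's behavior stays unexplained; once `hidden` is included, the fraction drops to zero. The number says that something is missing, not what it is.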
The team applied the algorithm to a number of benchmark cases that are typically used to test causal inference. These cases range from observations of predator-prey interactions over time, to measurements of air temperature and pressure in different geographic regions, to the co-evolution of multiple species in a marine environment. The algorithm successfully identified causal links in every case, whereas most other methods can handle only some of the cases.
The method, which the team coined SURD, for Synergistic-Unique-Redundant Decomposition of causality, is available online for others to test on their own systems.
“SURD has the potential to drive progress across multiple scientific and engineering fields, such as climate research, neuroscience, economics, epidemiology, social sciences, and fluid dynamics, among other areas,” Martínez-Sánchez says.
This research was supported, in part, by the National Science Foundation.
Written by Jennifer Chu, MIT News
Journal
Nature Communications
Article Title
“Decomposing causality into its synergistic, unique, and redundant components”
Using mathematics to better understand cause and effect
California Institute of Technology
Cause and effect. We understand this concept from an early age. Tug on a pull toy’s string, and the toy follows. Naturally, things get much more complicated as a system grows, as the number of variables increases, and as noise enters the picture. Eventually, it can become almost impossible to tell whether a variable is causing an effect or is simply correlated or associated with it.
Consider an example from climate science. Experts studying large atmospheric circulation patterns and their impacts on global weather would like to know how these systems might change with warming climates. Here, many variables come into play: ocean and air temperatures and pressures, ocean currents and depths, and even details of the earth’s rotation over time. But which variables cause which measured effects?
That is where information theory comes in as the framework to formulate causality. Adrián Lozano-Durán, an associate professor of aerospace at Caltech, and members of his group both at Caltech and MIT have developed a method that can be used to determine causality even in such complex systems.
The new mathematical tool can tease out the contributions that each variable in a system makes to a measured effect—both separately and, more importantly, in combination. The team describes its new method, called synergistic-unique-redundant decomposition of causality (SURD), in a paper published today, November 1, in the journal Nature Communications.
The new model can be used in any situation in which scientists are trying to determine the true cause or causes of a measured effect. That could be anything from what triggered the downturn of the stock market in 2008, to the contribution of various risk factors in heart failure, to which oceanic variables affect the population of certain fish species, to what mechanical properties are responsible for the failure of a material.
"Causal inference is very multidisciplinary and has the potential to drive progress across many fields," says Álvaro Martínez-Sánchez, a graduate student at MIT in Lozano-Durán’s group, who is lead author of the new paper.
For Lozano-Durán’s group, SURD will be most useful in designing aerospace systems. For instance, by identifying which variable is increasing an aircraft’s drag, the method could help engineers optimize the vehicle’s design.
"Previous methods will only tell you how much causality comes from one variable or another," explains Lozano-Durán. "What is unique about our method is its ability to capture the full picture of everything that is causing an effect."
The new method also avoids the incorrect identification of causalities. This is largely because it goes beyond merely quantifying the effect produced by each variable independently. In addition to what the authors refer to as "unique causality," the method incorporates two new categories of causality, namely redundant and synergistic causality.
Redundant causality occurs when more than one variable produces a measured effect, but not all the variables are needed to arrive at the same outcome. For example, a student can get a good grade in class because she is very smart or because she is a hard worker. Both could result in the good grade, but only one is necessary. The two variables are redundant.
Synergistic causality, on the other hand, involves multiple variables that must work together to produce an effect. Each variable on its own will not yield the same outcome. For instance, a patient takes medication A, but he does not recuperate from his illness. Similarly, when he takes medication B, he sees no improvement. But when he takes both medications, he fully recovers. Medications A and B are synergistic.
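The difference between the two categories can be made concrete with a toy calculation in Python. This sketch assumes textbook cases rather than the examples above: a duplicated input for redundancy, and an exclusive-or (XOR) outcome for synergy, chosen because each input alone then carries exactly zero information. All names are made up for the illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def info_profile(a, b, y):
    """Information about y carried by a alone, b alone, and (a, b) together."""
    return (
        mutual_information(a, y),
        mutual_information(b, y),
        mutual_information(list(zip(a, b)), y),
    )

# Synergistic toy: the outcome is a XOR b, over four equally likely input pairs.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor_profile = info_profile(a, b, [x ^ z for x, z in zip(a, b)])

# Redundant toy: b is an exact copy of a, and the outcome follows a.
copy_profile = info_profile(a, list(a), list(a))

print("synergistic (alone, alone, together):", xor_profile)
print("redundant   (alone, alone, together):", copy_profile)
```

In the XOR case neither input alone says anything about the outcome, yet together they determine it. In the copy case each input alone already carries the full bit, and combining them adds nothing.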
SURD mathematically breaks down the contributions of each variable in a system to its unique, redundant, and synergistic components of causality. The sum of all these contributions must satisfy a conservation-of-information equation that can then be used to figure out the existence of hidden causality, i.e., variables that could not be measured or that were thought not to be important. (If the hidden causality turns out to be too large, the researchers know they need to reconsider the variables they included in their analysis.)
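One way to write such an information balance, as a sketch in generic notation (the symbols are assumptions for illustration, not taken from the text above): let $Y^{+}$ be the future state of the target variable, $X$ the set of observed variables, $H$ the Shannon entropy, and $I$ the mutual information.

```latex
\begin{align}
  H(Y^{+}) &= I(Y^{+}; X) + H(Y^{+} \mid X), \\
  I(Y^{+}; X) &= \sum \Delta I^{R} + \sum \Delta I^{U} + \sum \Delta I^{S}, \\
  \Delta I_{\text{leak}} &= H(Y^{+} \mid X).
\end{align}
```

The first line is the standard chain-rule identity from information theory. The second expresses the split into redundant ($R$), unique ($U$), and synergistic ($S$) contributions described above. The third names the remainder: whatever uncertainty about the target's future is left after accounting for every observed variable is attributed to hidden causes.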
To test the new method, Lozano-Durán’s team used SURD to analyze 16 validation cases—scenarios with known solutions that would normally pose significant challenges for researchers trying to determine causality.
"Our method will consistently give you a meaningful answer across all these cases," says Gonzalo Arranz, a postdoctoral researcher in the Graduate Aerospace Laboratories at Caltech, who is also an author of the paper. "Other methods mix causalities that should not be mixed, and sometimes they get confused. They get a false positive identifying a causality that doesn’t exist, for example."
In the paper, the team used SURD to study the creation of turbulence as air flows around a wall. In this case, air flows more slowly at lower altitudes, close to the wall, and more quickly at higher altitudes. Previously, some theories of what is happening in this scenario have suggested that the higher-altitude flow influences what is happening close to the wall and not the other way around. Other theories have suggested just the opposite—that the air flow near the wall affects what is happening at higher altitudes.
"We analyzed the two signals with SURD to understand in which way the interactions were happening," says Lozano-Durán. "As it turns out, causality comes from the velocity that is far away. In addition, there is some synergy where the signals interact to create another type of causality. This decomposition, or breaking into pieces of causality, is what is unique for our method."
More details about the research can be found in the paper, "Decomposing causality into its synergistic, unique, and redundant components" by Martínez-Sánchez, Gonzalo Arranz, and Lozano-Durán.
The researchers used the SURD method to determine the causality of velocity motions in a turbulent boundary layer around a wall. This schematic shows the outer and inner-layer velocity motions and their interactions as well as two locations where velocity measurements were taken, one close to the wall and another higher up.
The data show the velocity signals captured at the two locations. SURD analyzed the data and broke up the causal contributions into redundant (dark grey), unique (pink), and synergistic (orange) causalities. This showed the researchers that at each location, it was the flow that was most distant that had the largest impact on the experienced flow, but there was also a synergistic causality that contributed.
3D rendering showing a Newton’s Cradle apparatus in action.
Credit
Please give attribution to 'ccPixs.com' (and point the link to www.ccPixs.com).
Article Publication Date
1-Nov-2024