
  

Privacy-preserving feature selection scheme based on secure multi-party computation





Higher Education Press

Image: The steps to calculate the weight vector share based on the feature matrix and label matrix. (Credit: Higher Education Press)





Privacy-preserving feature selection makes it possible to identify the most important features while ensuring data privacy, thereby enhancing data quality. Secure multiparty computation (MPC) is a cryptographic method that allows effective data processing without a trusted third party. However, most MPC-based feature selection schemes overlook the correlation between features and perform poorly for model training when handling datasets containing both numerical and categorical attributes.

To address this problem, a research team led by Lu Zhou published new research on 15 March 2026 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.

The team proposed a feature selection scheme, MPC-Relief, that selects relevant features while preserving privacy. To achieve security under MPC, they transform all complex computational steps from data-dependent to data-oblivious while faithfully preserving their behavior.

In the research, they construct an MPC-based nonlinear function, called MN-Ramp, to handle distance calculation. They apply this function within the Relief algorithm to compute distances over both numerical and categorical features. They also construct a bidirectional vector and adopt a mapping method to estimate the near instances, which avoids data-dependent conditional judgments. They implement MPC-Relief and evaluate it in two computational environments and on several datasets. The experimental results show that the scheme achieves effective feature selection. Future work could optimize the time-consuming modules and construct a more robust privacy-preserving feature selection scheme.
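For readers unfamiliar with Relief-style feature weighting, the sketch below shows the classical, plaintext idea on mixed numerical and categorical data, using a simple ramp-shaped difference for numerical features. The thresholds, function names, and parameters are illustrative assumptions; it does not reproduce the paper's MN-Ramp construction or its MPC protocols.

```python
import numpy as np

def ramp_diff(a, b, lo, hi, t_eq=0.05, t_diff=0.20):
    """Ramp-style numerical difference: 0 if the normalized gap is below
    t_eq, 1 if above t_diff, linear in between (illustrative thresholds,
    not the paper's MN-Ramp)."""
    d = abs(a - b) / (hi - lo + 1e-12)
    return float(np.clip((d - t_eq) / (t_diff - t_eq), 0.0, 1.0))

def feat_diff(a, b, numeric, lo, hi):
    """Categorical features: 0/1 mismatch; numerical features: ramp difference."""
    return ramp_diff(a, b, lo, hi) if numeric else float(a != b)

def relief_weights(X, y, numeric_mask, n_iter=200, seed=0):
    """Plaintext Relief: reward features that agree with an instance's nearest
    hit (same label) and differ from its nearest miss (other label).
    Assumes every class has at least two instances."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    w = np.zeros(m)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = lambda j: sum(feat_diff(X[i, k], X[j, k], numeric_mask[k], lo[k], hi[k])
                             for k in range(m))
        hit = min((j for j in range(n) if j != i and y[j] == y[i]), key=dist)
        miss = min((j for j in range(n) if y[j] != y[i]), key=dist)
        for k in range(m):
            w[k] += (feat_diff(X[i, k], X[miss, k], numeric_mask[k], lo[k], hi[k])
                     - feat_diff(X[i, k], X[hit, k], numeric_mask[k], lo[k], hi[k])) / n_iter
    return w  # larger weight = more relevant feature
```

In MPC-Relief, every step of such a computation, including the nearest-hit and nearest-miss search, would instead be carried out obliviously on secret-shared data rather than in the clear as above.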

 

Privacy goes public with new database



Differential privacy repository based on Harvard research helps companies protect sensitive data




Harvard John A. Paulson School of Engineering and Applied Sciences




Key Takeaways

  • Harvard researchers have launched the Differential Privacy Deployments Registry, a public database that catalogs real-world uses of differential privacy by companies and agencies to better protect individuals’ data.
  • Developed from Harvard-originated theory and shaped by a 2025 study, the registry is designed as an interactive resource for practitioners, with potential to support policymakers and the public in learning how differential privacy protects sensitive data.

When Apple discovers trending emojis, or when Google reports traffic at a busy restaurant, they’re analyzing large datasets made up of data from individual people. Those people’s personal information is systematically protected thanks in large part to research by Harvard computer scientists.

Now, after two decades of work on the cryptography-adjacent mathematical framework known as differential privacy, researchers in the John A. Paulson School of Engineering and Applied Sciences have reached a key milestone in moving privacy best practices from academia into real-world applications. 

A team led by Salil Vadhan, the Vicky Joseph Professor of Computer Science and Mathematics at SEAS, has launched the Differential Privacy Deployments Registry, a collaborative, shared database of companies and agencies actively using the highly rigorous data-protection scheme that first entered the academic literature in 2006. The theoretical privacy-protection framework has since seen growing popularity amongst large companies and organizations that handle sensitive information. The new database should enable even more adoptions and refinements.

“There’s real societal value that differential privacy has the potential to provide, but only if we can make it easy and effective enough for people to adopt,” said Vadhan, who, in 2019, co-founded the community project OpenDP, which develops open-source tools for deploying differential privacy. OpenDP emerged from a preceding National Science Foundation-supported research initiative at Harvard called the Privacy Tools Project and is led by Vadhan and Gary King, Albert J. Weatherhead III University Professor at Harvard.

The 2006 paper that described the foundational theory behind differential privacy was first authored by Cynthia Dwork, Gordon McKay Professor of Computer Science at SEAS, in collaboration with Frank McSherry, Kobbi Nissim and Adam Smith. Dwork’s research in cryptography and privacy was recently awarded the National Medal of Science.

Since that time, the theoretical framework has moved into diverse real-world applications, springboarded by the U.S. government’s high-profile deployment of the technology on U.S. Census Bureau data in 2020. Thanks to the protections afforded by differentially private algorithms, survey-takers who provided personal information to the government enjoyed an extra guarantee of privacy.

The National Institute of Standards and Technology, a government agency that plays a central role in developing guidelines for information security and privacy technology across the United States, has proposed hosting the new public registry, with a final decision pending. 

A resource for the DP community

Billed as a resource hub for the differential privacy community to support broader understanding and communication across sectors, the new database should not only help create new users of differential privacy but also help legal and policy teams better understand existing uses. Current deployments in the database include large companies like Apple and Microsoft as well as government agencies like the National Statistics Office of Korea, which have self-reported their differential privacy deployments.

Key insights into how to design the registry came from a 2025 research study led by Priyanka Nanayakkara, a postdoctoral researcher in Vadhan’s lab, who joined Harvard in 2024 with plans to develop the registry. The research has been accepted for publication by the IEEE Symposium on Security and Privacy. Together, Nanayakkara, Ph.D. student Elena Ghazi, and Vadhan developed a research prototype of the registry and conducted a user study with practitioners to learn how they might use the registry in their work. 

During the research process, they worked with collaborators on the OpenDP team and at Oblivious, an Ireland-based data privacy company, to incorporate their research into a live version of the registry initially started by Oblivious a year prior.

“We said, ‘How can we build the registry concept out into an interactive interface so that it’s usable by practitioners? Longer term, it would be great to further develop the registry to be usable by policymakers and data subjects – for example, if you are contributing your personal data for model training for analysis, wouldn’t it be great to be able to use the registry to see how your data has been protected?’” Nanayakkara said.

Mathematically rigorous privacy guarantee

Differential privacy is a mathematically formulated definition of privacy. Rather than a particular set of algorithms or equations, it is a benchmark for privacy protection: the outputs of an analysis meet it when they are constructed so that no individual’s information can be extracted from them, whether unintentionally or otherwise.

For example, if a medical database were used for a statistical analysis or to train a machine learning model, the result would be differentially private only if individual information would be difficult to retrieve from the published output. This standard is met by adding random statistical “noise” during computations on the data. These carefully calibrated blurring mechanisms are created via algorithms that employ specific probability distributions.
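As a concrete illustration of that calibrated noise, the sketch below shows the textbook Laplace mechanism applied to a simple count query. The epsilon, sensitivity, and count values are illustrative and are not drawn from any deployment in the registry.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a differentially private statistic by adding Laplace noise.

    sensitivity: the most any single person's data can change the statistic.
    epsilon: the privacy-loss parameter -- smaller means stronger privacy
    (and more noise added).
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon          # noise scale calibrated to epsilon
    return true_value + rng.laplace(0.0, scale)

# Example: a count query over a medical database. One patient can change
# a count by at most 1, so the sensitivity is 1.
exact_count = 1287
private_count = laplace_mechanism(exact_count, sensitivity=1.0, epsilon=0.5)
print(round(private_count))   # close to 1287, but no single patient is revealed
```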

The idea for a public-facing deployment registry originated in a 2018 paper by Dwork and colleagues. Computer scientists call the critical parameter that must be set when using differential privacy “epsilon,” so the paper first called the idealized database an “epsilon registry.”

Dwork, who has been giving talks on differential privacy for 20 years, said that the choice to implement the technology is always a policy decision, not a technical one – “yet still, every time, the first question from a general audience is, ‘How should we choose epsilon?’”

Thus, she is “thrilled” with the establishment of the registry and “in awe” of Vadhan’s leadership in building and sustaining the OpenDP community. “The collective wisdom of the community in balancing the feasible and the tolerable will aid future practice, not just in choosing epsilon but in myriad other decisions and strategies needed for the deployment of differential privacy in different settings and with different goals,” Dwork said.

While it remains to be seen how the new registry will change the differential privacy landscape, initial findings from the Harvard user study are promising: For instance, many practitioners saw potential for the registry to become a needed hub for the community, helping to develop best practices and inform future deployments.

 

Do you trust me? A framework for making networks of robots and vehicles safer


Paper describes how 'cy-trust' can enable resilient systems



Harvard John A. Paulson School of Engineering and Applied Sciences

Figure 1: An illustration of how adversarial agents could misreport sensor readings or spoof data to strategically alter the behavior of connected vehicle fleets. (Credit: REACT Lab / Harvard SEAS)




Key Takeaways

  • Harvard SEAS researchers and a multi-university team that includes information theorists and experts in wireless communications, optimization theory, machine learning, and robotics, introduce “cy-trust” as a quantitative measure of how much a robot or vehicle in a networked system should trust information from another agent before acting.
  • Their paper argues for cy-trust to be embedded in system designs for ride-share fleets, truck platoons and other automated cyber-physical systems. 

From birds flying in formation to students working on a group project, the functioning of a group requires not only coordination and communication but also trust — each member must be confident in the others. 

The same is true for networks of connected machines, which are rapidly gaining momentum in our modern world – from self-driving rideshare fleets to smart power grids. 

Harvard computer scientists, together with a multi-university team that includes information theorists and experts in wireless communications, optimization theory, machine learning, and robotics, are presenting their vision for incorporating the concept of trust into emerging cyber-physical systems. A new paper led by Stephanie Gil, the John L. Loeb Associate Professor of Engineering and Applied Sciences in the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and associate faculty member in the Kempner Institute, proposes a foundational framework to help multi-agent, connected systems decide what information they can trust before they act. 

“Cyber-physical systems are going to become very pervasive,” said Gil, who co-authored the paper in Proceedings of the IEEE. “The question is, how do we secure these systems? How do we make sure they are going to be resilient as they go into the real world? This is something we had to learn from making internet systems secure.” 

Cy-trust as a measure of trustworthiness

The paper introduces the concept of “cy-trust”: a quantitative measure of how much one autonomous agent, such as a vehicle or robot, should trust another agent or stream of data when making decisions. The researchers argue that establishing this trust framework is paramount to developing secure and reliable connected systems in the future. 

Traditional network security infrastructure that protects software and data from misuse or theft typically focuses on who is allowed access to a system. But for fleets of robots, vehicles, or smart devices that must constantly coordinate in real time, Gil and colleagues argue that these traditional safeguards are not enough. 

The paper surveys threats that are specific to multi-agent, cyber-physical systems, such as malicious or “greedy” behavior by individual agents that disrupts coordination – for example, an autonomous car that speeds up to cut in line and create a dangerous merge. It could also mean false or manipulated data in crowdsourced traffic maps, which could help a hacker reroute traffic for nefarious purposes. Or, in fleets of agents performing search-and-rescue operations, a hacked agent could spoof its location, claiming to be somewhere it’s not and causing gaps in surveillance. 

All of these examples could lead to real harm in the physical world by causing accidents and endangering pedestrians or emergency responders. Yet the paper also points out that such “embodied” systems offer a key advantage: each agent carries its own sensors and computers on board.

Gil and her co-authors propose that onboard sensors, such as cameras, lidar, radar, and GPS, could be used to cross-validate information received from other agents or from the cloud as a built-in measure of trust. Applying signal processing to the wireless communications an agent receives could also allow each vehicle or robot to validate the origin of the data. 

Gil and collaborators envision each agent assigning a numerical trust value between 0 and 1 to data from other agents, based on cues from sensing, context, network behavior, and past experience. Those values would determine how strongly each piece of information should influence the agent’s decisions; for example, if a vehicle in a rideshare fleet had a low trust value, the other vehicles might simply ignore it to protect the overall system from collapse. 
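A minimal sketch of how such trust values might be used is shown below, assuming (for illustration only) that trust is derived from how closely a neighbor's report agrees with the agent's own onboard sensing, and that low-trust reports are dropped before fusing. This is not the paper's formal cy-trust definition; all names and thresholds are hypothetical.

```python
import numpy as np

def trust_score(reported, own_estimate, own_uncertainty):
    """Illustrative trust in [0, 1]: high when a neighbor's report agrees with
    what this agent's own sensors suggest, decaying with disagreement."""
    gap = abs(reported - own_estimate)
    return float(np.exp(-gap / max(own_uncertainty, 1e-6)))

def fuse_reports(own_estimate, own_uncertainty, neighbor_reports, min_trust=0.2):
    """Weight each neighbor's report by its trust value; reports whose trust
    falls below min_trust are ignored entirely."""
    values, weights = [own_estimate], [1.0]
    for r in neighbor_reports:
        t = trust_score(r, own_estimate, own_uncertainty)
        if t >= min_trust:
            values.append(r)
            weights.append(t)
    return float(np.average(values, weights=weights))

# Example: three honest neighbors and one spoofed traffic-speed report.
print(fuse_reports(own_estimate=48.0, own_uncertainty=5.0,
                   neighbor_reports=[50.0, 47.5, 49.0, 120.0]))  # outlier discounted
```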

“There’s a clear parallel between this concept of cy-trust, and the familiar kind of psychological trust,” Gil said. “The idea is that psychological trust is a way of accepting risk in an environment where some level of risk is inevitable, you don’t have full access to information, but you still need to make decisions.” 

Putting cy-trust into practice in the lab

Gil is already testing these ideas in her lab. 

In one set of experiments, a group of blue-team robots represents cooperative agents trying to reach agreement – for example, on a heading direction so they can move as a platoon – while a group of red-team robots mimics attackers launching a disruption to the network by creating fake identities in a “Sybil Attack.” 

Typically, networked robots would simply accept every message and run a standard distributed consensus or optimization algorithm, making them vulnerable to attack. The red-team robots could nudge the group into unsafe or inefficient behavior by pretending to be many different agents or by falsifying their positions.

In Gil’s experiments, each blue-team robot listens to incoming wireless messages; performs signal processing on the physical wireless signal; and ascertains whether messages that appear to come from many distinct agents are in fact emerging from the same physical source. This generates a trust score for each purported agent. 

Over time, the system learns which agents are likely malicious and chooses to ignore their inputs, allowing the blue-team robots to continue their collective task. 
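The toy sketch below captures the flavor of that experiment: per-sender trust scores are accumulated over rounds (standing in for the wireless-fingerprint check), and senders whose trust falls below a threshold are excluded from the consensus update. The names, thresholds, and update rule are illustrative assumptions, not the lab's actual code.

```python
import numpy as np

def run_consensus(own_heading, rounds, signal_trust, threshold=0.5, step=0.3):
    """Average-consensus on a heading direction, keeping only senders whose
    accumulated trust stays above a threshold.

    rounds: list of {sender_id: reported_heading} dicts, one per round.
    signal_trust: dict sender_id -> per-round trust observations in [0, 1],
    standing in here for the physical-signal fingerprint check.
    """
    heading = own_heading
    trust = {}                                    # running trust per sender
    for r, msgs in enumerate(rounds):
        for sender in msgs:
            obs = signal_trust[sender][r]
            trust[sender] = 0.8 * trust.get(sender, 0.5) + 0.2 * obs
        kept = [h for s, h in msgs.items() if trust[s] >= threshold]
        if kept:
            heading += step * (np.mean(kept) - heading)   # move toward trusted mean
    return heading, trust

# Example: two honest robots and one Sybil identity over three rounds.
msgs = [{"A": 10.0, "B": 12.0, "X": 90.0}] * 3
sig = {"A": [0.9, 0.9, 0.9], "B": [0.8, 0.9, 0.8], "X": [0.2, 0.1, 0.2]}
print(run_consensus(own_heading=11.0, rounds=msgs, signal_trust=sig))
```

Under these assumed numbers, the honest senders' trust drifts upward while the Sybil identity's decays, so the platoon's heading settles near the honest agents' values.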

The researchers also argue that cy-trust must be built into policy and regulation before it can gain public acceptance, especially as autonomous coordinated systems are already on the rise. Ride-share vehicle fleets are being deployed in cities like Phoenix and San Francisco. Truck platooning and automated convoys are under active development in an attempt to streamline supply chains. And automated warehouses, like those that power Amazon fulfillment centers, already rely on fleets of robots, although in a controlled environment. Moving such systems into the open world is an exciting and logical next step, Gil says. 

The interdisciplinary survey paper on enabling trust in cyber-physical systems "could not come at a more important time," said Andrea Goldsmith, paper co-author and president of Stony Brook University. "As we move into a world where so many of our physical systems consist of multiple agents controlled by AI in the cloud, we require a rigorous framework for their design that is secure and robust against malicious agents. Our paper provides a comprehensive roadmap of state-of-the-art techniques and new research frontiers to design secure robust collaborative multiagent systems."

The paper's co-authors are Michal Yemini of Bar-Ilan University in Israel; Arsenia Chorti of ENSEA in France; Angelia Nedic of Arizona State University; Vincent Poor of Princeton University; and Andrea J. Goldsmith of Stony Brook University.  

 

First cataloguing of lakes beneath the Canadian Arctic



Map showing 33 subglacial lakes will inform research on impacts of a warming planet





University of Waterloo

Image: Manson Icefield, Nunavut, above the location where a subglacial lake drains and fills. (Credit: Dr. Luke Copland/University of Ottawa)





Researchers have created the first map of a network of subglacial lakes in the Canadian Arctic showing 33 bodies of water under glaciers.

Using a decade of ArcticDEM satellite data on the height of the Earth’s surface, a team of researchers including scientists from the University of Waterloo developed a method that allowed them to track the draining and filling of active subglacial lakes in unprecedented detail.
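While the team's actual pipeline is not reproduced here, the underlying idea can be sketched simply: an active subglacial lake shows up as a localized patch of the ice surface that rises and later falls between DEM acquisitions. The threshold and array layout below are illustrative assumptions only.

```python
import numpy as np

def detect_fill_drain(elev_series, min_amplitude=2.0):
    """Flag pixels whose surface height both rises and later falls by more
    than min_amplitude metres across a stack of co-registered DEMs -- the
    signature of a subglacial lake filling and then draining.

    elev_series: array of shape (n_dates, rows, cols), ordered in time.
    Returns a boolean mask of candidate active-lake pixels.
    """
    peak = elev_series.max(axis=0)
    rise = peak - elev_series[0]      # uplift relative to the first DEM
    fall = peak - elev_series[-1]     # subsequent drawdown by the last DEM
    return (rise > min_amplitude) & (fall > min_amplitude)
```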

Until now, scientists knew little about these bodies of water. The discovery helps scientists better understand this rapidly melting region, which is one of the main contributors to the loss of the world’s glaciers.

“Now we can further characterize the way the Arctic environment is changing, which can be an indication of climate change impacts on the region,” said Dr. Wesley Van Wychen, a professor in the Faculty of Environment at Waterloo. “Changes in water storage are important in terms of understanding how the speed of glaciers may change. Measuring the draining and filling of these lakes and determining how quickly this process could happen is another way that we can characterize the impacts of climate change on the Arctic environment.”

In addition to the classic subglacial lake, which is confined below a single glacier, the researchers identified and catalogued two other types of subglacial lakes for the first time: terminal subglacial lakes, found where two glaciers converge, and partial subglacial lakes, found beside open water. 

“What's important about creating this kind of classification system is that the glacier flow speed will be impacted differently depending on what type of lake it is,” said Dr. Whyjay Zheng, a professor at National Central University in Taiwan and first author of the paper. “Whenever you have a body of water underneath a glacier, that water can act as lubrication between the glacier and its bed, allowing the glacier to move faster.”

Further research will attempt to confirm the stability of these lakes, to determine where the water goes during drainage events, and to establish whether there is an impact on glacier flow.

Researchers from the University of Bristol and the Remote Sensing Technology Centre of Japan also contributed to this work. The study, “Active subglacial lakes in the Canadian Arctic identified by multi-annual ice elevation changes,” appears in EGUsphere.

 

Frequent prescribed burns help young oaks thrive despite invasive grasses, Illinois study finds





University of Illinois College of Agricultural, Consumer and Environmental Sciences





URBANA, Ill. -- As winter comes to a close, many people look forward to warmer temperatures and spring blooms, but for land managers working to preserve or restore oak-dominated forests, it is prescribed burn season. Fire brings more light into forests, which is crucial for young oak tree growth, but many land managers are concerned about how non-native plants affect fire intensity and young tree survival rates. A new University of Illinois Urbana-Champaign study found that conducting more prescribed burns in forests with invasive grasses creates conditions that benefit young oak trees.

Professor Jennifer M. Fraterrigo studies how ecosystems respond to environmental change, from fire to disease, and said that land managers in southern Illinois, working with large forest plots, brought their concerns about fire and grass invasion affecting oak regeneration to her team.  

“This was a real problem for land managers. Prescribed fire is the most effective tool they have to manage large areas. If fire is having this unexpected, potentially adverse effect, it would be difficult for them to achieve their management objectives,” said Fraterrigo, a researcher in the Department of Natural Resources and Environmental Sciences, part of the College of Agricultural, Consumer and Environmental Sciences at U. of I.

Illinois’ native oak-hickory forests have adapted to survive fires and benefit from controlled burns that remove woody debris from the understory and create canopy gaps for more sunlight to reach the forest floor. This allows acorns to germinate and grow into future forests. However, the introduction of non-native grasses into forest ecosystems has led to uncertainty about the behavior and effects of fire.

Invasive grasses, such as Microstegium vimineum, commonly known as stiltgrass, can cover the forest floor and prevent native plants from growing. Stiltgrass can also respond well to prescribed burns and add to the fuel load, leading to hotter and longer fires that can, in turn, create conditions for more invasive plants to thrive and harm native plants.

To better understand how repeated prescribed fires in Midwestern oak-hickory forests affected invasive species, forests, and oak regeneration, the researchers applied controlled burns in plots with young oak trees in the Shawnee National Forest in southern Illinois. They found that frequent fires increased light in the forest understory and reduced fire intensity. Almost twice as many young oak trees survived and resprouted in plots that had been burnt more often as in those with a single burn. Stiltgrass cover also decreased with more frequent fires.

“A lot of research has previously focused on the effects of one or two burns,” Fraterrigo said. “This study demonstrates that we need a lot of fire for a long period of time to achieve the results that we want.”

Co-author Dan Marshalla, at the time an NRES graduate student, led much of the data collection and analysis. “Our findings of the benefits of repeated fire for oak regeneration should boost the confidence of land managers who want to use prescribed fire to promote oaks but are wary about the presence of stiltgrass,” he said, adding that further research is needed to understand the effect of repeated fire throughout stiltgrass's life cycle.

As part of the study, University of Illinois foresters assisted with the controlled burns and with data collection. The Extension forestry program works with forest landowners to increase their management skills and address challenges, including invasive species. “The study supports the use of prescribed fire as a management practice for forests in Illinois, especially as a means to promote oak regeneration,” said Forestry Extension and Research Specialist Chris Evans, who supported the field research. “The most interesting aspect was how repeated fire seemed to mitigate some of the negative impacts of stiltgrass.”

The paper, “Increased fire occurrence benefits early oak regeneration in temperate deciduous forests in part by disrupting an invasive grass-fire feedback,” is published in the Journal of Applied Ecology [DOI: 10.1111/1365-2664.70279].

This research was made possible in part by Hatch funding from USDA’s National Institute of Food and Agriculture and the U.S. Forest Service.

 

Global carbon credit program risks rewarding the wrong behavior



Yale University
Image: Rows of sawed logs in a forest, Olympic Peninsula, Washington State. (Credit: iStock/halbergman)






A United Nations-backed framework for protecting tropical forests could allow governments to collect income from carbon credits without advancing forest conservation. The weakness lies in how the program calculates baselines: the expected rate of deforestation without intervention. There is no evidence that enrolled jurisdictions — countries, states, and provinces — have acted on that opportunity, but the incentive structure favors those who do, Yale researchers found. It also penalizes the jurisdictions most in need of intervention.

Emissions from land use changes come mostly from deforestation and account for about 10-12% of total anthropogenic carbon dioxide emissions, according to the Global Carbon Budget. The forest credit program at issue in the study is jurisdictional REDD (JREDD+), a variation of the Reducing Emissions from Deforestation and Forest Degradation (REDD+) program, which faced widespread criticism for generating credits that don’t represent real emissions reductions.

REDD+ is a project-based approach to credits, in which landowners enroll to receive payments for reducing deforestation on their plots. It was developed under the United Nations Framework Convention on Climate Change and included in the 2015 Paris Agreement. However, REDD+ projects can include enrolled forests that were never seriously at risk for deforestation. Project-based REDD+ also carries the risk of leakage: while landowners enrolled in the program cut their deforestation rates, others would increase theirs to meet agricultural and market demand.

JREDD+ was developed gradually in response to the problems in the REDD+ program. JREDD+ pays state, provincial, and national governments to reduce deforestation within their borders. Brazil was the first country to create such a program, in 2008. The voluntary carbon market has increasingly turned to JREDD+ as a more credible alternative to project-based carbon credits. Yet the study, published in the Proceedings of the National Academy of Sciences, found that there are also weaknesses in the design of this iteration of the program. These include:

  • Jurisdictions that are already decreasing the rate of deforestation can generate credits even if they aren’t taking any new action.
  • Locations where deforestation is rising — and most in need of funding — can be discouraged from joining the program because they would have to reduce forest loss dramatically before generating their first credit.
  • Deforestation spiked temporarily in half the jurisdictions that enrolled in JREDD+ right before the crediting period began.

“People have talked a lot about what the benefits are of getting carbon credits from jurisdictional REDD. This study points out that there are definitely some things we should worry about, even if we think that those actors haven't taken advantage of them — yet,” said study coauthor Luke Sanford, assistant professor of environmental policy and governance at the Yale School of the Environment.

There are now over $3 billion in committed credit purchases under ART TREES, one of the two major JREDD registries. U.S. companies, including Amazon, Walmart, and Salesforce, are some of the largest purchasers of credits from JREDD+ through the LEAF Coalition, a public-private initiative aimed at halting tropical deforestation by 2030.

“There’s billions of dollars of commitments, but there hasn’t been that much evaluation of what those credits are going to look like, their potential, and the strengths and weaknesses of the program,” Sanford said.

The study’s most concerning finding, the authors note, is the potential for adverse selection — the possibility that jurisdictions could enroll in JREDD+ precisely because they know the program will credit them for reductions they were already planning to make. The mechanism is baked into the program’s design: JREDD protocols calculate baselines using a simple average of deforestation over a previous reference period. That means a jurisdiction already trending toward less deforestation can enroll and collect credits without implementing any new conservation policies.

The researchers also found the reverse is true: jurisdictions with rising deforestation — the places arguably most in need of intervention — are effectively penalized by the baseline structure. This is because they would have to cut their deforestation dramatically just to reach the threshold at which credits kick in, which may not be affordable for them.
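A toy numerical illustration of that incentive is sketched below, under the simplifying assumptions that the baseline is the plain average of a reference period and that credits accrue one-for-one whenever crediting-period deforestation falls below it; the figures are invented for illustration.

```python
import numpy as np

def credits(reference_years, crediting_years):
    """Toy JREDD+-style crediting: baseline = mean deforestation over the
    reference period; credits accrue only when crediting-period deforestation
    falls below it. (Units are arbitrary 'forest loss per year'.)"""
    baseline = np.mean(reference_years)
    earned = sum(max(baseline - loss, 0.0) for loss in crediting_years)
    return earned, baseline

# Already-declining jurisdiction: earns credits with no new policy.
print(credits([100, 90, 80, 70, 60], [50, 40, 30]))   # baseline 80 -> 120 credits

# Rising-deforestation jurisdiction: must cut sharply before any credit accrues.
print(credits([60, 70, 80, 90, 100], [110, 120, 130]))  # baseline 80 -> 0 credits
```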

Despite the clear structural opportunity for gaming, the study found no evidence that enrolled jurisdictions have exploited it. Governments, unlike private companies, can’t easily optimize enrollment timing to maximize profit. They also face political constraints — election cycles, legislative processes, constituent pressures — that make purely strategic behavior harder, Sanford said.

However, the authors identified an additional issue of concern. Deforestation spiked significantly in the years just before the JREDD+ crediting period began and then dropped back down afterward. Pre-enrollment spikes in deforestation can inflate the baseline, making reductions during the crediting period look larger than they actually are. Sanford and coauthor Alberto Garcia, a former postdoctoral researcher at YSE and assistant professor at the University of Utah, identified this as an “anticipatory moral hazard.”

They suggested a new approach using dynamic baselines. The baseline would be calculated after the crediting period ends based on deforestation data from other jurisdictions deemed comparable, instead of being set in advance. Thus, jurisdictions wouldn't know their own baselines in advance, reducing the opportunity to game the system. This would, however, also make it harder for governments to anticipate the benefits of the program.
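A correspondingly simple sketch of the dynamic-baseline idea, assuming (for illustration only) that the baseline is the year-by-year mean of comparable jurisdictions' deforestation, computed after the crediting period ends:

```python
import numpy as np

def dynamic_credits(enrolled_losses, comparison_losses):
    """Toy dynamic baseline: the per-year baseline is the mean deforestation
    of comparable jurisdictions over the same crediting years, computed ex post.

    enrolled_losses: yearly forest loss in the enrolled jurisdiction.
    comparison_losses: list of yearly-loss series for comparable jurisdictions.
    """
    baseline = np.mean(comparison_losses, axis=0)            # year-by-year baseline
    return float(np.sum(np.maximum(baseline - enrolled_losses, 0.0)))

# The enrollee only earns credits to the extent it beats its peers' trajectory.
print(dynamic_credits([50, 40, 30], [[60, 55, 50], [70, 65, 60]]))  # -> 65.0
```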

“We took seriously how the different incentives faced by both jurisdictional governments and landowners themselves can shape the integrity of forest carbon credits in jurisdictional programs,” Garcia said. “I’m optimistic that better understanding those incentives can help inform the design of more credible carbon markets, especially as jurisdictional REDD+ continues to grow and evolve.”