Tuesday, April 22, 2025

Robot see, robot do: System learns after watching how-to videos

Cornell University

ITHACA, N.Y. – Cornell University researchers have developed a new robotic framework powered by artificial intelligence – called RHyME (Retrieval for Hybrid Imitation under Mismatched Execution) – that allows robots to learn tasks by watching a single how-to video.

Robots can be finicky learners. Historically, they’ve required precise, step-by-step directions to complete basic tasks and tend to call it quits when things go off-script, like after dropping a tool or losing a screw. RHyME, however, could fast-track the development and deployment of robotic systems by significantly reducing the time, energy and money needed to train them, the researchers said.

“One of the annoying things about working with robots is collecting so much data on the robot doing different tasks,” said Kushal Kedia, a doctoral student in the field of computer science. “That’s not how humans do tasks. We look at other people as inspiration.”

Kedia will present the paper, “One-Shot Imitation under Mismatched Execution,” in May at the Institute of Electrical and Electronics Engineers’ International Conference on Robotics and Automation, in Atlanta.

Home robot assistants are still a long way off because they lack the wits to navigate the physical world and its countless contingencies. To get robots up to speed, researchers like Kedia are training them with what amounts to how-to videos – human demonstrations of various tasks in a lab setting. The hope with this approach, a branch of machine learning called “imitation learning,” is that robots will learn a sequence of tasks faster and be able to adapt to real-world environments.

“Our work is like translating French to English – we’re translating any given task from human to robot,” said senior author Sanjiban Choudhury, assistant professor of computer science.

This translation task still faces a broader challenge, however: Humans move too fluidly for a robot to track and mimic, and training robots with video requires gobs of it. Further, video demonstrations – of, say, picking up a napkin or stacking dinner plates – must be performed slowly and flawlessly, since any mismatch in actions between the video and the robot has historically spelled doom for robot learning, the researchers said.

“If a human moves in a way that’s any different from how a robot moves, the method immediately falls apart,” Choudhury said. “Our thinking was, ‘Can we find a principled way to deal with this mismatch between how humans and robots do tasks?’”

RHyME is the team’s answer – a scalable approach that makes robots less finicky and more adaptive. It supercharges a robotic system to use its own memory and connect the dots when performing tasks it has viewed only once by drawing on videos it has seen. For example, a RHyME-equipped robot shown a video of a human fetching a mug from the counter and placing it in a nearby sink will comb its bank of videos and draw inspiration from similar actions – like grasping a cup and lowering a utensil.

RHyME paves the way for robots to learn multiple-step sequences while significantly lowering the amount of robot data needed for training, the researchers said. RHyME requires just 30 minutes of robot data; in a lab setting, robots trained using the system achieved a more than 50% increase in task success compared to previous methods, the researchers said.

Media note: Video and images can be viewed and downloaded here.

For additional information, see this Cornell Chronicle story.

-30-

DOI

10.48550/arXiv.2409.06615

https://www.eurekalert.org/news-releases/1081343#:~:text=Microscopic%20robots%20may,22%2DApr%2D2025

Microscopic robots may shape the future of health, tech and the environment

Penn State

Research team at table in discussion — image:
Stewart Mallory, assistant professor of chemistry and chemical engineering at Penn State, leads a research group that studies active matter, specifically the collective behavior of self-propelled microscopic particles.
view more
Credit: Michelle Bixby / Penn State

UNIVERSITY PARK, Pa. — From microscopic robots that can carry and deliver drugs inside the human body to tiny particles that can detect and break down microplastics, an emerging field called active matter is looking toward the microscale to solve some of the world’s biggest problems.

Stewart Mallory, assistant professor of chemistry and chemical engineering at Penn State, leads a research group that studies active matter, specifically the collective behavior of self-propelled microscopic particles. The group's goal is to develop theoretical and computational tools to control the behavior of matter at the microscale and ultimately design new materials and devices with it.

The team published a paper today (April 22) in The Journal of Chemical Physics describing a solution to a common problem in micro-engineering, a field focused on the design and creation of tiny machines or devices, some so small they are invisible to the naked eye. Mallory spoke about his research and the field more generally in the following Q&A.

Q: What was the micro-engineering problem and what was your solution?

Mallory: A major problem with designing anything that moves, both large and small, is how is its motion altered when placed in a confined environment. In essence, we want to know if an object starts at an initial position how far will it move in a given time interval. We are interested in the problem when objects are confined to a narrow channel and unable to pass one another. If we have the time that something starts moving and we need to know how far away it will be at a later time, then we need to be able to solve this problem.

It’s a really old problem in statistical physics called single file dynamics, and it actually pops up in a lot of places outside of chemistry and physics. Think about any time you’re standing in line or stuck in traffic, you’re not able to pass the people next to you, you’re moving in single file and you’re asking yourself how long it’s going to take to get where you want to go — that’s the problem we focused on solving.

When we’re talking about small, particle-sized robots, they are going to be used in confined environments, like delivering medication into the bloodstream and different locations within the body. Before we can deploy these systems, we need to first run simulations to understand how these microscopic swimmers behave in complex environments. We have to be able to predict where they will travel and how long it will take them to get there. If they run into a single file situation, we need to be able to factor in that time, so we derived an equation that tells you that.

Q: Has this microscopic advancement changed anything about how you understand the human-scale world?

Mallory: It has definitely changed my perspective as a driver. Driving on a two lane road is a nice example of single-file dynamics as cars are unable to pass one another. If you've ever driven your car and it seems like people are stopping for no reason, which is called a “phantom traffic jam.” These slowdowns in traffic emerge spontaneously and are typically caused by small fluctuations in the speed or spacing of cars that amplify over time due to human reaction times and delayed braking or acceleration. In our work on active particles moving in narrow channels, we have observed similar behavior that leads to particles clustering together and slowing down. So yes, this paper has made me think a lot more about traffic.

Q: Before tackling this single file problem, you published a paper showing a potential way to tune Phoretic Janus particles. What are they, why are they significant and why do you want to tune them?

Mallory: About 20 years ago, a team of Penn State scientists invented these self-propelled nanoparticles that they called Phoretic Janus particles. They are these tiny particles, typically micron-sized or 100 times smaller than the width of a human hair, that can propel themselves through a fluid.

Their surface is composed of two chemically distinct regions, which is why they carry the name “Janus,” the Roman god of duality and transitions. That duality allows them to create and maintain chemical gradients around themselves that allows for self-propulsion. Imagine a tiny submarine with two sides, one that pushes water out and the other that pulls water in. This creates a flow that propels the submarine forward. That’s similar to how these particles work. By tuning them, we can control how and where they move in response to chemical signals.

Q: Why did you want to focus on these particles, and what did you discover about them?

Mallory: First, it was interesting to me that they were discovered and designed here at Penn State and now they are studied all over the world. It also made sense to focus on them, because we’re interested in particles that can essentially swim. These little microswimmers have a wide range of applications. They can be put to work within the body, for cases like targeted drug delivery, or they can clean up problems in the environment, breaking down harmful chemicals, bacteria or microplastics. They are relatively new tools, so our group is working to understand how these particles behave, how they self-propel, what kind of fuel they use and how that fuel changes their dynamics.

In general, these particles are ideal for applications where targeted, microscopic movement is needed. And unlike passive particles that rely on external forces to move, Phoretic Janus particles generate their own motion, which means we can figure out how to “drive” them by adjusting the chemical composition of the particle’s two surface regions.

Q: In keeping with that “driving” metaphor, what kinds of fuel do the particles use?

Mallory: Depending on the particle's composition, different fuel sources will power their movement. For example, hydrogen peroxide can be used as fuel for particles that have a metallic region, while other enzyme-coated particles can use bio-based fuels like glucose.

But there is also interaction between particles that can affect their movement, which is why our work focuses on understanding particle behavior on two levels: individual and collective. We’ve talked about the individual level, where we are aiming to control and accurately simulate the behavior of single Janus particles using advanced computational methods.

On the collective level, we study how behavior changes when many particles interact, exploring the dynamics of their collective behavior. Ultimately, our goal is to integrate these approaches, developing highly accurate simulations that capture the interactions of many particles in complex systems.

Q: What are some potential applications for your research?

Mallory: I’ll give you a very specific one. There are nanoparticles made of calcium carbonate that respond to pH gradients generated by cancer cells, allowing them to swim toward the cancer cells. With precise particle design, we can build what are essentially microscopic robots that can sense and move toward specific biological signals, such as the molecules emitted by cancer cells. At some point in the not-too-distant future, we could use those particles to carry a payload of medication and target harmful cells like cancer.

This concept can extend to other applications, like using particles to detect and collect microplastics, offering a potential solution for environmental cleanup.

Q: You also study material research, so how does your research apply to that field?

Mallory: This relates to the collective behavior aspect of our work. The nanoparticles are capable of self-assembly, which is the way that nature builds structures, smaller parts making bigger and bigger parts.

Our work demonstrates that self-propelled particles can enhance this process, making self-assembly a more effective tool for building at the microscale. The idea is that you can design building blocks, suspend them in a solution containing self-propelled particles, and ideally, they would spontaneously form the desired structure.

Q: What is your lab currently working on? What are your next steps?

Mallory: We are developing theories and computational modes to better understand how these particles behave in different environments, which is necessary if we want to develop microscale devices for applications like chemical and drug delivery.

The work we’re doing with Janus particles contributes to a broader field of study focused on systems composed of self-propelled particles and their collective behavior, so the advances in discoveries that we make in our group will have impacts across the entire field of active matter. Any step we make is a step forward in understanding and manipulating matter at the microscale.

Journal

The Journal of Chemical Physics

DOI

10.1063/5.0248772

Method of Research

Computational simulation/modeling

Subject of Research

Cells

Article Title

Single-file diffusion of active Brownian particles

Article Publication Date

22-Apr-2025

Brain-inspired AI breakthrough: making computers see more like humans

IBS-Yonsei team unveils novel Lp-Convolution at ICLR 2025

Institute for Basic Science

Figure 1. Information Processing Structures of the Brain’s Visual Cortex and Artificial Neural Networks — image:
In the actual brain’s visual cortex, neurons are connected broadly and smoothly around a central point, with connection strength varying gradually with distance (a, b). This spatial connectivity follows a bell-shaped curve known as a ‘Gaussian distribution,’ enabling the brain to integrate visual information not only from the center but also from the surrounding areas.
In contrast, traditional Convolutional Neural Networks (CNNs) process information by having neurons focus on a fixed rectangular region (e.g., 3×3, 5×5, etc.) (c, d). CNN filters move across an image at regular intervals, extracting information in a uniform manner, which limits their ability to capture relationships between distant visual elements or respond selectively based on importance.
This study addresses the differences between these biological structures and CNNs, proposing a new filter structure called 'Lp-Convolution' that mimics the brain’s connectivity patterns. In this structure, the range and sensitivity of a neuron’s input are designed to naturally spread in a Gaussian-like form, allowing the system to self-adjust during training—emphasizing important information more strongly while downplaying less relevant details. This enables image processing that is more flexible and biologically aligned compared to traditional CNNs.
view more
Credit: Institute for Basic Science

A team of researchers from the Institute for Basic Science (IBS), Yonsei University, and the Max Planck Institute have developed a new artificial intelligence (AI) technique that brings machine vision closer to how the human brain processes images. Called Lp-Convolution, this method improves the accuracy and efficiency of image recognition systems while reducing the computational burden of existing AI models.

Bridging the Gap Between CNNs and the Human Brain

The human brain is remarkably efficient at identifying key details in complex scenes, an ability that traditional AI systems have struggled to replicate. Convolutional Neural Networks (CNNs)—the most widely used AI model for image recognition—process images using small, square-shaped filters. While effective, this rigid approach limits their ability to capture broader patterns in fragmented data.

More recently, Vision Transformers (ViTs) have shown superior performance by analyzing entire images at once, but they require massive computational power and large datasets, making them impractical for many real-world applications.

Inspired by how the brain’s visual cortex processes information selectively through circular, sparse connections, the research team sought a middle ground: Could a brain-like approach make CNNs both efficient and powerful?

Introducing Lp-Convolution: A Smarter Way to See

To answer this, the team developed Lp-Convolution, a novel method that uses a multivariate p-generalized normal distribution (MPND) to reshape CNN filters dynamically. Unlike traditional CNNs, which use fixed square filters, Lp-Convolution allows AI models to adapt their filter shapes—stretching horizontally or vertically based on the task, much like how the human brain selectively focuses on relevant details.

This breakthrough solves a long-standing challenge in AI research, known as the large kernel problem. Simply increasing filter sizes in CNNs (e.g., using 7×7 or larger kernels) usually does not improve performance, despite adding more parameters. Lp-Convolution overcomes this limitation by introducing flexible, biologically inspired connectivity patterns.

Real-World Performance: Stronger, Smarter, and More Robust AI

In tests on standard image classification datasets (CIFAR-100, TinyImageNet), Lp-Convolution significantly improved accuracy on both classic models like AlexNet and modern architectures like RepLKNet. The method also proved to be highly robust against corrupted data, a major challenge in real-world AI applications.

Moreover, the researchers found that when the Lp-masks used in their method resembled a Gaussian distribution, the AI’s internal processing patterns closely matched biological neural activity, as confirmed through comparisons with mouse brain data.

“We humans quickly spot what matters in a crowded scene,” said Dr. C. Justin LEE, Director of the Center for Cognition and Sociality within the Institute for Basic Science. “Our Lp-Convolution mimics this ability, allowing AI to flexibly focus on the most relevant parts of an image—just like the brain does.”

Impact and Future Applications

Unlike previous efforts that either relied on small, rigid filters or required resource-heavy transformers, Lp-Convolution offers a practical, efficient alternative. This innovation could revolutionize fields such as:

- Autonomous driving, where AI must quickly detect obstacles in real time

- Medical imaging, improving AI-based diagnoses by highlighting subtle details

- Robotics, enabling smarter and more adaptable machine vision under changing conditions

“This work is a powerful contribution to both AI and neuroscience,” said Director C. Justin Lee. “By aligning AI more closely with the brain, we’ve unlocked new potential for CNNs, making them smarter, more adaptable, and more biologically realistic.”

Looking ahead, the team plans to refine this technology further, exploring its applications in complex reasoning tasks such as puzzle-solving (e.g., Sudoku) and real-time image processing.

The study will be presented at the International Conference on Learning Representations (ICLR) 2025, and the research team has made their code and models publicly available:

🔗 https://github.com/jeakwon/lpconv/

The brain processes visual information using a Gaussian-shaped connectivity structure that gradually spreads from the center outward, flexibly integrating a wide range of information. In contrast, traditional CNNs face issues where expanding the filter size dilutes information or reduces accuracy (d, e).

To overcome these structural limitations, the research team developed Lp-Convolution, inspired by the brain’s connectivity (a–c). This design spatially distributes weights to preserve key information even over large receptive fields, effectively addressing the shortcomings of conventional CNNs.

Credit