How can we create intelligent robots like those in science fiction?
Researchers provide an Embodied AI survey on achieving robots' behavioral intelligence.
Image: According to the process of robot behavior, the authors categorize Embodied AI into three modules: embodied perception, embodied decision-making, and embodied execution. Credit: Weinan Zhang, Harbin Institute of Technology
While recent advancements in artificial intelligence (AI) have shown remarkable capabilities in language, vision, and speech processing, these technologies are largely "disembodied." The authors argue this disembodied nature is insufficient for creating the general-purpose intelligent robots often envisioned in science fiction.
They illustrate this using the complex instruction: "clean the room." A conventional, disembodied AI can process parts of this task: it can transcribe the audio (speech recognition), understand the command's meaning (natural language processing), and detect objects in a static image (computer vision). However, this passive analysis is where its capabilities end.
An embodied agent, by contrast, must solve the entire problem. This begins with Embodied Perception; as the robot moves, it perceives far more information than a static view allows (for instance, finding a toy hidden behind a box). It then uses Embodied Decision-Making, knowing the correct sequence (e.g., throw away trash before arranging toys) and how to handle problems (like searching for that missing item). Finally, it performs Embodied Execution—the physical acts of walking, grasping a bottle, or opening a door.
To bridge the gap from passive analysis to behavioral intelligence, a comprehensive new survey from a team of researchers provides a structural framework for the field of Embodied AI. The survey, titled "Embodied AI: A Survey on the Evolution from Perceptive to Behavioral Intelligence," systematically maps the field to guide future research.
The authors propose that the process of achieving intelligent behavior can be categorized into three modules: embodied perception, embodied decision-making, and embodied execution.
The framework begins with Embodied Perception, which the authors categorize based on its relationship with robot behavior. The first is “perception for behavior,” which focuses on the perception tasks primarily utilized for robot actions. This includes object perception—sensing an object's geometric shape, articulated structure, and physical properties to enable manipulation—and scene perception, which involves building models of the environment, such as metric or topological maps, to guide mobility. The second and more distinct area is “behavior for perception,” which involves incorporating the robot's own behavior into the perception process. The survey details how an agent can use mobility to actively move and obtain more information about objects and scenes, or use manipulation to interact with an object to discover its properties, such as its articulated structure.
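As a rough illustration of this taxonomy, the following minimal Python sketch contrasts the two directions; the class and function names are our own, not the survey's, and the "active perception" step simply pretends that moving to new viewpoints resolves a missing attribute.

```python
# A minimal, hypothetical sketch of "perception for behavior" versus
# "behavior for perception". All names are illustrative, not the survey's.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObjectEstimate:
    """Perception for behavior: properties a manipulator needs."""
    name: str
    shape: Optional[str] = None          # geometric shape
    articulation: Optional[str] = None   # e.g. a revolute door hinge
    mass_kg: Optional[float] = None      # physical property


@dataclass
class TopologicalMap:
    """Perception for behavior: a toy topological map to guide mobility."""
    edges: dict = field(default_factory=dict)

    def connect(self, a: str, b: str) -> None:
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set()).add(a)


def active_perception(obj: ObjectEstimate, viewpoints: list) -> ObjectEstimate:
    """Behavior for perception: move to new viewpoints to fill in attributes
    a single static view cannot provide (e.g. a toy hidden behind a box)."""
    for view in viewpoints:
        if obj.shape is None:
            # A real agent would re-observe after each move; here we only
            # record that the extra viewpoint resolved the missing attribute.
            obj.shape = f"estimated after moving to '{view}'"
    return obj


if __name__ == "__main__":
    print(active_perception(ObjectEstimate("toy car"), ["left of box", "behind box"]))
```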
The second module, Embodied Decision-Making, addresses how the agent generates a sequence of behaviors to complete a human instruction based on its observations. The survey categorizes this crucial step into two primary domains: Navigation and Task Planning. Navigation involves reasoning a sequence of mobility commands (e.g., 'turn left,' 'move straight') to move through an environment, while Task Planning generates a sequence of manipulation skills (e.g., 'open the microwave,' 'grasp the bottle'), including integrated navigation steps. The authors emphasize that the fundamental challenge in this module is real-world grounding. Unlike purely digital decision-making, an embodied agent must account for numerous real-world challenges, such as physical feasibility, object affordance, and preconditions.
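A schematic sketch of such a planner is shown below; the skill names and the grounding checks are illustrative stand-ins for the far richer models the survey reviews, not an implementation from the paper.

```python
# A schematic task planner: map an instruction plus a scene observation to a
# sequence of grounded skills. All skill names and checks are illustrative.

def grounded(skill: str, scene: dict) -> bool:
    """Toy real-world grounding check covering physical feasibility,
    object affordance, and preconditions."""
    if skill == "grasp bottle":
        return scene.get("bottle_visible", False)        # precondition
    if skill == "open microwave":
        return scene.get("microwave_reachable", False)   # feasibility
    return True


def plan(instruction: str, scene: dict) -> list:
    """Very small lookup-style planner for the 'clean the room' example."""
    candidate = {
        "clean the room": [
            "navigate to trash",
            "grasp bottle",
            "throw away trash",
            "navigate to toys",
            "arrange toys",
        ],
    }.get(instruction, [])
    # Keep only steps that survive grounding; a real planner would instead
    # insert recovery behaviors (e.g. searching for the missing bottle).
    return [step for step in candidate if grounded(step, scene)]


if __name__ == "__main__":
    print(plan("clean the room", {"bottle_visible": True}))
```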
The final module, Embodied Execution, translates the generated decision into physical action. The survey focuses this discussion on manipulation skill learning, defining it as learning a behavior policy that maps skill descriptions and environmental observations to a concrete action, typically an embodiment-independent 7-DoF trajectory for a robot arm. The authors review the two primary algorithmic approaches used to train this policy: Imitation Learning (IL), which learns from human demonstrations, and Reinforcement Learning (RL), which learns through trial-and-error interaction. The survey states that the key research problem in this area is achieving generalization—across varied objects, scenes, skills, and instructions. It also highlights a critical trend: a shift away from training isolated, single-skill models and toward developing General-Purpose Execution Models, which, as a direct application of multimodal large language models, can handle multiple skills within a single model.
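The policy interface the survey describes can be sketched roughly as follows; the random linear "network", the toy text encoder, and the single imitation-learning update are placeholders for illustration, not the methods covered in the survey.

```python
import numpy as np

# Minimal sketch of a behavior policy: (skill description, observation)
# -> a 7-DoF action for a robot arm. Placeholder components only.


class BehaviorPolicy:
    def __init__(self, obs_dim: int, text_dim: int = 8, action_dim: int = 7, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.text_dim = text_dim
        self.w = rng.normal(scale=0.01, size=(obs_dim + text_dim, action_dim))

    def encode_text(self, skill: str) -> np.ndarray:
        # Deterministic toy text encoder; a real system would use a language
        # or multimodal model here.
        rng = np.random.default_rng(sum(ord(c) for c in skill))
        return rng.normal(size=self.text_dim)

    def act(self, skill: str, observation: np.ndarray) -> np.ndarray:
        x = np.concatenate([observation, self.encode_text(skill)])
        return x @ self.w  # 7 numbers, e.g. position, orientation, gripper

    def imitation_update(self, skill, observation, expert_action, lr=1e-2):
        # Imitation learning step: regress the output toward a demonstrated
        # action (RL would instead update from a reward signal).
        x = np.concatenate([observation, self.encode_text(skill)])
        error = x @ self.w - expert_action
        self.w -= lr * np.outer(x, error)


if __name__ == "__main__":
    policy = BehaviorPolicy(obs_dim=16)
    action = policy.act("grasp the bottle", np.zeros(16))
    print(action.shape)  # (7,)
```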
By providing this comprehensive three-module framework, the survey aims to structure the research landscape, systematically identify key challenges, and offer a clear roadmap for the field. The authors hope this structural approach will guide the community's efforts in developing the next generation of general-purpose intelligent agents.
Journal
SmartBot
Method of Research
Literature review
Subject of Research
Not applicable
Article Title
Embodied AI: A Survey on the Evolution from Perceptive to Behavioral Intelligence
Isaac Asimov's "Three Laws of Robotics"
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
An integrated monolithic synaptic device for C-tactile afferent perception and robot emotional interaction
Beijing Institute of Technology Press Co., Ltd
Image: A pressure-mechanical integrated synaptic device, inspired by human skin mechanoreceptors and synapses, that recognizes different touch patterns for discrete affective-state classification. Credit: Wentao Xu, College of Electronic Information and Optical Engineering, Nankai University.
Human emotional interaction relies heavily on C-tactile (CT) afferents, unmyelinated nerves in hairy skin that convert gentle tactile stimuli into affective states. Existing tactile sensing technologies fall short of enabling robots to engage in similar empathetic communication: most rely on segregated "sensation-transmission-processing" modules, which accumulate latency and consume substantial energy due to repeated analog-to-digital conversion. "Current neuromorphic devices for touch either lack low-threshold sensitivity or separate sensing from computation," explained Yue Li, first author of the study. "We aimed to create a single device that both feels gentle touch like human skin and processes that touch into emotional signals—just as CT afferents do."
The researchers developed a Pressure-Electronic-Gated (PEG) synaptic device that integrates tactile perception and neuromorphic computing in one monolithic structure. Its design draws direct inspiration from biological systems: (1) a proton-conductive chitosan hydrogel (derived from crustacean exoskeletons or fungi) acts as the gate dielectric, enabling neurotransmitter-like ionic transport and ensuring biocompatibility for potential epidermal integration; (2) a solution-processed poly(3-hexylthiophene) (P3HT) semiconductor channel mimics postsynaptic receptor activation via ionic trapping/detrapping; and (3) gold (Au) source/drain/top-gate electrodes complete the three-terminal architecture.
The device operates via the synergistic effect of dynamic ionic migration (triggered by voltage) and injection (enhanced by pressure). Key performance metrics set it apart: (1) ultralow threshold: it responds to pressures as low as 80 Pa, comparable to the gentle touch detected by human CT afferents; (2) energy efficiency: it operates at just -0.2 V, with a current range of 0.039–24.872 μA (nearly three orders of magnitude); (3) stability: it maintains <1% signal deviation over 2,000 seconds of continuous use and more than 1,000 cycles. "Unlike previous devices that require large forces to drive computation, our PEG device processes gentle touch in real time," Prof. Xu noted. "Its chitosan layer also solves the biocompatibility issue that has blocked epidermal or implantable tactile systems."
To translate tactile input into emotional states, the team leveraged the device's ability to encode spatiotemporal tactile parameters (pressure, frequency, duration) into distinct excitatory postsynaptic currents (EPSCs), electrical signals analogous to neural activity. When the device is connected to a microcomputer, the system automatically classifies these EPSC signals into discrete emotions, achieving reliable emotional recognition without separate processing modules.
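As a loose illustration of that last step, the downstream classification might be sketched as follows; the feature names, thresholds, and emotion labels are placeholders, not values or decision rules reported in the paper.

```python
# Illustrative sketch: map EPSC readouts (peak current, stimulus frequency,
# duration) to coarse affective states. Thresholds and labels are
# placeholders, not the paper's actual classification scheme.

def classify_touch(peak_uA: float, freq_hz: float, duration_s: float) -> str:
    if peak_uA < 0.1:
        return "neutral"      # barely above the gentle-touch detection floor
    if freq_hz < 1.0 and duration_s > 1.0:
        return "comforting"   # slow, sustained stroking
    if freq_hz > 5.0:
        return "alerting"     # rapid tapping
    return "attention"


if __name__ == "__main__":
    print(classify_touch(peak_uA=12.0, freq_hz=0.5, duration_s=2.0))
```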
"We’re now working to scale the device into flexible arrays for full-body robot 'skin,'" Prof. Xu added. "This technology does not just make robots 'touch-sensitive'—it makes them capable of understanding the emotional meaning behind touch."
Authors of the paper include Yue Li, Lu Yang, Qianbo Yu, Yi Du, Ning Wu, Wentao Xu.
This work was supported by the National Key R&D Program of China (2022YFA1204500, 2022YFA1204504, and 2022YFE0198200); the National Science Fund for Distinguished Young Scholars of China (T2125005); the Fundamental Research Funds for the Central Universities, Nankai University (BEG124901 and BEG124401); the China Postdoctoral Science Foundation (2024M761520); the Postdoctoral Fellowship Program of CPSF (GZC20250388); and the Shenzhen Science and Technology Project (JCYJ20240813165508012).
The paper, “An Integrated Monolithic Synaptic Device for C-Tactile Afferent Perception and Robot Emotional Interaction,” was published in the journal Cyborg and Bionic Systems on August 19, 2025 (DOI: 10.34133/cbsystems.0367).
Journal
Cyborg and Bionic Systems
