Real-Time ASoC: Bio-inspired hardware architectures for real-time attention mechanisms using prior contextual knowledge in cognitive robotics

Perception is one of the main challenges for the new generation of robotic platforms. In our dynamic real world, robots interacting with their environment need to close action-perception feedback loops in real time. Although real systems are multi-sensory, this project focuses on visual perception, which is essential for most tasks. Like biological systems, robotic systems need a way to adaptively select the relevant information in the scene for further processing. Attention mechanisms allow both kinds of systems to determine which parts of the visual input array to process.

Cognitive robots need attention to determine which parts of the input sensory array they need to process, much like biological systems do. Let us explain it with a simple example: a robot is asked to “go and bring the shoes”. To begin with, the robot should have prior common-sense knowledge about where to find the target; in this case, “the shoes” shouldn't be on the shelves, on the wall, or on top of the table, but on the floor. The robot should also have an idea of what “the shoes” look like: size, shape, color, or texture. In other words, the robot should also have a model of the current target under consideration. This model should let the robot find candidates quickly and efficiently; following our example, a few possible locations for “the shoes”.
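As a purely illustrative sketch, this kind of common-sense prior could be stored in a small lookup table. The entries, field names, and values below are assumptions made up for this example, not resources actually used by the project.

```python
# Illustrative sketch (assumed data): a hand-written common-sense table standing
# in for the lexical/ontological resources the project refers to. Given a command
# such as "go and bring the shoes", the robot looks up where the target is usually
# found and what it roughly looks like.

PRIOR_KNOWLEDGE = {
    # target      likely location, coarse appearance model (fields are assumptions)
    "shoes":  {"location": "floor", "size_cm": (25, 10), "colors": ["black", "brown", "white"]},
    "mug":    {"location": "table", "size_cm": (10, 12), "colors": ["white", "blue"]},
    "jacket": {"location": "chair", "size_cm": (60, 70), "colors": ["black", "grey"]},
}

def prior_for(target: str) -> dict:
    """Return the common-sense prior for a target, or an unrestricted prior if unknown."""
    return PRIOR_KNOWLEDGE.get(target, {"location": "anywhere", "size_cm": None, "colors": []})

if __name__ == "__main__":
    prior = prior_for("shoes")
    print(f"Search the {prior['location']} first; expect colors {prior['colors']}")
```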

Project Information

The ability to perceive and understand our dynamic real world is critical for the next generation of multi-sensory robotic systems. One of the most important tasks is search. Biological search mechanisms can be seen as adaptive processes that look for objects of interest while managing the limited processing resources available. Cognitive robots likewise require attention mechanisms to determine which parts of the sensory array they need to process, in a way similar to what biological systems do. In other words, attention consists in selecting the most relevant information from the multi-sensory inputs in order to search for a target efficiently.

A first mechanism is a bottom-up approach that is inherent to the scenario and operates at a very early processing stage. This mechanism also incorporates prior symbolic contextual knowledge about the target under consideration. Such knowledge determines to a large extent the area that the robot will have to examine first; it is usually expressed in natural language, in resources such as lexica and ontologies, and can be accessed directly by the robot to narrow down the visual search space.
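A minimal sketch, in Python with NumPy, of how a crude bottom-up saliency map could be restricted by such symbolic knowledge. The saliency measure, the mapping from "floor" to the lower third of the frame, and all function names are simplifying assumptions for illustration, not the project's actual mechanism.

```python
import numpy as np

# Sketch (assumptions: grayscale image as a 2-D array; the "floor" prior is
# approximated as the lower third of the frame). Bottom-up conspicuity is reduced
# here to deviation from the mean intensity; a real system would use multi-scale
# center-surround features.

def bottom_up_saliency(image: np.ndarray) -> np.ndarray:
    """Very coarse bottom-up saliency: deviation of each pixel from the scene mean."""
    saliency = np.abs(image.astype(np.float32) - image.mean())
    return saliency / (saliency.max() + 1e-6)          # normalise to [0, 1]

def location_prior_mask(shape: tuple, location: str) -> np.ndarray:
    """Crude spatial prior derived from symbolic knowledge (e.g. 'floor' -> lower third)."""
    mask = np.zeros(shape, dtype=np.float32)
    h = shape[0]
    if location == "floor":
        mask[2 * h // 3:, :] = 1.0
    else:
        mask[:, :] = 1.0                               # unknown location: no restriction
    return mask

def masked_saliency(image: np.ndarray, location: str) -> np.ndarray:
    """Bottom-up saliency restricted to the region suggested by prior knowledge."""
    return bottom_up_saliency(image) * location_prior_mask(image.shape, location)
```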

Then, when the robot decides to examine a specific selected area, it should have a model of the target with information about its shape, size, color, or texture. This model should describe the target well enough for the robot to efficiently find a small number of candidates, each of which can then be inspected and segmented. This second process is voluntary and is called top-down attention.
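The following sketch, under the assumption that the target model is reduced to a grey-level histogram, shows how candidate regions could be ranked against such a model; real target models would also include shape, size, and texture cues.

```python
import numpy as np

# Sketch (assumption: target model reduced to a normalised grey-level histogram).
# Candidate regions, e.g. peaks of the masked saliency map, are ranked by how well
# they match the target model.

def grey_histogram(patch: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalised grey-level histogram of an image patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 255))
    return hist.astype(np.float32) / (hist.sum() + 1e-6)

def top_down_score(patch: np.ndarray, target_hist: np.ndarray) -> float:
    """Histogram intersection: 1.0 means the candidate matches the target model perfectly."""
    return float(np.minimum(grey_histogram(patch, len(target_hist)), target_hist).sum())

def rank_candidates(image: np.ndarray, boxes: list, target_hist: np.ndarray) -> list:
    """Return candidate boxes (y0, y1, x0, x1) sorted by decreasing match with the target."""
    scored = [(top_down_score(image[y0:y1, x0:x1], target_hist), (y0, y1, x0, x1))
              for (y0, y1, x0, x1) in boxes]
    return [box for _, box in sorted(scored, reverse=True)]
```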

The project aims to integrate bottom-up with top-down attention and to implement the resulting mechanism in hardware. It is an interdisciplinary project involving linguistics and knowledge engineering (the tools the robot needs to deduce that it has to search the floor when given the command “search for the shoes”), computer vision and image processing (the tools the robot needs to use color, texture, shape, and size to find the target), distributed control, and VLSI design.
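As an illustration only, the integration of the two mechanisms could be as simple as a weighted combination of the bottom-up and top-down maps followed by a winner-take-all selection; the weights and the function below are assumptions and do not describe the hardware architecture developed in the project.

```python
import numpy as np

# Sketch of the integration step (the weighting scheme is an assumption): the
# bottom-up, knowledge-masked saliency map and a top-down map built from the
# target model are blended, and the most active location becomes the next
# region to inspect.

def integrate_attention(bottom_up: np.ndarray, top_down: np.ndarray,
                        w_bu: float = 0.4, w_td: float = 0.6) -> tuple:
    """Combine both attention maps and return the (row, col) of the winning location."""
    combined = w_bu * bottom_up + w_td * top_down
    return np.unravel_index(np.argmax(combined), combined.shape)
```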