Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial
Fundació Eurecat
Dalmau Moreno, Magí
Rosell Gratacòs, Jan
2025-07
This project presents the development of a knowledge representation framework for a robotic agent operating in a simulated, previously unmapped and unseen environment. The main objective is to enable the robot to perceive, understand, and reason about its surroundings by dynamically building and maintaining a Knowledge Graph from multimodal input that combines visual frames of the environment with object metadata obtained from the simulator’s perception layer. To achieve this, the framework uses a multi-agent architecture whose agents are built on state-of-the-art Large Language Models (LLMs) and can extract structured relational data from both text and images. This structured information is used to construct a semantic representation of the environment that evolves as the robot explores new areas. The robot then uses this knowledge to answer users' natural language queries based on the information it has accumulated. The entire system is deployed in simulation using AI2-THOR and ROS2, with custom modules for perception, knowledge graph generation, key frame detection, and user interaction. Results show that the proposed approach enables effective knowledge accumulation and contextual reasoning in previously unknown environments. There is, however, a trade-off between computation time and the accuracy of the knowledge representation, depending on how much autonomy the agents are given in constructing that knowledge. When the process is left entirely to a single LLM that ingests all the information at once, knowledge acquisition is very fast, but the resulting answers tend to be more generic, less precise, and more prone to hallucinations. In contrast, a multi-agent framework that builds a structured knowledge graph by explicitly extracting and processing semantic relations yields a much more accurate representation. Although this process takes more time, it leads to higher-quality answers from the robot and provides a much more scalable long-term solution for growing and maintaining knowledge.
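To make the idea of an incrementally built knowledge graph more concrete, the following is a minimal sketch, not the thesis implementation. It assumes that each key frame has already been turned into (subject, relation, object) triples by some extraction step; the class name `KnowledgeGraph`, its methods, and the example relations such as `on_top_of` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy triple store; a hypothetical stand-in for the thesis's graph."""

    def __init__(self):
        # Maps each subject to the set of (relation, object) edges leaving it.
        self._out = defaultdict(set)

    def add_triple(self, subject, relation, obj):
        # Sets make repeated observations of the same fact idempotent.
        self._out[subject].add((relation, obj))

    def update_from_frame(self, triples):
        # Merge the triples extracted from one key frame into the graph.
        for subject, relation, obj in triples:
            self.add_triple(subject, relation, obj)

    def objects(self, subject, relation):
        # Everything that `subject` is linked to through `relation`.
        edges = self._out.get(subject, ())
        return sorted(o for r, o in edges if r == relation)

    def subjects(self, relation, obj):
        # Everything linked to `obj` through `relation`.
        return sorted(s for s, edges in self._out.items() if (relation, obj) in edges)


# Assumed example data: two key frames seen while exploring a kitchen.
kg = KnowledgeGraph()
kg.update_from_frame([("mug", "on_top_of", "table"), ("table", "located_in", "kitchen")])
kg.update_from_frame([("apple", "on_top_of", "table"), ("mug", "on_top_of", "table")])

what_is_on_table = kg.subjects("on_top_of", "table")
where_is_table = kg.objects("table", "located_in")
```

In this sketch a question such as "what is on the table?" reduces to a lookup over stored relations, which illustrates why an explicit graph can give more precise answers than a single pass over raw input, at the cost of the extra extraction step.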
Master thesis
English
UPC subject areas::Computer science; Intelligent agents (Computer software); Robotics; Knowledge representation (Information theory)
Universitat Politècnica de Catalunya
Open Access
Treballs acadèmics [82541]