EMOROBCARE: A Low-Cost Social Robot for Supporting Children with Autism in Therapeutic Settings

By Alberto Redondo, Arnau Mayoral-Macau, Bartomeu Pou, David Rios, Raquel Ros and Sara Cooper

We introduce a software architecture and system design for EMOROBCARE, an affordable, open-source social robot created to assist therapists in interventions for children with Autism Spectrum Disorder (ASD) Level 2.

The Challenge: Accessibility in Robot-Assisted Therapy

Children with ASD Level 2 often face significant challenges in social communication and behavior, requiring highly structured and personalized therapeutic interventions. While social robots have proven to be effective “non-judgmental” partners for improving skills like emotion recognition and joint attention, their widespread adoption is hindered by several barriers:

  • High Costs: Many existing clinical-grade robots are prohibitively expensive for routine clinical use.
  • Proprietary Ecosystems: Closed software limits the ability of researchers and therapists to customize or extend the robot’s capabilities.
  • Need for Natural Interaction: There is a constant demand for robots that can engage in triadic interactions (therapist-child-robot) while remaining engaging and non-threatening.

Methodology: Modular Software and Low-Cost Hardware

Figure: Architecture of the robot

The EMOROBCARE project takes a multidisciplinary approach to building a modular, open-source system. The architecture is currently being tested in a virtual simulation environment (Gazebo) ahead of its scheduled physical deployment in 2025.

  • Hardware Design: The robot is 30 cm tall with a 3-DOF head, a screen for a cartoon-like expressive face, and a Jetson Nano for perception.
  • Software Stack: Built on ROS 2, the system extends the ROS4HRI framework to include social perception, symbolic reasoning, and LLM-based dialogue management.
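To make the modular design concrete, here is a minimal pure-Python sketch of the publish/subscribe wiring the architecture implies. It stands in for ROS 2 topics (the real system would use ROS 2 nodes and ROS4HRI topic conventions); the topic names and message payloads below are illustrative assumptions, not the project's actual interfaces.

```python
from collections import defaultdict

class Bus:
    """Toy publish/subscribe bus standing in for ROS 2 topics."""
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subs[topic].append(callback)

    def publish(self, topic, msg):
        for callback in self.subs[topic]:
            callback(msg)

bus = Bus()
log = []

# Hypothetical wiring: perception feeds the symbolic reasoner,
# the dialogue manager feeds the mission controller.
bus.subscribe("/humans/persons/tracked", lambda m: log.append(("reasoner", m)))
bus.subscribe("/dialogue/utterance", lambda m: log.append(("mission", m)))

bus.publish("/humans/persons/tracked", {"id": "child_1", "engaged": True})
bus.publish("/dialogue/utterance", {"text": "hello"})
```

Decoupling modules behind topics like this is what lets researchers swap in a different detector or dialogue model without touching the rest of the stack.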

Perception Modules

  • Vision: Combines YOLOv8 and YOLO-world for object detection, alongside MiDaS for depth estimation and a custom pointing gesture detection module to interpret where a child is looking or pointing.
  • Communication: Integrates Whisper and Vosk for speech recognition (ASR) and Coqui XTTS for expressive, emotional text-to-speech (TTS).
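The pointing-gesture module has to map a detected arm direction onto one of the objects the detector found. A simple geometric way to do this (an illustrative sketch, not the paper's actual algorithm) is to pick the detection whose centre lies closest to the pointing ray in image coordinates:

```python
import math

def pointed_object(detections, origin, direction):
    """Pick the detection whose centre lies closest to a 2-D pointing ray.

    detections: list of (label, (cx, cy)) box centres in image coordinates.
    origin, direction: a pointing ray estimated from the child's arm pose
    (hypothetical inputs; the real module may work in 3-D using MiDaS depth).
    """
    dx, dy = direction
    norm = math.hypot(dx, dy)
    dx, dy = dx / norm, dy / norm
    best, best_dist = None, float("inf")
    for label, (cx, cy) in detections:
        vx, vy = cx - origin[0], cy - origin[1]
        t = vx * dx + vy * dy            # projection onto the ray
        if t < 0:                        # object is behind the pointing hand
            continue
        dist = abs(vx * dy - vy * dx)    # perpendicular distance to the ray
        if dist < best_dist:
            best, best_dist = label, dist
    return best

dets = [("red ball", (120, 80)), ("blue cube", (40, 200))]
target = pointed_object(dets, origin=(100, 100), direction=(1, -1))
```

With depth from MiDaS, the same projection could be done in 3-D, which is presumably why the two modules are paired.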

Dialogue & Behavior

The dialogue manager uses large language models (LLMs) such as Llama 3.2 to generate supportive, age-appropriate responses, which are coordinated by a mission controller that manages interactive “serious games”.
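One way such a dialogue manager might constrain a general-purpose LLM is through prompt assembly: a fixed system prompt encoding the therapeutic register, plus a user turn carrying the game state. The prompt wording and field names below are illustrative assumptions, not the project's actual prompts.

```python
def build_prompt(child_name, game_state, child_utterance):
    """Assemble a chat-style prompt for a local LLM (e.g. Llama 3.2).

    All strings here are hypothetical examples of how the dialogue
    manager could steer the model toward short, supportive replies.
    """
    system = (
        "You are a friendly robot helping a child in a therapy game. "
        "Use short, simple, encouraging sentences. Never criticise."
    )
    user = (
        f"Game: {game_state['game']}. Target object: {game_state['target']}. "
        f"{child_name} said: '{child_utterance}'. Respond supportively."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

msgs = build_prompt(
    "Ana",
    {"game": "Finding Objects", "target": "red ball"},
    "I found it!",
)
```

Keeping the safety constraints in the system prompt, and only the game state in the user turn, lets the mission controller reuse the same guardrails across different games.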

Results and Impact

The research demonstrates the feasibility of integrating advanced AI tools (LLMs and open-vocabulary object detectors) into a low-cost robotic platform.

  • Vision Performance: Testing showed that while task-specific models (YOLOv8) are more precise (0.894 F1-score), open-vocabulary models (YOLO-world) provide the flexibility needed for uncontrolled therapy environments, especially when prompts include descriptive colors.

  • Communication Accuracy: Whisper proved more accurate than Vosk at transcribing child speech, which is critical for maintaining the flow of therapy.

  • Therapeutic Potential: A prototype game, “Finding Objects,” was successfully simulated, demonstrating how the robot can lead a session by detecting objects, asking questions, providing multimodal feedback (gestures + facial expressions), and sharing fun facts via the LLM.

  • Future Impact: By prioritizing affordability and open-source modularity, EMOROBCARE aims to lower the entry barrier for specialized clinics to use social robotics as a standard tool for ASD therapy.
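The “Finding Objects” session flow described above (detect objects, ask a question, give multimodal feedback, share a fun fact) is naturally modelled as a small state machine. The sketch below is a simplified illustration under that assumption; the states, strings, and transitions are not taken from the paper's implementation.

```python
from enum import Enum, auto

class State(Enum):
    SCAN = auto()
    ASK = auto()
    FEEDBACK = auto()
    FUN_FACT = auto()
    DONE = auto()

def finding_objects_round(detected, answer, target):
    """Run one simplified round of a 'Finding Objects'-style game and
    return the trace of robot actions (hypothetical behaviour)."""
    trace = []
    state = State.SCAN
    while state is not State.DONE:
        if state is State.SCAN:
            trace.append(f"scan: {sorted(detected)}")
            state = State.ASK if target in detected else State.DONE
        elif state is State.ASK:
            trace.append(f"ask: can you point to the {target}?")
            state = State.FEEDBACK
        elif state is State.FEEDBACK:
            correct = answer == target
            # Multimodal feedback: gesture plus facial expression.
            trace.append("feedback: " + ("smile + nod" if correct
                                         else "encourage retry"))
            state = State.FUN_FACT if correct else State.DONE
        elif state is State.FUN_FACT:
            trace.append(f"fun fact about the {target} (from the LLM)")
            state = State.DONE
    return trace

trace = finding_objects_round({"red ball", "blue cube"}, "red ball", "red ball")
```

An explicit state machine like this keeps the mission controller's game logic separate from both perception (which fills `detected`) and the LLM (which supplies the fun-fact text).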
