M.I.R.A. — Modular Interactive Robotic Agent
Embodied desktop AI interface with expressive behavior, modular cognition, and local action execution.
Status: Active software prototype
M.I.R.A. is a software-first robotic assistant prototype designed around the idea that an intelligent interface should not only answer, but also appear attentive, reactive, and embodied. The current implementation focuses on a desktop companion interface: a PySide6 application with animated eyes, conversational input, session memory, intent recognition, local actions, and a debug panel for tuning expressive behavior in real time.
The long-term objective is to use this software core as the interaction layer for a future physical assistant. Instead of starting from motors and sensors, the project first develops the agent architecture: perception of user input, cognition, action selection, UI feedback, expressive state transitions, and a modular path toward voice, vision, and robotics integration.
The previous naming separated the cognitive core and the physical robot into two identities. M.I.R.A. simplifies the concept into a single project name that can scale from the current desktop prototype to a future embodied robotic platform.
Independent components for cognition, actions, UI, state management, and expression.
Text-based interaction today, with a clear path toward voice, camera input, and multimodal arbitration.
A software agent designed to become the behavioral layer of a future physical assistant.
Animated eyes with blinking, idle motion, emotional states, eyelid deformation, asymmetry, and smooth transitions.
Intent inference, response generation, session memory, action mapping, and event-driven state transitions.
Action registry and executor for system-level tasks such as opening URLs, launching apps, showing notifications, and retrieving system information.
Rule-based interpretation for deterministic behavior, plus an optional local LLM-backed parser through Ollama.
Runtime tuning of facial profiles, blink timing, idle animation, speaking pulse, thinking drift, and asymmetry controls.
Centralized event bus connecting user input, processing states, inferred intents, action results, and visual feedback.
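To make this concrete, here is a minimal publish/subscribe sketch of how such an event bus could be wired; the EventBus class, event names, and handlers are illustrative assumptions rather than the project's actual API.

```python
# Minimal publish/subscribe event bus sketch (illustrative; names are assumptions).
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Routes named events from publishers to subscribed callbacks."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[event].append(handler)

    def publish(self, event: str, payload: Any = None) -> None:
        for handler in self._subscribers[event]:
            handler(payload)


if __name__ == "__main__":
    bus = EventBus()
    # The face widget might listen for state changes, the chat panel for responses.
    bus.subscribe("state.changed", lambda state: print(f"face -> {state}"))
    bus.subscribe("response.ready", lambda text: print(f"chat -> {text}"))

    # The backend publishes events as the pipeline advances.
    bus.publish("state.changed", "thinking")
    bus.publish("response.ready", "It is 14:32.")
    bus.publish("state.changed", "speaking")
```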
M.I.R.A. is an embodied AI interface prototype that explores how a desktop assistant can feel more present through motion, expression, and stateful interaction. The system is built around a face widget, a chat panel, a debug panel, and a modular backend that coordinates cognition and behavior.
The assistant receives text input, stores the interaction in session memory, infers an intent, optionally builds and executes an action request, generates a response, and maps the result to a visible face state. This creates a tight loop between cognition and embodiment: listening, thinking, speaking, confusion, happiness, tiredness, and other expressive states are reflected visually instead of remaining hidden in the backend.
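That loop can be sketched in a few lines of Python; the Brain class, its method names, and the rule-based intent check below are hypothetical simplifications of the real backend, shown only to illustrate the input, memory, intent, action, response, and face-state flow.

```python
# Hypothetical sketch of the cognition/embodiment loop described above.
# Class names, states, and rules are illustrative, not the project's real API.
from collections import deque
from datetime import datetime


class Brain:
    def __init__(self, memory_size: int = 20) -> None:
        # Session memory: recent (role, text) turns plus the latest inferred intent.
        self.memory = deque(maxlen=memory_size)
        self.last_intent = None

    def infer_intent(self, text: str) -> str:
        # Stand-in for the rule-based intent engine.
        return "get_time" if "time" in text.lower() else "small_talk"

    def handle_input(self, text: str) -> tuple[str, str]:
        """Process one user turn and return (response, face_state)."""
        self.memory.append(("user", text))
        self.last_intent = self.infer_intent(text)
        if self.last_intent == "get_time":
            response = datetime.now().strftime("It is %H:%M.")  # local action
            face_state = "happy"
        else:
            response = "Tell me more."
            face_state = "speaking"
        self.memory.append(("assistant", response))
        return response, face_state


if __name__ == "__main__":
    print(Brain().handle_input("What time is it?"))
```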
Main window layout with a large expressive face on the left and a chat/debug control panel on the right.
User input triggers listening, processing, intent inference, action execution, response generation, and visual state updates.
Recent user and assistant messages are stored in memory, together with the latest inferred intent and contextual metadata.
Actions are registered through a central registry and executed through a dedicated executor with success/failure events; a minimal sketch of this pattern follows below.
Each face state maps to an expression profile with size, offsets, corner radius, eyelid deformation, blink timing, and animation flags.
Optional Ollama integration converts natural language into structured intents and action requests while preserving the same backend contract.
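As an illustration of that contract, the sketch below sends the user message to a local model through Ollama's HTTP generate endpoint and falls back to simple rules when the model is unavailable or returns malformed output; the prompt, the intent schema, and the model name are assumptions made for the example, not the project's actual implementation.

```python
# Sketch: parse a user message into a structured intent, preferring a local LLM
# through Ollama's HTTP API and falling back to deterministic rules.
# Prompt format, intent schema, and model name are illustrative assumptions.
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def parse_with_rules(text: str) -> dict:
    lowered = text.lower()
    if lowered.startswith("open "):
        return {"intent": "open_app", "target": lowered.removeprefix("open ").strip()}
    if "time" in lowered:
        return {"intent": "get_time"}
    return {"intent": "chat"}


def parse_with_ollama(text: str, model: str = "llama3") -> dict:
    prompt = (
        "Convert the user message into JSON with keys 'intent' and optional 'target'.\n"
        f"Message: {text}\nJSON:"
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=30,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])


def parse(text: str) -> dict:
    try:
        return parse_with_ollama(text)
    except Exception:
        # Any failure (Ollama not running, malformed JSON) falls back to rules,
        # preserving the same backend contract and deterministic behavior.
        return parse_with_rules(text)
```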
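The registry/executor pattern mentioned above can be sketched along these lines; the class names, the ActionResult shape, and the example action are assumptions for illustration, not the real module.

```python
# Sketch of a registry/executor pattern for local actions (names are illustrative).
import webbrowser
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionResult:
    success: bool
    detail: str = ""


class ActionRegistry:
    def __init__(self) -> None:
        self._actions: dict[str, Callable[..., ActionResult]] = {}

    def register(self, name: str, handler: Callable[..., ActionResult]) -> None:
        self._actions[name] = handler

    def execute(self, name: str, **kwargs) -> ActionResult:
        handler = self._actions.get(name)
        if handler is None:
            return ActionResult(False, f"unknown action: {name}")
        try:
            return handler(**kwargs)
        except Exception as exc:  # surfaced as a failure event in the real system
            return ActionResult(False, str(exc))


def open_url(url: str) -> ActionResult:
    webbrowser.open(url)
    return ActionResult(True, f"opened {url}")


registry = ActionRegistry()
registry.register("open_url", open_url)
# registry.execute("open_url", url="https://example.com")
```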
Current and planned demonstrations for the embodied assistant prototype.
Text interaction with visual feedback: listening while the user types, thinking during processing, and speaking when responding.
Open applications, navigate to websites, show notifications, and retrieve basic system information from natural language commands.
Live tuning of expression profiles for designing eye behavior, emotional states, and micro-animations.
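A sketch of how face states might map to tunable expression profiles is shown below; the dataclass fields mirror the parameters listed in the architecture notes above, but all names and default values are illustrative assumptions.

```python
# Sketch of mapping face states to expression profiles.
# Field names and values are illustrative assumptions, not the project's real data.
from dataclasses import dataclass


@dataclass
class ExpressionProfile:
    eye_scale: float = 1.0         # relative eye size
    offset_x: float = 0.0          # horizontal eye offset
    offset_y: float = 0.0          # vertical eye offset
    corner_radius: float = 12.0    # rounding of the eye shape
    eyelid_droop: float = 0.0      # 0 = fully open, 1 = closed
    blink_interval_s: float = 4.0  # average seconds between blinks
    idle_motion: bool = True       # enable subtle drifting when idle
    asymmetry: float = 0.0         # difference between left and right eye


PROFILES = {
    "neutral": ExpressionProfile(),
    "thinking": ExpressionProfile(offset_y=-4.0, blink_interval_s=6.0),
    "happy": ExpressionProfile(eye_scale=1.15, corner_radius=20.0),
    "tired": ExpressionProfile(eyelid_droop=0.5, blink_interval_s=2.5),
}

# A debug panel can tweak a live profile in place, e.g.:
# PROFILES["thinking"].blink_interval_s = 3.0
```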
PySide6 UI, animated face, chat panel, debug panel, event bus, state manager, brain, session memory, and rule-based intent engine.
Registry/executor pattern for local actions such as time/date, memory introspection, URL opening, app launching, notifications, and system info.
Use a local model through Ollama to parse natural language into structured intents and actions while keeping rule-based fallback behavior.
Add microphone input, speech-to-text, wake interaction, webcam-based presence detection, and multimodal arbitration.
Extend the software core into a physical robotic assistant with sensors, lights, audio, mechanical expression, and low-level hardware control.
~35% software prototype complete
The current milestone is not a hardware MVP yet. The project is deliberately focused on the interaction and cognition layer first: a working desktop prototype that can express state, interpret basic requests, execute local actions, and provide a foundation for more advanced multimodal and robotic behavior.