ROS 2 VLM-to-Behavior-Tree robotics middleware for local perception and reactive control.
Status: Work in progress
VISTA is a ROS 2 framework for connecting on-device Vision-Language Model perception with deterministic Behavior Tree control. The repository describes a decoupled two-loop architecture: a slower perception loop analyzes camera frames through a local Gemma VLM served by llama.cpp, while a faster Behavior Tree loop consumes structured state and selects robot behaviors without blocking on model inference.
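One way to picture the decoupling is a single-slot "latest value" mailbox: the slow perception loop overwrites the slot whenever a new result arrives, and the fast loop reads whatever is there without waiting. The sketch below uses only the C++ standard library. `PerceptionSnapshot`, `LatestValue`, and the field types are hypothetical names chosen for illustration, not code from the VISTA repository, which exchanges this state through a ROS 2 topic and the BT blackboard.

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical snapshot of the perception output (types are assumptions).
struct PerceptionSnapshot {
    std::string scene_state;
    double hazard_level = 0.0;
    bool target_visible = false;
};

// Single-slot mailbox: the writer overwrites, the reader copies the latest value.
template <typename T>
class LatestValue {
public:
    // Called by the slow loop whenever a new result is ready.
    void publish(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = std::move(value);
    }

    // Called by the fast loop on every tick. Returns immediately with a copy
    // of the most recent value, or std::nullopt if nothing was published yet.
    std::optional<T> latest() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    std::optional<T> value_;
};
```

The lock is held only for a copy or a move, so the reader never waits on model inference itself; it simply keeps acting on the last known state until a newer one is published.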
The project is ideated and maintained by Andrea Ricciardi. My contribution is limited to the Behavior Tree side of the system: the decision structure, custom BT nodes, blackboard integration, and the runtime link between VLM state and BT control logic.
VISTA runs two ROS 2 composable nodes in a multi-threaded component container. GemmaPerceptionNode subscribes to a camera image topic, encodes frames as JPEG, calls a local OpenAI-compatible llama.cpp chat-completions endpoint, parses JSON perception output, and publishes a custom /vlm_data message. BehaviorTreeNode subscribes to that topic, updates a BehaviorTree.CPP blackboard, and ticks the tree at the configured control rate.
Runtime parameters live in config/vista_params.yaml. The same file configures the VLM host and timeouts, camera and output topics, JPEG quality, the system prompt, the BT tick rate, and the BT XML used by the control node.
The current Behavior Tree uses a root Fallback node to encode priority-based decision logic. The highest-priority branch checks whether the VLM-reported hazard_level meets the configured threshold and, if so, runs RecoveryBehavior. If the hazard branch does not succeed, the target branch checks target_visible and runs PursueTarget with the VLM-provided bounding box. If neither branch applies, DefaultBehavior runs.
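The priority logic described above can be sketched with plain C++ closures standing in for tree nodes. This is a simplified model, not the BehaviorTree.CPP API: it ignores the RUNNING status, treats every action as an immediate success, and assumes hazard_level is a number compared with `>=` against the threshold. `SelectBehavior` and the helper names are hypothetical.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class Status { Success, Failure };
using Node = std::function<Status()>;

// Sequence: ticks children in order and stops at the first failure.
Node Sequence(std::vector<Node> children) {
    return [children = std::move(children)]() {
        for (const auto& child : children) {
            if (child() == Status::Failure) return Status::Failure;
        }
        return Status::Success;
    };
}

// Fallback: ticks children in priority order and stops at the first success.
Node Fallback(std::vector<Node> children) {
    return [children = std::move(children)]() {
        for (const auto& child : children) {
            if (child() == Status::Success) return Status::Success;
        }
        return Status::Failure;
    };
}

// Builds the three-branch tree, ticks it once, and returns the chosen action.
std::string SelectBehavior(double hazard_level, double hazard_threshold,
                           bool target_visible) {
    std::string selected;
    auto condition = [](bool value) -> Node {
        return [value]() { return value ? Status::Success : Status::Failure; };
    };
    auto action = [&selected](std::string name) -> Node {
        return [&selected, name]() {
            selected = name;  // stand-in for logging the selected behavior
            return Status::Success;
        };
    };

    Node root = Fallback({
        Sequence({condition(hazard_level >= hazard_threshold),
                  action("RecoveryBehavior")}),
        Sequence({condition(target_visible), action("PursueTarget")}),
        action("DefaultBehavior"),
    });
    root();
    return selected;
}
```

Under these assumptions, a hazard at or above the threshold selects RecoveryBehavior even when a target is visible, because the Fallback stops at the first branch that succeeds.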
IsHazardous and IsTargetVisible read structured VLM state from BT input ports and decide whether higher-priority branches should execute.
RecoveryBehavior, PursueTarget, and DefaultBehavior are the current action leaves. The repository implementation logs which behavior was selected; it does not claim to provide a full robot actuation stack.
The ROS subscriber callback writes scene_state, hazard_level, target_visible, target_bbox, confidence, and description into the BT blackboard.
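As a rough illustration of that callback, the sketch below copies each message field into a key-value store under the same name. It uses only the standard library: `VlmData`, `Blackboard`, and `OnVlmData` are hypothetical stand-ins, and the field types (a numeric hazard level, a four-element bounding box) are assumptions. The real node works with the custom /vlm_data message and a BehaviorTree.CPP blackboard, where the equivalent write would go through `BT::Blackboard::set`.

```cpp
#include <array>
#include <map>
#include <string>
#include <variant>

// Hypothetical mirror of the /vlm_data message; field types are assumptions.
struct VlmData {
    std::string scene_state;
    double hazard_level = 0.0;
    bool target_visible = false;
    std::array<double, 4> target_bbox{};
    double confidence = 0.0;
    std::string description;
};

// Minimal typed key-value store standing in for the BT blackboard.
using Value = std::variant<std::string, double, bool, std::array<double, 4>>;
using Blackboard = std::map<std::string, Value>;

// Subscriber-style callback: overwrite every entry with the newest message.
void OnVlmData(const VlmData& msg, Blackboard& blackboard) {
    blackboard["scene_state"] = msg.scene_state;
    blackboard["hazard_level"] = msg.hazard_level;
    blackboard["target_visible"] = msg.target_visible;
    blackboard["target_bbox"] = msg.target_bbox;
    blackboard["confidence"] = msg.confidence;
    blackboard["description"] = msg.description;
}
```

Keeping the blackboard keys identical to the message field names means the tree's input ports can refer to the same names the perception side uses.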
The BT structure is written as BehaviorTree.CPP format 4 XML and loaded from the YAML configuration, so the decision tree can be adjusted from configuration.
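For orientation, a format 4 tree with this shape might look like the XML held in the string below. The node names come from the description above, but the tree ID, the port names (`hazard_level`, `threshold`, `target_visible`, `target_bbox`), and the threshold value are assumptions; the actual XML lives in the project's configuration and may differ. A string of this form is what BehaviorTree.CPP's `BehaviorTreeFactory::createTreeFromText` accepts once the custom nodes are registered.

```cpp
#include <string>

// Hypothetical tree layout; port names and the threshold value are assumptions.
const std::string kTreeXml = R"(
<root BTCPP_format="4">
  <BehaviorTree ID="MainTree">
    <Fallback>
      <Sequence>
        <IsHazardous hazard_level="{hazard_level}" threshold="0.5"/>
        <RecoveryBehavior/>
      </Sequence>
      <Sequence>
        <IsTargetVisible target_visible="{target_visible}"/>
        <PursueTarget target_bbox="{target_bbox}"/>
      </Sequence>
      <DefaultBehavior/>
    </Fallback>
  </BehaviorTree>
</root>
)";
```

The curly-brace values are blackboard references, so each condition reads the entry that the subscriber callback last wrote.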
The repository includes a ROS 2 launch file that starts both composable nodes inside component_container_mt with intra-process communication enabled. The provided tmux setup can start the local llama.cpp server, launch VISTA, and optionally play a rosbag configured in tmux/session.yaml.
The VLM request uses the configured system prompt to request structured JSON with scene state, hazard level, target visibility, target bounding box, confidence, and natural-language description. That structured message is the contract between the slow perception loop and the fast BT loop.
My role is focused on the Behavior Tree part of VISTA. In portfolio terms, this means presenting my work as BT architecture and implementation support, not as authorship of the full VISTA project or the VLM perception framework. The project repository identifies Andrea Ricciardi as maintainer and copyright holder, and I do not present myself as the project author.
VISTA (Visual Intelligence State-Tree Architecture) is ideated and maintained by Andrea Ricciardi. The repository package metadata lists Andrea Ricciardi as maintainer, and the license and notice files attribute copyright and software development to Andrea Ricciardi.
Third-party components referenced by the repository include Gemma, BehaviorTree.CPP, llama.cpp, cpp-httplib, and nlohmann/json, under their respective licenses.