Code-as-Monitor | Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Code-as-Monitor is the first framework to integrate both reactive and proactive failure detection.

Code-as-Monitor leverages the proposed constraint elements to simplify real-time failure detection with high precision.

Code-as-Monitor achieves state-of-the-art (SOTA) performance in both simulated and real-world environments, and exhibits strong generalizability on unseen scenarios, tasks, and objects.

For the task "Move the pan with lobster to the stove without losing the lobster", (a) reactive failure detection identifies failures after they occur, and proactive failure detection prevents foreseeable failures. In (a), the robot detects the failure after the lobster unpredictably jumps out due to the heat. In (b), pan tilting is detected and corrected it requiring real-time precision. (c) shows that our method combined with an open-loop policy forms a closed-loop system, enabling proactive (e.g., detecting moving glass during grasping) and reactive (e.g., removing toy after grasping) failure detection in cluttered scenes.

Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging the vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.

Overview of Code-as-Monitor. This framework unifies reactive and proactive failure detection via constraints, more generally abstracts relevant entities/parts through constraint elements, and ensures precise and real-time monitoring via code evaluation.

Constraint Element Pipeline. Given a constraint, our model ConSeg generates instance-level and part-level masks across multiple views, which are projected into 3D space. Through a series of heuristics, the desired elements are produced. Once all elements are obtained, they are annotated onto the original multi-view images.

BibTeX

@article{zhou2024code,
    title={Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection},
    author={Zhou, Enshen and Su, Qi and Chi, Cheng and Zhang, Zhizheng and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and Wang, He},
    journal={arXiv preprint arXiv:2412.04455},
    year={2024}
}

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

CVPR 2025

Highlights

Summary Video

Motivation

Real-world Demos

Abstract

Method Overview

Constraint Elements

CLIPort Simulator Demos

Omnigibson Simulator Demos

Visualization of Constraint-aware Segmentation

BibTeX