Steering Vision-Language-Action Models for Safe Robotic Execution

Overview

This group project investigated whether the behavior of pretrained vision-language-action (VLA) models can be changed at inference time through activation-space interventions, without retraining. The experiments used Pi 0.5 on a Franka robotic arm to test whether steering vectors derived from language prompts could alter physical pick-and-place trajectories.

Approach

The project compared Activation Addition (ActAdd) and Conditional Activation Steering (CAST). Steering vectors were constructed from contrastive prompts describing desired and undesired behaviors, then injected through PyTorch forward hooks at mid-to-late transformer layers. ActAdd applied the steering vector unconditionally throughout a rollout, whereas CAST aimed to apply the intervention only when the visual context indicated it was relevant.
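The ActAdd mechanism can be sketched with PyTorch's register_forward_hook. This is a minimal illustration under stated assumptions, not the project's code: ToyBackbone, Block, the token ids, the layer index, and the coefficient are all hypothetical stand-ins, since loading Pi 0.5 is out of scope here. The steering vector is taken as the mean-pooled difference between the layer activations for a desired and an undesired prompt, which is one common simplification of ActAdd's contrastive construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class Block(nn.Module):
    """Toy residual block; its output is the updated residual stream."""

    def __init__(self, d_model: int):
        super().__init__()
        self.ff = nn.Linear(d_model, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + torch.tanh(self.ff(h))


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for a VLA transformer; Pi 0.5 is not loaded here."""

    def __init__(self, vocab: int = 100, d_model: int = 16, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList([Block(d_model) for _ in range(n_layers)])

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.embed(tokens)  # (batch, seq, d_model)
        for layer in self.layers:
            h = layer(h)
        return h


@torch.no_grad()
def mean_activation(model: nn.Module, layer: nn.Module, tokens: torch.Tensor) -> torch.Tensor:
    """Run the model once and return the layer output averaged over batch and positions."""
    cache = {}

    def grab(module, inputs, output):
        cache["h"] = output  # returning None leaves the layer output unchanged

    handle = layer.register_forward_hook(grab)
    try:
        model(tokens)
    finally:
        handle.remove()
    return cache["h"].mean(dim=(0, 1))


def steering_vector(model, layer, pos_tokens, neg_tokens) -> torch.Tensor:
    """Mean-pooled difference between desired-prompt and undesired-prompt activations."""
    return mean_activation(model, layer, pos_tokens) - mean_activation(model, layer, neg_tokens)


def add_steering(layer: nn.Module, vector: torch.Tensor, coeff: float):
    """Register a hook that adds coeff * vector at every position of the layer output."""

    def hook(module, inputs, output):
        return output + coeff * vector  # a returned value replaces the layer output

    return layer.register_forward_hook(hook)


model = ToyBackbone().eval()
layer = model.layers[2]  # a "mid-to-late" layer in this hypothetical 4-layer toy
pos = torch.tensor([[1, 2, 3, 4]])  # hypothetical tokens for the desired-behavior prompt
neg = torch.tensor([[1, 2, 3, 5]])  # hypothetical tokens for the contrasting prompt
v = steering_vector(model, layer, pos, neg)

task = torch.tensor([[7, 8, 9]])  # hypothetical task-instruction tokens
with torch.no_grad():
    base = model(task)
    handle = add_steering(layer, v, coeff=4.0)  # a negative coeff inverts the direction
    steered = model(task)
    handle.remove()  # detaching the hook restores the unsteered model
    restored = model(task)
```

Flipping the sign of coeff corresponds to steering in the inverted direction, and because the intervention lives entirely in a removable hook, the underlying weights are never modified.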

Evaluation and findings

ActAdd produced clear physical changes: steering toward high trajectories consistently raised the end-effector path, while the inverted steering direction produced lower trajectories. However, because the intervention was unconditional, it could interfere with task execution; in some rollouts the arm missed the target object because the global steering overrode task-dependent motion.

CAST's context-similarity signals were promising, but the gated intervention did not yet alter behavior reliably in visually ambiguous or cluttered states. The results suggest that activation-space steering can shape real robotic behavior without retraining, but reliable context-aware gating is still needed before the approach is suitable for safety-critical use.
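The context-aware gating idea can be illustrated by extending the same hook pattern with a similarity check. This is a simplified, hypothetical sketch rather than the project's CAST setup or the published CAST algorithm, which is more involved: here the condition vector is just a mean-pooled activation difference between two stand-in contexts, and the gate is a plain cosine-similarity threshold applied per batch item inside a single hook. The model, token ids, threshold, and coefficient are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class Block(nn.Module):
    """Toy residual block; its output is the updated residual stream."""

    def __init__(self, d_model: int):
        super().__init__()
        self.ff = nn.Linear(d_model, d_model)

    def forward(self, h):
        return h + torch.tanh(self.ff(h))


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for a VLA transformer; Pi 0.5 is not loaded here."""

    def __init__(self, vocab: int = 100, d_model: int = 16, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList([Block(d_model) for _ in range(n_layers)])

    def forward(self, tokens):
        h = self.embed(tokens)  # (batch, seq, d_model)
        for layer in self.layers:
            h = layer(h)
        return h


@torch.no_grad()
def mean_activation(model, layer, tokens):
    """Layer output averaged over batch and positions."""
    cache = {}
    handle = layer.register_forward_hook(lambda m, i, o: cache.__setitem__("h", o))
    try:
        model(tokens)
    finally:
        handle.remove()
    return cache["h"].mean(dim=(0, 1))


def add_conditional_steering(layer, behavior_vec, condition_vec, coeff, threshold, sims):
    """Steer only the batch items whose pooled hidden state resembles condition_vec."""

    def hook(module, inputs, output):
        pooled = output.mean(dim=1)  # (batch, d_model)
        sim = F.cosine_similarity(pooled, condition_vec.unsqueeze(0), dim=-1)  # (batch,)
        sims.append(sim)  # record the context-similarity signal for inspection
        gate = (sim > threshold).to(output.dtype)  # 1.0 where the condition fires, else 0.0
        return output + coeff * gate[:, None, None] * behavior_vec

    return layer.register_forward_hook(hook)


model = ToyBackbone().eval()
layer = model.layers[2]
# Hypothetical token ids standing in for contrastive prompts and visual contexts.
behavior_vec = mean_activation(model, layer, torch.tensor([[1, 2, 3, 4]])) - mean_activation(
    model, layer, torch.tensor([[1, 2, 3, 5]])
)
condition_vec = mean_activation(model, layer, torch.tensor([[10, 11, 12]])) - mean_activation(
    model, layer, torch.tensor([[20, 21, 22]])
)

task = torch.tensor([[7, 8, 9], [10, 11, 12]])  # batch of two hypothetical observation contexts
sims = []
with torch.no_grad():
    base = model(task)
    handle = add_conditional_steering(
        layer, behavior_vec, condition_vec, coeff=4.0, threshold=0.3, sims=sims
    )
    steered = model(task)
    handle.remove()
```

Recording the similarity values separately, as the sims list does here, makes it possible to inspect the context signal independently of whether the gate actually changed the output; the threshold then controls how readily the intervention fires.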

My contribution

Contributed to the project’s simulation and VLA setup by evaluating InternNAV in Isaac Sim, preparing the fine-tuning setup for InternVLA-M1, and assisting with robotic-arm testing. Earlier setup work on StreamVLN was discontinued when the project stopped pursuing that platform.