Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate
Abstract
When deploying VLA models to real-world robotic tasks, execution speed matters. In previous work we analyzed how to make VLA inference on the GPU fast; in this report we attack the remaining issues. We present a set of techniques that let a lightweight robot reach human speed when executing VLA-generated trajectories.
Our technology stack spans calibration, planning and control, and a learning-based method for identifying the optimal execution speed. The end-to-end result is that, on a set of real-world tasks requiring both dexterity and accuracy, the robot executes about 3x faster than a standard execution baseline, on par with casual human operation and close to the robot's hardware limit.
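The report's learned speed-identification method is not detailed on this page. As a rough illustration of the underlying idea only, the sketch below computes the largest uniform time-scaling factor a recorded trajectory admits under per-joint velocity and acceleration limits; all names and limit values here are hypothetical, not the paper's actual method.

```python
import numpy as np

def max_speedup(traj, dt, v_max, a_max):
    """Illustrative sketch (not the paper's learned method): the largest
    uniform speedup s such that the trajectory, replayed at timestep dt/s,
    still respects per-joint velocity and acceleration limits."""
    traj = np.asarray(traj, dtype=float)      # shape (T, J): joint positions
    vel = np.gradient(traj, dt, axis=0)       # finite-difference velocities
    acc = np.gradient(vel, dt, axis=0)        # finite-difference accelerations
    # Uniformly compressing time by s scales velocities by s and
    # accelerations by s**2, so each limit bounds s independently.
    s_vel = np.min(v_max / (np.abs(vel).max(axis=0) + 1e-9))
    s_acc = np.sqrt(np.min(a_max / (np.abs(acc).max(axis=0) + 1e-9)))
    return min(s_vel, s_acc)

# Example: a single joint moving at a constant 1 rad/s; with a 3 rad/s
# velocity limit, a uniform 3x speedup is admissible.
traj = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
s = max_speedup(traj, dt=0.1, v_max=np.array([3.0]), a_max=np.array([100.0]))
```

A fixed global speedup like this is exactly the kind of conservative baseline that a learned, state-dependent speed policy can improve on, since different trajectory segments tolerate different speeds.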
Demo Videos
Below are three videos of the robot performing complex tasks in real environments. Each task is accompanied by a corresponding RRD recording, in which one can inspect synchronized videos from three camera views alongside visualizations of the different trajectory curves. Across all three tasks, Realtime-VLA V2 reaches execution speed on par with casual human operation.
Citation
If you find this project useful in your research, please cite:
@article{yang2026realtimevlav2,
title={Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate},
author={Yang, Chen and Hu, Yucheng and Ma, Yunchao and Yang, Yunhuan and Tan, Jing and Fan, Haoqiang},
journal={arXiv preprint arXiv:2603.26360},
year={2026}
}