Microsoft Launches Vision-Language-Action Model for Robots


Microsoft introduced Rho-alpha, a new vision-language-action model designed to make robots more adaptable, responsive, and capable of operating in real-world environments.

The tech giant revealed the generative AI vision-language-action (VLA) model in a blog post earlier this month. The model is derived from Microsoft’s Phi open model series.

Rho-alpha translates natural language commands into control signals for robots performing manipulation tasks. 

To train the model, Microsoft said it combined physical demonstrations with simulations, using a multistage reinforcement learning process built on the open Nvidia Isaac Sim framework.

For better perception, Microsoft also added tactile sensing capabilities, enabling robots to use touch to respond to their environment rather than solely relying on visual input. 

In future iterations, Microsoft said it plans to add force sensing and other modalities.

A video demonstration included in the blog post shows Rho-alpha interacting with BusyBox, a physical interaction benchmark recently introduced by Microsoft Research, using natural language instructions.

The Microsoft model release comes as more industries are adopting robots, shifting from narrow, task-specific deployments to rollouts across more dynamic, unstructured, and often human-centered environments.


The shift has led to increased focus on models that enable robots to reason and act with greater autonomy. 

In this context, Microsoft is positioning Rho-alpha as a more flexible and adaptable AI system for robots, enabling greater deployment opportunities across sectors than traditional models.

“The emergence of VLA models for physical systems is enabling systems to perceive, reason, and act with increasing autonomy alongside humans,” Ashley Llorens, corporate vice president and managing director at Microsoft’s Research Accelerator, said in the blog post introducing the model.

Rho-alpha is currently being evaluated on dual-arm robotic systems and humanoid robots, with Microsoft planning to publish a technical description of the model in the coming months.

The model will initially be available through an early access program, with broader availability planned in the Microsoft Foundry in the future.

