StyleLoco

Generative Adversarial Distillation for Natural Humanoid Robot Locomotion

Le Ma¹,*

Ziyu Meng¹,²,*

Tengyu Liu¹

Yuhan Li¹,³

Ran Song²

Wei Zhang²

Siyuan Huang¹

¹National Key Laboratory of General Artificial Intelligence, BIGAI

²School of Control Science and Engineering, Shandong University

³Huazhong University of Science and Technology

* Equal contribution

We introduce StyleLoco, a novel two-stage framework that bridges the gap between robust task execution and natural motion synthesis through a Generative Adversarial Distillation (GAD) process. Our framework begins by training a teacher policy using reinforcement learning to achieve agile and dynamic locomotion. It then employs a multi-discriminator architecture, where distinct discriminators concurrently extract skills from both the teacher policy and motion capture data. This approach effectively combines the agility of reinforcement learning with the natural fluidity of human-like movements while mitigating the instability issues commonly associated with adversarial training.

Demonstrations Video

Generative Adversarial Distillation

The core innovation of StyleLoco is our GAD framework, which synthesizes natural and adaptive behaviors from two complementary sources: a well-trained teacher policy and a reference motion dataset. As illustrated in the figure, GAD trains a student policy alongside two AMP-style discriminators. Each discriminator evaluates the student's generated state transitions against one source of reference motions: one compares them with transitions from the teacher policy, the other with transitions from the motion dataset.
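The two-discriminator setup can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the least-squares discriminator objective and reward shaping from the original AMP formulation, and it assumes the two style rewards are mixed by a weighted sum. The function names and the weights `w_teacher` and `w_mocap` are hypothetical, and AMP's gradient penalty is omitted for brevity.

```python
from typing import Sequence


def amp_style_reward(d: float) -> float:
    """AMP-style reward from a discriminator score d: max(0, 1 - 0.25 * (d - 1)^2)."""
    return max(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)


def lsgan_discriminator_loss(ref_scores: Sequence[float],
                             student_scores: Sequence[float]) -> float:
    """Least-squares loss: push reference transitions to +1, student transitions to -1."""
    ref_term = sum((d - 1.0) ** 2 for d in ref_scores) / len(ref_scores)
    student_term = sum((d + 1.0) ** 2 for d in student_scores) / len(student_scores)
    return ref_term + student_term


def gad_style_reward(d_teacher: float, d_mocap: float,
                     w_teacher: float = 0.5, w_mocap: float = 0.5) -> float:
    """Combine the rewards of both discriminators.

    ASSUMPTION: a weighted sum with hypothetical weights; the source does not
    state how the two discriminator signals are combined.
    """
    return w_teacher * amp_style_reward(d_teacher) + w_mocap * amp_style_reward(d_mocap)


if __name__ == "__main__":
    # A transition that looks like the teacher (score 1) but unlike the mocap data (score -1).
    print(gad_style_reward(d_teacher=1.0, d_mocap=-1.0))
```

In this sketch each discriminator would be trained with `lsgan_discriminator_loss` on its own reference source (teacher rollouts or motion capture clips), while the student policy receives `gad_style_reward` alongside any task reward.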

Stylized Locomotion

We validate our method's capability to combine robust locomotion skills with distinct motion styles by implementing the 'arms-akimbo walking' maneuver on the Unitree H1 full-size humanoid platform.

Arms-akimbo Style

Motion Demonstration

Simulation

Real Robot

Deploying asymmetric limping gaits on the Unitree H1 humanoid makes it significantly harder to maintain dynamic stability, yet our method enables the student policy to reproduce the distinctive characteristics of a limp.

Limping Style

Motion Demonstration

Simulation

Real Robot

While the reference 'arms-akimbo' motion in the LaFAN1 dataset exhibits limited directional diversity (forward: 100% vs. lateral and backward: 0%), our student policy trained with GAD demonstrates emergent omnidirectional capability and extraordinary robustness.

Omnidirection

Robustness

Natural Gait Transition and Outdoor Evaluations!

By combining several gait patterns across different velocity ranges with an omnidirectional teacher policy, the learned controller attains robust omnidirectional stylized locomotion with natural gait transitions. Our outdoor real-robot experiments were conducted on low-friction snowy surfaces and uneven grassland, both of which are extremely challenging for locomotion.
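One way to picture "gait patterns in different velocity ranges" is a lookup from commanded planar speed to the reference gait used for that range. The sketch below is purely illustrative: the gait names, the speed thresholds, and the function `select_gait_style` are all hypothetical and are not taken from the paper.

```python
import math
from bisect import bisect_right

# HYPOTHETICAL upper speed bounds in m/s separating the gait ranges.
SPEED_BOUNDS = [0.1, 1.0]
# HYPOTHETICAL gait labels, one per range: [0, 0.1), [0.1, 1.0), [1.0, inf).
GAIT_STYLES = ["stand", "walk", "run"]


def select_gait_style(vx: float, vy: float) -> str:
    """Pick the reference gait for a commanded planar velocity (vx, vy).

    Only the speed magnitude is used, so the same gait applies to forward,
    lateral, and backward commands.
    """
    speed = math.hypot(vx, vy)
    return GAIT_STYLES[bisect_right(SPEED_BOUNDS, speed)]


if __name__ == "__main__":
    for command in [(0.0, 0.0), (0.5, 0.0), (0.0, -1.5)]:
        print(command, select_gait_style(*command))
```

Because the lookup depends only on speed magnitude, the direction of travel would have to come from elsewhere, which is consistent with the role the text gives to the omnidirectional teacher policy.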

BibTeX


@misc{ma2025styleloco,
  title={StyleLoco: Generative Adversarial Distillation for Natural Humanoid Robot Locomotion},
  author={Le Ma and Ziyu Meng and Tengyu Liu and Yuhan Li and Ran Song and Wei Zhang and Siyuan Huang},
  year={2025},
  eprint={2503.15082},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2503.15082},
}