Generative Factor Chaining

TL;DR: We present Generative Factor Chaining (GFC), a novel approach using spatial-temporal factor graphs to compose modular generative models for solving long-horizon bimanual coordination tasks.

Long-horizon planning tasks involving multiple manipulators are challenging: existing methods either face computational scalability issues or require an impractical amount of training data. To address these limitations, we present Generative Factor Chaining (GFC), a novel approach based on modular generative models for learning and composing skills in complex tasks. GFC treats a long-horizon planning task in a complex scene as a spatial-temporal factor graph, in which nodes represent objects in the scene and factors denote the constraints or skills that connect different objects. Because GFC builds on the diffusion model framework, the different factors can be jointly learned from individual skill data, which is readily obtainable. During inference, these factors can be flexibly composed, optionally with additional constraints, to achieve long-horizon planning. The modular design of GFC enables generalization to unseen planning tasks. We demonstrate the advantages of our method through real-world experiments.
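To make the factor-graph representation more concrete, here is a minimal Python sketch. It assumes that each variable is the state of one scene entity (an arm or an object) at one time step, that skills act as temporal factors linking consecutive steps, and that spatial factors couple different entities at the same step. The class names, the skill names (`pick`, `handover`, `regrasp`, `place`), and the decomposition of the hammer handover task (Task 1) are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# A variable is the state of one scene entity (arm or object) at one time step.
Var = Tuple[str, int]


@dataclass(frozen=True)
class Factor:
    name: str               # hypothetical skill or constraint label
    kind: str               # "temporal" (skill transition) or "spatial" (cross-entity coupling)
    scope: Tuple[Var, ...]  # variables this factor connects


@dataclass
class FactorGraph:
    factors: List[Factor] = field(default_factory=list)

    def add_skill(self, name: str, entities: Tuple[str, ...], t: int) -> None:
        # Temporal factor: links each involved entity's state at step t to step t + 1.
        scope = tuple((e, t) for e in entities) + tuple((e, t + 1) for e in entities)
        self.factors.append(Factor(name, "temporal", scope))

    def add_constraint(self, name: str, entities: Tuple[str, ...], t: int) -> None:
        # Spatial factor: couples different entities at the same time step.
        self.factors.append(Factor(name, "spatial", tuple((e, t) for e in entities)))

    def variables(self) -> Set[Var]:
        return {v for f in self.factors for v in f.scope}

    def factors_on(self, var: Var) -> List[str]:
        # Names of all factors that touch a given variable, in insertion order.
        return [f.name for f in self.factors if var in f.scope]


# Task 1 (hammer into box via handover), with a hypothetical skill decomposition.
task1 = FactorGraph()
task1.add_skill("pick", ("left_arm", "hammer"), t=0)
task1.add_constraint("handover", ("left_arm", "right_arm", "hammer"), t=1)
task1.add_skill("regrasp", ("right_arm", "hammer"), t=1)
task1.add_skill("place", ("right_arm", "hammer", "box"), t=2)
```

Here `task1.factors_on(("hammer", 1))` returns `["pick", "handover", "regrasp"]`: the hammer's state at the handover step is shared by a skill that ends there, the spatial coupling, and a skill that starts there. In this sketch, that shared variable is what chains the single-arm skills together.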

Task 1: The goal is to place the hammer in the box. No single arm can reach both the hammer and the box, so a handover is required to complete the task.

Task 2: The goal is to hand over the hammer to the other arm, pick up the nail, and hammer it into the green brick. The success of the task depends on where the hammer is grasped during the handover.

Task 3: The goal is to pick up both mugs and align them so that the contents of one can be poured into the other. Both arms must coordinate to achieve the desired alignment.

Task 4: The goal is to reorient the pot to a given target angle. Although each arm can grasp the pot, the two arms must coordinate to achieve the desired orientation.

The primary contributions of this work are:

A generalized task representation that formulates complex long-horizon coordination tasks as a spatial-temporal factor graph: sequences of single-arm manipulation skills connected via spatial dependencies.
A compositional framework that combines short-horizon, skill-level transition distributions learned via diffusion models to represent long-horizon, task-level distributions.
Easy plug-and-play: each skill distribution is learned from skill-level data only and added to the skill library. Any skill in the library can then be plugged directly into the spatial-temporal factor graph as a temporal factor at inference time for a given long-horizon task.
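The compositional and plug-and-play ideas can be illustrated with a toy sketch. It assumes each skill factor is available as a score function (the gradient of a log density, which is what a diffusion model approximates) and composes factors as a product of experts by summing their scores on shared variables. An additional constraint, here a start and a goal state, is added the same way. Analytic Gaussian scores stand in for learned diffusion models, and sampling uses plain unadjusted Langevin dynamics rather than a full reverse-diffusion schedule. This is a swapped-in, simplified scheme, not necessarily GFC's exact inference procedure; `SKILL_LIBRARY`, `transition_factor`, `anchor_factor`, and the skill names are hypothetical.

```python
import numpy as np


def transition_factor(delta, std=0.5):
    """Hypothetical skill: prefers s_next - s_prev close to delta.
    Returns the score (gradient of the log density) w.r.t. (s_prev, s_next)."""
    def score(s_prev, s_next):
        r = (s_next - s_prev - delta) / std**2
        return r, -r
    return score


def anchor_factor(value, std=0.5):
    """Hypothetical unary constraint that pins a state near `value`."""
    def score(s):
        return -(s - value) / std**2
    return score


# Hypothetical skill library: each entry stands in for an independently trained skill model.
SKILL_LIBRARY = {
    "move_plus_1": transition_factor(1.0),
    "move_plus_2": transition_factor(2.0),
}


def composed_score(x, chain, start, goal):
    # Product of experts: scores of all factors add up on the variables they share.
    g = np.zeros_like(x)
    g[0] += start(x[0])
    g[-1] += goal(x[-1])
    for t, name in enumerate(chain):
        d_prev, d_next = SKILL_LIBRARY[name](x[t], x[t + 1])
        g[t] += d_prev
        g[t + 1] += d_next
    return g


def langevin(chain, start, goal, steps=2000, eps=0.02, noise=True, seed=0):
    # Unadjusted Langevin dynamics on the composed score; noise=False gives gradient ascent.
    rng = np.random.default_rng(seed)
    x = rng.normal(size=len(chain) + 1)
    for _ in range(steps):
        x = x + eps * composed_score(x, chain, start, goal)
        if noise:
            x = x + np.sqrt(2 * eps) * rng.normal(size=x.shape)
    return x


# The mode of the composed density for this chain is [0, 1, 3].
plan = langevin(["move_plus_1", "move_plus_2"], anchor_factor(0.0), anchor_factor(3.0), noise=False)
```

Reordering the chain to `["move_plus_2", "move_plus_1"]` moves the mode to `[0, 2, 3]` without retraining any factor, which mirrors the plug-and-play claim at toy scale. With `noise=True`, the function returns an approximate sample around the mode rather than the mode itself.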

BibTeX

@inproceedings{mishra2024generative,
  title={Generative Factor Chaining: Coordinated Manipulation with Diffusion-based Factor Graph},
  author={Utkarsh Aashu Mishra and Yongxin Chen and Danfei Xu},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024},
  url={https://openreview.net/forum?id=p6Wq6TjjHH}
}

Generative Factor Chaining:
Coordinated Manipulation with Diffusion-based Factor Graph

CoRL 2024, Munich, Germany
