[RSS 2023] Diffusion Policy Visuomotor Policy Learning via Action Diffusion
[Project page]
[Paper]
[Data]
[Colab (state)]
[Colab (vision)]
Cheng Chi1,
Siyuan Feng2,
Yilun Du3,
Zhenjia Xu1,
Eric Cousineau2,
Benjamin Burchfiel2,
Shuran Song1
1Columbia University,
2Toyota Research Institute,
3MIT
Our self-contained Google Colab notebooks is the easiest way to play with Diffusion Policy. We provide separate notebooks for state-based environment and vision-based environment.
For each experiment used to generate Table I,II and IV in the paper, we provide:
config.yaml
that contains all parameters needed to reproduce the experiment.logs.json.txt
for every training step.epoch=*-test_mean_score=*.ckpt
and last latest.ckpt
epoch of each run.Experiment logs are hosted on our website as nested directories in format:
https://diffusion-policy.cs.columbia.edu/data/experiments/<image|low_dim>/<task>/<method>/
Within each experiment directory you may find:
.
βββ config.yaml
βββ metrics
βΒ Β βββ logs.json.txt
βββ train_0
βΒ Β βββ checkpoints
βΒ Β βΒ Β βββ epoch=0300-test_mean_score=1.000.ckpt
βΒ Β βΒ Β βββ latest.ckpt
βΒ Β βββ logs.json.txt
βββ train_1
βΒ Β βββ checkpoints
βΒ Β βΒ Β βββ epoch=0250-test_mean_score=1.000.ckpt
βΒ Β βΒ Β βββ latest.ckpt
βΒ Β βββ logs.json.txt
βββ train_2
βββ checkpoints
βΒ Β βββ epoch=0250-test_mean_score=1.000.ckpt
βΒ Β βββ latest.ckpt
βββ logs.json.txt
The metrics/logs.json.txt
file aggregates evaluation metrics from all 3 training runs every 50 epochs using multirun_metrics.py
. The numbers reported in the paper correspond to max
and k_min_train_loss
aggregation keys.
To download all files in a subdirectory, use:
$ wget --recursive --no-parent --no-host-directories --relative --reject="index.html*" https://diffusion-policy.cs.columbia.edu/data/experiments/low_dim/square_ph/diffusion_policy_cnn/
To reproduce our simulation benchmark results, install our conda environment on a Linux machine with Nvidia GPU. On Ubuntu 20.04 you need to install the following apt packages for mujoco:
$ sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
We recommend Mambaforge instead of the standard anaconda distribution for faster installation:
$ mamba env create -f conda_environment.yaml
but you can use conda as well:
$ conda env create -f conda_environment.yaml
The conda_environment_macos.yaml
file is only for development on MacOS and does not have full support for benchmarks.
Hardware (for Push-T):
Software:
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
sudo apt install libspnav-dev spacenavd; sudo systemctl start spacenavd
mamba env create -f conda_environment_real.yaml
Under the repo root, create data subdirectory:
[diffusion_policy]$ mkdir data && cd data
Download the corresponding zip file from https://diffusion-policy.cs.columbia.edu/data/training/
[data]$ wget https://diffusion-policy.cs.columbia.edu/data/training/pusht.zip
Extract training data:
[data]$ unzip pusht.zip && rm -f pusht.zip && cd ..
Grab config file for the corresponding experiment:
[diffusion_policy]$ wget -O image_pusht_diffusion_policy_cnn.yaml https://diffusion-policy.cs.columbia.edu/data/experiments/image/pusht/diffusion_policy_cnn/config.yaml
Activate conda environment and login to wandb (if you havenβt already).
[diffusion_policy]$ conda activate robodiff
(robodiff)[diffusion_policy]$ wandb login
Launch training with seed 42 on GPU 0.
(robodiff)[diffusion_policy]$ python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml training.seed=42 training.device=cuda:0 hydra.run.dir='data/outputs/${now:%Y.%m.%d}/${now:%H.%M.%S}_${name}_${task_name}'
This will create a directory in format data/outputs/yyyy.mm.dd/hh.mm.ss_<method_name>_<task_name>
where configs, logs and checkpoints are written to. The policy will be evaluated every 50 epochs with the success rate logged as test/mean_score
on wandb, as well as videos for some rollouts.
(robodiff)[diffusion_policy]$ tree data/outputs/2023.03.01/20.02.03_train_diffusion_unet_hybrid_pusht_image -I wandb
data/outputs/2023.03.01/20.02.03_train_diffusion_unet_hybrid_pusht_image
βββ checkpoints
β βββ epoch=0000-test_mean_score=0.134.ckpt
β βββ latest.ckpt
βββ .hydra
β βββ config.yaml
β βββ hydra.yaml
β βββ overrides.yaml
βββ logs.json.txt
βββ media
β βββ 2k5u6wli.mp4
β βββ 2kvovxms.mp4
β βββ 2pxd9f6b.mp4
β βββ 2q5gjt5f.mp4
β βββ 2sawbf6m.mp4
β βββ 538ubl79.mp4
βββ train.log
3 directories, 13 files
Launch local ray cluster. For large scale experiments, you might want to setup an AWS cluster with autoscaling. All other commands remain the same.
(robodiff)[diffusion_policy]$ export CUDA_VISIBLE_DEVICES=0,1,2 # select GPUs to be managed by the ray cluster
(robodiff)[diffusion_policy]$ ray start --head --num-gpus=3
Launch a ray client which will start 3 training workers (3 seeds) and 1 metrics monitor worker.
(robodiff)[diffusion_policy]$ python ray_train_multirun.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml --seeds=42,43,44 --monitor_key=test/mean_score -- multi_run.run_dir='data/outputs/${now:%Y.%m.%d}/${now:%H.%M.%S}_${name}_${task_name}' multi_run.wandb_name_base='${now:%Y.%m.%d-%H.%M.%S}_${name}_${task_name}'
In addition to the wandb log written by each training worker individually, the metrics monitor worker will log to wandb project diffusion_policy_metrics
for the metrics aggregated from all 3 training runs. Local config, logs and checkpoints will be written to data/outputs/yyyy.mm.dd/hh.mm.ss_<method_name>_<task_name>
in a directory structure identical to our training logs:
(robodiff)[diffusion_policy]$ tree data/outputs/2023.03.01/22.13.58_train_diffusion_unet_hybrid_pusht_image -I 'wandb|media'
data/outputs/2023.03.01/22.13.58_train_diffusion_unet_hybrid_pusht_image
βββ config.yaml
βββ metrics
β βββ logs.json.txt
β βββ metrics.json
β βββ metrics.log
βββ train_0
β βββ checkpoints
β β βββ epoch=0000-test_mean_score=0.174.ckpt
β β βββ latest.ckpt
β βββ logs.json.txt
β βββ train.log
βββ train_1
β βββ checkpoints
β β βββ epoch=0000-test_mean_score=0.131.ckpt
β β βββ latest.ckpt
β βββ logs.json.txt
β βββ train.log
βββ train_2
βββ checkpoints
β βββ epoch=0000-test_mean_score=0.105.ckpt
β βββ latest.ckpt
βββ logs.json.txt
βββ train.log
7 directories, 16 files
Download a checkpoint from the published training log folders, such as https://diffusion-policy.cs.columbia.edu/data/experiments/low_dim/pusht/diffusion_policy_cnn/train_0/checkpoints/epoch=0550-test_mean_score=0.969.ckpt.
Run the evaluation script:
(robodiff)[diffusion_policy]$ python eval.py --checkpoint data/0550-test_mean_score=0.969.ckpt --output_dir data/pusht_eval_output --device cuda:0
This will generate the following directory structure:
(robodiff)[diffusion_policy]$ tree data/pusht_eval_output
data/pusht_eval_output
βββ eval_log.json
βββ media
βββ 1fxtno84.mp4
βββ 224l7jqd.mp4
βββ 2fo4btlf.mp4
βββ 2in4cn7a.mp4
βββ 34b3o2qq.mp4
βββ 3p7jqn32.mp4
1 directory, 7 files
eval_log.json
contains metrics that is logged to wandb during training:
(robodiff)[diffusion_policy]$ cat data/pusht_eval_output/eval_log.json
{
"test/mean_score": 0.9150393806777066,
"test/sim_max_reward_4300000": 1.0,
"test/sim_max_reward_4300001": 0.9872969750774386,
...
"train/sim_video_1": "data/pusht_eval_output//media/2fo4btlf.mp4"
}
Make sure your UR5 robot is running and accepting command from its network interface (emergency stop button within reach at all time), your RealSense cameras plugged in to your workstation (tested with realsense-viewer
) and your SpaceMouse connected with the spacenavd
daemon running (verify with systemctl status spacenavd
).
Start the demonstration collection script. Press βCβ to start recording. Use SpaceMouse to move the robot. Press βSβ to stop recording.
(robodiff)[diffusion_policy]$ python demo_real_robot.py -o data/demo_pusht_real --robot_ip 192.168.0.204
This should result in a demonstration dataset in data/demo_pusht_real
with in the same structure as our example real Push-T training dataset.
To train a Diffusion Policy, launch training with config:
(robodiff)[diffusion_policy]$ python train.py --config-name=train_diffusion_unet_real_image_workspace task.dataset_path=data/demo_pusht_real
Edit diffusion_policy/config/task/real_pusht_image.yaml
if your camera setup is different.
Assuming the training has finished and you have a checkpoint at data/outputs/blah/checkpoints/latest.ckpt
, launch the evaluation script with:
python eval_real_robot.py -i data/outputs/blah/checkpoints/latest.ckpt -o data/eval_pusht_real --robot_ip 192.168.0.204
Press βCβ to start evaluation (handing control over to the policy). Press βSβ to stop the current episode.
This codebase is structured under the requirement that:
N
tasks and M
methods will only require O(N+M)
amount of code instead of O(N*M)
To achieve this requirement, we
These design decisions come at the cost of code repetition between the tasks and the methods. However, we believe that the benefit of being able to add/modify task/methods without affecting the remainder and being able understand a task/method by reading the code linearly outweighs the cost of copying and pasting π.
On the task side, we have:
Dataset
: adapts a (third-party) dataset to the interface.EnvRunner
: executes a Policy
that accepts the interface and produce logs and metrics.config/task/<task_name>.yaml
: contains all information needed to construct Dataset
and EnvRunner
.Env
: an gym==0.21.0
compatible class that encapsulates the task environment.On the policy side, we have:
Policy
: implements inference according to the interface and part of the training process.Workspace
: manages the life-cycle of training and evaluation (interleaved) of a method.config/<workspace_name>.yaml
: contains all information needed to construct Policy
and Workspace
.A LowdimPolicy
takes observation dictionary:
"obs":
Tensor of shape (B,To,Do)
and predicts action dictionary:
"action":
Tensor of shape (B,Ta,Da)
A LowdimDataset
returns a sample of dictionary:
"obs":
Tensor of shape (To, Do)
"action":
Tensor of shape (Ta, Da)
Its get_normalizer
method returns a LinearNormalizer
with keys "obs","action"
.
The Policy
handles normalization on GPU with its copy of the LinearNormalizer
. The parameters of the LinearNormalizer
is saved as part of the Policy
βs weights checkpoint.
A ImagePolicy
takes observation dictionary:
"key0":
Tensor of shape (B,To,*)
"key1":
Tensor of shape e.g. (B,To,H,W,3)
([0,1] float32)and predicts action dictionary:
"action":
Tensor of shape (B,Ta,Da)
A ImageDataset
returns a sample of dictionary:
"obs":
Dict of
"key0":
Tensor of shape (To, *)
"key1":
Tensor fo shape (To,H,W,3)
"action":
Tensor of shape (Ta, Da)
Its get_normalizer
method returns a LinearNormalizer
with keys "key0","key1","action"
.
To = 3
Ta = 4
T = 6
|o|o|o|
| | |a|a|a|a|
|o|o|
| |a|a|a|a|a|
| | | | |a|a|
Terminology in the paper: varname
in the codebase
To|n_obs_steps
Ta|n_action_steps
T|horizon
The classical (e.g. MDP) single step observation/action formulation is included as a special case where To=1
and Ta=1
.
Workspace
A Workspace
object encapsulates all states and code needed to run an experiment.
BaseWorkspace
.OmegaConf
config object generated by hydra
should contain all information needed to construct the Workspace object and running experiments. This config correspond to config/<workspace_name>.yaml
+ hydra overrides.run
method contains the entire pipeline for the experiment.Workspace
level. All training states implemented as object attributes are automatically saved by the save_checkpoint
method.run
method.The entrypoint for training is train.py
which uses @hydra.main
decorator. Read hydraβs official documentation for command line arguments and config overrides. For example, the argument task=<task_name>
will replace the task
subtree of the config with the content of config/task/<task_name>.yaml
, thereby selecting the task to run for this experiment.
Dataset
A Dataset
object:
torch.utils.data.Dataset
.get_normalizer
that returns a LinearNormalizer
conforming to the interface.Normalization is a very common source of bugs during project development. It is sometimes helpful to print out the specific scale
and bias
vectors used for each key in the LinearNormalizer
.
Most of our implementations of Dataset
uses a combination of ReplayBuffer
and SequenceSampler
to generate samples. Correctly handling padding at the beginning and the end of each demonstration episode according to To
and Ta
is important for good performance. Please read our SequenceSampler
before implementing your own sampling method.
Policy
A Policy
object:
BaseLowdimPolicy
or BaseImagePolicy
.predict_action
that given observation dict, predicts actions conforming to the interface.set_normalizer
that takes in a LinearNormalizer
and handles observation/action normalization internally in the policy.compute_loss
that takes in a batch and returns the loss to be optimized.Policy
class correspond to a Workspace
class due to the differences of training and evaluation process between methods.EnvRunner
A EnvRunner
object abstracts away the subtle differences between different task environments.
run
that takes a Policy
object for evaluation, and returns a dict of logs and metrics. Each value should be compatible with wandb.log
.To maximize evaluation speed, we usually vectorize environments using our modification of gym.vector.AsyncVectorEnv
which runs each individual environment in a separate process (workaround python GIL).
β οΈ Since subprocesses are launched using fork
on linux, you need to be specially careful for environments that creates its OpenGL context during initialization (e.g. robosuite) which, once inherited by the child process memory space, often causes obscure bugs like segmentation fault. As a workaround, you can provide a dummy_env_fn
that constructs an environment without initializing OpenGL.
ReplayBuffer
The ReplayBuffer
is a key data structure for storing a demonstration dataset both in-memory and on-disk with chunking and compression. It makes heavy use of the zarr
format but also has a numpy
backend for lower access overhead.
On disk, it can be stored as a nested directory (e.g. data/pusht_cchi_v7_replay.zarr
) or a zip file (e.g. data/robomimic/datasets/square/mh/image_abs.hdf5.zarr.zip
).
Due to the relative small size of our datasets, itβs often possible to store the entire image-based dataset in RAM with Jpeg2000
compression which eliminates disk IO during training at the expense increasing of CPU workload.
Example:
data/pusht_cchi_v7_replay.zarr
βββ data
β βββ action (25650, 2) float32
β βββ img (25650, 96, 96, 3) float32
β βββ keypoint (25650, 9, 2) float32
β βββ n_contacts (25650, 1) float32
β βββ state (25650, 5) float32
βββ meta
βββ episode_ends (206,) int64
Each array in data
stores one data field from all episodes concatenated along the first dimension (time). The meta/episode_ends
array stores the end index for each episode along the fist dimension.
SharedMemoryRingBuffer
The SharedMemoryRingBuffer
is a lock-free FILO data structure used extensively in our real robot implementation to utilize multiple CPU cores while avoiding pickle serialization and locking overhead for multiprocessing.Queue
.
As an example, we would like to get the most recent To
frames from 5 RealSense cameras. We launch 1 realsense SDK/pipeline per process using SingleRealsense
, each continuously writes the captured images into a SharedMemoryRingBuffer
shared with the main process. We can very quickly get the last To
frames in the main process due to the FILO nature of SharedMemoryRingBuffer
.
We also implemented SharedMemoryQueue
for FIFO, which is used in RTDEInterpolationController
.
RealEnv
In contrast to OpenAI Gym, our polices interact with the environment asynchronously. In RealEnv
, the step
method in gym
is split into two methods: get_obs
and exec_actions
.
The get_obs
method returns the latest observation from SharedMemoryRingBuffer
as well as their corresponding timestamps. This method can be call at any time during an evaluation episode.
The exec_actions
method accepts a sequence of actions and timestamps for the expected time of execution for each step. Once called, the actions are simply enqueued to the RTDEInterpolationController
, and the method returns without blocking for execution.
Read and imitate:
diffusion_policy/dataset/pusht_image_dataset.py
diffusion_policy/env_runner/pusht_image_runner.py
diffusion_policy/config/task/pusht_image.yaml
Make sure that shape_meta
correspond to input and output shapes for your task. Make sure env_runner._target_
and dataset._target_
point to the new classes you have added. When training, add task=<your_task_name>
to train.py
βs arguments.
Read and imitate:
diffusion_policy/workspace/train_diffusion_unet_image_workspace.py
diffusion_policy/policy/diffusion_unet_image_policy.py
diffusion_policy/config/train_diffusion_unet_image_workspace.yaml
Make sure your workspace yamlβs _target_
points to the new workspace class you created.
This repository is released under the MIT license. See LICENSE for additional details.
ConditionalUnet1D
implementation is adapted from Planning with Diffusion.TransformerForDiffusion
implementation is adapted from MinGPT.ObservationEncoder
are used extensively in this project.