A Modular Library for Off-Policy Reinforcement Learning with a focus on SafeRL and distributed computing


Description

A modular library for off-policy reinforcement learning with a focus on SafeRL and distributed computing. It supports the SafetyGymnasium environment set as a starting point for developing SafeRL solutions. The distributed setting is implemented with the pika library and will be improved in the near future.


Roadmap 🏗

  • [x] Support for SafetyGymnasium
  • [x] Style and readability improvements
  • [ ] REDQ, DrQ Algorithms support
  • [ ] Distributed Training Improvements

In a Snapshot

Environments Support

DMControl Suite, SafetyGymnasium, Gymnasium

Algorithms

DDPG, TD3, SAC, TQC

Installation

The project uses uv for package management and ruff for formatting checks. To install it with uv in a virtual environment:

uv venv
source .venv/bin/activate
uv sync

Installing SafetyGymnasium

To work with SafetyGymnasium, install it manually:

git clone https://github.com/PKU-Alignment/safety-gymnasium
cd safety-gymnasium && uv pip install -e .
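Once the package is installed, a short rollout is a convenient way to check that it works. The sketch below is not part of this library: `rollout` is a hypothetical helper, and the environment id `SafetyPointGoal1-v0` is just one of the SafetyGymnasium tasks chosen for illustration. It relies on the SafetyGymnasium convention that `step` returns a separate `cost` signal alongside the reward.

```python
def rollout(env, policy, max_steps=1000, seed=0):
    """Run one episode and return (total_reward, total_cost).

    Assumes the SafetyGymnasium step signature:
    obs, reward, cost, terminated, truncated, info = env.step(action)
    """
    obs, info = env.reset(seed=seed)
    total_reward, total_cost = 0.0, 0.0
    for _ in range(max_steps):
        obs, reward, cost, terminated, truncated, info = env.step(policy(obs))
        total_reward += float(reward)
        total_cost += float(cost)
        if terminated or truncated:
            break
    return total_reward, total_cost


if __name__ == "__main__":
    try:
        import safety_gymnasium
    except ImportError:
        print("safety_gymnasium is not installed; follow the steps above first")
    else:
        # Environment id picked for illustration only.
        env = safety_gymnasium.make("SafetyPointGoal1-v0")
        # Random policy: sample an action regardless of the observation.
        print(rollout(env, lambda obs: env.action_space.sample(), max_steps=100))
        env.close()
```

Tracking the cost separately from the reward is what distinguishes the SafeRL setting from a plain Gymnasium loop, where `step` returns five values instead of six.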

Tests

To run tests locally:

uv pip install pytest
uv run pytest tests/functional

RL Training

All training is configured through Python config files located in the configs folder. To make your own configuration, edit one of those files or create a similar one. During training, all the code is copied to the logs folder to ensure full experimental reproducibility.
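To illustrate the idea of a Python config file that also snapshots the code, here is a minimal sketch. It is not the library's actual implementation: the function names `parse_args` and `snapshot_code`, the `code` subfolder, and the default value of `--env` are all assumptions made for this example. Only the `--env` flag itself appears in the commands in this README.

```python
import argparse
import shutil
import tempfile
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line flags; --env is the only flag shown in this README."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", type=str, default="walker-walk")
    return parser.parse_args(argv)


def snapshot_code(src_dir, log_dir):
    """Copy every .py file under src_dir into log_dir/code (hypothetical layout)."""
    src_dir, log_dir = Path(src_dir), Path(log_dir)
    copied = []
    # sorted() materializes the file list before anything is written.
    for path in sorted(src_dir.rglob("*.py")):
        target = log_dir / "code" / path.relative_to(src_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    args = parse_args()
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in source tree and log folder, created only for this demo.
        src = Path(tmp) / "src"
        src.mkdir()
        (src / "ddpg.py").write_text("# config placeholder\n")
        files = snapshot_code(src, Path(tmp) / "logs" / args.env)
        print(f"env={args.env}, copied {len(files)} file(s)")
```

Keeping a copy of the source next to the run's logs means a result can be traced back to the exact code that produced it, even after the repository has moved on.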

Single Agent

To run DDPG in a single process:

python configs/ddpg.py --env walker-walk

Distributed

Run RabbitMQ

docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.12-management

Run training

python configs/distrib_ddpg.py --env walker-walk
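For readers unfamiliar with pika, the sketch below shows how a worker process could pass transitions through the RabbitMQ broker started above. It is an illustration under assumptions, not the library's actual protocol: the queue name, the JSON message format, and all four function names are hypothetical. The pika calls used (`BlockingConnection`, `queue_declare`, `basic_publish`, `basic_get`) are part of pika's blocking adapter.

```python
import json

QUEUE_NAME = "transitions"  # hypothetical queue name, not taken from the library


def encode_transition(obs, action, reward, next_obs, done):
    """Serialize one transition into bytes suitable for a message body."""
    payload = {
        "obs": list(obs),
        "action": list(action),
        "reward": float(reward),
        "next_obs": list(next_obs),
        "done": bool(done),
    }
    return json.dumps(payload).encode("utf-8")


def decode_transition(body):
    """Inverse of encode_transition."""
    return json.loads(body.decode("utf-8"))


def publish_transition(body, host="localhost", port=5672):
    """Send one message to the broker (port 5672 matches the docker command)."""
    import pika  # imported lazily so the helpers above work without pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=host, port=port)
    )
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME)
    channel.basic_publish(exchange="", routing_key=QUEUE_NAME, body=body)
    connection.close()


def fetch_transition(host="localhost", port=5672):
    """Fetch one message if available; return None when the queue is empty."""
    import pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=host, port=port)
    )
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME)
    method, _properties, body = channel.basic_get(queue=QUEUE_NAME, auto_ack=True)
    connection.close()
    return decode_transition(body) if method is not None else None
```

With the broker running, an actor would call `publish_transition(encode_transition(...))` after each environment step and a learner would poll `fetch_transition()` to fill its replay buffer. Opening a connection per message keeps the sketch short; a long-running worker would normally reuse one connection.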

Results

Results for single-process DDPG and TQC:
[Figure: ddpg_tqc_eval]

Cite

OPRL

@inproceedings{kuznetsov2024safer,
  title={Safer Reinforcement Learning by Going Off-policy: a Benchmark},
  author={Igor Kuznetsov},
  booktitle={ICML 2024 Next Generation of AI Safety Workshop},
  year={2024},
  url={https://openreview.net/forum?id=pAmTC9EdGq}
}

SafetyGymnasium

@inproceedings{ji2023safety,
  title={Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark},
  author={Jiaming Ji and Borong Zhang and Jiayi Zhou and Xuehai Pan and Weidong Huang and Ruiyang Sun and Yiran Geng and Yifan Zhong and Josef Dai and Yaodong Yang},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023},
  url={https://openreview.net/forum?id=WZmlxIuIGR}
}