Adds the ability to use more algorithms from sbx and sb3-contrib (with e.g. MlpPolicy) #163

Ivan-267 · 2023-12-18T22:53:56Z

This change adds a new SB3 wrapper class that exposes a single observation space (the "obs" entry of the observation dict), like our CleanRL implementation, without modifying the original wrapper.

With this class, algorithms like ARS from SB3 Contrib and PPO from SBX (which currently doesn't have MultiInputPolicy) can easily be tested on any environment that doesn't require multiple observation spaces. This is often the case for the examples.
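The idea behind the wrapper can be sketched in plain Python. This is not the godot_rl code. FakeDictObsEnv and SingleObsWrapper are hypothetical names, and shape tuples and lists stand in for gymnasium spaces and NumPy arrays. The sketch shows the wrapper reporting only the "obs" sub-space and stripping the dict in reset and step. That is what lets an algorithm use MlpPolicy instead of MultiInputPolicy.

```python
from typing import Any, Dict, List, Tuple

Obs = List[float]


class FakeDictObsEnv:
    """Hypothetical stand-in for a vectorized env that returns dict observations."""

    def __init__(self, num_envs: int = 2) -> None:
        self.num_envs = num_envs
        # Each key maps to a shape tuple instead of a real gymnasium space.
        self.observation_space: Dict[str, Tuple[int, ...]] = {"obs": (3,)}

    def reset(self) -> List[Dict[str, Obs]]:
        # One dict per parallel env.
        return [{"obs": [0.0, 0.0, 0.0]} for _ in range(self.num_envs)]

    def step(self, actions: List[Any]) -> Tuple[List[Dict[str, Obs]], List[float], List[bool], List[dict]]:
        obs = [{"obs": [float(i), 1.0, 2.0]} for i in range(self.num_envs)]
        rewards = [1.0] * self.num_envs
        dones = [False] * self.num_envs
        infos = [{} for _ in range(self.num_envs)]
        return obs, rewards, dones, infos


class SingleObsWrapper:
    """Sketch: expose only the "obs" entry so an MlpPolicy-style learner sees a flat observation."""

    def __init__(self, env: FakeDictObsEnv) -> None:
        self.env = env
        # The wrapped space is just the "obs" sub-space, not the whole dict.
        self.observation_space = env.observation_space["obs"]

    def _extract(self, dict_obs: List[Dict[str, Obs]]) -> List[Obs]:
        return [o["obs"] for o in dict_obs]

    def reset(self) -> List[Obs]:
        return self._extract(self.env.reset())

    def step(self, actions: List[Any]):
        obs, rewards, dones, infos = self.env.step(actions)
        # Rewards, dones and infos pass through unchanged.
        return self._extract(obs), rewards, dones, infos


env = SingleObsWrapper(FakeDictObsEnv(num_envs=2))
print(env.observation_space)  # (3,)
print(env.reset())            # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

Here reset returns only observations, mirroring the SB3 VecEnv convention. A real wrapper would also have to forward the action space and the remaining VecEnv methods.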

Usage:

In stable_baselines_3_example.py:

First import the SingleObsSpace variant of the env wrapper:

    + from godot_rl.wrappers.sbg_single_obs_wrapper import SBGSingleObsEnv

Then, after installing the needed packages with pip (sb3-contrib and sbx-rl), import the algorithms you want to use:

    - from stable_baselines3 import PPO
    + from sb3_contrib import ARS
    + from sbx import TQC, DroQ, SAC, PPO, DQN, TD3, DDPG

Then replace the env's class name with SBGSingleObsEnv:

    env = SBGSingleObsEnv(env_path=args.env_path, show_window=args.viz, seed=args.seed, n_parallel=args.n_parallel, speedup=args.speedup)

You can then use, e.g., SBX PPO, SB3 Contrib ARS, or any other algorithm that doesn't support MultiInputPolicy:

    model: PPO = PPO("MlpPolicy",
                     env,
                     verbose=2)

Here's a brief test run that starts training with ARS. The env is slightly modified for some experiments and doesn't have the correct obs; this was only meant to check that training starts, not to evaluate learning performance:

ars_training_test.mp4

godot_rl/wrappers/stable_baselines_wrapper_single_obs_space.py

edbeeching

This is great, but I think the code could be more compact. The filename and class name are also enormous; it would be great to shorten them.


Ivan-267 · 2023-12-19T19:06:59Z

Thanks for the suggestions. I implemented them and updated the return types in the file.

edbeeching

Thanks for making these changes, LGTM

Ivan-267 · 2023-12-22T13:20:25Z

Thanks for the review. I'll just add a small change that allows changing the dictionary key from "obs" to any value (I found this useful while testing CNN usage with the camera example); then I can merge it.
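A configurable key could look roughly like the following stdlib-only sketch. The class name SingleKeyObsWrapper, the parameter name obs_key and the "camera_2d" key are all hypothetical and not necessarily what the commit uses. Shape tuples stand in for gymnasium spaces. The point is that the selected key decides both the exposed observation space and the per-step observation, so an image entry can be fed to a CNN policy instead of the default "obs" vector.

```python
from typing import Dict, List, Sequence, Tuple


class SingleKeyObsWrapper:
    """Sketch: select one entry of a dict observation by a configurable key."""

    def __init__(self, observation_space: Dict[str, Tuple[int, ...]], obs_key: str = "obs") -> None:
        # Fail early with a readable message if the key is not present.
        if obs_key not in observation_space:
            raise KeyError(
                f"obs_key {obs_key!r} not found; available keys: {sorted(observation_space)}"
            )
        self.obs_key = obs_key
        # Only the selected sub-space is exposed to the learning algorithm.
        self.observation_space = observation_space[obs_key]

    def extract(self, dict_obs: Sequence[Dict[str, list]]) -> List[list]:
        # One dict per parallel env; keep only the selected entry.
        return [o[self.obs_key] for o in dict_obs]


# "camera_2d" is a hypothetical key standing in for an image observation.
space = {"obs": (4,), "camera_2d": (3, 10, 10)}
wrapper = SingleKeyObsWrapper(space, obs_key="camera_2d")
print(wrapper.observation_space)  # (3, 10, 10)
```

Keeping "obs" as the default means existing usage is unchanged, while a different key is only needed for environments that put the relevant data elsewhere.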

Create stable_baselines_wrapper_single_obs_space.py

f0723fe

Ivan-267 added the "enhancement" (New feature or request) label Dec 18, 2023

Ivan-267 changed the title ~~Adds the ability to use more algorithms from sbx and sb3-contrib~~ Adds the ability to use more algorithms from sbx and sb3-contrib (with e.g. MlpPolicy) Dec 18, 2023

edbeeching reviewed Dec 19, 2023

godot_rl/wrappers/stable_baselines_wrapper_single_obs_space.py (outdated)

edbeeching reviewed Dec 19, 2023

godot_rl/wrappers/stable_baselines_wrapper_single_obs_space.py (outdated)

edbeeching reviewed Dec 19, 2023

godot_rl/wrappers/stable_baselines_wrapper_single_obs_space.py (outdated)

edbeeching requested changes Dec 19, 2023

Update and rename stable_baselines_wrapper_single_obs_space.py to sbg_single_obs_wrapper.py

4dddc44

Ivan-267 requested a review from edbeeching December 19, 2023 19:07

Remove unused imports, reformat wrapper

14c45b8

edbeeching approved these changes Dec 22, 2023


Add dictionary key setting to sbg_single_obs_wrapper.py

f44f851

Ivan-267 merged commit d42acbe into main Dec 22, 2023
0 of 12 checks passed

Ivan-267 deleted the add_single_obs_space_sb3_wrapper branch December 22, 2023 14:11
