Adds the ability to use more algorithms from sbx and sb3-contrib (with e.g. MlpPolicy) #163

Merged
merged 4 commits into from
Dec 22, 2023

Conversation

Ivan-267
Collaborator

@Ivan-267 Ivan-267 commented Dec 18, 2023

This change adds an additional sb3 wrapper class that uses a single observation space ("obs"), like our CleanRL implementation, without modifying the original wrapper.

Algorithms like ARS from SB3 Contrib and PPO from SBX (which currently doesn't have MultiInputPolicy) can be tested easily with this class on any environment that doesn't require multiple observation spaces, which is often the case for the examples.
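
As a rough illustration of the single-observation idea, here is a minimal, dependency-free sketch. The names `DummyDictEnv` and `SingleObsView` are hypothetical and are not the classes added in this PR; the sketch only assumes an env whose observations are dicts of the form `{"obs": ...}` and shows how the inner value can be passed on to an algorithm that expects a single observation.

```python
class DummyDictEnv:
    """Hypothetical stand-in for an env that returns dict observations."""

    def reset(self):
        return {"obs": [0.0, 0.0]}

    def step(self, action):
        # The observation is wrapped in a dict under the "obs" key.
        obs = {"obs": [float(action), 1.0]}
        reward, done, info = 1.0, False, {}
        return obs, reward, done, info


class SingleObsView:
    """Hypothetical wrapper that exposes only the value stored under "obs"."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()["obs"]

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Unwrap the dict so the caller sees a single observation.
        return obs["obs"], reward, done, info


env = SingleObsView(DummyDictEnv())
first_obs = env.reset()
next_obs, reward, done, info = env.step(3)
```

With the dict removed, policies such as `MlpPolicy` that expect a single observation space can consume the observations directly.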

Usage:

In stable_baselines_3_example.py:

First import the SingleObsSpace variant of the env wrapper:

+ from godot_rl.wrappers.sbg_single_obs_wrapper import SBGSingleObsEnv

Then, after installing the required packages with pip, import the algorithms you want to use:

- from stable_baselines3 import PPO
+ from sb3_contrib import ARS
+ from sbx import TQC, DroQ, SAC, PPO, DQN, TD3, DDPG

For the env, only the class name needs to be replaced:

env = SBGSingleObsEnv(env_path=args.env_path, show_window=args.viz, seed=args.seed, n_parallel=args.n_parallel, speedup=args.speedup)

And then you can use e.g. the SBX PPO, SB3 Contrib ARS or any other algorithm that may not support the MultiInputPolicy:

    model: PPO = PPO("MlpPolicy",
                     env,
                     verbose=2)

Here's a brief test of starting training with ARS (the env is slightly modified for some experiments and doesn't have the correct obs; this was only an attempt to start training, not a test of learning performance):

ars_training_test.mp4

@Ivan-267 Ivan-267 added the enhancement New feature or request label Dec 18, 2023
@Ivan-267 Ivan-267 changed the title from "Adds the ability to use more algorithms from sbx and sb3-contrib" to "Adds the ability to use more algorithms from sbx and sb3-contrib (with e.g. MlpPolicy)" Dec 18, 2023
Owner

@edbeeching edbeeching left a comment


This is great, but I think the code could be more compact. The filename and class name are also very long; it would be great to shorten them.

@Ivan-267
Collaborator Author

Thanks for the suggestions, I implemented the solutions and updated the return types in the file.

@Ivan-267 Ivan-267 requested a review from edbeeching December 19, 2023 19:07
Owner

@edbeeching edbeeching left a comment


Thanks for making these changes, LGTM

@Ivan-267
Collaborator Author

Thanks for the review. I'll just add a small change to allow changing the dictionary key name from obs to any value, which I found useful while testing CNN usage with the camera example, then I can merge it.
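
A configurable key could look roughly like the sketch below. The helper `select_obs`, its `obs_key` parameter, and the `"camera_2d"` key are assumptions made for illustration, not the names used in the merged code.

```python
def select_obs(obs_dict, obs_key="obs"):
    """Return the observation stored under obs_key (hypothetical helper)."""
    if obs_key not in obs_dict:
        raise KeyError(
            f"Observation key {obs_key!r} not found; "
            f"available keys: {sorted(obs_dict)}"
        )
    return obs_dict[obs_key]


# Hypothetical observation dict with a vector entry and an image entry.
sample = {"obs": [0.1, 0.2], "camera_2d": [[0, 255], [255, 0]]}

vector_obs = select_obs(sample)                      # default key
image_obs = select_obs(sample, obs_key="camera_2d")  # e.g. for a CNN policy
```

Keeping `"obs"` as the default leaves existing usage unchanged, while an image entry can be selected by name when needed.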

@Ivan-267 Ivan-267 merged commit d42acbe into main Dec 22, 2023
0 of 12 checks passed
@Ivan-267 Ivan-267 deleted the add_single_obs_space_sb3_wrapper branch December 22, 2023 14:11