
Ready for testing 🧪 Multi-policy training support #181

Merged
merged 14 commits into from
May 15, 2024

Conversation

Ivan-267
Collaborator

@Ivan-267 Ivan-267 commented Apr 1, 2024

Adds support for training multiple policies with RLlib.

Plugin PR: edbeeching/godot_rl_agents_plugin#40
Example env PR: edbeeching/godot_rl_agents_examples#30

TODO:

  • Fix the multiple obs spaces case (should be fixed now, but still needs more checking; ONNX export currently doesn't work with multiple discrete obs spaces, as it appears to be configured for a single space, which will need checking at some point as well).

@Ivan-267 Ivan-267 changed the title WIP🚧 Multi-policy training support Ready for testing 🧪 Multi-policy training support Apr 1, 2024
@Ivan-267 Ivan-267 requested a review from edbeeching April 1, 2024 19:30
@Ivan-267
Collaborator Author

Ivan-267 commented Apr 3, 2024

I've done a little testing with some of my previous envs and Jumper Hard with an older plugin version, and it seemed to work both with multiagent set to false in the yaml config (the number of envs per worker may need to be adjusted manually) and set to true (not intended for single-agent envs, since individual agents are deactivated in RLlib after done = true, but it should not cause errors thanks to the compatibility code in GDRLPettingZooWrapper). SB3 seems to work properly after these changes, but so far I've only tested it on a modified version of the multi-agent env (made into a single-agent-compatible version). Further testing is always welcome, especially on Linux and with Sample Factory.
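To illustrate the two modes described above, here is a minimal sketch of how the multiagent flag in the yaml config might be read. The key name `env_is_multiagent` is an assumption for illustration, not necessarily the actual key used by the GDRL rllib example, and the parsing is deliberately dependency-free:

```python
# Hypothetical sketch: the key name "env_is_multiagent" is an assumption,
# not necessarily the real key used by the GDRL rllib config.
EXAMPLE_CONFIG = """
env_is_multiagent: false   # single-agent training; one shared policy
"""

def is_multiagent(config_text: str) -> bool:
    # Minimal parse without a yaml dependency: find the flag and read its value.
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if line.startswith("env_is_multiagent:"):
            return line.split(":", 1)[1].strip().lower() == "true"
    return False  # default to single-agent when the flag is absent
```

When the flag is false, the env is treated as a standard single-agent env; when true, it goes through the PettingZoo-style multi-agent path.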

LSTM/Attention wrappers work (they show a deprecation warning, so accessing them may change in newer versions of RLlib), but we can't use them for exporting yet since the state data wouldn't be fed in.

One thing I found that doesn't work well is enabling some exploration options with PPO; one that did work was RE3, with Tensorflow rather than Torch set. Curiosity requires discrete or multidiscrete actions, but it didn't seem to work even when I switched the env to discrete actions. I think this might be related to the tuple action space, which may not be supported by some of the exploration implementations.
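A hypothetical illustration of the suspected incompatibility: if Curiosity's space check only accepts Discrete or MultiDiscrete action spaces, an env whose actions are wrapped in an outer Tuple space would be rejected even when the inner space is discrete. The label-string representation below is an assumption for the sketch:

```python
# Hypothetical sketch of an exploration-module space check. The real RLlib
# check operates on gym space objects; labels are used here for simplicity.
def curiosity_supports(space_type: str) -> bool:
    # assumption: Curiosity accepts only these two action space types
    return space_type in ("Discrete", "MultiDiscrete")
```

Under this assumption, switching the env to discrete actions wouldn't help, because the outer Tuple wrapper is what the check sees.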

Warning

Edit: With the current script, the ONNX exported from RLlib doesn't output just action means like our SB3 setup, so the output size is doubled, and an exported ONNX model with more than one action won't work correctly. I'm not yet sure how to solve this so that ONNX export works from both SB3 and RLlib despite the different output sizes.

Edit2: I've just updated the plugin to handle the case above.
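To make the size mismatch concrete: for continuous actions, an RLlib-style export emits distribution parameters (means followed by log-stds), so its output is twice as wide as an SB3-style export that emits only the means. A hypothetical consumer-side workaround (the function name is made up for this sketch) could be:

```python
# Hypothetical helper, not part of the plugin: keep only the action means
# when the policy output is twice as wide as the action count.
def extract_action_means(policy_output, n_actions):
    if len(policy_output) == 2 * n_actions:
        # RLlib-style output: [means..., log_stds...]; drop the log-stds
        return policy_output[:n_actions]
    # SB3-style output: already just the means
    return policy_output
```

This mirrors the kind of case handling the plugin update mentioned in Edit2 would need, whatever form it actually takes there.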

Owner

@edbeeching edbeeching left a comment


LGTM pending review of other PRs


if __name__ == "__main__":
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--config_file", default="rllib_config.yaml", type=str, help="The yaml config file")
Owner


I think this should be examples/rllib_config.yaml

Collaborator Author


I usually call the example from within the examples folder, so the default was based on my usage. If calling it from the GDRL repository root directly, then it should be changed.

If someone installs GDRL using pip install and then just downloads the example file and config file, they might not have the entire repository, but I'm not sure how common that is.

I'll leave this up to you; I can definitely change the default.
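One way to sidestep the working-directory question entirely (a sketch, not what the PR does) is to resolve the default config path relative to the script's own location. The helper below is hypothetical; in the actual script you would pass `__file__`:

```python
import os

# Hypothetical alternative: resolve the default config path relative to the
# script's directory, so the example works whether it is invoked from the
# repo root or from within the examples folder.
def default_config_path(script_path):
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, "rllib_config.yaml")
```

The resulting path could then be used as the `default=` value of the `--config_file` argument.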

Ivan-267 added 3 commits May 9, 2024 21:46
- Adds support for exporting envs with multidiscrete actions with sb3
- Multiple obs spaces onnx export support (for sb3) still needs to be worked on in the future
Also removes the previously removed init variables from `tune.register_env()`
Updates rllib doc to include the new process.
@Ivan-267 Ivan-267 merged commit 39852ac into main May 15, 2024
13 checks passed
@Ivan-267 Ivan-267 deleted the multiagent_experimental branch May 15, 2024 05:24