Merge pull request #126 from edbeeching/feature/experiment-name
Feature: experiment_dir + experiment_name
visuallization authored Jul 16, 2023
2 parents 85b787a + e377033 commit cfa756c
Showing 12 changed files with 62 additions and 38 deletions.
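
In practice, the new `--experiment_dir` and `--experiment_name` options group the TensorBoard logs per run across the supported trainers. A typical invocation after this commit might look like the sketch below (environment path and names are illustrative):

```bash
# Illustrative sketch: any of the example env binaries works here;
# logs would land under the chosen experiment_dir.
gdrl --env=gdrl --env_path=examples/godot_rl_JumperHard/bin/JumperHard.x86_64 \
     --experiment_dir=logs/sb3 --experiment_name=Experiment_01 --viz
```
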
2 changes: 1 addition & 1 deletion README.md
@@ -38,7 +38,7 @@ You may need to add run permissions on the game executable. `chmod +x exampl
3. Train and visualize

```bash
gdrl --env=gdrl --env_path=examples/godot_rl_JumperHard/bin/JumperHard.x86_64 --viz
gdrl --env=gdrl --env_path=examples/godot_rl_JumperHard/bin/JumperHard.x86_64 --experiment_name=Experiment_01 --viz
```

### In editor interactive training
2 changes: 1 addition & 1 deletion docs/ADV_RLLIB.md
@@ -22,7 +22,7 @@ chmod +x examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 # linux example
• Train a model from scratch:

```
gdrl --trainer=rllib --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 --speedup=8 --viz
gdrl --trainer=rllib --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 --speedup=8 --experiment_name=Experiment_01 --viz
```

By default, rllib will use the hyperparameters in the **ppo_test.yaml** file from the GitHub repo. You can either modify this file or create your own.
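
For instance, a run against a custom config could be started as sketched below (the yaml file name is illustrative; `--config_file` is the existing rllib-only option):

```bash
gdrl --trainer=rllib --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 \
     --config_file=my_ppo_config.yaml --experiment_name=Experiment_01
```
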
12 changes: 6 additions & 6 deletions docs/ADV_SAMPLE_FACTORY.md
@@ -45,7 +45,7 @@ chmod +x examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 # linux example
• Train a model from scratch:

```bash
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 --num_workers=10 --experiment=BallChase --viz --speedup=8 --batched_sampling=True
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 --num_workers=10 --experiment_name=BallChase --viz --speedup=8 --batched_sampling=True
```

• Download a pretrained checkpoint from the HF hub:
@@ -57,7 +57,7 @@ python -m sample_factory.huggingface.load_from_hub -r edbeeching/sample_factory_
• Visualize a trained model:

```bash
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 --num_workers=1 --experiment=<ENV_NAME> --viz --eval --batched_sampling=True --speedup=8 --push_to_hub --hf_repository=<HF_USERNAME>/sample_factory_<ENV_NAME>
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 --num_workers=1 --experiment_name=<ENV_NAME> --viz --eval --batched_sampling=True --speedup=8 --push_to_hub --hf_repository=<HF_USERNAME>/sample_factory_<ENV_NAME>
```

## Advanced Environment Usage
@@ -74,7 +74,7 @@ chmod +x examples/godot_rl_Racer/bin/Racer.x86_64 # linux example
• Train a model from scratch:

```bash
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_Racer/bin/Racer.x86_64 --train_for_env_steps=10000000 --experiment=Racer --reward_scale=0.01 --worker_num_splits=2 --num_envs_per_worker=2 --num_workers=40 --speedup=8 --batched_sampling=True --batch_size=2048 --num_batches_per_epoch=2 --num_epochs=2 --learning_rate=0.0001 --exploration_loss_coef=0.0001 --lr_schedule=kl_adaptive_epoch --lr_schedule_kl_threshold=0.04 --use_rnn=True --recurrence=32
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_Racer/bin/Racer.x86_64 --train_for_env_steps=10000000 --experiment_name=Racer --reward_scale=0.01 --worker_num_splits=2 --num_envs_per_worker=2 --num_workers=40 --speedup=8 --batched_sampling=True --batch_size=2048 --num_batches_per_epoch=2 --num_epochs=2 --learning_rate=0.0001 --exploration_loss_coef=0.0001 --lr_schedule=kl_adaptive_epoch --lr_schedule_kl_threshold=0.04 --use_rnn=True --recurrence=32
```

• Download a pretrained checkpoint from the HF hub:
@@ -86,7 +86,7 @@ python -m sample_factory.huggingface.load_from_hub -r edbeeching/sample_factory_
• Visualize a trained model:

```bash
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_Racer/bin/Racer.x86_64 --num_workers=1 --experiment=Racer --viz --eval --batched_sampling=True --speedup=8 --push_to_hub --hf_repository=edbeeching/sample_factory_Racer
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_Racer/bin/Racer.x86_64 --num_workers=1 --experiment_name=Racer --viz --eval --batched_sampling=True --speedup=8 --push_to_hub --hf_repository=edbeeching/sample_factory_Racer
```

### Usage instructions for env **MultiAgent FPS**
@@ -101,7 +101,7 @@ chmod +x examples/godot_rl_FPS/bin/FPS.x86_64 # linux example
• Train a model from scratch:

```bash
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_FPS/bin/FPS.x86_64 --num_workers=10 --experiment=FPS --viz --batched_sampling=True --speedup=8 --num_workers=80 --batched_sampling=False --num_policies=4 --with_pbt=True --pbt_period_env_steps=1000000 --pbt_start_mutation=1000000 --batch_size=2048 --num_batches_per_epoch=2 --num_epochs=2 --learning_rate=0.00005 --exploration_loss_coef=0.001 --lr_schedule=kl_adaptive_epoch --lr_schedule_kl_threshold=0.08 --use_rnn=True --recurrence=32
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_FPS/bin/FPS.x86_64 --num_workers=10 --experiment_name=FPS --viz --batched_sampling=True --speedup=8 --num_workers=80 --batched_sampling=False --num_policies=4 --with_pbt=True --pbt_period_env_steps=1000000 --pbt_start_mutation=1000000 --batch_size=2048 --num_batches_per_epoch=2 --num_epochs=2 --learning_rate=0.00005 --exploration_loss_coef=0.001 --lr_schedule=kl_adaptive_epoch --lr_schedule_kl_threshold=0.08 --use_rnn=True --recurrence=32
```

• Download a pretrained checkpoint from the HF hub:
@@ -113,7 +113,7 @@ python -m sample_factory.huggingface.load_from_hub -r edbeeching/sample_factory_
• Visualize a trained model:

```bash
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_FPS/bin/FPS.x86_64 --num_workers=1 --experiment=FPS --viz --eval --batched_sampling=True --speedup=8 --push_to_hub --hf_repository=edbeeching/sample_factory_FPS
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_FPS/bin/FPS.x86_64 --num_workers=1 --experiment_name=FPS --viz --eval --batched_sampling=True --speedup=8 --push_to_hub --hf_repository=edbeeching/sample_factory_FPS
```

## Training on a cluster
4 changes: 2 additions & 2 deletions docs/ADV_STABLE_BASELINES_3.md
@@ -39,14 +39,14 @@ chmod +x examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 # linux example
### Train a model from scratch:

```bash
gdrl --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 --viz
gdrl --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 --experiment_name=Experiment_01 --viz
```

While the default options for sb3 work reasonably well, you may be interested in changing the hyperparameters.

We recommend taking the [sb3 example](https://github.com/edbeeching/godot_rl_agents/blob/main/examples/stable_baselines3_example.py) and modifying it to match your needs.

This example exposes more parameters for the user to configure, such as `--speedup` to run the environment faster than realtime and `n_parallel` to launch several instances of the game executable in order to accelerate training (not available for in-editor training).
This example exposes more parameters for the user to configure, such as `--speedup` to run the environment faster than realtime and `--n_parallel` to launch several instances of the game executable in order to accelerate training (not available for in-editor training).


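As a rough sketch, a direct run of the example script with those options might look like the following (flag values are illustrative):

```bash
python examples/stable_baselines3_example.py \
    --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 \
    --speedup=8 --n_parallel=4 \
    --experiment_dir=logs/sb3 --experiment_name=Experiment_01
```
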
8 changes: 4 additions & 4 deletions docs/EXAMPLE_ENVIRONMENTS.md
@@ -36,7 +36,7 @@ For the current version, we provide 4 example environments, located in **envs/ex
### Example training:
The agent can be trained with the following command:
```
gdrl --env_path envs/builds/JumperHard/jumper_hard.x86_64 --config_file envs/configs/ppo_config_jumper_hard.yaml
gdrl --env_path envs/builds/JumperHard/jumper_hard.x86_64 --config_file envs/configs/ppo_config_jumper_hard.yaml --experiment_name=Experiment_01
```
Training logs will be output by default to **/home/USER/ray_results/PPO/jumper_hard/**
You can monitor training curves etc with tensorboard
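For example (default path as above; replace USER with your username):
```
tensorboard --logdir /home/USER/ray_results/PPO/jumper_hard/
```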
@@ -75,7 +75,7 @@ gdrl --env_path envs/builds/JumperHard/jumper_hard.x86_64 --eval --restore envs/
### Example training:
The agent can be trained with the following command:
```
gdrl --env_path envs/builds/BallChase/ball_chase.x86_64 --config_file envs/configs/ppo_config_ball_chase.yaml
gdrl --env_path envs/builds/BallChase/ball_chase.x86_64 --config_file envs/configs/ppo_config_ball_chase.yaml --experiment_name=BallChase_01
```
Training logs will be output by default to **/home/USER/ray_results/PPO/ball_chase/**
You can monitor training curves etc with tensorboard
@@ -112,7 +112,7 @@ gdrl --env_path envs/builds/BallChase/ball_chase.x86_64 --eval --restore envs/ch
### Example training:
The agent can be trained with the following command:
```
gdrl --env_path envs/builds/FlyBy/fly_by.x86_64 --config_file envs/configs/ppo_config_fly_by.yaml
gdrl --env_path envs/builds/FlyBy/fly_by.x86_64 --config_file envs/configs/ppo_config_fly_by.yaml --experiment_name=FlyBy_01
```
Training logs will be output by default to **/home/USER/ray_results/PPO/fly_by/**
You can monitor training curves etc with tensorboard
@@ -153,7 +153,7 @@ gdrl --env_path envs/builds/FlyBy/fly_by.x86_64 --eval --restore envs/checkpoint
### Example training:
The agent can be trained with the following command:
```
gdrl --env_path envs/builds/SpaceShooter/space_shooter.x86_64 --config_file envs/configs/ppo_config_space_shooter.yaml
gdrl --env_path envs/builds/SpaceShooter/space_shooter.x86_64 --config_file envs/configs/ppo_config_space_shooter.yaml --experiment_name=Shooter_01
```
Training logs will be output by default to **/home/USER/ray_results/PPO/space_shooter/**
You can monitor training curves etc with tensorboard
10 changes: 6 additions & 4 deletions examples/clean_rl_example.py
@@ -17,8 +17,10 @@
def parse_args():
# fmt: off
parser = argparse.ArgumentParser()
parser.add_argument("--exp-name", type=str, default=os.path.basename(__file__).rstrip(".py"),
help="the name of this experiment")
parser.add_argument("--experiment_dir", default="logs/cleanrl", type=str,
help="The name of the the experiment directory, in which the tensorboard logs are getting stored")
parser.add_argument("--experiment_name", default=os.path.basename(__file__).rstrip(".py"), type=str,
help="The name of the the experiment, which will be displayed in tensborboard")
parser.add_argument("--seed", type=int, default=1,
help="seed of the experiment")
parser.add_argument("--torch-deterministic", type=lambda x: bool(strtobool(x)), default=True, nargs="?", const=True,
@@ -124,7 +126,7 @@ def get_action_and_value(self, x, action=None):

if __name__ == "__main__":
args = parse_args()
run_name = f"{args.env_path}__{args.exp_name}__{args.seed}__{int(time.time())}"
run_name = f"{args.experiment_name}__{args.seed}__{int(time.time())}"
if args.track:
import wandb

@@ -137,7 +139,7 @@ def get_action_and_value(self, x, action=None):
# monitor_gym=True, no longer works for gymnasium
save_code=True,
)
writer = SummaryWriter(f"runs/{run_name}")
writer = SummaryWriter(f"{args.experiment_dir}/{run_name}")
writer.add_text(
"hyperparameters",
"|param|value|\n|-|-|\n%s" % ("\n".join([f"|{key}|{value}|" for key, value in vars(args).items()])),
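
With the renamed arguments, a run of this example might be launched as sketched below (only the experiment-related flags are shown; the env path is illustrative and the remaining options keep their defaults):

```bash
python examples/clean_rl_example.py \
    --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 \
    --experiment_dir=logs/cleanrl --experiment_name=CleanRL_01
```
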
16 changes: 14 additions & 2 deletions examples/stable_baselines3_example.py
@@ -17,6 +17,18 @@
type=str,
help="The Godot binary to use, do not include for in editor training",
)
parser.add_argument(
"--experiment_dir",
default="logs/sb3",
type=str,
help="The name of the the experiment directory, in which the tensorboard logs are getting stored",
)
parser.add_argument(
"--experiment_name",
default="Experiment",
type=str,
help="The name of the the experiment, which will be displayed in tensborboard",
)
parser.add_argument(
"--onnx_export_path",
default=None,
@@ -32,8 +44,8 @@
env = StableBaselinesGodotEnv(env_path=args.env_path, show_window=True, n_parallel=args.n_parallel, speedup=args.speedup)
env = VecMonitor(env)

model = PPO("MultiInputPolicy", env, ent_coef=0.0001, verbose=2, n_steps=32, tensorboard_log="logs/log")
model.learn(1000000)
model = PPO("MultiInputPolicy", env, ent_coef=0.0001, verbose=2, n_steps=32, tensorboard_log=args.experiment_dir)
model.learn(1000000, tb_log_name=args.experiment_name)

print("closing env")
env.close()
16 changes: 4 additions & 12 deletions godot_rl/main.py
@@ -25,7 +25,6 @@
try:
from godot_rl.wrappers.ray_wrapper import rllib_training
except ImportError as e:
print("Warning: ", e)
def rllib_training(args, extras):
print("Import error when trying to use rllib. If you have not installed the package, try: pip install godot-rl[rllib]")
print("Otherwise try fixing the error above.")
@@ -34,7 +33,6 @@ def rllib_training(args, extras):
try:
from godot_rl.wrappers.stable_baselines_wrapper import stable_baselines_training
except ImportError as e:
print("Warning: ", e)
def stable_baselines_training(args, extras):
print(
"Import error when trying to use sb3. If you have not installed the package, try: pip install godot-rl[sb3]"
@@ -44,7 +42,6 @@ def stable_baselines_training(args, extras):
try:
from godot_rl.wrappers.sample_factory_wrapper import sample_factory_training, sample_factory_enjoy
except ImportError as e:
print("Warning: ", e)
def sample_factory_training(args, extras):
print(
"Import error when trying to use sample-factory If you have not installed the package, try: pip install godot-rl[sf]"
@@ -54,21 +51,16 @@ def sample_factory_training(args, extras):

def get_args():
parser = argparse.ArgumentParser(allow_abbrev=False)
parser.add_argument(
"--trainer",
default="sb3",
choices=["sb3", "sf", "rllib"],
type=str,
help="framework to use (rllib or stable-baselines)",
)
parser.add_argument("--trainer", default="sb3", choices=["sb3", "sf", "rllib"], type=str, help="framework to use (rllib, sf, sb3)")
parser.add_argument("--env_path", default=None, type=str, help="Godot binary to use")
parser.add_argument("--config_file", default="ppo_test.yaml", type=str, help="The yaml config file (used by rllib)")
parser.add_argument("--config_file", default="ppo_test.yaml", type=str, help="The yaml config file [only for rllib]")
parser.add_argument("--restore", default=None, type=str, help="the location of a checkpoint to restore from")
parser.add_argument("--eval", default=False, action="store_true", help="whether to eval the model")
parser.add_argument("--speedup", default=1, type=int, help="whether to speed up the physics in the env")
parser.add_argument("--export", default=False, action="store_true", help="wheter to export the model")
parser.add_argument("--num_gpus", default=None, type=int, help="Number of GPUs to use [only for rllib]")
parser.add_argument("--experiment_name", default=None, type=str, help="The name of the experiment [only for rllib]")
parser.add_argument("--experiment_dir", default=None, type=str, help="The name of the the experiment directory, in which the tensorboard logs are getting stored")
parser.add_argument("--experiment_name", default=None, type=str, help="The name of the the experiment, which will be displayed in tensborboard")
parser.add_argument("--viz", default=False, action="store_true", help="Whether to visualize one process")

return parser.parse_known_args()
1 change: 1 addition & 0 deletions godot_rl/wrappers/ray_wrapper.py
@@ -160,6 +160,7 @@ def rllib_training(args, extras):
checkpoint_freq=checkpoint_freq,
checkpoint_at_end=not args.eval,
restore=args.restore,
local_dir=args.experiment_dir or "logs/rllib",
trial_name_creator=lambda trial: f"{args.experiment_name}" if args.experiment_name else f"{trial.trainable_name}_{trial.trial_id}"
)
if args.export:
15 changes: 15 additions & 0 deletions godot_rl/wrappers/sample_factory_wrapper.py
@@ -164,13 +164,28 @@ def add_gdrl_env_args(_env, p: argparse.ArgumentParser, evaluation=False):
type=int,
help="Num agents in each envpool (if used)",
)
p.add_argument(
"--experiment_dir",
default="logs/sf",
type=str,
help="The name of the the experiment directory, in which the tensorboard logs are getting stored",
)
p.add_argument(
"--experiment_name",
default=None,
type=str,
help="The name of the the experiment, which will be displayed in tensborboard",
)


def parse_gdrl_args(argv=None, evaluation=False):
parser, partial_cfg = parse_sf_args(argv=argv, evaluation=evaluation)
add_gdrl_env_args(partial_cfg.env, parser, evaluation=evaluation)
gdrl_override_defaults(partial_cfg.env, parser)
final_cfg = parse_full_cfg(parser, argv)
args, _ = parser.parse_known_args()
final_cfg.train_dir = args.experiment_dir or "logs/sf"
final_cfg.experiment = args.experiment_name or final_cfg.experiment
return final_cfg


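With these overrides in place, a sample-factory run could redirect its logs, for example (values are illustrative):

```bash
gdrl --trainer=sf --env=gdrl --env_path=examples/godot_rl_<ENV_NAME>/bin/<ENV_NAME>.x86_64 \
     --num_workers=10 --speedup=8 --experiment_dir=logs/sf --experiment_name=Experiment_01
```
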
6 changes: 4 additions & 2 deletions godot_rl/wrappers/stable_baselines_wrapper.py
@@ -2,6 +2,7 @@
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env.base_vec_env import VecEnv
from stable_baselines3.common.vec_env.vec_monitor import VecMonitor
from typing import Any, Dict, List, Optional, Tuple, Union

from godot_rl.core.godot_env import GodotEnv
@@ -128,6 +129,7 @@ def step_wait(self) -> Tuple[Dict[str, np.ndarray], np.ndarray, np.ndarray, List
def stable_baselines_training(args, extras, n_steps: int = 200000, **kwargs) -> None:
# Initialize the custom environment
env = StableBaselinesGodotEnv(env_path=args.env_path, show_window=args.viz, speedup=args.speedup, **kwargs)
env = VecMonitor(env)

# Initialize the PPO model
model = PPO(
Expand All @@ -136,11 +138,11 @@ def stable_baselines_training(args, extras, n_steps: int = 200000, **kwargs) ->
ent_coef=0.0001,
verbose=2,
n_steps=32,
tensorboard_log="logs/log",
tensorboard_log=args.experiment_dir or "logs/sb3",
)

# Train the model
model.learn(n_steps)
model.learn(n_steps, tb_log_name=args.experiment_name)

print("closing env")
env.close()
8 changes: 4 additions & 4 deletions setup.cfg
@@ -50,7 +50,7 @@ sb3 =
huggingface_sb3

sf =
sample-factory
sample-factory==2.0.3
gym==0.26.2

rllib =
@@ -66,9 +66,9 @@ all =
numpy==1.23.5
gym==0.26.2
stable-baselines3==1.2.0
huggingface_sb3
sample-factory

sample-factory==2.0.3
ray==2.2.0
ray[rllib]

huggingface_sb3
tensorflow_probability
