Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeding update #2422

Merged
merged 33 commits into from
Dec 8, 2021
Merged

Seeding update #2422

merged 33 commits into from
Dec 8, 2021

Conversation

RedTachyon
Copy link
Contributor

See Issue #1663

This is a bit of a ride. The base change is getting rid of the custom seeding utils of gym, and instead using np.random.default_rng() as is recommended with modern versions of NumPy. I kept the gym.utils.seeding.np_random interface and changed it to basically being a synonym for default_rng (with some API difference, consistent with the old np_random)

Because the API (then RandomState, now Generator) changed a bit, np_random.randint calls were replaced with np_random.integers, np_random.rand -> np_random.random, np_random.randn -> np_random.standard_normal. This is all in accordance with the recommended NumPy conversion.

My doubts in order of (subjective) importance

Doubt number 1:

In gym/utils/seeding.py#L18 I'm accessing a "protected" variable seed_seq. This serves accessing the random seed that was automatically generated under the hood when the passed seed is None. (it also gives the correct value if the passed seed is an integer)
An alternative solution would be restoring the whole create_seed machinery which generates a random initial seed from os.urandom. I was unable to find another way to get the initial seed of a default Generator (default_rng(None)) instance.

Doubt number 2:

In gym/spaces/multi_discrete.py#L64. Turns out that a Generator doesn't directly support get_state and set_state. The same functionality seems to be achievable by accesing the bit generator and modifying its state directly (without getters/setters).

Doubt number 3:

As mentioned earlier, this version maintains the gym.utils.seeding file with just a single function. Functionally, I don't think that's a problem, but might feel a bit redundant from a stylistic point of view. This could be replaced by changing something like 17 calls of this function that occur in the codebase, but at the moment I think it'd be a bad move. The reason is that the function passes through the seed that's generated if the passed seed is None (see Doubt 1), which has to be obtained through mildly sketchy means, so it's better to keep it contained within the function. I don't think passing the seed extremely necessary, but that would somewhat significantly change the actual external API, I tend to be hesitant about changes like this if there's no good reason.
Overall I think it's good to keep the seeding.np_random function to keep some consistency with previous versions. The alternative is just completely removing the concept of "gym seeding", and using NumPy. (right now "gym seeding" is basically an interface for NumPy seeding)

Doubt number 4:

Pinging @araffin as there's a possibility this will (again) break some old pickled spaces in certain cases, and I know this was an issue with SB3 and the model zoo. Specifically, if you create a Space with the current master branch, run space.sample() at least once, and then pickle it, it will be pickled with a RandomState instance, which is now considered a legacy generator in NumPy. If you unpickle it using new gym code (i.e. this PR), space.np_random will still point to a RandomState, but the rest of the code expects space.np_random to be a Generator instance, which has a few API changes (see the beginning of this post).

Overall I don't know how important it is for the internal details of gym objects to remain the same - which is more or less necessary for old objects to be unpicklable in new gym versions. There's probably a way of a custom unpickling protocol as a compatibility layer - I'm not sufficiently familiar with this to do it, but I imagine it should be doable on the user side? (i.e. not in gym)

Doubt number 2137: (very low importance)

This doesn't technically solve #2210. IMHO this is absolutely acceptable, because np.random.seed is part of the legacy seeding mechanism, and is officially discouraged by NumPy. Proper usage yields expected results:

import numpy as np
from gym.utils import seeding

user_given_seed = None
np_random, seed = seeding.np_random(user_given_seed)

# since in some places np.random.randn or similar is used we want to seed the global numpy random generator as well
rng = np.random.default_rng(seed)

tl;dr

This definitely needs another set of eyes on it because I'm not confident enough about the nitty-gritty details of NumPy RNG. There are a few things I'm not 100% happy with from a stylistic point of view, but as far as my understanding goes, the functionality is what it should be. There's also the question of supporting old pickled objects, which I think is a whole different topic that needs discussing now that gym is maintained again.

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Sep 26, 2021

Thanks for contributing this PR. Could you add a few test cases on some environments like CartPole-v1 and assert the observations and actions are the same given the same random seed?

@RedTachyon
Copy link
Contributor Author

I think this is already present in gym.envs.tests.test_determinism.test_env (https://github.com/openai/gym/blob/master/gym/envs/tests/test_determinism.py), can you check if that's what you mean?

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Sep 26, 2021

Because the API (then RandomState, now Generator) changed a bit, np_random.randint calls were replaced with np_random.integers, np_random.rand -> np_random.random, np_random.randn -> np_random.standard_normal. This is all in accordance with the recommended NumPy conversion.

As @RedTachyon already noted here, such changes drop backward compatibility for users who using self.np_random to keep reproducibility in their custom environments. Some legacy code would suddenly be unfunctional after upgrading their gym backend version. We should be really careful about that.

In the NumPy API reference (What’s New or Different), the only two benefits I can see are that Generator provides more distribution functions than RandomState and is a little bit faster. However, these new functionalities we have never used in gym by far. Such changes will lead to ugly implementations that are shown in Random sampling (numpy.random) almost everywhere when users use self.np_random:

try:
    rng_integers = rng.integers
except AttributeError:
    rng_integers = rng.randint
a = rng_integers(1000)

to ensure their code is compatible with different versions of gym.


In NEP 19 — Random number generator policy:

All current usages of RandomState will continue to work in perpetuity, though some may be discouraged through documentation. Unit tests can continue to use the full complement of RandomState methods. Mixed RandomState/Generator code can safely share the common BitGenerator state. Unmodified RandomState code can make use of the new features of alternative BitGenerator-like settable streams.

NumPy will continue to keep RandomState as a stable API in the future. As gym is a widely used framework, I think we should provide our interfaces as stable as we can. I don't think it's a good idea to introduce these breaking changes.

@araffin
Copy link
Contributor

araffin commented Sep 26, 2021

As gym is a widely used framework, I think we should provide our interfaces as stable as we can. I don't think is a good idea to introduce these breaking changes.

I second @XuehaiPan on that.
What is the rationale behind that breaking change? (as the feature is not buggy nor deprecated)

@RedTachyon
Copy link
Contributor Author

My main rationale is essentially avoiding technical debt. The RandomState interface is considered legacy now and will not be receiving updates, and instead the Generator is the "correct" way of doing RNG with NumPy since 1.17. I'm not entirely sure what is the level of adoption of this change in the general community, but that's the direction things are moving.

The necessary changes in downstream code are also not particularly difficult, and are also essentially an update in the direction of "best practices".

From what I understand, Generator also behaves better with parallel processes in some way, although I'm unable to find exactly how that works.

Ultimately, there's again the overarching issue of how much backwards-compatibility we want to maintain now that gym is actually maintained again. In a way, breaking changes were always a part of gym due to the environment versioning, and old code that depends e.g. on LunarLander-v0 or v1 had to be updated. Conversely, striving to keep everything compatible with all previous versions will make it more difficult to make actual updates and avoid racking up technical debt.

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Sep 26, 2021

The necessary changes in downstream code are also not particularly difficult, and are also essentially an update in the direction of "best practices".

Users will need to refactor their code and specify something like gym >= 0.21.0 (or any other version). Otherwise, add gym <= 0.20.0 in their requirements.

Nowadays, users only write a small piece of code that build on top of lots of dependency packages. Although such changes in downstream code are not difficult, it needs the whole community to update their code almost simultaneously and push a new version to PyPI and/or conda as quickly as possible. I don't think it is possible to achieve.

@araffin
Copy link
Contributor

araffin commented Sep 26, 2021

My main rationale is essentially avoiding technical debt.

Is it really technical debt? Is Gym currently limited by the current implementation? What features are missing?
(from what I see, gym only uses very basic functions from the random generator)

how much backwards-compatibility we want to maintain now that gym is actually maintained again

Maintained does not mean we need to break code at every release.
See also my comment in #2407 (comment) (quoting @jkterry1 intentions in the "Plans for future maintenance").

In a way, breaking changes were always a part of gym due to the environment versioning, and old code that depends e.g. on LunarLander-v0 or v1 had to be updated.

Breaking changes in environment should occur only when a bug is fixed, only when it is necessary.

Too many things rely on gym and doing breaking changes at every release is the best thing you can do to annoy every user. That would be ok if gym was in an early stage of development but that's not the case anymore.

The only result you will get is people capping gym version to avoid breaking changes: DLR-RM/stable-baselines3#573 and cyprienc/stable-baselines3@e7a48d2

From what I understand, Generator also behaves better with parallel processes in some way, although I'm unable to find exactly how that works.

If this piece of code was a bottleneck in current gym env, I would agree to change it.
But currently no benchmark has been done and the ratio benefits/cost seems pretty low.

@jkterry1
Copy link
Collaborator

-Correct me if I'm wrong here, but the breaking changes here should be for a very minor use case, and I don't believe that it should break SB3 or any other major repo?

-@RedTachyon is there anyway to retain backwards compatibility with a warning instead of breaking things?

@RedTachyon
Copy link
Contributor Author

Hm, so the way I see it there are two issues to deal with, which are somewhat orthogonal.

Number 1: existing codebases (particularly custom environments) can use self.np_random.randn and other legacy methods like this.

Number 2: spaces (and envs? does anyone ever pickle and store envs?) that were pickled in old versions can get wonky if unpickled with a new version.


Re 1: my best idea for a solution is using a very thin wrapper class like this:

class NGenerator:        
    def __init__(self, generator: np.random.Generator):
        self.generator = generator
        
    def __getattr__(self, name):
        return getattr(self.generator, name)
    
    def __repr__(self):
        return repr(self.generator)
    
    def __str__(self):
        return str(self.generator)
    
    def rand(self, *dims):
        # some warning
        return self.generator.random(dims)
    
    def randn(self, *dims):
        return self.generator.standard_normal(dims)
    
    def randint(self, low, high=None, size=None, dtype=int):
        return self.generator.integers(low=low, high=high, size=size, dtype=dtype)
    
    def seed(self, seed: int):
        self.generator = np.random.default_rng(seed)

Add some docstrings, add some warnings, and in principle it should act exactly like the original Generator, except it has these three additional methods, which optionally pass a warning. This would still use the new "correct" RNG, but would allow using it with the legacy interface.

Then the seeding.np_random would return a generator wrapped in this class, and people would still be able to happily use randint etc.

The one thing this can't do is setattr, but doing setattr on a Generator or RandomState object would be incredibly cursed and most likely - not possible. (naively forwarding __setattr__ causes some problems in the init, there's probably a way to do it, but it's probably also not worth it)


Re 2: Do we care about the pickled spaces sampling with the exact same behavior as before pickling? If so, we could extend the wrapper class described above to accept either a Generator or a RandomState and expose a shared API with some isinstance checks. This would be significantly more spaghettified.

That being said, I don't think this is actually important, it feels like an extremely niche use case (i.e. pickle a space or an environment, then unpickle it in the future and require the RNG sequence to be exactly the same).

If we're content with just gracefully unpickling a space and giving it some reasonable self.np_random, maybe extending the __setstate__ so that it just creates a new Generator wrapped in the NGenerator wrapper would be enough? Or even just set it to None and let the lazy seeding do its job.

Note: the NGenerator and self.generator names are completely arbitrary and I'm not attached to them whatsoever.

@jkterry1
Copy link
Collaborator

jkterry1 commented Sep 27, 2021

Your solution in point 1 seems fine to me provided no one else has further objections

Regarding 2, unless there's an application I'm aware of we don't need to worry about that kind of pickling across different Gym versions. All the ones I'm aware of that this would impact only cache these objects instead of permanently saving them.

@araffin
Copy link
Contributor

araffin commented Sep 27, 2021

Number 2: spaces (and envs? does anyone ever pickle and store envs?) that were pickled in old versions can get wonky if unpickled with a new version.

Most env cannot be pickled anyway. But spaces are (to check if the new env matches the one used for training).
However, is np_random used in spaces?

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Sep 27, 2021

I think adding the legacy function alias to the "new" Generator is a good approach for backward compatibility.


For the solution provided in #2422 (comment), I have some advice here:

  1. Warned in numpy.random.Generator:

The function numpy.random.default_rng will instantiate a Generator with numpy’s default BitGenerator.

No Compatibility Guarantee

Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bitstream may change.

We should use an explicit BitGenerator rather than the default of NumPy (which may change between versions).

Currently, the default BitGenerator is numpy.random.PGC64. And noted in Upgrading PCG64 with PCG64DXSM:

We have introduced a new PCG64DXSM BitGenerator that will eventually become the new default BitGenerator implementation used by default_rng in future releases. PCG64DXSM solves the statistical weakness while preserving the performance and the features of PCG64.

Although PCG64DXSM is a better version of PCG64, it is only available since numpy >= 1.21.0. I suggest to explicitly use the Generator(PCG64(seed)) rather than np.random.default_rng(seed).

  1. It is unnecessary to write a new wrapper class for the Generator, and that introduces new technical debt here. Just using inheritance is enough:
from pkg_resources import parse_version

import numpy as np
import gym
from gym.logger import warn


# TODO: Remove this class and make it alias to `Generator` in Gym release 0.25.0
#       RandomNumberGenerator = np.random.Generator
class RandomNumberGenerator(np.random.Generator):
    REMOVE_SINCE = '0.25.0'

    assert parse_version(gym.__version__) < parse_version(REMOVE_SINCE)

    def rand(self, *size):
        warn(f'Function `rng.rand(*size)` is marked as deprecated '
             f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
             f'Please use `Generator.random(size)` instead.')

        return self.random(size)

    random_sample = rand

    def randn(self, *size):
        warn(f'Function `rng.randn(*size)` is marked as deprecated '
             f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
             f'Please use `rng.standard_normal(size)` instead.')

        return self.standard_normal(size)

    def randint(self, low, high=None, size=None, dtype=np.int64):
        warn(f'Function `rng.randint(low, [high, size, dtype])` is marked as deprecated '
             f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
             f'Please use `rng.integers(low, [high, size, dtype])` instead.')

        return self.integers(low=low, high=high, size=size, dtype=dtype)

    random_integers = randint

    def get_state(self):
        warn(f'Function `rng.get_state()` is marked as deprecated '
             f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
             f'Please use `rng.bit_generator.state` instead.')

        return self.bit_generator.state

    def set_state(self, state):
        warn(f'Function `rng.set_state(state)` is marked as deprecated '
             f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
             f'Please use `rng.bit_generator.state = state` instead.')

        self.bit_generator.state = state

    def seed(self, seed=None):
        warn(f'Function `rng.seed(seed)` is marked as deprecated '
             f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
             f'Please use `rng = get_rng(seed)` to create a separated generator instead.')

        self.bit_generator.state = type(self.bit_generator)(seed).state

    rand.__doc__ = np.random.rand.__doc__
    randn.__doc__ = np.random.randn.__doc__
    randint.__doc__ = np.random.randint.__doc__
    get_state.__doc__ = np.random.get_state.__doc__
    set_state.__doc__ = np.random.set_state.__doc__
    seed.__doc__ = np.random.seed.__doc__


# Shortcut for RandomNumberGenerator
RNG = RandomNumberGenerator


def get_rng(seed=None):
    if isinstance(seed, np.random.BitGenerator):
        return RNG(seed)
    if isinstance(seed, np.random.Generator):
        return RNG(seed.bit_generator)  # TODO: just `return seed` after migrating to Gym 0.25.0
    return RNG(np.random.PCG64(seed))

NOTE: The version Gym 0.25.0 here can be any other value. Or we can keep this backward compatibility in perpetuity.

I could create a PR for this if needed.

@RedTachyon
Copy link
Contributor Author

However, is np_random used in spaces?

Very rarely, but yes, I describe it in the original PR post in the part where I tagged you. Tl;dr under some conditions, it's used in calling space.sample()

@RedTachyon
Copy link
Contributor Author

RedTachyon commented Sep 27, 2021

@XuehaiPan I'm fine with this solution too, this should probably just be added directly in this PR, I can do it if there's agreement.

My thinking about my approach was that default_rng is the officially recommended constructor as per numpy/numpy#15391.

I don't necessarily agree that the two solutions are very different from a technical debt point of view, in the end both simply introduce a new class which will be deleted later.

So for the difference in functionality: with a wrapper class, we use default_rng which might not have stable behavior, but from what I understand will have stable API. So we'd automatically get new RNG mechanisms.

With an inherited class, we manage the RNG mechanism manually (for better or for worse), but a very big (subjectively) benefit is that we'd still have isinstance(self.np_random, np.random.Generator), which makes static type checkers happy.

As a minor note regarding the exact code snippet you posted - making a new get_rng function is unnecessary imo, this can all be handled within gym.utils.seeding.np_random

@XuehaiPan
Copy link
Contributor

this should probably just be added directly in this PR, I can do it if there's agreement.

I'm fine with that.

As a minor note regarding the exact code snippet you posted - making a new get_rng function is unnecessary imo, this can all be handled within gym.utils.seeding.np_random.

Agreed. The snippet here is just an example.

gym/utils/seeding.py Outdated Show resolved Hide resolved
@XuehaiPan
Copy link
Contributor

XuehaiPan commented Sep 27, 2021

Continue to #2422 (comment)

Or we can keep this backward compatibility in perpetuity.

Should we keep self.np_random to provide the exact same functionality as np.random, i.e., always have np_random.rand and np_random.randint etc.? Since:

NumPy will continue to keep RandomState as a stable API in the future.

If the answer is Yes, the warnings in #2422 (comment):

warn(f'Function `rng.rand(*size)` is marked as deprecated '
     f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
     f'Please use `Generator.random(size)` instead.')

should be replaced to:

warn('Function `rng.rand(*size)` is marked as legacy.'
     'Please use `Generator.random(size)` instead.')

To notice the users that we don't forbid legacy uses but we don't encourage that.


Otherwise, the name np_random could be misleading after removing the helper class in the future.

>>> self.np_random.randint
AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randint'

>>> np.random.randint
<built-in method randint of numpy.random.mtrand.RandomState object at 0x7f6e4d1ec740>

@jkterry1
Copy link
Collaborator

jkterry1 commented Sep 27, 2021

I would like to keep the warning as is. This is functionality that should be removed in the future, and will have to be as numpy plans to deprecate this eventually. We'll just leave it as a warning for maybe a year (or longer, depending) so there can be a smooth transition period without unnecessary breaks.

@jkterry1
Copy link
Collaborator

Also I'm going to wait and merge this PR until the release after the next one

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Nov 14, 2021

So since the self.np_random is set when you call the first reset, the first env.reset(1) effectively acts as env.seed(1).

Then, another question: what if I want to reseed the environment again as randomly as possible? I will need:

env.np_random = None
env.reset(seed=None)

@RedTachyon
Copy link
Contributor Author

reseed the environment again as randomly as possible

What exactly do you mean by that? And specifically, how is it different from not reseeding it explicitly at all?

(I'll try to address the other things soon)

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Nov 14, 2021

What exactly do you mean by that?

@RedTachyon For example, I what to test my policy for multiple episodes with randomly initialized starting state (reseed the environment at the beginning of each episode with None). See point 3 in #2422 (review).

Reseed the environment with None:

for epi in range(<num_episode>):
    env.np_random = None
    observation = env.reset(seed=None)  # only reseed once here
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)

Otherwise, I'll need to generate multiple seeds first:

seeds = [...]

for seed in seeds:
    env.np_random = None
    observation = env.reset(seed=seed)
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)

@RedTachyon
Copy link
Contributor Author

@XuehaiPan I still don't understand. What's the difference between reseeding "as randomly as possible" (which means you don't control the random seed and the PRNG), versus not reseeding at all (which means you use whatever the state of the PRNG is at the moment, i.e. you don't control the random seed and the PRNG).

I agree that there would be a difference in the specific numbers generated, but in either case they will be not reproducible since you're not controlling the seed.

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Nov 14, 2021

What's the difference between reseeding "as randomly as possible" (which means you don't control the random seed and the PRNG), versus not reseeding at all (which means you use whatever the state of the PRNG is at the moment, i.e. you don't control the random seed and the PRNG).

@RedTachyon I agree with this (point 2 in #2422 (review)).

What I'm concerned about is this seeding update takes away the rights to seed the environment again with None. See point 3 in #2422 (review).

@araffin
Copy link
Contributor

araffin commented Nov 14, 2021

with this PR, what would be the way to seed multiple envs that are in different processes?

before:

seed = 1
# at the beginning of training
# `env.seed(seed + rank)` is called in each process so next call to `reset()` will provide expected obs
vec_env.seed(seed) 

# later in the code, e.g., evaluating the agent
obs = vec_env.reset()

after?

seed = 1
vec_env.reset(seed) # calling reset(seed + rank) without retrieving the obs?

# later in the code, e.g., evaluating the agent
obs = vec_env.reset()

EDIT: the more I look at it, the more this PR seems to be breaking all existing envs for only a minor improvement...

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Nov 14, 2021

I'd suggest keeping the seed() method. If the user wants to seed the environment on reset, just provide a built-in wrapper:

class Env:
    def seed(seed=None):
        # keep this function

    def reset(self, *, **kwargs):  # only accept keyword arguments
        # do not modify this too


class SeedOnReset(gym.Wrapper):
    __marker = object()

    def reset(self, *, seed=__marker, **kwargs):
        if seed is not self.__marker:
            self.seed(seed)
        return super().reset(**kwargs)

We can add wrapper SeedOnReset to register function (like wrapper TimeLimit) if the user register their environment with argument seed_on_reset=True.

@RedTachyon
Copy link
Contributor Author

General note: I won't be discussing why env.seed is being removed/integrated into env.reset, since I'm personally mostly indifferent to this and it's not my decision anyways. I think the comment that brought me towards the current solution is this: #2422 (comment)
Not quite sure if Jesse still stands by this solution (after offline discussions), but it's about the content of the message.

@vwxyzjn Some of the doubts might have been cleared up in the meantime, but as a summary of the changes:

  1. The internals of seeding now use np.random.Generator as opposed to a np.random.RandomState, which follows the recommendations on how to use numpy RNG.
  2. If you never used env.seed, and will not use env.reset(seed=...), the behavior should be exactly the same - you just get a bunch of random numbers you have no control over.
  3. If you used env.seed(...); env.reset(), now you write env.reset(seed=...) and should get the same behavior
  4. If you used env.seed(...) in the middle of an episode, you're out of luck because this doesn't make any sense in a lot of environments. Then again, if you use it like this, you should probably stop anyways. (i.e. this usage is explicitly deprecated; see the rest of the discussion)
  5. How the resetting works now (with this PR):
  • On the very first reset, you either give an explicit seed, which is used to initialize the RNG; or you don't, in which case RNG is initialized "however"
  • On any subsequent reset, you can either give an explicit seed, which restarts the RNG; or you don't, in which case RNG just goes on. This is equivalent to calling env.reset() before the PR without explicitly calling env.seed() before it.
  • In custom environments, you should call super().__init__(seed=...) to let this mechanism happen. Or you can just handle it another way, because some environments use an external process (e.g. Atari, Unity) and not the python-side self.np_random.

@XuehaiPan I still don't see why someone would need to explicitly reseed with None. What's the difference between doing this and doing nothing? Either way you get an uncontrolled PRNG.

@araffin Yea, your after seems about right. I don't see why you'd need to discard those observations, since you'll need them in whatever training procedure anyways, but you can. What's wrong with that version?

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Nov 14, 2021

  1. If you used env.seed(...); env.reset(), now you write env.reset(seed=...) and should get the same behavior
  2. If you used env.seed(...) in the middle of an episode, you're out of luck because this doesn't make any sense in a lot of environments. Then again, if you use it like this, you should probably stop anyways. (i.e. this usage is explicitly deprecated; see the rest of the discussion)
  3. How the resetting works now (with this PR):
  • On the very first reset, you either give an explicit seed, which is used to initialize the RNG; or you don't, in which case RNG is initialized "however"
  • On any subsequent reset, you can either give an explicit seed, which restarts the RNG; or you don't, in which case RNG just goes on. This is equivalent to calling env.reset() before the PR without explicitly calling env.seed() before it.
  • In custom environments, you should call super().__init__(seed=...) to let this mechanism happen. Or you can just handle it another way, because some environments use an external process (e.g. Atari, Unity) and not the python-side self.np_random.

Oppose for these which are minor improvements but backward incompatible. These need the downstream code base to be updated. For example, suppose I'm a downstream package maintainer, I cannot arbitrarily set gym>=0.19.0 (or what ever version before this PR being merged) in requirements.txt or setup,py. I will need to add if gym.__version__ .... or if hasattr(env, 'seed') conditions in my code (same for other packages depend on gym).


@XuehaiPan I still don't see why someone would need to explicitly reseed with None. What's the difference between doing this and doing nothing? Either way you get an uncontrolled PRNG.

When the environment have been seeded, the user cannot seed the environment again with None. Some third-party environments call self.reset() in __init__ to make sure the environments have been reset.

class MyEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.reset(seed=0)

env = MyEnv()
env.reset(seed=None)  # no effect for the seed

@JesseFarebro
Copy link
Contributor

JesseFarebro commented Nov 14, 2021

My preference would be to keep .seed() over .reset(*, seed). I don't think moving it to reset solves any of the issues I complained about earlier in this thread and it's just as confusing IMO. I think I underestimated all the edge cases people are using seed for (thanks to @XuehaiPan, @araffin, @vwxyzjn, @RedTachyon for the discussions). I do have another proposal:

I feel like this same argument has been going on far longer than Gym. For example, this is an issue in the ALE and we have these two functions cloneState and cloneSystemState. Not very intuitive, but cloneState essentially serializes the state WITHOUT the RNG state (i.e., you'll get a random seed upon restoring the state). cloneSystemState serializes the state WITH the RNG state (i.e., upon restoring you'll get the exact same RNG sequence going forward).

Apart from the questionable naming of the system versus whatever non-system is, perhaps mandating proper serialization and copy support could help with some of these use cases. For example, something like,

env = gym.make('...')

# Makes a copy of the environment where RNG is seeded with 1
env_copy_with_new_rng = env.copy(seed=1)

# Makes a copy but keeps the current RNG state
env_copy = env.copy()

I think this does a couple of things:

  1. This solves my admittedly pedantic annoyances with the seed function and its purpose within Gym. I think it's reasonable to want to make a copy of an environment and potentially seed it. I would probably argue that the default behaviour for pickling etc. should be to serialize with RNG. If you need to do anything special you can always call copy(seed).

  2. Provides "native" support for copying environments post-initialization. As we've discussed in this thread this is something that's important for Gym as environment initialization can be really expensive. Supporting some type of native copy solidifies this is something environments should support. Environment authors can add further features, e.g., multi-agent case if it's really expensive to initialize the environment but we want a different policy for our opponent: copy(opponent=...).

Furthermore, I actually think this clears things up with the case where seed=None is supplied. seed(seed=None) is kind of weird but I understand how this would initialize with an arbitrary seed. I think it's clear that copy(seed=None) means that you should not re-seed the environment when copied. If you really need to be calling .copy(seed) repetitively then perhaps do something like:

from numpy.random import SeedSequence

env = gym.make('...')
ss = SeedSequence()

envs = map(lambda seed: env.copy(seed=seed), ss.spawn(10))

This solution could also be somewhat backwards compatible. You can provide a default implementation for copy as:

def copy(seed=None):
    _self = _copy.deepcopy(self)
    if seed is not None:
        _self.seed(seed)
    return _self

and slowly deprecate seed in favour of passing seed as an argument to the Environment constructor. In the case of the ALE we could now supply seed in __init__, keep .seed() for backwards compatibility, and support a custom .copy() which is more efficient than doing .seed() in our case.

@RedTachyon
Copy link
Contributor Author

I feel like there's a fundamental problem of how much gym should be handling, what assumptions it should make (and enforce), and where it should just users do whatever they want. Obviously any extreme will be bad, since too many assumptions can break some reasonable environments, and if there are no assumptions, then this already exists, and it's called Python.

I don't know to what extent this design work was done by OpenAI at the beginning, but probably not enough. And now we just kinda have to make it up as we go, to figure out the right balance for what actually exists. The only problem is that whenever some design change in either direction seems to make sense, it will likely be a breaking change, and that will inevitably make some people angry. But at the same time, a situation where we're stuck with an outdated architecture just because that's what ended up being made a few years ago is... not ideal.

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Nov 15, 2021

I feel like there's a fundamental problem of how much gym should be handling, what assumptions it should make (and enforce), and where it should just users do whatever they want.

tl;dr: keep the interface as stable as we can. Minimize the impact on downstream code bases.

  • new features: welcome :).
  • new methods / interfaces: fine :).
  • modify old methods / interfaces: no :(. (only when necessary for bug fix)
  • remove old methods / interfaces: no :(. (especially widely used by downstream code)

As an interface maintainer / designer, the interface definition should and must be stable (new: fine; remove/change: no). When doing modifications, keep in mind what impact will be on the downside. Add necessary compatibility layers for it, or add a separated new interface and keep the old one (like numpy.random.Generator onto numpy.random.RandomState).

Suppose I create a new PR: remove observation_space and action_space, rename them to obs_space and act_space, I can imagine everyone will against this.


By now, there are 55 third-party environments listed in Third Party Environments and 23.5k repositories explicitly mark gym as their dependency (Used by 23.5k). The number may be larger since many third-party environments are not open sourced or haven't added to the document yet. And some repositories do not add gym as an explicit dependency but depend on packages that use gym (RLlib, etc.).

This PR contains two breaking changes: 1) replace RandomState with Generator; 2) remove seed method from Env. These will affect all users above (23.5k+). In my opinion, non of these changes is necessary.

Change 1: replace RandomState with Generator (not necessary but fine with compatibility layers)

Commented in #2422 (comment):

In NEP 19 — Random number generator policy:

All current usages of RandomState will continue to work in perpetuity, though some may be discouraged through documentation. Unit tests can continue to use the full complement of RandomState methods. Mixed RandomState/Generator code can safely share the common BitGenerator state. Unmodified RandomState code can make use of the new features of alternative BitGenerator-like settable streams.

NumPy will continue to keep RandomState as a stable API in the future. As gym is a widely used framework, I think we should provide our interfaces as stable as we can. I don't think it's a good idea to introduce these breaking changes.

This change is not necessary. If we want to change the behavior of self.np_random cause NumPy suggest we do so. It's fine but please include compatibility layers (already included in this PR).

Change 2: remove seed method from Env (not necessary for any reason)

As an interface definition, many RL environments are subclasses of gym.Env. That means: given an env instance, it always has:

properties:
    action_space: gym.space.Space
    observation_space: gym.space.Space
methods:
    reset: () -> observation
    step: (action) -> observation, reward, done, info
    close: () -> NoReturn
    seed: (seed) -> List[int]
    render: (mode) -> Any

It is a paradigm of environment designing and almost everyone follows that. It makes the downstream RL learning frameworks and third-party environments much easier to be compatible with each other.

We are welcome to add new methods such as Env.copy(seed=None), but do not remove old methods.

As an interface maintainer / designer, the interface definition should and must be stable (new: fine; remove/change: no). In this PR it is Env.seed. Suppose I create a new PR: remove observation_space and action_space, rename them to obs_space and act_space, I can imagine everyone will against this.

We have seen the painful transition from Python 2 to Python 3. Although there is a migration tool called 2to3, why does it still take so long to migrate? The reason is Python 2 has a large codebase and there is a large community relying on it. Here is the same situation for gym.Env (used by 23.5k+).

If we still want to make changes to gym.Env, please add a separated class with a suffix like gym.EnvV2. Do not touch the old interface. Otherwise, add a compatibility layer like six:

if PY3:
    def iterkeys(d, **kw):
        return iter(d.keys(**kw))

    def itervalues(d, **kw):
        return iter(d.values(**kw))

    def iteritems(d, **kw):
        return iter(d.items(**kw))

    def iterlists(d, **kw):
        return iter(d.lists(**kw))

    viewkeys = operator.methodcaller("keys")

    viewvalues = operator.methodcaller("values")

    viewitems = operator.methodcaller("items")
else:
    def iterkeys(d, **kw):
        return d.iterkeys(**kw)

    def itervalues(d, **kw):
        return d.itervalues(**kw)

    def iteritems(d, **kw):
        return d.iteritems(**kw)

    def iterlists(d, **kw):
        return d.iterlists(**kw)

    viewkeys = operator.methodcaller("viewkeys")

    viewvalues = operator.methodcaller("viewvalues")

    viewitems = operator.methodcaller("viewitems")

Please add some migration variables like gym.GYM025 etc.

@jkterry1
Copy link
Collaborator

jkterry1 commented Nov 20, 2021

@JesseFarebro

I do not believe that that should be added to Gym. Regarding your new comment about Atari not actually being incompatible with .seed(), I'm going to sit down at some point and think about if we still want to to include the breaking change component of the PR given this.

@XuehaiPan (and others)

I disagree that making changes like this is an extraordinarily big deal. Every major Python library deprecates and removes/changes features over time, including ones infinitely bigger than us like NumPy or SciPy. This PR is deprecates a feature that will still continue to function for awhile (e.g. roughly 6 months in our case) before using a new function is required, this happens all the time. Additionally, changing existing code to use the new API here would require incredibly small changes that could be made in minutes, which is different from what happened with Python 3.6. Python makes many changes of this scope in sub versions, e.g. 3.6->3.7. I had a Zoom call with the RLlib developers last week (RLlib is the biggest RL library in the world by PyPI installs and as best I can tell has more Gym code than anyone else in the world) and mentioned this to them, and their response to this PR was "ya sure that's fine." Regarding changing the object name, this isn't really standard for marking these sorts of changes, the standard to mark this would be a major version change in Gym. The problem with this is that semantic versioning is super weird as a construct in Gym, so it's never been used.

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Nov 20, 2021

I disagree that making changes like this is an extraordinarily big deal.

@jkterry1 If the decision is made, it is fine for me. I'm not against any changes to the library. I'm arguing whether these changes are necessary (listed in #2422 (comment)), especially for the core function.

If we are going to remove Env.seed(), I think the implementation of the seeding update in this PR could be better:

  1. move seed to __init__ rather than reset (point 2. in Seeding update #2422 (review)).

  2. came with point 1. above (point 3. in Seeding update #2422 (review)), potential issues for Seeding update #2422 (comment):

    @XuehaiPan I still don't see why someone would need to explicitly reseed with None. What's the difference between doing this and doing nothing? Either way you get an uncontrolled PRNG.

    When the environment have been seeded, the user cannot seed the environment again with None. Some third-party environments call self.reset() in __init__ to make sure the environments have been reset.

    class MyEnv(gym.Env):
        def __init__(self):
            super().__init__()
            self.reset(seed=0)
    
    env = MyEnv()
    env.reset(seed=None)  # no effect for the seed

If we are not going to remove Env.seed but support seeding on reset: see #2422 (comment).


I disagree that making changes like this is an extraordinarily big deal. Every major Python library deprecates and removes/changes features over time, including ones infinitely bigger than us like NumPy or SciPy. This PR is deprecates a feature that will still continue to function for awhile (e.g. roughly 6 months in our case) before using a new function is required, this happens all the time.

@jkterry1 Indeed, some libraries bigger than us deprecate features over time, but they still support the old functionality and just prompt a warning or nothing. How long will the old functionality support? The answer for Python and/or NumPy may be forever, especially for the core functions. For other changes that do not guarantee backward compatibility, they provide migration tools like 2to3 and six.

For example, threadings.Event.isSet was deprecated in favor of name is_set since Python 2.6 (Python 2.6 docs threading.Event), which was released more than ten years ago. But it is still supported by now in Python 3.10:

Python 3.10.0 (default, Oct  4 2021, 17:55:55) [GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
>>> threading.Event.is_set
<function Event.is_set at 0x7fbf1c632050>
>>> threading.Event.isSet
<function Event.isSet at 0x7fbf1c6320e0>
>>>

Another example is the numpy.random.RandomState:

In NEP 19 — Random number generator policy:

All current usages of RandomState will continue to work in perpetuity, though some may be discouraged through documentation. Unit tests can continue to use the full complement of RandomState methods. Mixed RandomState/Generator code can safely share the common BitGenerator state. Unmodified RandomState code can make use of the new features of alternative BitGenerator-like settable streams.

All current usages of RandomState will continue to work in perpetuity.

@XuehaiPan
Copy link
Contributor

BTW, a special reminder for breaking changes without backward compatibilities: the source codes of many academic papers are open-source with warnings:

- "the source code is released as it is without any warranties"

@jkterry1
Copy link
Collaborator

jkterry1 commented Dec 8, 2021

I briefly spoke to people at OpenAI about this and others like Jesse offline and I still believe this is the best course of action going forward (OpenAI is in agreement with this change)

@jkterry1 jkterry1 merged commit c364506 into openai:master Dec 8, 2021
@jkterry1
Copy link
Collaborator

jkterry1 commented Dec 8, 2021

The old seed method will continue to work until the Gym 1.0 release to support legacy code though. And again, updating code to use this change is absolutely tiny. I've spoken to the largest users of the Gym API in the world about this and a few other discussed small API changes, and essentially their positions universally were "ya sure, this isn't a big deal"

@araffin
Copy link
Contributor

araffin commented Dec 9, 2021

The old seed method will continue to work until the Gym 1.0 release to support legacy code though.

Can we at least make the method forever deprecated as numpy does it?

That would be nice for all the 24k projects that already exist... https://github.com/openai/gym/network/dependents?package_id=UGFja2FnZS01MjE5NTAzMA%3D%3D

@efokschaner
Copy link
Contributor

Hi I recently started working with gym, and have been having some fun getting fooled by what's in master with what's released while trying to integrate.

Take this all with a pinch of salt:

It seems that changing reset() from not implemented to having an implementation could be a little incompatible for classes that derive, as they will definitely not be calling the base class implementation currently.

If I could offer a suggestion, it would be to split a pure, implementation-free Env interface out from the common implementation stuff shared amongst envs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants