-
Notifications
You must be signed in to change notification settings - Fork 8.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seeding update #2422
Seeding update #2422
Conversation
…ault_rng. Let's see if tests pass
… with None as seed
Thanks for contributing this PR. Could you add a few test cases on some environments like |
I think this is already present in |
As @RedTachyon already noted here, such changes drop backward compatibility for users who using In the NumPy API reference (What’s New or Different), the only two benefits I can see are that try:
rng_integers = rng.integers
except AttributeError:
rng_integers = rng.randint
a = rng_integers(1000) to ensure their code is compatible with different versions of In NEP 19 — Random number generator policy:
NumPy will continue to keep |
I second @XuehaiPan on that. |
My main rationale is essentially avoiding technical debt. The RandomState interface is considered legacy now and will not be receiving updates, and instead the Generator is the "correct" way of doing RNG with NumPy since 1.17. I'm not entirely sure what is the level of adoption of this change in the general community, but that's the direction things are moving. The necessary changes in downstream code are also not particularly difficult, and are also essentially an update in the direction of "best practices". From what I understand, Generator also behaves better with parallel processes in some way, although I'm unable to find exactly how that works. Ultimately, there's again the overarching issue of how much backwards-compatibility we want to maintain now that gym is actually maintained again. In a way, breaking changes were always a part of gym due to the environment versioning, and old code that depends e.g. on LunarLander-v0 or v1 had to be updated. Conversely, striving to keep everything compatible with all previous versions will make it more difficult to make actual updates and avoid racking up technical debt. |
Users will need to refactor their code and specify something like Nowadays, users only write a small piece of code that build on top of lots of dependency packages. Although such changes in downstream code are not difficult, it needs the whole community to update their code almost simultaneously and push a new version to PyPI and/or conda as quickly as possible. I don't think it is possible to achieve. |
Is it really technical debt? Is Gym currently limited by the current implementation? What features are missing?
Maintained does not mean we need to break code at every release.
Breaking changes in environment should occur only when a bug is fixed, only when it is necessary. Too many things rely on gym and doing breaking changes at every release is the best thing you can do to annoy every user. That would be ok if gym was in an early stage of development but that's not the case anymore. The only result you will get is people capping gym version to avoid breaking changes: DLR-RM/stable-baselines3#573 and cyprienc/stable-baselines3@e7a48d2
If this piece of code was a bottleneck in current gym env, I would agree to change it. |
-Correct me if I'm wrong here, but the breaking changes here should be for a very minor use case, and I don't believe that it should break SB3 or any other major repo? -@RedTachyon is there anyway to retain backwards compatibility with a warning instead of breaking things? |
Hm, so the way I see it there are two issues to deal with, which are somewhat orthogonal. Number 1: existing codebases (particularly custom environments) can use Number 2: spaces (and envs? does anyone ever pickle and store envs?) that were pickled in old versions can get wonky if unpickled with a new version. Re 1: my best idea for a solution is using a very thin wrapper class like this: class NGenerator:
def __init__(self, generator: np.random.Generator):
self.generator = generator
def __getattr__(self, name):
return getattr(self.generator, name)
def __repr__(self):
return repr(self.generator)
def __str__(self):
return str(self.generator)
def rand(self, *dims):
# some warning
return self.generator.random(dims)
def randn(self, *dims):
return self.generator.standard_normal(dims)
def randint(self, low, high=None, size=None, dtype=int):
return self.generator.integers(low=low, high=high, size=size, dtype=dtype)
def seed(self, seed: int):
self.generator = np.random.default_rng(seed) Add some docstrings, add some warnings, and in principle it should act exactly like the original Generator, except it has these three additional methods, which optionally pass a warning. This would still use the new "correct" RNG, but would allow using it with the legacy interface. Then the The one thing this can't do is Re 2: Do we care about the pickled spaces sampling with the That being said, I don't think this is actually important, it feels like an extremely niche use case (i.e. pickle a space or an environment, then unpickle it in the future and require the RNG sequence to be exactly the same). If we're content with just gracefully unpickling a space and giving it some reasonable Note: the |
Your solution in point 1 seems fine to me provided no one else has further objections Regarding 2, unless there's an application I'm aware of we don't need to worry about that kind of pickling across different Gym versions. All the ones I'm aware of that this would impact only cache these objects instead of permanently saving them. |
Most env cannot be pickled anyway. But spaces are (to check if the new env matches the one used for training). |
I think adding the legacy function alias to the "new" For the solution provided in #2422 (comment), I have some advice here:
We should use an explicit Currently, the default
Although
from pkg_resources import parse_version
import numpy as np
import gym
from gym.logger import warn
# TODO: Remove this class and make it alias to `Generator` in Gym release 0.25.0
# RandomNumberGenerator = np.random.Generator
class RandomNumberGenerator(np.random.Generator):
REMOVE_SINCE = '0.25.0'
assert parse_version(gym.__version__) < parse_version(REMOVE_SINCE)
def rand(self, *size):
warn(f'Function `rng.rand(*size)` is marked as deprecated '
f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
f'Please use `Generator.random(size)` instead.')
return self.random(size)
random_sample = rand
def randn(self, *size):
warn(f'Function `rng.randn(*size)` is marked as deprecated '
f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
f'Please use `rng.standard_normal(size)` instead.')
return self.standard_normal(size)
def randint(self, low, high=None, size=None, dtype=np.int64):
warn(f'Function `rng.randint(low, [high, size, dtype])` is marked as deprecated '
f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
f'Please use `rng.integers(low, [high, size, dtype])` instead.')
return self.integers(low=low, high=high, size=size, dtype=dtype)
random_integers = randint
def get_state(self):
warn(f'Function `rng.get_state()` is marked as deprecated '
f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
f'Please use `rng.bit_generator.state` instead.')
return self.bit_generator.state
def set_state(self, state):
warn(f'Function `rng.set_state(state)` is marked as deprecated '
f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
f'Please use `rng.bit_generator.state = state` instead.')
self.bit_generator.state = state
def seed(self, seed=None):
warn(f'Function `rng.seed(seed)` is marked as deprecated '
f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
f'Please use `rng = get_rng(seed)` to create a separated generator instead.')
self.bit_generator.state = type(self.bit_generator)(seed).state
rand.__doc__ = np.random.rand.__doc__
randn.__doc__ = np.random.randn.__doc__
randint.__doc__ = np.random.randint.__doc__
get_state.__doc__ = np.random.get_state.__doc__
set_state.__doc__ = np.random.set_state.__doc__
seed.__doc__ = np.random.seed.__doc__
# Shortcut for RandomNumberGenerator
RNG = RandomNumberGenerator
def get_rng(seed=None):
if isinstance(seed, np.random.BitGenerator):
return RNG(seed)
if isinstance(seed, np.random.Generator):
return RNG(seed.bit_generator) # TODO: just `return seed` after migrating to Gym 0.25.0
return RNG(np.random.PCG64(seed)) NOTE: The version Gym 0.25.0 here can be any other value. Or we can keep this backward compatibility in perpetuity. I could create a PR for this if needed. |
Very rarely, but yes, I describe it in the original PR post in the part where I tagged you. Tl;dr under some conditions, it's used in calling |
@XuehaiPan I'm fine with this solution too, this should probably just be added directly in this PR, I can do it if there's agreement. My thinking about my approach was that I don't necessarily agree that the two solutions are very different from a technical debt point of view, in the end both simply introduce a new class which will be deleted later. So for the difference in functionality: with a wrapper class, we use With an inherited class, we manage the RNG mechanism manually (for better or for worse), but a very big (subjectively) benefit is that we'd still have As a minor note regarding the exact code snippet you posted - making a new |
I'm fine with that.
Agreed. The snippet here is just an example. |
Continue to #2422 (comment)
Should we keep
If the answer is Yes, the warnings in #2422 (comment): warn(f'Function `rng.rand(*size)` is marked as deprecated '
f'and will be removed in gym {self.REMOVE_SINCE} in the future. '
f'Please use `Generator.random(size)` instead.') should be replaced to: warn('Function `rng.rand(*size)` is marked as legacy.'
'Please use `Generator.random(size)` instead.') To notice the users that we don't forbid legacy uses but we don't encourage that. Otherwise, the name >>> self.np_random.randint
AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randint'
>>> np.random.randint
<built-in method randint of numpy.random.mtrand.RandomState object at 0x7f6e4d1ec740> |
I would like to keep the warning as is. This is functionality that should be removed in the future, and will have to be as numpy plans to deprecate this eventually. We'll just leave it as a warning for maybe a year (or longer, depending) so there can be a smooth transition period without unnecessary breaks. |
Also I'm going to wait and merge this PR until the release after the next one |
Then, another question: what if I want to reseed the environment again as randomly as possible? I will need: env.np_random = None
env.reset(seed=None) |
What exactly do you mean by that? And specifically, how is it different from not reseeding it explicitly at all? (I'll try to address the other things soon) |
@RedTachyon For example, I what to test my policy for multiple episodes with randomly initialized starting state (reseed the environment at the beginning of each episode with Reseed the environment with for epi in range(<num_episode>):
env.np_random = None
observation = env.reset(seed=None) # only reseed once here
done = False
while not done:
action = policy(observation)
observation, reward, done, info = env.step(action) Otherwise, I'll need to generate multiple seeds first: seeds = [...]
for seed in seeds:
env.np_random = None
observation = env.reset(seed=seed)
done = False
while not done:
action = policy(observation)
observation, reward, done, info = env.step(action) |
@XuehaiPan I still don't understand. What's the difference between reseeding "as randomly as possible" (which means you don't control the random seed and the PRNG), versus not reseeding at all (which means you use whatever the state of the PRNG is at the moment, i.e. you don't control the random seed and the PRNG). I agree that there would be a difference in the specific numbers generated, but in either case they will be not reproducible since you're not controlling the seed. |
@RedTachyon I agree with this (point 2 in #2422 (review)). What I'm concerned about is this seeding update takes away the rights to seed the environment again with |
with this PR, what would be the way to seed multiple envs that are in different processes? before: seed = 1
# at the beginning of training
# `env.seed(seed + rank)` is called in each process so next call to `reset()` will provide expected obs
vec_env.seed(seed)
# later in the code, e.g., evaluating the agent
obs = vec_env.reset() after? seed = 1
vec_env.reset(seed) # calling reset(seed + rank) without retrieving the obs?
# later in the code, e.g., evaluating the agent
obs = vec_env.reset() EDIT: the more I look at it, the more this PR seems to be breaking all existing envs for only a minor improvement... |
I'd suggest keeping the class Env:
def seed(seed=None):
# keep this function
def reset(self, *, **kwargs): # only accept keyword arguments
# do not modify this too
class SeedOnReset(gym.Wrapper):
__marker = object()
def reset(self, *, seed=__marker, **kwargs):
if seed is not self.__marker:
self.seed(seed)
return super().reset(**kwargs) We can add wrapper |
General note: I won't be discussing why @vwxyzjn Some of the doubts might have been cleared up in the meantime, but as a summary of the changes:
@XuehaiPan I still don't see why someone would need to explicitly reseed with @araffin Yea, your |
Oppose for these which are minor improvements but backward incompatible. These need the downstream code base to be updated. For example, suppose I'm a downstream package maintainer, I cannot arbitrarily set
When the environment have been seeded, the user cannot seed the environment again with class MyEnv(gym.Env):
def __init__(self):
super().__init__()
self.reset(seed=0)
env = MyEnv()
env.reset(seed=None) # no effect for the seed |
My preference would be to keep I feel like this same argument has been going on far longer than Gym. For example, this is an issue in the ALE and we have these two functions Apart from the questionable naming of the system versus whatever non-system is, perhaps mandating proper serialization and copy support could help with some of these use cases. For example, something like, env = gym.make('...')
# Makes a copy of the environment where RNG is seeded with 1
env_copy_with_new_rng = env.copy(seed=1)
# Makes a copy but keeps the current RNG state
env_copy = env.copy() I think this does a couple of things:
Furthermore, I actually think this clears things up with the case where from numpy.random import SeedSequence
env = gym.make('...')
ss = SeedSequence()
envs = map(lambda seed: env.copy(seed=seed), ss.spawn(10)) This solution could also be somewhat backwards compatible. You can provide a default implementation for def copy(seed=None):
_self = _copy.deepcopy(self)
if seed is not None:
_self.seed(seed)
return _self and slowly deprecate |
I feel like there's a fundamental problem of how much I don't know to what extent this design work was done by OpenAI at the beginning, but probably not enough. And now we just kinda have to make it up as we go, to figure out the right balance for what actually exists. The only problem is that whenever some design change in either direction seems to make sense, it will likely be a breaking change, and that will inevitably make some people angry. But at the same time, a situation where we're stuck with an outdated architecture just because that's what ended up being made a few years ago is... not ideal. |
tl;dr: keep the interface as stable as we can. Minimize the impact on downstream code bases.
As an interface maintainer / designer, the interface definition should and must be stable (new: fine; remove/change: no). When doing modifications, keep in mind what impact will be on the downside. Add necessary compatibility layers for it, or add a separated new interface and keep the old one (like Suppose I create a new PR: remove By now, there are 55 third-party environments listed in Third Party Environments and 23.5k repositories explicitly mark This PR contains two breaking changes: 1) replace Change 1: replace
|
I do not believe that that should be added to Gym. Regarding your new comment about Atari not actually being incompatible with .seed(), I'm going to sit down at some point and think about if we still want to to include the breaking change component of the PR given this. @XuehaiPan (and others) I disagree that making changes like this is an extraordinarily big deal. Every major Python library deprecates and removes/changes features over time, including ones infinitely bigger than us like NumPy or SciPy. This PR is deprecates a feature that will still continue to function for awhile (e.g. roughly 6 months in our case) before using a new function is required, this happens all the time. Additionally, changing existing code to use the new API here would require incredibly small changes that could be made in minutes, which is different from what happened with Python 3.6. Python makes many changes of this scope in sub versions, e.g. 3.6->3.7. I had a Zoom call with the RLlib developers last week (RLlib is the biggest RL library in the world by PyPI installs and as best I can tell has more Gym code than anyone else in the world) and mentioned this to them, and their response to this PR was "ya sure that's fine." Regarding changing the object name, this isn't really standard for marking these sorts of changes, the standard to mark this would be a major version change in Gym. The problem with this is that semantic versioning is super weird as a construct in Gym, so it's never been used. |
@jkterry1 If the decision is made, it is fine for me. I'm not against any changes to the library. I'm arguing whether these changes are necessary (listed in #2422 (comment)), especially for the core function. If we are going to remove
If we are not going to remove
@jkterry1 Indeed, some libraries bigger than us deprecate features over time, but they still support the old functionality and just prompt a warning or nothing. How long will the old functionality support? The answer for Python and/or NumPy may be forever, especially for the core functions. For other changes that do not guarantee backward compatibility, they provide migration tools like For example, Python 3.10.0 (default, Oct 4 2021, 17:55:55) [GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
>>> threading.Event.is_set
<function Event.is_set at 0x7fbf1c632050>
>>> threading.Event.isSet
<function Event.isSet at 0x7fbf1c6320e0>
>>> Another example is the In NEP 19 — Random number generator policy:
All current usages of |
BTW, a special reminder for breaking changes without backward compatibilities: the source codes of many academic papers are open-source with warnings: - "the source code is released as it is without any warranties" |
I briefly spoke to people at OpenAI about this and others like Jesse offline and I still believe this is the best course of action going forward (OpenAI is in agreement with this change) |
The old seed method will continue to work until the Gym 1.0 release to support legacy code though. And again, updating code to use this change is absolutely tiny. I've spoken to the largest users of the Gym API in the world about this and a few other discussed small API changes, and essentially their positions universally were "ya sure, this isn't a big deal" |
Can we at least make the method forever deprecated as numpy does it? That would be nice for all the 24k projects that already exist... https://github.com/openai/gym/network/dependents?package_id=UGFja2FnZS01MjE5NTAzMA%3D%3D |
Hi I recently started working with gym, and have been having some fun getting fooled by what's in master with what's released while trying to integrate. Take this all with a pinch of salt: It seems that changing If I could offer a suggestion, it would be to split a pure, implementation-free Env interface out from the common implementation stuff shared amongst envs. |
See Issue #1663
This is a bit of a ride. The base change is getting rid of the custom seeding utils of gym, and instead using
np.random.default_rng()
as is recommended with modern versions of NumPy. I kept thegym.utils.seeding.np_random
interface and changed it to basically being a synonym fordefault_rng
(with some API difference, consistent with the oldnp_random
)Because the API (then
RandomState
, nowGenerator
) changed a bit,np_random.randint
calls were replaced withnp_random.integers
,np_random.rand
->np_random.random
,np_random.randn
->np_random.standard_normal
. This is all in accordance with the recommended NumPy conversion.My doubts in order of (subjective) importance
Doubt number 1:
In
gym/utils/seeding.py#L18
I'm accessing a "protected" variableseed_seq
. This serves accessing the random seed that was automatically generated under the hood when the passed seed is None. (it also gives the correct value if the passed seed is an integer)An alternative solution would be restoring the whole
create_seed
machinery which generates a random initial seed fromos.urandom
. I was unable to find another way to get the initial seed of a default Generator (default_rng(None)
) instance.Doubt number 2:
In
gym/spaces/multi_discrete.py#L64
. Turns out that aGenerator
doesn't directly supportget_state
andset_state
. The same functionality seems to be achievable by accesing the bit generator and modifying its state directly (without getters/setters).Doubt number 3:
As mentioned earlier, this version maintains the
gym.utils.seeding
file with just a single function. Functionally, I don't think that's a problem, but might feel a bit redundant from a stylistic point of view. This could be replaced by changing something like 17 calls of this function that occur in the codebase, but at the moment I think it'd be a bad move. The reason is that the function passes through the seed that's generated if the passed seed is None (see Doubt 1), which has to be obtained through mildly sketchy means, so it's better to keep it contained within the function. I don't think passing the seed extremely necessary, but that would somewhat significantly change the actual external API, I tend to be hesitant about changes like this if there's no good reason.Overall I think it's good to keep the
seeding.np_random
function to keep some consistency with previous versions. The alternative is just completely removing the concept of "gym seeding", and using NumPy. (right now "gym seeding" is basically an interface for NumPy seeding)Doubt number 4:
Pinging @araffin as there's a possibility this will (again) break some old pickled spaces in certain cases, and I know this was an issue with SB3 and the model zoo. Specifically, if you create a
Space
with the current master branch, runspace.sample()
at least once, and then pickle it, it will be pickled with aRandomState
instance, which is now considered a legacy generator in NumPy. If you unpickle it using new gym code (i.e. this PR),space.np_random
will still point to aRandomState
, but the rest of the code expectsspace.np_random
to be aGenerator
instance, which has a few API changes (see the beginning of this post).Overall I don't know how important it is for the internal details of gym objects to remain the same - which is more or less necessary for old objects to be unpicklable in new gym versions. There's probably a way of a custom unpickling protocol as a compatibility layer - I'm not sufficiently familiar with this to do it, but I imagine it should be doable on the user side? (i.e. not in gym)
Doubt number 2137: (very low importance)
This doesn't technically solve #2210. IMHO this is absolutely acceptable, because
np.random.seed
is part of the legacy seeding mechanism, and is officially discouraged by NumPy. Proper usage yields expected results:tl;dr
This definitely needs another set of eyes on it because I'm not confident enough about the nitty-gritty details of NumPy RNG. There are a few things I'm not 100% happy with from a stylistic point of view, but as far as my understanding goes, the functionality is what it should be. There's also the question of supporting old pickled objects, which I think is a whole different topic that needs discussing now that gym is maintained again.