Support for more complex action distributions #253
This is a very reasonable inquiry, and it would be a great feature for a future release. BTW, contributions are welcome, and I'd be happy to review code and provide suggestions if you decide to take it on! @theOGognf, what specific environments do you have in mind? Concrete examples might help! For now, I'd recommend forking the code and implementing the action distribution in a manner similar to how Tuple and other action distributions are implemented.
Thanks for the quick response, Alex. I'd be happy to take a stab at it. I can't share my environments, but there are a couple of examples from RLlib that get the point across. For action masking, an action mask is part of the observation and is used to mask the logits going into a model. Autoregressive distributions are usually environment-specific, but the whole point is building a model whose action heads can condition on one another. Here's RLlib's corresponding thread on supporting autoregressive distributions for reference. I think it'd be easy to support if TensorDicts were passed between components rather than flattened (vector) observations, but I imagine that'd be a bit of a breaking change. Would a change like that be okay?
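To make the two patterns above concrete, here's a minimal NumPy sketch (not SF or RLlib code; `mask_logits`, `sample_autoregressive`, and the weight matrices are hypothetical names for illustration). The first part masks invalid actions by pushing their logits to a large negative value before the softmax; the second part conditions a second action head on the first sampled action via a one-hot encoding.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mask_logits(logits, action_mask):
    # Invalid actions (mask == 0) get a large negative logit, so the
    # softmax assigns them near-zero probability.
    return np.where(action_mask.astype(bool), logits, -1e9)

logits = np.array([2.0, 1.0, 0.5, -0.3])
mask = np.array([1, 0, 1, 0])  # only actions 0 and 2 are valid
probs = softmax(mask_logits(logits, mask))

# Autoregressive sketch: two heads, the second conditioned on the first.
rng = np.random.default_rng(0)
EMB, N1, N2 = 4, 3, 5
W1 = rng.normal(size=(EMB, N1))        # hypothetical "learned" weights
W2 = rng.normal(size=(EMB + N1, N2))

def sample_autoregressive(emb):
    # First head depends on the observation embedding only.
    a1 = int(rng.choice(N1, p=softmax(emb @ W1)))
    # Second head is conditioned on the first action via its one-hot.
    cond = np.concatenate([emb, np.eye(N1)[a1]])
    a2 = int(rng.choice(N2, p=softmax(cond @ W2)))
    return a1, a2

a1, a2 = sample_autoregressive(rng.normal(size=EMB))
```

In a real policy the mask would arrive as one key of a dictionary observation, and the conditioning in the second head would be a learned network rather than fixed random weights.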
SF actually supports dictionaries of observations out of the box, so passing action masks along with observations should not be a problem. Just define an env with a dictionary observation space, and SF should correctly handle any number of key-value observations. We're also already using TensorDict to pass these observations around, so this should not be an issue.

There is one design limitation motivated by performance considerations. There are currently two abstractions related to action distributions: ActionParameterizations (the part of the policy that outputs the parameters of the action distribution) and the action distributions themselves. To implement this properly, I think we need facilities to define both custom parameterizations and custom ActionDistribution classes, with a well-defined interface. E.g., action distributions should support sampling, entropy calculation, KL-divergence calculation (or at least some proxy of it), and calculating the logprob of a sampled action. In the case of masked actions, the action distribution object will be stateful (i.e., holding a valid action mask).

Overall, this seems doable! I'm excited to see this feature and I'd be happy to help.
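The interface described above could be sketched roughly as follows. This is a hypothetical `MaskedCategorical` in plain NumPy, not SF's actual API; the method names (`sample`, `log_prob`, `entropy`, `kl`) just mirror the operations listed in the comment, and the object is stateful in that it holds the valid-action mask's effect in its probabilities.

```python
import numpy as np

class MaskedCategorical:
    """Hypothetical stateful ActionDistribution for masked discrete actions."""

    def __init__(self, logits, mask, rng=None):
        self.rng = rng or np.random.default_rng(0)
        # Invalid actions get a large negative logit before normalization.
        masked = np.where(np.asarray(mask).astype(bool), logits, -1e9)
        z = masked - masked.max()
        e = np.exp(z)
        self.probs = e / e.sum()

    def sample(self):
        # Invalid actions have ~zero probability, so they are never drawn.
        return int(self.rng.choice(len(self.probs), p=self.probs))

    def log_prob(self, action):
        return float(np.log(self.probs[action] + 1e-12))

    def entropy(self):
        p = self.probs[self.probs > 1e-12]
        return float(-(p * np.log(p)).sum())

    def kl(self, other):
        # KL(self || other) over actions with non-negligible probability.
        p, q = self.probs, other.probs
        valid = p > 1e-12
        return float((p[valid] * (np.log(p[valid]) - np.log(q[valid] + 1e-12))).sum())

dist = MaskedCategorical(np.array([1.0, 5.0, 1.0]), np.array([1, 0, 1]))
action = dist.sample()
```

A custom ActionParameterization would then be responsible for producing the `logits` tensor, while the environment supplies `mask` as part of a dictionary observation.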
The current action distribution model has some restrictions that inhibit richer families of action distributions for complex environments.
As far as I can tell, only single-space or tuple-space distributions are supported. Many custom environments make use of action masking and autoregressive distributions to handle complex action spaces. It'd be nice if there were an interface for registering custom action distributions, much like registering other components.
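The requested registration interface might look something like the following sketch, modeled on how RL frameworks commonly register custom models and encoders. All names here (`register_action_distribution`, `make_action_distribution`, the registry dict) are hypothetical, not existing SF API.

```python
# Hypothetical global registry mapping names to distribution classes.
ACTION_DISTRIBUTION_REGISTRY = {}

def register_action_distribution(name):
    """Decorator that registers a custom action distribution class by name."""
    def decorator(cls):
        ACTION_DISTRIBUTION_REGISTRY[name] = cls
        return cls
    return decorator

def make_action_distribution(name, *args, **kwargs):
    """Instantiate a registered distribution by name (factory function)."""
    return ACTION_DISTRIBUTION_REGISTRY[name](*args, **kwargs)

@register_action_distribution("masked_categorical")
class MaskedCategorical:
    def __init__(self, logits, mask):
        self.logits = logits
        self.mask = mask

dist = make_action_distribution("masked_categorical", [1.0, 2.0], [1, 1])
```

The framework would then look up the user's registered class by a config key instead of hard-coding the supported distribution types.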