# MountainCarContinuous-v0
Category: Classic Control
## Description

An underpowered car must climb a one-dimensional hill to reach a target.
The target is on top of a hill on the right-hand side of the car. If the car reaches it or goes beyond, the episode terminates.
On the left-hand side, there is another hill. Climbing it can be used to gain potential energy and accelerate towards the target. On top of this second hill, the car cannot move past position -1, as if there were a wall. Hitting this limit does not generate a penalty (it might in a more challenging version).
This environment corresponds to the continuous version of the mountain car environment described in Andrew Moore's PhD thesis (apart from the reward function).
Such a continuous version has been used in several research papers, e.g.:
http://image.diku.dk/igel/paper/VMRLMAttNMCP.pdf
Recently, it has been used to compare DDPG to CMA-ES in this paper:
http://arxiv.org/abs/1606.09152
## Observation

Type: Box(2)

| Num | Observation  | Min   | Max  |
|-----|--------------|-------|------|
| 0   | Car Position | -1.2  | 0.6  |
| 1   | Car Velocity | -0.07 | 0.07 |
Note that velocity has been constrained to facilitate exploration, but this constraint might be relaxed in a more challenging version.
## Action

Type: Box(1)

| Num | Action |
|-----|--------|
| 0   | Push the car to the left (negative value) or to the right (positive value) |
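To make the state and action spaces concrete, here is a minimal sketch of one environment step. The bounds come from the tables above; the dynamics constants (a force scaling of 0.0015 and a gravity term 0.0025 * cos(3 * position)) and the clipping of actions to [-1, 1] are assumptions based on the Gym implementation, so check the environment source before relying on them. The sketch clamps position at the observation lower bound of -1.2, while the description above places the wall at -1.

```python
import math

# Bounds from the observation table above.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
POWER = 0.0015  # assumed force scaling, taken from the Gym implementation

def step(position, velocity, action):
    """Advance the car by one time step and return (position, velocity)."""
    force = min(max(action, -1.0), 1.0)  # assumed action bounds [-1, 1]
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position += velocity
    position = min(max(position, MIN_POSITION), MAX_POSITION)
    # The "wall" on the left: the car stops there, with no penalty.
    if position == MIN_POSITION and velocity < 0.0:
        velocity = 0.0
    return position, velocity
```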
## Reward

The reward is 100 for reaching the target on the hill on the right-hand side, minus the sum of squared actions from start to goal.
This reward function raises an exploration challenge: if the agent does not reach the target soon enough, it will conclude that it is better not to move and will never find the target.
Note that this reward is unusual with respect to most published work, where the goal was to reach the target as fast as possible, hence favouring a bang-bang strategy.
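The episode return described above can be sketched as a pure function. Note this is an illustration of the reward's structure only; the actual implementation may apply a per-step scaling to the action penalty.

```python
def episode_return(actions, reached_goal):
    """Return for one episode as described above: +100 if the car reaches
    the goal, minus the sum of squared actions taken along the way.
    (The actual implementation may scale the per-step penalty.)"""
    penalty = sum(a * a for a in actions)
    return (100.0 if reached_goal else 0.0) - penalty
```

An agent that never moves earns exactly 0, which is the exploration trap: any movement costs return up front, and the +100 only arrives if the goal is actually reached.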
## Starting State

Random position between -0.6 and -0.4, zero velocity.
## Episode Termination

The episode terminates when the car's position reaches 0.5. A constraint on velocity might be added in a more challenging version.
Adding a maximum number of steps might be a good idea.
## Solved Requirements

Get a reward over 90. This value might be tuned.
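As a sketch of how the threshold might be used, the helper below compares average episode return against it; averaging over many episodes (e.g. 100 consecutive trials, as on some Gym leaderboards) is an assumption, not something this page specifies. The energy-pumping policy is a classic hand-coded baseline; it ignores the action penalty, so it is not claimed to clear the threshold under this reward.

```python
def heuristic_policy(velocity):
    """Classic energy-pumping heuristic: push in the direction of current
    motion to build momentum on the hills. A hand-coded baseline only."""
    return 1.0 if velocity >= 0.0 else -1.0

def solved(returns, threshold=90.0):
    """Check the success criterion above: average return over the given
    episodes exceeds the (tunable) threshold."""
    return sum(returns) / len(returns) > threshold
```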