From 1c89e7d39a8c1c2523ec40f8efce67a0e60c4e08 Mon Sep 17 00:00:00 2001
From: sash-a
a function to aggregate all agents' rewards into a single scalar value, e.g. sum.
a function to aggregate all agents' discounts into a single scalar value, e.g. max.
Last update:
- 2024-03-05
+ 2024-03-08
diff --git a/api/environments/bin_pack/index.html b/api/environments/bin_pack/index.html
index 380a24b73..0a481d8cf 100644
--- a/api/environments/bin_pack/index.html
+++ b/api/environments/bin_pack/index.html
@@ -2098,7 +2098,7 @@
Last update:
- 2024-03-05
+ 2024-03-08
diff --git a/api/environments/cvrp/index.html b/api/environments/cvrp/index.html
index 024b60716..dec5b1632 100644
--- a/api/environments/cvrp/index.html
+++ b/api/environments/cvrp/index.html
@@ -1952,7 +1952,7 @@
Last update:
- 2024-03-05
+ 2024-03-08
diff --git a/api/environments/job_shop/index.html b/api/environments/job_shop/index.html
index ece9b3eb1..18c37cb5d 100644
--- a/api/environments/job_shop/index.html
+++ b/api/environments/job_shop/index.html
@@ -1943,7 +1943,7 @@
Last update:
- 2024-03-05
+ 2024-03-08
diff --git a/api/environments/maze/index.html b/api/environments/maze/index.html
index 399d18ad9..b7795b1d1 100644
--- a/api/environments/maze/index.html
+++ b/api/environments/maze/index.html
@@ -1934,7 +1934,7 @@
Last update:
- 2024-03-05
+ 2024-03-08
diff --git a/api/environments/minesweeper/index.html b/api/environments/minesweeper/index.html
index 92bc72751..aca868910 100644
--- a/api/environments/minesweeper/index.html
+++ b/api/environments/minesweeper/index.html
@@ -1947,7 +1947,7 @@
Last update:
- 2024-03-05
+ 2024-03-08
diff --git a/api/environments/mmst/index.html b/api/environments/mmst/index.html
index 7b9fde0b7..af03d2f95 100644
--- a/api/environments/mmst/index.html
+++ b/api/environments/mmst/index.html
@@ -2030,7 +2030,7 @@
Last update:
- 2024-03-05
+ 2024-03-08
diff --git a/api/environments/rware/index.html b/api/environments/rware/index.html
index 30550fc6d..0b6d5f93d 100644
--- a/api/environments/rware/index.html
+++ b/api/environments/rware/index.html
@@ -1872,7 +1872,7 @@
Last update:
- 2024-03-05
+ 2024-03-08
diff --git a/api/wrappers/index.html b/api/wrappers/index.html
index ac86dd249..c208ee068 100644
--- a/api/wrappers/index.html
+++ b/api/wrappers/index.html
@@ -2303,7 +2303,7 @@
-__init__(self, env: Environment, reward_aggregator: Callable = <function sum at 0x7fd62457a790>, discount_aggregator: Callable = <function amax at 0x7fd62457af70>)
+__init__(self, env: Environment, reward_aggregator: Callable = <function sum at 0x7f2d8d729790>, discount_aggregator: Callable = <function amax at 0x7f2d8d729f70>)
special
@@ -2337,14 +2337,14 @@
Callable
- <function sum at 0x7fd62457a790>
+ <function sum at 0x7f2d8d729790>
@@ -2712,6 +2712,7 @@ discount_aggregator
Callable
- <function amax at 0x7fd62457af70>
+ <function amax at 0x7f2d8d729f70>
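The two aggregator parameters above can be illustrated with a minimal sketch. It assumes per-agent rewards and discounts arrive as plain sequences of floats and uses Python's built-in `sum` and `max` as stand-ins for the array functions named in the signature; `aggregate_agents` is a hypothetical helper, not part of the documented API.

```python
from typing import Callable, Sequence, Tuple


def aggregate_agents(
    rewards: Sequence[float],
    discounts: Sequence[float],
    reward_aggregator: Callable[[Sequence[float]], float] = sum,
    discount_aggregator: Callable[[Sequence[float]], float] = max,
) -> Tuple[float, float]:
    """Collapse per-agent rewards and discounts into one scalar each.

    Hypothetical helper: built-in sum/max stand in for the array
    aggregators shown in the wrapper's signature.
    """
    return reward_aggregator(rewards), discount_aggregator(discounts)


# Three agents: rewards are summed, and max keeps the discount at 1.0
# while at least one agent's discount is still 1.0.
reward, discount = aggregate_agents([1.0, 0.5, 2.0], [0.0, 1.0, 0.0])
```

Passing a different callable, for example `min` for the discounts, would change how the per-agent values are combined without touching the rest of the code.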
the state, observation, and step_type are reset. The observation and step_type of the
terminal TimeStep are reset to the reset observation and StepType.LAST, respectively.
The reward, discount, and extras are retrieved from the transition to the terminal state.
+NOTE: The observation from the terminal TimeStep is stored in timestep.extras["next_obs"].
WARNING: do not jax.vmap the wrapped environment (e.g. do not use with the VmapWrapper),
which would lead to inefficient computation due to both the step and reset functions
being processed each time step is called. Please use the VmapAutoResetWrapper instead.
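The auto-reset behaviour described above, including the terminal observation kept in `extras["next_obs"]`, can be sketched in plain Python. Everything here is a hypothetical toy (`CountdownEnv`, `ToyAutoReset`, the simplified `TimeStep`, an integer `key`) written only to show the data flow; it is not the library's implementation, and how non-terminal steps populate `extras` is left out because the text does not specify it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Tuple


class StepType(Enum):
    FIRST = 0
    MID = 1
    LAST = 2


@dataclass
class TimeStep:
    step_type: StepType
    reward: float
    discount: float
    observation: int
    extras: Dict[str, Any] = field(default_factory=dict)


class CountdownEnv:
    """Hypothetical toy environment: the state counts down from 2 to 0."""

    def reset(self, key: int) -> Tuple[int, TimeStep]:
        state = 2
        return state, TimeStep(StepType.FIRST, 0.0, 1.0, observation=state)

    def step(self, state: int, action: int) -> Tuple[int, TimeStep]:
        state -= 1
        done = state == 0
        step_type = StepType.LAST if done else StepType.MID
        discount = 0.0 if done else 1.0
        return state, TimeStep(step_type, 1.0, discount, observation=state)


class ToyAutoReset:
    """Hypothetical wrapper that resets the toy environment on termination."""

    def __init__(self, env: CountdownEnv) -> None:
        self.env = env

    def reset(self, key: int) -> Tuple[int, TimeStep]:
        return self.env.reset(key)

    def step(self, state: int, action: int, key: int = 0) -> Tuple[int, TimeStep]:
        state, timestep = self.env.step(state, action)
        if timestep.step_type is StepType.LAST:
            # Keep the true terminal observation before it is overwritten.
            terminal_obs = timestep.observation
            state, reset_timestep = self.env.reset(key)
            timestep = TimeStep(
                step_type=StepType.LAST,  # step_type stays LAST
                reward=timestep.reward,  # taken from the terminal transition
                discount=timestep.discount,  # taken from the terminal transition
                observation=reset_timestep.observation,  # reset observation
                extras={**timestep.extras, "next_obs": terminal_obs},
            )
        return state, timestep


env = ToyAutoReset(CountdownEnv())
state, ts = env.reset(key=0)
state, ts = env.step(state, action=0)  # mid-episode step
state, ts = env.step(state, action=0)  # terminal step, triggers the reset
```

After the last call, `ts.observation` is the reset observation while `ts.extras["next_obs"]` still holds the terminal one. The Python `if` runs `reset` only at episode end; the warning above concerns the vmapped case, where both `step` and `reset` are computed on every call.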
reset(self, key: chex.PRNGKey) -> Tuple[State, TimeStep[Observation]]
+
+Resets the environment to an initial state.
+
+Parameters:
+Name | Type | Description | Default
+---|---|---|---
+key | chex.PRNGKey | Random key used to reset the environment. | required
+
Returns:
+Name | Description
+---|---
+state | State object corresponding to the new state of the environment.
+timestep | TimeStep object corresponding to the first timestep returned by the environment.