diff --git a/slides.html b/slides.html
index 96a7f64..767a6ce 100644
--- a/slides.html
+++ b/slides.html
@@ -7769,8 +7769,9 @@
observation_space and action_space
reset method: resets the environment for a new episode and returns the 2-tuple (observation, info)
step method: main logic of the environment. It takes an action, changes the environment to a new state, gets the new observation, computes the reward, and finally returns the 5-tuple (observation, reward, terminated, truncated, info)
terminated checks if the current episode should be terminated according to the underlying MDP (goal reached, or some thresholds exceeded)
truncated checks if the current episode should be truncated for reasons outside of the underlying MDP (e.g. a time limit)
render method: to visualize the environment (a video, or just some plots)
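To make this interface concrete, here is a minimal sketch of a custom environment following the 5-tuple API above, assuming the Gymnasium package (imported as gym); the class ToyEnv, its spaces, and its toy dynamics are illustrative assumptions and not part of the ARESEA code.
import gymnasium as gym
import numpy as np

class ToyEnv(gym.Env):
    # Hypothetical minimal environment illustrating the interface described above.
    def __init__(self):
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self._state = np.zeros(4, dtype=np.float32)
        self._n_steps = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._state = self.observation_space.sample()  # draw a new initial state
        self._n_steps = 0
        return self._state, {}  # 2-tuple (observation, info)

    def step(self, action):
        self._state[:2] += 0.1 * np.asarray(action, dtype=np.float32)  # change to a new state
        observation = self._state                            # get the new observation
        reward = -float(np.abs(self._state).sum())           # compute the reward
        terminated = bool(np.abs(self._state).sum() < 0.05)  # end condition of the underlying MDP
        self._n_steps += 1
        truncated = self._n_steps >= 50                      # e.g. a time limit
        return observation, reward, terminated, truncated, {}  # 5-tuple

    def render(self):
        return np.zeros((64, 64, 3), dtype=np.uint8)  # placeholder screen image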
ARESEA implements the ARES Experimental Area transverse tuning task as a gym.Env. It contains the basic logic, such as the definition of the observation space, action space, and reward. How an action is taken is implemented in child classes with specific backends.
ARESEACheetah is derived from the base class ARESEA and uses a cheetah simulation as its backend.
make_env initializes an ARESEA environment and wraps it with the required gym.wrappers for convenient features (e.g. monitoring the progress, ending the episode when the time limit is reached, rescaling the action, normalizing the observation, ...).
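A rough sketch of what such a make_env helper might look like, assuming the Gymnasium wrappers named below; the exact wrapper set, order, and episode limit are assumptions for illustration.
from gymnasium.wrappers import NormalizeObservation, RecordEpisodeStatistics, RescaleAction, TimeLimit

def make_env():
    env = ARESEACheetah()                        # ARESEA with the cheetah simulation backend
    env = TimeLimit(env, max_episode_steps=50)   # end the episode when the time limit is reached (assumed limit)
    env = RecordEpisodeStatistics(env)           # monitor episode returns and lengths
    env = RescaleAction(env, -1, 1)              # rescale the action to [-1, 1]
    env = NormalizeObservation(env)              # normalize the observation
    return env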
train is a convenience function for training the RL agent. It calls make_env, sets up the RL algorithm, starts training, and saves the results in utils/recordings, utils/monitors and utils/models.
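A hedged sketch of what train could boil down to, assuming a Stable-Baselines3 algorithm such as PPO; the algorithm choice, timestep budget, and file name are illustrative assumptions.
from stable_baselines3 import PPO

def train(total_timesteps=100_000):
    env = make_env()                              # build and wrap the ARESEA environment
    model = PPO("MlpPolicy", env, verbose=1)      # set up the RL algorithm
    model.learn(total_timesteps=total_timesteps)  # start training
    model.save("utils/models/ppo_ares_ea")        # save the trained agent (assumed file name)
    return model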
.env.target_beam_values = target_beam
env.reset()  # reset the environment for a new episode
-plt.figure(figsize = (7, 4))
+plt.figure(figsize=(7, 4))
plt.imshow(env.render()) # Plot the screen image
env = RescaleAction(env, -1, 1) # rescales the action to the interval [-1, 1]
env.reset()
env.step(action)
-plt.figure(figsize = (7, 4))
+plt.figure(figsize=(7, 4))
plt.imshow(env.render())
env.reset()
steps = 10
+
def change_vertical_corrector(q1, q2, cv, q3, ch, steps, i):
action = np.array([q1, q2, cv + 1 / steps * i, q3, ch])
return action
-fig, ax = plt.subplots(1, figsize = (7, 4))
+fig, ax = plt.subplots(1, figsize=(7, 4))
for i in range(steps):
action = change_vertical_corrector(0.2, -0.2, -0.5, 0.3, 0, steps, i)
env.step(action)
-
+
img = env.render()
ax.imshow(img)
display(fig)
@@ -8250,8 +8252,8 @@
Relevant config parameters
Reward = objective_improvement
Difference of the objective:
$$ r_\mathrm{obj-improvement} = ( \mathrm{obj}_{j-1} - \mathrm{obj}_{j} ) / \mathrm{obj}_0 $$
$$ \mathrm{obj} = \sum_{i} |b_i^\mathrm{(c)} - b_i^\mathrm{(t)}| $$
where $j$ is the index of the current time step.
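For concreteness, a small sketch of these two quantities in code; the function names are illustrative and the beam vectors are assumed to be numpy arrays ordered as $[\mu_x, \sigma_x, \mu_y, \sigma_y]$.
import numpy as np

def objective(current_beam, target_beam):
    # obj = sum_i |b_i^(c) - b_i^(t)|
    return float(np.abs(np.asarray(current_beam) - np.asarray(target_beam)).sum())

def objective_improvement_reward(obj_prev, obj_curr, obj_initial):
    # r_obj-improvement = (obj_{j-1} - obj_j) / obj_0
    return (obj_prev - obj_curr) / obj_initial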
config parameters
objective_improvement
$$ r_\mathrm{obj-improvement} = ( \mathrm{obj}_{j-1} - \mathrm{obj}_{j} ) / \mathrm{obj}_0 $$
$$ \mathrm{obj} = \sum_{i} |b_i^\mathrm{(c)} - b_i^\mathrm{(t)}| $$
where $j$ is the index of the current time step.
@@ -8453,7 +8455,7 @@
config parameters
objective_improvement
$$ r_\mathrm{obj-improvement} = ( \mathrm{obj}_{j-1} - \mathrm{obj}_{j} ) / \mathrm{obj}_0 $$
$$ \mathrm{obj} = \sum_{i} |b_i^\mathrm{(c)} - b_i^\mathrm{(t)}| $$
where $j$ is the index of the current time step.
@@ -8537,7 +8539,7 @@
config parameters
negative_objective
$$ r_\mathrm{neg-obj} = -\mathrm{obj} / \mathrm{obj}_0 $$
where $b = [\mu_x,\sigma_x,\mu_y,\sigma_y]$, $b^\mathrm{(c)}$ is the current beam, and $b^\mathrm{(t)}$ is the target beam. $\mathrm{obj}_0$ is the initial objective after reset.
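Under the same assumptions, the negative-objective reward is a one-line sketch (illustrative name):
def negative_objective_reward(obj_curr, obj_initial):
    # r_neg-obj = -obj / obj_0
    return -obj_curr / obj_initial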
-plt.figure(figsize = (7,4))
+plt.figure(figsize=(7, 4))
evaluate_ares_ea_agent(agent_under_investigation, include_position=False, n=2000)