Update website to output generated at 9da3f5c
cr-xu committed Feb 23, 2024
1 parent 5c03366 commit fe3a466
Showing 1 changed file with 26 additions and 131 deletions.
157 changes: 26 additions & 131 deletions slides.html
@@ -7623,7 +7623,7 @@ <h3> The environment's state</h3>
</ul>
</li>
<li>The <strong>incoming beam</strong>: the beam that enters the EA upstream<ul>
<li>$I = [\mu_x^{(\mathrm{i})},\sigma_x^{(\mathrm{i})},\mu_y^{(\mathrm{i})},\sigma_y^{(\mathrm{i})},\mu_xp^{(\mathrm{i})},\sigma_xp^{(\mathrm{i})},\mu_yp^{(\mathrm{i})},\sigma_yp^{(\mathrm{i})},,\mu_s^{(\mathrm{i})},\sigma_s^{(\mathrm{i})}]$, where $i$ stands for "incoming"</li>
<li>$I = [\mu_x^{(\mathrm{i})},\sigma_x^{(\mathrm{i})},\mu_y^{(\mathrm{i})},\sigma_y^{(\mathrm{i})},\mu_{xp}^{(\mathrm{i})},\sigma_{xp}^{(\mathrm{i})},\mu_{yp}^{(\mathrm{i})},\sigma_{yp}^{(\mathrm{i})},\mu_s^{(\mathrm{i})},\sigma_s^{(\mathrm{i})}]$, where $i$ stands for "incoming"</li>
</ul>
</li>
<li>The <strong>magnet strengths</strong> and <strong>deflection angles</strong><ul>
@@ -7636,7 +7636,7 @@ <h3> The environment's state</h3>
</li>
</ul>
<h3 style="color:#038aa1;">Discussion</h3>
<p style="color:#038aa1;"> $\implies$ Do we know or can we observe the state of the environment?</p>
<p style="color:#038aa1;"> $\implies$ Do we (fully) know or can we observe the state of the environment?</p>
</div>
</div>
</div>
@@ -7728,12 +7728,16 @@ <h2>Part II: Algorithm implementation in Python</h2>
<h2 style="color: #b51f2a">About libraries for RL</h2>
<p>There are many libraries that provide ready-made implementations of RL algorithms, as well as frameworks for implementing environments to interact with. In this notebook we use:</p>
<ul>
<li><a href="https://stable-baselines3.readthedocs.io/">Stable-Baselines3</a> for the agents</li>
<li><a href="https://www.gymlibrary.dev/">OpenAI Gym</a> for the environment
<li><a href="https://stable-baselines3.readthedocs.io/">Stable-Baselines3</a> for the RL algorithms</li>
<li><a href="https://gymnasium.farama.org/">Gymnasium</a> for the environment
<img alt="No description has been provided for this image" src="img/rl_libraries.png" style="width:60%; margin:auto;"/><p style="clear:both; font-size: small; text-align: center; margin-top:1em;">More info <a href="https://neptune.ai/blog/the-best-tools-for-reinforcement-learning-in-python">here</a></p>
</li>
</ul>
<p><strong>Note:</strong> OpenAI Gym is slowly being succeeded by <a href="https://gymnasium.farama.org/#"><em>Gymnasium</em></a>, a fork of Gym maintained by the Farama Foundation.</p>
<p>Note:</p>
<ul>
<li>Gymnasium is the successor of <a href="https://www.gymlibrary.dev/">OpenAI Gym</a>.</li>
<li>Stable-Baselines3 now has an early-stage JAX implementation, <a href="https://github.com/araffin/sbx">sbx</a>.</li>
</ul>
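<p>As a minimal illustration of how the two libraries plug together (using a standard benchmark environment here rather than the accelerator environment of this tutorial):</p>
<pre><code class="language-python">import gymnasium as gym
from stable_baselines3 import PPO

# Gymnasium provides the environment interface, Stable-Baselines3 the RL algorithm.
env = gym.make("Pendulum-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=1_000)  # kept tiny here; real tasks need far more steps
</code></pre>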
</div>
</div>
</div>
@@ -7760,7 +7764,7 @@ <h2 style="color: #b51f2a">Agent / algorithm</h2>
<div class="jp-InputArea jp-Cell-inputArea"><div class="jp-InputPrompt jp-InputArea-prompt">
</div><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
<h2 style="color: #b51f2a">Environment</h2>
<p>We take all the elements of the RL problem we defined previously and reprensent the tuning task as an <a href="https://www.gymlibrary.dev/">OpenAI Gym</a> environment, which is a standard library for RL tasks.</p>
<p>We take all the elements of the RL problem we defined previously and represent the tuning task as a <code>gym</code> environment, the standard interface for RL tasks.</p>
<p>A custom <code>gym.Env</code> would contain the following parts:</p>
<ul>
<li><strong>Initialization</strong>: sets up the environment and declares the allowed <code>observation_space</code> and <code>action_space</code></li>
@@ -7769,7 +7773,7 @@ <h2 style="color: #b51f2a">Environment</h2>
<li><code>done</code> checks if the current episode should be terminated (goal reached, or some threshold exceeded)</li>
</ul>
</li>
<li><code>render</code> <strong>method</strong>: to visualize the environment (a video,or just some plots)</li>
<li><code>render</code> <strong>method</strong>: to visualize the environment (a video, or just some plots)</li>
</ul>
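<p>A minimal skeleton of such a custom environment is sketched below (illustrative only: the class name and its toy dynamics are made up, and the newer Gymnasium API splits <code>done</code> into <code>terminated</code> and <code>truncated</code>):</p>
<pre><code class="language-python">import gymnasium as gym
import numpy as np
from gymnasium import spaces


class ToyTuningEnv(gym.Env):
    """Illustrative skeleton following the structure described above."""

    def __init__(self):
        super().__init__()
        # Declare the allowed observation and action spaces
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._state = self.observation_space.sample()
        return self._state, {}

    def step(self, action):
        # Apply the action and compute the new observation and the reward ...
        self._state = np.clip(self._state + 0.1 * action, -1.0, 1.0).astype(np.float32)
        reward = -float(np.linalg.norm(self._state))
        # ... then check whether the episode should be terminated
        terminated = bool(np.linalg.norm(self._state) &lt; 0.05)  # goal reached
        truncated = False                                        # e.g. step limit exceeded
        return self._state, reward, terminated, truncated, {}

    def render(self):
        # Visualise the environment, e.g. return an RGB array or make a plot
        print(self._state)
</code></pre>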
</div>
</div>
@@ -7835,7 +7839,7 @@ <h2 style="color: #b51f2a">Code directory structure</h2>
</div><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
<h2 style="color: #b51f2a">What is Cheetah?</h2>
<ul>
<li>RL algorithms require a large number of samples to learn ($10^5-10^9$), and getting those samples in the real accelerator is often too costly.<ul>
<li>RL algorithms require a large number of samples to learn ($10^5-10^9$), and getting those samples in the real accelerator is often too costly.<ul>
<li>This is why a common approach is to train the agent in simulation, and then deploy it in the real machine</li>
</ul>
</li>
@@ -7845,7 +7849,7 @@ <h2 style="color: #b51f2a">What is Cheetah?</h2>
</li>
<li><strong>Cheetah</strong> is a tensorized approach for transfer matrix tracking, which saves computation time and overhead compared to OCELOT</li>
</ul>
<p>More information <a href="https://accelconf.web.cern.ch/ipac2022/papers/wepoms036.pdf">here</a> and <a href="https://github.com/desy-ml/cheetah">here</a></p>
<p>You can find more information in the <a href="https://arxiv.org/abs/2401.05815">paper</a> and the <a href="https://github.com/desy-ml/cheetah">code repository</a>.</p>
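<p>To make the idea of transfer-matrix tracking concrete, here is a toy NumPy sketch of the underlying principle (an illustration only, not Cheetah's actual API): each linear element acts on the phase-space coordinates as a matrix, so a whole lattice collapses into a single matrix product and all particles can be tracked in one vectorised operation.</p>
<pre><code class="language-python">import numpy as np

def drift(length):
    # Transfer matrix of a drift space acting on (x, x')
    return np.array([[1.0, length], [0.0, 1.0]])

def thin_quadrupole(focal_length):
    # Thin-lens transfer matrix of a focusing quadrupole
    return np.array([[1.0, 0.0], [-1.0 / focal_length, 1.0]])

lattice = [drift(0.5), thin_quadrupole(0.4), drift(0.5)]
transfer_matrix = np.linalg.multi_dot(lattice[::-1])  # matrices compose right to left

# Track many particles at once with a single vectorised operation
particles = np.random.default_rng(0).normal(scale=[1e-3, 1e-4], size=(10_000, 2))
tracked = particles @ transfer_matrix.T
print(tracked.std(axis=0))  # resulting beam size and divergence
</code></pre>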
</div>
</div>
</div>
@@ -7854,10 +7858,11 @@ <h2 style="color: #b51f2a">What is Cheetah?</h2>
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
</div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In [1]:</div>
<div class="jp-InputPrompt jp-InputArea-prompt">In [6]:</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="cm-editor cm-s-jupyter">
<div class="highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">sleep</span>
<div class="highlight hl-ipython3"><pre><span></span><span class="c1"># Importing the required packages</span>
<span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">sleep</span>

<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="kn">import</span> <span class="nn">names</span>
@@ -7868,7 +7873,6 @@ <h2 style="color: #b51f2a">What is Cheetah?</h2>

<span class="kn">from</span> <span class="nn">utils.helpers</span> <span class="kn">import</span> <span class="p">(</span>
<span class="n">evaluate_ares_ea_agent</span><span class="p">,</span>
<span class="n">make_ares_ea_training_videos</span><span class="p">,</span>
<span class="n">plot_ares_ea_training_history</span><span class="p">,</span>
<span class="n">show_video</span><span class="p">,</span>
<span class="p">)</span>
@@ -7887,9 +7891,9 @@ <h2 style="color: #b51f2a">What is Cheetah?</h2>
</div>
<div class="jp-InputArea jp-Cell-inputArea"><div class="jp-InputPrompt jp-InputArea-prompt">
</div><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
<h2 style="color: #b51f2a">The ARESEA (ARES Experimental Area) Environment</h2>
<h2 style="color: #b51f2a">The ARES-EA (ARES Experimental Area) Environment</h2>
<ul>
<li>We formulated the ARESEA task as a <a href="https://github.com/openai/gym">OpenAI Gym</a> environment, which allows our algorithm to easily interface with both the simulation and real machine backends as shown before.</li>
<li>We formulated the ARES-EA task as a <code>gym</code> environment, which allows our algorithm to easily interface with both the simulation and real machine backends as shown before.</li>
<li>In this part, you will get familiar with the environment for beam focusing and positioning at the ARES accelerator.</li>
</ul>
<p>Some methods:</p>
@@ -7957,7 +7961,7 @@ <h3 style="color:#038aa1;">Set a target beam you want to achieve</h3>
<div class="highlight hl-ipython3"><pre><span></span><span class="n">env</span><span class="o">.</span><span class="n">target_beam_values</span> <span class="o">=</span> <span class="n">target_beam</span>
<span class="n">env</span><span class="o">.</span><span class="n">reset</span><span class="p">()</span> <span class="c1">##</span>
<span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span> <span class="o">=</span> <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">env</span><span class="o">.</span><span class="n">render</span><span class="p">(</span><span class="n">mode</span><span class="o">=</span><span class="s2">"rgb_array"</span><span class="p">))</span> <span class="c1"># Plot the screen image</span>
<span class="n">plt</span><span class="o">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">env</span><span class="o">.</span><span class="n">render</span><span class="p">())</span> <span class="c1"># Plot the screen image</span>
</pre></div>
</div>
</div>
@@ -7990,7 +7994,7 @@ <h3 style="color:#038aa1;">Set a target beam you want to achieve</h3>
</div><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
<h3 style="color:#038aa1;">Get familiar with the Gym environment</h3>
<p style="color:#038aa1;"> $\implies$ Change the magnet values, i.e. the actions</p>
<p style="color:#038aa1;"> $\implies$ The actions are normalized to 1, so valid values are in the [0, 1] interval</p>
<p style="color:#038aa1;"> $\implies$ The actions are normalized to 1, so valid values are in the [-1, 1] interval</p>
<p style="color:#038aa1;"> $\implies$ The values of the <code>action</code> list in the cell below follows this magnet order: [Q1, Q2, CV, Q3, CH]</p>
</div>
</div>
@@ -8854,16 +8858,17 @@ <h3 style="color:#038aa1;">Questions</h3>
</div>
<div class="jp-InputArea jp-Cell-inputArea"><div class="jp-InputPrompt jp-InputArea-prompt">
</div><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
<p>You will train the agent by executing the cell below:</p>
<p>You will train the agent by executing the cell below:
<em>Note</em>: This could take about 10 min on a laptop.</p>
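<p>For orientation, the kind of Stable-Baselines3 setup running under the hood looks roughly like the sketch below (illustrative only: the environment, callback settings and hyperparameters are placeholders, not the tutorial's actual configuration):</p>
<pre><code class="language-python">import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

# Placeholder environment standing in for the ARES-EA environment.
env = gym.make("Pendulum-v1")

# Periodically evaluate the agent during training (placeholder settings).
eval_callback = EvalCallback(env, eval_freq=20_000, n_eval_episodes=5)

model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=200_000, callback=eval_callback)
model.save("ppo_tuning_agent")
</code></pre>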
</div>
</div>
</div>
</div><div class="jp-Cell jp-CodeCell jp-Notebook-cell">
</div><div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper" tabindex="0">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
</div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In [3]:</div>
<div class="jp-InputPrompt jp-InputArea-prompt">In [ ]:</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="cm-editor cm-s-jupyter">
<div class="highlight hl-ipython3"><pre><span></span><span class="c1"># Toggle comment to re-run the training (can take very long)</span>
@@ -8873,116 +8878,6 @@ <h3 style="color:#038aa1;">Questions</h3>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser">
</div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
<pre>==&gt; Training agent "Ernest Fairfield"
Moviepy - Building video /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-0.mp4.
Moviepy - Writing video /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-0.mp4

</pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="application/vnd.jupyter.stderr" tabindex="0">
<pre> </pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
<pre>Moviepy - Done !
Moviepy - video ready /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-0.mp4
Moviepy - Building video /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-1.mp4.
Moviepy - Writing video /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-1.mp4

</pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="application/vnd.jupyter.stderr" tabindex="0">
<pre> </pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
<pre>Moviepy - Done !
Moviepy - video ready /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-1.mp4
Eval num_timesteps=20000, episode_reward=-16.84 +/- 2.71
Episode length: 25.00 +/- 0.00
New best mean reward!
Moviepy - Building video /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-8.mp4.
Moviepy - Writing video /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-8.mp4

</pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="application/vnd.jupyter.stderr" tabindex="0">
<pre> </pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
<pre>Moviepy - Done !
Moviepy - video ready /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-8.mp4
Eval num_timesteps=40000, episode_reward=-7.24 +/- 1.45
Episode length: 25.00 +/- 0.00
New best mean reward!
Eval num_timesteps=60000, episode_reward=-5.87 +/- 1.23
Episode length: 25.00 +/- 0.00
New best mean reward!
Eval num_timesteps=80000, episode_reward=-5.89 +/- 0.97
Episode length: 25.00 +/- 0.00
Eval num_timesteps=100000, episode_reward=-7.63 +/- 1.70
Episode length: 25.00 +/- 0.00
Moviepy - Building video /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-27.mp4.
Moviepy - Writing video /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-27.mp4

</pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="application/vnd.jupyter.stderr" tabindex="0">
<pre> </pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"></div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain" tabindex="0">
<pre>Moviepy - Done !
Moviepy - video ready /Users/chenran/Workspace/Phd/Projects/rl-tutorial-ares-basic/utils/recordings/Ernest Fairfield/rl-video-episode-27.mp4
Eval num_timesteps=120000, episode_reward=-5.69 +/- 0.72
Episode length: 25.00 +/- 0.00
New best mean reward!
Eval num_timesteps=140000, episode_reward=-6.57 +/- 2.97
Episode length: 25.00 +/- 0.00
Eval num_timesteps=160000, episode_reward=-5.39 +/- 0.34
Episode length: 25.00 +/- 0.00
New best mean reward!
Eval num_timesteps=180000, episode_reward=-5.20 +/- 0.72
Episode length: 25.00 +/- 0.00
New best mean reward!
Eval num_timesteps=200000, episode_reward=-4.98 +/- 0.41
Episode length: 25.00 +/- 0.00
New best mean reward!
CPU times: user 7min 18s, sys: 14min 38s, total: 21min 56s
Wall time: 3min 47s
</pre>
</div>
</div>
</div>
</div>
</div></section></section><section><section>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper" tabindex="0">
@@ -9258,8 +9153,8 @@ <h3 id="Literature">Literature<a class="anchor-link" href="#Literature">¶</a></h3>
<li><a href="http://incompleteideas.net/book/the-book.html">Reinforcement Learning: An Introduction</a> - Standard text book on RL.</li>
</ul>
<h3 id="Packages">Packages<a class="anchor-link" href="#Packages">¶</a></h3><ul>
<li><a href="https://www.gymlibrary.ml">Gym</a> - Defacto standard for implementing custom environments. Also provides a library of RL tasks widely used for benchmarking.</li>
<li><a href="https://github.com/DLR-RM/stable-baselines3">Stable Baslines3</a> - Provides reliable, benchmarked and easy-to-use implementations of the most important RL algorithms.</li>
<li><a href="https://gymnasium.farama.org/">Gymnasium</a>, (successor of <a href="https://www.gymlibrary.ml">OpenAI Gym</a>) - De facto standard for implementing custom environments. Also provides a library of RL tasks widely used for benchmarking.</li>
<li><a href="https://github.com/DLR-RM/stable-baselines3">Stable Baselines3</a> - Provides reliable, benchmarked and easy-to-use implementations of the most important RL algorithms.</li>
<li><a href="https://docs.ray.io/en/latest/rllib/index.html">Ray RLlib</a> - Part of the <em>Ray</em> Python package providing implementations of various RL algorithms with a focus on distributed training.</li>
</ul>
</div>