-
Notifications
You must be signed in to change notification settings - Fork 5
/
admin-guide.html
392 lines (325 loc) · 15.3 KB
/
admin-guide.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
<html>
<head>
<title>Vcycle admin guide</title>
</head><body>
<h1 align=center>Vcycle Admin Guide<!-- version --></h1>
<p align=center><b>Andrew McNab <Andrew.McNab AT cern.ch></b>
<p>
Vcycle is a VM factory system which implements the Vacuum Platform described in
<a
href="http://hepsoftwarefoundation.org/notes/HSF-TN-2016-04.pdf">HSF-TN-2016-04</a>,
using a cloud service such as OpenStack to carry out its VM instantiation
decisions.
<h2>Contents</h2>
<ul>
<li><a href="#quickstart">Quick start</a>
<li><a href="#stepbystep">Configuration step-by-step</a>
<ul>
<li><a href="#httpd">CernVM images</a>
<li><a href="#cernvm">CernVM images</a>
<li><a href="#installation">Installation of Vcycle: tar vs RPM</a>
<li><a href="#settings">Configuration of the Vcycle spaces</a>
<li><a href="#gocdbggus">GOCDB and GGUS</a>
<li><a href="#machinetypes">Setting up machinetypes</a>
</ul>
<li><a href="#startingstopping">Starting and stopping vcycled</a>
<li><a href="#apel">APEL accounting</a>
<li><a href="#fizzlebackoff">Setting fizzle_seconds and backoff_seconds</a>
</ul>
<h2 style="border-bottom: 1px solid"><a name="quickstart">Quick start</a></h2>
<p>
By following this quick start recipe you can verify that your cloud service
will work with Vcycle and see it creating and destroying virtual machines.
<p>
To follow the quick start, we assume you have access to an OpenStack
service and a Scientific Linux 6 (or CentOS 6 or RHEL 6) machine to use as a
VM factory on which
to run Vcycle. This factory machine can itself be a VM, but don't choose a name
which starts with "vcycle-" . You must be able to perform
Nova VM management operations from the command line on this machine. This may
require changes to the OpenStack and/or local firewall configuration. Make sure
you can list VMs in the tenancy/project you plan to use, whether they are created
with the Horizon web dashboard or from the command line too.
<p>
Then install the vcycle RPM. This installs a copy of this guide and some
example files in /usr/share/doc/vcycle-VERSION and the vcycle.conf man page
in the usual place.
<p>
Copy /usr/share/doc/vcycle-VERSION/example.vcycle.conf into /etc/vcycle.d
the Vcycle configuration directory. You don't need to change the name, and
it must end in ".conf" .
<p>
Make sure that root on the VM factory machine has an RSA ssh keypair. Run
the command ssh-keygen if /root/.ssh/id_rsa does not exist already. This
key pair will be authorized to ssh into the VMs which is very useful for
debugging. You may need to copy the keypair to another machine if the
factory is not allowed ssh access to the OpenStack VMs. If this is
completely barred, then production use of Vcycle is still possible as
this feature is just used for debugging.
<p>
Change the following values in the <b>[space vcycle01.example.com]</b>
section of example.vcycle.conf:
<dl>
<dt><b>url</b>
<dd>This is the KeyStone URL for your OpenStack service. This is given
by the Identity field when looking at the API Access tab of Access
&Security in the Horizon web dashboard.
<dt><b>tenancy_name</b>
<dd>Is the OpenStack tenancy or project name. Spaces are allowed and
quotes are not needed (or allowed).
<dt><b>username</b>
<dd>Is the username for API access.
<dt><b>password_base64</b>
<dd>Is the corresponding password, encoded in Base64. You can do this
with something like: echo -n 'PASSword' | base64
<dt><b>processors_limit</b>
<dd>If there are already VMs in the tenancy/project, then you should
increase this from 1 to one more than the number of processors used by
existing VMs. Otherwise leave it at 1 for now.
</dl>
<p>
The next options are for the the section
<b>[machinetype vcycle01.example.com example]</b>
<dl>
<dt><b>flavor_name</b>
<dd>The flavor of the VMs which will be created.
<dt><b>user_data_option_cvmfs_proxy</b>
<dd>It should be set to the URL of an HTTP cache you have access to. If
you are already using cvmfs for grid worker nodes, you can use the same
value. CernVM will mostly work without this option so you can leave it
commented out for now. However, if you go into production without a cache
set, <b>you will get random filesystem errors!</b>
</dl>
<p>
Now start vcycle with service vcycled start
<p>
You will see messages appear in /var/log/vcycle/vcycled.log and you see a VM
appear in your OpenStack tenancy once a message saying Vcycle has created
a VM is logged. If you are able to ssh to VMs, then you should be able
to ssh into the VM.
<p>
If everything is working, you can see from the log file how a
CernVM image is fetched from
https://repo.gridpp.ac.uk/vacproject/example/cernvm3.iso and
uploaded to OpenStack, and a user_date template is fetched from
https://repo.gridpp.ac.uk/vacproject/example/user_data and used
to create the VM.
<p>
If you look in the subdirectories of /var/lib/vcycle/machines you
will see files used to create the VM and record its state changes.
<h2 style="border-bottom: 1px solid"><a name="stepbystep">Configuration step-by-step</a></h2>
<p>
This part of the guide covers the same ground as the quick start
guide but in a lot more detail. It's intended to help you choose
how best to configure your site.
<p>
The configuration files /etc/vcycle.conf and /etc/vcycle.d/*conf use
the Python ConfigParser syntax, which is similar to MS
Windows INI files. The files are divided into sections, with each section
name in square brackets. For example: [space vcycle01.example.com]. Each
section contains
a series of option=value pairs. Values can continue onto the next line
by starting that line with some white space. Sections with the same name
are merged and if options are duplicated, later values overwrite values
given earlier.
Any configuration file ending in .conf in the
directory /etc/vcycle.d is read. These files are read in
alphanumeric order, and then /etc/vcycle.conf is read if present.
<p>
Based on this ordering in /etc/vcycle.d/, options from space.conf
would override any given
in site.conf, but themselves be overwritten by options from
vm.conf .
<p>
You may find it convenient to use multiple configuration files if
you have a complicated setup. For instance: space01.conf,
space01-machinetype01.conf, space01-machinetype02.conf,
space02.conf, space02-machinetype01.conf each containing one
[space ...] or [machinetype ... ...] section.
<h3><a name="httpd">Apache httpd</a></h3>
<p>
Vcycle uses an Apache httpd running on the VM factory to publish
Machine/Job Features values according to
<a href="http://hepsoftwarefoundation.org/notes/HSF-TN-2016-02.pdf">HSF-TN-2016-02</a>
and to receive $JOBOUTPUTS files according to
<a href="http://hepsoftwarefoundation.org/notes/HSF-TN-2016-04.pdf">HSF-TN-2016-04</a>
which is essential for VMs using the heartbeat and shutdown_message
mechanisms.
<p>
You can meet this requirement by installing the standard httpd RPM
from SL6 (or CentOS 6 or RHEL 6) and copying two files from
/usr/share/doc/vcycle-VERSION:
<ul>
<li>vcycle.httpd.conf to /etc/httpd/conf/httpd.conf
<li>vcycle.httpd.inc to /etc/httpd/includes/vcycle.httpd.inc
</ul>
and then (re)start httpd. You must ensure that firewall rules
allow access to TCP port 443 on the VM factory from the VMs.
<p>
If this port access is not possible, then you should replace
443 in both configuration files with a suitable value, and
set the https_port option in the [space ...] sections. This
will cause the $MACHINEFEATURES, $JOBFEATURES, and $JOBOUTPUTS
URLs to include this port number. See the vcycle.conf man page
for more details.
<h3><a name="cernvm">CernVM images</a></h3>
<p>
Any ISO or HDD image which is compatible with your OpenStack
service shoudl also work with Vcycle. However, Vcycle works
best with images which look for a user_data contextualization
file, and CernVM boot images are particularly convenient due to
their very small size.
<p>
If you need to download an image, they can be found on
the <a href="http://cernvm.cern.ch/portal/downloads">CernVM
downloads page</a>.
<p>
However, most experiments will supply you with their own
URL from which Vcycle can automatically fetch their current
designated image version using the user_data option of the [machinetype ... ...]
section, which Vcycle caches in /var/lib/vcycle/imagecache .
<h3><a name="installation">Installation of Vcycle: tar vs RPM</a></h3>
<p>
RPM is the recommended installation procedure, and RPMs are available
from the <a href="https://repo.gridpp.ac.uk/vacproject/vcycle/">Vcycle
downloads area</a>.
<p>
It is also possible to install Vcycle from a tar file, using the install
Makefile target.
<h3><a name="settings">Configuration of the Vcycle space</a></h3>
<p>
Each [space ...] section must include a space name, which is also used
as the virtual CE name. The section name is of the format [space
vcycle01.example.com] .
<p>
The space must have a value for the option api which defines how to
talk to the service (such as api = openstack), and then other API specific
options. For OpenStack, these are url, tenancy_name, username, and
password_base64.
<p>
See the vcycle.conf man page for more details.
<h3><a name="gocdbggus">GOCDB and GGUS</a></h3>
<p>
Vcycle is designed to work within the WLCG/EGI grid model of sites composed
of one or more CEs. Each Vcycle space name corresponds to one CE within a site,
and can co-exist with conventional CREAM or ARC CEs. If you are at a
site registered in the <a href="http://goc.egi.eu/">GOCDB</a>, you
should add your space name(s) to your site in GOCDB as services. There
is a registered service type (uk.ac.gridpp.vcycle) for Vcycle spaces.
<p>
Problems encountered during the operation of Vcycle in production may
appear as tickets in <a href="https://ggus.eu/">GGUS</a>. The
<a href="https://wiki.egi.eu/wiki/GGUS:Vac_FAQ">Vac/Vcycle Support Unit</a>
appears under "Second Level - Software" on the GGUS
"Assign ticket to support unit" menu.
<p>
Vcycle writes APEL accounting records as described below. The GOCDB site
name given by gocdb_sitename in the [space ...] section is included in these
records. To avoid the risk of polluting the central APEL database with incorrect
site names, please use your real GOCDB sitename for this option.
<h3><a name="machinetypes">Setting up machinetypes</a></h3>
<p>
One [machinetype ... ...] section must exist for each machinetype in the system, with
the name of the space and machinetype given in the section name, such as [machinetype
vcycle01.example.com example] .
A machinetype name must only consist of lowercase letters, numbers,
and hyphens. The vcycle.conf man page lists the options
that can be given for each machinetype.
<p>
The target_share option for the machinetype gives
the desired share of the total VMs available in this space for that
machinetype. The shares do not need to add up to 1.0, and if a share is not given
for a machinetype, then it is set to 0. The creation of new VMs can be completely
disabled by setting all shares to 0. Vcycle consults these shares
when deciding which machinetype to start as VM slots become available.
<p>
For ease of management, the target_shares options can be grouped
together in a separate file in /etc/vcycle.d apart from the main [machinetype
... ...]
sections, which is convenient if shares
are generated automatically or frequently edited by hand. For example:
<pre>
[machinetype vcycle01.example.com example1]
target_share = 5.0
[machinetype vcycle01.example.com example2]
target_share = 6.0
[machinetype vcycle01.example.com example3]
target_share = 7.0
</pre>
<p>
The experiment or VO responsible for each machinetype should supply
step by step intructions on how to set up the rest of the [machinetype ...
...]
section and how to create any hostcert.pem and hostkey.pem
pair to give to the VM.
<h2 style="border-bottom: 1px solid"><a name="startingstopping">Starting and stopping
vcycled</a></h2>
<p>
The Vcycle daemon, vcycled, is started and stopped by
/etc/rc.d/init.d/vcycled
on conjunction with the usual service and chkconfig commands. As the
configuration files are reread at the start of each cycle (by default,
one per minute) <b>it is not necessary to restart vacd after changing the
configuration</b>.
<p>
Furthermore, as vcycled rereads the current state of the VMs from status
files and the cloud service at the start of each cycle, vcycled can be
restarted without disrupting running VMs or losing information about
their state.
In most cases it will even be possible to upgrade vcycled from one patch
level to another within the same minor release without having to
drain the site of running VMs. If problems arise during upgrades,
the most likely outcome is that Vcycle will fail to create new VMs until
the configuration is fixed, but the existing VMs will continue to run.
("We want Vcycle failures to look like planned draining.")
<h2 style="border-bottom: 1px solid"><a name="apel">APEL accounting</a></h2>
<p>
When Vcycle detects that a VM has run for at least fizzle_seconds and
now finished, it writes a copy of the APEL
accounting message to subdirectories of /var/lib/vcycle/apel-archive .
If you have set gocdb_sitename in [space ...], then the file is also
written to /var/lib/vcycle/apel-outgoing .
<p>
Vcycle uses the UUID of the VM as the local job
ID, the VM factory hostname as the local user ID, and the machinetype name as the
batch queue name. A unique user DN is constructed from the components
of the Vcycle space name. For example, vac01.example.com becomes
/DC=com/DC=example/DC=vac01 . If the accounting_fqan option is present in
the [machinetype ... ...] section, then for VMs of that type the value of that option
is included as the user FQAN, which indicates the VO associated with the VM.
The GOCDB sitename field is either the value you
gave explicitly or the Vcycle space name as a placeholder.
<p>
These accounting messages are designed to be published to the central
APEL service using the
standard APEL ssmsend command, which can be run on each factory machine
from cron. Please see the <a href="https://wiki.egi.eu/wiki/APEL">APEL
SSM client documentation for details</a>. To submit records you should agree
use of APEL with the APEL team, have your certificate authorized, set up the
correct APEL entries in GOCDB, and do any requested tests.
<h2 style="border-bottom: 1px solid"><a name="fizzlebackoff">Setting fizzle_seconds and backoff_seconds</a></h2>
<p>
For each machinetype you can set values of fizzle_seconds and backoff_seconds
to make the most efficient use of your resources when deciding to start
VMs.
<p>
You may be able to optimize the value of fizzle_seconds by looking for VMs
which fail to find work and seeing how long they ran for. However, for most
VM architectures and VM factories a value of
600 seconds will work perfectly well and you do not need to spend a lot
of effort arriving at an ideal number.
<p>
The value of backoff_seconds is a matter for your site policy about how
quickly to recover from periods when each experiment has no work available.
If you know a particular experiment usually has plenty of work, then you
could set a low value of 600 seconds, so that Vcycle will try creating one
VM every 10 minutes or so to see if work is available again after a short
idle period. Alternatively,
if you know that an experiment usually has no work for you, then you could
set much larger values of many hours between creating VMs. In either
case, once Vcycle identifies that VMs for the experiment are indeed passing
fizzle_seconds and finding work to do, then the backoff_seconds option is
ignored and VMs are created as VM slots become free in line with the
machinetype's target_share.
</body>
</html>