Install Agent
We assume that your system has an admin user named devops with sudo privileges.
Your Backend.AI manager should already be set up and running.
| {NS} | The etcd namespace (any unique string, such as a domain name) |
| {ETCDADDR} | The etcd cluster address ({ETCDHOST}:{ETCDPORT}; localhost:2379 for a development setup) |
| {SSLCERT} | The path to your SSL certificate (bundled with CA chain certificates) |
| {SSLPKEY} | The path to your SSL private key |
| {S3AKEY} | The access key for AWS S3 or compatible services[1] |
| {S3SKEY} | The secret key for AWS S3 or compatible services |
| {DDAPIKEY} | The Datadog API key |
| {DDAPPKEY} | The Datadog application key |
| {SENTRYURL} | The private Sentry report URL |
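The placeholders above are meant to be substituted by hand in the commands and scripts that follow. As a rough illustration only, the sketch below fills two of them with sed. The helper name render_placeholders and the example values are assumptions made for this sketch, not part of Backend.AI.

```bash
# Example values (assumptions for illustration; use your own).
NS="example.backend.ai"
ETCDADDR="localhost:2379"

# Hypothetical helper: read text on stdin and replace the {NS} and
# {ETCDADDR} placeholders with the values of the variables above.
# Values containing "|" or "&" would need escaping for sed.
render_placeholders() {
    sed -e "s|{NS}|${NS}|g" -e "s|{ETCDADDR}|${ETCDADDR}|g"
}

# Prints: --etcd-addr localhost:2379 --namespace example.backend.ai
printf '%s\n' '--etcd-addr {ETCDADDR} --namespace {NS}' | render_placeholders
```

The same pattern extends to the other placeholders by adding one -e expression per variable.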
$ sudo apt-get -y update
$ sudo apt-get -y dist-upgrade
$ sudo apt-get install -y ca-certificates git-core supervisor
Here are some optional but useful packages:
$ sudo apt-get install -y vim tmux htop
(TODO)
Check out the Install CUDA guide.
Check out Install Python via pyenv for instructions.
Create a virtualenv named venv-agent.
(Linux only) To enable detailed resource statistics, grant the Python executable the CAP_SYS_ADMIN, CAP_SYS_PTRACE, and CAP_DAC_OVERRIDE capabilities.
$ sudo setcap cap_sys_ptrace,cap_sys_admin,cap_dac_override+eip "$(readlink -f $(pyenv which python))"
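To confirm that the capabilities were applied, the output of getcap (from the libcap tools) can be inspected. The sketch below is a minimal check, assuming getcap is installed. The helper name has_required_caps is hypothetical, and it only looks for the three capability names in whatever text getcap printed. File capabilities are attached to the binary itself, so they typically need to be set again after the interpreter is rebuilt or replaced.

```bash
# Hypothetical helper: succeed only if every required capability name
# appears in the text passed as the first argument (getcap output).
has_required_caps() {
    for cap in cap_sys_ptrace cap_sys_admin cap_dac_override; do
        case "$1" in
            *"$cap"*) ;;                                  # found, check the next one
            *) echo "missing: $cap" >&2; return 1 ;;      # report the first gap
        esac
    done
}

# Usage on a real host:
#   has_required_caps "$(getcap "$(readlink -f "$(pyenv which python)")")" \
#       && echo "capabilities look fine"
```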
$ pyenv shell venv-agent
$ pip install -U setuptools pip
$ pip install -U backend.ai-agent
Check out the Install Monitoring and Logging Tools guide.
$ sudo vi /etc/supervisor/conf.d/apps.conf
[program:backendai-agent]
user = devops
stopsignal = TERM
stopasgroup = true
command = /home/devops/run-agent.sh
$ vi /home/devops/init-venv.sh
#!/bin/bash
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv shell venv-agent
$ sudo mkdir -p /var/cache/scratches
$ sudo chown devops:devops /var/cache/scratches
$ vi /home/devops/run-agent.sh
#!/bin/bash
source /home/devops/init-venv.sh
umask 0002
export AWS_ACCESS_KEY_ID="{S3AKEY}"
export AWS_SECRET_ACCESS_KEY="{S3SKEY}"
export DATADOG_API_KEY={DDAPIKEY}
export DATADOG_APP_KEY={DDAPPKEY}
export RAVEN_URI="{SENTRYURL}"
exec python -m ai.backend.agent.server \
--etcd-addr {ETCDADDR} \
--namespace {NS} \
--scratch-root=/var/cache/scratches
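Before handing the script to supervisor, it may help to check that no {PLACEHOLDER} was left unfilled. The sketch below is one way to do that. The helper name find_unfilled_placeholders is hypothetical, and the pattern assumes placeholders are written as uppercase letters or digits inside braces, as in this guide. Because supervisor runs the script path directly, the script also needs the executable bit.

```bash
# Hypothetical helper: print any lines that still contain a {PLACEHOLDER}
# and return non-zero if at least one is found.
# Note: this would also flag shell expansions written as ${NAME}.
find_unfilled_placeholders() {
    if grep -n -E '\{[A-Z0-9]+\}' "$1"; then
        return 1
    fi
    return 0
}

# Usage on a real host:
#   find_unfilled_placeholders /home/devops/run-agent.sh && echo "all placeholders filled"
#   chmod +x /home/devops/run-agent.sh
```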
To spawn compute sessions, you first need to pull the kernel container images.
The image name and tag pairs must also be listed in the backend.ai-manager/sample-configs/image-metadata.yml file that is imported into etcd.
Here are the pull commands for a few commonly used Python-based images:
$ docker pull lablup/kernel-python:3.6-debian
$ docker pull lablup/kernel-python-tensorflow:1.8-py36
$ docker pull lablup/kernel-python-tensorflow:1.8-py36-gpu
For the full list of publicly available kernels, check out the kernels repository.
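When several images are needed, the pulls can be wrapped in a loop that stops at the first failure. This is a minimal sketch. The function name pull_images and the DOCKER override variable are assumptions introduced here so that the loop can be dry-run without a Docker daemon.

```bash
# Hypothetical helper: pull each image given as an argument and stop at the
# first failure. DOCKER may be overridden, e.g. DOCKER="echo docker" for a dry run.
pull_images() {
    for image in "$@"; do
        ${DOCKER:-docker} pull "$image" || { echo "failed: $image" >&2; return 1; }
    done
}

# Dry run: only prints the commands that would be executed.
DOCKER="echo docker"
pull_images \
    lablup/kernel-python:3.6-debian \
    lablup/kernel-python-tensorflow:1.8-py36
```

Unset DOCKER (or set it to docker) to perform the actual pulls.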
$ sudo supervisorctl reread
$ sudo supervisorctl start backendai-agent
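After starting the agent, its state can be checked with supervisorctl status. The sketch below parses that output. The helper name agent_is_running is hypothetical, and it assumes the usual status line layout of program name followed by the state. If start reports that the process is unknown, running sudo supervisorctl update after reread may be needed so that supervisor picks up the newly added program.

```bash
# Hypothetical helper: read `supervisorctl status` output on stdin and
# succeed only if backendai-agent is reported as RUNNING.
agent_is_running() {
    grep -E '^backendai-agent[[:space:]]+RUNNING' >/dev/null
}

# Usage on a real host:
#   sudo supervisorctl status backendai-agent | agent_is_running \
#       && echo "agent is up"
```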