Backend active/active cluster #482

Merged
merged 31 commits into from
Aug 21, 2015


@eea03 eea03 commented Jul 28, 2015

UV can now run with 2 (or N in theory) backend processes, which run simultaneously, so executions are processed by two nodes.

  • each backend must run on its own server (the application lock prevents running multiple backends on one server)
  • the backend cluster was developed and tested only with PostgreSQL. It should also work with MySQL, but that is not tested and not guaranteed
  • the backend cluster is purely optional and must be explicitly turned on via configuration properties
  • by default UV runs in single mode, with no noticeable changes from previous versions
  • in cluster mode, the frontend communicates with the backends via the database (checking whether a backend is online); in single mode, the frontend communicates with the backend directly via RMI

eea03 added 4 commits July 24, 2015 15:37
…ble used for backends exclusive locking; PipelineExecution now has backend_id parameter (=executing backend)
…ions (running executions are now failed, not restarted - configurable behavior, by default old behavior is preserved); Bug fixes; Adaptation of backend unit tests
… scheduler; Fixes in synchronization via DB table
@eea03 eea03 changed the title Backend active/passive cluster Backend active/active cluster Aug 4, 2015
eea03 and others added 4 commits August 4, 2015 17:59
@skrchnavy skrchnavy added this to the Release v2.2.0 milestone Aug 6, 2015
@tomas-knap

Info about testing the cluster (translated from Slovak):

tested with 2000 QUEUED executions loaded into the system at once
using SQL scripts, 2000 QUEUED executions were inserted into the system at once
during the test, a few more QUEUED executions with IGNORE priority were added several times; these must be started even when the backend is already processing its limit of executions
the tests showed that each execution was started exactly once, as verified through the counts and statuses of executions and the logs (the relevant SQL commands are in the attached script)
both backends processed approximately the same number of executions (the counts would probably have been identical if the IGNORE priority executions had not been added; in smaller tests I got exactly equal counts of processed executions)
parallel processing of schedules was also tested
several times, 10 schedules for 10 different pipelines were loaded into the system at once; for each schedule exactly one pipeline execution was created (verified via SQL commands)
testing showed no concurrency issues, even under high load
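
The "started exactly once" property above is what an atomic conditional claim would give. The following is only a sketch of that idea, not the code in this pull request: the class, the method names, and the SQL in the comment are hypothetical, and an in-memory map stands in for the executions table so the example runs without a database.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory stand-in for the executions table; not UV code. */
class ExecutionClaimSketch {
    static final String UNCLAIMED = "";

    // execution id -> id of the backend that claimed it ("" while still QUEUED)
    final ConcurrentMap<Long, String> owner = new ConcurrentHashMap<>();

    void enqueue(long executionId) {
        owner.put(executionId, UNCLAIMED);
    }

    /**
     * Atomic compare-and-set. Assumed SQL analogue:
     * UPDATE execution SET backend_id = ? WHERE id = ? AND backend_id IS NULL
     * (the claim succeeded if exactly one row was updated).
     */
    boolean tryClaim(long executionId, String backendId) {
        return owner.replace(executionId, UNCLAIMED, backendId);
    }

    /** Lets two competing backends race over n executions; returns the total number of successful claims. */
    static int simulate(int n) {
        ExecutionClaimSketch table = new ExecutionClaimSketch();
        for (long id = 0; id < n; id++) {
            table.enqueue(id);
        }
        AtomicInteger claims = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (String backend : new String[] {"backend-1", "backend-2"}) {
            pool.submit(() -> {
                for (long id = 0; id < n; id++) {
                    if (table.tryClaim(id, backend)) {
                        claims.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulation did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return claims.get();
    }
}
```

Because each claim is a single atomic operation, the number of successful claims equals the number of executions no matter how the two backends interleave.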

@tomas-knap

Note: In a cluster, a shared directory should be used for DPU templates (JAR files), so that backends are automatically notified whenever a DPU template changes. Otherwise, DPU maintenance will be harder: when a DPU is imported or replaced via the frontend, backends on other servers (using a different directory than the frontend) have to be updated manually. That is, when a new version of a DPU is prepared and loaded via the UV admin interface, it also has to be deployed manually to those backends.

@tomas-knap

Note: A cluster of backends will not work properly with DPUs that rely on caches unless the cache directory is a shared directory that all backends can access.

@tomas-knap

Please unify how the frontend checks whether a backend is online: even with a single backend, the check should go through the DB (as in cluster mode) instead of RMI.

Expected changes for frontend:

  • No need to define whether the frontend works in single or cluster mode using backend.cluster.mode
  • the backend online status will not update instantly, but with a delay determined by the backend.alive.limit parameter (I suggest 10s by default)
  • when a pipeline is run or debugged from the frontend, it may take up to 2s before a backend actually starts the pipeline

Expected changes for backend:

  • backend.id is mandatory also in cluster mode
  • backend.cluster.mode property not needed
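
A DB-based online check with backend.alive.limit could look roughly like the sketch below. It assumes each backend periodically writes a heartbeat timestamp to the database and the frontend compares it against the limit; the class and method names are hypothetical and not part of UV.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; names are illustrative, not UV API. */
class BackendLivenessSketch {
    /**
     * A backend counts as online if its last heartbeat (assumed to be read
     * from the database) is no older than aliveLimit, e.g. the value of
     * backend.alive.limit.
     */
    static boolean isOnline(Instant lastHeartbeat, Instant now, Duration aliveLimit) {
        if (lastHeartbeat == null) {
            return false; // the backend has never reported in
        }
        return Duration.between(lastHeartbeat, now).compareTo(aliveLimit) <= 0;
    }
}
```

With a 10s limit, a backend whose last heartbeat is 3s old is reported online and one whose heartbeat is 11s old is reported offline, which is why the status shown in the frontend can lag by up to the configured limit.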

This update may be also solved in a separate pull request (if not doable before vacation)

@tomas-knap

I tested single mode, which works fine.

But when I tried cluster mode with MySQL, I ran into the following problem:
#509

#509 contains a suggested solution. Please test whether it works. If it does not and the problem cannot be resolved quickly, we can solve it in a separate pull request.

Otherwise approved.

@eea03
Author

eea03 commented Aug 19, 2015

I fixed the MySQL issue as suggested.
This time I ran the full tests for both MySQL and PostgreSQL.

Test description:

  • 2 backend servers (Win 10 + virtual Ubuntu), Postgres 9.4.4 / MySQL 5.6.25
  • 200 QUEUED executions of the same pipeline inserted at once at the beginning of the test
  • three times during the test, 10 new QUEUED executions with IGNORE priority were inserted

Test result:

  • the test was successful for both databases
  • executions were distributed more or less evenly
  • each execution was started and processed by exactly one backend
  • IGNORE priority executions were processed immediately, even when the limit was exceeded
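
The limit behaviour described in the results can be illustrated with a small selection routine. This is a sketch under assumptions, not the scheduler from this pull request: the types and names are hypothetical, and it assumes that IGNORE priority executions do not consume one of the free slots.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of limit handling; not UV code. */
class ExecutionLimitSketch {
    /** A queued execution; ignoreLimit models the IGNORE priority. */
    static final class Queued {
        final long id;
        final boolean ignoreLimit;

        Queued(long id, boolean ignoreLimit) {
            this.id = id;
            this.ignoreLimit = ignoreLimit;
        }
    }

    /** Returns the ids to start now, given how many executions are already running. */
    static List<Long> selectToStart(List<Queued> queue, int running, int limit) {
        List<Long> toStart = new ArrayList<>();
        int freeSlots = Math.max(0, limit - running);
        for (Queued q : queue) {
            if (q.ignoreLimit) {
                toStart.add(q.id); // bypasses the limit (assumed to consume no slot)
            } else if (freeSlots > 0) {
                toStart.add(q.id);
                freeSlots--;
            }
        }
        return toStart;
    }
}
```

When a backend is already at its limit, only the IGNORE priority executions are returned; ordinary executions stay QUEUED until a slot frees up.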

Everything should be OK now.

5 participants