Implement Cap on total number of processes on a worker #61
Implementing a configurable cap on the total number of processes that can run concurrently on a worker, regardless of which BPMNs are assigned to that worker, would allow us to tune the ASGs more finely. In particular, ASGs could use EC2 instance types that only have enough resources to run a single BPMN at a time. Being able to restrict workers to one running BPMN would let us still assign multiple BPMNs to an ASG (without the risk of multiple concurrent BPMNs crashing the worker), meaning an ASG could be reused for many different types of BPMNs that have similar resource needs. Without the ability to limit a worker to a single running BPMN, each BPMN would need its own ASG if we wanted to use smaller machines, which would be very unwieldy. The benefit we would ultimately derive from capping the number of concurrent processes on a worker is that we could achieve a higher level of worker parallelism in the pipeline for the same (or similar) cost in EC2 resources.
From an M2020 Ops perspective, being able to configure this value at deploy time would be the most important (i.e. having a configuration variable). Being able to change this configuration in the UI might be nice, but is not critical.
Okay, that's doable. Getting a UI input that is configurable on the fly, after deployment/installation, would take extra effort. What do you think, @galenatjpl?
Okay! If it would be a lot of effort to make it UI-configurable on the fly, it might not be worth implementing that -- I don't think we would need to change it on the fly, because changing this value would probably go alongside changing the EC2 instance types in most cases (for M20 at least), which also wouldn't be done on the fly.
Alright, thanks for the response. Will get started on this soon. If any requirements change, let me know.
I agree that doing this at deploy time would be the easiest and quickest approach, and it seems to satisfy M20 needs. That being said, the value would have to be stored somewhere (e.g. in a column of the MariaDB cws_workers table), so making it modifiable at runtime by allowing a SQL update is a good middle ground here, without having to implement a UI.
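To illustrate the middle ground described above, here is a minimal sketch of an operator updating the cap with plain SQL and a worker re-reading it. The table and column names (`cws_workers`, `proc_cap`) are assumptions for illustration -- the actual schema is not shown in this thread -- and sqlite3 stands in for MariaDB so the example is self-contained.

```python
import sqlite3

# Illustrative stand-in for the MariaDB cws_workers table; the proc_cap
# column name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cws_workers (worker_id TEXT PRIMARY KEY, proc_cap INTEGER)")
conn.execute("INSERT INTO cws_workers VALUES ('worker-01', 10)")

# An operator changes the cap at runtime with a plain SQL update,
# no UI required:
conn.execute("UPDATE cws_workers SET proc_cap = 1 WHERE worker_id = 'worker-01'")

def read_cap(conn, worker_id):
    # The worker would re-read this value when deciding whether to
    # claim more work.
    row = conn.execute(
        "SELECT proc_cap FROM cws_workers WHERE worker_id = ?", (worker_id,)
    ).fetchone()
    return row[0]

print(read_cap(conn, "worker-01"))  # 1
```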
@galenatjpl That would definitely be a usable option for the operators! |
Here's a question: once this is implemented, how will priority be determined if multiple tasks are competing for a worker? For example, suppose BPMN A and BPMN B are both assigned to run on a given worker, and that worker has a configured cap of 1 concurrent process. If both BPMNs are waiting for that worker to free up, then once the worker finishes its current task, how will it decide whether to take on the pending task from BPMN A or BPMN B first? Would it be determined by the timestamps of those scheduled tasks?
@eamonford and @voxparcxls: Per Eamon's question, timestamps (FIFO) and the priority field in the database should determine the ordering of what actually runs on a worker. This ordering is already done in SQL queries on the backend, but the logic and code would have to change slightly to support this feature. We want to ensure fair scheduling while avoiding starvation.
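The ordering described above (priority first, FIFO timestamps within the same priority) can be sketched in a few lines. This is illustrative only -- the real logic lives in SQL queries on the backend, and the convention that a larger priority value is more urgent is an assumption here.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PendingTask:
    bpmn: str
    priority: int        # assumed convention: larger = more urgent
    created_at: datetime

def next_task(pending):
    # Higher priority first; within equal priority, earliest timestamp
    # (FIFO) wins, which bounds how long a task can wait behind
    # same-priority arrivals.
    return min(pending, key=lambda t: (-t.priority, t.created_at))

pending = [
    PendingTask("BPMN-A", priority=5, created_at=datetime(2020, 1, 1, 12, 0, 1)),
    PendingTask("BPMN-B", priority=5, created_at=datetime(2020, 1, 1, 12, 0, 0)),
]
print(next_task(pending).bpmn)  # BPMN-B: same priority, earlier timestamp
```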
Alright, the coordination for avoiding starvation is what I'm working on, per the above.
Ceiling on the total number of RUNNING processes on a worker. The purpose is to have a predictable limit/cap on the CPU and/or memory usage of a worker at any given time.
So for example, if the individual BPMN limits are:
BPMN A: 5
BPMN B: 3
BPMN C: 2
and there is a cap:
CAP = 4
You could, for example, have 1 A, 2 B, and 1 C running (totalling 4) at most.
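The example above can be expressed as a small admission check: a process may start only if both its per-BPMN limit and the worker-wide cap still have headroom. This is a sketch under the numbers given above; the function and variable names are illustrative, and the real check would live in the worker's claim/SQL logic.

```python
# Per-BPMN concurrency limits and the worker-wide cap, from the example.
PER_BPMN_LIMIT = {"A": 5, "B": 3, "C": 2}
CAP = 4  # ceiling on total RUNNING processes on this worker

def can_start(bpmn, running):
    """True if one more `bpmn` process may start, given current counts."""
    total = sum(running.values())
    return total < CAP and running.get(bpmn, 0) < PER_BPMN_LIMIT[bpmn]

# 1 A + 2 B + 1 C = 4 running: the cap is reached, so nothing more
# may start even though B is below its individual limit of 3.
running = {"A": 1, "B": 2, "C": 1}
print(can_start("B", running))            # False: worker cap reached
print(can_start("B", {"A": 1, "B": 2}))   # True: total 3 < 4 and B 2 < 3
```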
CURRENT WORKAROUND: