Implement Cap on total number of processes on a worker #61
Implementing a configurable cap on the total number of processes that can run concurrently on a worker, regardless of which BPMNs are assigned to that worker, would allow us to tune the ASGs more finely. In particular, ASGs could use EC2 instance types that only have enough resources to run a single BPMN at a time. Being able to restrict workers to one running BPMN would let us still assign multiple BPMNs to an ASG (without the risk of multiple concurrent BPMNs crashing the worker), meaning an ASG could be reused for many different types of BPMNs that have similar resource needs. Without the ability to limit a worker to a single running BPMN, each BPMN would need its own ASG if we wanted to use smaller machines, which would be very unwieldy. The benefit we would ultimately derive from capping the number of concurrent processes on a worker is that we could achieve a higher level of worker parallelism in the pipeline for the same (or similar) cost in EC2 resources.
From an M2020 Ops perspective, being able to configure this value at deploy time would be the most important (i.e. having a configuration variable). Being able to change this configuration in the UI might be nice, but is not critical.
Okay, that's doable. Getting a UI input that is configurable on the fly, after deployment/installation, would take extra effort. What do you think, @galenatjpl?
Okay! If it would be a lot of effort to make it UI-configurable on the fly, it might not be worth implementing that -- I don't think we would need to change it on the fly, because changing this value would probably go alongside changing the EC2 instance types in most cases (for M20 at least), which also wouldn't be done on the fly.
Alright, thanks for the response. Will get started on this soon. If any requirements change, let me know.
I agree that doing this at deploy time would be the easiest and quickest approach, and it seems to satisfy M20 needs. That being said, the value would have to be stored somewhere (e.g. in a column of the MariaDB cws_workers table), so making it modifiable at runtime by allowing a SQL update is a good middle ground here, without having to implement a UI.
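To illustrate the middle ground described above, here is a minimal sketch of an operator updating the cap with plain SQL and a worker re-reading it. The table and column names (`cws_workers`, `proc_cap`) are assumptions for illustration -- the actual schema is not shown in this thread -- and sqlite3 stands in for MariaDB so the example is self-contained.

```python
import sqlite3

# Illustrative stand-in for the MariaDB cws_workers table; the proc_cap
# column name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cws_workers (worker_id TEXT PRIMARY KEY, proc_cap INTEGER)")
conn.execute("INSERT INTO cws_workers VALUES ('worker-01', 10)")

# An operator changes the cap at runtime with a plain SQL update,
# no UI required:
conn.execute("UPDATE cws_workers SET proc_cap = 1 WHERE worker_id = 'worker-01'")

def read_cap(conn, worker_id):
    # The worker would re-read this value when deciding whether to
    # claim more work.
    row = conn.execute(
        "SELECT proc_cap FROM cws_workers WHERE worker_id = ?", (worker_id,)
    ).fetchone()
    return row[0]

print(read_cap(conn, "worker-01"))  # 1
```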
@galenatjpl That would definitely be a usable option for the operators! |
Here's a question: once this is implemented, how will priority be determined if multiple tasks are competing for a worker? For example, suppose BPMN A and BPMN B are both assigned to run on a given worker, and that worker has a configured cap of 1 concurrent process. If both BPMNs are waiting for that worker to free up, then once the worker finishes its current task, how will it decide whether to take on the pending task from BPMN A or BPMN B first? Would it be determined by the timestamps of those scheduled tasks?
@eamonford and @voxparcxls: Per Eamon's question, timestamps (FIFO) and the priority field in the database should determine the ordering of what actually runs on a worker. This ordering is already done in SQL queries on the backend, but the logic and code would have to change slightly to support this feature. We want to ensure fair scheduling while avoiding starvation.
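The ordering described above (priority first, FIFO timestamps within the same priority) can be sketched in a few lines. This is illustrative only -- the real logic lives in SQL queries on the backend, and the convention that a larger priority value is more urgent is an assumption here.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PendingTask:
    bpmn: str
    priority: int        # assumed convention: larger = more urgent
    created_at: datetime

def next_task(pending):
    # Higher priority first; within equal priority, earliest timestamp
    # (FIFO) wins, which bounds how long a task can wait behind
    # same-priority arrivals.
    return min(pending, key=lambda t: (-t.priority, t.created_at))

pending = [
    PendingTask("BPMN-A", priority=5, created_at=datetime(2020, 1, 1, 12, 0, 1)),
    PendingTask("BPMN-B", priority=5, created_at=datetime(2020, 1, 1, 12, 0, 0)),
]
print(next_task(pending).bpmn)  # BPMN-B: same priority, earlier timestamp
```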
Alright, the coordination for avoiding starvation is what I'm working on, per the above.
Ceiling on the total number of RUNNING processes on a worker. The purpose is to have a predictable limit/cap on the CPU and/or memory usage of a worker at any given time.
So for example, if the individual BPMN limits are:
BPMN A: 5
BPMN B: 3
BPMN C: 2
and there is a cap:
CAP = 4
You could, for example, have 1 A, 2 B, and 1 C running (totalling 4) at most.
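The example above can be expressed as a small admission check: a process may start only if both its per-BPMN limit and the worker-wide cap still have headroom. This is a sketch under the numbers given above; the function and variable names are illustrative, and the real check would live in the worker's claim/SQL logic.

```python
# Per-BPMN concurrency limits and the worker-wide cap, from the example.
PER_BPMN_LIMIT = {"A": 5, "B": 3, "C": 2}
CAP = 4  # ceiling on total RUNNING processes on this worker

def can_start(bpmn, running):
    """True if one more `bpmn` process may start, given current counts."""
    total = sum(running.values())
    return total < CAP and running.get(bpmn, 0) < PER_BPMN_LIMIT[bpmn]

# 1 A + 2 B + 1 C = 4 running: the cap is reached, so nothing more
# may start even though B is below its individual limit of 3.
running = {"A": 1, "B": 2, "C": 1}
print(can_start("B", running))            # False: worker cap reached
print(can_start("B", {"A": 1, "B": 2}))   # True: total 3 < 4 and B 2 < 3
```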
CURRENT WORKAROUND: