This directory contains benchmark tests for the following workload managers and schedulers:
- Kueue
- Volcano
- YuniKorn
- Run:ai
The benchmark tests involve submitting workloads intended to evaluate the scheduler's performance under specific scenarios.
These workloads are designed to fully utilize the cluster under optimal scheduling conditions.
One approach to benchmarking is to run this workload on clusters with different schedulers and then compare the average GPU occupancy of the nodes.
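As a rough illustration of that comparison, the sketch below averages per-node GPU occupancy from a single snapshot. The input format (`<node-name> <gpus-in-use> <gpus-total>`, one node per line) and the function name `avg_gpu_occupancy` are assumptions made for this example, not something knavigator produces; adapt them to whatever your metrics source emits.

```shell
#!/usr/bin/env bash
# Mean per-node GPU occupancy, as a percentage, from one snapshot.
# Input (stdin), one node per line: <node-name> <gpus-in-use> <gpus-total>
# NOTE: this input format is hypothetical and only serves the example.
avg_gpu_occupancy() {
  awk '$3 > 0 { sum += $2 / $3; n++ }
       END    { if (n) printf "%.1f\n", 100 * sum / n }'
}

# Three nodes at 100%, 50%, and 0% occupancy.
printf 'n1 8 8\nn2 4 8\nn3 0 8\n' | avg_gpu_occupancy   # prints 50.0
```

To compare schedulers over a whole run, one option is to take such snapshots at a fixed interval and average the results for each scheduler.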
For all workload managers except Run:ai, the benchmark test involves two sequential workflows. The first workflow registers the CRDs, and the second workflow runs the common part of the test. Run:ai requires additional customization and thus has a separate workflow.
The gang-scheduling benchmark workflow operates on 32 virtual GPU nodes, submitting a burst of 53 jobs with replica counts ranging from 1 to 32 in a predetermined order.
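To make the all-or-nothing nature of gang scheduling concrete, here is a minimal sketch that admits jobs in submission order only when every replica fits on the remaining free nodes. The function name `gang_admit` and the job sizes are hypothetical; the real benchmark uses its own 53-job sequence, and each scheduler applies its own queueing and backfill policy.

```shell
#!/usr/bin/env bash
# Admit jobs in submission order with all-or-nothing (gang) semantics.
# $1 = number of free nodes; remaining args = replica count of each job.
# Prints "<replicas> admitted" or "<replicas> waiting" per job,
# followed by the number of nodes left free.
gang_admit() {
  local free=$1; shift
  local replicas
  for replicas in "$@"; do
    if (( replicas <= free )); then
      free=$(( free - replicas ))
      echo "$replicas admitted"
    else
      # Not enough nodes for the whole gang: nothing is placed.
      echo "$replicas waiting"
    fi
  done
  echo "free=$free"
}

# Hypothetical burst of four jobs on 32 nodes.
gang_admit 32 16 12 8 4
```

In this example the 8-replica job waits because only 4 nodes remain, while the later 4-replica job still fits and fills the cluster. Whether a scheduler allows that kind of backfill is one of the behaviors the benchmark can expose.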
To run the benchmark test for Kueue:
./scripts/benchmarks/gang-scheduling/run-kueue.sh
To run the benchmark test for Run:ai:
./scripts/benchmarks/gang-scheduling/run-runai.sh
The scaling benchmark workflow operates on 700 virtual GPU nodes with two workflows. The first workflow submits a single job with 700 replicas; the second workflow submits a batch of 700 single-node jobs.
To run the benchmark test for Volcano:
./bin/knavigator -workflow 'resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-volcano.yaml,run-test-multi.yaml}'
To run the benchmark test for Run:ai:
./bin/knavigator -workflow 'resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-runai.yaml,runai-test-single.yaml}'
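The `-workflow` arguments above are single-quoted, so the shell passes the brace pattern to knavigator verbatim rather than expanding it. The sketch below assumes knavigator expands the braces the way bash would, into three workflow files in the listed order; it uses bash's own brace expansion to show the resulting paths for the Volcano command. The helper name `list_workflows` is made up for this example.

```shell
#!/usr/bin/env bash
# Show the files that the brace pattern stands for, using bash brace expansion.
# ASSUMPTION: knavigator expands the quoted pattern the same way, in this order.
list_workflows() {
  local dir=resources/benchmarks/scaling/workflows
  printf '%s\n' "$dir"/{config-nodes.yaml,config-volcano.yaml,run-test-multi.yaml}
}

list_workflows
```

This only prints the paths; it does not check that the files exist or run any workflow.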
The network topology benchmark workflow runs on 12 virtual GPU nodes, arranged to simulate a tree-like network topology. Out of these, 5 nodes are marked as busy, leaving 7 nodes available. The workflow submits a job with 3 replicas.
From a network connectivity standpoint, the optimal assignment would be nodes n5, n7, and n8, as shown in the following diagram.
To run the benchmark test for Run:ai:
./bin/knavigator -workflow 'resources/benchmarks/nwtopo/workflows/{config-nodes.yaml,runai-test.yaml}'