Commit

[YUNIKORN-2509] Remove state-aware scheduling documentation (apache#417)
Closes: apache#417
craigcondit committed Mar 20, 2024
1 parent df41c3d commit 837d53a
Showing 9 changed files with 5 additions and 67 deletions.
12 changes: 0 additions & 12 deletions docs/api/scheduler.md
@@ -416,10 +416,6 @@ In the example below there are three allocations belonging to two applications,
"time": 1648741409145224000,
"applicationState": "Accepted"
},
-{
-"time": 1648741409145509400,
-"applicationState": "Starting"
-},
{
"time": 1648741409147432100,
"applicationState": "Running"
@@ -523,10 +519,6 @@ In the example below there are three allocations belonging to two applications,
"time": 1648741409145224000,
"applicationState": "Accepted"
},
-{
-"time": 1648741409145509400,
-"applicationState": "Starting"
-},
{
"time": 1648741409147432100,
"applicationState": "Running"
@@ -675,10 +667,6 @@ Field `uuid` has been deprecated, would be removed from below response in YUNIKO
"time": 1648741409145224000,
"applicationState": "Accepted"
},
-{
-"time": 1648741409145509400,
-"applicationState": "Starting"
-},
{
"time": 1648741409147432100,
"applicationState": "Running"
File renamed without changes.
Binary file modified docs/assets/application-state.png
Binary file removed docs/assets/k8shim-node-state.png
Binary file not shown.
3 changes: 0 additions & 3 deletions docs/design/historical_usage_tracking.md
@@ -228,7 +228,6 @@ APP_REQUEST = 201 // Request changed
APP_REJECT = 202 // Application rejected on create
APP_NEW = 203 // Application added with state new
APP_ACCEPTED = 204 // State change to accepted
-APP_STARTING = 205 // State change to starting
APP_RUNNING = 206 // State change to running
APP_COMPLETING = 207 // State change to completing
APP_COMPLETED = 208 // State change to completed
@@ -466,7 +465,6 @@ It serves as a reference for the core scheduler actions that will trigger the ev
| APP | REMOVE | REQUEST_CANCEL | RequestID | Removal triggered by application removal |
| APP | SET | APP_NEW | | State change: New |
| APP | SET | APP_ACCEPTED | | State change: Accepted |
-| APP | SET | APP_STARTING | | State change: Starting |
| APP | SET | APP_RUNNING | | State change: Running |
| APP | SET | APP_COMPLETING | | State change: Completing |
| APP | SET | APP_COMPLETED | | State change: Completed |
@@ -568,7 +566,6 @@ An application undergoes state transitions, so the following events are generate
- Add new application
- State change: New
- State change: Accepted
-- State change: Starting
- State change: Running
- State change: Completing
- State change: Completed
6 changes: 3 additions & 3 deletions docs/design/user_group_manager.md
@@ -105,11 +105,11 @@ Since placeholders and placeholder timeout can play a role in state changes the

### Running state entry

-The application when submitted and placed into a queue is in the _New_ state. At this point there is no allocation or pending request present on the application. After one or more requests, _AllocationAsks_, are added the application moves into an _Accepted_ state. The _Accepted_ state is exited when the first _Allocation_ is added to the application. The application then transitions into the Starting state.
+The application when submitted and placed into a queue is in the _New_ state. At this point there is no allocation or pending request present on the application. After one or more requests, _AllocationAsks_, are added the application moves into an _Accepted_ state. The _Accepted_ state is exited when the first _Allocation_ is added to the application. The application then transitions into the Running state.

At this point a resource quota would be used by the application and the application should be considered as running from a tracking perspective. This means that the addition of the first _Allocation_ onto the application also must be the trigger point for the increase of the running applications. This trigger point for tracking is when the application is in the _Accepted_ state. This is also the point at which the group for the usage tracking needs to be set as described in the [group limitations](#group-limitations).

-Note that currently, the application state transition code block in application_state.go updates the application running queue metrics when the application enters _Running_ state. The metric must be updated to be consistent with the above definition of a running application. Linking this back to a state transition the entry into the Starting state should be used.
+Note that currently, the application state transition code block in application_state.go updates the application running queue metrics when the application enters _Running_ state. The metric must be updated to be consistent with the above definition of a running application. Linking this back to a state transition the entry into the Running state should be used.
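
The tracking rule in the changed paragraphs above (count an application as running when its first _Allocation_ is added while it is _Accepted_) can be sketched roughly as follows. This is a minimal illustration in Python, not the scheduler's Go code: `RunningAppTracker`, `on_allocation_added`, and the per-user counter are hypothetical names invented for this sketch.

```python
from collections import defaultdict


class RunningAppTracker:
    """Hypothetical per-user counter of running applications (illustration only)."""

    def __init__(self):
        self.running = defaultdict(int)  # user -> number of running applications
        self.counted = set()             # application IDs already counted

    def on_allocation_added(self, app_id, user, state):
        """Return the application state after an allocation is added."""
        # Only the first allocation, added while the application is Accepted,
        # increases the running count and moves the application to Running.
        if state == "Accepted" and app_id not in self.counted:
            self.counted.add(app_id)
            self.running[user] += 1
            return "Running"
        # Any later allocation leaves both the state and the counter unchanged.
        return state
```

With the _Starting_ state removed, the state change and the counter increment happen at the same point, which is why the revised text ties the metric to entry into the Running state.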

### Running state exit

@@ -407,4 +407,4 @@ An example below the approximate output for the groups endpoint for one group:
}
}
]
-```
+```
12 changes: 1 addition & 11 deletions docs/developer_guide/scheduler_object_states.md
@@ -38,14 +38,7 @@ An application can have the following states:
* New: A new application that is being submitted or created, from here the application transitions into the accepted state when it is ready for scheduling.
The first ask to be added will trigger the transition.
* Accepted: The application is ready and part of the scheduling cycle.
-On allocation of the first ask the application moves into a starting state.
-This state is part of the normal scheduling cycle.
-* Starting: The application has exactly one allocation confirmed this corresponds to one running container/pod.
-The application transitions to running if and when more allocations are added to the application.
-This state times out automatically to prevent applications that consist of just one allocation from getting stuck in this state.
-The current time out is set to 5 minutes, and cannot be changed.
-If after the timeout expires the application will auto transition to running.
-The state change on time out is independent of the number of allocations added.
+On allocation of the first ask the application moves into a running state.
This state is part of the normal scheduling cycle.
* Running: The state in which the application will spend most of its time.
Containers/pods can be added to and removed from the application.
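
As a reading aid, the revised transitions shown in this hunk (New to Accepted on the first ask, Accepted to Running on the first allocation, with no intermediate Starting state) can be sketched as a small lookup table. The event names `first_ask_added` and `first_allocation_added` are hypothetical labels chosen for this sketch; only the states visible in this hunk are modelled.

```python
# Hypothetical event names; the real scheduler uses its own identifiers.
TRANSITIONS = {
    ("New", "first_ask_added"): "Accepted",
    ("Accepted", "first_allocation_added"): "Running",
}


def next_state(state, event):
    """Return the state after `event`, or the unchanged state if no rule applies."""
    return TRANSITIONS.get((state, event), state)


def replay(events, state="New"):
    """Apply events in order and return the list of states visited."""
    history = [state]
    for event in events:
        new_state = next_state(state, event)
        if new_state != state:
            history.append(new_state)
            state = new_state
    return history
```

Because the table has no timeout-driven rule, the sketch also reflects that the fixed 5 minute Starting timeout described in the removed lines no longer applies.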
@@ -118,6 +111,3 @@ The node status changes based on the status provided by the resource manager (sh

### Task
![task state diagram](./../assets/k8shim-task-state.png)
-
-### Node
-![node state diagram](./../assets/k8shim-node-state.png)
37 changes: 0 additions & 37 deletions docs/user_guide/sorting_policies.md
@@ -86,43 +86,6 @@ All resources defined on the application will be taken into account when calcula

The result is that the resources available are spread equally over all applications that request resources.

-### StateAwarePolicy
-Short description: limit of one (1) application in Starting or Accepted state
-
-Config value: `stateaware`
-
-**DEPRECATED:** The `stateaware` policy is **deprecated** in YuniKorn 1.5.0 and
-will be **removed** in YuniKorn 1.6.0. To preserve backwards compatibility,
-`stateaware` will become an alias for `fifo` in YuniKorn 1.6.0 and later.
-Users are encouraged to migrate to `fifo` and utilize either gang scheduling or
-`maxapplications` to limit concurrency instead.
-
-This sorting policy requires an understanding of the application states.
-Applications states are described in the [application states](developer_guide/scheduler_object_states.md#application-state) documentation.
-
-Before sorting applications the following filters are applied to all applications in the queue:
-The first filter is based on the application state.
-The following applications pass through the filter and generate the first intermediate list:
-* all applications in the state _running_
-* _one_ (1) application in the _starting_ state
-* if there are _no_ applications in the _starting_ state _one_ (1) application in the _accepted_ state is added
-
-The second filter takes the result of the first filter as an input.
-The preliminary list is filtered again: all applications _without_ a pending request are removed.
-
-After filtering based on status and pending requests the applications that remain are sorted.
-The final list is thus filtered twice with the remaining applications sorted on create time.
-
-To recap the _staring_ and _accepted_ state interactions:
-The application in the _accepted_ state is only added if there is no application in the _starting_ state.
-The application in the _starting_ state does not have to have pending requests.
-Any application in the _starting_ state will prevent _accepted_ applications from being added to the filtered list.
-
-For further details see the [Example run](design/state_aware_scheduling.md#example-run) in the design document.
-
-The result is that already running applications that request resources will get resources first.
-A drip feed of one new applications is added to the list of running applications to be allocated after all running applications.
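
For readers comparing the removed policy with its replacement, the two-stage filter described in the deleted section can be sketched as below. This is an illustrative Python sketch, not YuniKorn's Go implementation: the `App` type and its fields are hypothetical, picking the oldest _starting_ or _accepted_ application is an assumption (the text only says "one"), and `fifo_candidates` assumes that `fifo` simply sorts applications with pending requests on create time.

```python
from dataclasses import dataclass


@dataclass
class App:
    # Hypothetical stand-in for a scheduler application object.
    app_id: str
    state: str        # "accepted", "starting" or "running"
    create_time: int  # submission timestamp; smaller means older
    pending: bool     # True if the application has a pending request


def stateaware_candidates(apps):
    """Filter and sort applications as the removed StateAwarePolicy text describes."""
    by_age = sorted(apps, key=lambda a: a.create_time)
    # First filter: all running applications, plus one starting application,
    # or one accepted application when nothing is starting.
    first = [a for a in by_age if a.state == "running"]
    starting = [a for a in by_age if a.state == "starting"]
    accepted = [a for a in by_age if a.state == "accepted"]
    if starting:
        first.append(starting[0])  # assumption: the oldest one is chosen
    elif accepted:
        first.append(accepted[0])  # assumption: the oldest one is chosen
    # Second filter: drop applications without a pending request.
    second = [a for a in first if a.pending]
    # Final step: sort what remains on create time.
    return sorted(second, key=lambda a: a.create_time)


def fifo_candidates(apps):
    """Assumed fifo behaviour: pending applications sorted on create time."""
    return sorted((a for a in apps if a.pending), key=lambda a: a.create_time)
```

Note how a _starting_ application without pending requests still blocks every _accepted_ application in `stateaware_candidates`, while `fifo_candidates` lets all pending applications through; that difference is why the deprecation notice points to gang scheduling or `maxapplications` for limiting concurrency.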

## Node sorting
The node sorting policy is set for a partition via the config.
Each partition can use a different policy.
2 changes: 1 addition & 1 deletion sidebars.js
@@ -93,7 +93,6 @@ module.exports = {
'design/generic_resource',
'design/priority_scheduling',
'design/resilience',
-'design/state_aware_scheduling',
'design/config_v2',
'design/scheduler_configuration',
]
@@ -106,6 +105,7 @@ module.exports = {
'archived_design/namespace_resource_quota',
'archived_design/predicates',
'archived_design/scheduler_core_design',
+'archived_design/state_aware_scheduling',
'archived_design/cross_queue_preemption',
'archived_design/pluggable_app_management',
]
