Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pause lifecyle changes and add drained status #2028

Merged
merged 6 commits into from
Sep 6, 2024

Conversation

kohlisid
Copy link
Contributor

@kohlisid kohlisid commented Sep 4, 2024

fixes #2027
fixes #1991

DrainedOnPause field is added here

  1. It will be set when the pipeline is moved to a Paused state after a successful drain has been completed
  2. It will always be reset whenever phase is moved back to running

@kohlisid
Copy link
Contributor Author

kohlisid commented Sep 4, 2024

  1. Pipeline created before ISB Service, and desiredPhase=Paused
skohli@macos-JQWR9T560R numaflow % kubectl get pl
NAME              PHASE    VERTICES   AGE   MESSAGE
simple-pipeline   Failed   3          13s   ISB Service not found.

We keep the phase as failed and do not try to pause
#1992 (comment)

@vigith vigith changed the title chore: pause lifecyle changes and drained status fix: pause lifecyle changes and drained status Sep 4, 2024
Copy link

codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 53.84615% with 6 lines in your changes missing coverage. Please review.

Project coverage is 61.89%. Comparing base (8fc99bb) to head (ec3adf9).

Files with missing lines Patch % Lines
pkg/apis/numaflow/v1alpha1/pipeline_types.go 0.00% 4 Missing ⚠️
pkg/reconciler/pipeline/controller.go 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2028      +/-   ##
==========================================
+ Coverage   61.85%   61.89%   +0.04%     
==========================================
  Files         318      318              
  Lines       28979    28986       +7     
==========================================
+ Hits        17924    17940      +16     
+ Misses      10098    10096       -2     
+ Partials      957      950       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kohlisid kohlisid changed the title fix: pause lifecyle changes and drained status fix: pause lifecyle changes and add drained status Sep 5, 2024
@kohlisid
Copy link
Contributor Author

kohlisid commented Sep 5, 2024

 drainedOnPause: true
  lastUpdated: "2024-09-05T07:36:36Z"
  mapUDFCount: 1
  message: Pipeline paused
  observedGeneration: 5
  phase: Paused
  reduceUDFCount: 0
  sinkCount: 1
  sourceCount: 1
  udfCount: 1
  vertexCount: 3
skohli@macos-JQWR9T560R numaflow % kubectl get pl                        
NAME              PHASE    VERTICES   AGE   MESSAGE
simple-pipeline   Paused   3          8h    Pipeline paused

@kohlisid
Copy link
Contributor Author

kohlisid commented Sep 5, 2024

  lastUpdated: "2024-09-05T15:36:32Z"
  mapUDFCount: 1
  observedGeneration: 6
  phase: Running
  reduceUDFCount: 0
  sinkCount: 1
  sourceCount: 1
  udfCount: 1
  vertexCount: 3
skohli@macos-JQWR9T560R numaflow % kubectl get pl                        
NAME              PHASE     VERTICES   AGE   MESSAGE
simple-pipeline   Running   3          8h   

Signed-off-by: Sidhant Kohli <[email protected]>
@kohlisid
Copy link
Contributor Author

kohlisid commented Sep 5, 2024

skohli@macos-JQWR9T560R numaflow %   kubectl patch pl simple-pipeline --type=merge --patch '{"spec": {"lifecycle": {"desiredPhase": "Paused"}}}' 

pipeline.numaflow.numaproj.io/simple-pipeline patched (no change)
skohli@macos-JQWR9T560R numaflow % kubectl get pl                                                                                               
NAME              PHASE    VERTICES   AGE   MESSAGE
simple-pipeline   Failed   3          11s   ISB Service not found.

@kohlisid
Copy link
Contributor Author

kohlisid commented Sep 5, 2024

skohli@macos-JQWR9T560R numaflow %   kubectl patch pl simple-pipeline --type=merge --patch '{"spec": {"lifecycle": {"desiredPhase": "Paused"}}}'

pipeline.numaflow.numaproj.io/simple-pipeline patched
skohli@macos-JQWR9T560R numaflow % kubectl get po                                                                                               
NAME                               READY   STATUS      RESTARTS   AGE
simple-pipeline-cln-ac4416-l6xds   0/1     Completed   0          24s
skohli@macos-JQWR9T560R numaflow % kubectl get pl
NAME              PHASE    VERTICES   AGE   MESSAGE
simple-pipeline   Failed   3          16s   ISB Service not found.

@kohlisid
Copy link
Contributor Author

kohlisid commented Sep 5, 2024

  drainedOnPause: false
  lastUpdated: "2024-09-05T16:48:56Z"
  mapUDFCount: 1
  observedGeneration: 4
  phase: Running
  reduceUDFCount: 0
  sinkCount: 1
  sourceCount: 1
  udfCount: 1
  vertexCount: 3
skohli@macos-JQWR9T560R numaflow % kubectl get pl                        
NAME              PHASE     VERTICES   AGE    MESSAGE
simple-pipeline   Running   3          6m9s  

Signed-off-by: Sidhant Kohli <[email protected]>
@kohlisid kohlisid marked this pull request as ready for review September 5, 2024 16:53
@kohlisid kohlisid requested a review from juliev0 September 5, 2024 16:53
@kohlisid kohlisid self-assigned this Sep 5, 2024
@@ -599,7 +620,8 @@ func buildVertices(pl *dfv1.Pipeline) map[string]dfv1.Vertex {
copyVertexTemplate(pl, vCopy)
copyVertexLimits(pl, vCopy)
replicas := int32(1)
if pl.Status.Phase == dfv1.PipelinePhasePaused {
// If the desired phase is pause or we are in the middle of pausing we should not start any vertex replicas
if isLifecycleChange(pl) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I just noticed that isLifecycleChange() == true in the case that the oldPhase was Paused but the desiredPhase==Running. I haven't read enough of the other code yet to confirm that that's okay here ?

Copy link
Contributor Author

@kohlisid kohlisid Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be a resume pipeline case and I kept it like that for it to be handled by the
resume-logic
So the replicas will be scaled up there

if oldPhase := pl.Status.Phase; pl.Spec.Lifecycle.GetDesiredPhase() == dfv1.PipelinePhasePaused ||
oldPhase == dfv1.PipelinePhasePaused || oldPhase == dfv1.PipelinePhasePausing {
// Regular pipeline change
// This should be happening in call cases to ensure a clean initialization regardless of the lifecycle phase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This should be happening in call cases to ensure a clean initialization regardless of the lifecycle phase
// This should be happening in all cases to ensure a clean initialization regardless of the lifecycle phase

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that's what you meant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging this, will address it in another PR.

@whynowy whynowy merged commit cf90e25 into numaproj:main Sep 6, 2024
24 checks passed
whynowy pushed a commit that referenced this pull request Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants