fix: create buffers and buckets before updating Vertices #2112
Signed-off-by: Julie Vogelman <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
+ Coverage   64.25%   64.45%   +0.19%
==========================================
  Files         324      324
  Lines       30650    30650
==========================================
+ Hits        19695    19755      +60
+ Misses       9913     9852      -61
- Partials     1042     1043       +1
This looks good to me. @kohlisid and @KeranYang, can you double-check whether the order switch causes any issues?
To tell you the truth, I haven't actually tested it yet; I wanted to see whether it would pass CI. If we can build an image for it, I can start incorporating it into Numaplane, which is aggressively testing this. We can keep it in Draft for a little while until we feel confident.
What this does mean is that the Buffers and Buckets the Vertices are currently using will disappear, and those Vertices will not be able to work with the isbsvc. The Vertices will then get updated and reconciled by the Vertex Controller, which replaces the old Pods with new ones, and I think those should come up. Of course, if the Pipeline is paused first, the Vertices will already be down when the buffers and buckets get wiped and replaced, so nothing should break in that case.
Thanks, Julie, for making the change. I don't think it will break anything, so it should be safe to merge. My only concern is that we haven't validated the fix. +1 on using the Numaplane e2e to aggressively test it. Meanwhile, we should develop our own Numaflow test case to catch this kind of issue; a test with a pipeline topology change would work. @juliev0, if we merge this in, will the Numaplane e2e pick it up immediately? If so, we can monitor.
Thanks @juliev0
I saw this same thing happen again when running our e2e test, so I guess it's not that rare. Meanwhile, I don't believe I've seen any adverse consequences of this change, so I'll mark it "ready for review". Feel free to merge now, or we can give it a little more time if you prefer.
Thanks, @juliev0. Please continue monitoring and let us know whether this fixes the issue.
Fixes #2083
I simply swapped the order so that (1) the Job creation occurs before (2) the creation/deletion/update of the Vertex CRs.
Previously, the Jobs were created after the Vertex CRs were changed, even though Job creation depends on the Vertex specs (newBuffers and newBuckets are calculated based on Vertex specs).
So if a transient error occurred after or in the middle of creating/updating the Vertex CRs, the next reconciliation would incorrectly calculate the Jobs it needed to create, because some or all of the Vertex specs had already changed.