DFBUGS-1026: Provision to disable DRPC reconcile temporarily - WA of CG #1720
base: main
Conversation
The issue explains the problem, but what is the proposed solution? What is the user flow with this fix? Which flows should be affected?
The solution is not clear to me; I added some comments about the code style.
Force-pushed from ba359d9 to 25a0902
/hold not yet tested on the cluster
Force-pushed from 25a0902 to 431b5cb
Signed-off-by: Gowtham Shanmugasundaram <[email protected]>
Force-pushed from 431b5cb to 264182c
@nirs the description outlines both the problem and the proposed solution. As Gowtham mentioned, this is intended as a temporary fix and is expected to be removed in future versions. That said, I like the concept of pausing DRPC reconciliation, as it could offer valuable control in certain scenarios.
Yes, I also think this is a useful feature - being able to disable reconcile for a while sounds like a nice feature, even if it does not stop replication on the managed cluster. So we'd better add these options in a generic way and keep them.
We need to think about which operations are affected by disabling reconcile. The current change will not work for deleting the DRPC.
Should we disable reconcile only for a live DRPC, and reconcile normally if the DRPC is deleted? A rough sketch of that follows below.
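One possible shape for such a check, reusing the schedulingEnabled helper added in this PR; the wrapper name and placement are only illustrative and not part of the change:

func skipPausedReconcile(drpc *rmn.DRPlacementControl, ramenConfig *rmn.RamenConfig) bool {
	// A DRPC that is being deleted is never skipped, so finalizer and
	// cleanup handling still run while reconciliation is paused.
	if !drpc.GetDeletionTimestamp().IsZero() {
		return false
	}

	return !schedulingEnabled(drpc, ramenConfig)
}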
// StartDRPCReconciliationPaused is a flag to pause all DRPC reconciliation temporarily.
// This flag can be overridden by the annotation drplacementcontrol.ramendr.openshift.io/reconciliationUnpaused
StartDRPCReconciliationPaused bool `json:"startDRPCReconciliationPaused,omitempty"`
I would remove the Start from the name; this flag disables reconcile even if it was already started. It means that the DRPC is paused by default and requires unpausing to start reconciling.
I'm not sure about paused and unpaused, maybe we can use more common terms like disabled and enabled?
The option can be:
drpcReconciliationDisabled
and the annotation:
reconciliation-enabled
The logic seems clear: if the option is set, all DRPCs are disabled, except those with the explicit "enabled" annotation. It is common that a local flag overrides a global flag; a rough sketch of that precedence follows after this comment.
@BenamarMk what do you think?
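A minimal sketch of the suggested precedence, using the names proposed above; the config field and the fully qualified annotation key are assumptions, not the final API:

func reconciliationEnabled(drpc *rmn.DRPlacementControl, ramenConfig *rmn.RamenConfig) bool {
	// Global switch: if reconciliation is not disabled in the ramen config,
	// every DRPC reconciles as usual.
	if !ramenConfig.DRPCReconciliationDisabled {
		return true
	}

	// Local override: an explicitly enabled DRPC reconciles even when the
	// global flag is set.
	return drpc.GetAnnotations()["drplacementcontrol.ramendr.openshift.io/reconciliation-enabled"] == "true"
}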
The Start prefix here is simply an indication that disabling is intended for new DRPCs only. But we can drop it; I'll let Gowtham decide. Also, I like the word Pause instead of disable.
I am not really certain we need this; to stop reconciliation, just scale down the deployment to 0 replicas. That would scale down all controllers (i.e., policy/cluster/etc.), but this seems excessive at present.
Once the annotation is removed for CG enabling, if a future use case arises where a single controller from the set needs to be disabled, this may be considered.
@@ -2467,3 +2480,8 @@ func (r *DRPlacementControlReconciler) drpcHaveCommonClusters(ctx context.Contex
	return drpolicyClusters.Intersection(otherDrpolicyClusters).Len() > 0, nil
}

func schedulingEnabled(drpc *rmn.DRPlacementControl, ramenConfig *rmn.RamenConfig) bool {
	force := drpc.GetAnnotations()[ReconciliationUnpaused]
force does not match the new name; it was fine for the old name.
@@ -152,6 +155,16 @@ func (r *DRPlacementControlReconciler) Reconcile(ctx context.Context, req ctrl.R
		return ctrl.Result{}, err
	}

	// Workaround: to add drplacementcontrol.ramendr.openshift.io/is-cg-enabled annotation
	// for the application workload that is DR protected from the UI.
	if !schedulingEnabled(drpc, ramenConfig) {
Since the flags are about reconciliation, this helper should be reconciliationEnabled()
	if !schedulingEnabled(drpc, ramenConfig) {
		logger.Info("DRPC reconcile is paused. To continue the reconcile, " +
			"either add the drplacementcontrol.ramendr.openshift.io/reconciliationUnpaused: 'true' " +
			"annotation or update StartDRPCReconciliationPaused in the ramen ConfigMap.")
This log will appear many times - each time a watched resource changes, and it is a very verbose log.
It would be better to set the status to report that the DRPC is disabled. In that case this will happen only once, since on the next reconcile the status will not change; a sketch follows below.
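A sketch of that idea, recording a status condition instead of logging; the condition type, reason, and import paths are assumptions, and it presumes the DRPC status exposes a []metav1.Condition slice like the controller's other conditions:

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rmn "github.com/ramendr/ramen/api/v1alpha1"
)

// setPausedCondition records the paused state once on the DRPC status,
// so it does not need to be logged on every reconcile pass.
func setPausedCondition(drpc *rmn.DRPlacementControl) {
	meta.SetStatusCondition(&drpc.Status.Conditions, metav1.Condition{
		Type:               "ReconciliationPaused", // illustrative condition type
		Status:             metav1.ConditionTrue,
		Reason:             "PausedByRamenConfig", // illustrative reason
		Message:            "DRPC reconciliation is paused via the ramen ConfigMap",
		ObservedGeneration: drpc.GetGeneration(),
	})
}

The controller would still need to push this status to the API server through its existing status update path for it to be visible on the DRPC.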
})

When("Application deployed for the first time", func() {
	It("Should not have any drpc status", func() {
It would be better to have a status reporting that the DRPC is disabled.
populateDRClusters()
})

When("Application deployed for the first time", func() {
DRPC created for the first time?
Application deployment may not be related to the DRPC; it can be deployed before the DRPC was created.
})

When("Update ramen config to override DRPC reconcile", func() {
	It("Should have some drpc status after reconcile", func() {
Do you mean that disabling reconciliation will keep the status before the reconciliation was disabled?
It would be more useful if we had a disabled field in the DRPC status, and it would reflect the current state of the DRPC. If we disabled the DRPC after it was reconciled, the status would be the same as the last status, with Status.disabled: "true".
When("Deleting DRPolicy with DRPC references", func() { | ||
It("Should retain the deleted DRPolicy in the API server", func() { | ||
By("\n\n*** DELETE drpolicy ***\n\n") | ||
deleteDRPolicyAsync() |
This tests that drpolicy is not deleted?
When("Deleting DRPolicy with DRPC references", func() { | ||
It("Should retain the deleted DRPolicy in the API server", func() { | ||
By("\n\n*** DELETE drpolicy ***\n\n") |
This was copied from another test? Not sure why we need all these newlines and ***.
When("Deleting user PlacementRule", func() { | ||
It("Should cleanup DRPC", func() { | ||
By("\n\n*** DELETE User PlacementRule ***\n\n") | ||
deleteUserPlacementRule(UserPlacementRuleName, DefaultDRPCNamespace) |
What does "cleanup DRPC" mean? If this deletes the DRPC, the next test is incorrect since the DRPC was already deleted.
But I don't think that deleting the DRPC will work with the current code, since we return from the reconcile very early, before we reach the check for a deleted DRPC.
Can we just scale the reconciler deployment to 0, create the DRPC via the console, edit the DRPC with the required annotation, and then scale the deployment back up?
@ShyamsundarR initially, that was the approach, and that's how we test it now. However, during the testing process, we consistently found ourselves cleaning up the resulting mess after forgetting to scale down the deployment. If we think that's acceptable though, then we can close the PR. The idea was to set the new global config once and forget about it.
Is it possible to add the missing annotation in the UI instead of adding this new "paused" state to ramen?
I understand someone forgetting it and so making it easier, but as the feature is still in a dev preview state, I would prefer we not add config changes to help the qualification efforts. IOW, we can close this and (soon) shift to testing with the default methods in the future.
cc: @BenamarMk for the confirmation to close this PR
It looks like the general agreement is that for a dev preview release, asking users to manually scale down Ramen at the hub seems acceptable. This change was proposed to address a common issue where forgetting to scale down Ramen before deploying a new workload led to additional cleanup efforts after the fact. The goal was to introduce a "set it once and forget it" option in the Ramen config. This way, no actions would be taken on new workloads until users explicitly unpause it, serving as a built-in reminder. That said, I am aligned with the decision to close the PR.
@ShyamsundarR the UI epic is targeted for ODF 4.19. The current KCS has some gaps for 4.17 and 4.18: https://access.redhat.com/articles/7091592. I am OK to close this PR, but we need to correct the KCS. Shall I ask the documentation team to add the steps to scale down the ramen hub pod deployment?
Issue: https://issues.redhat.com/browse/DFBUGS-1026
Description
The issue is that the user has to opt into the multi-volume consistency group way of replication via the DRPC (DRPlacementControl CR) by adding an annotation. The limitation is that the user has to add this annotation while creating the DRPC; they can't do it after the DRPC is created. The problem is the UI: users can't test this tech preview feature via the UI because the UI creates the DRPC for them, and they can't add the annotation after the UI creates it. KCS: https://access.redhat.com/articles/7091592
The proposed solution is:
To temporarily halt DRPC reconciliation, a new ramen configuration option is being introduced. With it set, a DRPC created through the user interface will not be reconciled. The user can then add the annotations
drplacementcontrol.ramendr.openshift.io/is-cg-enabled
and drplacementcontrol.ramendr.openshift.io/reconciliationUnpaused
to resume the reconcile and enable the consistency group for a specific DRPC. This is a short-term fix for the dev preview feature.