Read secrets for client-onboarding-token-validation #2827
base: main
services/ux-backend/handlers/onboarding/clienttokens/handler.go
Outdated
A suggestion: we are seeing this PR for the third time. The second time it was fine, since you weren't able to recover your GH account (remember, though, that you can't create a new account every time), but this last time it would have been better to focus on rebasing properly. Yes, GH has no issue with closing a PR and opening a new one, but for reviewers it is hard to re-review from the start.
674b54c
to
dc8a56b
Compare
I have tested this PR with the latest 4.18 build. I could see the keys getting exchanged when rotate signing key is clicked on the storageclient page.
services/ux-backend/handlers/onboarding/clienttokens/handler.go
Outdated
services/ux-backend/handlers/onboarding/clienttokens/handler.go
Outdated
@@ -225,7 +225,7 @@ func (r *StorageClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
 		Owns(&appsv1.Deployment{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
 		Owns(&corev1.Service{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
 		Owns(&corev1.ConfigMap{}, builder.MatchEveryOwner, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
-		Owns(&corev1.Secret{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
+		Owns(&corev1.Secret{}, builder.MatchEveryOwner, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Why do we need a MatchEveryOwner here? Isn't StorageCluster the controller owner of the secret we want to watch?
Ack, the controller owner reference is now set on the secrets.
This comment was not addressed at all! You are matching by controller ownership, so it does not make sense to match every owner.
Owner references were set during creation of the secrets in the onboarding job. When rotate signing key is clicked, reconciliation is not happening, so the onboarding token that needs to be created is failing.
Any suggestions for this issue?
If there is a controller owner ref on the secret and then someone deletes that secret then a delete event will be sent here. And because the item had an owner ref (with controller set) then a reconcile request will be queued. Now if that is not happening there can be a bug in a couple of places:
- The predicate might be the one that drops the event (it should not, but it might)
- The reconcile is initiated, but the reconcile code might not re-create the secret (this can happen because of a stale cache)
- Are we sure we are adding a controller reference on the secret? If not, and we are only adding an owner ref, then that is your bug
@mrudraia1 You need to identify the issue, and only after you fully understand what is going on you can suggest a fix.
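The difference between a plain owner ref and a controller ref in the list above can be illustrated with a small, dependency-free model. This is only a sketch and not controller-runtime code: `ownerRef`, `secretMeta`, `ownersToEnqueue` and `passesGenerationFilter` are hypothetical stand-ins, and the object names are made up. It assumes the behaviour discussed in this thread, namely that an `Owns(...)` watch without `builder.MatchEveryOwner` only follows the owner reference marked as controller, and that a generation-based predicate filters update events only.

```go
package main

import "fmt"

// ownerRef is a hypothetical, simplified stand-in for metav1.OwnerReference.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool // true only on the single "controller" owner reference
}

// secretMeta is a hypothetical stand-in for the metadata of a watched Secret.
type secretMeta struct {
	Name   string
	Owners []ownerRef
}

// ownersToEnqueue models how an event on a child object is mapped to
// reconcile requests for its owners. With matchEveryOwner=false only an
// owner reference flagged as controller is followed.
func ownersToEnqueue(obj secretMeta, ownerKind string, matchEveryOwner bool) []string {
	var names []string
	for _, ref := range obj.Owners {
		if ref.Kind != ownerKind {
			continue
		}
		if matchEveryOwner || ref.Controller {
			names = append(names, ref.Name)
		}
	}
	return names
}

// passesGenerationFilter models a generation-based predicate: only update
// events are filtered, create and delete events always pass.
func passesGenerationFilter(event string, oldGen, newGen int64) bool {
	if event != "update" {
		return true
	}
	return oldGen != newGen
}

func main() {
	// All names below are illustrative only.
	plainOwner := secretMeta{Name: "private-key", Owners: []ownerRef{
		{Kind: "StorageCluster", Name: "example-cluster"},
	}}
	controllerOwner := secretMeta{Name: "private-key", Owners: []ownerRef{
		{Kind: "StorageCluster", Name: "example-cluster", Controller: true},
	}}

	fmt.Println(ownersToEnqueue(plainOwner, "StorageCluster", false))      // []
	fmt.Println(ownersToEnqueue(plainOwner, "StorageCluster", true))       // [example-cluster]
	fmt.Println(ownersToEnqueue(controllerOwner, "StorageCluster", false)) // [example-cluster]
	fmt.Println(passesGenerationFilter("delete", 1, 1))                    // true
}
```

In this model a secret carrying only a plain owner ref never enqueues its StorageCluster unless every owner is matched, which is the situation the third bullet describes.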
Owner references were set during creation of the secrets in the onboarding job. When rotate signing key is clicked, reconciliation is not happening, so the onboarding token that needs to be created is failing.
[...]
Are we sure we are adding a controller reference on the secret? if not and we are only adding an owner ref then that is your bug
Ouch, so the issue all along is that the storagecluster is not getting a reconcile event when rotate keys is invoked (i.e., on deletion of the secrets), and not the delay between rotation and the kubelet updating the secret, as diagnosed in https://bugzilla.redhat.com/show_bug.cgi?id=2311012#c5?
@rchikatw was there a discussion with @mrudraia1 along these lines when he started to work on this? @mrudraia1 did you observe the same behavior when work started on this PR at the end of July? In the latest revision I don't see this change, which basically means we are just refactoring the code without any observable change in behavior and not fixing the issue.
@leelavg earlier, the secret keys were read from the volume mounts; this is now changed to read the secrets directly.
The OwnerReference on the secrets is changed to a ControllerReference.
It is correct that the secrets are read from volume mounts, but the bug was that when rotation was clicked the new secrets weren't realized, which is a two-step process:
1. The storagecluster controller gets an event and the onboarding job is rerun, creating new secrets.
2. The kubelet learns of the new secrets and they get updated in ux-backend.
The description in the BZ said the problem was in (2), but the observation in this comment says the root cause is probably (1), i.e., the storagecluster controller isn't getting an event.
Yes, now with the build of this PR that I tested, I can observe the following:
- The new onboarding token is created immediately after clicking Rotate signing key.
- Earlier I was using MatchEveryOwner to watch the secrets; now the ControllerOwner is set to watch the resources.
I am not aware of any discussions about changing the predicate, and I'm not sure why we are changing from OwnerRef to ControllerRef: Here. Because of this change, you might be experiencing issues with job creation not occurring when keys are rotated—this is just a guess. To my knowledge, a job is typically created when keys are rotated, and the QE team has tested this. You can also verify this in your cluster without implementing your change to see if the job is triggered when keys are rotated.
services/ux-backend/handlers/onboarding/clienttokens/handler.go
Outdated
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mrudraia1 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
4dac720
to
8198c71
Compare
ecddbf0
to
d31f826
Compare
infra issue
services/ux-backend/handlers/onboarding/clienttokens/handler.go
Outdated
e627033
to
bbae1f7
Compare
bbae1f7
to
36162f4
Compare
36162f4
to
e3bba8e
Compare
e3bba8e
to
f6ee295
Compare
controllers/util/provider.go
Outdated
 	}

-	Block, _ := pem.Decode(pemString)
+	Block, _ := pem.Decode(privateSecretKey)
We should not ignore decode errors. What if the key is corrupted?
Ack
services/ux-backend/handlers/onboarding/clienttokens/handler.go
Outdated
4f84a13
to
64dfa2f
Compare
8d4a61d
to
1a8e140
Compare
controllers/util/provider.go
Outdated
	Block, err1 := pem.Decode(privateSecretKey)
	if err1 != nil {
		return nil, fmt.Errorf("Failed to decode private key: %v", err1)
Lowercase Block, and as per the pem.Decode func signature the second return value isn't err but a slice of bytes, which should be empty.
-	Block, err1 := pem.Decode(privateSecretKey)
-	if err1 != nil {
-		return nil, fmt.Errorf("Failed to decode private key: %v", err1)
+	block, rest := pem.Decode(privateSecretKey)
+	if len(rest) > 0 {
+		return nil, fmt.Errorf("PEM block not found in private key: %s", rest)
pem.Decode returns a byte slice as its second value; changed as suggested by Leela.
I also mentioned lowercasing Block, as the capital letter doesn't have any extra significance.
ack
	klog.Info("Loading onboarding validation private Key")
	privateKey, err := util.LoadOnboardingValidationPrivateKey(r.Context(), cl, namespace)
	if err != nil {
		http.Error(w, fmt.Sprintf("Failed loading onboarding validation private: %v", err), http.StatusBadRequest)
-		http.Error(w, fmt.Sprintf("Failed loading onboarding validation private: %v", err), http.StatusBadRequest)
+		http.Error(w, fmt.Sprintf("Failed loading onboarding validation private key: %v", err), http.StatusBadRequest)
Ack
Please run make deps-update to update vendor.
dc1cc8c
to
dc950bc
Compare
@@ -81,7 +81,7 @@ func main() {
 		klog.Exitf("failed to delete public secret: %v", err)
 	}

-	err = controllerutil.SetOwnerReference(storageCluster, privateSecret, cl.Scheme())
+	err = controllerutil.SetControllerReference(storageCluster, privateSecret, cl.Scheme())
You would need additional permissions in order to add a ControllerReference; please make sure that you add the required permissions.
Added the permissions for ControllerReference
I believe Rewant mentioned adding additional permissions, not removing existing ones, which suggests Owner and Controller are needed. @rewantsoni am I right here?
The proposed RBAC would grant update access to storagecluster, which isn't good. Use a new section that grants only update on finalizers. The job doesn't have a controller, so watch isn't required.
Ack
Signed-off-by: Mrudraia1 <[email protected]>
This PR reads the onboarding secrets directly from Kubernetes instead of reading them from the volume mounts.
Whenever new onboarding secrets are created, it takes some time before they can be read from the volume mounts.
When the user clicks rotate onboarding keys, the old public and private keys are still served from the mounts and the new keys are mounted later, so this PR reads the keys directly from the Kubernetes secrets.