Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read secrets for client-onboarding-token-validation #2827

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mrudraia1
Copy link

This PR reads the secrets instead of reading the secrets from the volume mounts.
whenever the new onboarding secrets are created, it takes more time to read the secrets from the volume mounts,
The user clicks the rotate onboarding keys, the kubernetes still uses the old public, private keys , the new keys are mounted later, So this PR will read the secrets directly from the kubernetes secrets.

controllers/storagecluster/storageclient.go Outdated Show resolved Hide resolved
controllers/storagecluster/storageclient.go Outdated Show resolved Hide resolved
controllers/util/provider.go Show resolved Hide resolved
controllers/util/provider.go Outdated Show resolved Hide resolved
controllers/util/provider.go Outdated Show resolved Hide resolved
services/ux-backend/handlers/common.go Outdated Show resolved Hide resolved
services/ux-backend/main.go Outdated Show resolved Hide resolved
tools/csv-merger/csv-merger.go Outdated Show resolved Hide resolved
@leelavg
Copy link
Contributor

leelavg commented Oct 1, 2024

a suggestion, we are seeing this PR for third time, second time it's fine that you weren't able to recover GH (remember you can't create new a/c every-time though) but last time it's better if you can focus on rebasing properly.

yes, GH doesn't have any issue w/ closing & opening a new PR but for reviewers it's kinda hard to relook from the start.

@mrudraia1
Copy link
Author

I have tested this PR, with the latest 4.18 build. I could see the keys getting exchanged when the rotate signing key is clicked in storageclient page.

@leelavg
Copy link
Contributor

leelavg commented Oct 15, 2024

I have tested this PR, with the latest 4.18 build. I could see the keys getting exchanged when the rotate signing key is clicked in storageclient page.

  • ack, only minor comments now. @nb-ohad possible for a final review?

controllers/storagecluster/storageclient.go Outdated Show resolved Hide resolved
@@ -225,7 +225,7 @@ func (r *StorageClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
Owns(&appsv1.Deployment{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Owns(&corev1.Service{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Owns(&corev1.ConfigMap{}, builder.MatchEveryOwner, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Owns(&corev1.Secret{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Owns(&corev1.Secret{}, builder.MatchEveryOwner, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a MatchEveryOwner here? Isn't StorageCluster the controller owner of the secret we want to watch?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controller owner reference set to secrets, ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was not addressed as tall! You are matching by controller ownership, it does not make sense to match every owner

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Owner reference were set during creation of secrets in on-boarding job, when the rotate signing key is clicked the re-consiliation is not happening, The onboarding-token needs to be created which is failing.

Any suggestion for this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a controller owner ref on the secret and then someone deletes that secret then a delete event will be sent here. And because the item had an owner ref (with controller set) then a reconcile request will be queued. Now if that is not happening there can be a bug in a couple of places:

  • The predicate might be the one the drops the event (should not, but might be)
  • The reconcile is initiated but the reconcile code might not re-create the secret (might happen because of stale cache)
  • Are we sure we are adding a controller reference on the secret? if not and we are only adding an owner ref then that is your bug

@mrudraia1 You need to identify the issue, and only after you fully understand what is going on you can suggest a fix.

Copy link
Contributor

@leelavg leelavg Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Owner reference were set during creation of secrets in on-boarding job, when the rotate signing key is clicked the re-consiliation is not happening, The onboarding-token needs to be created which is failing.
[...]
Are we sure we are adding a controller reference on the secret? if not and we are only adding an owner ref then that is your bug

@rchikatw was there a discussion w/ @mrudraia1 along these lines when he started to work on this? @mrudraia1 did you observe the same behavior from when the work started on this PR, end of July and in latest revision I don't see this change which basically means we are just refactoring the code w/o any observable behavior and not fixing the issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leelavg earlier secret keys are read from the volume mounts , which is now changed to read directly by reading the secrets.

OwnerReference of the secrets changed to ControllerReference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secrets are read from volumemounts is correct but the bug was when the rotation was clicked new secrets aren't realized which is a 2 step process.

  1. storagecluster controller gets an event and onboarding job will be rerun creating new secrets
  2. presence of new secrets are known by kubelet and get updated in ux-backend

the description in BZ mentioned that the problem was in (2) but the observation in this comment says the root cause probably is (1), ie, storagecluster controller isn't getting an event

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, know with this PR build I tested, I can observe the

  1. creation of new on-boarding_token immediately after the clicking of Rotate signing key.
  2. earlier I was using MatchEveryOwner to watch the secrets, Know the ControllerOwner is set to watch the resources.

Copy link
Contributor

@rchikatw rchikatw Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of any discussions about changing the predicate, and I'm not sure why we are changing from OwnerRef to ControllerRef: Here. Because of this change, you might be experiencing issues with job creation not occurring when keys are rotated—this is just a guess. To my knowledge, a job is typically created when keys are rotated, and the QE team has tested this. You can also verify this in your cluster without implementing your change to see if the job is triggered when keys are rotated.

controllers/util/provider.go Show resolved Hide resolved
controllers/util/provider.go Outdated Show resolved Hide resolved
controllers/util/provider.go Outdated Show resolved Hide resolved
services/ux-backend/main.go Outdated Show resolved Hide resolved
services/ux-backend/main.go Outdated Show resolved Hide resolved
services/ux-backend/main.go Outdated Show resolved Hide resolved
services/ux-backend/main.go Outdated Show resolved Hide resolved
Copy link
Contributor

openshift-ci bot commented Oct 16, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mrudraia1
Once this PR has been reviewed and has the lgtm label, please ask for approval from nb-ohad. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrudraia1 mrudraia1 force-pushed the onboarding branch 6 times, most recently from 4dac720 to 8198c71 Compare October 16, 2024 07:06
@mrudraia1 mrudraia1 requested review from nb-ohad and leelavg October 16, 2024 09:05
@mrudraia1 mrudraia1 force-pushed the onboarding branch 2 times, most recently from ecddbf0 to d31f826 Compare November 6, 2024 14:07
@deepsm007
Copy link

infra issue
/test images

@@ -225,7 +225,7 @@ func (r *StorageClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
Owns(&appsv1.Deployment{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Owns(&corev1.Service{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Owns(&corev1.ConfigMap{}, builder.MatchEveryOwner, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Owns(&corev1.Secret{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Owns(&corev1.Secret{}, builder.MatchEveryOwner, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was not addressed as tall! You are matching by controller ownership, it does not make sense to match every owner

controllers/util/provider.go Outdated Show resolved Hide resolved
controllers/util/provider.go Outdated Show resolved Hide resolved
services/ux-backend/main.go Outdated Show resolved Hide resolved
services/ux-backend/main.go Outdated Show resolved Hide resolved
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 12, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 12, 2024
@mrudraia1 mrudraia1 requested a review from nb-ohad November 13, 2024 08:00
}

Block, _ := pem.Decode(pemString)
Block, _ := pem.Decode(privateSecretKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not ignore decode errors. What if the key is corrupted?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

controllers/util/provider.go Outdated Show resolved Hide resolved
@mrudraia1 mrudraia1 force-pushed the onboarding branch 2 times, most recently from 4f84a13 to 64dfa2f Compare November 19, 2024 13:54
@mrudraia1 mrudraia1 force-pushed the onboarding branch 3 times, most recently from 8d4a61d to 1a8e140 Compare November 22, 2024 06:05
Comment on lines 114 to 116
Block, err1 := pem.Decode(privateSecretKey)
if err1 != nil {
return nil, fmt.Errorf("Failed to decode private key: %v", err1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower case Block and as per pem.Decode func signature the second return value isn't err but a slice of bytes which should be empty.

Suggested change
Block, err1 := pem.Decode(privateSecretKey)
if err1 != nil {
return nil, fmt.Errorf("Failed to decode private key: %v", err1)
block, rest := pem.Decode(privateSecretKey)
if len(rest) > 0 {
return nil, fmt.Errorf("PEM block not found in private key: %s", rest)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pem.Decode returns byte, changed as suggested by Leela

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also mentioned to lowercase Block as it doesn't have any extra significance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

klog.Info("Loading onboarding validation private Key")
privateKey, err := util.LoadOnboardingValidationPrivateKey(r.Context(), cl, namespace)
if err != nil {
http.Error(w, fmt.Sprintf("Failed loading onboarding validation private: %v", err), http.StatusBadRequest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
http.Error(w, fmt.Sprintf("Failed loading onboarding validation private: %v", err), http.StatusBadRequest)
http.Error(w, fmt.Sprintf("Failed loading onboarding validation private key: %v", err), http.StatusBadRequest)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

klog.Info("Loading onboarding validation private Key")
privateKey, err := util.LoadOnboardingValidationPrivateKey(r.Context(), cl, namespace)
if err != nil {
http.Error(w, fmt.Sprintf("Failed loading onboarding validation private: %v", err), http.StatusBadRequest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
http.Error(w, fmt.Sprintf("Failed loading onboarding validation private: %v", err), http.StatusBadRequest)
http.Error(w, fmt.Sprintf("Failed loading onboarding validation private key: %v", err), http.StatusBadRequest)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

@leelavg leelavg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls run make deps-update to update vendor.

@mrudraia1 mrudraia1 force-pushed the onboarding branch 5 times, most recently from dc1cc8c to dc950bc Compare December 4, 2024 11:24
@@ -81,7 +81,7 @@ func main() {
klog.Exitf("failed to delete public secret: %v", err)
}

err = controllerutil.SetOwnerReference(storageCluster, privateSecret, cl.Scheme())
err = controllerutil.SetControllerReference(storageCluster, privateSecret, cl.Scheme())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would need additional permissions inorder to add a ControllerReference, please make sure that you add the required permissions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the permissions for ControllerReference

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Rewant mentioned adding additional permissions, not removing existing ones, which suggests Owner and Controller are needed. @rewantsoni am I right here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed rbac will provide update access to storagecluster which isn't good, use a new section for only update on finalizers and the job doesn't has controller and watch isn't required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants