
Too many logs produced by container #564

Closed

deyanp opened this issue Jun 30, 2021 · 23 comments
Labels
bug Something isn't working

Comments

@deyanp

deyanp commented Jun 30, 2021


What steps did you take and what happened:
Looked at the Log Analytics -> Logs with the following query:

ContainerLog
| summarize count() by replace('xxx.azurecr.io/', '', Image)
| order by count_ desc

and saw on the top 2 places:

[screenshot of the query results]

This seems to be because of the secret rotation interval of 2m (which I have enabled), and probably because I have more than 50 pods in my cluster using the CSI driver / AAD Pod Identity.

What did you expect to happen:
Wondering how to decrease the number of logs ...

Anything else you would like to add:
No

Which access mode did you use to access the Azure Key Vault instance:
Pod Identity

Environment:

  • Secrets Store CSI Driver version: (use the image tag): 2.2.0
  • Azure Key Vault provider version: (use the image tag): 0.0.15
  • Kubernetes version: (use kubectl version and kubectl get nodes -o wide): 1.20.5
  • Cluster type: (e.g. AKS, aks-engine, etc): AKS
@deyanp deyanp added the bug Something isn't working label Jun 30, 2021
@nilekhc
Contributor

nilekhc commented Jun 30, 2021

Hi @deyanp, thanks for reporting this issue. Could you tell us which log level you are currently using? The default is 0. You can set the log verbosity when installing via the Helm chart with the --set logVerbosity=<level> flag.

@deyanp
Author

deyanp commented Jun 30, 2021

@nilekhc I am installing using YAML manifests, and I don't see this arg specified anywhere inside them ..

@nilekhc
Contributor

nilekhc commented Jun 30, 2021

You can use the -v arg in the YAML. For example, -v=10.
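For reference, this is roughly where such an arg would sit in a manifest; the container name and surrounding structure here are illustrative assumptions, not copied from the actual deployment YAML:

```yaml
# Illustrative fragment only: setting klog verbosity via the -v arg
# discussed above. Container name and structure are assumptions.
spec:
  containers:
    - name: provider-azure-installer
      args:
        - -v=0
```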

@deyanp
Author

deyanp commented Jul 1, 2021

Thanks, @nilekhc, I will set an arg -v=X, but how do I choose the right value, so that:

  1. I do not get the standard info messages that are currently filling the logs
  2. I still get warnings, errors and important/critical messages?

How did you come up with the -v=10? Why not 9 or 11?

@deyanp
Author

deyanp commented Jul 1, 2021

@nilekhc, note that I already read #387, but it did not give me any hope that I can properly configure the log level with -v ... there seems to be no way to show only warnings/errors but not info/debug?

@deyanp
Author

deyanp commented Jul 1, 2021

also this one provides no solution: #358

@nilekhc
Contributor

nilekhc commented Jul 1, 2021

@deyanp -v=10 was just an example, to try out different logging levels that might be applicable to your scenario. Also, it looks like there is an open issue about this in klog.

@deyanp
Author

deyanp commented Jul 1, 2021

@nilekhc, you know that -v=10 will make it worse, right? The higher the value, the more logs are included; this is the opposite of what I am used to from other logging libraries ...
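That is indeed how klog's -v works: it is an inclusion threshold, so a message logged at V(n) is emitted whenever the configured verbosity is >= n, and raising -v always produces more output. A minimal stdlib mimic of that semantic (this is not the real klog API, just an illustration):

```go
package main

import "fmt"

// Minimal mimic of klog's -v semantics (not the real klog API):
// a message logged at level n is emitted only when the configured
// verbosity is >= n, so raising -v always produces MORE output.
type logger struct{ verbosity int }

func (l logger) V(level int) bool { return l.verbosity >= level }

// emitted counts how many of the given call-site levels would
// actually produce a log line at the configured verbosity.
func emitted(verbosity int, levels []int) int {
	l := logger{verbosity: verbosity}
	count := 0
	for _, lvl := range levels {
		if l.V(lvl) {
			count++
		}
	}
	return count
}

func main() {
	levels := []int{0, 2, 5, 10} // hypothetical call sites: V(0), V(2), V(5), V(10)
	fmt.Println(emitted(0, levels))  // 1: only the V(0) message
	fmt.Println(emitted(10, levels)) // 4: everything up to V(10)
}
```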

@deyanp
Author

deyanp commented Jul 4, 2021

This is really not nice, and leaves customers with 2 options:

  1. Higher costs for Azure Log Analytics (we are talking about hundreds of EUR here, not a few EUR!)

  2. Increasing the --rotation-poll-interval; however, this exposes us more to the other issue, "Pod Env Vars from SecretProviderClass (envFrom+secretRef) not updated upon deployment" #368, see #368 (comment) ...

Not nice ...

@aramase
Member

aramase commented Jul 12, 2021

@deyanp Thank you for the feedback. The current logs generated for the provider are per secret/key/cert object defined in the SecretProviderClass. It generates 3 logs per object, which provide info on auth and the success/error response, making debugging easier in case of issues. There is no difference in the gRPC calls from driver to provider between the initial mount and rotation calls, which is why the logs are consistent across all calls. We can certainly explore increasing the log verbosity for 2 of the 3 logs per secret/key/cert object, but that will make debugging a little harder: in case of errors, you would need to explicitly increase the provider log verbosity and re-enable it. The correct long-term solution will be the ability to set a log threshold and only get the error/warning logs, which is tracked here.
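To see why this adds up, a back-of-the-envelope calculation based on the numbers in this thread (3 log lines per object per rotation, a 2-minute poll interval, and deyanp's ~50 pods; one object per pod is an assumption for illustration):

```go
package main

import "fmt"

// Back-of-the-envelope log volume from the description above:
// 3 log lines per secret/key/cert object per rotation, polled
// every 2 minutes. Pod count and objects-per-pod are assumptions.
func linesPerDay(pods, objectsPerPod, linesPerObject, pollsPerDay int) int {
	return pods * objectsPerPod * linesPerObject * pollsPerDay
}

func main() {
	pollsPerDay := 24 * 60 / 2 // a poll every 2 minutes = 720/day
	fmt.Println(linesPerDay(50, 1, 3, pollsPerDay)) // 108000 lines/day
}
```

Even with a single object per pod, that is over a hundred thousand ingested log lines per day, which is consistent with the Log Analytics costs deyanp describes.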

@deyanp
Author

deyanp commented Jul 19, 2021

@aramase If I set the level to error, I expect errors to appear in the logs but informational messages not to ... I'm not really sure why I would need an informational log every 2 minutes saying that secret x has been refreshed ... In fact, currently the informational logs are written regardless of whether the secret changed or not; I would expect a log message saying a secret was refreshed only if the underlying secret actually changed. Why would I need an info log every 2 minutes saying the same (unchanged) secret has been refreshed?
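The change-only logging deyanp asks for could be sketched like this; this is a hypothetical illustration, not the provider's actual code (the function and map names are assumptions):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hypothetical sketch: report a rotation only when the fetched
// secret value actually differs from the last observed value,
// instead of logging on every poll.
var lastSeen = map[string][32]byte{}

func rotated(name string, value []byte) bool {
	sum := sha256.Sum256(value)
	if prev, ok := lastSeen[name]; ok && prev == sum {
		return false // unchanged: stay quiet
	}
	lastSeen[name] = sum
	return true // first observation or a real change: worth a log line
}

func main() {
	fmt.Println(rotated("db-conn", []byte("v1"))) // true: first observation
	fmt.Println(rotated("db-conn", []byte("v1"))) // false: unchanged
	fmt.Println(rotated("db-conn", []byte("v2"))) // true: value changed
}
```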

@github-actions

github-actions bot commented Aug 3, 2021

This issue is stale because it has been open 14 days with no activity. Please comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Aug 3, 2021
@aramase aramase removed the stale label Aug 3, 2021
@pierluigilenoci
Contributor

@deyanp the problem is that the bug is inside the Klog library that many of the Kubernetes components use. To solve the problem, Klog needs to be fixed.

Ref: kubernetes/klog#212

@deyanp
Author

deyanp commented Aug 4, 2021

@pierluigilenoci, check the issue you referenced yourself - dims there implies the issue is not with klog but with the client applications (secrets-store-csi-driver-provider-azure) ..

@pierluigilenoci
Contributor

@deyanp I have read the Klog code and I have also proposed a PR.
I assure you that the error is on the Klog side. 😉

@github-actions

This issue is stale because it has been open 14 days with no activity. Please comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Aug 26, 2021
@pierluigilenoci
Contributor

🚀

@github-actions github-actions bot removed the stale label Aug 27, 2021
@github-actions

This issue is stale because it has been open 14 days with no activity. Please comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Sep 10, 2021
@deyanp
Author

deyanp commented Sep 10, 2021

@pierluigilenoci could you then please answer the question of / convince dims in kubernetes/klog#212 that this is an issue in klog, and that your PR makes sense?

@pierluigilenoci
Contributor

@deyanp I've already tried.

They basically said it is a known bug, but there is no intention to fix it, because a fix would change the logger's behavior for practically all Kubernetes components, and any fix must maintain backward compatibility. Obviously that is impossible: the logger works badly, fixing it necessarily means changing its behavior, and changing the behavior is not allowed. So we're stuck.

@pierluigilenoci
Contributor

As soon as I have a few hours of time I will try to find a solution that is backward compatible but I don't promise anything. 😞

@deyanp
Author

deyanp commented Sep 10, 2021

As soon as I have a few hours of time I will try to find a solution that is backward compatible but I don't promise anything.

@pierluigilenoci , appreciated, I understand that you may not have time for this!

@aramase as a workaround, if I change --rotation-poll-interval from 2 to 30 minutes, for example (to work around the too-many-logs problem this way), will I hit the other issue again, where your last comment was "We'll also explore what it would mean to enable the Kubernetes secret update during new deployments with the rotation feature also enabled.", or has that been solved already?
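For reference, the flag change under discussion might look like this in the driver's container spec; the container name and placement are assumptions for illustration, not copied from the actual manifest:

```yaml
# Illustrative fragment only: raising the rotation poll interval
# from the default discussed in this thread. Names are assumptions.
spec:
  containers:
    - name: secrets-store
      args:
        - --rotation-poll-interval=30m
```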

@github-actions github-actions bot removed the stale label Sep 11, 2021
@deyanp
Author

deyanp commented Sep 21, 2021

Ok, I guess this is going nowhere, so I found an alternative solution (in case another poor soul needs it) - see below.

  1. Upon pod start, I used the trick Environment.SetEnvironmentVariable(key, Environment.GetEnvironmentVariable(key) |> get from key vault). So basically I retrieve the connection string from Key Vault and then set it back into the same env var, before the env var is read, for example by the WebJobs SDK, which needs a clear-text connection string in an env var

  2. Deleted all secret provider classes and secrets, and reduced the rotation frequency from every 2m to every 60m, as only the Traefik certs now need to be rotated

Basically we stopped using the CSI driver in 99% of the cases, as it was used mainly for mapping secrets from Key Vault to env vars.

@deyanp deyanp closed this as completed Sep 21, 2021

4 participants