-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(bug): Ensure windows agent stability using hubble/legacy helm values #1128
base: main
Are you sure you want to change the base?
Conversation
@@ -130,8 +130,8 @@ data: | |||
enabledPlugin: {{ .Values.enabledPlugin_win }} | |||
metricsInterval: {{ .Values.metricsInterval }} | |||
metricsIntervalDuration: {{ .Values.metricsIntervalDuration }} | |||
enableTelemetry: {{ .Values.enableTelemetry }} | |||
enablePodLevel: {{ .Values.enablePodLevel }} | |||
enableTelemetry: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put back these changes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the linux OS, the default hubble values work but for windows, they failed with these default values. We can either create new helm values for the windows OS specifically or leave it as is. I don't mind
The enableTelemetry
requires application insights and leaving the default as false prevents the agent crashing if its not defined. If the consumer wants it enabled then they can simply update it.
The enablePodLevel
causes RBAC issues with the retina-agent service account. As it's currently not supported on Windows, I think a default of false makes sense.
In legacy, both values are currently set to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can define a default like this: {{ .Values.enablePodLevel | default false }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original values exist but it's currently set to true which is causing the Windows agent to crash but not for Linux. It's not a matter of having a default boolean 😊
- controller.exe --config ./retina/config.yaml | ||
- powershell.exe | ||
- -command | ||
- .\setkubeconfigpath.ps1; ./controller.exe --config ./retina/config.yaml --kubeconfig ./kubeconfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is an issue with 1.30 and oidc with this approach. we should remove these changes for these k8s versions. cc @rbtr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The custom kubeconfig is only required with containerd <1.7 and only AKS 1.27 (LTS) is still using that, maybe we remove it completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what specific ask here is. Should the custom kubeconfig
only be enabled for K8s version <= 1.30?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the following code in both agent daemonsets (hubble/legacy) and windows.yaml
manifest and the agent still crashes:
command:
- powershell.exe
- -command
{{- if semverCompare ">=1.30" .Capabilities.KubeVersion.GitVersion }}
- $env:CONTAINER_SANDBOX_MOUNT_POINT/controller.exe --config ./retina/config.yaml
{{- else }}
- .\setkubeconfigpath.ps1; ./controller.exe --config ./retina/config.yaml --kubeconfig ./kubeconfig
{{- end }}
Description
This PR aims to fix the stability of the retina windows agent. There were 4 causes identified and each commit resolves one respectively.
Telemetry enabled also causes the agent to panic if application insights is not defined. User can change the config map as desired but default values should not cause the agent to crash (3rd commit)
kubeconfig
file cannot be found for the legacy chart values. Executing thesetkubeconfigpath.ps1
was required for the container setup (4th commit)Related Issue
#1122
Checklist
git commit -S -s ...
). See this documentation on signing commits.Screenshots (if applicable) or Testing Completed
Each commit corresponding image was built and tested on the cluster to confirm each fix works!
Additional Notes
First three problems were experienced when deploying retina using the hubble path and the last issue was experienced when deploying retina using the legacy path
Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.