Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use round robin to distribute requests to all repo server pods #353

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

djeebus
Copy link
Collaborator

@djeebus djeebus commented Jan 14, 2025

Issues like #349 seem to indicate that our requests are overloading specific pods, rather than utilizing the whole cluster.

Copy link

Temporary image available at ghcr.io/zapier/kubechecks:0.0.0-pr353.

if we avoid sharing, this should let k8s build new connections to new pods
@djeebus
Copy link
Collaborator Author

djeebus commented Jan 14, 2025

Success! This reduced the checks time from 153 seconds down to 35 seconds, or about a 78% reduction!

@zapier-sre-bot
Copy link
Collaborator

Mergecat's Review

Click to read mergecats review!

😼 Mergecat review of pkg/argo_client/client.go

@@ -23,13 +23,10 @@ import (
 )
 
 type ArgoClient struct {
-	client apiclient.Client
-
-	repoClient         repoapiclient.RepoServerServiceClient
-	namespace          string
-	k8s                kubernetes.Interface
-	k8sConfig          *rest.Config
-	sendFullRepository bool
+	client    apiclient.Client
+	k8s       kubernetes.Interface
+	k8sConfig *rest.Config
+	cfg       config.ServerConfig
 }
 
 func NewArgoClient(
@@ -56,25 +53,27 @@ func NewArgoClient(
 		return nil, err
 	}
 
+	return &ArgoClient{
+		cfg:       cfg,
+		client:    argo,
+		k8s:       k8s.ClientSet(),
+		k8sConfig: k8s.Config(),
+	}, nil
+}
+
+func (a *ArgoClient) createRepoServerClient() (repoapiclient.RepoServerServiceClient, *grpc.ClientConn, error) {
 	log.Info().Msg("creating client")
-	tlsConfig := tls.Config{InsecureSkipVerify: cfg.ArgoCDRepositoryInsecure}
-	conn, err := grpc.NewClient(cfg.ArgoCDRepositoryEndpoint,
+	tlsConfig := tls.Config{InsecureSkipVerify: a.cfg.ArgoCDRepositoryInsecure}
+	conn, err := grpc.NewClient(a.cfg.ArgoCDRepositoryEndpoint,
 		grpc.WithTransportCredentials(
 			credentials.NewTLS(&tlsConfig),
 		),
 	)
 	if err != nil {
-		return nil, errors.Wrap(err, "failed to create client")
+		return nil, nil, errors.Wrap(err, "failed to create client")
 	}
 
-	return &ArgoClient{
-		repoClient:         repoapiclient.NewRepoServerServiceClient(conn),
-		client:             argo,
-		namespace:          cfg.ArgoCDNamespace,
-		k8s:                k8s.ClientSet(),
-		k8sConfig:          k8s.Config(),
-		sendFullRepository: cfg.ArgoCDSendFullRepository,
-	}, nil
+	return repoapiclient.NewRepoServerServiceClient(conn), conn, nil
 }
 
 // GetApplicationClient has related argocd diff code https://github.com/argoproj/argo-cd/blob/d3ff9757c460ae1a6a11e1231251b5d27aadcdd1/cmd/argocd/commands/app.go#L899

Feedback & Suggestions:

  1. Separation of Concerns: The refactoring to move the creation of the RepoServerServiceClient into a separate method createRepoServerClient is a good step towards separation of concerns. However, ensure that this method is called where necessary to maintain the functionality that was previously in NewArgoClient.

  2. Error Handling: The createRepoServerClient method now returns a *grpc.ClientConn along with the RepoServerServiceClient. Ensure that the connection is properly closed when it is no longer needed to prevent resource leaks. Consider using defer conn.Close() after the connection is successfully established.

  3. Configuration Management: The cfg field in ArgoClient is a good addition for managing configuration. Ensure that all necessary configuration values are accessed through this field to maintain consistency.

  4. Logging: The logging statement "creating client" is now inside the createRepoServerClient method. Make sure that this log message is still relevant and correctly placed to provide useful information during debugging.

  5. Code Consistency: Ensure that the removal of fields like namespace and sendFullRepository from ArgoClient does not affect other parts of the code that might rely on these fields. If they are still needed, consider how they can be accessed or managed differently.

  6. Security: The use of InsecureSkipVerify in tls.Config can lead to security vulnerabilities. Ensure that this is only used in development or when absolutely necessary, and consider adding a warning or documentation to highlight the potential risks.


😼 Mergecat review of pkg/argo_client/manifests.go

@@ -126,8 +126,8 @@ func (a *ArgoClient) generateManifests(ctx context.Context, app v1alpha1.Applica
 		return nil, errors.Wrap(err, "failed to get settings")
 	}
 
-	settingsMgr := argosettings.NewSettingsManager(ctx, a.k8s, a.namespace)
-	argoDB := db.NewDB(a.namespace, settingsMgr, a.k8s)
+	settingsMgr := argosettings.NewSettingsManager(ctx, a.k8s, a.cfg.ArgoCDNamespace)
+	argoDB := db.NewDB(a.cfg.ArgoCDNamespace, settingsMgr, a.k8s)
 
 	repoTarget := source.TargetRevision
 	if pkg.AreSameRepos(source.RepoURL, pullRequest.CloneURL) && areSameTargetRef(source.TargetRevision, pullRequest.BaseRef) {
@@ -141,7 +141,7 @@ func (a *ArgoClient) generateManifests(ctx context.Context, app v1alpha1.Applica
 	}
 
 	var packageDir string
-	if a.sendFullRepository {
+	if a.cfg.ArgoCDSendFullRepository {
 		log.Info().Msg("sending full repository")
 		packageDir = repo.Directory
 	} else {
@@ -226,11 +226,25 @@ func (a *ArgoClient) generateManifests(ctx context.Context, app v1alpha1.Applica
 		RefSources:         refSources,
 	}
 
+	// creating a new client forces grpc to create a new connection, which causes
+	//the k8s load balancer to select a new pod, balancing requests among all repo-server pods.
+	repoClient, conn, err := a.createRepoServerClient()
+	if err != nil {
+		return nil, errors.Wrap(err, "error creating repo client")
+	}
+	defer conn.Close()
+
 	log.Info().Msg("generating manifest with files")
-	stream, err := a.repoClient.GenerateManifestWithFiles(ctx)
+	stream, err := repoClient.GenerateManifestWithFiles(ctx)
 	if err != nil {
 		return nil, errors.Wrap(err, "failed to get manifests with files")
 	}
+	defer func() {
+		err := stream.CloseSend()
+		if err != nil {
+			log.Error().Err(err).Msg("failed to close stream")
+		}
+	}()
 
 	log.Info().Msg("sending request")
 	if err := stream.Send(&repoapiclient.ManifestRequestWithFiles{

Feedback & Suggestions:

  1. Namespace Configuration: The change from a.namespace to a.cfg.ArgoCDNamespace improves configurability by using a configuration struct. Ensure that a.cfg.ArgoCDNamespace is properly initialized and validated to avoid potential nil or empty values. 🛡️

  2. Repository Sending Flag: The change from a.sendFullRepository to a.cfg.ArgoCDSendFullRepository follows the same pattern of using configuration. Ensure that this configuration is well-documented and defaults are set appropriately. 📋

  3. gRPC Client Creation: Creating a new gRPC client (repoClient) for each request can help with load balancing, but it may introduce overhead. Consider the performance implications and test to ensure this change does not degrade performance under high load. ⚖️

  4. Stream Closure: The addition of defer stream.CloseSend() is a good practice to ensure the stream is closed properly. However, ensure that any errors from CloseSend are handled appropriately, as logging alone may not be sufficient if the error impacts subsequent operations. 🛠️

  5. Error Handling: The error handling for stream.CloseSend() logs the error but does not propagate it. Consider whether this error should be returned to the caller to handle it more effectively. 🔄

Overall, these changes improve configurability and potentially enhance load balancing, but be mindful of the performance and error handling implications. 🚀



Dependency Review

Click to read mergecats review!

No suggestions found

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants