✨ AWS CNI Mode #461

Merged
merged 2 commits into zalando-incubator:master from TGIP on Jan 31, 2022

Conversation

@universam1 universam1 commented Dec 17, 2021

New operational mode: direct Ingress Pod access

implements #298 Support for target-type IP

This adds a new operational mode that registers the Ingress pods directly as targets in the target groups, instead of the current mode where the Ingress pods are reached through a HostPort.

The core idea is based on the fact that standard AWS EKS clusters running the AWS VPC CNI have their pods as first-class members of the VPC. Their IPs are directly reachable from the ALB/NLB target groups, just like the nodes, which means there is no need for the HostPort indirection.

There are several drivers for and advantages of accessing the pod directly instead of via a HostPort:

HostNetwork and HostPort conflict with best practices

ref: https://kubernetes.io/docs/concepts/configuration/overview/#services

Delayed, eventual consistency of AutoscalingGroup to ALB/NLB target group updates

This has been the biggest trouble in operations: the update of node members in target groups is slower than the physical replacement of the nodes, which ends up in a black hole with no Ingress available for a while. We regularly face downtimes, especially when spot interruptions or ASG node rollings happen, because the ALB/NLB takes up to 2 minutes to reflect the group change. For smaller clusters this leads to no Skipper instance being registered and hence no target available to forward traffic to.
With this new mode the registration happens independently of ASGs and instantly; from scheduling a pod to it serving traffic from the ALB takes less than 10 seconds!

Independent scaling from nodes

With HostPort there is a dependency on available nodes in order to scale the Ingress.
Also, the Ingress pod cannot be replaced in place; it has to be terminated first and then rescheduled. For a certain time, which can be more than a minute, that node is offline as an Ingress.
With this mode host networking and HostPort become obsolete, which allows node-independent scaling of Skipper pods! Skipper becomes a regular Deployment and its ReplicaSet can be sized independently of the cluster size, which simplifies operations especially for smaller clusters. We are using a custom HPA metric tied to the node group size to counteract this Deployment/DaemonSet hybrid combination, which is now obsolete!

Synced de/registering and instantaneous response

The core idea is event-based registration with Kubernetes using a pod Informer that receives immediate notifications about pod changes, which allows near zero-delay updates of the target groups.

Registration happens as soon as the pod has received an IP from the AWS VPC. Hence the ALB/NLB health check starts monitoring already while the pod is being scheduled, so it serves at the earliest possible moment. Lab tests show pods serving ALB traffic well under 10s after scheduling!

Deregistration is bound to the Kubernetes event. That means the LB is now in sync with the cluster and will stop sending traffic before the pod is actually terminated. This implements safe deregistration without traffic loss. Lab tests show that even under aggressive deployment scaling no packet loss is measured!

TG without unhealthy targets

Since the IP-based TGs are now managed by this controller, the targets represent pods and thus all of them are shown as healthy; otherwise they are cleaned up by this controller.

Implementation details:

  • client-go Informer: these high-level functions provide convenient access to Kubernetes event registrations. Since event registration is the key to fast responses and is efficient compared to high-rate polling, using these standard factory methods seems reasonable (see the sketch below).
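
A minimal sketch of how such a pod informer could be wired with client-go (illustrative only, not the exact code of this PR; the namespace, label selector, and channel shape are assumptions):

```go
package example

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// WatchIngressPods is a hypothetical helper: on every pod add/update/delete it
// publishes the current set of running pod IPs to endpointChan so the caller
// can resync the ALB/NLB target groups.
func WatchIngressPods(ctx context.Context, client kubernetes.Interface, endpointChan chan<- []string) error {
	factory := informers.NewSharedInformerFactoryWithOptions(client, 0,
		informers.WithNamespace("kube-system"), // assumed namespace
		informers.WithTweakListOptions(func(o *metav1.ListOptions) {
			o.LabelSelector = "application=skipper-ingress" // assumed label selector
		}),
	)
	informer := factory.Core().V1().Pods().Informer()

	// Build the endpoint list from the informer's local store and publish it.
	publish := func() {
		var ips []string
		for _, obj := range informer.GetStore().List() {
			if pod, ok := obj.(*v1.Pod); ok && pod.Status.PodIP != "" && pod.DeletionTimestamp == nil {
				ips = append(ips, pod.Status.PodIP)
			}
		}
		endpointChan <- ips
	}

	// Register handlers before starting the factory so no event is missed.
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { publish() },
		UpdateFunc: func(oldObj, newObj interface{}) { publish() },
		DeleteFunc: func(obj interface{}) { publish() },
	})

	factory.Start(ctx.Done())
	if !cache.WaitForCacheSync(ctx.Done(), informer.HasSynced) {
		return fmt.Errorf("timed out waiting for pod cache sync")
	}
	<-ctx.Done()
	return nil
}
```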

Tests

  • successful transition of a TG from type Instance to type IP and vice versa
  • the controller registers pods that are discovered
  • the controller deregisters pods that are in "terminating" status
  • the controller recovers the desired state by "resyncing" if manual intervention happened on the TG
  • it removes pods that are killed or dead

HostNetwork and HostPort combinations tested:

| access mode | HostNetwork | HostPort | Result | Notes |
| --- | --- | --- | --- | --- |
| Instance | true | true | ok | status quo |
| IP | true | true | ok | PodIP == HostIP --> limited benefits |
| IP | false | true | ok | PodIP != HostIP --> limited scaling, node bound |
| IP | true | false | N/A | impossible, HN implies HP |
| IP | false | false | OK | goal achieved: free scaling and HA |

Example logs:

time="2021-12-17T15:36:36Z" level=info msg="Deleted Pod skipper-ingress-575548f66-k99vs IP:172.16.49.249" 
time="2021-12-17T15:36:36Z" level=info msg="Deregistering CNI targets: [172.16.49.249]" 
time="2021-12-17T15:36:37Z" level=info msg="New Pod skipper-ingress-575548f66-qff2q IP:172.16.60.193"
time="2021-12-17T15:36:40Z" level=info msg="Registering CNI targets: [172.16.60.193]"

misc

  • extended the service account with the required RBAC permissions to watch/list pods
  • added an example of Skipper without HostPort and HostNetwork

@universam1
Contributor Author

fixes #298

cc @szuecs

@szuecs
Member

szuecs commented Dec 17, 2021

@universam1 thanks, sounds like a great idea for aws-vpc cni.
Does the change also work with plain hostnetwork without nodeport?
We don't use nodeport service. See also https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/cluster/manifests/skipper/deployment.yaml#L47-L56

One drawback is that we need informers and because of that client-go WATCH (internal to informers). We tried to omit client-go because of kubernetes/client-go#374 (comment) and zalando-incubator/stackset-controller#263, but since kubernetes/kubernetes#95981 was done, maybe it's also fine to use client-go now.

@universam1
Contributor Author

universam1 commented Dec 18, 2021

Thanks @szuecs for the review!

Does the change also work with plain hostnetwork without nodeport?

Please have a look at the test matrix:

| access mode | HostNetwork | HostPort | Result | Notes |
| --- | --- | --- | --- | --- |
| Instance | true | true | ok | status quo |
| IP | true | true | ok | PodIP == HostIP --> limited benefits |
| IP | false | true | ok | PodIP != HostIP --> limited scaling, node bound |
| IP | true | false | N/A | impossible, HN implies HP |
| IP | false | false | OK | goal: free scaling and HA |

From my testing it is impossible to enable HostNetwork but disable HostPort. AFAIK this combination is not available, as host networking implies the pod port mapping.

We don't use nodeport service

Likewise, we don't either! Sorry, I mixed up NodePort with HostPort: the former is a Service type and the latter a port on the host. I was only referring to the host port; I've corrected the description.

drawback is that we need informers and because of that client-go WATCH (internal to informers)

Curious, I was unaware of this issue. I wonder, since the informer is used across SIG projects such as the ClusterAutoscaler and is even the suggested pattern for controllers: is the issue you faced a corner case?

@universam1 universam1 force-pushed the TGIP branch 3 times, most recently from 80db96d to a04d016 on December 18, 2021 10:49
@szuecs
Member

szuecs commented Dec 18, 2021

Thanks for the testing table, looks great to me.
Shared informers are AFAIK only a different API working on a cache, internally using watch and hopefully timeouts, which we can check or if not try to pass a Transport with Timeouts when we create the client.
You will not observe the issue often, and only if you investigate "node issues" that likely everyone would just terminate, because that is one way to fix it fast. For ingress (skipper, this one and external-dns) we were conservative and used things that work reliably. I think external-dns changed it, or we added passing a transport to it in the early days.
If we change to informers we should change all of it to informers in my opinion.
Maybe as a follow up PR.

@universam1
Contributor Author

we can check or if not try to pass a Transport with Timeouts when we create the client.

Thank you for the context, pretty impressive deep dive! Issues like this are incredibly challenging to surface!
AFAIK the client defaults to https://pkg.go.dev/net/http#DefaultTransport, which looks like it incorporates timeouts and H2, at least for Go 1.16!?
The rest.Client supports Timeout as well https://github.com/kubernetes/client-go/blob/4ae0d1917adff2d93ea4c6a7ae3a9e5c19280b9f/rest/transport.go#L32

Since I'm not familiar with this issue, can you help me out what the desired config would look like?

If we change to informers we should change all of it to informers in my opinion. Maybe as a follow up PR.

IMHO a full refactor could simplify things, but it might render this PR too large, perhaps impossible, to land.

@szuecs
Member

szuecs commented Dec 20, 2021

we can check or if not try to pass a Transport with Timeouts when we create the client.

Thank you for the context, pretty impressive deep dive! Issues like this are incredibly challenging to surface! AFAIK the client defaults to https://pkg.go.dev/net/http#DefaultTransport, which looks like it incorporates timeouts and H2, at least for Go 1.16!?

Interesting, not sure if this helps in this case. On the other hand client-go uses H2 ping frames to detect broken connections, so it should be fine.

The rest.Client supports Timeout as well https://github.com/kubernetes/client-go/blob/4ae0d1917adff2d93ea4c6a7ae3a9e5c19280b9f/rest/transport.go#L32

Since I'm not familiar with this issue, can you help me out what the desired config would look like?

I would just start by passing a custom Transport; you can use the DefaultTransport for that.
If I were to do it, I would likely use https://pkg.go.dev/github.com/zalando/skipper/net#NewTransport.
Let's see if we observe errors and then we can work on a fix for those. Maybe H2 ping frames already detect everything we need.
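
For illustration, one possible shape (an assumption on my side, not a final implementation): set explicit dial timeouts on the rest.Config instead of a full custom Transport, since a blanket http.Client timeout would also cut off the long-running WATCH requests the informers rely on:

```go
package example

import (
	"context"
	"net"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// NewKubeClient builds an in-cluster client with an explicit dialer so new
// connections to the API server cannot hang forever. The timeout values are
// examples, not recommendations from this PR.
func NewKubeClient() (*kubernetes.Clientset, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}

	dialer := &net.Dialer{
		Timeout:   10 * time.Second, // TCP connect timeout
		KeepAlive: 30 * time.Second, // detect dead peers on idle connections
	}
	cfg.Dial = func(ctx context.Context, network, address string) (net.Conn, error) {
		return dialer.DialContext(ctx, network, address)
	}

	return kubernetes.NewForConfig(cfg)
}
```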

If we change to informers we should change all of it to informers in my opinion. Maybe as a follow up PR.

IMHO a full refactor could simplify things, but it might render this PR too large, perhaps impossible, to land.

Agreed, it's out of scope, but we should have an issue so that we can work on the migration later.

@universam1
Contributor Author

universam1 commented Dec 27, 2021

I would likely use https://pkg.go.dev/github.com/zalando/skipper/net#NewTransport.

@szuecs I'm struggling to leverage the skipper lib as a custom Transport here, running into the error from kubernetes/client-go#452: "using a custom transport with TLS certificate options or the insecure flag is not allowed". I don't see TLSClientConfig being exported so that it could be provided for the TLS setup, and disabling TLS doesn't sound like an option to me. Thanks for any hint!
Found a solution by providing an http.Transport to the skipper Options, which can be configured accordingly.

@universam1
Contributor Author

@szuecs Added a complete set of unit tests, implemented the Skipper NewTransport, and tested in the lab; I believe this should be ready to go.
Anything missing?

Resolved review thread: go.mod
@szuecs
Member

szuecs commented Jan 3, 2022

@universam1 from my side only DCO :)

@universam1 universam1 force-pushed the TGIP branch 3 times, most recently from 712b195 to 653679c on January 3, 2022 13:27
@szuecs
Member

szuecs commented Jan 3, 2022

Run make lint
golangci-lint run ./...
level=error msg="Running error: context loading failed: failed to load packages: timed out to load packages: context deadline exceeded"
level=error msg="Timeout exceeded: try increasing it by passing --timeout option"

Restarted. In case it happens again, please check why it takes so long, and if it really does take that long, please increase the timeout.

@universam1
Contributor Author

universam1 commented Jan 3, 2022

please check why it takes so long and if it really takes so long increase the timeout, please

It requires a couple of adjustments to the CI config as far as I can see; fine from my side if that's desired.

@szuecs
Member

szuecs commented Jan 3, 2022

yes, please

@universam1 universam1 force-pushed the TGIP branch 2 times, most recently from 782d97a to cd5f94b on January 4, 2022 08:50
@szuecs
Member

szuecs commented Jan 4, 2022

👍

@universam1
Contributor Author

Added a section in the README to describe that mode, anything missing @szuecs?

Resolved review threads (some outdated): README.md, aws/adapter.go, worker.go, kubernetes/pods.go
Review thread on kubernetes/pods.go:
// PodInformer is an event handler registered for Pod events; it builds a local list of valid and relevant pods
// and sends an event to the endpoint channel, triggering a resync of the targets.
func (a *Adapter) PodInformer(ctx context.Context, endpointChan chan<- []string) error {
podEndpoints := sync.Map{}
Member

It seems we need to pre-populate the podEndpoints map before launching the informer, otherwise it is empty after a controller restart and would trigger an update after receiving a single pod change.

Member

@AlexanderYastrebov AlexanderYastrebov Jan 20, 2022

As far as I can tell, on controller restart:

  • the main cycle will send a list of the relevant target groups.
  • this informer will receive a change of a single pod and would publish an updated list of endpoints of size 1, since podEndpoints is empty.
    SetTargetsOnCNITargetGroups will then take the difference between the registered targets and this single-item endpoint list and will deregister the other, still valid endpoints.

The "guard" only verifies that TGs and EPs were received at least once, but the update itself assumes that the endpoint list contains all available EPs. A sketch of the problematic diff follows below.

Contributor Author

That case does not exist because the informer has long been in sync before the TGs are enumerated. It flushes those first messages. There are several seconds in between.

Contributor Author

I guess @AlexanderYastrebov you should take some time to explore this concept in a lab to learn how it behaves

Contributor Author

@universam1 universam1 Jan 21, 2022

#461 (comment) doesn't work; the events are only generated by informer.Run, not by factory.Start. That's why we cannot omit informer.Run, and the semaphore via atomic.StoreInt64 has no effect here either, unfortunately! 😞

Member

Hypothetical or not, I think if the problem exists we need to tackle it.
We sometimes have more than 1k Ingress objects in a cluster, with an SLA that 40k is fine. This can easily lead to Kubernetes being slower than the AWS API response. time.Sleep is never a solution; I would rather accept a TODO comment that needs to be addressed some day.

What if we just list endpoints once on creation to populate the cache?
One issue that could happen: if we do not execute an update operation on the cloud load balancers to refresh the endpoint list, then a deployment of kube-ingress-aws-controller could miss a newly created endpoint while it was down. I think the right thing would be to make sure at start that we list endpoints through the informer (warm up the cache) and use that list to trigger updates to all cloud load balancers. After that we can rely on update events, roughly as sketched below.
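
A rough sketch of that warm-up idea (illustrative only, assuming a shared informer factory and the endpoint channel used elsewhere in this PR):

```go
package example

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
)

// SeedEndpoints warms up the informer cache and publishes one complete endpoint
// list, so load balancers converge even if pods changed while the controller was down.
func SeedEndpoints(ctx context.Context, factory informers.SharedInformerFactory, endpointChan chan<- []string) error {
	pods := factory.Core().V1().Pods()
	informer := pods.Informer()

	factory.Start(ctx.Done())
	if !cache.WaitForCacheSync(ctx.Done(), informer.HasSynced) {
		return fmt.Errorf("pod cache did not sync")
	}

	// The lister reads from the freshly synced local cache, not the API server.
	list, err := pods.Lister().List(labels.Everything())
	if err != nil {
		return err
	}
	endpoints := make([]string, 0, len(list))
	for _, p := range list {
		if p.Status.PodIP != "" && p.DeletionTimestamp == nil {
			endpoints = append(endpoints, p.Status.PodIP)
		}
	}
	endpointChan <- endpoints // one full update; afterwards events keep it in sync
	return nil
}
```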

Contributor Author

@universam1 universam1 Jan 21, 2022

That's actually a smart idea to determine when the cache equilibrium is reached - I like that and it doesn't sound hard to implement...

Contributor Author

@universam1 universam1 Jan 24, 2022

@szuecs @AlexanderYastrebov Please check if you like this version. Using the list is convenient in several ways, as it delivers the warm cache right from the start!
The subsequent informer updates result in a no-op, since we reduce over the map anyway via LoadOrStore and LoadAndDelete (see the small sketch below).
That means no warm-cache sync validation is required!

The beauty is that even in a hard race we will correctly process a pod deletion right after the initial list just fine.
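
For illustration, the idempotency argument in miniature (plain sync.Map with hypothetical names, not the PR's exact code): duplicate add or delete events collapse into no-ops, so the initial List plus overlapping informer events converge to the same set.

```go
package main

import (
	"fmt"
	"sync"
)

type endpointSet struct {
	m sync.Map // pod name -> pod IP
}

// add returns true only the first time a pod is seen, so duplicate events
// (e.g. the initial List plus an informer Add) trigger at most one resync.
func (s *endpointSet) add(name, ip string) bool {
	_, loaded := s.m.LoadOrStore(name, ip)
	return !loaded
}

// remove returns true only if the pod was actually tracked.
func (s *endpointSet) remove(name string) bool {
	_, loaded := s.m.LoadAndDelete(name)
	return loaded
}

// snapshot lists the currently known endpoints.
func (s *endpointSet) snapshot() []string {
	var ips []string
	s.m.Range(func(_, v interface{}) bool {
		ips = append(ips, v.(string))
		return true
	})
	return ips
}

func main() {
	s := &endpointSet{}
	fmt.Println(s.add("skipper-1", "172.16.0.10")) // true: first sighting
	fmt.Println(s.add("skipper-1", "172.16.0.10")) // false: duplicate event, no-op
	fmt.Println(s.remove("skipper-1"))             // true: pod removed
	fmt.Println(s.remove("skipper-1"))             // false: already gone
	fmt.Println(s.snapshot())                      // empty set
}
```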

Member

I think we should not overcomplicate things, see also #461.
@AlexanderYastrebov do you agree?

@universam1 universam1 force-pushed the TGIP branch 3 times, most recently from 8ef03e3 to 377e6c9 on January 24, 2022 11:38
Comment on lines +29 to +30
informer := factory.Core().V1().Pods().Informer()
factory.Start(ctx.Done())
Member

factory.Start() apparently starts the informers, but we configure the informer's EventHandler later, which seems wrong.

From this perspective informer.Run(ctx.Done()) also feels wrong, as factory.Start already runs it.

Member

so Kubernetes sig-apimachinery says that it's not an issue because List is used before Watch, which means there is no race.

Member

@szuecs
In another discussion #461 (comment) this issue kubernetes/kubernetes#92777 (comment) was brought up, which is exactly about the problem we have now: configuring the informer after the factory start.

Member

@szuecs szuecs Jan 25, 2022

Then we should just move the AddEventHandler call, with its body, to before Start(), roughly as sketched below.
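
Roughly like this (a sketch only, reusing the factory and ctx identifiers from the snippet quoted above and assuming the client-go cache package is imported; handler bodies elided):

```go
informer := factory.Core().V1().Pods().Informer()

// Attach the handler first so no event between Start() and AddEventHandler() is lost.
informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    func(obj interface{}) { /* register pod IP */ },
	UpdateFunc: func(oldObj, newObj interface{}) { /* handle terminating pods */ },
	DeleteFunc: func(obj interface{}) { /* deregister pod IP */ },
})

// factory.Start already runs the informer, so a separate informer.Run is not needed.
factory.Start(ctx.Done())
cache.WaitForCacheSync(ctx.Done(), informer.HasSynced)
```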

Contributor Author

Note the extended unit test

Contributor Author

@universam1 universam1 Jan 25, 2022

See the sources https://github.com/kubernetes/client-go/blob/master/tools/cache/shared_informer.go#L552-L556, which explain how a safe join to a running informer happens.

I prefer this solution over a semaphore since it is cleaner and easier to grasp.

Member

So this makes sure we get all entries from a fresh LIST, right?

Contributor Author

Yes, so theoretically we could omit the explicit List before, since we get it implicitly by late binding, but it serves two purposes, so I'd keep it.

Member

So this makes sure we get all entries from a fresh LIST, right?
Yes, so theoretically we could omit the explicit List

Idk, I see it uses addNotification but we only handle updates.

```
Run make lint
golangci-lint run ./...
level=error msg="Running error: context loading failed: failed to load packages: timed out to load packages: context deadline exceeded"
level=error msg="Timeout exceeded: try increasing it by passing --timeout option"
```

ref: golangci/golangci-lint#825

Seems like an incompatibility with the client-go pkg and the module go version.
Updating to 1.16 and recreating the go.sum fixes the timeouts and slims down the dependency list.

Also updating the golangci-lint install script for the GitHub Action, as it is deprecated.

Signed-off-by: Samuel Lang <[email protected]>
@AlexanderYastrebov
Member

AlexanderYastrebov commented Jan 31, 2022

I guess we've spent enough time pondering the proposed solution, so it is good enough to be accepted.
@universam1 Thank you for the commitment and persistence 👍

@AlexanderYastrebov
Member

👍

@szuecs
Member

szuecs commented Jan 31, 2022

👍

@szuecs
Member

szuecs commented Jan 31, 2022

@universam1 thanks, and let's have follow-up fixes if you detect something that does not work as well as it could.

@szuecs szuecs merged commit 55ec0b6 into zalando-incubator:master Jan 31, 2022
@universam1 universam1 deleted the TGIP branch January 31, 2022 18:51
@universam1
Contributor Author

Thank you @szuecs for releasing v0.12.17 - just wondering if this is really considered a bugfix release bump? 😺

@szuecs
Member

szuecs commented Jan 31, 2022

@universam1 we increase the minor version in case there is a need to change flags or something like that. We always add a migration guide to the README for a minor release increase.

AlexanderYastrebov added a commit that referenced this pull request Jul 4, 2022
…` version

Before version `0.12.17` loadbalancer target group target type was not
specified and defaulted to `instance` in CloudFormation, see
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticloadbalancingv2-targetgroup.html#cfn-elasticloadbalancingv2-targetgroup-targettype

PR #461 introduced AWS CNI mode and configured target group target type
either to `ip` or `instance`.

Changing target type from unset to `instance` in CloudFormation triggers target
group re-creation which makes it impossible to upgrade from pre `0.12.17` without downtime.

This change:
- makes `target-access-mode` flag required to force users to choose a proper value
- renames the `HostPort` option of `target-access-mode` into `InstanceFilter`
- introduces a new `Legacy` option that does not set the target type and thus enables upgrade from pre `0.12.17`.

Fixes #507

Signed-off-by: Alexander Yastrebov <[email protected]>
AlexanderYastrebov added a commit that referenced this pull request Jul 4, 2022

AlexanderYastrebov added a commit that referenced this pull request Jul 4, 2022

AlexanderYastrebov added a commit that referenced this pull request Jul 4, 2022

AlexanderYastrebov added a commit that referenced this pull request Jul 5, 2022

AlexanderYastrebov added a commit that referenced this pull request Nov 3, 2022

AlexanderYastrebov added a commit that referenced this pull request Nov 3, 2022
…` version

Before version `0.12.17` loadbalancer target group target type was not
specified and defaulted to `instance` in CloudFormation, see
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticloadbalancingv2-targetgroup.html#cfn-elasticloadbalancingv2-targetgroup-targettype

PR #461 introduced AWS CNI mode and configured target group target type
either to `ip` or `instance`.

Changing target type from unset to `instance` in Cloudformation triggers target
group re-creation which makes it impossible to upgrade from pre `0.12.17` without downtime.

This change:
- makes `target-access-mode` flag required to force users to choose proper value
- introduces a new `Legacy` option that does not set target type and thus enables upgrade from pre `0.12.17`.

Fixes #507

Signed-off-by: Alexander Yastrebov <[email protected]>