Prevent the PassthroughCluster for clients/workloads in the service mesh #3711
The KServe Ingress VirtualServices are created with configurations targeting only the Gateways. Although this works, the omission of the Istio sidecars has the following downsides for workloads that belong to the Istio mesh:

* Requests to InferenceServices will be treated as going to external services (i.e. not part of the mesh), because the sidecars are unaware of the routing rules.
* As a consequence, the requests will be handled as with any external (non-mesh) workload: the ingress gateway will first receive the request and will forward it to itself, doing the URL rewrite to the relevant -predictor, -explainer or -transformer hostname. Such forwarding can be avoided (for mesh workloads) and the rewrite can be performed by the sidecars with the right VirtualService configuration.

This adds the missing configurations to the KServe-created VirtualService, so that Istio sidecars are aware of the KServe services/hostnames and do the rewrite in the sidecar, rather than deferring the rewrite to the Gateway.

For workloads that belong to the mesh, slightly better performance may be seen (given that one request forwarding is saved) and better observability from Istio may also be possible.

Signed-off-by: Edgar Hernández
@@ -231,6 +231,7 @@ const (

var (
	LocalGatewayHost = "knative-local-gateway.istio-system.svc." + network.GetClusterDomainName()
	IstioMeshGateway = "mesh"
Is it safe to hardcode to mesh?
I mean, it is isolated per namespace, but the MeshConfig can have more than one namespace. I don't think that is the case for KServe, just double checking :)
It should be safe.
The mesh gateway is a keyword in the VirtualService. It means that the configuration should be applied to the sidecars.
One little nit - it is applied to ALL sidecars in the mesh.
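The meaning of the `mesh` keyword discussed above can be sketched in Go. This is only an illustration of the documented Istio rule (a VirtualService applies to sidecars when its gateway list is empty or explicitly contains the reserved name `mesh`); `AppliesToSidecars` is a hypothetical helper, not part of KServe or Istio, and the gateway names in `main` are placeholders.

```go
package main

import "fmt"

// meshGateway mirrors the reserved gateway name that Istio uses
// to refer to all sidecars in the mesh.
const meshGateway = "mesh"

// AppliesToSidecars is a hypothetical helper (not a KServe or Istio API).
// It reports whether a VirtualService with the given gateway list would
// be applied to sidecars: an empty list defaults to "mesh", otherwise
// "mesh" must be listed explicitly.
func AppliesToSidecars(gateways []string) bool {
	if len(gateways) == 0 {
		return true
	}
	for _, g := range gateways {
		if g == meshGateway {
			return true
		}
	}
	return false
}

func main() {
	// Placeholder gateway names, for illustration only.
	gatewaysOnly := []string{"example-ns/ingress-gateway", "example-ns/local-gateway"}
	withMesh := append(append([]string{}, gatewaysOnly...), meshGateway)

	fmt.Println(AppliesToSidecars(gatewaysOnly)) // gateways only: sidecars ignore the rules
	fmt.Println(AppliesToSidecars(withMesh))     // with "mesh": sidecars apply the rules
}
```

Because `mesh` is not scoped to a namespace, listing it makes the routing rules visible to every sidecar that receives the VirtualService, which is the point raised in the nit above.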
One thing I wonder is whether "mesh" was deliberately skipped in the original architecture or whether it was overlooked. I can't answer this question, but I wonder if this has any impact on the traffic flow and alters some rules, or whether it is working as expected.
It was probably an oversight. I think this should work fine, as it adds the mesh gateway so that routing is resolved in the sidecar instead of going to the passthrough.
@yuzisun looks like this PR is ready to merge. Could you please approve it?
/lgtm
@yuzisun OK, I have one approval.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: israel-hdez, spolti, yuzisun
What this PR does / why we need it
Background
The KServe Ingress VirtualServices are created with configurations targeting only the Gateways. Although this works, the omission of the Istio sidecars has the following downsides for clients/workloads that belong to the Istio mesh (i.e. have an Istio sidecar):

* Requests to InferenceServices will be treated as going to external services (i.e. not part of the mesh), because the sidecars are unaware of the routing rules.
* As a consequence, the requests will be handled as with any external (non-mesh) workload: the ingress gateway will first receive the request and will forward it to itself, doing the URL rewrite to the relevant `-predictor`, `-explainer` or `-transformer` hostname. Such forwarding can be avoided (for mesh workloads) and the rewrite can be performed by the sidecars with the right VirtualService configuration.

This can be verified in the metrics that Istio emits. For example, the `istio_requests_total` metric would be emitted like this (some labels omitted for brevity):

Specifically, the `destination_service_name="PassthroughCluster"` label reveals that the requests are being handled as leaving the service mesh, even though the client workload has a sidecar. Such requests would be blocked if Istio were configured with `REGISTRY_ONLY`.

When using the Kiali project, its graph will reveal that observability is potentially lost. For example, the following image shows that the `curl-inside-mesh` workload has a sidecar and all its requests are going to the PassthroughCluster, even though everything is in the mesh.

Fix
This PR adds the missing configurations to the KServe-created VirtualService, so that Istio sidecars are aware of the KServe services/hostnames and workloads with an Istio sidecar will correctly handle the requests as mesh-internal traffic. There is also the added benefit that the URL rewrite will be done in the sidecar, rather than deferring the rewrite to the Gateway, which saves one request forwarding (potentially giving slightly better performance).
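A minimal sketch of the shape of this fix, in Go, under stated assumptions: the `VirtualService` and `HTTPRoute` types below are simplified local stand-ins (not the Istio client types), `BuildVirtualService` is a hypothetical function, and the hostname patterns and the choice of the local gateway host as destination are assumptions for illustration. Only the `"mesh"` gateway name and the `knative-local-gateway.istio-system.svc.` prefix come from the diff in this PR.

```go
package main

import "fmt"

// HTTPRoute is a simplified, hypothetical stand-in for an Istio HTTP route.
type HTTPRoute struct {
	RewriteAuthority string // Host header after the rewrite
	DestinationHost  string // where the request is forwarded
}

// VirtualService is a simplified, hypothetical stand-in for the Istio resource.
type VirtualService struct {
	Hosts    []string
	Gateways []string
	HTTP     []HTTPRoute
}

// BuildVirtualService appends the reserved "mesh" gateway to the given
// gateways so that sidecars, and not only the Gateways, receive the
// rewrite rule. Hostname patterns are assumptions for this sketch.
func BuildVirtualService(name, namespace, domain string, gateways []string) VirtualService {
	serviceHost := fmt.Sprintf("%s.%s.svc.%s", name, namespace, domain)
	predictorHost := fmt.Sprintf("%s-predictor.%s.svc.%s", name, namespace, domain)
	localGatewayHost := "knative-local-gateway.istio-system.svc." + domain

	// Copy the input slice before appending to avoid mutating the caller's data.
	all := append(append([]string{}, gateways...), "mesh")

	return VirtualService{
		Hosts:    []string{serviceHost},
		Gateways: all,
		HTTP: []HTTPRoute{{
			RewriteAuthority: predictorHost,
			DestinationHost:  localGatewayHost,
		}},
	}
}

func main() {
	// Placeholder names, for illustration only.
	vs := BuildVirtualService("example-isvc", "example-ns", "cluster.local",
		[]string{"example-ns/ingress-gateway"})
	fmt.Printf("%+v\n", vs)
}
```

With `mesh` in the gateway list, a client sidecar can match the InferenceService hostname and perform the rewrite itself, instead of treating the request as traffic to an unknown destination.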
With the fixed configs, the following is an example of a metric that will be emitted by Istio:
This metric no longer involves the PassthroughCluster, which means that Istio is handling the requests as mesh-internal. Furthermore, if mTLS is enabled in Istio, the metrics will also have the principal labels populated (requests are authenticated); e.g.:
...and this makes it possible to use Istio security features, if needed (like AuthorizationPolicies). Better observability may also be possible. For example, the Kiali graph would now look like this: