Activator Pod keep in CrashLoopBackoff #4407

Closed · yuxiaoba opened this issue Jun 18, 2019 · 31 comments · Fixed by #4514
Labels: area/networking, kind/bug

yuxiaoba commented Jun 18, 2019

In what area(s)?

/area networking

What version of Knative?

Knative 0.6

Expected Behavior

Install the component of Knative Serving successfully

Actual Behavior

kubectl get pod -n knative-serving
NAME                                READY   STATUS    RESTARTS   AGE
activator-dfdb7f85-mfwhg            1/2     CrashLoopBackOff   15         32m
autoscaler-565cdf546b-hhkm7         2/2     Running   0          32m
controller-78564fd45c-x5mm5         1/1     Running   0          32m
networking-istio-6d95d868fb-tqwqw   1/1     Running   0          32m
webhook-5dc6f74b5b-5xdqv            1/1     Running   0          32m

Steps to Reproduce the Problem

Install Kubernetes 1.13

Install Istio 1.1.3

kubectl get pod -n istio-system 
NAME                                      READY   STATUS      RESTARTS   AGE
grafana-749c78bcc5-vtdm4                  1/1     Running     0          4h19m
istio-citadel-899dfb67c-7w5lz             1/1     Running     0          4h19m
istio-egressgateway-748d5fd794-khlm2      1/1     Running     0          4h5m
istio-egressgateway-748d5fd794-xddj4      1/1     Running     0          4h19m
istio-galley-555dd7c7d7-dffls             1/1     Running     0          4h19m
istio-ingressgateway-55dd86767f-lpp9w     1/1     Running     0          4h19m
istio-pilot-7979d58649-ffwjt              2/2     Running     0          4h19m
istio-pilot-7979d58649-lbqfv              2/2     Running     0          4h3m
istio-policy-f89c945dc-pfct8              2/2     Running     0          4h19m
istio-sidecar-injector-998dd6cbb-w5428    1/1     Running     1          4h19m
istio-telemetry-7d9d866c65-dw8mq          2/2     Running     0          4h19m
istio-tracing-595796cf54-zmqdb            1/1     Running     1          4h19m
kiali-5df77dc9b6-dlztz                    1/1     Running     0          4h19m
prometheus-7f87866f5f-gdl95               1/1     Running     1          4h19m

Install Knative Serving

kubectl apply --selector knative.dev/crd-install=true  --filename serving.yaml
kubectl apply --filename serving.yaml  --selector networking.knative.dev/certificate-provider!=cert-manager
@yuxiaoba yuxiaoba added the kind/bug Categorizes issue or PR as related to a bug. label Jun 18, 2019
yuxiaoba (Author) commented:

The logs of the activator pod are here:

  • For activator container
kubectl logs -n knative-serving activator-dfdb7f85-mfwhg activator
...
{"level":"error","ts":"2019-06-18T13:10:58.667Z","logger":"activator","caller":"websocket/connection.go:158","msg":"Failed to send ping message","knative.dev/controller":"activator","error":"connection has not yet been established","stacktrace":"github.com/knative/serving/vendor/github.com/knative/pkg/websocket.NewDurableConnection.func3\n\t/go/src/github.com/knative/serving/vendor/github.com/knative/pkg/websocket/connection.go:158"}
{"level":"error","ts":"2019-06-18T13:11:02.003Z","logger":"activator","caller":"websocket/connection.go:158","msg":"Failed to send ping message","knative.dev/controller":"activator","error":"connection has not yet been established","stacktrace":"github.com/knative/serving/vendor/github.com/knative/pkg/websocket.NewDurableConnection.func3\n\t/go/src/github.com/knative/serving/vendor/github.com/knative/pkg/websocket/connection.go:158"}
{"level":"error","ts":"2019-06-18T13:11:05.236Z","logger":"activator","caller":"websocket/connection.go:158","msg":"Failed to send ping message","knative.dev/controller":"activator","error":"connection has not yet been established","stacktrace":"github.com/knative/serving/vendor/github.com/knative/pkg/websocket.NewDurableConnection.func3\n\t/go/src/github.com/knative/serving/vendor/github.com/knative/pkg/websocket/connection.go:158"}

  • For istio-proxy container
...
[2019-06-18T13:11:18.469Z] "GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "Go-http-client/1.1" "105ae136-0eb5-9ee9-8781-83a39e5a5ef8" "autoscaler.knative-serving.svc.cluster.local.:8080" "-" - - 10.68.254.15:8080 172.20.9.17:51564 -
[2019-06-18T13:11:19.867Z] "GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "Go-http-client/1.1" "a544b997-356b-9de0-b840-f271d4ffcaa7" "autoscaler.knative-serving.svc.cluster.local.:8080" "-" - - 10.68.254.15:8080 172.20.9.17:51568 -
[2019-06-18T13:11:19.890Z] "GET /healthz HTTP/1.1" 500 - "-" 0 40 1 0 "-" "kube-probe/1.13" "4dac30b2-a6ee-91a2-a444-54c4dda5621b" "172.20.9.17:8012" "127.0.0.1:8012" inbound|80|http|activator-service.knative-serving.svc.cluster.local - 172.20.9.17:8012 172.20.9.1:58396 -
[2019-06-18T13:11:21.468Z] "GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "Go-http-client/1.1" "04ac8022-4ac0-9ec2-9976-01b834135dd2" "autoscaler.knative-serving.svc.cluster.local.:8080" "-" - - 10.68.254.15:8080 172.20.9.17:51576 -
[2019-06-18T13:11:22.223Z] "GET /healthz HTTP/1.1" 500 - "-" 0 40 1 0 "-" "kube-probe/1.13" "a1ccb430-b763-9e64-86e5-72332884943e" "172.20.9.17:8012" "127.0.0.1:8012" inbound|80|http|activator-service.knative-serving.svc.cluster.local - 172.20.9.17:8012 172.20.9.1:58404 -
[2019-06-18T13:11:23.752Z] "GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "Go-http-client/1.1" "8b589bee-a8c2-9651-92f3-4adda79006db" "autoscaler.knative-serving.svc.cluster.local.:8080" "-" - - 10.68.254.15:8080 172.20.9.17:51582 -
[2019-06-18T13:11:26.704Z] "GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "Go-http-client/1.1" "0029075a-1c7f-9fee-9551-3c7862992a4a" "autoscaler.knative-serving.svc.cluster.local.:8080" "-" - - 10.68.254.15:8080 172.20.9.17:51590 -
[2019-06-18T13:11:29.890Z] "GET /healthz HTTP/1.1" 500 - "-" 0 40 1 0 "-" "kube-probe/1.13" "f8b5097b-d827-9f72-8c3b-44e22bda7a87" "172.20.9.17:8012" "127.0.0.1:8012" inbound|80|http|activator-service.knative-serving.svc.cluster.local - 172.20.9.17:8012 172.20.9.1:58420 -
[2019-06-18T13:11:30.932Z] "GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "Go-http-client/1.1" "7431f8ba-c511-96bf-9079-a47a913a59c2" "autoscaler.knative-serving.svc.cluster.local.:8080" "-" - - 10.68.254.15:8080 172.20.9.17:51598 -
[2019-06-18T13:11:32.223Z] "GET /healthz HTTP/1.1" 500 - "-" 0 40 1 0 "-" "kube-probe/1.13" "ae17054e-73ca-9a14-866f-bae098da2ac1" "172.20.9.17:8012" "127.0.0.1:8012" inbound|80|http|activator-service.knative-serving.svc.cluster.local - 172.20.9.17:8012 172.20.9.1:58426 -
[2019-06-18T13:11:35.459Z] "GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "Go-http-client/1.1" "cb88188c-cf9d-9474-ba50-b3419d12d6b7" "autoscaler.knative-serving.svc.cluster.local.:8080" "-" - - 10.68.254.15:8080 172.20.9.17:51608 -
[2019-06-18T13:11:39.890Z] "GET /healthz HTTP/1.1" 500 - "-" 0 40 1 0 "-" "kube-probe/1.13" "e031aff1-1b53-9fdf-b28a-c4d2c0a3b076" "172.20.9.17:8012" "127.0.0.1:8012" inbound|80|http|activator-service.knative-serving.svc.cluster.local - 172.20.9.17:8012 172.20.9.1:58440 -
[2019-06-18T13:11:42.222Z] "GET /healthz HTTP/1.1" 500 - "-" 0 40 3 2 "-" "kube-probe/1.13" "06d1fcb9-c392-9db1-bc88-c63795a263ce" "172.20.9.17:8012" "127.0.0.1:8012" inbound|80|http|activator-service.knative-serving.svc.cluster.local - 172.20.9.17:8012 172.20.9.1:58444 -
[2019-06-18T13:11:13.682Z] "- - -" 0 - "-" 2899 132999 28591 - "-" "-" "-" "-" "192.168.199.181:6443" outbound|443||kubernetes.default.svc.cluster.local 172.20.9.17:44104 10.68.0.1:443 172.20.9.17:57854 -

yuxiaoba (Author) commented:

@gyliu513

gyliu513 (Contributor) commented:

@zxDiscovery can you help check this?

zxDiscovery commented:

@yuxiaoba
Can you show me the Gateway information in your environment?

kubectl get gateway -n knative-serving

yuxiaoba (Author) commented:

@zxDiscovery Thank you for your help.

$ kubectl get gateway -n knative-serving
NAME                      AGE
cluster-local-gateway     12h
knative-ingress-gateway   12h

$ kubectl get gateway -n knative-serving -o yaml
apiVersion: v1
items:
- apiVersion: networking.istio.io/v1alpha3
  kind: Gateway
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"networking.istio.io/v1alpha3","kind":"Gateway","metadata":{"annotations":{},"labels":{"networking.knative.dev/ingress-provider":"istio","serving.knative.dev/release":"v0.6.1"},"name":"cluster-local-gateway","namespace":"knative-serving"},"spec":{"selector":{"istio":"cluster-local-gateway"},"servers":[{"hosts":["*"],"port":{"name":"http","number":80,"protocol":"HTTP"}}]}}
    creationTimestamp: "2019-06-18T12:32:45Z"
    generation: 1
    labels:
      networking.knative.dev/ingress-provider: istio
      serving.knative.dev/release: v0.6.1
    name: cluster-local-gateway
    namespace: knative-serving
    resourceVersion: "1643314"
    selfLink: /apis/networking.istio.io/v1alpha3/namespaces/knative-serving/gateways/cluster-local-gateway
    uid: 2c984332-91c5-11e9-b455-5254004f8e0e
  spec:
    selector:
      istio: cluster-local-gateway
    servers:
    - hosts:
      - '*'
      port:
        name: http
        number: 80
        protocol: HTTP
- apiVersion: networking.istio.io/v1alpha3
  kind: Gateway
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"networking.istio.io/v1alpha3","kind":"Gateway","metadata":{"annotations":{},"labels":{"networking.knative.dev/ingress-provider":"istio","serving.knative.dev/release":"v0.6.1"},"name":"knative-ingress-gateway","namespace":"knative-serving"},"spec":{"selector":{"istio":"ingressgateway"},"servers":[{"hosts":["*"],"port":{"name":"http","number":80,"protocol":"HTTP"}},{"hosts":["*"],"port":{"name":"https","number":443,"protocol":"HTTPS"},"tls":{"mode":"PASSTHROUGH"}}]}}
    creationTimestamp: "2019-06-18T12:32:45Z"
    generation: 1
    labels:
      networking.knative.dev/ingress-provider: istio
      serving.knative.dev/release: v0.6.1
    name: knative-ingress-gateway
    namespace: knative-serving
    resourceVersion: "1643312"
    selfLink: /apis/networking.istio.io/v1alpha3/namespaces/knative-serving/gateways/knative-ingress-gateway
    uid: 2c815ed7-91c5-11e9-b455-5254004f8e0e
  spec:
    selector:
      istio: ingressgateway
    servers:
    - hosts:
      - '*'
      port:
        name: http
        number: 80
        protocol: HTTP
    - hosts:
      - '*'
      port:
        name: https
        number: 443
        protocol: HTTPS
      tls:
        mode: PASSTHROUGH
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

zxDiscovery commented:

@yuxiaoba Please show the information about VirtualService in your environment. Thanks!

kubectl get virtualService

yuxiaoba (Author) commented Jun 19, 2019

@zxDiscovery My environment doesn't seem to have any VirtualService resources:

$ kubectl get virtualservice --all-namespaces
No resources found.

mattmoor (Member) commented:

The activator is attempting to reach the autoscaler, and won't report healthy until it does. If you are restricting network traffic, then you have to allow this for it to ever become healthy. This is what the activator logs tell me.

You should also share a describe of that pod, which will hopefully corroborate that hypothesis.

yuxiaoba (Author) commented Jun 20, 2019

@mattmoor Thank you for your help.
Here is the describe output for that pod:

kubectl describe pod -n knative-serving activator-dfdb7f85-5467c
  Normal   Created      41m                 kubelet, 192.168.199.190  Created container
  Warning  Unhealthy    40m (x11 over 41m)  kubelet, 192.168.199.190  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Started      15m (x14 over 41m)  kubelet, 192.168.199.190  Started container
  Warning  BackOff      7s (x167 over 39m)  kubelet, 192.168.199.190  Back-off restarting failed container

Also, I am new to Istio and Knative, so I do not know how I might have restricted network traffic. Is it this?

# Source: istio/charts/security/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-security-custom-resources
  namespace: istio-system
  labels:
    app: security
    chart: security
    heritage: Tiller
    release: release-name
    istio: citadel
data:
  custom-resources.yaml: |-
    # Authentication policy to enable permissive mode for all services (that have sidecar) in the mesh.
    apiVersion: "authentication.istio.io/v1alpha1"
    kind: "MeshPolicy"
    metadata:
      name: "default"
      labels:
        app: security
        chart: security
        heritage: Tiller
        release: release-name
    spec:
      peers:
      - mtls:
          mode: PERMISSIVE

mattmoor (Member) commented:

As I suspected, the activator's readiness probes are failing, seemingly because it cannot open a websocket to the autoscaler's metrics endpoint.

mattmoor (Member) commented:

I'm not sure why...

yuxiaoba (Author) commented:

@mattmoor I am also confused about this. Is this a bug in the code, or is there an error in the Knative Serving deployment YAML?

mattmoor (Member) commented:

I don't see any indication that this is a Knative problem (yet). I'd launch an ubuntu container with a sidecar in the knative-serving namespace and try to curl the autoscaler service from it.
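
For reference, a minimal sketch of that kind of check (assuming the mesh honors the sidecar.istio.io/inject annotation; the pod name and image here are arbitrary):

# Run a throwaway pod with an Envoy sidecar in the knative-serving namespace.
kubectl run net-debug -n knative-serving --restart=Never -it \
  --image=ubuntu:18.04 \
  --overrides='{"apiVersion":"v1","metadata":{"annotations":{"sidecar.istio.io/inject":"true"}}}' \
  -- bash

# Inside the pod, install curl and try to reach the autoscaler service:
apt-get update && apt-get install -y curl
curl -v http://autoscaler.knative-serving.svc.cluster.local:8080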

patrickshan commented Jun 20, 2019

We are seeing a similar issue when Knative is under some load (around 450 ksvc). Currently we are running Knative 0.6.1.

Results of some of the commands mentioned above:

$ kubectl get gateway -n knative-serving
NAME                      AGE
cluster-local-gateway     40d
knative-ingress-gateway   40d
$ kubectl get virtualService -n knative-serving --no-headers
route-00e90f5c-930f-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-6oe146xrcc3ifhch8iz.knative.test.example.com dev-test-6oe146xrcc3ifhch8iz.knative.svc.cluster.local]       33m
route-0da6136e-930f-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-88czshxetkdu0lejrfy6.knative.test.example.com dev-test-88czshxetkdu0lejrfy6.knative.svc.cluster.local]     27m
route-312f9031-930f-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-e0q247cj3d7k45i8yfas.knative.test.example.com dev-test-e0q247cj3d7k45i8yfas.knative.svc.cluster.local]     20m
route-407a0e40-930f-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-g0m7713xog5u32dqkf7.knative.test.example.com dev-test-g0m7713xog5u32dqkf7.knative.svc.cluster.local]       15m
route-d36e1ded-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-064qojgxhoj79jtkx85k.knative.test.example.com dev-test-064qojgxhoj79jtkx85k.knative.svc.cluster.local]     35m
route-d3e3d532-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-07f5l78xqrhv2wdx1vql.knative.test.example.com dev-test-07f5l78xqrhv2wdx1vql.knative.svc.cluster.local]     35m
route-d45a0690-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-098whrdu50ugaukqi19.knative.test.example.com dev-test-098whrdu50ugaukqi19.knative.svc.cluster.local]       35m
route-d4a80fdc-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-09gt2knxqc6o8slap0n3.knative.test.example.com dev-test-09gt2knxqc6o8slap0n3.knative.svc.cluster.local]     35m
route-d54b45f0-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-0af0co4r5z4alym6s8a.knative.test.example.com dev-test-0af0co4r5z4alym6s8a.knative.svc.cluster.local]       35m
route-d5c81091-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-0c9jjv26ipb5gci6sqzi.knative.test.example.com dev-test-0c9jjv26ipb5gci6sqzi.knative.svc.cluster.local]     35m
route-d6349933-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-0g6m1w9hsrn8xm1q91jm.knative.test.example.com dev-test-0g6m1w9hsrn8xm1q91jm.knative.svc.cluster.local]     35m
route-d6bae3d9-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-0i1g1f0hxvkrpv8be1pi.knative.test.example.com dev-test-0i1g1f0hxvkrpv8be1pi.knative.svc.cluster.local]     35m
route-d739a8d4-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-0vsbctd82q3l38uzood.knative.test.example.com dev-test-0vsbctd82q3l38uzood.knative.svc.cluster.local]       35m
route-d7cf85df-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-0ytgtl3t9crdvj3gdxr.knative.test.example.com dev-test-0ytgtl3t9crdvj3gdxr.knative.svc.cluster.local]       35m
route-d858cd26-930e-11e9-861e-06a53c0a941c   [knative-ingress-gateway mesh]   [dev-test-110ytza8c8pxftn05sn.knative.test.example.com dev-test-110ytza8c8pxftn05sn.knative.svc.cluster.local]       35m
......
$ kubectl get virtualService -n knative-serving --no-headers  | wc -l
91
$ kubectl get ksvc -n knative --no-headers | wc -l
449

We can see some error logs from the Knative autoscaler container, which failed to talk to the kube-apiserver, but at the same time we were able to reach the kube-apiserver without problems from our local machines, and all other parts of the cluster were running fine.

{"level":"error","ts":"2019-06-20T04:27:49.373Z","logger":"activator","caller":"activator/main.go:147","msg":"Failed to get k8s version","knative.dev/controller":"activator","error":"Get https://10.32.0.1:443/version?timeout=32s: dial tcp 10.32.0.1:443: connect: connection refused","stacktrace":"main.main.func1\n\t/go/src/github.com/knative/serving/cmd/activator/main.go:147\ngithub.com/knative/serving/vendor/k8s.io/apimachinery/pkg/util/wait.WaitFor\n\t/go/src/github.com/knative/serving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:331\ngithub.com/knative/serving/vendor/k8s.io/apimachinery/pkg/util/wait.pollInternal\n\t/go/src/github.com/knative/serving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:227\ngithub.com/knative/serving/vendor/k8s.io/apimachinery/pkg/util/wait.pollImmediateInternal\n\t/go/src/github.com/knative/serving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:252\ngithub.com/knative/serving/vendor/k8s.io/apimachinery/pkg/util/wait.PollImmediate\n\t/go/src/github.com/knative/serving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:241\nmain.main\n\t/go/src/github.com/knative/serving/cmd/activator/main.go:145\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}

patrickshan commented:

After looking further into our issue, it seems to be related to the sidecar container resource limits for the activator and autoscaler. Apparently the default 128Mi of memory for the sidecar container isn't enough with a 450-ksvc workload. We bumped those limits by adding annotations to the PodSpec of their deployments: sidecar.istio.io/proxyCPU: 500m and sidecar.istio.io/proxyMemory: 512Mi. After that, both the autoscaler and the activator are up and running.
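
Roughly what that change looks like in the deployment spec (a sketch; only the annotations are the relevant part, and the values are the ones that worked for our load):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: activator
  namespace: knative-serving
spec:
  template:
    metadata:
      annotations:
        # Istio sidecar resource overrides, read by the injector
        sidecar.istio.io/proxyCPU: "500m"
        sidecar.istio.io/proxyMemory: "512Mi"

The annotations are only picked up at injection time, so the pods have to be recreated for the new limits to take effect.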

mattmoor (Member) commented:

@yuxiaoba Any chance you are doing this at a scale similar to @patrickshan's?

mattmoor (Member) commented:

@yuxiaoba could you try starting a container in the knative-serving namespace with a sidecar and curling the autoscaler service?

I notice it doesn't have a readiness probe itself, which is troubling.

yuxiaoba (Author) commented Jun 21, 2019

@mattmoor I have tried this, but as I am new to this, I may have done something incorrectly.

First, start a container:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nginx
  namespace: knative-serving
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.13-alpine
        ports:
        - name: http
          containerPort: 80

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: knative-serving
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: ClusterIP

kubectl apply -f <(istioctl kube-inject -f nginx.yaml)

Second, show the status of the pods:

kubectl get pod -n knative-serving 
NAME                                READY   STATUS             RESTARTS   AGE
activator-dfdb7f85-pckpn            1/2     CrashLoopBackOff   27         69m
autoscaler-565cdf546b-fkdwk         2/2     Running            0          69m
controller-78564fd45c-ck4p2         1/1     Running            0          69m
networking-istio-6d95d868fb-mt6jc   1/1     Running            0          69m
nginx-798c57cf6f-9jkxx              2/2     Running            0          11m
webhook-5dc6f74b5b-8vvd2            1/1     Running            0          69m

kubectl get svc -n knative-serving 
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
activator-service   ClusterIP   10.68.140.88    <none>        80/TCP,81/TCP,9090/TCP   69m
autoscaler          ClusterIP   10.68.86.227    <none>        8080/TCP,9090/TCP        69m
controller          ClusterIP   10.68.145.199   <none>        9090/TCP                 69m
nginx               ClusterIP   10.68.194.81    <none>        80/TCP                   30m
webhook             ClusterIP   10.68.64.29     <none>        443/TCP                  69m

Third, curl the autoscaler service and show the results:

kubectl exec -it -n  knative-serving nginx-798c57cf6f-9jkxx -- /bin/sh
/ # curl 10.68.86.227:8080
Bad Request
/ # curl 10.68.86.227:9090
404 page not found
/# curl autoscaler.knative-serving.svc.cluster.local.:8080
Bad Request

mattmoor (Member) commented:

What if you add -H "k-network-probe: activator"? I also wonder whether it is websocket support that's the problem (cc @tcnghia).
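
For reference, from the nginx pod above the suggested probe would look roughly like this (a sketch; the host and port come from the kubectl get svc output earlier, and whether the autoscaler answers such probes is exactly what is in question):

curl -v -H "k-network-probe: activator" http://autoscaler.knative-serving.svc.cluster.local:8080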

yuxiaoba (Author) commented Jun 21, 2019

@mattmoor Do you mean changing the YAML like this?

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nginx
  namespace: knative-serving
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.13-alpine
        livenessProbe:
          httpGet:
            httpHeaders:
            - name: k-kubelet-probe
              value: activator
            path: /healthz
            port: 8012
        ports:
        - name: http
          containerPort: 80
        - name: http1
          containerPort: 8012
        readinessProbe:
          httpGet:
            httpHeaders:
            - name: k-kubelet-probe
              value: activator
            path: /healthz
            port: 8012

mattmoor (Member) commented:

Actually nevermind, the autoscaler doesn't respond to network probes.

yuxiaoba (Author) commented:

@mattmoor I think this problem may be in the sidecar, because when I install Istio without the sidecar, the activator pod runs normally.

tcnghia (Contributor) commented Jun 21, 2019

@yuxiaoba did you enable authorization in your Istio installation?

You may want to look at https://istio.io/docs/concepts/security/#enabling-authorization and exclude the namespace knative-serving, to see if this is still a problem.
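
If authorization is enabled, the exclusion would look roughly like this (a sketch based on the Istio 1.1 ClusterRbacConfig API; only relevant if RBAC is actually turned on in the mesh):

apiVersion: rbac.istio.io/v1alpha1
kind: ClusterRbacConfig
metadata:
  name: default
spec:
  mode: ON_WITH_EXCLUSION
  exclusion:
    # Leave knative-serving out of authorization enforcement
    namespaces: ["knative-serving"]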

@yuxiaoba
Copy link
Author

@tcnghia

# cat /etc/resolv.conf
nameserver 10.68.0.2
search istio-system.svc.cluster.local. svc.cluster.local. cluster.local.
options ndots:5

tcnghia (Contributor) commented Jun 22, 2019

Thanks, I believe https://github.com/knative/serving/blob/master/pkg/network/domain.go#L69 didn't handle this right.

We need to strip the trailing dot.
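
A minimal sketch of the kind of fix (illustrative only; the actual names and flow in pkg/network/domain.go differ): strip the trailing dot that a resolv.conf search entry can carry before using the cluster domain to build hostnames.

package main

import (
	"fmt"
	"strings"
)

// normalizeClusterDomain is a hypothetical helper: a domain read from a
// search line such as "cluster.local." (as in the resolv.conf shown above)
// becomes "cluster.local", so hostnames built from it do not end in a dot.
func normalizeClusterDomain(domain string) string {
	return strings.TrimSuffix(domain, ".")
}

func main() {
	fmt.Println(normalizeClusterDomain("cluster.local.")) // prints: cluster.local
}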

majuansari commented:

@yuxiaoba Were you able to fix this? I am still getting this locally on macOS.

yuxiaoba (Author) commented Aug 2, 2019

@majuansari I fixed this by changing CLUSTER_DNS_DOMAIN from cluster.local. to cluster.local (removing the trailing dot).
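
To check whether this applies to you, look at the search line inside any pod; after the change it should carry no trailing dots. (CLUSTER_DNS_DOMAIN is the variable name in my cluster deployment tooling and may differ in yours.)

kubectl exec -n knative-serving <any-pod> -- cat /etc/resolv.conf
# before: search knative-serving.svc.cluster.local. svc.cluster.local. cluster.local.
# after:  search knative-serving.svc.cluster.local svc.cluster.local cluster.local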

jgnoonan commented:

I am having this issue with OpenShift 3.11. I should note that I am using the multitenant CNI from Red Hat, which won't allow project-to-project communications except from the default namespace. Could this be the issue?

vagababov (Contributor) commented Aug 26, 2019 via email

jgnoonan commented:

Yup, that was it. I ran the following command to join the project networks, and both pods are now running. Again, this worked because I am using the multi-tenant SDN from Red Hat:

oc adm pod-network join-projects --to=istio-system knative-serving knative-build knative-monitoring

marcjimz commented:

> After looking further into our issue, it seems to be related to the sidecar container resource limits for the activator and autoscaler. Apparently the default 128Mi of memory for the sidecar container isn't enough with a 450-ksvc workload. We bumped those limits by adding annotations to the PodSpec of their deployments: sidecar.istio.io/proxyCPU: 500m and sidecar.istio.io/proxyMemory: 512Mi. After that, both the autoscaler and the activator are up and running.

Just want to comment that this ended up being our solution, as we had tested our system to over 2,000 ksvcs. Fortunately, our entire service mesh didn't require full service discovery, so we limited the egress via Sidecar definitions (a sketch follows the output below). But it's probably not scalable to keep increasing the resources on the activator pod, so we're hoping that in the future we can reduce the number of routes kept in the sidecar.

[~/Documents]: istioctl proxy-config clusters activator-668db7f4d5-4q986 -n knative-serving | wc -l
    8707
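
Roughly what limiting the egress looks like (a sketch of an Istio Sidecar resource; the actual host list depends on which namespaces the knative-serving pods need to reach):

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: knative-serving
spec:
  egress:
  - hosts:
    # Keep only routes for the local namespace and istio-system in the
    # sidecar, instead of every service in the mesh.
    - "./*"
    - "istio-system/*"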
