nats: no responders available for request #2741
Comments
I can't remember the code for NATS specifically, but it could be that across restarts the registry is being polluted by dead nodes. Ensure you're using heartbeats and TTLs for expiry with the registry, and set your client retries to 3. That should mitigate some of the problem.
If you look at the NATS registry code in the plugins repo, you'll notice that deregistration has no form of broadcasting, so it can effectively leave dead nodes behind. It's been a long time since I've done any development here, so you'll need to investigate yourself, I'm afraid.
@asim this problem occurs with all the registries available in the plugins repo, so I think it's related to dead nodes or something similar. Would you mind giving more advice about the fix? I'm really stuck here. Thank you very much.
If it happens with all registries, then it's an issue with shutdown: services aren't getting the time to deregister. This can happen if they are killed without a termination signal, usually in a k8s-like environment or with kill -9 locally. If you have TTL and expiry set in the service options, these entries should expire from the registry, but you can also increase the client retries so the client immediately tries a different service entry from the registry: service.Client().Init(client.Retries(3))
@asim the weird thing is that when I'm using a custom selector strategy, I sometimes receive an incomplete list of nodes, exactly when a request fails.
Then the issue is likely with your custom selector, I guess. What are you using?
@asim I use a self-pinger client https://github.com/begmaroman/go-micro-boilerplate/blob/master/proto/health/pinger.go and a custom selector https://github.com/begmaroman/go-micro-boilerplate/blob/master/proto/health/health.go. The list of nodes passed to the selector as an argument contains either 2 nodes or 1 node. When it has only 1 node, the request usually fails.
Some more context:
OK, so assuming this is a self health check, my guess is that the error could occur before the service actually registers: if the ping request is initiated before the entry is in the registry, no nodes matching the instance are returned. Based on my limited understanding of the code, that's the only situation in which I could see that error.
@asim given that I define the microservice and communicate with it in the standard way, should we consider this an issue in the framework?
It fails on every second request. I send the pings and health checks after the service has fully started.
How often is the health check fired?
Every second. I tried every 5 seconds as well. The point is that the service's own health check and the self-pinger client with the custom selector work well. Results of executing 2k requests at a concurrency of 50:
Without a custom selector strategy:
As far as I can tell, it fails when trying to retrieve a live node, but if I specify exactly which node to use, it works well.
I found the problem. For some reason, the list of nodes in the service object contains a strange entry that does not work:
Since the default random selector is used, 50% of the time it tries to use the unreachable node. A valid node ID starts with the service name followed by the node ID, while this bad node has only the ID. I created a custom selector that filters out the bad nodes.
Description of the bug
I'm building an example microservices project using go-micro with two simple services (https://github.com/begmaroman/go-micro-boilerplate/blob/feature/k8s/docker-compose.yaml). The first service, rest-api-svc, is built using the go-micro web framework. The other one, account-svc, is built using go-micro.
They use a NATS server for discovery, for transport, and as a broker. I tried all the other transport options, but the following bug still appears:
Interestingly, the bug is flaky and appears roughly once per 10 requests.
How to reproduce the bug
go mod download
make build-base-image
docker compose up --build
GET http://localhost:3004/user a few times
Environment
Go Version: 1.23.0