refresh ring when node UP event received for missing host #1669
To be clear, this occurs when multiple frames are passed to
|
any thoughts on this? |
We're now running these changes on our sizeable production infrastructure, having built them off a private fork. This issue doesn't just affect folks who cycle IPs on Cassandra restart, it also prevents node rotation. Our IPs are stable across restarts, but we regularly perform vertical scaling operations with
We (sseidman, myself, and the folks we work with) are happy to do any reasonable amount of work required to make this fix (or a better-designed fix, if the maintainers think this is a non-optimal approach) upstreamable, whether that's more tests, docs, or testing alternate approaches. I think there's significant value in addressing this bug and we're motivated to try to get a fix merged if there's a world where that can happen. I don't think we plan on doing further nudges here to try to elicit a maintainer response, though. |
@martin-sucha Any chance this can be reviewed and merged? |
#915 seems to be related too. Do you think this will fix it too? |
Hi! Looks good to me. Sorry for the delay. What do you guys think of merging #1680 instead? Would that one work for you? Seems that it handles more cases. |
The current issue is that there are no events dispatched for `NEW_NODE` changes. In a 3-node Cassandra cluster deployed on cloud infrastructure, a Cassandra instance was replaced using the `replace_address` flag so that it was moved to a new cloud instance (new IP, new host ID). The following debug logs are an example of how the client is currently handling this change in topology.
No `NEW_NODE` event is dispatched and the client jumps straight to processing the `UP` event, so the new node is never added to the session ring. If all nodes in the cluster are replaced in this manner, the eventual outcome is that clients lose connection to the cluster and begin outputting `gocql: no hosts available in the pool`.
After bisecting recent commits, I found this PR introduced the bug into the client. It looks like there was a fail-safe for when the `NEW_NODE` event was missed, but this function was removed in the previously mentioned PR.
This change proposes to refresh the hostSource ring on `UP` events when the host cannot be found in the ring. This ensures that the hostMap stays up to date even if `NEW_NODE` events are not processed.
This should fix #1668, #1667, and #1582
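For readers skimming the thread, the idea behind the proposed change can be sketched roughly as below. This is a minimal, standalone illustration and not the actual gocql code: the `ring` type, the `topologySource` function type, and the `handleNodeUp` and `refresh` names are all hypothetical stand-ins, and the real driver would refresh its ring by querying the cluster rather than calling a plain function.

```go
package main

import "fmt"

// ring is a hypothetical stand-in for the driver's view of the cluster.
// It is NOT gocql's real type; hosts are keyed by address for simplicity.
type ring struct {
	hosts map[string]bool
}

// topologySource is a hypothetical source of truth for cluster membership.
// In a real driver this would be a query against the cluster's system tables.
type topologySource func() []string

// refresh replaces the ring's contents with the current cluster membership.
func (r *ring) refresh(src topologySource) {
	fresh := make(map[string]bool)
	for _, addr := range src() {
		fresh[addr] = true
	}
	r.hosts = fresh
}

// handleNodeUp processes an UP event for addr. If the host is not already in
// the ring (for example because its NEW_NODE event was never processed), the
// whole ring is refreshed. It reports whether a refresh was triggered.
func (r *ring) handleNodeUp(addr string, src topologySource) bool {
	if r.hosts[addr] {
		return false // known host: nothing to repair
	}
	r.refresh(src)
	return true
}

func main() {
	// The client only knows about the original node.
	r := &ring{hosts: map[string]bool{"10.0.0.1": true}}

	// The cluster now also contains a replacement node on a new address.
	cluster := func() []string { return []string{"10.0.0.1", "10.0.0.9"} }

	// An UP event arrives for the replacement node without a prior NEW_NODE.
	refreshed := r.handleNodeUp("10.0.0.9", cluster)
	fmt.Println("refreshed:", refreshed, "known:", r.hosts["10.0.0.9"])
}
```

Refreshing the whole ring on an unknown host, rather than adding only the host named in the event, trades an extra round trip for the chance to pick up any other membership changes that were missed at the same time.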