-
Notifications
You must be signed in to change notification settings - Fork 623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reresolve hostnames as fallback when all hosts are unreachable #1708
Conversation
@martin-sucha Could you please take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wprzytula thank you for the pull request! This is indeed a useful fix. I've added a few comments inline.
35aaf59
to
25a06a8
Compare
I've addressed comments and retested. |
If all nodes in the cluster change their IPs at one time, driver used to no longer be able to ever contact the cluster; the only solution was to restart the driver. A fallback is added to the control connection `reconnect()` logic so that when no known host is reachable, all hostnames provided in ClusterConfig (initial contact points) are reresolved and control connection is attempted to be opened to any of them. If this succeeds, a metadata fetch is issued normally and the whole cluster is discovered with its new IPs. For the cluster to correctly learn new IPs in case that nodes are accessible indirectly (e.g. through a proxy), that is, by translated address and not `rpc_address` or `broadcast_address`, the code introduced in apache#1682 was extended to remove and re-add a host also when its translated address changed (even when its internal address stays the same). As a bonus, a misnamed variable `hostport` is renamed to a suitable `hostaddr`.
25a06a8
to
b9737dd
Compare
@martin-sucha Done. |
Merged. Thank you! Would you be willing to contribute a test that would ensure that the function does not break in the future? |
I'm afraid that automated testing of DNS changes is too much effort. The manual tests that I ran were complex: they involved:
|
If all nodes in the cluster change their IPs at one time, driver used to no longer be able to ever contact the cluster; the only solution was to restart the driver. A fallback is added to the control connection
reconnect()
logic so that when no known host is reachable, all hostnames provided in ClusterConfig (initial contact points) are reresolved and control connection is attempted to be opened to any of them. If this succeeds, a metadata fetch is issued normally and the whole cluster is discovered with its new IPs.For the cluster to correctly learn new IPs in case that nodes are accessible indirectly (e.g. through a proxy), that is, by translated address and not
rpc_address
orbroadcast_address
, the code introduced in #1682 is extended to remove and re-add a host also when its translated address changed (even when its internal address stays the same).As a bonus, a misnamed variable
hostport
is renamed to a suitablehostaddr
.