Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): allow requests to be queued in CONNECTING state (#374) [v2] #588

Conversation

ztzg
Copy link
Contributor

@ztzg ztzg commented Feb 17, 2020

With this patch, requests issued while the client is in the CONNECTING state get queued instead of raising a misleading SessionExpiredError.

This fixes #374, and brings Kazoo more in line with the Java and C clients.

See the kazoo.client.KazooClient.state documentation as well as these discussions for more details:

This "v2" is a respin of #583 which only includes the fix for #374.

A few notes:

  • Fixes the issue, and passes existing/adapted/new tests;
  • The aforementioned queue is unbounded, but that matches the Java/C clients;
  • The queue is emptied if the client leaves the CONNECTING state;
  • The change in behavior is mentioned in the client documentation.

(Feel free to suggest a better place for that documentation. I haven't updated CHANGES.md as that seems to be done at release time.)

_create_connection = handler.create_connection

def controlled_create_connection(*args, **kwargs):
ev_can_connect.wait(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a really involved test setup. Would you not be able to reproduce the same with

  • ZkClient configured with infinite retries, flat backoff, zero jitter
  • ZkServer initially downed.
  1. You start the client -> it enters the connecting state as it keeps trying to connect
  2. You can submit commands -> no exceptions (with your patch)
  3. You start the server -> the commands should be effected.

I think it would make the test case less intrusive.
What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh :) What you are suggesting is close to what the previous test was doing:

https://github.com/python-zk/kazoo/pull/583/files#diff-b526bcbf3d9f758b2032e6763cb453c7R601

(Except for the "starting with ZK down" bit, which would indeed make things easier. I don't know if any other tests start/stop ZK, though; I will check.)

I have switched because test_connection_write_timeout, which was already using a pretty intrusive method, was close to what I wanted to test—and I thought that making a single test a bit more complicated was the lesser evil.

The new test_connection_lost_empties_queue test, which is largely a duplicate of test_connection_write_timeout, also verifies that the queue empties out when the session cannot be recovered. Would you know of a trick to invalidate the session without being intrusive? (Besides submitting _SESSION_EXPIRED, which is inapplicable because it goes through the queue.)

I must admit that the thing which bothers me more in this test file is the duplication, which I made even worse; I was thinking of factoring out at least some of these local functions, and to submit a subsequent PR if the result turned out to be cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, how about starting the client with a random ID, of course unknown to the server. Wouldn't that trigger a session expired error and clear the queues?

As for server up/down, I think I saw this in the tests pertaining to failovers. I might be wrong, would need a minute to go through the tests with a PC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay—I have now moved by tests back to TestClient, and applied some of your suggestions:

Starting with ZK down is not easy, due to the way ZookeeperCluster the test suite interact, and the fact that teardown_zookeeper does not tear down the cluster :) Moreover, the first state change a client perceives is CONNECTED; it never goes from the initial LOST to SUSPENDED.

So instead, I create a fresh client which is backed by a single server, and stop the latter to force a transition to SUSPENDED.

As for flushing: fudging the session password before reconnect is indeed a good way to trigger SessionExpiredError. (Surprisingly enough, just invoking KazooClient._reset_session isn't: a new session gets created, but the state directly transitions from SUSPENDED to CONNECTED!)

Moving the two tests back to TestClient means that they are also exercised under the gevent and eventlet handlers.

…#374)

With this patch, requests issued while the client is in the
'CONNECTING' state get queued instead of raising a misleading
'SessionExpiredError'.

This fixes python-zk#374, and brings
Kazoo more in line with the Java and C clients.

See the 'kazoo.client.KazooClient.state' documentation as well as
these discussions for more details:

python-zk#570 (comment)
python-zk#583 (comment)
@ztzg ztzg force-pushed the fix-374-queue-requests-while-connecting-v2 branch from d866316 to 3d68e84 Compare February 24, 2020 11:48
@ztzg ztzg requested a review from ceache February 24, 2020 12:37
Copy link
Contributor

@ceache ceache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @ztzg !

@ztzg
Copy link
Contributor Author

ztzg commented Mar 4, 2020

Hi @jeffwidman, @StephenSorriaux: Mind if I nag you a bit about this? :)

Copy link
Member

@StephenSorriaux StephenSorriaux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR!

@StephenSorriaux StephenSorriaux merged commit a636d7a into python-zk:master Mar 9, 2020
@ztzg
Copy link
Contributor Author

ztzg commented Mar 9, 2020

Fantastic! Thank you.

Now, if I may take this opportunity to bother you a bit more... :) Do you have any plans regarding cutting a new release? And in any case: is there anything I could do which would help?

@jeffwidman
Copy link
Member

jeffwidman commented Mar 9, 2020

I think @StephenSorriaux is planning to cut a new release soon.

Thank you again for your hard work on this! Only just now looking at this, as I had a deadline at work that kept me busy the past few weeks so had to defer some of my open source involvement for a little bit.

@ztzg
Copy link
Contributor Author

ztzg commented Mar 9, 2020

Great! (And no worries; I understand that this is not your full-time occupation :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect SessionExpiredError when ZooKeeper leader process down
4 participants