-
-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Our use of urllib3's ConnectionPools is not threadsafe. #1871
Comments
Note that we may not want to do anything. My general advice has always been "one session per thread". If that advice remains true we don't have to do anything. |
Another option is to use a |
Oh, this probably explains why I got Pool closed errors in pip before I ripped the threading out. |
This makes me wonder if we should be doing something. |
When I was trying to integrate requests into pip before I ripped threading out, I was able to get the Pool closed errors almost constantly. Pip's threading was a little jacked up is it created a new pool for every dependency and used it to do parallel discovery of the links from /simple/ and external sites.I just assumed we were doing something wrong and didn't feel like debugging it. Add in the changes we had been making to where, for a lot of packages, there would be no external sites I just ended up ripping out the threading code instead. But I figured I'd comment so y'all know I did run into that problem, and we did use one global session. |
Yeah, I'm not sure we should be encouraging using one session object across threads, but I can see how sharing cookies across threads would be useful to a user. |
To be honest I didn't really decide to do that explicitly. Just it was the most obvious thing to do with the pip code base so it's what I did. It's possible that's not a reasonable thing to do too ;) |
Realistically, fixing up the threadsafety of our The architectural cost of fixing up the thread safety of a |
Perhaps that should be documented? Or if it is I missed it :) |
Is thread safety only an issue when one session connects to multiple hosts? I'm curious if multi-threaded applications would be safe if they use a policy of one Session object per host. |
One session per host may not be safe if it's shared across threads, because I don't believe the stdlib cookie jar is thread safe. |
@Lukasa If the issue is limited to the stdlib cookie jar, then using a requests.Session across multiple threads should be fine for APIs (without cookies). Is this correct? |
@pior It's my belief that that should be safe. |
Thanks! |
I think using a Session is only safe in the very specific single host case, and only because there will be only one connection in the pool. As soon as the pool grows too large, you run into race conditions where a connection might be dropped before it is used. |
@pepijndevos That should not happen. The connection pool is thread-safe, and when a connection is removed from the pool it is owned entirely by the object that withdrew it. As a result, the connection should not be dropped before use. It is possible that there is a TCP FIN packet in flight when the connection is being handled before use, and as a result the connection is torn down: in that situation, a simple retry is a good idea (and something that should be being done anyway). |
According to this StackOverflow answer, it seems that it is thread-safe in certain cookielib implementations. Is it true? @Lukasa |
I would not depend on any undocumented thread-safety. If cookielib is thread-safe and it says so in the docs, then yes, we should be mostly fine. |
The main problem with thread safety (other than perhaps the cookiejar) seems to be the PoolManager's use of RecentlyUsedContainer which ignores whether or not the pool is in use when deciding to throw away pools. When it does dispose of a pool it aggressively calls pool.close(). If, instead, the PoolManager didn't call close itself and the pool implemented del to close itself, wouldn't this theoretically solve the problem? I realize that relying on the GC means we can't make any assertions about the number of TCP connections that might still be open, but this seems like the simplest potential solution to this particular problem. |
More discussion of thread safety on reversefold/urllib3@f10336d The issue with the connection pool manager appears to be one in urllib3, not in requests proper. |
It doesn't appear there's any actual work for us to do here. Closing this for now. |
I'm sorry if this is the wrong place for the question, but this appears to be the only official place that such a thing is discussed that I can find. Is requests.Session thread-safe or not? The frontpage says that requests is thread safe, but does not mention anything about Session specifically. I understand that urllib3 connection pool is thread safe. Is Session? |
I can confirm that requests.Session() is not thread safe. Granted I was running 11 sessions for one host, but I ran all the sessions through a proxy, and only ran one session per thread, so it should have been thread safe. Be extremely careful when using threading with requests.Session(). r = requests.get() is threadsafe, but, r = requests.Sesssion().get() is not threadsafe. |
@Arbi717 the only difference between those two is that you're not closing the session properly in the latter https://github.com/requests/requests/blob/master/requests/api.py#L59 |
Forget everything I said. I had a global variable (a dictionary) that should have been local. Requests is amazing. |
Note you can use triple-backticks to surround code segments in Markdown. |
I think there are other race conditions when I do multithread calls with Python requests I get a ClosedPoolError and had to patch to return a new connection any clue on the potential impacts of doing this ? |
We use a
PoolManager
to provide us with connection pools from urllib3. However, we don't actually run our requests through it: instead we get a suitableConnectionPool
and then useurlopen
on that instead.This can lead to bugs when using threads and hitting a large number of hosts from one
Session
object. Each host creates a newConnectionPool
in thePoolManager
: if too many are created, the least recently used one is evicted.If that eviction happens in between us getting hold of the
ConnectionPool
and us actually trying to send a request, we can try to send on a closed connection pool. That's pretty lame. You can see the discussion on IRC that led to this issue here.We can either try to make that section thread safe, or try to reconnect when we hit closed pool errors. Thoughts?
The text was updated successfully, but these errors were encountered: