Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Our use of urllib3's ConnectionPools is not threadsafe. #1871

Closed
Lukasa opened this issue Jan 20, 2014 · 27 comments
Closed

Our use of urllib3's ConnectionPools is not threadsafe. #1871

Lukasa opened this issue Jan 20, 2014 · 27 comments
Milestone

Comments

@Lukasa
Copy link
Member

Lukasa commented Jan 20, 2014

We use a PoolManager to provide us with connection pools from urllib3. However, we don't actually run our requests through it: instead we get a suitable ConnectionPool and then use urlopen on that instead.

This can lead to bugs when using threads and hitting a large number of hosts from one Session object. Each host creates a new ConnectionPool in the PoolManager: if too many are created, the least recently used one is evicted.

If that eviction happens in between us getting hold of the ConnectionPool and us actually trying to send a request, we can try to send on a closed connection pool. That's pretty lame. You can see the discussion on IRC that led to this issue here.

We can either try to make that section thread safe, or try to reconnect when we hit closed pool errors. Thoughts?

@Lukasa
Copy link
Member Author

Lukasa commented Jan 20, 2014

Note that we may not want to do anything.

My general advice has always been "one session per thread". If that advice remains true we don't have to do anything.

@pepijndevos
Copy link
Contributor

Another option is to use a threading.local internally for some critical state, that way you effectively, automaticaly and trivially have one Session per tread.

@dstufft
Copy link
Contributor

dstufft commented Jan 23, 2014

Oh, this probably explains why I got Pool closed errors in pip before I ripped the threading out.

@sigmavirus24
Copy link
Contributor

Oh, this probably explains why I got Pool closed errors in pip before I ripped the threading out.

This makes me wonder if we should be doing something.

@dstufft
Copy link
Contributor

dstufft commented Jan 26, 2014

When I was trying to integrate requests into pip before I ripped threading out, I was able to get the Pool closed errors almost constantly. Pip's threading was a little jacked up is it created a new pool for every dependency and used it to do parallel discovery of the links from /simple/ and external sites.I just assumed we were doing something wrong and didn't feel like debugging it. Add in the changes we had been making to where, for a lot of packages, there would be no external sites I just ended up ripping out the threading code instead.

But I figured I'd comment so y'all know I did run into that problem, and we did use one global session.

@sigmavirus24
Copy link
Contributor

Yeah, I'm not sure we should be encouraging using one session object across threads, but I can see how sharing cookies across threads would be useful to a user.

@dstufft
Copy link
Contributor

dstufft commented Jan 26, 2014

To be honest I didn't really decide to do that explicitly. Just it was the most obvious thing to do with the pip code base so it's what I did. It's possible that's not a reasonable thing to do too ;)

@Lukasa
Copy link
Member Author

Lukasa commented Jan 26, 2014

Realistically, fixing up the threadsafety of our Session object is substantial work. Realistically, one Session per thread is probably the best way to go at this stage.

The architectural cost of fixing up the thread safety of a Session is probably quite significant, at least if we want to do it well. At the very least, we need to lock on cookie access. Not sure how much more work would be required.

@dstufft
Copy link
Contributor

dstufft commented Jan 26, 2014

Perhaps that should be documented? Or if it is I missed it :)

@idealopamp
Copy link

Is thread safety only an issue when one session connects to multiple hosts? I'm curious if multi-threaded applications would be safe if they use a policy of one Session object per host.

@Lukasa
Copy link
Member Author

Lukasa commented Dec 11, 2015

One session per host may not be safe if it's shared across threads, because I don't believe the stdlib cookie jar is thread safe.

@pior
Copy link

pior commented Dec 14, 2015

@Lukasa If the issue is limited to the stdlib cookie jar, then using a requests.Session across multiple threads should be fine for APIs (without cookies).

Is this correct?

@Lukasa
Copy link
Member Author

Lukasa commented Dec 14, 2015

@pior It's my belief that that should be safe.

@pior
Copy link

pior commented Dec 14, 2015

Thanks!

@pepijndevos
Copy link
Contributor

I think using a Session is only safe in the very specific single host case, and only because there will be only one connection in the pool. As soon as the pool grows too large, you run into race conditions where a connection might be dropped before it is used.

@Lukasa
Copy link
Member Author

Lukasa commented Dec 15, 2015

@pepijndevos That should not happen. The connection pool is thread-safe, and when a connection is removed from the pool it is owned entirely by the object that withdrew it. As a result, the connection should not be dropped before use. It is possible that there is a TCP FIN packet in flight when the connection is being handled before use, and as a result the connection is torn down: in that situation, a simple retry is a good idea (and something that should be being done anyway).

@ginward
Copy link

ginward commented Jul 18, 2017

According to this StackOverflow answer, it seems that it is thread-safe in certain cookielib implementations. Is it true? @Lukasa

@Lukasa
Copy link
Member Author

Lukasa commented Jul 18, 2017

I would not depend on any undocumented thread-safety. If cookielib is thread-safe and it says so in the docs, then yes, we should be mostly fine.

@reversefold
Copy link

The main problem with thread safety (other than perhaps the cookiejar) seems to be the PoolManager's use of RecentlyUsedContainer which ignores whether or not the pool is in use when deciding to throw away pools. When it does dispose of a pool it aggressively calls pool.close(). If, instead, the PoolManager didn't call close itself and the pool implemented del to close itself, wouldn't this theoretically solve the problem? I realize that relying on the GC means we can't make any assertions about the number of TCP connections that might still be open, but this seems like the simplest potential solution to this particular problem.

@reversefold
Copy link

More discussion of thread safety on reversefold/urllib3@f10336d

The issue with the connection pool manager appears to be one in urllib3, not in requests proper.

@sigmavirus24
Copy link
Contributor

It doesn't appear there's any actual work for us to do here. Closing this for now.

@jarshwah
Copy link

jarshwah commented Nov 1, 2018

I'm sorry if this is the wrong place for the question, but this appears to be the only official place that such a thing is discussed that I can find.

Is requests.Session thread-safe or not?

The frontpage says that requests is thread safe, but does not mention anything about Session specifically.

I understand that urllib3 connection pool is thread safe. Is Session?

@PythonProgrammer77
Copy link

I'm sorry if this is the wrong place for the question, but this appears to be the only official place that such a thing is discussed that I can find.

Is requests.Session thread-safe or not?

The frontpage says that requests is thread safe, but does not mention anything about Session specifically.

I understand that urllib3 connection pool is thread safe. Is Session?

I can confirm that requests.Session() is not thread safe.
I am running one session per thread (on 3-4 threads) and it is still corrupting data. I think you are limited to about one session per Python instance.

Granted I was running 11 sessions for one host, but I ran all the sessions through a proxy, and only ran one session per thread, so it should have been thread safe.

Be extremely careful when using threading with requests.Session().

r = requests.get() is threadsafe, but, r = requests.Sesssion().get() is not threadsafe.

@nanonyme
Copy link
Contributor

@Arbi717 the only difference between those two is that you're not closing the session properly in the latter https://github.com/requests/requests/blob/master/requests/api.py#L59

@PythonProgrammer77
Copy link

PythonProgrammer77 commented Jan 15, 2019

Forget everything I said. I had a global variable (a dictionary) that should have been local. Requests is amazing.

@nanonyme
Copy link
Contributor

Note you can use triple-backticks to surround code segments in Markdown.

@marcobazzani
Copy link

I think there are other race conditions when I do multithread calls with Python requests I get a ClosedPoolError and had to patch to return a new connection

urllib3/urllib3#951 (comment)

any clue on the potential impacts of doing this ?

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests