
Importing requests is fairly slow #3213

Closed · cournape opened this issue May 21, 2016 · 44 comments

@cournape

We are using requests in our projects, and it is working great. Unfortunately, for our CLI tools, using requests is an issue because it is slow to import. E.g. on my 2014 macbook:

python -c "import requests"

Takes close to 90 ms.

Is optimizing import time worth considering for this project?

@Lukasa
Member

Lukasa commented May 21, 2016

This is almost certainly the result of CFFI. Can you print what libraries you have installed in your environment (pip freeze)?

@cournape
Author

This is not linked to cffi (I don't have it installed). I did the test with a nearly empty virtualenv:

$ pip list
pip (8.1.2)
requests (2.10.0)
setuptools (21.1.0)
wheel (0.24.0)
$ time python -c "import requests"

real    0m0.090s
user    0m0.061s
sys 0m0.026s

@sigmavirus24
Contributor

I have a 2012 MacBook Pro. Using a slightly different way of measuring the import time:

$ pip list
pip (8.0.2)
requests (2.10.0)
setuptools (19.6.2)
wheel (0.26.0)
$ python -m timeit -n 10000000 'import requests'
10000000 loops, best of 3: 0.571 usec per loop

I also upgraded pip and setuptools to match your environment:

$ pip list
pip (8.1.2)
requests (2.10.0)
setuptools (21.1.0)
wheel (0.26.0)
$ python -m timeit -n 10000000 'import requests'
10000000 loops, best of 3: 0.59 usec per loop

Further

$ time python -c 'import requests'
python -c 'import requests'  0.07s user 0.05s system 61% cpu 0.200 total

Which is still not ideal, but that total is probably higher than the cost of the requests import alone because it also includes Python's own initialization.

@sigmavirus24
Contributor

That said, if you have information about specific things in our __init__.py (and elsewhere) that are causing slowness, I think we can fix them so long as they don't break backwards compatibility or our functional API.

@cournape
Author

timeit is not informative for imports; you have to create a new process for every measurement because of Python's import caching.

The main cost is importing urllib3, so I suspect that should be addressed there.
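
For example, a minimal sketch of a per-process measurement (the script, run count, and output format below are illustrative, not something from this thread):

# measure_import.py -- time "import requests" with a fresh interpreter per sample,
# so Python's import cache cannot hide the cost.
import subprocess
import sys
import time

def time_cold_import(module, runs=10):
    samples = []
    for _ in range(runs):
        start = time.time()
        subprocess.check_call([sys.executable, "-c", "import " + module])
        samples.append(time.time() - start)
    return min(samples)  # best of N, one process per sample

if __name__ == "__main__":
    baseline = time_cold_import("sys")    # interpreter startup only
    total = time_cold_import("requests")  # startup + the requests import
    print("interpreter startup: %.0f ms" % (baseline * 1000))
    print("import requests:     %.0f ms" % ((total - baseline) * 1000))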

@nicktimko
Contributor

@sigmavirus24 as @cournape mentions, you can't time it like that; try using time on the command line. On my machine (ran each ~6-10 times to try to get a reliable average) for user+sys times (including sys because if there are any strange calls made into the kernel as a result of the import, they should be counted):

  • time python -c "" (CPy 2.7.11) = 130 ms (time for Python to start up)
  • time python -c "import requests" (2.10.0) = 240 ms (above + importing requests)
  • time python -c "import urllib3" (1.15.1) = 210 ms (installed separately)

So urllib3 takes about 80 ms, then about 30 ms more for requests stuff.

@kennethreitz
Contributor

Neither of these numbers seems that high to me.

@nicktimko
Contributor

Yeah. Especially as your average request is going to take a few hundred milliseconds or so. If you reeeeally want to shave off milliseconds, you could always back-translate to use the stdlib. 😜

Or you could shorten the name of your command line tool by 1-2 characters.

@kennethreitz
Contributor

Ah, I didn't realize you were working on a CLI tool. Well, it hasn't seemed to be an issue for httpie.

@cournape
Author

It is true that as soon as you make even one request to a remote server, the delay will not matter much. But the problem in Python is that you pay the import cost whether you actually make a request or not: think of running tool --version, tool --help, etc. Imagine a tool like git or hg that only occasionally does HTTP I/O.

80 ms is pretty bad in this case: the rule of thumb is that a CLI tool's delay becomes noticeable around ~100-150 ms. Try using e.g. git with a 200 ms delay; it is a bit annoying.

Now, I completely understand that it may not be a focus of the library, especially if it is not that easy to improve.

@kennethreitz
Contributor

kennethreitz commented Jul 13, 2016

I personally think you're using the wrong tool for the job — given that, on the metrics given above, Python itself is taking 120 ms alone. HTTPie and Mercurial are well-loved tools that do their jobs very well, but they are not, nor will they ever be, curl or git.

@nicktimko
Contributor

nicktimko commented Jul 13, 2016

Makes sense; you could code up some lazy-loading bits. I did find this, though, which claims to have been excerpted from hg: https://github.com/bwesterb/py-demandimport (I haven't ever used it, no guarantees).

Apparent origin: https://selenic.com/hg/file/tip/mercurial/demandimport.py
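
For illustration, a minimal stdlib-only lazy-import sketch (this is not demandimport and not anything requests provides; the LazyModule helper is made up for this example):

import importlib

class LazyModule(object):
    """Defers the real import until the first attribute access."""
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# CLI startup pays nothing here; the ~80 ms cost is only paid on first real use.
requests = LazyModule("requests")

def fetch(url):
    return requests.get(url)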

@cournape
Author

@kennethreitz Python does not take 120 ms to start unless you are in a seriously broken environment. It is much closer to 20 ms on decently modern hardware (< 5 years old), i.e. importing requests costs 3x as much as starting Python.

FWIW, on my 2011 Desktop PC (Debian):

$ time python -c ""
real    0m0.020s
user    0m0.016s
sys 0m0.000s
$ time python -c "import requests"
real    0m0.107s
user    0m0.088s
sys 0m0.012s

On my 2014 macbook (OS X)

$ time python -c ""

real    0m0.022s
user    0m0.010s
sys 0m0.009s
$ time python -c "import requests"
real    0m0.096s
user    0m0.060s
sys 0m0.032s

A simple hg (for help) on my macbook takes ~ 100 ms

@nicktimko
Contributor

Oh, oops, yeah... my environment is a bit wonky because of the pyenv shim (it adds about 100 ms because it's a bash script that ends up exec-ing python). Bypassing that, the incremental times are still the same and Python itself loads in 10 ms or so.

@kennethreitz
Contributor

haha, I thought that number looked high :)

@dsully

dsully commented Jan 13, 2017

The "profimp" module shows where the time goes:

https://pypi.python.org/pypi/profimp

pkg_resources takes up a lot of time in general.
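
If you don't want to install profimp, a rough stdlib-only equivalent looks like this (a sketch; it just ranks modules by cumulative time spent during the import):

# Profile the import itself; the expensive imports (urllib3, and
# pyopenssl/cryptography when present) show up near the top.
import cProfile
import pstats

cProfile.run("import requests", "import_profile")
pstats.Stats("import_profile").sort_stats("cumulative").print_stats(20)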

Another time suck is the automatic loading of .packages.urllib3.contrib.pyopenssl. It would be great if we could opt out, as there is core SNI support in the appropriate versions of Python.

There doesn't appear to be any way of doing this today, if PyOpenSSL is importable. Would a PR allowing an opt-out be likely to be merged?

@Lukasa
Member

Lukasa commented Jan 13, 2017

@dsully How would you propose to expose the API for an opt-out?

@dsully

dsully commented Jan 13, 2017

@Lukasa Unless I'm mistaken, the .contrib.pyopenssl ssl wrapper & context is not needed on Python 2.7.9+ and 3.4+. Despite that, requests always uses pyopenssl when it is installed, even when the core Python libs support SNI, etc. So the import could be changed to:

# Attempt to enable urllib3's SNI support, if possible & needed.
try:
    from ssl import SSLContext  # Python 2.7.9+, 3.4+
except ImportError:
    try:
        from .packages.urllib3.contrib import pyopenssl
        pyopenssl.inject_into_urllib3()
    except ImportError:
        pass

Less of an opt-out, more of a not-needed.

Now, if there are cases where the pyopenssl SSLContext wrapper is desired for some reason (?) even when the core libraries are sufficient, then I'll make another pass. What do you think?

@sigmavirus24
Contributor

@dsully you said:

pkg_resources takes up a lot of time in general.

Where is Requests using pkg_resources? I don't believe either Requests or Urllib3 uses it.

Now, if there are cases where the pyopenssl SSLContext wrapper is desired for some reason (?) even when the core libraries are sufficient

If I remember correctly, only the more recent versions of 2.7 are actually appropriate. I think there were some minor issues with 2.7.9 and 2.7.10.

@Lukasa
Member

Lukasa commented Jan 13, 2017

Unless I'm mistaken, the .contrib.pyopenssl ssl wrapper & context is not needed on Python 2.7.9+ and 3.4+.

Unfortunately, that's not true. The best example is the system Python on macOS, which provides an SSLContext but ships OpenSSL 0.9.8zh. That is an almost impossible-to-use version which is missing a number of vital features, including TLSv1.2 support. This is resolved by PyOpenSSL, which uses the OpenSSL 1.0.2 statically linked inside the cryptography module. This is vital for keeping systems secure.

More generally, using PyOpenSSL allows users to link in a version of OpenSSL that is not constrained by the OpenSSL their base Python is linked against.

@dsully

dsully commented Jan 13, 2017

@sigmavirus24 Indirectly via cryptography:

src/cryptography/hazmat/backends/__init__.py
7:import pkg_resources
27:            for ep in pkg_resources.iter_entry_points(

Imported via:

packages/urllib3/contrib/pyopenssl.py
49:from cryptography.hazmat.backends.openssl import backend as openssl_backend
50:from cryptography.hazmat.backends.openssl.x509 import _Certificate

@dsully

dsully commented Jan 13, 2017

@Lukasa Right.. forgot about that. I don't have that particular issue on macOS, but most people do.

I'll look at coming up with a way to explicitly opt-out then.

@Lukasa
Member

Lukasa commented Jan 13, 2017

It should be noted that the easiest way to opt out is to simply not have PyOpenSSL in your environment. If it's present and you need it, you're presumably paying that import cost somewhere anyway.

@sigmavirus24
Contributor

Also worth noting that pkg_resources takes longer to iterate over entry points on systems with very large numbers of packages installed. It's also, sadly, the best solution for finding plugins defined by packages.

@dsully

dsully commented Jan 13, 2017

@Lukasa - yes and no. Our build environment has real dependency management for modules (https://engineering.linkedin.com/blog/2016/08/introducing--py-gradle--an-open-source-python-plugin-for-gradle), which means that just because someone included PyOpenSSL in their dependency tree doesn't mean the code you are importing from that upstream actually uses it. Lots of code gets installed transitively, but not imported.

@Lukasa
Member

Lukasa commented Jan 13, 2017

@reaperhulk BTW, is slow import of cryptography still an ongoing issue?

@dsully

dsully commented Jan 13, 2017

@Lukasa - without moving the location of the pyopenssl loader, currently in requests/__init__.py, would an environment variable be acceptable?

@Lukasa
Member

Lukasa commented Jan 13, 2017

I'm tentatively open to that, yes.
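
A rough sketch of what such a guard might look like (the REQUESTS_NO_PYOPENSSL variable name is purely hypothetical, not an agreed-upon interface):

import os

# Hypothetical opt-out in requests/__init__.py: skip the pyopenssl injection
# when the (made-up) REQUESTS_NO_PYOPENSSL variable is set.
if not os.environ.get("REQUESTS_NO_PYOPENSSL"):
    try:
        from .packages.urllib3.contrib import pyopenssl
        pyopenssl.inject_into_urllib3()
    except ImportError:
        pass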

@sigmavirus24
Contributor

Lots of code gets installed transitively, but not imported.

That sounds like the real root cause of this. Well, that and the possibility that cryptography is still slow to import, which is an issue for pyca/cryptography.

would an environment variable be acceptable?

I don't like that. Let's say someone is using HTTPie and notices it's slow. They're also using requests in a development project. If they export the environment variable to speed things up without understanding the ramifications, and they're on an LTS distribution with a terrible ssl module and terrible OpenSSL, then they'll start seeing SSLErrors in their project if they were relying on any of the functionality provided by PyOpenSSL. But those SSLErrors weren't there yesterday, and they've already forgotten that they exported that variable. Now we have yet one more avenue of complexity to help them debug. I abhor that.

@dsully

dsully commented Jan 13, 2017

@sigmavirus24 I hear you there. If environment-variable changes aren't OK, what about an explicit call? I'd have to move the current injection, since it happens in __init__.py.

@sigmavirus24
Contributor

@dsully so I think what we need to balance is: "better security for most people" versus "avoiding performance penalties for a smaller group of people".

If I understand your situation correctly, you have recent enough versions of OpenSSL and Python, yes? In that case a version check against the OPENSSL_VERSION in ssl would be good enough for your needs, right? Is there a reason not to try to check against that (or perhaps a more friendly attribute that isn't a string)?
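
For reference, the ssl module exposes both a version string and a comparable tuple (the values shown are examples only):

import ssl

print(ssl.OPENSSL_VERSION)       # e.g. 'OpenSSL 1.0.2j  26 Sep 2016' (a string)
print(ssl.OPENSSL_VERSION_INFO)  # e.g. (1, 0, 2, 10, 15) -- a tuple, easy to compare
print(ssl.OPENSSL_VERSION_INFO >= (1, 0, 1))  # the kind of check being discussed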

@dsully

dsully commented Jan 13, 2017

Additional data here - if the PyOpenSSL (16.2.0)-based SSLContext wrapper is loaded, and verify is set in requests to a file path (in this case one containing an internal CA bundle), it seems as though the default CA bundle is removed and replaced with just the passed CA bundle, instead of being appended to. This works fine with Python 3.4+'s built-in ssl.SSLContext (i.e. when PyOpenSSL is not available).

Since this is getting far off the track of this issue, I'll open another one to track what appears to be a bug.

@dsully

dsully commented Jan 13, 2017

@sigmavirus24 I think so.. but I don't follow 100%. Do you mean checking the OPENSSL_VERSION to avoid the pyopenssl injection?

@dsully

dsully commented Jan 13, 2017

@sigmavirus24 And yes, you are correct - we run Python 2.7.11 and Python 3.5 (soon to be 3.6), both compiled against a non-system OpenSSL.$latest. Our build system (PyGradle) also sets build-time CPPFLAGS and LDFLAGS so that modules like cryptography and PyOpenSSL link against the non-system OpenSSL as well.

@sigmavirus24
Contributor

sigmavirus24 commented Jan 13, 2017 via email

@dsully

dsully commented Jan 13, 2017

Ok, so:

import ssl

if ssl.OPENSSL_VERSION_INFO < (1, 0, 1):
    try:
        from .packages.urllib3.contrib import pyopenssl
        pyopenssl.inject_into_urllib3()
    except ImportError:
        pass

What is the minimum OpenSSL version for the required functionality? 1.0.1?

@sigmavirus24
Contributor

sigmavirus24 commented Jan 13, 2017 via email

@Lukasa
Member

Lukasa commented Jan 13, 2017

Additional data here - if the PyOpenSSL (16.2.0) based SSLContext wrapper is loaded, and in requests verify is set to a file path, in this case containing an internal CA bundle, it seems as though the default CA bundle is removed and replaced with just the passed CA bundle, instead of appending.

This is not a bug, unless the behaviour is inconsistent with the standard library's behaviour when you do that. Passing a custom bundle to verify should never append to the standard CA bundle; it should always replace it.
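
If appending is the behaviour someone actually wants, the usual workaround is to build a combined bundle and pass that to verify (a sketch only; the file paths, and the use of certifi as the source of the default bundle, are assumptions for this example):

import certifi
import requests

# Build a combined bundle: certifi's public roots plus an internal CA.
# The paths below are placeholders for illustration.
with open("combined-ca-bundle.pem", "wb") as combined:
    for bundle in (certifi.where(), "/etc/ssl/internal-ca.pem"):
        with open(bundle, "rb") as source:
            combined.write(source.read())
        combined.write(b"\n")

requests.get("https://internal.example.com/", verify="combined-ca-bundle.pem")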

What is the minimum OpenSSL version for the required functionality? 1.0.1?

I am nervous about setting a minimum required OpenSSL version. We could definitely do it, I suppose, but it strikes me as something that is hard to come up with a universal "good" value for.

@dsully

dsully commented Jan 13, 2017

@Lukasa - It should be set to whatever pyOpenSSL's minimum is for the functionality that is required.

0.9.8 was dropped in pyOpenSSL 16.1.0:

https://pyopenssl.readthedocs.io/en/stable/changelog.html

So 0.9.9 would be the bare minimum, but that's hard to recommend. I still say 1.0.1, or perhaps 1.0.0.

@Lukasa
Member

Lukasa commented Jan 13, 2017

The problem is that "required" is a moving target. We have a definition of "required" that is true today, but any check like this is going to go out of date.

@dsully

dsully commented Jan 13, 2017

Well then it should be updated when the requirements change. :)

@Lukasa
Member

Lukasa commented Jan 13, 2017

Heh, I agree, but speaking as someone with about twice as many maintenance tasks as his job allows, I'm a big believer in reducing the number of things I have to keep track of. ;)

@dsully

dsully commented Jan 13, 2017

PR #3817 created.

homu referenced this issue in ycm-core/YouCompleteMe Mar 6, 2017
[READY] Import the requests module lazily

The requests module is slow to import. See https://github.com/kennethreitz/requests/issues/3213. We should lazy load it to improve startup time. We do that by adding two methods to the `BaseRequest` class: one that returns the requests module and another that returns the session object, since the session depends on the `requests-futures` module and `requests-futures` imports `requests`. In addition, we make sure that no requests are sent at startup; otherwise there would be no point in lazy loading these. Such requests would fail anyway, since the server can't be ready yet.
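
A sketch of that accessor pattern (the names below are illustrative, not the actual YouCompleteMe code):

class BaseRequest(object):
    _session = None

    @staticmethod
    def Requests():
        # Deferred: requests is imported on first use, not at module load time.
        import requests
        return requests

    @classmethod
    def Session(cls):
        # requests-futures imports requests, so the session is deferred too.
        if cls._session is None:
            from requests_futures.sessions import FuturesSession
            cls._session = FuturesSession()
        return cls._session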

Here are the improvements on startup time:
| Platform | First run, before (ms) | First run, after (ms) | Subsequent runs, before (ms) | Subsequent runs, after (ms) |
|---|---|---|---|---|
| Ubuntu 16.04 64-bit | 240 | 131 | 173 | 74 |
| macOS 10.12 | 435 | 315 | 261 | 208 |
| Windows 10 64-bit | 894 | 594 | 359 | 247 |

*Results obtained by running the `prof.py` script from [this branch](https://github.com/micbou/YouCompleteMe/tree/profiling-startup). The difference between first run and subsequent runs is Python bytecode generation (`*.pyc` files).*

@sigmavirus24
Contributor

So the cryptography portion of this was fixed by @dsully in pyca/cryptography@aa396c0, which makes me believe this can be closed. For me, on Fedora 25 and a 4th-gen X1 Carbon, I get these results:

~ ❯❯❯ time python -c ''
python -c ''  0.02s user 0.01s system 96% cpu 0.027 total
~ ❯❯❯ time python -c ''
python -c ''  0.01s user 0.00s system 95% cpu 0.014 total
~ ❯❯❯ time python -c 'import requests'
python -c 'import requests'  0.07s user 0.03s system 98% cpu 0.099 total
~ ❯❯❯ time python -c 'import requests'
python -c 'import requests'  0.09s user 0.02s system 98% cpu 0.110 total
~ ❯❯❯ time python -c 'import requests'
python -c 'import requests'  0.07s user 0.01s system 99% cpu 0.081 total
~ ❯❯❯ time python -c 'import requests'
python -c 'import requests'  0.06s user 0.01s system 99% cpu 0.072 total

If this crops up again, let's open a new issue.
