Noticeable performance downgrade from Python 3.10 to onwards versions #4716
Comments
Thanks for the questions. I don't have an immediate answer for you; to check I understand, Python 3.11 and up got ~30% slower? Have you tried generating flame graphs (e.g. with samply)?
@davidhewitt correct, here is a more direct comparison extracted from that bench:
That makes sense. Let me plan some tests using sampling; I'll post the findings here.
@davidhewitt I tried using samply as you suggested to build flame graphs, but even with the following profile used in the build:

```toml
[profile.profiling]
inherits = "release"
debug = true
```

the stacks in the report just show items as `0x12989b _granian.cpython-310-x86_64-linux-gnu.so`, so it's quite hard to spot differences between the 3.10 and 3.11 builds. Do you have any further suggestions on how to get full stacks on a PyO3 extension module?
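One possible cause of the anonymous `0x…` frames (an assumption, not something confirmed here) is that the build tooling strips symbols from the final `.so` even when `debug = true` is set. Cargo lets you pin the strip behaviour explicitly in the same profile:

```toml
[profile.profiling]
inherits = "release"
debug = true     # keep DWARF debug info so the profiler can symbolicate frames
strip = "none"   # forbid stripping symbols from the produced cdylib
```

If the wheel is built through maturin, it may be worth double-checking that the tool actually uses this profile and does not strip the artifact afterwards; that part is an assumption about your build setup.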
Btw, I re-ran the tests with PyO3 0.23 and they show the same issue (ignore the absolute numbers vs the last table, as it's different hardware):
Might this be related to threads? The involved code has the main Python thread waiting on a …
Hi 👋 I'm not sure whether this makes sense as an issue or should be a discussion instead: to the project maintainers, feel free to move this to a discussion if you conclude that's better.
Context: in the Granian project (an HTTP server) we recently introduced some e2e benchmarks using different Python versions, which show a ~30% performance degradation for some tests when comparing Python 3.10 to all the versions onwards (PyO3 0.22, cfg `pyo3_disable_reference_pool`).

Now, the specific tests showing this degradation involve some relatively simple code:

- a `pyclass` and a `PyDict` objects (https://github.com/emmett-framework/granian/blob/c94e73e32a4865a011a4b659ef04bbc0a96e6fd4/src/wsgi/callbacks.rs#L24-L106)
- a `pyclass` object (https://github.com/emmett-framework/granian/blob/c94e73e32a4865a011a4b659ef04bbc0a96e6fd4/src/wsgi/io.rs#L49-L57)

While I understand an e2e benchmark might suffer from a lot of additional noise when compared to a smaller unit benchmark, and there's a lot more to consider (the network stack in the CPython stdlib, for example), I also believe there might be something going on in the PyO3 <-> CPython interop, given that other protocols involving asyncio and a bunch more machinery suffer a much smaller degradation than the one I referenced. Thus I have two main questions:
Thanks in advance 🙏
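One way to help separate a pure-CPython change from a cost at the PyO3 boundary is to run a stdlib-only microbenchmark under each interpreter: if the pure-Python numbers stay flat from 3.10 to 3.12 while the e2e numbers do not, the extension boundary becomes the prime suspect. A minimal sketch (the simulated workload is hypothetical and only loosely mirrors the WSGI callback's dict-plus-bytes shape):

```python
import sys
import timeit


def simulated_request() -> int:
    # Roughly simulate the per-request Python-side work the WSGI path does:
    # building a dict of headers and touching a bytes body.
    environ = {f"HTTP_X_{i}": "value" for i in range(16)}
    body = b"x" * 1024
    return len(environ) + len(body)


if __name__ == "__main__":
    n = 100_000
    elapsed = timeit.timeit(simulated_request, number=n)
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: "
          f"{elapsed / n * 1e6:.2f} µs/call")
```

Running the same script with `python3.10`, `python3.11`, and `python3.12` binaries gives directly comparable per-call numbers without any extension module involved.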