chore(telemetry): use importlibs instead of pkg_resources (#3783)
pkg_resources has known performance issues: pypa/setuptools#926. This PR replaces pkg_resources with importlib.metadata and uses this module to retrieve package names and versions.

A further optimization was made to the importlib implementation which parses package metadata: https://github.com/DataDog/dd-trace-py/compare/munir/benchmark-importlib...munir/tests-importlib-metadata-custom-parsing?expand=1. Benchmarks for this optimization ("importlib with partial parsing") are also shown in the tables below:

| benchmark | test case | Number of Packages | mean (ms) | std (ms) | baseline (ms) | overhead (ms) | overhead (%) |
|---------------------------|---------------------------|--------------------|:---------:|:--------:|:-------------:|:-------------:|:------------:|
| ddtracerun-auto_telemetry | pkg_resources (1.x branch) | 15 | 326 | 13 | 274 | 52 | 19.0 |
| ddtracerun-auto_telemetry | importlib | 15 | 285 | 5 | 270 | 15 | 5.6 |
| ddtracerun-auto_telemetry | importlib with partial parsing | 15 | 285 | 10 | 269 | 16 | 5.9 |
| ddtracerun-auto_telemetry | importlib | 30 | 377 | 5 | 350 | 27 | 7.7 |
| ddtracerun-auto_telemetry | importlib with partial parsing | 30 | 362 | 7 | 350 | 12 | 3.4 |
| ddtracerun-auto_telemetry | importlib | 45 | 381 | 24 | 348 | 31 | 8.9 |
| ddtracerun-auto_telemetry | importlib with partial parsing | 45 | 363 | 9 | 350 | 23 | 6.3 |
| ddtracerun-auto_telemetry | importlib | 313 | 1050 | 79 | 991 | 59 | 5.9 |
| ddtracerun-auto_telemetry | importlib with partial parsing | 313 | 911 | 28 | 905 | 6 | 0.6 |

| benchmark | test case | Number of Packages | mean (ms) | std (ms) | baseline (ms) | overhead (ms) | overhead (%) |
|:---------------------------------:|---------------------------|--------------------|:---------:|:--------:|:-------------:|:-------------:|:------------:|
| ddtracerun-auto_tracing_telemetry | pkg_resources (1.x) | 15 | 324 | 8 | 274 | 50 | 18.2 |
| ddtracerun-auto_tracing_telemetry | importlib | 15 | 293 | 11 | 272 | 21 | 8.3 |
| ddtracerun-auto_tracing_telemetry | importlib with partial parsing | 15 | 291 | 12 | 272 | 19 | 6.9 |
| ddtracerun-auto_tracing_telemetry | importlib | 30 | 373 | 11 | 351 | 22 | 6.28 |
| ddtracerun-auto_tracing_telemetry | importlib with partial parsing | 30 | 367 | 13 | 354 | 13 | 3.6 |
| ddtracerun-auto_tracing_telemetry | importlib | 45 | 376 | 8 | 355 | 21 | 5.9 |
| ddtracerun-auto_tracing_telemetry | importlib with partial parsing | 45 | 364 | 9 | 352 | 22 | 6.5 |
| ddtracerun-auto_tracing_telemetry | importlib | 313 | 1010 | 80 | 960 | 50 | 5.2 |
| ddtracerun-auto_tracing_telemetry | importlib with partial parsing | 313 | 910 | 20 | 873 | 37 | 4.2 |

Note: redis, requests and urllib3 were included in the test cases with 30 and 45 packages. These packages were patched by `ddtrace-run`, which increased the baseline by ~74ms, but the observed overhead of telemetry remained consistent. The case with 313 packages patched gevent, pylons, SQLAlchemy, requests, flask, grpc, cassandra, botocore, and urllib3. This was to simulate the overhead of telemetry in a real-world application with telemetry enabled.

Findings from benchmarking the sending of telemetry events with different numbers of packages installed, with integrations patched, and/or with tracing enabled:

- Using importlib instead of pkg_resources reduced the overhead of telemetry by more than half (~50ms to ~19ms).
- The number of packages does not appear to correlate with the overhead of telemetry.
  - The benchmarks might have been too noisy to measure the difference accurately.
- Creating a custom parser to retrieve package names and versions from PKG-INFO and METADATA files led to notable performance gains with a large number of packages.
  - The difference appears to be within a standard deviation, so more testing is required to measure it accurately.
  - Iterating on this approach might lead to better results: https://github.com/DataDog/dd-trace-py/compare/munir/benchmark-importlib...munir/tests-importlib-metadata-custom-parsing?expand=1
  - These performance gains seem to be minor. It might not be worth developing and maintaining a metadata parser.

## Checklist

- [x] Library documentation is updated.
- [x] [Corp site](https://github.com/DataDog/documentation/) documentation is updated (link to the PR).

## Reviewer Checklist

- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] PR cannot be broken up into smaller PRs.
- [ ] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is included in the code or PR.
- [ ] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [ ] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.
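
For reference, the two importlib approaches compared in the description (reading package names and versions through importlib.metadata, and parsing only the headers that are needed) could look roughly like the sketch below. This is not the code from this PR: the function names `installed_packages`, `installed_packages_partial`, and `parse_name_and_version` are hypothetical, the sketch assumes Python 3.8+ where `importlib.metadata` is part of the standard library, and the partial parser is a simplification that ignores folded headers and header-name case.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return {name: version} using the fully parsed metadata of each distribution."""
    packages = {}
    for dist in distributions():
        meta = dist.metadata  # parses the whole METADATA / PKG-INFO file
        name = meta.get("Name")
        version = meta.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            packages[name] = version
    return packages


def parse_name_and_version(metadata_text):
    """Hypothetical partial parser: read only the Name and Version headers.

    Stops at the first blank line (end of the header block) or as soon as
    both values have been found, so the rest of the file is never scanned.
    """
    name = version = None
    for line in metadata_text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        if line.startswith("Name:"):
            name = line[len("Name:"):].strip()
        elif line.startswith("Version:"):
            version = line[len("Version:"):].strip()
        if name is not None and version is not None:
            break
    return name, version


def installed_packages_partial():
    """Return {name: version} by scanning only the headers that are needed."""
    packages = {}
    for dist in distributions():
        # read_text returns None when the file is missing or unreadable
        text = dist.read_text("METADATA") or dist.read_text("PKG-INFO")
        if not text:
            continue
        name, version = parse_name_and_version(text)
        if name and version:
            packages[name] = version
    return packages
```

Both functions return a mapping of distribution name to version. The second variant avoids building a full message object per distribution, which is presumably the kind of saving the "importlib with partial parsing" rows measure; whether that saving justifies maintaining a custom parser is the open question raised in the findings.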