Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(telemetry): use importlibs instead of pkg_resources #3783

Merged
merged 4 commits into from
Jun 22, 2022

Conversation

mabdinur
Copy link
Contributor

@mabdinur mabdinur commented May 27, 2022

pkg_resources has known performance issues: pypa/setuptools#926. This PR replaces pkg_resources with importlib.metadata and uses this module to retrieve package names and versions.

A further optimization was made to the importlib implementation which parses package metadata: https://github.com/DataDog/dd-trace-py/compare/munir/benchmark-importlib...munir/tests-importlib-metadata-custom-parsing?expand=1. Benchmarks for this third optimization are also shown in the table below:

benchmark test case Number of Packages mean (ms) std (ms) baseline (ms) overhead (ms) overhead (%)
ddtracerun-auto_telemetry pkg_resources (1.x branch) 15 326 13 274 52 19.0
ddtracerun-auto_telemetry importlib 15 285 5 270 15 5.6
ddtracerun-auto_telemetry importlib with partial parsing 15 285 10 269 16 5.9
ddtracerun-auto_telemetry importlib 30 377 5 350 27 7.7
ddtracerun-auto_telemetry importlib with partial parsing 30 362 7 350 12 3.4
ddtracerun-auto_telemetry importlib 45 381 24 348 31 8.9
ddtracerun-auto_telemetry importlib with partial parsing 45 363 9 350 23 6.3
ddtracerun-auto_telemetry importlib 313 1050 79 991 59 5.9
ddtracerun-auto_telemetry importlib with partial parsing 313 911 28 905 6 0.6
benchmark test case Number of Packages mean (ms) std (ms) baseline (ms) overhead (ms) overhead (%)
ddtracerun-auto_tracing_telemetry pkg_resources (1.x) 15 324 8 274 50 18.2
ddtracerun-auto_tracing_telemetry importlib 15 293 11 272 21 8.3
ddtracerun-auto_tracing_telemetry importlib with partial parsing 15 291 12 272 19 6.9
ddtracerun-auto_tracing_telemetry importlib 30 373 11 351 22 6.28
ddtracerun-auto_tracing_telemetry importlib with partial parsing 30 367 13 354 13 3.6
ddtracerun-auto_tracing_telemetry importlib 45 376 8 355 21 5.9
ddtracerun-auto_tracing_telemetry importlib with partial parsing 45 364 9 352 22 6.5
ddtracerun-auto_tracing_telemetry importlib 313 1010 80 960 50 5.2
ddtracerun-auto_tracing_telemetry importlib with partial parsing 313 910 20 873 37 4.2

Note: redis, requests and urllib3 were included in test cases with 30 and 45 packages. These packages were patched by ddtrace-run and this increased the baseline by ~74ms but the overhead of telemetry observed remained consistent. The case with 313 packages patched gevent, pylons, SQLAlchemy, requests, flask, grpc, cassandra, botocore, and urllib3. This was to simulate the overhead of telemetry in a real world application with telemetry enabled.

Findings from benchmarking sending telemetry events with different number of packages installed, patching integrations, and/or enabling tracing:

  • Using importlib instead of pkg_resources reduced the overhead of telemetry in half (~50ms to ~19ms)
  • The number of packages does not appear to correlate with the overhead of telemetry
    • the benchmarks might've been too noisy to measure the difference accurately.
  • creating a custom parser to retrieve package names and versions from PKG-INFO and METADATA files lead to notable performance gains with a large number of packages.

Checklist

  • Library documentation is updated.
  • Corp site documentation is updated (link to the PR).

Reviewer Checklist

  • Title is accurate.
  • Description motivates each change.
  • No unnecessary changes were introduced in this PR.
  • PR cannot be broken up into smaller PRs.
  • Avoid breaking API changes unless absolutely necessary.
  • Tests provided or description of manual testing performed is included in the code or PR.
  • Release note has been added for fixes and features, or else changelog/no-changelog label added.
  • All relevant GitHub issues are correctly linked.
  • Backports are identified and tagged with Mergifyio.
  • Add to milestone.

@mabdinur mabdinur changed the title feat(telemetry): use importlibs instead of pkg_resources chore(telemetry): use importlibs instead of pkg_resources May 27, 2022
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch 4 times, most recently from 45dcd5c to 23566fc Compare May 27, 2022 20:16
@mabdinur mabdinur added the changelog/no-changelog A changelog entry is not required for this PR. label May 27, 2022
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch 2 times, most recently from 1e31e6d to 218d71e Compare May 31, 2022 15:04
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch 5 times, most recently from bc46a1a to 701be7e Compare May 31, 2022 17:45
@mabdinur mabdinur marked this pull request as ready for review May 31, 2022 20:32
@mabdinur mabdinur requested a review from a team as a code owner May 31, 2022 20:32
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch from 701be7e to d5ff8c9 Compare June 1, 2022 13:12
ddtrace/internal/packages.py Outdated Show resolved Hide resolved
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch 3 times, most recently from 0ade23b to 6576475 Compare June 1, 2022 15:26
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch from 6576475 to d7f9ea0 Compare June 1, 2022 19:32
Copy link
Member

@brettlangdon brettlangdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just looking for a little more description in the get_distributions() function, otherwise lgtm

ddtrace/internal/packages.py Show resolved Hide resolved
ddtrace/internal/packages.py Outdated Show resolved Hide resolved
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch 2 times, most recently from 6d29375 to 1adcbe0 Compare June 1, 2022 21:39
@mabdinur mabdinur requested a review from jd June 2, 2022 00:57
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch 2 times, most recently from 3c9ff31 to 53635ce Compare June 2, 2022 14:28
ddtrace/internal/packages.py Show resolved Hide resolved
ddtrace/internal/telemetry/data.py Outdated Show resolved Hide resolved
@mabdinur mabdinur force-pushed the munir/benchmark-importlib branch from 53635ce to 8feb275 Compare June 9, 2022 14:01
@mergify mergify bot merged commit db54098 into 1.x Jun 22, 2022
@mergify mergify bot deleted the munir/benchmark-importlib branch June 22, 2022 15:50
@github-actions github-actions bot added this to the v1.3.0 milestone Jun 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog A changelog entry is not required for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants