Help with Julia profiling support? #1534

Closed
vilterp opened this issue Oct 20, 2022 · 10 comments
Labels
ack, enhancement (quick change/addition that does not need full team approval), profiler

Comments

@vilterp

vilterp commented Oct 20, 2022

Hi folks, I know this is an unusual request and not quite the right place; just hoping for a pointer or two 🙏

Motivation

My company, RelationalAI (GitHub, Website) has a complex server written in Julia, and uses Datadog heavily for observability, including writing our own datadog metrics and tracing (not OSS yet) libraries. We've also added an allocation profiler and heap snapshotter to Julia.

We'd love to use Datadog profiling for Julia too! Sadly it's not supported yet, as Julia is not a common language for server apps.

Thankfully, Julia already has a profiler which outputs PProf files, so I was hoping we could mimic the official SDKs — i.e. post a PProf file to the same endpoint, and have it show up. In the meantime, we are sending around .pb.gz files and screenshots of the pprof -web tool on Slack 😛

Expectation

I wrote up https://github.com/vilterp/DatadogProfileUploader.jl, containing my attempt to mimic the request made by this package.

I even set the tags to show the language as Go, just in case unknown languages were filtered out. Was hoping that the profile would show up as if it were a Go profile.

Result

I ran it against a local agent (localhost:8126). The agent returned 200, but the profile never showed up in the profile list in the UI. Posting directly to Datadog with my API key also returned a 200, but with no profile in the UI.
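For readers following along, here is a minimal sketch of the kind of multipart/form-data body being discussed. It only builds the bytes and sends nothing. The `start` and `end` parts are mentioned later in this thread; the other field names (`version`, `family`, `tags[]`, `data[cpu.pprof]`), the tag values, and the helper `build_multipart` are assumptions for illustration, not a documented Datadog API.

```python
import uuid


def build_multipart(fields, files, boundary=None):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: list of (name, value) string pairs (a list, so names may repeat).
    files:  list of (name, filename, payload_bytes) tuples.
    Returns (content_type_header_value, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields:
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    for name, filename, payload in files:
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            b"Content-Type: application/octet-stream",
            b"",
            payload,
        ]
    # Closing delimiter, followed by a final CRLF.
    lines += [f"--{boundary}--".encode(), b""]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


# Hypothetical values; every field name except start/end is an assumption.
fields = [
    ("version", "3"),
    ("family", "go"),
    ("start", "2022-10-27T23:54:24+02:00"),
    ("end", "2022-10-27T23:55:24+02:00"),
    ("tags[]", "runtime:go"),
    ("tags[]", "service:my-julia-server"),
]
content_type, body = build_multipart(
    fields,
    [("data[cpu.pprof]", "pprof-data", b"<gzipped pprof bytes>")],
    boundary="example-boundary",
)
```

Building the body separately from sending it makes it easy to diff against a request captured from a working client.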

Another experiment

To validate that other things were correct, including my local agent, the pprof file, my API key, etc, I tried another hack: made a branch of the Go SDK (this repo) which would read a Julia pprof file from disk and upload it, instead of taking a Go profile.

This works, showing Julia stack frames in the UI:

image

Profile: https://app.datadoghq.com/profiling/search?query=&event=AQAAAYP3wP6gxJKPQAAAAABBWVAzd01yN0FBRDJPXzRhVV9sYlFnQUE&my_code=disabled&profileId=AYP3wMr7AAD2O_4aU_lbQgAA&viz=stream&start=1666294591761&end=1666308991761&paused=false

Branch here: https://github.com/DataDog/dd-trace-go/compare/main...vilterp:dd-trace-go:trying-julia?expand=1

So there is hope that it's possible :) It just seems like something about the request I'm making is subtly different from the one coming from the official Go SDK, and I have no way of telling what it is, since the API doesn't return an error, and I can't see any internal logs with an error in them.

Here's an example request from my Julia code: https://gist.github.com/vilterp/92978cf5441b6794f64dda81749afeec (intercepted locally by requesting to nc -l 8126)

Request for help

Is there anything obviously wrong with the request I'm making? Is there a better channel to go through to get help on this? It'd be appreciated, as we have a lot of performance issues which we'd love to use DD profiling for. If all else fails, we could use a Go sidecar process (forked version of the Go SDK) to upload our Julia profiles 😅 Thanks! 🙏

@vilterp vilterp added the enhancement quick change/addition that does not need full team approval label Oct 20, 2022
@ajgajg1134
Contributor

Hello! This sounds like an interesting request. I'm not sure which of our requirements on pprof files is keeping them from appearing in the UI for you. I think your best option is to open a ticket with Datadog Support directly. That way it can be routed to the profiling teams, who may be able to help you get your Julia pprof files uploaded (and perhaps also take down a feature request for better Julia support).

@felixge
Member

felixge commented Oct 25, 2022

@vilterp I just wanted to let you know that your request has reached the profiling team and we're discussing it. Official Julia support isn't on our roadmap for now, but we'd probably be okay with receiving Julia profiles if they don't cause issues on our end.

We'll ping you once we've reached consensus. Meanwhile, do you have a pprof produced by Julia that I could try? I looked at https://github.com/vilterp/DatadogProfileUploader.jl/blob/master/test/cpu.pb.gz but it appears to be a Go profile?

@vilterp
Author

vilterp commented Oct 25, 2022

Ah, whoops. Updated it to be a julia profile: https://github.com/vilterp/DatadogProfileUploader.jl/blob/9f176cdea2ac766e7d0941279e5b9a1b7a553147/test/cpu.pprof

(This is with the tweaks here applied, to make it look more like Go's pprof files: JuliaPerf/PProf.jl#74)

Glad to hear it's reached the team, thanks! 🙏

@nsrip-dd
Contributor

👋 This is the first I've heard of Julia's profiling capabilities, that's really cool!

In the request example you've shared, you say it's from the Julia client, but the user agent is Go-http-client/1.1. Is that actually from the dd-trace-go fork, or did you have your Julia client set that user agent?

@vilterp
Author

vilterp commented Oct 25, 2022

In that one I had my Julia client set the user agent — was just trying to make it look as much like Go as possible, in case there was some user agent check on the datadog side 😛 Didn't seem to make a difference though… Still can't get one to show up in the list if I upload with Julia 🤔 (if I upload the same pprof file with Go, it does)

@felixge
Member

felixge commented Oct 27, 2022

I took a closer look. Specifically:

  1. I confirmed that I can upload your cpu.pprof example using an (unofficial) CLI tool of mine. It shows up in the UI.
  2. I looked at the multipart message you shared, and couldn't find any error.
  3. I verified that posting to our intake directly (without the agent) will give you an error message if you got the API key wrong.

So my guess is that you made a small mistake during your testing. For example, you may have used the API key of one account but looked at another account in the UI, not waited long enough for the file to show up, or had the agent's API key misconfigured.

Please let me know if you could do some sanity-checking on your end to rule out these kinds of problems. If that doesn't help, we can discuss next steps.

As far as our policy for uploading Julia profiles is concerned: We are okay with it as long as it's done through a Datadog agent per host and while classifying the profiles as Go. We can't promise that this won't break at some point in the future, but it's rather unlikely that this would happen.

Aside from our policy: My team is very excited about your interest in using our profiler with Julia! Thanks for working on this :).

@vilterp
Author

vilterp commented Oct 27, 2022

Thanks for taking a look, Felix! I did get a 200 from posting to the service directly, so I think my API key was correct, but maybe something else was wrong… Will try sanity checking a bit more. I'm guessing there's some asynchronous processing/indexing happening before profiles show up in the UI, so errors may not be evident immediately?

Thanks for linking to the CLI tool; it's very helpful to see a 'minimum viable uploader'. I feel like this is pretty much exactly what I did in Julia, so not sure what the difference is! Probably some small mistake that's right in front of me.

Uploading through the agent as Go works for us, thanks. Only other Go/Julia mismatch so far has been name parsing — the service expects Go-style mypackage.MyFunc, but Julia profiles currently sometimes take the form myfunc{Pkg.MyType}, which parses in the UI as:

package: myfunc{Pkg.
func: MyType}

🤦 But we may be able to fix that by prefixing with the module name, e.g. MyPkg.myfunc{OtherPkg.MyType}.
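To illustrate why the prefix might help, here is a small sketch of a Go-style name splitter. How the service actually splits names is not stated in this thread; the rule below (package ends at the first dot after the last slash) is an assumption borrowed from Go's symbol naming, `split_go_style` is a hypothetical helper, and the real UI may place the dot differently.

```python
def split_go_style(name):
    """Split a symbol into (package, func) at the first dot after the last slash.

    This mirrors the Go naming convention "import/path/pkg.Func"; it is an
    assumed rule, not the service's confirmed behavior.
    """
    slash = name.rfind("/")           # -1 when there is no slash
    dot = name.find(".", slash + 1)   # first dot in the final path element
    if dot == -1:
        return "", name
    return name[:dot], name[dot + 1:]


# A Julia-style name without a module prefix is split inside the type parameter.
unprefixed = split_go_style("myfunc{Pkg.MyType}")

# With a module prefix, the first dot falls where a Go package boundary would.
prefixed = split_go_style("MyPkg.myfunc{OtherPkg.MyType}")
```

Under this assumed rule the unprefixed name yields a package of `myfunc{Pkg`, while the prefixed name yields package `MyPkg` and function `myfunc{OtherPkg.MyType}`.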

Thanks again; we're excited too! This will definitely beat sending pprof files around in Slack if we can get it to work 🙏

@felixge
Member

felixge commented Oct 27, 2022

I'm guessing there's some asynchronous processing/indexing happening before profiles show up in the UI, so errors may not be evident immediately?

Yes, there is a chance it's getting stuck in the pipeline on our end. But I just don't see why; your request looks fine 🤔. If you can't figure it out, I can try to hunt down any internal errors.

@nsrip-dd
Contributor

One possible error: The start and end times in your example are the same. See: https://gist.github.com/vilterp/92978cf5441b6794f64dda81749afeec#file-julia-dd-request-txt-L18-L24

The start and end times should differ by the profiling period. In other words, if you're collecting a profile once every minute, start and end should differ by 1 minute.
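A minimal sketch of that rule: derive the start time from the end time and the profiling period, so the two can never be equal. The helper name `profile_window` and the example timestamp are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def profile_window(end, period):
    """Return (start, end) for a profile that covers `period` up to `end`."""
    if period <= timedelta(0):
        raise ValueError("profiling period must be positive")
    return end - period, end


# Example: a profile collected once every minute, ending at a fixed instant.
end = datetime(2022, 10, 27, 21, 55, 24, tzinfo=timezone.utc)
start, end = profile_window(end, timedelta(minutes=1))
```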

@vilterp
Author

vilterp commented Oct 28, 2022

I think I found the issue, and it was date-related after all — for the start and end parts,

  • Go was sending in the format 2022-10-27T23:55:24+02:00
  • I was sending in the format 2022-10-28T00:12:54Z

I could have sworn I checked this before!
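For anyone hitting the same thing from another language, here is a small sketch of producing the offset-style timestamp rather than the `Z` form. It relies only on Python's standard `datetime.isoformat`; the helper name `format_with_offset` is hypothetical, and whether the intake accepts every offset (for example `+00:00`) is not confirmed here.

```python
from datetime import datetime, timedelta, timezone


def format_with_offset(ts):
    """Format an aware datetime with a numeric UTC offset instead of 'Z'."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # isoformat() on an aware datetime always emits +HH:MM or -HH:MM.
    return ts.isoformat(timespec="seconds")


cest = timezone(timedelta(hours=2))
go_style = format_with_offset(datetime(2022, 10, 27, 23, 55, 24, tzinfo=cest))
utc_style = format_with_offset(datetime(2022, 10, 28, 0, 12, 54, tzinfo=timezone.utc))
```

Here `go_style` matches the shape of the first bullet above, and `utc_style` carries an explicit `+00:00` offset where the second bullet had `Z`.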

Anyway, the profiles seem to be showing up reliably now, so I think I'm unblocked — just going to continue to iterate on JuliaPerf/PProf.jl#74 to tweak the pprof files themselves.

Will post other issues here if anything else comes up. Thanks!
image
