This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Enable worker-level profiling of Dataflow Jobs #72

Open
bjchambers opened this issue Oct 20, 2015 · 9 comments

@bjchambers
Contributor

For both working on the SDK and building Dataflow pipelines, it would be useful if there was an easy way to get profiles from the execution of code on the workers.

@bjchambers
Contributor Author

bjchambers commented Feb 18, 2016

We’re working on a better profiling experience, but the SDK already has rudimentary support for profiling.

What you’ll need

  1. An installation of pprof
  2. An installation of graphviz if you’d like to visualize profile information.

How to get profiles

  1. Run your pipeline specifying --saveProfilesToGcs=<gs://your_gcs_bucket>. This will write profiles to the given GCS bucket.
  2. Retrieve the profiles from GCS using gsutil -m cp -r <gs://your_gcs_bucket> <local_dir>.
  3. View the profiles using pprof. Run pprof <local_dir>/*cpu*.gz for CPU profiles (or *wall*.gz for wall-time profiles). From here you can run graphviz to render a calltree, or text or tree for text-based reports. See the pprof docs and pprof --help for more ways to interact with the profiles.
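
Steps 2 and 3 above might look like this in practice (the bucket name and local directory are placeholders):

```shell
# Copy the uploaded profiles from the GCS bucket to a local directory.
gsutil -m cp -r gs://your_gcs_bucket ./profiles

# Open the CPU profiles in pprof; at the interactive prompt,
# commands such as "text" or "tree" produce text-based reports.
pprof ./profiles/*cpu*.gz
```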

Hope that helps!

Notes and Caveats

  • The profiles are 10-second samples taken from every 60 seconds of execution.
  • For a batch job the VM instances are normally torn down after the job completes, and the final trace may not get uploaded to GCS.
  • Multiple ParDo steps may execute together. When this happens, the call to output() in the first step will include the time to execute the later steps. As a result, the inclusive time for these steps will be inflated.
  • If you want the profiles to include information about JNI calls, make sure to have any relevant binaries/object files in the directory you run pprof from.

@bfabry

bfabry commented Feb 6, 2017

> If you want the profiles to include information about JNI calls make sure to have any relevant binaries/object files in the directory you run pprof from.

Is there any documentation on how to get the binaries used by dataflow to do this?

EDIT: i.e., I'm seeing a lot of this type of thing:

      flat  flat%   sum%        cum   cum%
885904.05s 93.27% 93.27% 885904.05s 93.27%  [libpthread-2.19.so]
 36963.81s  3.89% 97.16%  36963.81s  3.89%  GC
 11025.24s  1.16% 98.33%  11045.60s  1.16%  [libc-2.19.so]
  5444.52s  0.57% 98.90%   5444.52s  0.57%  Native
   488.93s 0.051% 98.95% 330068.45s 34.75%  [libjvm.so]
    60.95s 0.0064% 98.96% 897579.05s 94.50%  <unknown>

and would like to get some understanding as to what is being called inside libpthread

/cc @bjchambers

@peay

peay commented Apr 24, 2017

Is there similar support for Beam's Dataflow runner? (edit: nevermind, just found DataflowProfilingOptions)

@swegner
Contributor

swegner commented Apr 6, 2018

Yes, in Apache Beam profiling support is now enabled via --saveProfilesToGcs=<gs://...>, defined inside DataflowProfilingOptions.
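
For example, when launching a Beam pipeline on Dataflow, the flag is passed like any other pipeline option (the main class, project, and bucket below are placeholders):

```shell
# Launch a Beam pipeline with worker profiling enabled;
# --saveProfilesToGcs is the only profiling-specific flag here.
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=my-gcp-project \
    --saveProfilesToGcs=gs://my-bucket/profiles"
```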

@bvolpato

bvolpato commented Aug 8, 2019

I couldn't get it to work.

Even though I am sending:

  saveProfilesToGcs: gs://labs1-carol-internal/profiler
  profilingAgentConfiguration: {APICurated=true}

Through Java code:

DataflowProfilingOptions profilingOptions = dataflowPipelineOptions.as(DataflowProfilingOptions.class);
profilingOptions.setSaveProfilesToGcs("gs://" + PipelineHelper.getBucketName(bucket) + "/profiler");

DataflowProfilingAgentConfiguration agent = new DataflowProfilingAgentConfiguration();
agent.put("APICurated", true);

profilingOptions.setProfilingAgentConfiguration(agent);

I don't get any files in the profiler, and this message is printed on Stackdriver:
Profiling Agent not found. Profiles will not be available from this worker.

Any ideas?

@lukecwik
Contributor

lukecwik commented Aug 8, 2019

Which version of the SDK are you using?

Have you tried contacting Google Cloud support and share some job ids with them?

@bvolpato

bvolpato commented Aug 8, 2019

@lukecwik I tried with both 2.13.0 and 2.14.0. Will try to contact their support, thanks!

@bvolpato

bvolpato commented Aug 9, 2019

Support could not help with this, and I still haven't found a way to get profiles.

On a separate note: profiling in Dataflow doesn't have a Service Level Agreement (SLA), since it is an experimental Alpha feature and is not recommended for production use cases, as mentioned in [2].
[2] https://cloud.google.com/products/#product-launch-stages

@bvolpato

For those who are wondering, the Profiler does not get populated (and profile files are not saved on GCS, either) if you set both properties at the same time (APICurated=true and saveProfilesToGcs={path}).

I removed saveProfilesToGcs and the profiler now works fine for me.
