Support dotnet trace running in a sidecar container #810

Open
MichaelSimons opened this issue Feb 7, 2020 · 21 comments
Labels: containers (related to running/installing/configuring Diagnostics in a container), enhancement (New feature or request)
Milestone

Comments

@MichaelSimons
Member

  1. Run a containerized ASP.NET Core app.
docker run -it -p 8000:80 --name aspnetapp mcr.microsoft.com/dotnet/core/samples:aspnetapp
  2. Build an image which contains the dotnet trace tool.
FROM mcr.microsoft.com/dotnet/core/sdk:3.1

RUN dotnet tool install --global dotnet-trace

ENV PATH="${PATH}:/root/.dotnet/tools"

docker build -t dotnet/trace .
  3. Run the dotnet trace container.
docker run -it --net=container:aspnetapp --pid=container:aspnetapp --cap-add ALL --privileged dotnet/trace
  4. Try to collect a trace of the ASP.NET Core app.

Expected Results:

I should be able to collect a dotnet trace. It is a common technique to utilize sidecar containers to profile applications running in another container. This allows you to profile existing application images without having to modify them. You can package all of your tools into a completely separate tools image.

An example of running perfcollect with this pattern is documented in this blog post

Actual Results

  1. dotnet trace ps does not match ps -aux
root@17be50bbd0fd:/# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.1  2.1 21288796 87672 pts/0  SLsl+ 22:36   0:03 dotnet aspnetapp.dll
root      1747  0.0  0.0   3988  3036 pts/0    Ss   23:19   0:00 bash
root      2231  0.0  0.0   7640  2704 pts/0    R+   23:23   0:00 ps -aux
root@17be50bbd0fd:/# dotnet trace ps
      2254 dotnet     /usr/share/dotnet/dotnet
      2272 dotnet-trace /root/.dotnet/tools/dotnet-trace
  2. The PID reported by dotnet trace cannot be found when running collect
root@17be50bbd0fd:/# dotnet trace collect -p 2254
No profile or providers specified, defaulting to trace profile 'cpu-sampling'

Provider Name                           Keywords            Level               Enabled By
Microsoft-DotNETCore-SampleProfiler     0xFFFFFFFFFFFFFFFF  Verbose(5)          --profile
Microsoft-Windows-DotNETRuntime         0x00000004C14FCCBD  Informational(4)    --profile

[ERROR] System.ArgumentException: Process with an Id of 2254 is not running.
   at System.Diagnostics.Process.GetProcessById(Int32 processId, String machineName)
   at System.Diagnostics.Process.GetProcessById(Int32 processId)
   at Microsoft.Diagnostics.Tools.Trace.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, FileInfo output, UInt32 buffersize, String providers, String profile, TraceFileFormat format, TimeSpan duration) in /_/src/Tools/dotnet-trace/CommandLine/Commands/CollectCommand.cs:line 89
  3. The PID reported by ps is not a valid .NET app when running collect
root@17be50bbd0fd:/# dotnet trace collect -p 1
No profile or providers specified, defaulting to trace profile 'cpu-sampling'

Provider Name                           Keywords            Level               Enabled By
Microsoft-DotNETCore-SampleProfiler     0xFFFFFFFFFFFFFFFF  Verbose(5)          --profile
Microsoft-Windows-DotNETRuntime         0x00000004C14FCCBD  Informational(4)    --profile

[ERROR] System.PlatformNotSupportedException: Process 1 not running compatible .NET Core runtime
   at Microsoft.Diagnostics.Tools.RuntimeClient.DiagnosticsIpc.IpcClient.GetTransport(Int32 processId) in /_/src/Microsoft.Diagnostics.Tools.RuntimeClient/DiagnosticsIpc/IpcClient.cs:line 50
   at Microsoft.Diagnostics.Tools.RuntimeClient.DiagnosticsIpc.IpcClient.SendMessage(Int32 processId, IpcMessage message, IpcMessage& response) in /_/src/Microsoft.Diagnostics.Tools.RuntimeClient/DiagnosticsIpc/IpcClient.cs:line 84
   at Microsoft.Diagnostics.Tools.RuntimeClient.EventPipeClient.CollectTracing(Int32 processId, SessionConfiguration configuration, UInt64& sessionId) in /_/src/Microsoft.Diagnostics.Tools.RuntimeClient/Eventing/EventPipeClient.cs:line 80
   at Microsoft.Diagnostics.Tools.Trace.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, FileInfo output, UInt32 buffersize, String providers, String profile, TraceFileFormat format, TimeSpan duration) in /_/src/Tools/dotnet-trace/CommandLine/Commands/CollectCommand.cs:line 104

Notes:

  1. I was using the dotnet trace version 3.1.57502+6767a9ac24bde3a58d7b51bdaff7c7d75aab9a65
  2. dotnet trace worked as expected when I added it to my application container. This required me to install the .NET SDK, since it is a global tool.
  3. It is possible I have messed up something obvious here and this is already supported 😄
@hoyosjs
Member

hoyosjs commented Feb 8, 2020

@josalem

@josalem
Contributor

josalem commented Feb 8, 2020

This is something we are actively working on lighting up, but is unsupported right now. There are some PRs in flight that will enable this scenario in the 5.0 timeframe. dotnet/runtime#1600 and #770 when merged will make this easy to configure. We are thinking of changing the names of the configuration variables from what was merged in the runtime PR, but the functionality should be the same. Documentation on how we recommend setting this scenario up will come before release. It will involve a shared volume mount inside the pod.

CC @shirhatti

@josalem josalem added the containers related to running/installing/configuring Diagnostics in a container label Feb 8, 2020
@josalem josalem added this to the 5.0 milestone Feb 8, 2020
@SidShetye

Our use case: when a container runs low on memory (e.g. host = 16 GB, each container = 2 GB), the tools within that same container cannot take a memory dump because the tool doesn't have enough memory either. If the tools ran in another container, this would be much simpler.

@bss-git

bss-git commented Feb 18, 2020

.NET Core apps on Linux create domain socket files in the /tmp directory. You can establish an IPC session through such a file.
Start your target container with an option that maps /tmp to a directory on the host, e.g.
-v /tmp/container_sockets:/tmp
Then start your tracing container with an option that maps the same host directory to /tmp in the container:
-v /tmp/container_sockets:/tmp
(along with your other options, such as --pid).
Tracing should then just work.

The PID in this case is nothing more than an abstraction. Internally, Microsoft.Diagnostics.NETCore.Client.IpcClient uses the PID only to find the socket file name and start an IPC session:

                string ipcPort;
                try
                {
                    ipcPort = Directory.GetFiles(IpcRootPath, $"dotnet-diagnostic-{processId}-*-socket") // Try best match.
                                .OrderByDescending(f => new FileInfo(f).LastWriteTime)
                                .FirstOrDefault();
                    if (ipcPort == null)
                    {
                        throw new ServerNotAvailableException($"Process {processId} not running compatible .NET Core runtime.");
                    }
                }

I've configured inter-container IPC in my monitoring app so that it collects counters from two other containers using only the socket file names. You don't even need the target PID in this scenario.
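The lookup in the C# excerpt above can be mirrored in shell to check, from inside the tracing container, which socket file a given PID would resolve to. This is a minimal sketch, not the library's implementation: the function name `find_diag_socket` and its optional directory argument are hypothetical, and it assumes socket file names contain no whitespace.

```shell
# Hypothetical helper: print the newest diagnostics socket for a PID.
# Uses the same file name pattern as the excerpt above:
#   dotnet-diagnostic-{pid}-*-socket
find_diag_socket() {
    pid="$1"
    dir="${2:-/tmp}"   # assumption: sockets live under /tmp unless told otherwise
    # ls -t lists newest first; stderr is silenced when nothing matches.
    ls -t "$dir"/dotnet-diagnostic-"$pid"-*-socket 2>/dev/null | head -n 1
}
```

Running `find_diag_socket 1` in a container that shares the target's /tmp prints the socket file that `-p 1` would be matched against, or nothing if no such socket is visible.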

@glitch100

@MichaelSimons is this still the primary issue for this? I saw in #1737 that you are using this for tracking?

Following the steps in the issue above still has issues with the PIDs being inaccurate between ps -aux and dotnet trace ps. I am not sure if I am missing something with the sidecar approach or if the socket based approach posted by @bss-git is the way it should be done. Is there any documentation for this?

@MichaelSimons
Member Author

@glitch100 - as far as I am aware, this is the primary issue.

@shirhatti - can you help @glitch100?

@shirhatti
Contributor

shirhatti commented May 14, 2020

@glitch100 As noted in an earlier comment, access to the PID namespace isn't really required.

You just need access to the diagnostic server created by the runtime. As it stands today (3.0/3.1), this socket is always created in the /tmp directory.

Tooling and runtime changes are incoming for 5.0 that allow you to customize how the diagnostics server is created and how the tools attach.

@bss-git's suggestion of sharing the temp directory across both containers should suffice.

FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS tools
RUN dotnet tool install --tool-path /tools dotnet-trace

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS runtime

COPY --from=tools /tools /tools
ENV PATH="/tools:${PATH}"

ENV COMPlus_EnableDiagnostics="0"

WORKDIR /tools

docker run -d -p 8000:80 -v /container:/tmp mcr.microsoft.com/dotnet/core/samples:aspnetapp
docker run --rm -u root -it -v /container:/tmp trace /bin/sh

# Inside the trace container
dotnet-trace collect -p 1
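
Before running `dotnet-trace collect`, it can help to confirm that the target's socket is actually visible through the shared mount. The following is a rough sketch, assuming the `dotnet-diagnostic-{pid}-{key}-socket` naming shown earlier in this thread; `list_diag_pids` is a hypothetical helper, not part of any dotnet tool.

```shell
# Hypothetical helper: print the PIDs encoded in the diagnostics socket
# names found in a directory (default /tmp), one per line, sorted.
list_diag_pids() {
    dir="${1:-/tmp}"
    for f in "$dir"/dotnet-diagnostic-*-socket; do
        [ -e "$f" ] || continue            # the glob matched nothing
        name="${f##*/}"                    # strip the directory part
        rest="${name#dotnet-diagnostic-}"  # leaves "<pid>-<key>-socket"
        printf '%s\n' "${rest%%-*}"        # keep the text before the first '-'
    done | sort -n -u
}
```

An empty result inside the trace container would suggest that the two containers are not sharing the same /tmp, which would be consistent with the "not running compatible .NET Core runtime" error reported at the top of this issue.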

@glitch100

glitch100 commented May 15, 2020

@shirhatti
First, for this to work, will my dotnet app also need:

ENV COMPlus_EnableDiagnostics="0"

in the Dockerfile? Otherwise we hit that issue around CoreCLR starting (due to the /tmp directory).

Once running, and having another container with the Dockerfile you provided I still hit an issue:

No profile or providers specified, defaulting to trace profile 'cpu-sampling'

Provider Name                           Keywords            Level               Enabled By
Microsoft-DotNETCore-SampleProfiler     0x0000000000000000  Informational(4)    --profile 
Microsoft-Windows-DotNETRuntime         0x00000014C14FCCBD  Informational(4)    --profile 

Unable to start a tracing session: Microsoft.Diagnostics.NETCore.Client.ServerNotAvailableException: Process 1 not running compatible .NET Core runtime.
No profile or providers specified, defaulting to trace profile 'cpu-sampling'

Provider Name                           Keywords            Level               Enabled By
Microsoft-DotNETCore-SampleProfiler     0x0000000000000000  Informational(4)    --profile 
Microsoft-Windows-DotNETRuntime         0x00000014C14FCCBD  Informational(4)    --profile 

Unable to start a tracing session: Microsoft.Diagnostics.NETCore.Client.ServerNotAvailableException: Process 1 not running compatible .NET Core runtime.
   at Microsoft.Diagnostics.NETCore.Client.IpcClient.GetTransport(Int32 processId) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs:line 63
   at Microsoft.Diagnostics.NETCore.Client.IpcClient.SendMessage(Int32 processId, IpcMessage message, IpcMessage& response) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs:line 104
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession..ctor(Int32 processId, IEnumerable`1 providers, Boolean requestRundown, Int32 circularBufferMB) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/EventPipeSession.cs:line 30
   at Microsoft.Diagnostics.Tools.Trace.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, FileInfo output, UInt32 buffersize, String providers, String profile, TraceFileFormat format, TimeSpan duration, String clrevents, String clreventlevel) in /_/src/Tools/dotnet-trace/CommandLine/Commands/CollectCommand.cs:line 130
Unable to create session.

I have confirmed that both have volumes for /tmp mapped to the host.

Could you provide a working example? Also is this documented anywhere?

@shirhatti
Contributor

> Firstly for this to work my dotnet-app also will need:
> ENV COMPlus_EnableDiagnostics="0"

That's not going to work. If you disable the diagnostics server, you can't trace the application.
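
One way to check whether a target was started with the diagnostics server disabled is to look for the variable in the process environment. The sketch below is only an illustration under assumptions: `diagnostics_disabled` is a hypothetical helper, it reads a NUL-separated environment file such as `/proc/<pid>/environ` on Linux, and reading another process's environment generally requires sufficient privileges (and a shared PID namespace when done from a sidecar).

```shell
# Hypothetical helper: exit 0 if the given environ file contains
# COMPlus_EnableDiagnostics=0, i.e. the diagnostics server was disabled.
diagnostics_disabled() {
    environ_file="${1:-/proc/1/environ}"   # assumption: the target is PID 1
    # /proc/<pid>/environ separates entries with NUL bytes.
    tr '\0' '\n' < "$environ_file" | grep -qx 'COMPlus_EnableDiagnostics=0'
}
```

If the function succeeds for the target process, no socket will have been created and the tools have nothing to attach to.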

If creation of pipes fails, do you mind creating a new issue for that? Let's work through that first.

> Could you provide a working example?

I'll publish a gist later today.

@glitch100

@shirhatti Any luck on that gist?

@glitch100

Worth noting that I have tried what you posted with a vanilla ASP.NET app with Docker support (as generated by Visual Studio), as well as the trace container. I have tried both Docker Compose and the CLI with the commands above.

Docker Compose gives me that /tmp CoreCLR issue, so I suspect I am doing something wrong there with regard to the volumes/mounts.

docker run... successfully starts both containers, but again I get the same error as I posted above.

I look forward to seeing the gist as I am hoping I have done something silly rather than this being an issue on the dotnet side.

@shirhatti
Contributor

EDIT: I made a small change to my earlier comment to include -u root on the trace container. I've just verified that my comment does indeed work and can be considered a complete example.

> Docker Compose gives me that /tmp CoreCLR issue, so I suspect I am doing something wrong there with regard to the volumes/mounts.

As I mentioned earlier, please create a separate issue for that.

@glitch100

I will give it a try. Where would you recommend I raise that issue?

@shirhatti
Contributor

> Where would you recommend I raise that issue?

https://github.com/dotnet/runtime

@glitch100

@shirhatti I appreciate you making the edit however the timeline of this conversation does now seem a bit strange. That said the edits you made did get it working which is good 🎉, so thanks.

I might be wrong on this one but it seems like the default Dockerfile in a new ASPNETAPP (template) was not compatible with that flow, however the differences are quite minor so I am really unsure if it's down to the way I am running the container or if I made some changes along the way.

I validated this by running the sample image as you did, seeing success, running mine, seeing failure, and then updating my Dockerfile to match.

Is it the aspnet:3.1-buster-slim image?

I will make a ticket on the CORECLR repo.

Final question - is this documented somewhere?

@galvesribeiro
Member

Hey folks!

After reading this issue among many others I was able to make it work. The YAML would look something like this:

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  shareProcessNamespace: true
  volumes:
    - name: data
      emptyDir: {}
    - name: tmp
      emptyDir: {}
  containers:
    - name: sampleapp
      image: sampleapp:latest
      imagePullPolicy: Never
      volumeMounts:
        - name: data
          mountPath: /app/data
        - name: tmp
          mountPath: /tmp
      ports:
        - containerPort: 8080
    - name: profiler
      image: profiler:latest
      imagePullPolicy: Never
      stdin: true
      tty: true
      env:
        - name: PROFILER_TARGET_PROCESS
          value: "SampleApp"
        - name: PROFILER_BLOB_CONTAINER
          value: "dumps"
        - name: PROFILER_DATA_PATH
          value: /profiler/data
      volumeMounts:
        - name: data
          mountPath: /profiler/data
        - name: tmp
          mountPath: /tmp

The important pieces there are:

  1. shareProcessNamespace must be set to true; otherwise, the process namespace isn't shared across the containers within the pod;
  2. You must mount /tmp in a shared volume. dotnet-trace / dotnet-counters / dotnet-dump will all rely on it to connect to the target process;
  3. stdin and tty must be set to true. If it is false, the dotnet-xxx tools will fail to start.

The rest of the options are optional; they are only meant to simplify the profiler container and make it more generic. In my example, I'm also mapping the /profiler/data volume so I can pass it in the -o argument to the tools to write the files. Once the trace session is over, it will upload to blob storage for further analysis.
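
With a pod like the one above running, a trace could be collected from the profiler container and copied out roughly as follows. This is a sketch under assumptions: the pod and container names come from the YAML above, `dotnet-trace` is assumed to be on the PATH of the profiler image, the output file name `trace.nettrace` is a placeholder, and the `KUBECTL` variable is a hypothetical override that exists only to allow a dry run. With `shareProcessNamespace: true` the app is typically not PID 1, so the PID is taken as an argument.

```shell
# Sketch only: pod/container names match the YAML above; the output file
# name is a placeholder. KUBECTL is a hypothetical override
# (e.g. KUBECTL=echo prints the commands instead of running them).
POD=myapp
TOOLS=profiler

list_targets() {
    # Show the .NET processes the tools container can see.
    "${KUBECTL:-kubectl}" exec -it "$POD" -c "$TOOLS" -- dotnet-trace ps
}

collect_trace() {
    # $1 = target PID as reported by list_targets
    "${KUBECTL:-kubectl}" exec -it "$POD" -c "$TOOLS" -- \
        dotnet-trace collect -p "$1" -o /profiler/data/trace.nettrace
}

fetch_trace() {
    # Copy the trace from the shared data volume to the current directory.
    "${KUBECTL:-kubectl}" cp -c "$TOOLS" "$POD":/profiler/data/trace.nettrace ./trace.nettrace
}
```

A typical sequence would be `list_targets`, then `collect_trace <pid>`, then `fetch_trace`. Note that `kubectl cp` relies on `tar` being available in the container.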

So yes, it is a somewhat involved process, but it works just fine.

I hope it helps!

@galvesribeiro
Member

You can of course use the injection hooks on the admission controller to add that profiler pod spec using a label rather than having it hardcoded in the pod/deployment definition. I just meant to give an example and show the requirements to get it working.

@glitch100

Thanks a bunch - I was having trouble with docker-compose, and it looks like stdin and tty were the bits I was missing. I will give this a go, thanks.

@noahfalk noahfalk added the enhancement New feature or request label Nov 6, 2020
@StupidScience

You can also try our kubectl plugin that was created for gathering trace/gcdump results from dotnet apps running in k8s

@baal2000

baal2000 commented Dec 1, 2020

@noahfalk is this issue related to #1720?

@josalem
Contributor

josalem commented Dec 1, 2020

Sort of. That issue is tracking shipping a container that comes pre-installed with our tools.

This issue is tracking an experience where you can configure your multi-container Pod to have the tools in one container and your app in another. We have added the necessary features to the tools/runtime for 5.0 to do this but haven't documented the functionality fully yet.

There is a PR open on the docs repo that documents the flags for the tools, but not the end-to-end experience: dotnet/docs#21666

@tommcdon tommcdon modified the milestones: 5.0, 6.0 Dec 18, 2020
@tommcdon tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 21, 2021
@tommcdon tommcdon modified the milestones: 7.0.0, 8.0.0 Sep 12, 2022