Option for not using SIGSTOP/SIGCONT because not all apps take it well #13

podusowski · 2020-01-07T08:43:32Z

This STOP/CONT pattern is used to avoid a data race between reading /proc and handling things like mmap events from the kernel, isn't it? Anyway, we are profiling some apps that don't take it very well, since STOP causes syscalls to return abnormally. That should be fixed, but you know, it is not always that easy. Therefore I'm proposing a switch to disable this behavior.
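To illustrate the proposal, here is a minimal sketch of an opt-out around the STOP/CONT pair. It is not the code from this PR: `SignalSender`, `StopGuard`, `Recorder` and `attach_to` are hypothetical names, and signal delivery is abstracted behind a trait so the logic can be exercised without touching a real process. Only the flag name `do_not_send_sigstop` comes from the commits in this PR.

```rust
// Hypothetical sketch; apart from `do_not_send_sigstop`, no name here is
// taken from the profiler's code.

/// Abstraction over signal delivery. A real implementation would call
/// kill(2); this trait only exists so the control flow can be tested.
trait SignalSender {
    fn send(&mut self, pid: u32, signal: &'static str);
}

/// Stops a process on creation and resumes it on drop, unless disabled.
struct StopGuard<'a, S: SignalSender> {
    sender: &'a mut S,
    pid: u32,
    stopped: bool,
}

impl<'a, S: SignalSender> StopGuard<'a, S> {
    fn new(sender: &'a mut S, pid: u32, do_not_send_sigstop: bool) -> Self {
        let stopped = !do_not_send_sigstop;
        if stopped {
            sender.send(pid, "SIGSTOP");
        }
        StopGuard { sender, pid, stopped }
    }
}

impl<'a, S: SignalSender> Drop for StopGuard<'a, S> {
    fn drop(&mut self) {
        // Only resume a process that this guard actually stopped.
        if self.stopped {
            self.sender.send(self.pid, "SIGCONT");
        }
    }
}

/// Test double that records what would have been sent.
#[derive(Default)]
struct Recorder {
    sent: Vec<(u32, &'static str)>,
}

impl SignalSender for Recorder {
    fn send(&mut self, pid: u32, signal: &'static str) {
        self.sent.push((pid, signal));
    }
}

/// Runs `attach` (for example opening perf events and reading /proc)
/// while the guard is alive.
fn attach_to<S: SignalSender>(
    sender: &mut S,
    pid: u32,
    do_not_send_sigstop: bool,
    attach: impl FnOnce(),
) {
    let _guard = StopGuard::new(sender, pid, do_not_send_sigstop);
    attach();
}
```

With the flag set, the attach step runs without stopping the target, which keeps its syscalls from being interrupted but reopens the race described above.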

nwind/src/unwind_context.rs

Co-Authored-By: bjorn3 <[email protected]>

src/args.rs

src/perf_group.rs

Co-Authored-By: bjorn3 <[email protected]>

koute · 2020-01-17T15:37:10Z

Yes, the STOP/CONT signals are sent because the perf_event_open interface is somewhat broken, and AFAIK it's not really possible to use it in a non-racy way with an application which is already running. (It really shows that it was designed mostly with the fork + exec model in mind, where you always start a fresh instance when profiling.)

koute · 2020-01-17T15:43:14Z

nwind/src/unwind_context.rs

+        // avoid infinite loops
+        if self.ctx.nth_frame > 1000 {
+            warn!("possible infinite loop detected and avoided");
+            return false;
+        }


Have you actually hit a genuine infinite loop here? AFAIK infinite loops shouldn't really be possible as you're going to overflow the stack sooner or later anyway.

Anyway, this change isn't really correct. Even though > 1000 frame deep stacks are certainly a sign of a problem they should still be gathered. I've seen such stack traces in the wild, and gathering as much of it as possible later helps to fix it if you can manage to get to the top. So what was your motivation in adding this here? If you want to limit stack traces to a certain length we could add an extra parameter instead.
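The "extra parameter" idea could look roughly like the sketch below: an optional depth limit that keeps every frame gathered before the limit and reports that the trace was cut short, instead of discarding it. This is an assumption about one possible shape, not the unwinder's actual API; `Unwound`, `unwind_bounded`, `next_frame` and `max_depth` are hypothetical names.

```rust
/// Result of a bounded unwind: the frames gathered so far plus a flag
/// telling whether the limit cut the trace short.
#[derive(Debug, PartialEq)]
struct Unwound {
    frames: Vec<u64>,
    truncated: bool,
}

/// Collects return addresses from `next_frame` until it yields `None` or
/// until `max_depth` frames (if a limit is given) have been gathered.
/// Frames gathered before the limit are always kept.
fn unwind_bounded(
    mut next_frame: impl FnMut() -> Option<u64>,
    max_depth: Option<usize>,
) -> Unwound {
    let mut frames = Vec::new();
    loop {
        let address = match next_frame() {
            Some(address) => address,
            // The unwinder reached the top of the stack on its own.
            None => return Unwound { frames, truncated: false },
        };
        if max_depth.map_or(false, |limit| frames.len() >= limit) {
            // Another frame exists, but the caller asked us to stop here.
            return Unwound { frames, truncated: true };
        }
        frames.push(address);
    }
}
```

Passing `None` preserves the current behavior of gathering everything, so deep but legitimate stacks are still captured by default.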

podusowski · 2020-01-18 (edited)

Yes, I've hit it frequently in one of the cortex-15 apps, but I cannot post it here or dig into it further since I'm leaving the company.

What I managed to figure out, though, is that it looked like an ARM unwinder bug: the Vec holding the frames kept growing until it failed while trying to reallocate to 3.5 gigs.

podusowski · 2020-01-27

Hi, do you still want to make this a command-line option? I'm asking because a failed allocation, which is how this bug manifests itself, is just an abort, with no panic or Err. This makes it hard to diagnose if it happens to someone.
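For the diagnosability concern, one option (a sketch, not something this PR implements) is to reserve space fallibly before pushing a frame, so that a runaway unwind surfaces as an `Err` instead of an abort. `Vec::try_reserve` is part of the standard library (stable since Rust 1.57); `push_frame` is a hypothetical helper name. Note that on systems with memory overcommit the reservation may succeed even when memory is scarce, so this mainly helps when the allocator actually refuses the request, as in the failed reallocation described above.

```rust
use std::collections::TryReserveError;

/// Appends one frame address, reporting allocation failure as an error
/// instead of aborting the process.
fn push_frame(frames: &mut Vec<u64>, address: u64) -> Result<(), TryReserveError> {
    // Reserve room for one more element; this may grow the buffer by more
    // than one slot, so repeated calls stay amortized.
    frames.try_reserve(1)?;
    // Capacity is guaranteed now, so this push does not allocate.
    frames.push(address);
    Ok(())
}
```

The caller can then log the error together with the number of frames gathered so far and return the partial trace.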

podusowski added 2 commits January 7, 2020 09:10

option for not using SIGSTOP because not all apps take it well

c4a362b

workaround for possibly infinite unwind loop

ef29cdb

bjorn3 reviewed Jan 10, 2020

nwind/src/unwind_context.rs (two threads, outdated and resolved)

podusowski and others added 2 commits January 10, 2020 17:23

Update nwind/src/unwind_context.rs

f20e756

Co-Authored-By: bjorn3 <[email protected]>

Update nwind/src/unwind_context.rs

c6c6f28

Co-Authored-By: bjorn3 <[email protected]>

bjorn3 reviewed Jan 10, 2020

src/args.rs (outdated, resolved)

bjorn3 reviewed Jan 10, 2020

src/perf_group.rs (outdated, resolved)

podusowski and others added 2 commits January 11, 2020 18:49

Update src/args.rs

5e07279

Co-Authored-By: bjorn3 <[email protected]>

Update src/perf_group.rs

9d39e02

Co-Authored-By: bjorn3 <[email protected]>

koute reviewed Jan 17, 2020

koute added 3 commits January 17, 2020 16:44

Fix style

9a87266

dont_stop_processes -> do_not_send_sigstop (1/2)

fb72c21

dont_stop_processes -> do_not_send_sigstop (2/2)

54a77dd
