
Option for not using SIGSTOP/SIGCONT because not all apps take it well #13

Open · wants to merge 9 commits into master
6 changes: 6 additions & 0 deletions nwind/src/unwind_context.rs
@@ -96,6 +96,12 @@ impl< 'a, A: Architecture > UnwindHandle< 'a, A > {

self.ctx.nth_frame += 1;

// avoid infinite loops
if self.ctx.nth_frame > 1000 {
warn!("possible infinite loop detected and avoided");
return false;
}
Comment on lines +99 to +103
Owner:
Have you actually hit a genuine infinite loop here? AFAIK infinite loops shouldn't really be possible as you're going to overflow the stack sooner or later anyway.

Anyway, this change isn't really correct. Even though stacks more than 1000 frames deep are certainly a sign of a problem, they should still be gathered. I've seen such stack traces in the wild, and gathering as much of the trace as possible helps to fix the problem later, if you can manage to get to the top. So what was your motivation for adding this here? If you want to limit stack traces to a certain length, we could add an extra parameter instead.
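A minimal sketch of the limit-as-parameter idea the owner suggests; the type and field names here are hypothetical, not nwind's actual API. The cap comes from the caller and defaults to unlimited, instead of a hardcoded 1000:

```rust
// Hypothetical sketch: caller-configurable frame cap instead of a
// hardcoded constant. `None` preserves the old unlimited behavior.
struct UnwindCtx {
    nth_frame: usize,
    max_frames: Option<usize>,
}

fn should_continue(ctx: &UnwindCtx) -> bool {
    match ctx.max_frames {
        // Stop only when the caller asked for a cap and we exceeded it.
        Some(limit) if ctx.nth_frame > limit => false,
        _ => true,
    }
}
```

With this shape, deep-but-legitimate stacks are still gathered in full unless the user explicitly opts into a limit.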

Author (@podusowski), Jan 18, 2020:
Yes, I've hit it frequently in one of the Cortex-15 apps, but I cannot post it here nor dig into it further since I'm leaving the company.

What I did manage to figure out is that it looked like an ARM unwinder bug: the Vec holding the frames kept allocating until it failed while trying to reallocate to 3.5 GiB.

Author:

Hi, do you still want to make this a command-line option? I'm asking because a failed allocation, which is how this bug manifests itself, is just an abort: no panic, no Err. That makes it hard to diagnose if it happens to someone.
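The abort-on-OOM behavior described above is how Rust's infallible allocation paths (`push`, `reserve`) fail. One way to surface the failure as a recoverable error instead, sketched here with a hypothetical helper (not nwind's actual code), is `Vec::try_reserve`:

```rust
// Hypothetical helper: grow the frame buffer fallibly. A plain `push`
// or `reserve` aborts the process on allocation failure, whereas
// `try_reserve` returns an Err the caller can act on.
fn grow_checked(frames: &mut Vec<u64>, additional: usize) -> bool {
    match frames.try_reserve(additional) {
        Ok(()) => true,
        Err(e) => {
            eprintln!("frame buffer allocation failed: {}", e);
            false // caller can stop unwinding instead of aborting
        }
    }
}
```

This would turn the silent abort into a diagnosable log line plus a truncated (rather than missing) stack trace.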


self.ctx.address = self.ctx.regs.get( A::INSTRUCTION_POINTER_REG ).unwrap();
debug!( "Unwinding #{} -> #{} at: 0x{:016X}", self.ctx.nth_frame - 1, self.ctx.nth_frame, self.ctx.address );

6 changes: 5 additions & 1 deletion src/args.rs
@@ -195,7 +195,11 @@ pub struct RecordArgs {
pub discard_all: bool,

#[structopt(flatten)]
pub profiler_args: GenericProfilerArgs
pub profiler_args: GenericProfilerArgs,

#[structopt(long)]
/// Do not stop processes before gathering their info
pub dont_stop_processes: bool
}

#[derive(StructOpt, Debug)]
2 changes: 1 addition & 1 deletion src/cmd_record.rs
@@ -59,7 +59,7 @@ pub fn main( args: args::RecordArgs ) -> Result< (), Box< dyn Error > > {
});

info!( "Opening perf events for process with PID {}...", controller.pid() );
let mut perf = match PerfGroup::open( controller.pid(), args.frequency, args.stack_size, args.event_source ) {
let mut perf = match PerfGroup::open( controller.pid(), args.frequency, args.stack_size, args.event_source, !args.dont_stop_processes ) {
Ok( perf ) => perf,
Err( error ) => {
error!( "Failed to start profiling: {}", error );
17 changes: 11 additions & 6 deletions src/perf_group.rs
@@ -102,7 +102,8 @@ pub struct PerfGroup {
stack_size: u32,
event_source: EventSource,
initial_events: Vec< Event< 'static > >,
stopped_processes: Vec< StoppedProcess >
stopped_processes: Vec< StoppedProcess >,
stop_processes: bool
}

fn poll_events< 'a, I >( poll_fds: &mut Vec< libc::pollfd >, iter: I ) where I: IntoIterator< Item = &'a Member >, <I as IntoIterator>::IntoIter: Clone {
@@ -163,7 +164,7 @@ fn get_threads( pid: u32 ) -> Result< Vec< (u32, Option< Vec< u8 > >) >, io::Error
}

impl PerfGroup {
pub fn new( frequency: u32, stack_size: u32, event_source: EventSource ) -> Self {
pub fn new( frequency: u32, stack_size: u32, event_source: EventSource, stop_processes: bool ) -> Self {
let group = PerfGroup {
event_buffer: Vec::new(),
members: Default::default(),
@@ -172,20 +173,24 @@ impl PerfGroup {
stack_size,
event_source,
initial_events: Vec::new(),
stopped_processes: Vec::new()
stopped_processes: Vec::new(),
stop_processes
};

group
}

pub fn open( pid: u32, frequency: u32, stack_size: u32, event_source: EventSource ) -> Result< Self, io::Error > {
let mut group = PerfGroup::new( frequency, stack_size, event_source );
pub fn open( pid: u32, frequency: u32, stack_size: u32, event_source: EventSource, stop_processes: bool ) -> Result< Self, io::Error > {
let mut group = PerfGroup::new( frequency, stack_size, event_source, stop_processes );
group.open_process( pid )?;
Ok( group )
}

pub fn open_process( &mut self, pid: u32 ) -> Result< (), io::Error > {
self.stopped_processes.push( StoppedProcess::new( pid )? );
if self.stop_processes {
self.stopped_processes.push( StoppedProcess::new( pid )? );
}

let mut perf_events = Vec::new();
let threads = get_threads( pid )?;

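The stop/resume pattern this flag toggles can be sketched as an RAII guard: SIGSTOP the target on creation so its state can be read consistently, SIGCONT it on drop. This is a minimal illustration with hand-declared Linux signal numbers and a hypothetical guard type, not the crate's actual `StoppedProcess` implementation:

```rust
// Sketch of a stop/resume guard (hypothetical; not not-perf's code).
// Declares kill(2) directly to stay dependency-free; signal numbers
// are the Linux values.
extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}
const SIGSTOP: i32 = 19;
const SIGCONT: i32 = 18;

struct StoppedProcess {
    pid: i32,
}

impl StoppedProcess {
    fn new(pid: i32) -> std::io::Result<Self> {
        // Freeze the target so its info can be gathered consistently.
        if unsafe { kill(pid, SIGSTOP) } != 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(StoppedProcess { pid })
    }
}

impl Drop for StoppedProcess {
    fn drop(&mut self) {
        // Always resume, even on an early return or panic.
        unsafe {
            kill(self.pid, SIGCONT);
        }
    }
}

fn main() -> std::io::Result<()> {
    let mut child = std::process::Command::new("sleep").arg("1").spawn()?;
    {
        let _guard = StoppedProcess::new(child.id() as i32)?;
        // ... read /proc/<pid>/* here while the target cannot run ...
    } // guard dropped: SIGCONT sent, child resumes
    child.wait()?;
    Ok(())
}
```

The guard makes `--dont-stop-processes` a simple conditional: skip constructing the guard, and the target keeps running while its info is gathered, at the cost of a possibly inconsistent snapshot for apps that mishandle SIGSTOP/SIGCONT.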