Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a C API nwind_backtrace around the local unwind API #37

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

milianw
Copy link

@milianw milianw commented Sep 17, 2023

This allows other projects not written in Rust to consume this unwinding library. The API is pretty basic and only offers the fundamental building blocks required for an external consumer to drive the unwinding. There is no fancy lazy global static address space, nor any thread local unwind context - configuring that is left to the consumer of this API.

Fixes: koute/bytehound#8

kdesysadmin pushed a commit to KDE/heaptrack that referenced this pull request Sep 17, 2023
See also: koute/not-perf#37

This is supposedly faster. tst_trace works, which is good, but trying
to use a heaptrack build with this API on the multi threaded test
crashes:

```
RUST_BACKTRACE=full ./bin/heaptrack --debug --record-only --raw ./tests/manual/threaded
heaptrack output will be written to "/home/milian/projects/src/heaptrack/build-nwind/heaptrack.threaded.68148.raw.zst"
starting application in GDB, this might take some time...
Reading symbols from ./tests/manual/threaded...
Starting program: /home/milian/projects/src/heaptrack/build-nwind/tests/manual/threaded
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff73ff6c0 (LWP 68207)]
thread '<unnamed>' panicked at 'assertion failed: !duplicate_matches', nwind/src/address_space.rs:929:5
stack backtrace:
   0:     0x7ffff7676e0a - std::backtrace_rs::backtrace::libunwind::trace::h9a6b80bbf328ba5d
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7ffff7676e0a - std::backtrace_rs::backtrace::trace_unsynchronized::hd162ec543a11886b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7ffff7676e0a - std::sys_common::backtrace::_print_fmt::h78a5099be12f51a6
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x7ffff7676e0a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::ha1c5390454d74f71
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7ffff7696e6f - core::fmt::write::h9ffde816c577717b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/fmt/mod.rs:1254:17
   5:     0x7ffff7674745 - std::io::Write::write_fmt::h88186074961638e4
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/io/mod.rs:1698:15
   6:     0x7ffff7676bd5 - std::sys_common::backtrace::_print::h184198273ed08d59
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x7ffff7676bd5 - std::sys_common::backtrace::print::h1b4d8e7add699453
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x7ffff7677f6e - std::panicking::default_hook::{{closure}}::h393bcea75423915a
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:269:22
   9:     0x7ffff7677d15 - std::panicking::default_hook::h48c64f31d8b3fd03
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:288:9
  10:     0x7ffff767852e - std::panicking::rust_panic_with_hook::hafdc493a79370062
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:691:13
  11:     0x7ffff76783e2 - std::panicking::begin_panic_handler::{{closure}}::h0a64bc82e36bedc7
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:580:13
  12:     0x7ffff7677276 - std::sys_common::backtrace::__rust_end_short_backtrace::hc203444fb7416a16
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
  13:     0x7ffff7678182 - rust_begin_unwind
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
  14:     0x7ffff76946d3 - core::panicking::panic_fmt::h0f6ef0178afce4f2
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
  15:     0x7ffff769476d - core::panicking::panic::h0ead933cb8f56d66
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
  16:     0x7ffff750c587 - nwind::address_space::match_mapping::h571f20ecd5df2ae9
                               at /home/milian/projects/src/not-perf/nwind/src/address_space.rs:929:5
  17:     0x7ffff74f8703 - nwind::address_space::reload::h7bdbef3c4ecee369
                               at /home/milian/projects/src/not-perf/nwind/src/address_space.rs:1098:34
  18:     0x7ffff759a769 - nwind::local_unwinding::LocalAddressSpace::reload::h3b5f5e2abb8bcbde
                               at /home/milian/projects/src/not-perf/nwind/src/local_unwinding.rs:738:9
  19:     0x7ffff759a399 - nwind::local_unwinding::LocalAddressSpace::new_with_opts::hc3923e5b6a30003f
                               at /home/milian/projects/src/not-perf/nwind/src/local_unwinding.rs:715:9
  20:     0x7ffff749c633 - nwind_create_address_space
                               at /home/milian/projects/src/not-perf/nwind-capi/src/lib.rs:58:29
  21:     0x7ffff7fbe84a - AddressSpace
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:21:44
  22:     0x7ffff7fbe84a - addressSpace
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:36:25
  23:     0x7ffff7fbe916 - _ZN5Trace6unwindEPPv
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:87:27
  24:     0x7ffff7fbda1a - _ZN5Trace4fillEi
                               at /home/milian/projects/src/heaptrack/src/track/trace.h:47:26
  25:     0x7ffff7fbda1a - heaptrack_malloc
                               at /home/milian/projects/src/heaptrack/src/track/libheaptrack.cpp:879:19
  26:     0x7ffff7fb9d98 - malloc
                               at /home/milian/projects/src/heaptrack/src/track/heaptrack_preload.cpp:214:21
  27:     0x7ffff7cac4fb - pool
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:235:30
  28:     0x7ffff7cac4fb - __static_initialization_and_destruction_0
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:373:8
  29:     0x7ffff7cac4fb - _GLOBAL__sub_I_eh_alloc.cc
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:456:1
  30:     0x7ffff7fcf0fe - <unknown>
  31:     0x7ffff7fcf1ec - <unknown>
  32:     0x7ffff7fe5560 - <unknown>
terminate called without an active exception
```
@milianw
Copy link
Author

milianw commented Sep 17, 2023

This seems to work in simple tests, but when I integrate the library into heaptrack and run it on a multi threaded application, I get a panic. Any clue what I'm doing wrong? see https://invent.kde.org/sdk/heaptrack/-/commit/56cada696feefe9b3dc7c4f0abeda7f824f29277

I'll paste the panic backtrace here again too:

RUST_BACKTRACE=full ./bin/heaptrack --debug --record-only --raw ./tests/manual/threaded
heaptrack output will be written to "/home/milian/projects/src/heaptrack/build-nwind/heaptrack.threaded.68148.raw.zst"
starting application in GDB, this might take some time...
Reading symbols from ./tests/manual/threaded...
Starting program: /home/milian/projects/src/heaptrack/build-nwind/tests/manual/threaded 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff73ff6c0 (LWP 68207)]
thread '<unnamed>' panicked at 'assertion failed: !duplicate_matches', nwind/src/address_space.rs:929:5
stack backtrace:
   0:     0x7ffff7676e0a - std::backtrace_rs::backtrace::libunwind::trace::h9a6b80bbf328ba5d
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7ffff7676e0a - std::backtrace_rs::backtrace::trace_unsynchronized::hd162ec543a11886b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7ffff7676e0a - std::sys_common::backtrace::_print_fmt::h78a5099be12f51a6
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x7ffff7676e0a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::ha1c5390454d74f71
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7ffff7696e6f - core::fmt::write::h9ffde816c577717b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/fmt/mod.rs:1254:17
   5:     0x7ffff7674745 - std::io::Write::write_fmt::h88186074961638e4
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/io/mod.rs:1698:15
   6:     0x7ffff7676bd5 - std::sys_common::backtrace::_print::h184198273ed08d59
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x7ffff7676bd5 - std::sys_common::backtrace::print::h1b4d8e7add699453
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x7ffff7677f6e - std::panicking::default_hook::{{closure}}::h393bcea75423915a
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:269:22
   9:     0x7ffff7677d15 - std::panicking::default_hook::h48c64f31d8b3fd03
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:288:9
  10:     0x7ffff767852e - std::panicking::rust_panic_with_hook::hafdc493a79370062
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:691:13
  11:     0x7ffff76783e2 - std::panicking::begin_panic_handler::{{closure}}::h0a64bc82e36bedc7
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:580:13
  12:     0x7ffff7677276 - std::sys_common::backtrace::__rust_end_short_backtrace::hc203444fb7416a16
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
  13:     0x7ffff7678182 - rust_begin_unwind
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
  14:     0x7ffff76946d3 - core::panicking::panic_fmt::h0f6ef0178afce4f2
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
  15:     0x7ffff769476d - core::panicking::panic::h0ead933cb8f56d66
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
  16:     0x7ffff750c587 - nwind::address_space::match_mapping::h571f20ecd5df2ae9
                               at /home/milian/projects/src/not-perf/nwind/src/address_space.rs:929:5
  17:     0x7ffff74f8703 - nwind::address_space::reload::h7bdbef3c4ecee369
                               at /home/milian/projects/src/not-perf/nwind/src/address_space.rs:1098:34
  18:     0x7ffff759a769 - nwind::local_unwinding::LocalAddressSpace::reload::h3b5f5e2abb8bcbde
                               at /home/milian/projects/src/not-perf/nwind/src/local_unwinding.rs:738:9
  19:     0x7ffff759a399 - nwind::local_unwinding::LocalAddressSpace::new_with_opts::hc3923e5b6a30003f
                               at /home/milian/projects/src/not-perf/nwind/src/local_unwinding.rs:715:9
  20:     0x7ffff749c633 - nwind_create_address_space
                               at /home/milian/projects/src/not-perf/nwind-capi/src/lib.rs:58:29
  21:     0x7ffff7fbe84a - AddressSpace
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:21:44
  22:     0x7ffff7fbe84a - addressSpace
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:36:25
  23:     0x7ffff7fbe916 - _ZN5Trace6unwindEPPv
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:87:27
  24:     0x7ffff7fbda1a - _ZN5Trace4fillEi
                               at /home/milian/projects/src/heaptrack/src/track/trace.h:47:26
  25:     0x7ffff7fbda1a - heaptrack_malloc
                               at /home/milian/projects/src/heaptrack/src/track/libheaptrack.cpp:879:19
  26:     0x7ffff7fb9d98 - malloc
                               at /home/milian/projects/src/heaptrack/src/track/heaptrack_preload.cpp:214:21
  27:     0x7ffff7cac4fb - pool
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:235:30
  28:     0x7ffff7cac4fb - __static_initialization_and_destruction_0
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:373:8
  29:     0x7ffff7cac4fb - _GLOBAL__sub_I_eh_alloc.cc
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:456:1
  30:     0x7ffff7fcf0fe - <unknown>
  31:     0x7ffff7fcf1ec - <unknown>
  32:     0x7ffff7fe5560 - <unknown>
terminate called without an active exception

@milianw
Copy link
Author

milianw commented Sep 17, 2023

and here with the mappings it tries to load:

555555554000-555555558000 r--p 00000000 fe:00 32306893                   /home/milian/projects/src/heaptrack/build-nwind/tests/manual/threaded
555555558000-55555555d000 r-xp 00003000 fe:00 32306893                   /home/milian/projects/src/heaptrack/build-nwind/tests/manual/threaded
55555555d000-55555555e000 r--p 00007000 fe:00 32306893                   /home/milian/projects/src/heaptrack/build-nwind/tests/manual/threaded
55555555e000-55555555f000 rw-p 00007000 fe:00 32306893                   /home/milian/projects/src/heaptrack/build-nwind/tests/manual/threaded
55555555f000-555555580000 rw-p 00000000 00:00 0                          [heap]
7ffff6bff000-7ffff6c00000 ---p 00000000 00:00 0 
7ffff6c00000-7ffff7400000 rw-p 00000000 00:00 0 
7ffff7400000-7ffff749b000 r--p 00000000 fe:00 32307271                   /home/milian/projects/src/not-perf/target/debug/libnwind_capi.so
7ffff749b000-7ffff769f000 r-xp 0009a000 fe:00 32307271                   /home/milian/projects/src/not-perf/target/debug/libnwind_capi.so
7ffff769f000-7ffff76c0000 r--p 0029d000 fe:00 32307271                   /home/milian/projects/src/not-perf/target/debug/libnwind_capi.so
7ffff76c0000-7ffff76c1000 rw-p 002bd000 fe:00 32307271                   /home/milian/projects/src/not-perf/target/debug/libnwind_capi.so
7ffff7800000-7ffff7826000 r--p 00000000 fe:00 26218135                   /usr/lib/libc.so.6
7ffff7826000-7ffff7985000 r-xp 00026000 fe:00 26218135                   /usr/lib/libc.so.6
7ffff7985000-7ffff79da000 r--p 00185000 fe:00 26218135                   /usr/lib/libc.so.6
7ffff79da000-7ffff7a3e000 r--p 001d9000 fe:00 26218135                   /usr/lib/libc.so.6
7ffff7a3e000-7ffff7a40000 rw-p 0023d000 fe:00 26218135                   /usr/lib/libc.so.6
7ffff7a40000-7ffff7a4d000 rw-p 00000000 00:00 0 
7ffff7bd7000-7ffff7bdb000 rw-p 00000000 00:00 0 
7ffff7bdb000-7ffff7bdf000 r--p 00000000 fe:00 26242794                   /usr/lib/libgcc_s.so.1
7ffff7bdf000-7ffff7bfa000 r-xp 00004000 fe:00 26242794                   /usr/lib/libgcc_s.so.1
7ffff7bfa000-7ffff7bfe000 r--p 0001f000 fe:00 26242794                   /usr/lib/libgcc_s.so.1
7ffff7bfe000-7ffff7bff000 r--p 00022000 fe:00 26242794                   /usr/lib/libgcc_s.so.1
7ffff7bff000-7ffff7c00000 rw-p 00023000 fe:00 26242794                   /usr/lib/libgcc_s.so.1
7ffff7c00000-7ffff7c9c000 r--p 00000000 fe:00 26298269                   /usr/lib/libstdc++.so.6.0.32
7ffff7c9c000-7ffff7dd3000 r-xp 0009c000 fe:00 26298269                   /usr/lib/libstdc++.so.6.0.32
7ffff7dd3000-7ffff7e69000 r--p 001d3000 fe:00 26298269                   /usr/lib/libstdc++.so.6.0.32
7ffff7e69000-7ffff7e76000 r--p 00269000 fe:00 26298269                   /usr/lib/libstdc++.so.6.0.32
7ffff7e76000-7ffff7e77000 rw-p 00276000 fe:00 26298269                   /usr/lib/libstdc++.so.6.0.32
7ffff7e77000-7ffff7e7b000 rw-p 00000000 00:00 0 
7ffff7e7c000-7ffff7e7e000 rw-p 00000000 00:00 0 
7ffff7e7e000-7ffff7e8e000 r--p 00000000 fe:00 26218148                   /usr/lib/libm.so.6
7ffff7e8e000-7ffff7f0d000 r-xp 00010000 fe:00 26218148                   /usr/lib/libm.so.6
7ffff7f0d000-7ffff7f69000 r--p 0008f000 fe:00 26218148                   /usr/lib/libm.so.6
7ffff7f69000-7ffff7f6a000 r--p 000ea000 fe:00 26218148                   /usr/lib/libm.so.6
7ffff7f6a000-7ffff7f6b000 rw-p 000eb000 fe:00 26218148                   /usr/lib/libm.so.6
7ffff7fb4000-7ffff7fb9000 r--p 00000000 fe:00 32306940                   /home/milian/projects/src/heaptrack/build-nwind/lib/heaptrack/libheaptrack_preload.so
7ffff7fb9000-7ffff7fbf000 r-xp 00004000 fe:00 32306940                   /home/milian/projects/src/heaptrack/build-nwind/lib/heaptrack/libheaptrack_preload.so
7ffff7fbf000-7ffff7fc0000 r--p 00009000 fe:00 32306940                   /home/milian/projects/src/heaptrack/build-nwind/lib/heaptrack/libheaptrack_preload.so
7ffff7fc0000-7ffff7fc2000 rw-p 00009000 fe:00 32306940                   /home/milian/projects/src/heaptrack/build-nwind/lib/heaptrack/libheaptrack_preload.so
7ffff7fc2000-7ffff7fc4000 rw-p 00000000 00:00 0 
7ffff7fc4000-7ffff7fc8000 r--p 00000000 00:00 0                          [vvar]
7ffff7fc8000-7ffff7fca000 r-xp 00000000 00:00 0                          [vdso]
7ffff7fca000-7ffff7fcb000 r--p 00000000 fe:00 26218117                   /usr/lib/ld-linux-x86-64.so.2
7ffff7fcb000-7ffff7ff1000 r-xp 00001000 fe:00 26218117                   /usr/lib/ld-linux-x86-64.so.2
7ffff7ff1000-7ffff7ffb000 r--p 00027000 fe:00 26218117                   /usr/lib/ld-linux-x86-64.so.2
7ffff7ffb000-7ffff7ffd000 r--p 00031000 fe:00 26218117                   /usr/lib/ld-linux-x86-64.so.2
7ffff7ffd000-7ffff7fff000 rw-p 00033000 fe:00 26218117                   /usr/lib/ld-linux-x86-64.so.2
7ffffffdc000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

thread '<unnamed>' panicked at 'assertion failed: !duplicate_matches', nwind/src/address_space.rs:929:5
stack backtrace:
   0:     0x7ffff767878a - std::backtrace_rs::backtrace::libunwind::trace::h9a6b80bbf328ba5d
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7ffff767878a - std::backtrace_rs::backtrace::trace_unsynchronized::hd162ec543a11886b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7ffff767878a - std::sys_common::backtrace::_print_fmt::h78a5099be12f51a6
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x7ffff767878a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::ha1c5390454d74f71
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7ffff7698a5f - core::fmt::write::h9ffde816c577717b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/fmt/mod.rs:1254:17
   5:     0x7ffff7675fc5 - std::io::Write::write_fmt::h88186074961638e4
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/io/mod.rs:1698:15
   6:     0x7ffff7678555 - std::sys_common::backtrace::_print::h184198273ed08d59
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x7ffff7678555 - std::sys_common::backtrace::print::h1b4d8e7add699453
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x7ffff76798ee - std::panicking::default_hook::{{closure}}::h393bcea75423915a
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:269:22
   9:     0x7ffff7679695 - std::panicking::default_hook::h48c64f31d8b3fd03
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:288:9
  10:     0x7ffff7679eae - std::panicking::rust_panic_with_hook::hafdc493a79370062
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:691:13
  11:     0x7ffff7679d62 - std::panicking::begin_panic_handler::{{closure}}::h0a64bc82e36bedc7
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:580:13
  12:     0x7ffff7678bf6 - std::sys_common::backtrace::__rust_end_short_backtrace::hc203444fb7416a16
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
  13:     0x7ffff7679b02 - rust_begin_unwind
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
  14:     0x7ffff76962c3 - core::panicking::panic_fmt::h0f6ef0178afce4f2
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
  15:     0x7ffff769635d - core::panicking::panic::h0ead933cb8f56d66
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
  16:     0x7ffff750cef7 - nwind::address_space::match_mapping::h571f20ecd5df2ae9
                               at /home/milian/projects/src/not-perf/nwind/src/address_space.rs:929:5
  17:     0x7ffff74f9073 - nwind::address_space::reload::h7bdbef3c4ecee369
                               at /home/milian/projects/src/not-perf/nwind/src/address_space.rs:1098:34
  18:     0x7ffff759b155 - nwind::local_unwinding::LocalAddressSpace::reload::h3b5f5e2abb8bcbde
                               at /home/milian/projects/src/not-perf/nwind/src/local_unwinding.rs:739:9
  19:     0x7ffff759ad09 - nwind::local_unwinding::LocalAddressSpace::new_with_opts::hc3923e5b6a30003f
                               at /home/milian/projects/src/not-perf/nwind/src/local_unwinding.rs:715:9
  20:     0x7ffff749cfa3 - nwind_create_address_space
                               at /home/milian/projects/src/not-perf/nwind-capi/src/lib.rs:58:29
  21:     0x7ffff7fbe84a - AddressSpace
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:21:44
  22:     0x7ffff7fbe84a - addressSpace
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:36:25
  23:     0x7ffff7fbe916 - _ZN5Trace6unwindEPPv
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:87:27
  24:     0x7ffff7fbda1a - _ZN5Trace4fillEi
                               at /home/milian/projects/src/heaptrack/src/track/trace.h:47:26
  25:     0x7ffff7fbda1a - heaptrack_malloc
                               at /home/milian/projects/src/heaptrack/src/track/libheaptrack.cpp:879:19
  26:     0x7ffff7fb9d98 - malloc
                               at /home/milian/projects/src/heaptrack/src/track/heaptrack_preload.cpp:214:21
  27:     0x7ffff7cac4fb - pool
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:235:30
  28:     0x7ffff7cac4fb - __static_initialization_and_destruction_0
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:373:8
  29:     0x7ffff7cac4fb - _GLOBAL__sub_I_eh_alloc.cc
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:456:1
  30:     0x7ffff7fcf0fe - <unknown>
  31:     0x7ffff7fcf1ec - <unknown>
  32:     0x7ffff7fe5560 - <unknown>

Thread 1 "threaded" hit Breakpoint 1, std::panicking::rust_panic () at library/std/src/panicking.rs:741
741     library/std/src/panicking.rs: No such file or directory.

@koute
Copy link
Owner

koute commented Sep 17, 2023

That triggers because it finds two or more PT_LOAD matches for the same memory mapping, so it's confused as to which one to pick.

It may or may not indicate a problem. In general I'm not 100% sure that I got the heuristics to match these together right in every case, and honestly there might be a better way to do this altogether, but in general it tends to work so I wasn't too motivated to dig deeper.

You can try compiling in release mode, which should remove that assertion, and then it'll just pick the first match and that should probably work anyway.

Until now the way I've been handling this is to add a new test every time this assertion hits, and update the heuristics so that the correct mapping is always picked, and the tests I've added in the past ensure that I haven't broken other corner cases.

Could you also paste the warn logs to see which exact PT_LOAD and which exact mappings it's complaining about? (Since you're calling this from C code the easiest/quickest way to one-time do that would probably be to temporarily replace the warn!s in match_mapping in address_space.rs with eprintln!.)

@milianw milianw force-pushed the nwind-capi branch 2 times, most recently from 2c3cb93 to 0d5abd9 Compare September 19, 2023 20:35
@milianw
Copy link
Author

milianw commented Sep 19, 2023

In a release build, unwinding just crashes:

Thread 1 "test_c" received signal SIGSEGV, Segmentation fault.
0x00007ffff7f3ed02 in <nwind::arch::amd64::Arch as nwind::arch::Architecture>::unwind::h0f6566e4f7bd5897 () from /home/milian/projects/compiled/other/lib/libnwind_capi.so
(gdb) bt
#0  0x00007ffff7f3ed02 in <nwind::arch::amd64::Arch as nwind::arch::Architecture>::unwind::h0f6566e4f7bd5897 () from /home/milian/projects/compiled/other/lib/libnwind_capi.so
#1  0x00007ffff7f3b2db in nwind_backtrace () from /home/milian/projects/compiled/other/lib/libnwind_capi.so
#2  0x00007ffff7f75ad1 in nwind_ret_trampoline_start () from /home/milian/projects/compiled/other/lib/libnwind_capi.so

That's why I thought the assertion in debug mode would help to fix this.

Here is the requested log for when I run into the assertion:

RUST_BACKTRACE=full ./bin/heaptrack --raw --record-only --debug ./tests/manual/test_c                                                                                 work/nwind-capi 
heaptrack output will be written to "/home/milian/projects/kde/src/heaptrack/build-nwind-capi/heaptrack.test_c.19559.raw.zst"
starting application in GDB, this might take some time...
Reading symbols from ./tests/manual/test_c...
Starting program: /home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c 

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) 
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff77ff6c0 (LWP 19610)]
Reloading...
Adding memory region at 0x0000555555554000-0x0000555555555000 (r--) for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' with offset 0x00000000
Got binary for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' from '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c'
PT_LOAD: address=0x00000000, file_offset=0x00000000, file_size=0x0000069C, mem_size=0x0000069C, alignment=4096, r--
PT_LOAD: address=0x000016A0, file_offset=0x000006A0, file_size=0x0000022B, mem_size=0x0000022B, alignment=4096, r-x
PT_LOAD: address=0x000028D0, file_offset=0x000008D0, file_size=0x00000238, mem_size=0x00000730, alignment=4096, rw-
PT_LOAD: address=0x00003B08, file_offset=0x00000B08, file_size=0x00000058, mem_size=0x00000059, alignment=4096, rw-
'/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c': found base address at 0x0000555555554000
Matching region: memory = 555555554000..555555555000 (1000), file = 0..1000
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: 0..69c (69c))
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: 8d0..b08 (238))
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: b08..b60 (58))
0x0000000000000000-0x0000000000001000 from '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' is mapped at 0000555555554000-0000555555555000 in memory
Adding memory region at 0x0000555555555000-0x0000555555556000 (r-x) for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' with offset 0x00000000
'/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c': found base address at 0x0000555555555000
Matching region: memory = 555555555000..555555556000 (1000), file = 0..1000
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: 6a0..8cb (22b))
0x0000000000000000-0x0000000000001000 from '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' is mapped at 0000555555555000-0000555555556000 in memory
Adding memory region at 0x0000555555556000-0x0000555555557000 (r--) for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' with offset 0x00000000
'/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c': found base address at 0x0000555555556000
Matching region: memory = 555555556000..555555557000 (1000), file = 0..1000
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: 0..69c (69c))
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: 8d0..b08 (238))
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: b08..b60 (58))
0x0000000000000000-0x0000000000001000 from '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' is mapped at 0000555555556000-0000555555557000 in memory
Adding memory region at 0x0000555555557000-0x0000555555558000 (rw-) for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' with offset 0x00000000
'/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c': found base address at 0x0000555555557000
Matching region: memory = 555555557000..555555558000 (1000), file = 0..1000
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: 0..69c (69c))
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: 8d0..b08 (238))
Matched mapping: 0 (1000) inside of 0..1000 (1000) (original: b08..b60 (58))
0x0000000000000000-0x0000000000001000 from '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/tests/manual/test_c' is mapped at 0000555555557000-0000555555558000 in memory
Adding memory region at 0x00007FFFF7800000-0x00007FFFF789C000 (r--) for '/usr/lib/libstdc++.so.6.0.32' with offset 0x00000000
Got binary for '/usr/lib/libstdc++.so.6.0.32' from '/usr/lib/libstdc++.so.6.0.32'
PT_LOAD: address=0x00000000, file_offset=0x00000000, file_size=0x0009BB68, mem_size=0x0009BB68, alignment=4096, r--
PT_LOAD: address=0x0009C000, file_offset=0x0009C000, file_size=0x00136E71, mem_size=0x00136E71, alignment=4096, r-x
PT_LOAD: address=0x001D3000, file_offset=0x001D3000, file_size=0x00095F35, mem_size=0x00095F35, alignment=4096, r--
PT_LOAD: address=0x002694D0, file_offset=0x002694D0, file_size=0x0000CD18, mem_size=0x00010BF0, alignment=4096, rw-
'/usr/lib/libstdc++.so.6.0.32': found base address at 0x00007FFFF7800000
Matching region: memory = 7ffff7800000..7ffff789c000 (9c000), file = 0..9c000
Matched mapping: 0 (9c000) inside of 0..9c000 (9c000) (original: 0..9bb68 (9bb68))
0x0000000000000000-0x000000000009C000 from '/usr/lib/libstdc++.so.6.0.32' is mapped at 00007FFFF7800000-00007FFFF789C000 in memory
Adding memory region at 0x00007FFFF789C000-0x00007FFFF79D3000 (r-x) for '/usr/lib/libstdc++.so.6.0.32' with offset 0x0009C000
Matching region: memory = 7ffff789c000..7ffff79d3000 (137000), file = 9c000..1d3000
Matched mapping: 9c000 (137000) inside of 9c000..1d3000 (137000) (original: 9c000..1d2e71 (136e71))
0x000000000009C000-0x00000000001D3000 from '/usr/lib/libstdc++.so.6.0.32' is mapped at 00007FFFF789C000-00007FFFF79D3000 in memory
Adding memory region at 0x00007FFFF79D3000-0x00007FFFF7A69000 (r--) for '/usr/lib/libstdc++.so.6.0.32' with offset 0x001D3000
Matching region: memory = 7ffff79d3000..7ffff7a69000 (96000), file = 1d3000..269000
Matched mapping: 1d3000 (96000) inside of 1d3000..269000 (96000) (original: 1d3000..268f35 (95f35))
0x00000000001D3000-0x0000000000269000 from '/usr/lib/libstdc++.so.6.0.32' is mapped at 00007FFFF79D3000-00007FFFF7A69000 in memory
Adding memory region at 0x00007FFFF7A69000-0x00007FFFF7A76000 (r--) for '/usr/lib/libstdc++.so.6.0.32' with offset 0x00269000
Matching region: memory = 7ffff7a69000..7ffff7a76000 (d000), file = 269000..276000
Matched mapping: 269000 (d000) inside of 269000..276000 (d000) (original: 2694d0..2761e8 (cd18))
0x0000000000269000-0x0000000000276000 from '/usr/lib/libstdc++.so.6.0.32' is mapped at 00007FFFF7A69000-00007FFFF7A76000 in memory
Adding memory region at 0x00007FFFF7A76000-0x00007FFFF7A77000 (rw-) for '/usr/lib/libstdc++.so.6.0.32' with offset 0x00276000
Matching region: memory = 7ffff7a76000..7ffff7a77000 (1000), file = 276000..277000
Adding memory region at 0x00007FFFF7B13000-0x00007FFFF7B23000 (r--) for '/usr/lib/libm.so.6' with offset 0x00000000
Got binary for '/usr/lib/libm.so.6' from '/usr/lib/libm.so.6'
PT_LOAD: address=0x00000000, file_offset=0x00000000, file_size=0x0000FA08, mem_size=0x0000FA08, alignment=4096, r--
PT_LOAD: address=0x00010000, file_offset=0x00010000, file_size=0x0007EA75, mem_size=0x0007EA75, alignment=4096, r-x
PT_LOAD: address=0x0008F000, file_offset=0x0008F000, file_size=0x0005B48E, mem_size=0x0005B48E, alignment=4096, r--
PT_LOAD: address=0x000EBC18, file_offset=0x000EAC18, file_size=0x00000400, mem_size=0x00000510, alignment=4096, rw-
'/usr/lib/libm.so.6': found base address at 0x00007FFFF7B13000
Matching region: memory = 7ffff7b13000..7ffff7b23000 (10000), file = 0..10000
Matched mapping: 0 (10000) inside of 0..10000 (10000) (original: 0..fa08 (fa08))
0x0000000000000000-0x0000000000010000 from '/usr/lib/libm.so.6' is mapped at 00007FFFF7B13000-00007FFFF7B23000 in memory
Adding memory region at 0x00007FFFF7B23000-0x00007FFFF7BA2000 (r-x) for '/usr/lib/libm.so.6' with offset 0x00010000
Matching region: memory = 7ffff7b23000..7ffff7ba2000 (7f000), file = 10000..8f000
Matched mapping: 10000 (7f000) inside of 10000..8f000 (7f000) (original: 10000..8ea75 (7ea75))
0x0000000000010000-0x000000000008F000 from '/usr/lib/libm.so.6' is mapped at 00007FFFF7B23000-00007FFFF7BA2000 in memory
Adding memory region at 0x00007FFFF7BA2000-0x00007FFFF7BFE000 (r--) for '/usr/lib/libm.so.6' with offset 0x0008F000
Matching region: memory = 7ffff7ba2000..7ffff7bfe000 (5c000), file = 8f000..eb000
Matched mapping: 8f000 (5c000) inside of 8f000..eb000 (5c000) (original: 8f000..ea48e (5b48e))
0x000000000008F000-0x00000000000EB000 from '/usr/lib/libm.so.6' is mapped at 00007FFFF7BA2000-00007FFFF7BFE000 in memory
Adding memory region at 0x00007FFFF7BFE000-0x00007FFFF7BFF000 (r--) for '/usr/lib/libm.so.6' with offset 0x000EA000
Matching region: memory = 7ffff7bfe000..7ffff7bff000 (1000), file = ea000..eb000
Matched mapping: ea000 (1000) inside of 8f000..eb000 (5c000) (original: 8f000..ea48e (5b48e))
Matched mapping: ea000 (1000) inside of ea000..eb000 (1000) (original: eac18..eb018 (400))
0x00000000000EA000-0x00000000000EB000 from '/usr/lib/libm.so.6' is mapped at 00007FFFF7BFE000-00007FFFF7BFF000 in memory
Adding memory region at 0x00007FFFF7BFF000-0x00007FFFF7C00000 (rw-) for '/usr/lib/libm.so.6' with offset 0x000EB000
Matching region: memory = 7ffff7bff000..7ffff7c00000 (1000), file = eb000..ec000
Adding memory region at 0x00007FFFF7C00000-0x00007FFFF7C26000 (r--) for '/usr/lib/libc.so.6' with offset 0x00000000
Got binary for '/usr/lib/libc.so.6' from '/usr/lib/libc.so.6'
PT_LOAD: address=0x00000000, file_offset=0x00000000, file_size=0x00025B00, mem_size=0x00025B00, alignment=4096, r--
PT_LOAD: address=0x00026000, file_offset=0x00026000, file_size=0x0015E4DD, mem_size=0x0015E4DD, alignment=4096, r-x
PT_LOAD: address=0x00185000, file_offset=0x00185000, file_size=0x000545AC, mem_size=0x000545AC, alignment=4096, r--
PT_LOAD: address=0x001DAE60, file_offset=0x001D9E60, file_size=0x00064868, mem_size=0x00071F30, alignment=4096, rw-
'/usr/lib/libc.so.6': found base address at 0x00007FFFF7C00000
Matching region: memory = 7ffff7c00000..7ffff7c26000 (26000), file = 0..26000
Matched mapping: 0 (26000) inside of 0..26000 (26000) (original: 0..25b00 (25b00))
0x0000000000000000-0x0000000000026000 from '/usr/lib/libc.so.6' is mapped at 00007FFFF7C00000-00007FFFF7C26000 in memory
Adding memory region at 0x00007FFFF7C26000-0x00007FFFF7D85000 (r-x) for '/usr/lib/libc.so.6' with offset 0x00026000
Matching region: memory = 7ffff7c26000..7ffff7d85000 (15f000), file = 26000..185000
Matched mapping: 26000 (15f000) inside of 26000..185000 (15f000) (original: 26000..1844dd (15e4dd))
0x0000000000026000-0x0000000000185000 from '/usr/lib/libc.so.6' is mapped at 00007FFFF7C26000-00007FFFF7D85000 in memory
Adding memory region at 0x00007FFFF7D85000-0x00007FFFF7DDA000 (r--) for '/usr/lib/libc.so.6' with offset 0x00185000
Matching region: memory = 7ffff7d85000..7ffff7dda000 (55000), file = 185000..1da000
Matched mapping: 185000 (55000) inside of 185000..1da000 (55000) (original: 185000..1d95ac (545ac))
0x0000000000185000-0x00000000001DA000 from '/usr/lib/libc.so.6' is mapped at 00007FFFF7D85000-00007FFFF7DDA000 in memory
Adding memory region at 0x00007FFFF7DDA000-0x00007FFFF7E3E000 (r--) for '/usr/lib/libc.so.6' with offset 0x001D9000
Matching region: memory = 7ffff7dda000..7ffff7e3e000 (64000), file = 1d9000..23d000
Matched mapping: 1d9000 (64000) inside of 185000..1da000 (55000) (original: 185000..1d95ac (545ac))
Matched mapping: 1d9000 (64000) inside of 1d9000..23e000 (65000) (original: 1d9e60..23e6c8 (64868))
0x00000000001D9000-0x000000000023D000 from '/usr/lib/libc.so.6' is mapped at 00007FFFF7DDA000-00007FFFF7E3E000 in memory
Adding memory region at 0x00007FFFF7E3E000-0x00007FFFF7E40000 (rw-) for '/usr/lib/libc.so.6' with offset 0x0023D000
Matching region: memory = 7ffff7e3e000..7ffff7e40000 (2000), file = 23d000..23f000
Matched mapping: 23d000 (2000) inside of 1d9000..23e000 (65000) (original: 1d9e60..23e6c8 (64868))
0x000000000023D000-0x000000000023F000 from '/usr/lib/libc.so.6' is mapped at 00007FFFF7E3E000-00007FFFF7E40000 in memory
Adding memory region at 0x00007FFFF7EBC000-0x00007FFFF7EC0000 (r--) for '/usr/lib/libgcc_s.so.1' with offset 0x00000000
Got binary for '/usr/lib/libgcc_s.so.1' from '/usr/lib/libgcc_s.so.1'
PT_LOAD: address=0x00000000, file_offset=0x00000000, file_size=0x00003350, mem_size=0x00003350, alignment=4096, r--
PT_LOAD: address=0x00004000, file_offset=0x00004000, file_size=0x0001AA45, mem_size=0x0001AA45, alignment=4096, r-x
PT_LOAD: address=0x0001F000, file_offset=0x0001F000, file_size=0x000038B4, mem_size=0x000038B4, alignment=4096, r--
PT_LOAD: address=0x00023C28, file_offset=0x00022C28, file_size=0x000003E0, mem_size=0x000005A0, alignment=4096, rw-
'/usr/lib/libgcc_s.so.1': found base address at 0x00007FFFF7EBC000
Matching region: memory = 7ffff7ebc000..7ffff7ec0000 (4000), file = 0..4000
Matched mapping: 0 (4000) inside of 0..4000 (4000) (original: 0..3350 (3350))
0x0000000000000000-0x0000000000004000 from '/usr/lib/libgcc_s.so.1' is mapped at 00007FFFF7EBC000-00007FFFF7EC0000 in memory
Adding memory region at 0x00007FFFF7EC0000-0x00007FFFF7EDB000 (r-x) for '/usr/lib/libgcc_s.so.1' with offset 0x00004000
Matching region: memory = 7ffff7ec0000..7ffff7edb000 (1b000), file = 4000..1f000
Matched mapping: 4000 (1b000) inside of 4000..1f000 (1b000) (original: 4000..1ea45 (1aa45))
0x0000000000004000-0x000000000001F000 from '/usr/lib/libgcc_s.so.1' is mapped at 00007FFFF7EC0000-00007FFFF7EDB000 in memory
Adding memory region at 0x00007FFFF7EDB000-0x00007FFFF7EDF000 (r--) for '/usr/lib/libgcc_s.so.1' with offset 0x0001F000
Matching region: memory = 7ffff7edb000..7ffff7edf000 (4000), file = 1f000..23000
Matched mapping: 1f000 (4000) inside of 1f000..23000 (4000) (original: 1f000..228b4 (38b4))
0x000000000001F000-0x0000000000023000 from '/usr/lib/libgcc_s.so.1' is mapped at 00007FFFF7EDB000-00007FFFF7EDF000 in memory
Adding memory region at 0x00007FFFF7EDF000-0x00007FFFF7EE0000 (r--) for '/usr/lib/libgcc_s.so.1' with offset 0x00022000
Matching region: memory = 7ffff7edf000..7ffff7ee0000 (1000), file = 22000..23000
Matched mapping: 22000 (1000) inside of 1f000..23000 (4000) (original: 1f000..228b4 (38b4))
Matched mapping: 22000 (1000) inside of 22000..23000 (1000) (original: 22c28..23008 (3e0))
0x0000000000022000-0x0000000000023000 from '/usr/lib/libgcc_s.so.1' is mapped at 00007FFFF7EDF000-00007FFFF7EE0000 in memory
Adding memory region at 0x00007FFFF7EE0000-0x00007FFFF7EE1000 (rw-) for '/usr/lib/libgcc_s.so.1' with offset 0x00023000
Matching region: memory = 7ffff7ee0000..7ffff7ee1000 (1000), file = 23000..24000
Adding memory region at 0x00007FFFF7EE3000-0x00007FFFF7EFC000 (r--) for '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' with offset 0x00000000
Got binary for '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' from '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so'
PT_LOAD: address=0x00000000, file_offset=0x00000000, file_size=0x00018020, mem_size=0x00018020, alignment=4096, r--
PT_LOAD: address=0x00019020, file_offset=0x00018020, file_size=0x00078CF3, mem_size=0x00078CF3, alignment=4096, r-x
PT_LOAD: address=0x00092D18, file_offset=0x00090D18, file_size=0x00003900, mem_size=0x000042E8, alignment=4096, rw-
PT_LOAD: address=0x00097618, file_offset=0x00094618, file_size=0x00000048, mem_size=0x00000139, alignment=4096, rw-
'/home/milian/projects/src/not-perf/target/release/libnwind_capi.so': found base address at 0x00007FFFF7EE3000
Matching region: memory = 7ffff7ee3000..7ffff7efc000 (19000), file = 0..19000
Matched mapping: 0 (19000) inside of 0..19000 (19000) (original: 0..18020 (18020))
0x0000000000000000-0x0000000000019000 from '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' is mapped at 00007FFFF7EE3000-00007FFFF7EFC000 in memory
Adding memory region at 0x00007FFFF7EFC000-0x00007FFFF7F75000 (r-x) for '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' with offset 0x00018000
Matching region: memory = 7ffff7efc000..7ffff7f75000 (79000), file = 18000..91000
Matched mapping: 18000 (79000) inside of 18000..91000 (79000) (original: 18020..90d13 (78cf3))
0x0000000000018000-0x0000000000091000 from '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' is mapped at 00007FFFF7EFC000-00007FFFF7F75000 in memory
Adding memory region at 0x00007FFFF7F75000-0x00007FFFF7F7A000 (r--) for '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' with offset 0x00090000
Matching region: memory = 7ffff7f75000..7ffff7f7a000 (5000), file = 90000..95000
Matched mapping: 90000 (5000) inside of 90000..94000 (4000) (original: 90d18..94618 (3900))
0x0000000000090000-0x0000000000095000 from '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' is mapped at 00007FFFF7F75000-00007FFFF7F7A000 in memory
Adding memory region at 0x00007FFFF7F7A000-0x00007FFFF7F7B000 (rw-) for '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' with offset 0x00094000
Matching region: memory = 7ffff7f7a000..7ffff7f7b000 (1000), file = 94000..95000
Matched mapping: 94000 (1000) inside of 94000..95000 (1000) (original: 94618..94660 (48))
0x0000000000094000-0x0000000000095000 from '/home/milian/projects/src/not-perf/target/release/libnwind_capi.so' is mapped at 00007FFFF7F7A000-00007FFFF7F7B000 in memory
Adding memory region at 0x00007FFFF7FB4000-0x00007FFFF7FB9000 (r--) for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so' with offset 0x00000000
Got binary for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so' from '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so'
PT_LOAD: address=0x00000000, file_offset=0x00000000, file_size=0x000042F8, mem_size=0x000042F8, alignment=4096, r--
PT_LOAD: address=0x00005300, file_offset=0x00004300, file_size=0x00005771, mem_size=0x00005771, alignment=4096, r-x
PT_LOAD: address=0x0000BA78, file_offset=0x00009A78, file_size=0x00000338, mem_size=0x00000588, alignment=4096, rw-
PT_LOAD: address=0x0000CDB0, file_offset=0x00009DB0, file_size=0x000002F8, mem_size=0x00000820, alignment=4096, rw-
'/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so': found base address at 0x00007FFFF7FB4000
Matching region: memory = 7ffff7fb4000..7ffff7fb9000 (5000), file = 0..5000
Matched mapping: 0 (5000) inside of 0..5000 (5000) (original: 0..42f8 (42f8))
0x0000000000000000-0x0000000000005000 from '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so' is mapped at 00007FFFF7FB4000-00007FFFF7FB9000 in memory
Adding memory region at 0x00007FFFF7FB9000-0x00007FFFF7FBF000 (r-x) for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so' with offset 0x00004000
Matching region: memory = 7ffff7fb9000..7ffff7fbf000 (6000), file = 4000..a000
Matched mapping: 4000 (6000) inside of 4000..a000 (6000) (original: 4300..9a71 (5771))
0x0000000000004000-0x000000000000A000 from '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so' is mapped at 00007FFFF7FB9000-00007FFFF7FBF000 in memory
Adding memory region at 0x00007FFFF7FBF000-0x00007FFFF7FC0000 (r--) for '/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so' with offset 0x00009000
Matching region: memory = 7ffff7fbf000..7ffff7fc0000 (1000), file = 9000..a000
Matched mapping: 9000 (1000) inside of 9000..a000 (1000) (original: 9a78..9db0 (338))
Matched mapping: 9000 (1000) inside of 9000..a000 (1000) (original: 9db0..a0a8 (2f8))
Duplicate PT_LOAD matches for a single memory region: Region { start: 140737353871360, end: 140737353875456, is_read: true, is_write: false, is_executable: false, is_shared: false, file_offset: 36864, major: 259, minor: 4, inode: 60937829, name: "/home/milian/projects/kde/src/heaptrack/build-nwind-capi/lib/heaptrack/libheaptrack_preload.so" }
  Match #0: LoadHeader { address: 0x000000000000BA78, file_offset: 0x0000000000009A78, file_size: 0x0000000000000338, memory_size: 0x0000000000000588, alignment: 0x0000000000001000, flags: rw- } => AddressMapping { declared_address: 0x000000000000B000, actual_address: 0x00007FFFF7FBF000, file_offset: 0x0000000000009000, size: 0x0000000000001000 }
  Match #1: LoadHeader { address: 0x000000000000CDB0, file_offset: 0x0000000000009DB0, file_size: 0x00000000000002F8, memory_size: 0x0000000000000820, alignment: 0x0000000000001000, flags: rw- } => AddressMapping { declared_address: 0x000000000000C000, actual_address: 0x00007FFFF7FBF000, file_offset: 0x0000000000009000, size: 0x0000000000001000 }
thread '<unnamed>' panicked at 'assertion failed: !duplicate_matches', nwind/src/address_space.rs:929:5
stack backtrace:
   0:     0x7ffff7f72b4a - std::backtrace_rs::backtrace::libunwind::trace::ha41f10623cf38383
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7ffff7f72b4a - std::backtrace_rs::backtrace::trace_unsynchronized::he6cd450ba0f308bc
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7ffff7f72b4a - std::sys_common::backtrace::_print_fmt::h7ef04a4f2a2c5bcc
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x7ffff7f72b4a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hcb68b526e4fd36c3
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7ffff7f039af - core::fmt::rt::Argument::fmt::hccd86a08f3d4989c
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/fmt/rt.rs:138:9
   5:     0x7ffff7f039af - core::fmt::write::h9e36e10f7b9f1bc4
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/fmt/mod.rs:1094:21
   6:     0x7ffff7f46f28 - std::io::Write::write_fmt::h3c54b4c0f7ea8aa9
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/io/mod.rs:1714:15
   7:     0x7ffff7f73f59 - std::sys_common::backtrace::_print::hfc5156a36fc1e918
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x7ffff7f73f59 - std::sys_common::backtrace::print::hbf1e6e947a7607a7
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x7ffff7f73cd3 - std::panicking::default_hook::{{closure}}::hfc82318001581c89
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:269:22
  10:     0x7ffff7f74415 - std::panicking::default_hook::h6d98284b8f63d34b
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:288:9
  11:     0x7ffff7f74415 - std::panicking::rust_panic_with_hook::hc91ec44bae697703
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:705:13
  12:     0x7ffff7f740d3 - std::panicking::begin_panic_handler::{{closure}}::h9ad46c1902f90594
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:595:13
  13:     0x7ffff7f74066 - std::sys_common::backtrace::__rust_end_short_backtrace::h4e5a2f3f6ba108b1
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:151:18
  14:     0x7ffff7f74051 - rust_begin_unwind
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
  15:     0x7ffff7f029e2 - core::panicking::panic_fmt::h48b0bfd72f473878
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
  16:     0x7ffff7f03652 - core::panicking::panic::h78ea1ee7a574382a
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:117:5
  17:     0x7ffff7f183ef - nwind::address_space::reload::hc1b321d3915a823a
  18:     0x7ffff7f259bf - nwind::local_unwinding::LocalAddressSpace::reload::h8ac7d4da6346e121
  19:     0x7ffff7f38783 - nwind_create_address_space
  20:     0x7ffff7fbe83a - AddressSpace
                               at /home/milian/projects/kde/src/heaptrack/src/track/trace_nwind_capi.cpp:21:50
  21:     0x7ffff7fbe83a - addressSpace
                               at /home/milian/projects/kde/src/heaptrack/src/track/trace_nwind_capi.cpp:34:25
  22:     0x7ffff7fbe906 - _ZN5Trace6unwindEPPv
                               at /home/milian/projects/kde/src/heaptrack/src/track/trace_nwind_capi.cpp:79:27
  23:     0x7ffff7fbda0a - _ZN5Trace4fillEi
                               at /home/milian/projects/kde/src/heaptrack/src/track/trace.h:47:26
  24:     0x7ffff7fbda0a - heaptrack_malloc
                               at /home/milian/projects/kde/src/heaptrack/src/track/libheaptrack.cpp:879:19
  25:     0x7ffff7fb9d88 - malloc
                               at /home/milian/projects/kde/src/heaptrack/src/track/heaptrack_preload.cpp:214:21
  26:     0x7ffff78ac4fb - pool
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:235:30
  27:     0x7ffff78ac4fb - __static_initialization_and_destruction_0
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:373:8
  28:     0x7ffff78ac4fb - _GLOBAL__sub_I_eh_alloc.cc
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:456:1
  29:     0x7ffff7fcf0fe - <unknown>
  30:     0x7ffff7fcf1ec - <unknown>
  31:     0x7ffff7fe5560 - <unknown>

@milianw
Copy link
Author

milianw commented Sep 19, 2023

Ah, disabling the assertion and continuing in debug mode, I get a bit further and this is where it actually crashes:

Thread 1 "test_c" received signal SIGSEGV, Segmentation fault.
core::ptr::read_unaligned<usize> (src=0x0) at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ptr/mod.rs:1272
Downloading source file /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ptr/mod.rs
1272    /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ptr/mod.rs: Directory not empty.                                                                                                                                  
(gdb) bt
#0  core::ptr::read_unaligned<usize> (src=0x0) at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ptr/mod.rs:1272
#1  0x00007ffff78bb76a in nwind::local_unwinding::{impl#0}::get_pointer_at_address (self=0x7fffffffbec8, address=0) at nwind/src/local_unwinding.rs:41
#2  nwind::arch::amd64::{impl#0}::unwind<nwind::local_unwinding::LocalMemory> (nth_frame=6, memory=0x7fffffffbec8, state=0x5555555730b0, regs=0x555555573138, initial_address=0x555555573090, ra_address=0x5555555730a0)
    at nwind/src/arch/amd64.rs:220
#3  0x00007ffff789d232 in nwind::unwind_context::UnwindHandle<nwind::arch::amd64::Arch>::unwind<nwind::arch::amd64::Arch, nwind::local_unwinding::LocalMemory> (self=0x7fffffffbed8, memory=0x7fffffffbec8)
    at nwind/src/unwind_context.rs:104
#4  0x00007ffff78babb2 in nwind::local_unwinding::LocalAddressSpace::unwind<nwind_capi::nwind_backtrace::{closure_env#0}> (self=0x555555577180, ctx=..., callback=...) at nwind/src/local_unwinding.rs:843
#5  nwind_capi::nwind_backtrace (address_space=0x555555577180, unwind_context=0x555555573090, buffer=0x7fffffffc0c8, size=64) at nwind-capi/src/lib.rs:53
#6  0x00007ffff79abd01 in nwind_ret_trampoline_start () at src/arch/amd64_trampoline.s:17

@milianw
Copy link
Author

milianw commented Sep 19, 2023

So, the second patch fixes the crash for me, though I'm unsure if that's enough or the correct fix.

With that, tracing of simple c applications seems to work. But in real-world C++ applications, I'm still seeing other crashes, such as now this one, which is less obvious to me:

Thread 1 "kwrite" received signal SIGSEGV, Segmentation fault.
0x00007ffff609d810 in nwind::local_unwinding::ShadowStack::push (self=0x7fffffffb078, stack_pointer=140737040618896, address_location=18445199356321760528) at nwind/src/local_unwinding.rs:455
455             if *slot == nwind_ret_trampoline as usize {
(gdb) bt
#0  0x00007ffff609d810 in nwind::local_unwinding::ShadowStack::push (self=0x7fffffffb078, stack_pointer=140737040618896, address_location=18445199356321760528) at nwind/src/local_unwinding.rs:455
#1  0x00007ffff60ba194 in nwind::local_unwinding::LocalAddressSpace::unwind<nwind_capi::nwind_backtrace::{closure_env#0}> (self=0x5555555797d0, ctx=0x555555563160, callback=...) at nwind/src/local_unwinding.rs:824
#2  nwind_capi::nwind_backtrace (address_space=0x5555555797d0, unwind_context=0x555555563160, buffer=0x7fffffffb258, size=64) at nwind-capi/src/lib.rs:53
#3  0x00007ffff61a8f31 in nwind_ret_trampoline_start () at src/arch/amd64_trampoline.s:17

Furthermore, at least in the simple benchmark I use for heaptrack (https://invent.kde.org/sdk/heaptrack/-/blob/master/tests/manual/threaded.cpp?ref_type=heads), using the nwind-capi based unwinding instead of libunwind actually significantly slows down heaptrack. I.e. (with everything as release builds):

 Performance counter stats for './bin/heaptrack --raw --record-only ./tests/manual/threaded' (5 runs):

          6.764,95 msec task-clock:u                     #   34,124 CPUs utilized               ( +- 14,21% )
                 0      context-switches:u               #    0,000 /sec                      
                 0      cpu-migrations:u                 #    0,000 /sec                      
            71.018      page-faults:u                    #   17,670 K/sec                       ( +- 14,40% )
     8.839.691.993      cycles:u                         #    2,199 GHz                         ( +- 14,24% )
    12.267.755.453      instructions:u                   #    2,33  insn per cycle              ( +- 14,15% )
     2.019.145.341      branches:u                       #  502,390 M/sec                       ( +- 14,15% )
        21.832.656      branch-misses:u                  #    1,80% of all branches             ( +- 14,21% )
    29.045.968.875      slots:u                          #    7,227 G/sec                       ( +- 14,20% )
    10.239.562.416      topdown-retiring:u               #     31,3% Retiring                   ( +- 14,17% )
    11.333.856.334      topdown-bad-spec:u               #     34,6% Bad Speculation            ( +- 14,13% )
     6.086.048.243      topdown-fe-bound:u               #     18,5% Frontend Bound             ( +- 14,22% )
     5.108.851.311      topdown-be-bound:u               #     15,5% Backend Bound              ( +- 14,33% )

           0,19824 +- 0,00326 seconds time elapsed  ( +-  1,64% )

compared to the following with libunwind:

 Performance counter stats for './bin/heaptrack --raw --record-only ./tests/manual/threaded' (5 runs):

          4.842,12 msec task-clock:u                     #   30,105 CPUs utilized               ( +- 13,70% )
                 0      context-switches:u               #    0,000 /sec                      
                 0      cpu-migrations:u                 #    0,000 /sec                      
           160.347      page-faults:u                    #   53,498 K/sec                       ( +- 14,17% )
     5.057.971.909      cycles:u                         #    1,688 GHz                         ( +- 14,00% )
     7.329.606.557      instructions:u                   #    2,38  insn per cycle              ( +- 14,13% )
     1.202.887.376      branches:u                       #  401,329 M/sec                       ( +- 14,13% )
         9.832.032      branch-misses:u                  #    1,36% of all branches             ( +- 13,92% )
    16.690.777.435      slots:u                          #    5,569 G/sec                       ( +- 14,08% )
     6.271.477.502      topdown-retiring:u               #     33,3% Retiring                   ( +- 14,16% )
     5.672.861.869      topdown-bad-spec:u               #     31,0% Bad Speculation            ( +- 13,71% )
     3.261.871.981      topdown-fe-bound:u               #     17,5% Frontend Bound             ( +- 14,04% )
     3.399.730.449      topdown-be-bound:u               #     18,2% Backend Bound              ( +- 14,13% )

           0,16084 +- 0,00613 seconds time elapsed  ( +-  3,81% )

I guess this example is bad for assessing the unwinding performance, as the backtraces are so easy to cache, and libunwind does caching internally too after all. Do you have a better benchmark harness that made you come up with the "10x faster unwinding" numbers that I can use myself to verify these claims?

@milianw
Copy link
Author

milianw commented Sep 19, 2023

Could it be that nwind is currently missing any kind of safety net against bad/broken debug information or unwind tables? At least I cannot find anything similar to access_mem and unw_address_is_valid in libunwind, see https://github.com/libunwind/libunwind/blob/1769de89024da574ceb099b1b5b149029284723a/src/x86_64/Ginit.c#L78 and https://github.com/libunwind/libunwind/blob/master/src/mi/Gaddress_validator.c#L237

@koute
Copy link
Owner

koute commented Sep 20, 2023

I guess this example is bad for assessing the unwinding performance, as the backtraces are so easy to cache, and libunwind does caching internally too after all. Do you have a better benchmark harness that made you come up with the "10x faster unwinding" numbers that I can use myself to verify these claims?

I don't really have a public benchmark, but I used it on big C++ and Rust projects with hundreds of thousands lines of code and stack trace hundreds of frames deep, on multiple architectures, and it was significantly faster.

0,16084 +- 0,00613 seconds time elapsed

This is a very short benchmark. I do a lot of preprocessing work, and this crate is optimized for profiling long-running real-world programs with potentially deep stack traces, so for really short programs it might be slower.

Also, to get the most performance benefit the unwind_through_fresh_frames should be used instead of unwind; the deeper the stack trace the more pronounced the difference is (because unwind_through_fresh_frames will only have to go through those frames that changed since the last time it was called, instead of always having to go through everything). This is also great because it minimizes the amount of data you need to write out logging the backtraces.

Could it be that nwind is currently missing any kind of safety net against bad/broken debug information or unwind tables?

It assumes the unwinding tables are correct, and that the stack isn't smashed in any way. If either is not true then it will crash.

So, the second patch fixes the crash for me, though I'm unsure if that's enough or the correct fix.

AFAIK that method should never actually have a null pointer passed to it, and this is probably just fixing the symptom rather than the cause.

With that, tracing of simple c applications seems to work. But in real-world C++ applications, I'm still seeing other crashes, such as now this one, which is less obvious to me:

This backtrace suggest that it's trying to access the stack slot on the stack where the return address of the frame is supposed to be stored.


In general the unwinding here works; I'm using it right now to profile a Rust program, which is, like ~600k lines of code big, and it's been running for... let's see, something like ~24 days non-stop. So there must be something either with how you use the crate, or with the program you're running that's making it crash. Maybe you could try profiling some bigger Rust app and see whether this is maybe due some funny business that your C++ compiler is doing?

It's often also possible to debug these kinds of problems if you enable all of the logs (with the crate feature flag) and enable trace logs, and analyze that. But that spews out a lot of logs and it usually takes quite a bit of time and effort to analyze them.

Also, make sure you wrap the address space object in a read-write lock and always call reload if anything was loaded/unloaded etc. before trying to unwind (I usually use perf_event_open or dlpi_adds/dlpi_subs from dl_iterate_phdr to detect that).

kdesysadmin pushed a commit to KDE/heaptrack that referenced this pull request Sep 20, 2023
See also: koute/not-perf#37

This is supposedly faster. tst_trace works, which is good, but trying
to use a heaptrack build with this API on the multi threaded test
crashes:

```
RUST_BACKTRACE=full ./bin/heaptrack --debug --record-only --raw ./tests/manual/threaded
heaptrack output will be written to "/home/milian/projects/src/heaptrack/build-nwind/heaptrack.threaded.68148.raw.zst"
starting application in GDB, this might take some time...
Reading symbols from ./tests/manual/threaded...
Starting program: /home/milian/projects/src/heaptrack/build-nwind/tests/manual/threaded
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff73ff6c0 (LWP 68207)]
thread '<unnamed>' panicked at 'assertion failed: !duplicate_matches', nwind/src/address_space.rs:929:5
stack backtrace:
   0:     0x7ffff7676e0a - std::backtrace_rs::backtrace::libunwind::trace::h9a6b80bbf328ba5d
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7ffff7676e0a - std::backtrace_rs::backtrace::trace_unsynchronized::hd162ec543a11886b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7ffff7676e0a - std::sys_common::backtrace::_print_fmt::h78a5099be12f51a6
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x7ffff7676e0a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::ha1c5390454d74f71
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7ffff7696e6f - core::fmt::write::h9ffde816c577717b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/fmt/mod.rs:1254:17
   5:     0x7ffff7674745 - std::io::Write::write_fmt::h88186074961638e4
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/io/mod.rs:1698:15
   6:     0x7ffff7676bd5 - std::sys_common::backtrace::_print::h184198273ed08d59
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x7ffff7676bd5 - std::sys_common::backtrace::print::h1b4d8e7add699453
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x7ffff7677f6e - std::panicking::default_hook::{{closure}}::h393bcea75423915a
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:269:22
   9:     0x7ffff7677d15 - std::panicking::default_hook::h48c64f31d8b3fd03
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:288:9
  10:     0x7ffff767852e - std::panicking::rust_panic_with_hook::hafdc493a79370062
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:691:13
  11:     0x7ffff76783e2 - std::panicking::begin_panic_handler::{{closure}}::h0a64bc82e36bedc7
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:580:13
  12:     0x7ffff7677276 - std::sys_common::backtrace::__rust_end_short_backtrace::hc203444fb7416a16
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
  13:     0x7ffff7678182 - rust_begin_unwind
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
  14:     0x7ffff76946d3 - core::panicking::panic_fmt::h0f6ef0178afce4f2
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
  15:     0x7ffff769476d - core::panicking::panic::h0ead933cb8f56d66
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
  16:     0x7ffff750c587 - nwind::address_space::match_mapping::h571f20ecd5df2ae9
                               at /home/milian/projects/src/not-perf/nwind/src/address_space.rs:929:5
  17:     0x7ffff74f8703 - nwind::address_space::reload::h7bdbef3c4ecee369
                               at /home/milian/projects/src/not-perf/nwind/src/address_space.rs:1098:34
  18:     0x7ffff759a769 - nwind::local_unwinding::LocalAddressSpace::reload::h3b5f5e2abb8bcbde
                               at /home/milian/projects/src/not-perf/nwind/src/local_unwinding.rs:738:9
  19:     0x7ffff759a399 - nwind::local_unwinding::LocalAddressSpace::new_with_opts::hc3923e5b6a30003f
                               at /home/milian/projects/src/not-perf/nwind/src/local_unwinding.rs:715:9
  20:     0x7ffff749c633 - nwind_create_address_space
                               at /home/milian/projects/src/not-perf/nwind-capi/src/lib.rs:58:29
  21:     0x7ffff7fbe84a - AddressSpace
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:21:44
  22:     0x7ffff7fbe84a - addressSpace
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:36:25
  23:     0x7ffff7fbe916 - _ZN5Trace6unwindEPPv
                               at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:87:27
  24:     0x7ffff7fbda1a - _ZN5Trace4fillEi
                               at /home/milian/projects/src/heaptrack/src/track/trace.h:47:26
  25:     0x7ffff7fbda1a - heaptrack_malloc
                               at /home/milian/projects/src/heaptrack/src/track/libheaptrack.cpp:879:19
  26:     0x7ffff7fb9d98 - malloc
                               at /home/milian/projects/src/heaptrack/src/track/heaptrack_preload.cpp:214:21
  27:     0x7ffff7cac4fb - pool
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:235:30
  28:     0x7ffff7cac4fb - __static_initialization_and_destruction_0
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:373:8
  29:     0x7ffff7cac4fb - _GLOBAL__sub_I_eh_alloc.cc
                               at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:456:1
  30:     0x7ffff7fcf0fe - <unknown>
  31:     0x7ffff7fcf1ec - <unknown>
  32:     0x7ffff7fe5560 - <unknown>
terminate called without an active exception
```
@milianw
Copy link
Author

milianw commented Sep 20, 2023

Thanks, I'll withhold benchmarking until it works reliably on real world applications.
Tanks for pointing out unwind_through_fresh_frames, I'll have a look at that and how
to incorporate that into my code base eventually.

Could it be that nwind is currently missing any kind of safety net against bad/broken debug information or unwind tables?

It assumes the unwinding tables are correct, and that the stack isn't smashed in any way. If either is not true then it will crash.

I don't believe the stack is smashed, rather experience over the years has shown that unwinding tables are not correct
in all cases. Handwritten assembly might miss entries, or contain bogus entries. Compilers sometimes emit wrong
information. Or the info is simply not there at all for some low level libraries. I mean generally, what would happen
when your unwinder encounters a library compiled with -fno-exceptions?

I do not think libunwind goes through all this pain in validating memory before accessing it for no good reason.
I fear that this is simply the dark reality one has to face.

So, the second patch fixes the crash for me, though I'm unsure if that's enough or the correct fix.

AFAIK that method should never actually have a null pointer passed to it, and this is probably just fixing the symptom rather than the cause.

Do you have an idea what the cause might be then? Maybe it's related to the second crash down below in some way?

With that, tracing of simple c applications seems to work. But in real-world C++ applications, I'm still seeing other crashes, such as now this one, which is less obvious to me:

This backtrace suggest that it's trying to access the stack slot on the stack where the return address of the frame is supposed to be stored.

In general the unwinding here works; I'm using it right now to profile a Rust program, which is, like ~600k lines of code big, and it's been running for... let's see, something like ~24 days non-stop.

To make this clear: I'm not debating whether your unwinding works for you. I'm simply sharing my experience with
applying it to my own non-rust code bases. I would love to integrate your unwinder if it's working and faster
after all - otherwise I wouldn't do all this work here to begin with :)

So there must be something either with how you use the crate, or with the program you're running that's making it crash. Maybe you could try profiling some bigger Rust app and see whether this is maybe due some funny business that your C++ compiler is doing?

I'm new to the rust world, do you have a suggestion for a rust app to use?

I have now tried it with rg and that seems to work without crashing. So indeed, it could be something specific to
C/C++ code, potentially pulled in from system libraries.

But preload libbytehound.so to my C++ application also does not crash... I'm not any wiser as to where the issue is now.

It's often also possible to debug these kinds of problems if you enable all of the logs (with the crate feature flag) and enable trace logs, and analyze that. But that spews out a lot of logs and it usually takes quite a bit of time and effort to analyze them.

Also, make sure you wrap the address space object in a read-write lock and always call reload if anything was loaded/unloaded etc. before trying to unwind (I usually use perf_event_open or dlpi_adds/dlpi_subs from dl_iterate_phdr to detect that).

Thanks, for the reminder. I think I already do that, but in my case I rely on dlopen or dlclose being called
to notify me that I need to reload the address space.

For good measure, I will now also experimented with reloading the address space every time I unwind to see if
there's maybe something I'm missing in that area.

One way or another - the C API I'm proposing here should - I hope - be acceptable on its own, or what do you think?
I mean, the discussion around the crashes I'm seeing is important but probably outside the scope of the merge request
at hand. Can you review the first patch, and I'll drop the second one as you say it's handled on the wrong level?

Thanks for your input so far.

@koute
Copy link
Owner

koute commented Sep 20, 2023

I do not think libunwind goes through all this pain in validating memory before accessing it for no good reason.
I fear that this is simply the dark reality one has to face.

I'm not saying that it isn't there for no reason, it's just I'm not really interested in supporting such use case for in-process local unwinding. (:

If the compiler emits broken unwinding tables or the program has broken unwinding tables because e.g. there's an assembly snippet which doesn't emit them correctly then, ideally, that should be fixed. If the program is compiled with -fno-exceptions then it should be recompiled to force the compiler to emit the unwinding tables (basically, add -fasynchronous-unwind-tables, which shouldn't affect codegen itself).

Note that essentially I'm using the same unwinding tables as C++ exception handling uses, so if the C++ exception handling works then my unwinding should always work too if it's called from within the process (so it's kinda similar to throwing an exception). And if my unwinding is broken because of broken unwinding tables then C++ exception handling will also be broken.

Do you have an idea what the cause might be then? Maybe it's related to the second crash down below in some way?

Sorry, no idea. They're probably related. You'd really have to dig into the logs to understand why it happens.

To make this clear: I'm not debating whether your unwinding works for you.

Yeah, I just wanted to clarify that it should work and has not e.g. bitrotted. :P (In general the project might seem dead, but that's because it's stable and it works for what I need it for, so there's no need to change it. I don't have much time and I don't get paid to work on this stuff just for the sake of it. :P)

But preload libbytehound.so to my C++ application also does not crash... I'm not any wiser as to where the issue is now.

A very random shot in the dark - maybe try comparing the compile flags between a program which crashes and one which doesn't?

One way or another - the C API I'm proposing here should - I hope - be acceptable on its own, or what do you think?
I mean, the discussion around the crashes I'm seeing is important but probably outside the scope of the merge request
at hand. Can you review the first patch, and I'll drop the second one as you say it's handled on the wrong level?

Yeah, looks mostly fine. It doesn't particularly make much difference to me what the C API exactly looks like as I won't be using it myself, so whatever works for you is fine. (Although I guess there are a few nits I can post.)

nwind-capi/include/nwind_capi.h Outdated Show resolved Hide resolved
*/
int nwind_backtrace(nwind_address_space *address_space,
nwind_unwind_context *unwind_context,
void **buffer, int size);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why int and not size_t?

nwind-capi/include/nwind_capi.h Outdated Show resolved Hide resolved
@milianw
Copy link
Author

milianw commented Sep 20, 2023

I do not think libunwind goes through all this pain in validating memory before accessing it for no good reason.
I fear that this is simply the dark reality one has to face.

I'm not saying that it isn't there for no reason, it's just I'm not really interested in supporting such use case for in-process local unwinding. (:

If the compiler emits broken unwinding tables or the program has broken unwinding tables because e.g. there's an assembly snippet which doesn't emit them correctly then, ideally, that should be fixed. If the program is compiled with -fno-exceptions then it should be recompiled to force the compiler to emit the unwinding tables (basically, add -fasynchronous-unwind-tables, which shouldn't affect codegen itself).

This makes this unwinder unusable on a lot of applications that are not using Rust. To expand on that: I was mistakenly referring to -fno-exceptions, as on x64 that shouldn't disable the async unwind tables to my knowledge. But I'm clearly seeing crashes due to the fallback code in amd64.rs where the dwarf_unwind fails and then the code tries to do frame pointer unwinding by hand. If you try this on code compiled with -fomit-framepointers, which is still the default on all distributions out there, it might crash, as the RBP doesn't necessarily encode a valid frame pointer but is repurposed. Blindly de-referencing it might thus lead to the crashes I'm seeing.

I wouldn't be surprised if this is a non-issue when you are looking only on the rust world. But even there it might be problematic if you call into crates that wrap system libraries, no?

Just to re-iterate: This is not about broken unwinding tables here I believe, but rather about blindly trying to do frame pointer unwinding on code that wasn't compiled with frame pointers?

Note that essentially I'm using the same unwinding tables as C++ exception handling uses, so if the C++ exception handling works then my unwinding should always work too if it's called from within the process (so it's kinda similar to throwing an exception). And if my unwinding is broken because of broken unwinding tables then C++ exception handling will also be broken.

🤷 all I see is dwarf_unwind giving me no unwind info for address and then a crash follows. I'm surprised though - The .eh_frame shouldn't need debug info for unwinding, no?

Do you have an idea what the cause might be then? Maybe it's related to the second crash down below in some way?

Sorry, no idea. They're probably related. You'd really have to dig into the logs to understand why it happens.

To make this clear: I'm not debating whether your unwinding works for you.

Yeah, I just wanted to clarify that it should work and has not e.g. bitrotted. :P (In general the project might seem dead, but that's because it's stable and it works for what I need it for, so there's no need to change it. I don't have much time and I don't get paid to work on this stuff just for the sake of it. :P)

Same for me with heaptrack, except for the 10% edu time rule we have at work :)

But preload libbytehound.so to my C++ application also does not crash... I'm not any wiser as to where the issue is now.

A very random shot in the dark - maybe try comparing the compile flags between a program which crashes and one which doesn't?

I think it's rather the above - trying to do frame pointer unwinding in a system library for which I don't have any debug information installed.

One way or another - the C API I'm proposing here should - I hope - be acceptable on its own, or what do you think?
I mean, the discussion around the crashes I'm seeing is important but probably outside the scope of the merge request
at hand. Can you review the first patch, and I'll drop the second one as you say it's handled on the wrong level?

Yeah, looks mostly fine. It doesn't particularly make much difference to me what the C API exactly looks like as I won't be using it myself, so whatever works for you is fine. (Although I guess there are a few nits I can post.)

Great, thanks for the review, I'll attend that later.

@koute
Copy link
Owner

koute commented Sep 20, 2023

But I'm clearly seeing crashes due to the fallback code in amd64.rs where the dwarf_unwind fails and then the code tries to do frame pointer unwinding by hand.

There are two fallbacks there:

  1. The one in guess_ebp is due to a bug in perf_event_open where it returns garbage RBP values. This is only meant for out-of-process unwinding and is not needed and should never trigger when doing in-process unwinding.

  2. The one in unwind which uses the frame pointers I've added only because I wanted to unwind though frames generated by wasmtime. It should never trigger if proper unwinding tables exist.

Just to re-iterate: This is not about broken unwinding tables here I believe, but rather about blindly trying to do frame pointer unwinding on code that wasn't compiled with frame pointers?

If the unwinding tables are correct (and assuming there are no other bugs) then the fallback should never trigger, I think. I used this on a big C++ application with hundreds of lines of code before I even added the fallback.

Or to reiterate: I'm not assuming that frame pointers are always present. (: I'm assuming that either a) there are proper unwinding tables, or 2) frame pointers are present. Either of the two must be true. So basically the requirement is: it must be possible to always unwind to the very root of the stack (ideally through unwind tables, and if that's missing through frame pointers), or there must be a relevant entry in the unwinding table that will stop the unwinding beforehand.

I wouldn't be surprised if this is a non-issue when you are looking only on the rust world. But even there it might be problematic if you call into crates that wrap system libraries, no?

One of the apps I'm regularly profiling with this has rocksdb compiled in it, which is a big C++ dependency. Works fine.

Again, not saying that there are no bugs; there might as well be bugs here. But it's not necessarily a Rust vs C++ thing. If we can figure out why it crashes then of course I'd like to fix it, but it would need to be a proper fix which doesn't just make it ignore the underlying problem.


Anyway, maybe you can also check if commenting out the frame pointer based unwinding makes it not crash anymore? (Because then it'll just give up instead of trying to continue.)

@milianw
Copy link
Author

milianw commented Sep 20, 2023

Or to reiterate: I'm not assuming that frame pointers are always present. (: I'm assuming that either a) there are proper unwinding tables, or 2) frame pointers are present. Either of the two must be true.

This is not necessarily true for all system libraries. In my work on perf, hotspot and heaptrack I have over the years encountered various situations where we did not get either valid unwinding tables nor valid frame pointers. One scenario I remember was in a library with handwritten assembly code that didn't have the needed unwinding information.

Another, if contrived, yet possible scenario, is some third party library that is compiled without exceptions, async unwind tables, debug info nor frame pointers. As soon as that would be encountered, the unwinder could crash - right?

So basically the requirement is: it must be possible to always unwind to the very root of the stack (ideally through unwind tables, and if that's missing through frame pointers), or there must be a relevant entry in the unwinding table that will stop the unwinding beforehand.

Are you actively opposed to making the unwinder more fail safe in scenarios like the above? Considering that the local address space already knows the valid regions, can it not validate addresses before dereferencing them e.g.? That should be a relatively fast binary search, but obviously it would lead to a slow down for all applications. Maybe guard it behind a feature flag that I could enable only for the C-API then?

I wouldn't be surprised if this is a non-issue when you are looking only on the rust world. But even there it might be problematic if you call into crates that wrap system libraries, no?

One of the apps I'm regularly profiling with this has rocksdb compiled in it, which is a big C++ dependency. Works fine.

I don't think the issue is related to C++. I'll try to get some better logging in place to see if I can better pin point the exact places where it goes wrong. Also see below - this only applies to situations where the normal unwinding fails, so if your rocksdb is compiled "properly" it won't exhibit this issue.

Again, not saying that there are no bugs; there might as well be bugs here. But it's not necessarily a Rust vs C++ thing. If we can figure out why it crashes then of course I'd like to fix it, but it would need to be a proper fix which doesn't just make it ignore the underlying problem.

Agreed. But I fear the "proper fix" is adding additional safety checks into the frame pointer unwinder, or not enabling that by default as a fallback.

My test run where I reload the address space on every unwind finally has hit the crash too, meaning I can safely claim that it's not due to missing a reload. I'm attaching my terminal scroll back buffer as a log file, for reference
log.txt.gz. here's the excerpt from the very end:

[2023-09-20T18:25:04Z DEBUG nwind::unwind_context] Starting unwinding at: 0x00007FFFF5CFDA3B
[2023-09-20T18:25:04Z DEBUG nwind::address_space] Cannot find a binary corresponding to address 0x00007FFFF5CFDA3B
[2023-09-20T18:25:04Z DEBUG nwind::dwarf] No unwind info for address 0x00007FFFF5CFDA3B
[2023-09-20T18:25:04Z DEBUG nwind::unwind_context] Unwinding #0 -> #1 at: 0x0000000000000000
[2023-09-20T18:25:04Z DEBUG nwind::dwarf] Instruction pointer is NULL; cannot continue unwinding
[2023-09-20T18:25:04Z DEBUG nwind::unwind_context] Current address on frame #1: 0x0000000000000000
[2023-09-20T18:25:04Z DEBUG nwind::unwind_context] Unwinding #1 -> #2 at: 0x48185F8948104F89
[2023-09-20T18:25:04Z DEBUG nwind::address_space] Cannot find a binary corresponding to address 0x48185F8948104F89
[2023-09-20T18:25:04Z DEBUG nwind::dwarf] No unwind info for address 0x48185F8948104F88

[Switching to Thread 0x7fffe4c686c0 (LWP 240628)]
core::ptr::read_unaligned<usize> (src=0x4808578948078948) at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ptr/mod.rs:1281
1281    /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ptr/mod.rs: No such file or directory.
(gdb) bt
#0  core::ptr::read_unaligned<usize> (src=0x4808578948078948) at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ptr/mod.rs:1281
#1  0x00007ffff5cfb30c in nwind::local_unwinding::{impl#0}::get_pointer_at_address (self=0x7fffe4c673a8, address=5190494817675086152) at nwind/src/local_unwinding.rs:45
#2  nwind::arch::amd64::{impl#0}::unwind<nwind::local_unwinding::LocalMemory> (nth_frame=2, memory=0x7fffe4c673a8, state=0x7fffd8069a90, regs=0x7fffd80698b0, initial_address=0x7fffd8069a60, ra_address=0x7fffd8069a70) at nwind/src/arch/amd64.rs:224
#3  0x00007ffff5cf3053 in nwind::unwind_context::UnwindHandle<nwind::arch::amd64::Arch>::unwind<nwind::arch::amd64::Arch, nwind::local_unwinding::LocalMemory> (self=0x7fffe4c673b8, memory=0x7fffe4c673a8) at nwind/src/unwind_context.rs:104
#4  0x00007ffff5cfe031 in nwind::local_unwinding::LocalAddressSpace::unwind<nwind_capi::nwind_backtrace::{closure_env#0}> (self=0x7fffd8034de0, ctx=0x7fffd80698b0, callback=...) at nwind/src/local_unwinding.rs:847
#5  nwind_capi::nwind_backtrace (address_space=0x7fffd8034de0, unwind_context=0x7fffd80698b0, buffer=0x7fffe4c67728, size=64) at nwind-capi/src/lib.rs:59
#6  0x00007ffff7fbe935 in Trace::unwind(void**) (data=data@entry=0x7fffe4c67728) at /home/milian/projects/src/heaptrack/src/track/trace_nwind_capi.cpp:96
#7  0x00007ffff7fbda5a in Trace::fill(int) (skip=2, this=0x7fffe4c67720) at /home/milian/projects/src/heaptrack/src/track/trace.h:47
#8  heaptrack_malloc(void*, size_t) (ptr=0x7fffd80d5330, size=36) at /home/milian/projects/src/heaptrack/src/track/libheaptrack.cpp:879
#9  0x00007ffff7fb9dd8 in malloc(size_t) (size=36) at /home/milian/projects/src/heaptrack/src/track/heaptrack_preload.cpp:214
#10 0x00007ffff3d83bc4 in  () at /usr/lib/libxcb.so.1
#11 0x00007ffff3d843af in  () at /usr/lib/libxcb.so.1
#12 0x00007ffff3d85f3d in xcb_wait_for_event () at /usr/lib/libxcb.so.1
#13 0x00007fffe5a68ed2 in  () at /usr/lib/libQt5XcbQpa.so.5
#14 0x00007ffff66f35da in  () at /usr/lib/libQt5Core.so.5
#15 0x00007ffff628c9eb in  () at /usr/lib/libc.so.6
#16 0x00007ffff6310dfc in  () at /usr/lib/libc.so.6

Anyway, maybe you can also check if commenting out the frame pointer based unwinding makes it not crash anymore? (Because then it'll just give up instead of trying to continue.)

Yes, that fixes the crashes I'm seeing. For my purposes that should be another viable approach, better broken backtraces for fringe thirdparty code, rather than crashing at least :)

This allows other projects not written in Rust to consume this
unwinding library. The API is pretty basic and only offers the
fundamental building blocks required for an external consumer
to drive the unwinding. There is no fancy lazy global static
address space, nor any thread local unwind context - configuring
that is left to the consumer of this API.
In my case this seems to happen, leading to panics such as:
```
thread '<unnamed>' panicked at 'attempt to subtract with overflow', not-perf/nwind/src/frame_descriptions.rs:463:125
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: nwind::frame_descriptions::FrameDescriptions<E>::find_unwind_info
             at not-perf/nwind/src/frame_descriptions.rs:463:125
```
kdesysadmin pushed a commit to KDE/heaptrack that referenced this pull request Sep 20, 2023
This way one can opt-in to get extended debug logs when using
the library, i.e. `cargo build --features "env_logger"`
@koute
Copy link
Owner

koute commented Sep 21, 2023

Are you actively opposed to making the unwinder more fail safe in scenarios like the above?

I'm not convinced it should be necessary, or at least I don't have enough evidence that would convince me.

Considering I use the same unwinding tables which C++ uses for exception handling let's try to enumerate the scenarios where it would be broken:

  • If the compiler emits broken unwinding tables. This would also break C++ exception handling. I'm not interested in supporting this because I think it is reasonable to assume that C++ exception handling works, and if it doesn't work because there's a bug in the compiler then it will break the programs themselves, not just my unwinder. The compiler should be fixed, and the program recompiled.
  • If you try to profile a binary-only program without unwinding tables. I'm not interested in supporting this because 99% of the time you will either have the sources, or the binary will have unwinding tables, or it will have frame pointers. Besides, in this scenario the quality of the profiling will be significantly affected (because we won't be able to fully unwind everything), so you as a user do not want to run as-is anyway if you can help it. If you can't help it because e.g. you're trying to profile some proprietary binary blob for your employer, well, tough luck. (: (For the same reason e.g. I would just reject a PR which adds support for architectures like S390x, which are inaccessible to mere mortals and essentially only used by big for-profit corporations.)
  • If you have a few pieces of code in your program which don't have proper unwinding tables because e.g. they are assembly snippets. Now this is a more reasonable use case, but I'm still probably not interested in supporting this because 1) it should be rare, 2) to actually make the unwinding fail in this case your assembly snippet would have to call back into your C++ code (even more rare), 3) your program would either have to have exceptions completely disabled/guarantee to never throw from within that callback, or be broken (because you're using exceptions and can throw from the callback but you haven't added unwind tables to your assembly snippet)
  • If there's a bug in my code.

Usually when I see crashes like this it's for the last reason, where either there's a bug or I'm not handling something properly. Fundamentally I'm not opposed to handling corner cases where reasonable, but I do not want to just silently ignore buggy behavior, especially if the bug is in my code.

And in this case it is probably a bug in my code. You're seeing "Cannot find a binary corresponding to address" errors, which means that it wasn't able to match a given memory mapping to a binary, which is why it didn't find any unwinding tables, which is why it was trying the fallback, which is why it crashed.

So, again, the proper fix is not to gracefully handle this, but to actually fix it so that it can find the binary and access the unwinding tables, then it won't crash.


Also, please remember that this is not a general purpose unwinder. Its only purpose is to be used in profiling tools. That's it. So handling every corner case under the sun is out of scope, and the number one priority is performance, and the number two priority is that it tries very hard to always return full backtraces. The only situation where I'm willing to compromise on these two is when it's unambiguously necessary, and the lack of defensive checks (like general purpose unwinders have) is very much deliberate.

@milianw
Copy link
Author

milianw commented Sep 21, 2023

Thanks for your explanation. This is your project and you can dictate the rules. If this really turns out to be a bug somewhere else, all the better - let's see.

Based on what you wrote, it sounds like the "Cannot find a binary corresponding to address" should actually be an error then, and stop the unwinding (and emit an error, not just a debug message) as it might otherwise just crash, or what do you propose? Basically: I'd rather see explicit panics / asserts instead of this kind of silent and hard to debug crashes due to invalid memory accesses.

And what do you say to a mode that - optionally, via a feature flag - disables the frame pointer unwinding fallback? On all current linux distros they won't be available anyways, and your unwinder isn't used to unwindg through kernel space either, or?

Hmm though potentially FP unwinding is needed for JIT frames. Actually, thinking about JIT - that could be another area where your assumptions don't hold, no? I.e. no JIT frame will have corresponding .eh_frame or DWARF metadata. And some JITs don't setup the FP unwinding data either, and could thus crash the unwinder?

@milianw
Copy link
Author

milianw commented Oct 5, 2023

Ping?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make shadow-stack based unwinding reusable for other projects?
2 participants