
iperf3 fails with exception nested too deeply on ROFS/RamFS image #1035

Closed
wkozaczuk opened this issue Mar 18, 2019 · 7 comments

@wkozaczuk (Collaborator)

OSv with iperf3 starts successfully on a single CPU, but once a client connects it fails like so:

[I/35 dhcp]: Set hostname to: osv
warning: this system does not seem to support IPv6 - trying IPv4
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
random: device unblocked.
Accepted connection from 192.168.122.1, port 57142
page fault outsAssertion failed: sched::exception_depth <= 1 (core/sched.cc: reschedule_from_interrupt: 236)

[backtrace]

From gdb:

#0  abort (fmt=fmt@entry=0x662e32 "exception nested too deeply") at runtime.cc:105
#1  0x00000000003a18d6 in sched::arch_cpu::enter_exception (this=<optimized out>) at arch/x64/arch-cpu.cc:19
#2  sched::exception_guard::exception_guard (this=<optimized out>) at arch/x64/arch-cpu.cc:37
#3  0x00000000003a41bd in page_fault (ef=0xffff80000096d048) at arch/x64/mmu.cc:22
#4  <signal handler called>
#5  0x00000000003a1a06 in safe_load<frame*> (data=<synthetic pointer>: <optimized out>, potentially_bad_pointer=0x0)
    at arch/x64/safe-ptr.hh:16
#6  backtrace_safe (pc=pc@entry=0xffff8000016472e0, nr=nr@entry=128) at arch/x64/backtrace.cc:25
#7  0x000000000023f096 in print_backtrace () at runtime.cc:79
#8  0x000000000023f2bc in abort (fmt=fmt@entry=0x634050 "Assertion failed: %s (%s: %s: %d)\n") at runtime.cc:121
#9  0x000000000023f30b in __assert_fail (expr=expr@entry=0x66fadb "sched::exception_depth <= 1", 
    file=file@entry=0x66fa66 "core/sched.cc", line=line@entry=236, 
    func=func@entry=0x670240 <sched::cpu::reschedule_from_interrupt(bool, std::chrono::duration<long, std::ratio<1l, 1000000000l> >)::__func__> "reschedule_from_interrupt") at runtime.cc:139
#10 0x00000000003fc5de in sched::cpu::reschedule_from_interrupt (this=0xffff80000096a040, 
    called_from_yield=called_from_yield@entry=false, preempt_after=..., preempt_after@entry=...) at core/sched.cc:271
#11 0x00000000003a5669 in sched::preempt () at include/osv/sched.hh:1309
#12 interrupt (frame=0xffff800001648068) at arch/x64/exceptions.cc:264
#13 <signal handler called>
#14 console::isa_serial_console::putchar (ch=112 'p') at arch/x64/processor.hh:258
#15 console::isa_serial_console::write (this=<optimized out>, 
    str=0x8fe0b5 <debug_ll::msg+21> "plication, addr: 0x000000005c8f2000\n", len=<optimized out>) at drivers/isa-serial.cc:79
#16 0x0000000000354e88 in console::console_multiplexer::drivers_write (len=1, 
    str=0x8fe0b4 <debug_ll::msg+20> "pplication, addr: 0x000000005c8f2000\n", this=0x8e77e0 <console::mux>)
    at drivers/console-multiplexer.cc:49
#17 console::console_multiplexer::<lambda(char const*, size_t)>::operator() (__closure=<optimized out>, len=1, 
    str=0x8fe0b4 <debug_ll::msg+20> "pplication, addr: 0x000000005c8f2000\n") at drivers/console-multiplexer.cc:36
#18 std::_Function_handler<void(char const*, long unsigned int), console::console_multiplexer::start()::<lambda(char const*, size_t)> >::_M_invoke(const std::_Any_data &, const char *&&, unsigned long &&) (__functor=..., __args#0=<optimized out>, 
    __args#1=<optimized out>) at /usr/include/c++/8/bits/std_function.h:297
#19 0x0000000000355943 in std::function<void (char const*, unsigned long)>::operator()(char const*, unsigned long) const (
    __args#1=<optimized out>, __args#0=<optimized out>, this=0x8e7840 <console::mux+96>)
    at /usr/include/c++/8/bits/std_function.h:682
#20 console::LineDiscipline::write(char const*, unsigned long, std::function<void (char const*, unsigned long)>&) (
    this=0xffffa000012d3800, str=0x8fe0b5 <debug_ll::msg+21> "plication, addr: 0x000000005c8f2000\n", len=<optimized out>, 
    writer=...) at drivers/line-discipline.cc:179
#21 0x0000000000355209 in console::console_multiplexer::write_ll (this=this@entry=0x8e77e0 <console::mux>, 
    str=str@entry=0x8fe0a0 <debug_ll::msg> "\n[backtrace]\n", len=<optimized out>) at drivers/console-multiplexer.cc:71
#22 0x0000000000354b52 in console::write_ll (msg=msg@entry=0x8fe0a0 <debug_ll::msg> "\n[backtrace]\n", len=<optimized out>)
    at drivers/console.cc:63
#23 0x00000000003e0268 in debug_ll (fmt=fmt@entry=0x659000 "page fault outside application, addr: 0x%016lx\n")
    at core/debug.cc:250
#24 0x0000000000347086 in mmu::vm_sigsegv (addr=<optimized out>, ef=0xffff80000096c048) at core/mmu.cc:1314
#25 0x0000000000348653 in mmu::vm_fault (addr=1552883712, addr@entry=1552884064, ef=ef@entry=0xffff80000096c048)
    at core/mmu.cc:1338
#26 0x00000000003a41fe in page_fault (ef=0xffff80000096c048) at arch/x64/mmu.cc:38
#27 <signal handler called>
#28 fhold (fp=0x5c8f2154) at fs/vfs/kern_descrip.cc:184
#29 0x000000000034b584 in intrusive_ptr_add_ref (fp=<optimized out>) at ./fs/fs.hh:20
#30 boost::intrusive_ptr<file>::intrusive_ptr (rhs=..., this=0xffff80000164bbe8)
    at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:93
#31 mmu::file_vma::fault (this=0xffffa00001559480, addr=35184374190080, ef=0xffff80000164c068) at core/mmu.cc:1689
#32 0x0000000000348535 in mmu::vm_fault (addr=addr@entry=35184374190080, ef=ef@entry=0xffff80000164c068)
    at /usr/include/boost/move/detail/meta_utils.hpp:267
#33 0x00000000003a41fe in page_fault (ef=0xffff80000164c068) at arch/x64/mmu.cc:38
#34 <signal handler called>
#35 0x00000000003922c3 in repmovsb (n=<synthetic pointer>: <optimized out>, src=<synthetic pointer>: <optimized out>, 
    dest=<synthetic pointer>: <optimized out>) at arch/x64/string.cc:249
#36 memcpy_repmov_ssse3 (dest=0x200000201000, src=0x2000001fe660, n=4096) at arch/x64/string.cc:249
#37 0x000000000043936c in uiomove (cp=cp@entry=0x2000001fe660, n=4096, uio=uio@entry=0x2000001ff730) at fs/vfs/subr_uio.cc:60
#38 0x00000000003688b3 in randomdev::random_read (dev=<optimized out>, uio=0x2000001ff730, ioflags=<optimized out>)
    at drivers/random.cc:74
#39 0x000000000044165b in device_read (dev=0xffffa00001457300, uio=0x2000001ff730, ioflags=0) at fs/devfs/device.cc:387
#40 0x000000000043f37b in vfs_file::read (this=0xffffa00001559680, uio=0x2000001ff730, flags=<optimized out>)
    at fs/vfs/vfs_fops.cc:54
#41 0x000000000043cdcf in sys_read (fp=0xffffa00001559680, iov=iov@entry=0x2000001ff7e0, niov=niov@entry=2, 
    offset=offset@entry=-1, count=count@entry=0x2000001ff7a8) at fs/vfs/vfs_syscalls.cc:275
#42 0x000000000043306d in preadv (fd=<optimized out>, iov=iov@entry=0x2000001ff7e0, iovcnt=iovcnt@entry=2, 
    offset=offset@entry=-1) at fs/vfs/main.cc:421
#43 0x00000000004330dc in readv (fd=<optimized out>, iov=iov@entry=0x2000001ff7e0, iovcnt=iovcnt@entry=2)
    at fs/vfs/main.cc:437
#44 0x000000000044fd0a in __stdio_read (f=0xffff900001244000, buf=<optimized out>, len=131072)
    at libc/stdio/__stdio_read.c:28
#45 0x000000000048293e in fread (destv=<optimized out>, size=1, nmemb=131072, f=0xffff900001244000)
    at musl/src/stdio/fread.c:30
#46 0x0000100000028991 in readentropy (out=0x200000200000, outsize=131072) at iperf_util.c:73
#47 0x000010000002054e in iperf_new_stream (test=0xffffa00001640800, s=5) at iperf_api.c:3358
#48 0x000010000002571b in iperf_run_server (test=0xffffa00001640800) at iperf_server_api.c:511
#49 0x00001000000064db in run (test=0xffffa00001640800) at main.c:152
#50 0x000010000000636c in main (argc=2, argv=0xffffa00001302800) at main.c:107
#51 0x000000000042d94d in osv::application::run_main (this=0xffffa00001302210) at /usr/include/c++/8/bits/stl_vector.h:805
#52 0x000000000042da7f in __libc_start_main (main=0x100000006295 <main>) at core/app.cc:37
#53 0x00001000000061da in _start ()

The output with 2 CPUs looks slightly different:

-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
random: device unblocked.
Accepted connection from 192.168.122.1, port 57278
page fault outside application, addr: 0x000000005c8fAssertion failed: sched::exception_depth <= 1 (core/sched.cc: reschedule_from_interrupt: 236)
@nyh (Contributor) commented Mar 18, 2019

Note that the "nested exception" thing is a red herring: the real cause of the crash is in the stack frames below. The nested exceptions happened only afterwards, while we were trying to print the page-fault message.

#33 0x00000000003a41fe in page_fault (ef=0xffff80000164c068) at arch/x64/mmu.cc:38
#34 <signal handler called>
#35 0x00000000003922c3 in repmovsb (n=<synthetic pointer>: <optimized out>, src=<synthetic pointer>: <optimized out>, 
    dest=<synthetic pointer>: <optimized out>) at arch/x64/string.cc:249
#36 memcpy_repmov_ssse3 (dest=0x200000201000, src=0x2000001fe660, n=4096) at arch/x64/string.cc:249
#37 0x000000000043936c in uiomove (cp=cp@entry=0x2000001fe660, n=4096, uio=uio@entry=0x2000001ff730) at fs/vfs/subr_uio.cc:60
#38 0x00000000003688b3 in randomdev::random_read (dev=<optimized out>, uio=0x2000001ff730, ioflags=<optimized out>)
    at drivers/random.cc:74
#39 0x000000000044165b in device_read (dev=0xffffa00001457300, uio=0x2000001ff730, ioflags=0) at fs/devfs/device.cc:387
#40 0x000000000043f37b in vfs_file::read (this=0xffffa00001559680, uio=0x2000001ff730, flags=<optimized out>)
    at fs/vfs/vfs_fops.cc:54
#41 0x000000000043cdcf in sys_read (fp=0xffffa00001559680, iov=iov@entry=0x2000001ff7e0, niov=niov@entry=2, 
    offset=offset@entry=-1, count=count@entry=0x2000001ff7a8) at fs/vfs/vfs_syscalls.cc:275
#42 0x000000000043306d in preadv (fd=<optimized out>, iov=iov@entry=0x2000001ff7e0, iovcnt=iovcnt@entry=2, 
    offset=offset@entry=-1) at fs/vfs/main.cc:421
#43 0x00000000004330dc in readv (fd=<optimized out>, iov=iov@entry=0x2000001ff7e0, iovcnt=iovcnt@entry=2)
    at fs/vfs/main.cc:437
#44 0x000000000044fd0a in __stdio_read (f=0xffff900001244000, buf=<optimized out>, len=131072)
    at libc/stdio/__stdio_read.c:28
#45 0x000000000048293e in fread (destv=<optimized out>, size=1, nmemb=131072, f=0xffff900001244000)
    at musl/src/stdio/fread.c:30
#46 0x0000100000028991 in readentropy (out=0x200000200000, outsize=131072) at iperf_util.c:73
#47 0x000010000002054e in iperf_new_stream (test=0xffffa00001640800, s=5) at iperf_api.c:3358
#48 0x000010000002571b in iperf_run_server (test=0xffffa00001640800) at iperf_server_api.c:511
#49 0x00001000000064db in run (test=0xffffa00001640800) at main.c:152
#50 0x000010000000636c in main (argc=2, argv=0xffffa00001302800) at main.c:107

In the failing call memcpy_repmov_ssse3 (dest=0x200000201000, src=0x2000001fe660, n=4096), both pointers appear "reasonable" (not garbage), but maybe n=4096 (i.e., c=4096 in random_read() in random.cc) is too much and extends beyond the source buffer's end? This 4096 comes from us taking PAGE_SIZE as the chunk size, after which random_yarrow_read() (called as random_adaptor.read) did not reduce it. Was one of those two things wrong?

If you can easily reproduce this issue, maybe you can add printouts in random_read() or debug it with gdb and figure out what is not working.

@wkozaczuk (Collaborator, Author)

I have played with it some more and it is really weird. After adding some printouts to drivers/random.cc, I could see this output:

Accepted connection from 192.168.122.1, port 48874
--> random_read() called to get 131072 bytes of data
--> random_read() resid:131072, c:4096
--> random_read() resid:126976, c:4096
page fault outAssertion failed: sched::exception_depth <= 1 (core/sched.cc: reschedule_from_interrupt: 236)

It always crashes the same way after the 3rd random_read() printout. I also changed our tests/misc-urandom.cc to do almost exactly what iperf3 does (see https://github.com/esnet/iperf/blob/master/src/iperf_util.c#L58-L79) and the test executes without any problems:

OSv v0.53.0-3-g8cd7d8aa
eth0: 192.168.122.15
--> random_read() called to get 131072 bytes of data
--> random_read() resid:131072, c:4096
--> random_read() resid:126976, c:4096
--> random_read() resid:122880, c:4096
--> random_read() resid:118784, c:4096
--> random_read() resid:114688, c:4096
--> random_read() resid:110592, c:4096
--> random_read() resid:106496, c:4096
--> random_read() resid:102400, c:4096
--> random_read() resid:98304, c:4096
--> random_read() resid:94208, c:4096
--> random_read() resid:90112, c:4096
--> random_read() resid:86016, c:4096
--> random_read() resid:81920, c:4096
--> random_read() resid:77824, c:4096
--> random_read() resid:73728, c:4096
--> random_read() resid:69632, c:4096
--> random_read() resid:65536, c:4096
--> random_read() resid:61440, c:4096
--> random_read() resid:57344, c:4096
--> random_read() resid:53248, c:4096
--> random_read() resid:49152, c:4096
--> random_read() resid:45056, c:4096
--> random_read() resid:40960, c:4096
--> random_read() resid:36864, c:4096
--> random_read() resid:32768, c:4096
--> random_read() resid:28672, c:4096
--> random_read() resid:24576, c:4096
--> random_read() resid:20480, c:4096
--> random_read() resid:16384, c:4096
--> random_read() resid:12288, c:4096
--> random_read() resid:8192, c:4096
--> random_read() resid:4096, c:4096
16 e2 b0 37 a2 ff fb d2 d6 7c

I have also discovered by accident that iperf3 works just fine (no crashes) when the OSv image is built with ZFS! So all the crashes I described were with ROFS, and yet the unit test above passes on ROFS too.

This makes me think there is some ROFS-specific bug somewhere else that merely manifests as a crash in random.cc. What could it be?

I will be adding iperf3 as an app soon so one can re-create this issue.

@wkozaczuk (Collaborator, Author)

I think I know what the issue is, or at least where the smoking gun is :-). Here are 2 critical lines in iperf3 that explain everything: https://github.com/esnet/iperf/blob/40e7c05440583f229edd6b6ca05c5d97b66fcf15/src/iperf_api.c#L3581 and https://github.com/esnet/iperf/blob/40e7c05440583f229edd6b6ca05c5d97b66fcf15/src/iperf_api.c#L3581.

As you can see, it first creates and opens a temp file, most likely somewhere under /tmp. Then it mmaps it to get a buffer to read from and write to, and passes the result of mmap (the buffer to fill with random test data) to readentropy(), which ends up reading from /dev/urandom.

Now, ROFS images (per fstab) mount ramfs on /tmp. So I am wondering: is mmap properly supported by ramfs, or is there possibly a bug somewhere in its implementation?

@nyh (Contributor) commented Mar 18, 2019

The iperf code creates an empty file, then ftruncate()s it (actually enlarging it) to blksize bytes, and then mmaps those blksize bytes. Perhaps we have the following compound bug?

  1. Maybe ftruncate fails to enlarge the file? Maybe we have a ramfs-specific bug in ftruncate()?
  2. Maybe mmap on a file shorter than the specified length, instead of returning an error (should it?), just maps fewer pages, so some of the pages remain unmapped and, when written to, cause a crash.

@nyh (Contributor) commented Mar 18, 2019

Hmm, the code of ramfs_truncate() seems reasonable... So I still don't have a guess what's wrong in the ramfs implementation.

@wkozaczuk wkozaczuk changed the title iperf3 fails with exception nested to deeply iperf3 fails with exception nested to deeply on ROFS/RamFS image May 1, 2019
@wkozaczuk (Collaborator, Author) commented May 13, 2019

I have a test that one can use to replicate the issue:

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <cassert>

int main(int argc, char *argv[])
{
    char file_template[20];
    snprintf(file_template, sizeof(file_template) / sizeof(char), "%s/iperf3.XXXXXX", "/tmp");

    auto tmp_fd = mkstemp(file_template);
    assert(tmp_fd != -1);

    assert(unlink(file_template) >= 0);
    size_t buf_size = 131072;
    assert(ftruncate(tmp_fd, buf_size) >= 0);

    auto buf = (char *) mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, tmp_fd, 0);
    assert(buf != MAP_FAILED);
    printf("Mmapped\n");

    auto frandom = fopen("/dev/urandom", "rb");
    assert(frandom != NULL);

    printf("About to fread ...\n");
    assert(fread(buf, 1, buf_size, frandom) == buf_size);

    printf("Read random\n");

    close(tmp_fd);
    munmap(buf, buf_size);

    return 0;
}

Same thing - passes with ZFS but fails when /tmp is on RAMFS during fread():

#27 0x00000000003895ea in repmovsq (n=<synthetic pointer>: <optimized out>, src=<synthetic pointer>: <optimized out>, dest=<synthetic pointer>: <optimized out>) at arch/x64/string.cc:232
#28 memcpy_repmov_old (dest=0x200000201000, src=0x2000001fecc0, n=4096) at arch/x64/string.cc:232
#29 0x000000000042d8cc in uiomove (cp=cp@entry=0x2000001fecc0, n=4096, uio=uio@entry=0x2000001ffd90) at fs/vfs/subr_uio.cc:60
#30 0x0000000000360c43 in randomdev::random_read (dev=<optimized out>, uio=0x2000001ffd90, ioflags=<optimized out>) at drivers/random.cc:74
#31 0x000000000043586b in device_read (dev=0xffffa0000176e300, uio=0x2000001ffd90, ioflags=0) at fs/devfs/device.cc:387
#32 0x00000000004336ab in vfs_file::read (this=0xffffa00001790400, uio=0x2000001ffd90, flags=<optimized out>) at fs/vfs/vfs_fops.cc:54
#33 0x000000000043111f in sys_read () at fs/vfs/vfs_syscalls.cc:275
#34 0x000000000042797d in preadv (fd=<optimized out>, iov=iov@entry=0x2000001ffe40, iovcnt=iovcnt@entry=2, offset=offset@entry=-1) at fs/vfs/main.cc:421
#35 0x00000000004279ec in readv (fd=<optimized out>, iov=iov@entry=0x2000001ffe40, iovcnt=iovcnt@entry=2) at fs/vfs/main.cc:437
#36 0x0000000000443caa in __stdio_read (f=0xffff900001d36000, buf=<optimized out>, len=131072) at libc/stdio/__stdio_read.c:28
#37 0x000000000047602e in fread (destv=destv@entry=0x200000200000, size=size@entry=1, nmemb=nmemb@entry=131072, f=f@entry=0xffff900001d36000) at musl/src/stdio/fread.c:30
#38 0x0000100000400924 in main (argc=<optimized out>, argv=<optimized out>) at /home/wkozaczuk/projects/osv/tests/tst-mmap-tmp-file.cc:39
#39 0x000000000042247d in osv::application::run_main() () at core/app.cc:417
#40 0x00000000002253d4 in osv::application::main() () at core/app.cc:320
#41 0x00000000004225f9 in osv::application::<lambda(void*)>::operator() (__closure=0x0, app=<optimized out>) at core/app.cc:233
#42 osv::application::<lambda(void*)>::_FUN(void *) () at core/app.cc:235
#43 0x000000000044dde6 in pthread_private::pthread::<lambda()>::operator() (__closure=0xffffa00001d38200) at libc/pthread.cc:114
#44 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/8/bits/std_function.h:297
#45 0x00000000003f4357 in sched::thread_main_c (t=0xffff800001d42040) at arch/x64/arch-switch.hh:271
#46 0x000000000039b573 in thread_main () at arch/x64/entry.S:113

The key thing is the unlink() right after mkstemp(). I am wondering if there is a deficiency in VFS or RAMFS where it does not properly do reference counting for open file descriptors and lets the file data be freed too early.

@wkozaczuk (Collaborator, Author)

So I think we have a gap in the ramfs implementation. Neither open nor close is implemented:

#define ramfs_open      ((vnop_open_t)vop_nullop)
#define ramfs_close     ((vnop_close_t)vop_nullop)

and ramfs_remove() simply deletes the underlying data right away. Correctly, the deletion should happen in close(), if it were implemented. I am not sure whether we also need any reference counting in ramfs itself, but it seems this is already done in the VFS layer.

wkozaczuk added a commit to wkozaczuk/osv that referenced this issue May 20, 2019
Even though it is valid to delete a file its data (i-node)
should not be deleted until all file descriptors are closed.

This patch fixes the bug in file deletion logic in RAMFS
to make sure that file node does not get deleted
until all file descriptors are closed.

Fixes cloudius-systems#1035

Signed-off-by: Waldemar Kozaczuk <[email protected]>
wkozaczuk pushed a commit to wkozaczuk/osv that referenced this issue May 22, 2019
Even though it is valid to delete a file its data (i-node)
should not be deleted until all file descriptors are closed.

This patch fixes the bug in file deletion logic in RAMFS
to make sure that file node does not get deleted
until all file descriptors are closed.

Fixes cloudius-systems#1035

Signed-off-by: Waldemar Kozaczuk <[email protected]>
Message-Id: <[email protected]>