iperf3 fails with exception nested too deeply on ROFS/RamFS image #1035
Comments
Note that the "nested exception" thing is a red herring; the real cause of the crash is in the stack frames as follows. Then, we had problems printing the page-fault message.
In the call If you can easily reproduce this issue, maybe you can add printouts in random_read() or debug it with gdb and figure out what is not working. |
I have played more with it and I must say it is really weird. I could see this output, after I added some printouts to drivers/random.cc:
And it always crashes the same way after the third random_read() printout. I also changed our tests/misc-urandom.cc to make it do almost exactly what iperf3 does (see https://github.com/esnet/iperf/blob/master/src/iperf_util.c#L58-L79) and the test executes without any problems:
I have also discovered by accident that iperf3 works just fine (no crashes) when the OSv image is built with ZFS! So all the crashes I described were with ROFS. But then the unit test passes on ROFS as well. So it makes me think that there is some ROFS-specific bug somewhere else that just looks like a crash in random.cc. What could it be? I will be adding iperf3 as an app soon so one can re-create this issue.
I think I know what the issue is, or at least I understand where the smoking gun is :-). Here are the 2 critical lines in iperf3 that explain everything: https://github.com/esnet/iperf/blob/40e7c05440583f229edd6b6ca05c5d97b66fcf15/src/iperf_api.c#L3581 and https://github.com/esnet/iperf/blob/40e7c05440583f229edd6b6ca05c5d97b66fcf15/src/iperf_api.c#L3581. As you can see, first it creates and opens a temp file, most likely somewhere under /tmp. Then it mmaps the file to get a buffer to read from and write to, and passes that buffer to readentropy, which fills it with random test data read from /dev/urandom. Now ROFS images (per fstab) mount ramfs on /tmp. So I am wondering whether mmap is properly supported by ramfs, or whether there is a bug somewhere in its implementation?
The iperf code creates an empty file, and then ftruncate()s it (actually, enlarging it), to blksize bytes, and then mmaps those blksize bytes. Perhaps we have the following compound bug?
Hmm, the code of ramfs_truncate() seems reasonable... So I still don't have a guess what's wrong in the ramfs implementation. |
I have a test that one can use to replicate the issue:

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <errno.h>
    #include <cassert>

    // Note: the calls below live inside assert(), so build without NDEBUG.
    int main(int argc, char *argv[])
    {
        // Create a temporary file under /tmp, as iperf3 does
        char file_template[20];
        snprintf(file_template, sizeof(file_template) / sizeof(char), "%s/iperf3.XXXXXX", "/tmp");
        auto tmp_fd = mkstemp(file_template);
        assert(tmp_fd != -1);

        // Remove the name right away while keeping the descriptor open
        assert(unlink(file_template) >= 0);

        // Enlarge the empty file and map it
        size_t buf_size = 131072;
        assert(ftruncate(tmp_fd, buf_size) >= 0);
        auto buf = (char *) mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, tmp_fd, 0);
        assert(buf != MAP_FAILED);
        printf("Mmapped\n");

        // Fill the mapped buffer with random data
        auto frandom = fopen("/dev/urandom", "rb");
        assert(frandom != NULL);
        printf("About to fread ...\n");
        assert(fread(buf, 1, buf_size, frandom) == buf_size);
        printf("Read random\n");

        close(tmp_fd);
        munmap(buf, buf_size);
        return 0;
    }

Same thing: it passes with ZFS but fails during fread() when /tmp is on RAMFS:
The key thing is the unlink() right after mkstemp(). I am wondering if this is a deficiency in VFS or RAMFS, where it does not properly do reference counting for open file descriptors and lets file data be freed too early.
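For reference, the behavior the reproducer relies on can be checked in isolation. The following is a minimal sketch, assuming a POSIX system; the helper name `unlinked_file_stays_usable` is hypothetical and not part of OSv or iperf3. It unlinks a freshly created temp file and then verifies that data written through the still-open descriptor can be read back.

```cpp
#include <cassert>   // only for the assert-based usage shown below
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns true if a file that has already been
// unlinked can still be written and read through its open descriptor.
bool unlinked_file_stays_usable(const char *dir)
{
    char path[256];
    snprintf(path, sizeof(path), "%s/unlink-test.XXXXXX", dir);

    int fd = mkstemp(path);
    if (fd == -1) {
        return false;
    }

    // The name is removed, but the descriptor stays open.
    bool ok = unlink(path) == 0;

    // Write through the descriptor after the unlink.
    const char msg[] = "still here";
    ok = ok && pwrite(fd, msg, sizeof(msg), 0) == (ssize_t) sizeof(msg);

    // Read the same bytes back and compare.
    char back[sizeof(msg)] = {};
    ok = ok && pread(fd, back, sizeof(back), 0) == (ssize_t) sizeof(back);
    ok = ok && memcmp(msg, back, sizeof(msg)) == 0;

    // The file size must reflect the write as well.
    struct stat st;
    ok = ok && fstat(fd, &st) == 0 && st.st_size == (off_t) sizeof(msg);

    close(fd);
    return ok;
}
```

A call such as `assert(unlinked_file_stays_usable("/tmp"));` would be expected to pass on a filesystem that keeps file data alive until the last descriptor is closed.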
So I think we have a gap in the ramfs implementation. Neither open nor close is implemented:
and ramfs_remove() simply deletes the underlying data right away. To be correct, the deletion should happen in close(), if that were implemented. I am not sure if we also need any reference counting in ramfs, but it seems this is already done in the vfs layer.
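The idea of deferring deletion until the last close can be sketched with a toy model. Everything here is an assumption made for illustration: `toy_node`, `toy_open`, `toy_remove` and `toy_close` are hypothetical names and do not correspond to the actual OSv ramfs or vfs code.

```cpp
#include <cassert>   // only for the assert-based usage shown below
#include <vector>

// Toy model of an in-memory file node (hypothetical, not the OSv types).
struct toy_node {
    std::vector<char> data;   // file contents
    int open_count = 0;       // descriptors currently referring to the node
    bool unlinked = false;    // name has been removed from the directory
    bool freed = false;       // contents have been released
};

// Release the contents only when the name is gone AND nobody has it open.
static void release_if_unused(toy_node &n)
{
    if (n.unlinked && n.open_count == 0 && !n.freed) {
        n.data.clear();
        n.data.shrink_to_fit();
        n.freed = true;
    }
}

void toy_open(toy_node &n)
{
    ++n.open_count;
}

// Removing the name only marks the node; data survives while it is open.
void toy_remove(toy_node &n)
{
    n.unlinked = true;
    release_if_unused(n);
}

// Closing the last descriptor of an unlinked node finally frees the data.
void toy_close(toy_node &n)
{
    if (n.open_count > 0) {
        --n.open_count;
    }
    release_if_unused(n);
}
```

In this model, calling `toy_open`, then `toy_remove`, leaves `data` intact; only the following `toy_close` frees it. A remove with no open descriptors frees the data immediately.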
Even though it is valid to delete a file its data (i-node) should not be deleted until all file descriptors are closed. This patch fixes the bug in file deletion logic in RAMFS to make sure that file node does not get deleted until all file descriptors are closed. Fixes cloudius-systems#1035 Signed-off-by: Waldemar Kozaczuk <[email protected]>
Even though it is valid to delete a file its data (i-node) should not be deleted until all file descriptors are closed. This patch fixes the bug in file deletion logic in RAMFS to make sure that file node does not get deleted until all file descriptors are closed. Fixes cloudius-systems#1035 Signed-off-by: Waldemar Kozaczuk <[email protected]> Message-Id: <[email protected]>
OSv starts successfully with iperf3 on a single CPU; however, once a client connects, it fails like so:
From gdb:
The output with 2 CPUs looks slightly different: