reschedule_from_interrupt assert(sched::exception_depth <= 1) when run specjbb2015 #933
Comments
Hi, thanks, the gdb stack trace is an excellent start. Does this happen all the time, or intermittently? From here I don't know how to continue. If you can easily reproduce this bug, you can try adding an assert() in free_huge_page() to crash immediately when the given address is in the physical map (0xffff8...) but more than 8 GB into it. You can add a similar assert in alloc_huge_page() to see if it ever returns such an impossible address. I'm sure there's a bug somewhere, but I still can't put my finger on it.
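The check suggested above could be sketched roughly as follows. This is only an illustration, not the actual OSv code: `phys_map_base`, `guest_ram_size`, `beyond_guest_ram()` and `check_huge_page_addr()` are hypothetical names, the base address is assumed from the "0xffff8..." prefix mentioned in the comment, and the 8 GB limit is assumed from the `-m 8G` option used in the report.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: start of the kernel's linear physical-memory map
// (the "0xffff8..." range mentioned in the thread).
constexpr std::uintptr_t phys_map_base = 0xffff800000000000ULL;

// Assumption: guest RAM size, matching "-m 8G" on the run.py command line.
constexpr std::uintptr_t guest_ram_size = 8ULL << 30;

// Hypothetical helper: true when addr lies in the physical map but its
// offset from the base is at or beyond the configured RAM size.
constexpr bool beyond_guest_ram(std::uintptr_t addr,
                                std::uintptr_t ram_size = guest_ram_size)
{
    return addr >= phys_map_base && (addr - phys_map_base) >= ram_size;
}

// Hypothetical wrapper: call it on the pointer passed to free_huge_page()
// and on the pointer returned by alloc_huge_page() to stop at the first
// suspicious address instead of at the later page fault.
inline void check_huge_page_addr(const void* p)
{
    assert(!beyond_guest_ram(reinterpret_cast<std::uintptr_t>(p)));
}
```

With these assumptions, the address 0xffff80023fe00000 reported in this thread would trip the assert, while an address such as 0xffff8001ffe00000 would pass.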
@nyh pr: ffff80023fe00000 // this print was added by me; the address is larger than 8 GB [backtrace]
@nyh @narkisr @pdziepak @anatol @benoit-canet [backtrace]
@nyh @narkisr @pdziepak @anatol @benoit-canet
void vma::fault(uintptr_t addr, exception_frame *ef)
{
- auto hp_start = align_up(_range.start(), huge_page_size);
- auto hp_end = align_down(_range.end(), huge_page_size);
+// auto hp_start = align_up(_range.start(), huge_page_size);
+// auto hp_end = align_down(_range.end(), huge_page_size);
size_t size;
- if (!has_flags(mmap_jvm_balloon|mmap_small) && (hp_start <= addr && addr < hp_end)) {
- addr = align_down(addr, huge_page_size);
- size = huge_page_size;
- } else {
+// if (!has_flags(mmap_jvm_balloon|mmap_small) && (hp_start <= addr && addr < hp_end)) {
+// addr = align_down(addr, huge_page_size);
+// size = huge_page_size;
+// } else {
size = page_size;
- }
+// }
Disabling huge pages will reduce performance, as the CPU's address cache (the TLB) will fill up with many small pages instead of a few huge pages, so it's generally not a good idea to disable them, except for debugging of course. I have to admit I don't remember all the details of our memory allocation, which @pdziepak rewrote, so nothing springs to mind as to why this bug happens. alloc_huge_page returning an address bigger than 8G seems like a bug to me, but maybe there is an explanation why this is legal? (I can't think of any.) If that is indeed the bug, you can start trying to figure out where this number comes from, and since it's so easy to recognize with an if(), you can perhaps add such if()s in various places to find what is causing it. You said the bug is specific to having many CPUs. This might also indicate a race, a locking bug, or a memory overrun (like in commit 74543fc), so that when a lot of CPUs allocate memory concurrently, one gets wrong results. Of course, this is just a guess; I have no idea what the actual problem is. Issue #755 is another problem we saw in this area of the code, maybe it's related (but then again, maybe not).
page fault outside application, addr: 0xffff800240000000 pc:0x00000000003e0e3c text_start:0x0000000000202000 text_end 0x000000000059df08
This means the page fault happened in OSv kernel code, but I don't know how to debug it.
[registers]
RIP: 0x00000000003e0e3c memory::page_range_allocator::free(memory::page_range*)+268
RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: 0x0000000000000010
RAX: 0x0000000000000000 RBX: 0xffff80023fe00000 RCX: 0
Assertion failed: sched::exception_depth <= 1 (core/sched.cc: reschedule_from_interrupt: 238)
Here is the call trace. The command is "./scripts/run.py -m 8G -c 12"; if I use "./scripts/run.py -m 8G -c 4" instead, it doesn't crash.