
aarch64: ZFS image crashes due to a page fault caused by enabled access flag #1131

Closed
wkozaczuk opened this issue Mar 13, 2021 · 2 comments

@wkozaczuk
Collaborator

The crash looks like this:

OSv v0.55.0-206-g6df05b6c
getauxval() stubbed
eth0: 192.168.122.15
Booted up in 141.08 ms
Cmdline: /hello
page fault outside application, addr: 0x0000100000000000
[registers]
PC: 0x000000004047a070 <???+1078435952>
X00: 0x0000100000000000 X01: 0x0000100000001000 X02: 0x0000100000000000
X03: 0x0000000084448004 X04: 0x0000000000000040 X05: 0x000000004087a000
X06: 0x0000200000100140 X07: 0x0000100000000000 X08: 0x0000000000000000
X09: 0x0000100000000000 X10: 0x000000000000007b X11: 0x0000000000000000
X12: 0x0000000000000001 X13: 0x0000000000000000 X14: 0x0000000000010000
X15: 0x0000000000000000 X16: 0x00000000401d93b0 X17: 0x0000000000000001
X18: 0x0000000000000000 X19: 0xffffa000410adb00 X20: 0x0000100000000000
X21: 0x0000000000001000 X22: 0x0000000000000000 X23: 0xffffa000410adb00
X24: 0x000000009600000b X25: 0x0000000000000005 X26: 0xffffa00040947be0
X27: 0xffffa000410ada00 X28: 0x0000200000100680 X29: 0x00002000001000e0
X30: 0x00000000401e4ac4 SP:  0x00002000001000e0 ESR: 0x000000009600014b
PSTATE: 0x0000000080000345
Aborted

[backtrace]
0x00000000401da5c4 <mmu::vm_fault(unsigned long, exception_frame*)+724>
0x000000004020bb18 <page_fault+100>
0x000000004020b824 <???+1075886116>
0x00000000401da3b8 <mmu::vm_fault(unsigned long, exception_frame*)+200>
0x000000004020bb18 <page_fault+100>
0x000000004020b824 <???+1075886116>
0x00000000401f1f98 <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+2696>
0x00000000401f28e4 <elf::program::get_library(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool)+116>
0x000000004031507c <osv::application::prepare_argv(elf::program*)+252>
0x0000000040315890 <osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<0x00000000403160ac <osv::application::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const&, std::function<void ()>0x0000000040316328 <osv::application::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+72>
0x00000000400dc78c <do_main_thread(void*)+2220>
0x0000000040348670 <???+1077184112>
0x00000000402e7290 <thread_main_c+32>
0x00000000402c04dc <???+1076626652>

Disabling the page_access_scanner thread makes the image run without this error, but it then takes a long time before the app actually runs.

The relevant information from the AArch64 Programmer's Guide:
"Another memory attribute bit in the descriptor, the Access Flag (AF), indicates when a block entry is used for the first time.
• AF = 0: This block entry has not yet been used.
• AF = 1: This block entry has been used.
Operating systems use an access flag bit to keep track of which pages are being used. Software manages the flag. When the page is first created, its entry has AF set to 0. The first time the page is accessed by code, if it has AF at 0, this triggers an MMU fault. The Page fault handler records that this page is now being used and manually sets the AF bit in the table entry. For example, the Linux kernel uses the [AF] bit for PTE_AF on ARM64 (the Linux kernel name for AArch64), which is used to check whether a page has ever been accessed. This influences some of the kernel memory management choices. For example, when a page must be swapped out of memory, it is less likely to swap out pages that are being actively used."

The issue seems to be that OSv does not recognize or handle the Access Flag fault. It seems that the page cache scanner needs to be adapted to work on AArch64: rather than scanning whether a page has been accessed, it should simply handle the fault and react to it then.
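As a cross-check of this diagnosis, the ESR value in the register dump above can be decoded by hand. The following is a minimal C++ sketch (the helper names are hypothetical, not OSv functions) that extracts the Exception Class and the fault status code using the ESR_EL1 field layout from the Arm register description. The `static_assert`s show that 0x9600014b is a data abort taken at the current exception level (EC 0x25) with DFSC 0b001011, i.e. an Access Flag fault at level 3 of the translation walk.

```cpp
#include <cstdint>

// ESR_EL1 field extraction (layout per the Arm ESR_EL1 register description).
// Helper names are hypothetical, not OSv functions.
constexpr unsigned esr_ec(uint64_t esr)  { return static_cast<unsigned>((esr >> 26) & 0x3f); } // Exception Class, bits [31:26]
constexpr unsigned esr_fsc(uint64_t esr) { return static_cast<unsigned>(esr & 0x3f); }         // DFSC/IFSC, bits [5:0]

// EC 0x20/0x21: instruction abort from a lower/the same EL; 0x24/0x25: data abort.
constexpr bool is_abort(unsigned ec)
{
    return ec == 0x20 || ec == 0x21 || ec == 0x24 || ec == 0x25;
}

// Access Flag faults are encoded as 0b0010LL, where LL is the lookup level
// (0b001001..0b001011 for levels 1-3; newer revisions also define 0b001000 for level 0).
constexpr bool is_access_flag_fault(unsigned fsc) { return (fsc & 0x3c) == 0x08; }
constexpr unsigned fault_level(unsigned fsc)      { return fsc & 0x3; }

// The ESR value from the crash dump: data abort at the current EL, Access Flag fault, level 3.
static_assert(esr_ec(0x9600014b) == 0x25, "data abort, same EL");
static_assert(is_access_flag_fault(esr_fsc(0x9600014b)), "Access Flag fault");
static_assert(fault_level(esr_fsc(0x9600014b)) == 3, "level 3 (leaf) lookup");
```

Since the fault is on a leaf PTE, handling it only requires setting AF in that one entry.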

wkozaczuk added a commit that referenced this issue Jun 11, 2021
Please note that ZFS is not supported on OSv until the issue #1131
is fixed therefore we default to ROFS (ramfs is also supported).
This makes it a bit more convenient for users to build aarch64
images as they do not need to explicitly append 'fs=rofs --create-disk'.

Signed-off-by: Waldemar Kozaczuk <[email protected]>
@wkozaczuk
Collaborator Author

wkozaczuk commented Apr 11, 2022

While working on another patch, I came across this issue again, and I think I now have more clarity on what exactly needs to be done here, so I am writing down the details.

Normally, all PTEs created by make_pte() (see arch/aarch64/arch-mmu.hh) have the AF bit set, but the ZFS page access scanner in core/pagecache.cc clears the AF bit for the relevant pages, expecting it to be set again when the page is next accessed for read or write.
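To illustrate the interaction, here is a hypothetical model of a single AArch64 PTE, assuming the stage 1 descriptor layout in which bit 0 is the valid bit and bit 10 is AF. The function names are illustrative, not OSv's actual pagecache or mmu API: a scanner pass reads and clears AF, and without hardware AF updates the next access to the page faults instead of setting AF.

```cpp
#include <cassert>
#include <cstdint>

// AArch64 stage 1 descriptor bits used here: bit 0 = valid, bit 10 = Access Flag (AF).
constexpr uint64_t PTE_VALID = 1ull << 0;
constexpr uint64_t PTE_AF    = 1ull << 10;

// Hypothetical scanner primitive: report whether the page was accessed since the
// previous pass and clear AF so that the next access can be observed again.
bool test_and_clear_accessed(uint64_t& pte)
{
    bool accessed = (pte & PTE_AF) != 0;
    pte &= ~PTE_AF;
    return accessed;
}

// Models the CPU on the next access to a valid page: with AF == 0 it either sets AF
// itself (hardware update enabled) or raises an Access Flag fault.
// Returns true if the access faults.
bool cpu_access(uint64_t& pte, bool hw_af_update)
{
    assert(pte & PTE_VALID);
    if (pte & PTE_AF) {
        return false;
    }
    if (hw_af_update) {
        pte |= PTE_AF;   // what Intel CPUs do transparently for the Accessed flag
        return false;
    }
    return true;         // software must handle the fault and set AF
}
```

With hw_af_update false, the access after a scanner pass returns true, which corresponds to the unhandled fault in the crash above.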

On Intel x64, the processor maintains this bit itself. Here is how section 3.7.6, "Page-Directory and Page-Table Entries", of the Intel manual describes the Accessed flag:

"Indicates whether a page or page table has been accessed (read from or written to) when set. Memory management software typically clears this flag when a page or page table is initially loaded into physical memory. The processor then sets this flag the first time a page or page table is accessed.

This flag is a “sticky” flag, meaning that once set, the processor does not implicitly clear it. Only software can clear this flag. The accessed and dirty flags are provided for use by memory management software to manage the transfer of pages and page tables into and out of physical memory."

In other words, the processor automatically sets the accessed bit the first time the relevant page of memory is accessed after the bit was cleared.

Now on ARM64, according to Arm's documentation, the Access Flag (AF) bit can be set by either software or hardware:

"There are two ways that the AF bit can be set on access:

  • Software Update: Accessing the page causes a synchronous exception (Access Flag fault). In the exception handler, software is responsible for setting the AF bit in the relevant translation table entry and returns.
  • Hardware Update: Accessing the page causes hardware to automatically set the AF bit without needing to generate an exception. This behavior needs to be enabled and was added in Armv8.1-A."

To have the hardware set it automatically like on Intel, we would need to set bit 39 (HA) of the TCR_EL1 register (see https://developer.arm.com/documentation/102226/0002/Register-descriptions/AArch64-system-registers/TCR-EL1--Translation-Control-Register--EL1?lang=en for details) in init_cpu in arch/aarch64/boot.S, BUT only if the CPU has this capability. To find out, we would need to read the HAFDBS field (bits [3:0]) of the ID_AA64MMFR1_EL1 register (see https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/ID-AA64MMFR1-EL1--AArch64-Memory-Model-Feature-Register-1). On the ARM machines I am testing on, at least under QEMU, all bits of the ID_AA64MMFR1_EL1 register are 0, so this hardware capability may not be that common.
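Below is a sketch of the capability check. It is written in C++ for readability, although the issue proposes doing it in boot.S. It assumes the field layout from the linked Arm register descriptions (HAFDBS in bits [3:0] of ID_AA64MMFR1_EL1, HA at bit 39 and HD at bit 40 of TCR_EL1). The function names are hypothetical. The sketch only computes which TCR_EL1 bits may be set for a given feature-register value, and the MRS read is compiled only on AArch64.

```cpp
#include <cstdint>

// TCR_EL1 bits for hardware-managed flags (FEAT_HAFDBS, Armv8.1-A).
constexpr uint64_t TCR_EL1_HA = 1ull << 39;  // hardware Access Flag update
constexpr uint64_t TCR_EL1_HD = 1ull << 40;  // hardware dirty state management (needs HA)

// Decide which TCR_EL1 bits may be set, given ID_AA64MMFR1_EL1.HAFDBS (bits [3:0]):
//   0b0000: no hardware updates
//   0b0001: hardware Access Flag update supported
//   0b0010: hardware Access Flag and dirty state update supported
// Larger values (from later architecture revisions) are assumed to include both.
constexpr uint64_t tcr_hw_update_bits(uint64_t id_aa64mmfr1)
{
    return (id_aa64mmfr1 & 0xf) >= 2 ? (TCR_EL1_HA | TCR_EL1_HD)
         : (id_aa64mmfr1 & 0xf) == 1 ? TCR_EL1_HA
         : 0;
}

#if defined(__aarch64__)
// Reading the feature register; OSv runs at EL1, where this MRS is permitted.
inline uint64_t read_id_aa64mmfr1_el1()
{
    uint64_t value;
    asm volatile("mrs %0, id_aa64mmfr1_el1" : "=r"(value));
    return value;
}
#endif
```

A caller in init_cpu would OR the result into the TCR_EL1 value before writing it. With the all-zero ID_AA64MMFR1_EL1 observed under QEMU, tcr_hw_update_bits() returns 0, so the software approach is still needed there.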

So it seems we need to handle this in software and change our page fault handler (page_fault() in arch/aarch64/mmu.cc) to:

  1. Read bits 0-5 (DFSC or IFSC) of the ESR register to detect whether it is an Access Flag fault (see the "Access Flag fault" encodings in https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-?lang=en; I do not think we should differentiate between instruction and data accesses). This should be easy.
  2. Do a page walk to the leaf PTE and set its AF bit to 1 with pte.set_accessed(true). We should probably use mmu::get_root_pt() and then do something similar to mmu::map_level::follow() (see core/mmu.cc) to navigate all the way down to the leaf PTE (4 levels, because it should be a leaf). This should not be very difficult, but it involves very heavy template-based code, so it will not be easy to figure out.
  3. We may also need to do all of this under some lock, possibly the same one we use when manipulating VMAs, and possibly flush the TLB after setting the AF bit.
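The three steps could look roughly like the following. This is a self-contained simulation, not OSv code. Page tables live in a hypothetical `phys_tables` map instead of real memory, and a `std::mutex` stands in for whatever lock OSv would use. Only 4KB pages with a 48-bit address space and a 4-level walk are modelled, and block mappings are not handled. The descriptor bits (valid bit 0, table/page bit 1, AF bit 10, output address in bits [47:12]) follow the Arm stage 1 format. The faulting address would come from FAR_EL1.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Stage 1 descriptor bits (4KB granule) used by this sketch.
constexpr uint64_t DESC_VALID = 1ull << 0;
constexpr uint64_t DESC_TABLE = 1ull << 1;              // table (levels 0-2) or page (level 3)
constexpr uint64_t DESC_AF    = 1ull << 10;             // Access Flag
constexpr uint64_t DESC_ADDR  = 0x0000fffffffff000ull;  // output address, bits [47:12]

using table = std::array<uint64_t, 512>;

// Hypothetical stand-ins: simulated physical memory holding page tables, and a lock
// serializing page table updates (OSv would use its own structures here).
std::unordered_map<uint64_t, table> phys_tables;
std::mutex pt_lock;

// Index into the level-N table (N = 0..3) for a 48-bit VA and 4KB pages.
constexpr unsigned pt_index(uint64_t va, int level)
{
    return static_cast<unsigned>((va >> (39 - 9 * level)) & 0x1ff);
}

// Step 2: walk from the root table down to the leaf (level 3) PTE for va.
// Returns nullptr if an intermediate descriptor is not a valid table descriptor;
// block mappings are not handled in this sketch.
uint64_t* find_leaf_pte(uint64_t root_pa, uint64_t va)
{
    uint64_t pa = root_pa;
    for (int level = 0; level < 3; level++) {
        uint64_t desc = phys_tables.at(pa)[pt_index(va, level)];
        if ((desc & (DESC_VALID | DESC_TABLE)) != (DESC_VALID | DESC_TABLE)) {
            return nullptr;
        }
        pa = desc & DESC_ADDR;
    }
    return &phys_tables.at(pa)[pt_index(va, 3)];
}

// Returns true if the abort was an Access Flag fault that was resolved by setting AF,
// false if it should continue down the regular page fault path.
bool handle_access_flag_fault(uint64_t esr, uint64_t far, uint64_t root_pa)
{
    // Step 1: a DFSC/IFSC (bits [5:0]) of the form 0b0010LL means Access Flag fault.
    if ((esr & 0x3c) != 0x08) {
        return false;
    }
    // Step 3: serialize with other page table updates.
    std::lock_guard<std::mutex> guard(pt_lock);
    uint64_t* pte = find_leaf_pte(root_pa, far);
    if (!pte || !(*pte & DESC_VALID)) {
        return false;
    }
    *pte |= DESC_AF;
    // On real hardware, barriers (and possibly a TLB flush, as the issue suggests)
    // would follow before returning from the exception.
    return true;
}
```

For example, with a leaf PTE for 0x0000100000000000 (the address from the crash) set up with AF clear, handle_access_flag_fault(0x9600014b, 0x0000100000000000, root) sets AF and returns true, while a translation fault (DFSC 0x07) is left to the normal path. In OSv the walk would presumably reuse the existing page table traversal rather than a hand-written loop.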

@wkozaczuk
Collaborator Author

Similarly, I think we should also set the Dirty Flag bit at the same time the Access Flag exception is raised, but ONLY if the Dirty Flag bit is clear at that point. This can also be handled by hardware by setting bit 40 (HD) of the TCR_EL1 register, if the CPU provides that capability.
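Here is a minimal sketch of this idea. It assumes a hypothetical software dirty bit placed in one of the descriptor bits that the architecture leaves to software (bits [58:55]); OSv's actual dirty-bit representation on aarch64 may differ. ESR_EL1.WnR (bit 6) tells whether the faulting data access was a write. It also reads as 1 for cache maintenance operations (ESR_EL1.CM, bit 8), so the sketch excludes those.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t PTE_AF       = 1ull << 10;
// Hypothetical software dirty bit: bits [58:55] of a stage 1 descriptor are reserved
// for software use. OSv's real choice of bit may differ.
constexpr uint64_t PTE_SW_DIRTY = 1ull << 55;

// Update a leaf PTE when an Access Flag fault is taken.
uint64_t on_access_flag_fault(uint64_t pte, uint64_t esr)
{
    // ESR_EL1.WnR (bit 6) is 1 for writes, but it also reads as 1 for cache
    // maintenance operations (ESR_EL1.CM, bit 8), which are not ordinary writes.
    bool wnr = (esr >> 6) & 1;
    bool cm  = (esr >> 8) & 1;
    bool is_write = wnr && !cm;

    pte |= PTE_AF;
    if (is_write && !(pte & PTE_SW_DIRTY)) {
        pte |= PTE_SW_DIRTY;   // set the dirty bit only if it was still clear
    }
    return pte;
}
```

Applied to the crash's ESR (0x9600014b), which has CM set, this sketch sets AF but not the dirty bit. Note that this only catches writes that happen to be the first access after AF was cleared. A write to a page whose AF is already set raises no fault, so tracking every first write in software would need another mechanism, such as keeping the page read-only until it is written.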
