aarch64: ZFS image crashes due to a page fault caused by enabled access flag #1131
Please note that ZFS is not supported on OSv until issue #1131 is fixed; therefore we default to ROFS (ramfs is also supported). This makes it a bit more convenient for users to build aarch64 images, as they do not need to explicitly append 'fs=rofs --create-disk'. Signed-off-by: Waldemar Kozaczuk <[email protected]>
While working on another patch I came across this issue again, and I think I now have more clarity on what exactly needs to be done here, so I am writing the details down. Normally, all PTEs created by the
On Intel x64, the Accessed flag is set automatically by the processor the first time the relevant page of memory is accessed after the bit was cleared. The paragraph "Page-Directory and Page-Table Entries, 3.7.6" of this manual describes the Accessed flag as follows: "Indicates whether a page or page table has been accessed (read from or written to) when set. Memory management software typically clears this flag when a page or page table is initially loaded into physical memory. The processor then sets this flag the first time a page or page table is accessed. This flag is a “sticky” flag, meaning that once set, the processor does not implicitly clear it. Only software can clear this flag. The accessed and dirty flags are provided for use by memory management software to manage the transfer of pages and page tables into and out of physical memory."
On ARM64, according to this documentation, the access flag (AF) bit can be set by either software or hardware: "There are two ways that the AF bit can be set on access:
To have the hardware set it automatically like on Intel, we would need to set the 39th (HA) bit of the So it seems we need to handle this in software and change our page fault handler (
Similarly, I think we should also set the Dirty Flag bit at the same time the Access Flag exception is raised. We would do it ONLY if the Dirty Flag bit was clear at that point. The same can also be handled by hardware by setting the 40th (HD) bit of the
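To make the software approach concrete, here is a minimal sketch of the descriptor update described above. It is not OSv's actual code: the names `PTE_AF`, `PTE_SW_DIRTY` and `mark_accessed` are hypothetical, and the choice of bit 55 for a software-managed dirty bit is an assumption (bits 58:55 of a descriptor are reserved for software use). It also assumes the dirty bit should only be set when the faulting access was a write.

```cpp
#include <cassert>
#include <cstdint>

// Bit 10 is the architectural Access Flag (AF) in an AArch64 stage 1
// page or block descriptor.
constexpr std::uint64_t PTE_AF = 1ull << 10;

// ASSUMPTION: a software-managed dirty bit kept in bit 55, one of the
// descriptor bits (58:55) reserved for software use.
constexpr std::uint64_t PTE_SW_DIRTY = 1ull << 55;

// Hypothetical helper: returns the descriptor value to write back after
// an Access Flag fault. 'is_write' says whether the faulting access was
// a write (ASSUMPTION: only writes should mark the page dirty).
constexpr std::uint64_t mark_accessed(std::uint64_t pte, bool is_write)
{
    pte |= PTE_AF;                               // record first use
    if (is_write && !(pte & PTE_SW_DIRTY)) {     // only if still clean
        pte |= PTE_SW_DIRTY;
    }
    return pte;
}
```

A fault handler would read the descriptor for the faulting address, pass it through a helper like this, and store the result back before retrying the access.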
The crash looks like this:
Disabling the page_access_scanner thread makes the image run without this error, but it takes a long time before the app actually runs.
The relevant information from AArch64 programmer's guide:
"Another memory attribute bit in the descriptor, the Access Flag (AF), indicates when a block entry is used for the first time.
• AF = 0: This block entry has not yet been used.
• AF = 1: This block entry has been used.
Operating systems use an access flag bit to keep track of which pages are being used. Software manages the flag. When the page is first created, its entry has AF set to 0. The first time the page is accessed by code, if it has AF at 0, this triggers an MMU fault. The Page fault handler records that this page is now being used and manually sets the AF bit in the table entry. For example, the Linux kernel uses the [AF] bit for PTE_AF on ARM64 (the Linux kernel name for AArch64), which is used to check whether a page has ever been accessed. This influences some of the kernel memory management choices. For example, when a page must be swapped out of memory, it is less likely to swap out pages that are being actively used."
The issue seems to be that OSv does not recognize or handle the Access Flag fault. It seems that the page cache scanner needs to be adapted to work on AArch64: rather than scanning to see whether a page has been accessed, it should simply handle the fault and react then.
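Recognizing the fault is the first step. The sketch below shows one way to classify an abort as an Access Flag fault from the exception syndrome value (ESR_ELx). The function names are hypothetical and this is not OSv's existing handler; the field positions follow the Armv8-A syndrome encoding (EC in bits [31:26], fault status code in bits [5:0], WnR in bit 6).

```cpp
#include <cassert>
#include <cstdint>

// Exception class values for aborts (ESR_ELx.EC, bits [31:26]).
constexpr std::uint64_t EC_INSN_ABORT_LOWER = 0x20;
constexpr std::uint64_t EC_INSN_ABORT_SAME  = 0x21;
constexpr std::uint64_t EC_DATA_ABORT_LOWER = 0x24;
constexpr std::uint64_t EC_DATA_ABORT_SAME  = 0x25;

constexpr std::uint64_t esr_ec(std::uint64_t esr) { return (esr >> 26) & 0x3f; }

constexpr bool is_data_abort(std::uint64_t esr)
{
    return esr_ec(esr) == EC_DATA_ABORT_LOWER || esr_ec(esr) == EC_DATA_ABORT_SAME;
}

constexpr bool is_insn_abort(std::uint64_t esr)
{
    return esr_ec(esr) == EC_INSN_ABORT_LOWER || esr_ec(esr) == EC_INSN_ABORT_SAME;
}

// Hypothetical helper: true when the abort is an Access Flag fault.
// The fault status code 0b0010LL encodes "Access flag fault, level LL".
constexpr bool is_access_flag_fault(std::uint64_t esr)
{
    return (is_data_abort(esr) || is_insn_abort(esr)) && (esr & 0x3c) == 0x08;
}

// Hypothetical helper: true when a data abort was caused by a write
// (WnR, bit 6, is only meaningful for data aborts).
constexpr bool is_write_fault(std::uint64_t esr)
{
    return is_data_abort(esr) && (esr & (1ull << 6)) != 0;
}
```

A page fault handler could check `is_access_flag_fault()` early and, when it returns true, set the AF bit in the matching table entry and return, instead of treating the fault as a missing mapping.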