Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PAL/Linux] Log a message on mmap() returning -ENOMEM #2023

Merged

Conversation

kailun-qin
Copy link
Contributor

@kailun-qin kailun-qin commented Oct 10, 2024

Description of the changes

Linux can return -ENOMEM if a request to mmap() exceeds the process's maximum number of mappings. After commit c0a2765 "[LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support", this can occur during the checkpoint/restore on the Linux PAL. In such case, each committed page is sent separately and Linux creates a mapping for each one, which can result in a large number of mappings.

This commit makes this failure more visible and provides a hint to the user on what can be done on the host.

Related to gramineproject/examples#107.

How to test this PR?

CI.


This change is Reviewable

dimakuv
dimakuv previously approved these changes Oct 10, 2024
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


pal/src/host/linux/pal_memory.c line 43 at r1 (raw file):

        int ret = PTR_TO_ERR(res_addr);
        if (ret == -ENOMEM && FIRST_TIME()) {
            log_error("Host Linux returned -ENOMEM on mmap(); this may happen because process's "

I think that's the correct spelling, but I'm not 100% sure, the rules are a bit weird.

Suggestion:

process'

pal/src/host/linux/pal_memory.c line 44 at r1 (raw file):

        if (ret == -ENOMEM && FIRST_TIME()) {
            log_error("Host Linux returned -ENOMEM on mmap(); this may happen because process's "
                      "maximum number of mappings is exceeded. Gramine cannot handle this case. "

Suggestion:

would be exceeded

pal/src/host/linux/pal_memory.c line 46 at r1 (raw file):

                      "maximum number of mappings is exceeded. Gramine cannot handle this case. "
                      "You may want to increase the value in /proc/sys/vm/max_map_count.");
        }

This should be debug-only, ENOMEM is pretty common in normal apps (e.g. trying to allocate a large contiguous chunk, if ENOMEM then retrying with 2 smaller chunks, etc.) and it also corresponds to other possible causes of failure.

Copy link
Contributor Author

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)


pal/src/host/linux/pal_memory.c line 43 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think that's the correct spelling, but I'm not 100% sure, the rules are a bit weird.

I think the correct possessive form of process should be process's, but I'm not a native speaker so I can't say for sure. :)


pal/src/host/linux/pal_memory.c line 44 at r1 (raw file):

        if (ret == -ENOMEM && FIRST_TIME()) {
            log_error("Host Linux returned -ENOMEM on mmap(); this may happen because process's "
                      "maximum number of mappings is exceeded. Gramine cannot handle this case. "

I basically mirror the logging msg for munmap() returning -ENOMEM:

log_error("Host Linux returned -ENOMEM on munmap(); this may happen because process's "
"maximum number of mappings is exceeded. Gramine cannot handle this case. "
"You may want to increase the value in /proc/sys/vm/max_map_count.");
. Do we want to also update that?


pal/src/host/linux/pal_memory.c line 46 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This should be debug-only, ENOMEM is pretty common in normal apps (e.g. trying to allocate a large contiguous chunk, if ENOMEM then retrying with 2 smaller chunks, etc.) and it also corresponds to other possible causes of failure.

Agree, I'll update in the fixup (after we align on the above comments).

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


pal/src/host/linux/pal_memory.c line 43 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

I think the correct possessive form of process should be process's, but I'm not a native speaker so I can't say for sure. :)

I think the rules are not clear on this and kinda both are ok, so unblocking. But out of curiosity, maybe @donporter would have a minute to elaborate on this :)


pal/src/host/linux/pal_memory.c line 44 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

I basically mirror the logging msg for munmap() returning -ENOMEM:

log_error("Host Linux returned -ENOMEM on munmap(); this may happen because process's "
"maximum number of mappings is exceeded. Gramine cannot handle this case. "
"You may want to increase the value in /proc/sys/vm/max_map_count.");
. Do we want to also update that?

Oh. Yeah, I'd update also that one.

Copy link
Contributor Author

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


pal/src/host/linux/pal_memory.c line 44 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Oh. Yeah, I'd update also that one.

Done.


pal/src/host/linux/pal_memory.c line 46 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Agree, I'll update in the fixup (after we align on the above comments).

Done.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners

Linux can return `-ENOMEM` if a request to `mmap()` exceeds the
process's maximum number of mappings. After commit c0a2765
"[LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support", this can occur
during the checkpoint/restore on the Linux PAL. In such case, each
committed page is sent separately and Linux creates a mapping for each
one, which can result in a large number of mappings.

This commit makes this failure more visible and provides a hint to the
user on what can be done on the host.

Signed-off-by: Kailun Qin <[email protected]>
@kailun-qin kailun-qin force-pushed the kailun-qin/add-hint-mmap-enomem branch from 8a95c28 to 9221027 Compare October 16, 2024 16:09
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@mkow
Copy link
Member

mkow commented Oct 16, 2024

Jenkins, retest Jenkins-Direct-24.04-Sanitizers please (pipe04 from LTP timed out, we probably have to extend its timeout even more).

@mkow
Copy link
Member

mkow commented Oct 16, 2024

Jenkins, retest linux-sgx-vm-gcc-release please (gramine-sgx-gen-private-key failed with "cryptography.exceptions.InternalError: Unknown OpenSSL error. This error is commonly encountered when another library is not cleaning up the OpenSSL error stack. If you are using cryptography with another library that uses OpenSSL try disabling it before reporting a bug. Otherwise please file an issue at https://github.com/pyca/cryptography/issues with information on how to reproduce this. ([])", seems like a bug in cryptography.io?)

@mkow mkow merged commit 9221027 into gramineproject:master Oct 16, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants