Description of the problem
The receive_checkpoint_and_restore() call fails with an ENOMEM error in gramine-direct mode for the examples/scikit-learn-intelex workload in the examples repo, as shown in the output below. On further debugging, we found that this is a regression caused by commit c0a2765, "[LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support".
intel@intel-M50CYP2SBSTD:~/gramine_edmm/examples/scikit-learn-intelex$ gramine-direct ./sklearnex scripts/kmeans_perf_eval.py
error: failed allocating 0x1a06ff6c2000-0x1a06ff6c3000
error: libos_init() failed in receive_checkpoint_and_restore: Cannot allocate memory (ENOMEM)
[P1:T1:python3.10] error: failed sending checkpoint: Permission denied (EACCES)
[P1:T1:python3.10] error: process creation failed
error: failed allocating 0x1a06ff6c2000-0x1a06ff6c3000
error: libos_init() failed in receive_checkpoint_and_restore: Cannot allocate memory (ENOMEM)
[P1:T1:python3.10] error: failed sending checkpoint: Permission denied (EACCES)
[P1:T1:python3.10] error: process creation failed
error: failed allocating 0x1a06ff6c2000-0x1a06ff6c3000
error: libos_init() failed in receive_checkpoint_and_restore: Cannot allocate memory (ENOMEM)
[P1:T1:python3.10] error: failed sending checkpoint: Permission denied (EACCES)
[P1:T1:python3.10] error: process creation failed
error: failed allocating 0x1a06ff6c2000-0x1a06ff6c3000
error: libos_init() failed in receive_checkpoint_and_restore: Cannot allocate memory (ENOMEM)
[P1:T1:python3.10] error: failed sending checkpoint: Permission denied (EACCES)
[P1:T1:python3.10] error: process creation failed
Emulating a raw system/supervisor call. This degrades performance, consider patching your application to use Gramine syscall API.
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
*** Stock Scikit-learn ***
Train time: 22.878 s
Inertia: 2468815.517
Number of iterations: 98
Davies-Bouldin metric on train data: 2.877
Predict time: 0.028 s
Davies-Bouldin metric on test data: 2.896
*** Intel extension for Scikit-learn ***
Train time: 6.979 s
Inertia: 2468787.252
Number of iterations: 157
Davies-Bouldin metric on train data: 2.780
Predict time: 0.006 s
Davies-Bouldin metric on test data: 2.768
Kmeans perf evaluation finished
Note
The above observation is not seen with gramine-sgx execution mode.
EDMM is NOT enabled.
Even though we see multiple ENOMEM/EACCES messages, the workload executes successfully and prints the final success message, Kmeans perf evaluation finished.
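Because the workload still finishes despite the errors, a check that only looks at the final success message would miss this regression. The following is a minimal sketch of how the captured output could be scanned for the errno-tagged error lines. The helper name `count_errno_errors` and the regular expression are illustrative assumptions, not part of Gramine or the examples repo; the sample text is copied from the output above.

```python
import re

# Hypothetical helper (not part of Gramine): matches "error: ... (ENOMEM)"
# or "error: ... (EACCES)" lines as they appear in the output above.
ERROR_PATTERN = re.compile(r"error: .*\((ENOMEM|EACCES)\)")


def count_errno_errors(log_text):
    """Return a dict mapping each errno name to the number of matching lines."""
    counts = {}
    for line in log_text.splitlines():
        match = ERROR_PATTERN.search(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts


# One repetition of the error group from the report, plus the success line.
sample = "\n".join([
    "error: failed allocating 0x1a06ff6c2000-0x1a06ff6c3000",
    "error: libos_init() failed in receive_checkpoint_and_restore: "
    "Cannot allocate memory (ENOMEM)",
    "[P1:T1:python3.10] error: failed sending checkpoint: "
    "Permission denied (EACCES)",
    "[P1:T1:python3.10] error: process creation failed",
    "Kmeans perf evaluation finished",
])

counts = count_errno_errors(sample)
finished = "Kmeans perf evaluation finished" in sample
print(counts, finished)
```

A run would count as clean only if `counts` is empty and `finished` is true; with the output above, the second condition holds but the first does not.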
Steps to reproduce
git clone https://github.com/gramineproject/examples.git
Expected results
We are able to execute the scikit-learn-intelex example without any error messages.
Actual results
Even though the scikit-learn-intelex example is executed, we get error messages as seen above.
Gramine commit hash
c0a2765