Maybe need to reset fpu status word on context switch #1020

Closed

nyh opened this issue Dec 2, 2018 · 5 comments

Comments

@nyh
Contributor

nyh commented Dec 2, 2018

This issue was created due to a thought experiment - I don't have evidence that it leads to an actual bug.

When we return to a thread A after a context switch from another thread B, if the reason we left thread A was an asynchronous event (e.g., an interrupt) we will restore A's full FPU state. However, if the reason we left A was a function call (e.g., sleep), we never restore A's FPU state and just keep the one B had (at some arbitrary point in the middle of its computation). The contents of the FPU registers don't matter (because they are caller-saved), but in commit 0959f20 we realized the control words do matter - changes in thread B will change A as well.

In this issue I suggest the status word (FSW) is important too - if thread B had something on its FPU stack, we will return to A thinking there is something on the stack. There shouldn't be: the Linux ABI says that we need to return from a function with an empty x87 stack (unless the function returns a floating-point value). So on the context switch back to A via thread::switch_to() we should probably clear the FPU stack with FNINIT (we need to do this before loading the control word, of course, because FNINIT resets that too). If A stopped running because of a function call, this clean stack is what it will see; if A stopped on an interrupt, then after the switch back to A it will return from interrupt(), the fpu_lock will restore the saved FPU state, and the stack will be filled again with whatever A had while it was running.
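A minimal sketch of that suggestion, assuming the tail of thread::switch_to() restores the control words with processor::fldcw() and processor::ldmxcsr() as in the patch quoted later in this thread (the surrounding code is paraphrased, not OSv's exact source):

// Sketch only: clear the x87 stack before restoring the saved control words.
// FNINIT must come first because it also resets FCW, which fldcw then reloads.
asm volatile("fninit");        // empty the x87 register stack, reset FSW/FCW/FTW
processor::fldcw(fpucw);       // restore the saved x87 control word
processor::ldmxcsr(mxcsr);     // restore the saved SSE control/status register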

I need to think about whether there's any SSE equivalent to the above that needs fixing (I believe there is no register stack in SSE, but I'm not sure...).

@nyh
Contributor Author

nyh commented Dec 2, 2018

https://wiki.osdev.org/FPU also says:

Using the MMX aliases for the FPU registers will cause those registers to be invalidated for floating point use. The EMMS instruction will reset the registers to non-vector use. The x86 calling convention assumes that the stack is usable for either floating point or vector use, so you will need to call EMMS before calling or returning to regular compiler-generated code.

So maybe we also need to use EMMS on the context switch. And probably also in init_fpu(), which is called on new threads by thread_main_c().
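A sketch of what that could look like in the new-thread path (the body of init_fpu() here is an assumption for illustration, not OSv's actual code):

static void init_fpu()
{
    asm volatile("fninit");   // empty x87 stack, default control word, zero status word
    asm volatile("emms");     // mark the MMX/x87 registers as empty again (likely redundant
                              // right after fninit, which already empties the tag word,
                              // but harmless and cheap)
    unsigned mxcsr = 0x1f80;  // SSE default: all exceptions masked, no flags set
    asm volatile("ldmxcsr %0" : : "m"(mxcsr));
}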

@nyh
Contributor Author

nyh commented Dec 2, 2018

Note that if we do FNINIT and EMMS after saving the FPU state (see issue #1019), then this issue will also be solved: when we interrupt B with some funky FPU state we will reset it, so the kernel gets a clean state, and when we switch back to A that same clean state will remain, so we can return to A fine.
So there is probably no real need to do both #1019 and this one - just remember to also do EMMS in #1019.
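A rough sketch of that combined approach on the save path (the fpu_state type and the function name are assumptions for illustration):

struct fpu_state {
    char image[512] __attribute__((aligned(16)));  // FXSAVE area
};

static void save_and_reset_fpu(fpu_state* s)
{
    asm volatile("fxsave %0" : "=m"(*s));  // save the interrupted thread's x87/MMX/SSE state
    asm volatile("fninit");                // give the kernel an empty x87 stack and default control word
    asm volatile("emms");                  // and clear the MMX-in-use tags
    // On return from the interrupt, an fxrstor of the saved image brings the
    // thread's original FPU state back.
}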

@wkozaczuk
Collaborator

I am not an expert on how many cycles a given instruction costs, but per this, FNINIT costs 25 cycles, and I am not sure how much, percentage-wise, it would slow down a context switch in OSv. How would we measure that?

But it is more important to have correct and possibly slower code than fast but buggy code. In any case, we might eventually optimize this logic by examining whether the FPU was actually used by the thread we are switching from. I think the information in sections "13.5.4 Processor Tracking of XSAVE-Managed State" and "13.6 OPERATION OF XSAVE" of this document https://www.ti.uni-bielefeld.de/html/teaching/WS1819/techinf1/64-ia-32-architectures-software-developers-manual.pdf may help us accomplish this.
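For the "was the FPU actually used" check, those sections describe the XINUSE state-component bitmap that the processor tracks. A sketch of reading it (this assumes CPUID has reported support for XGETBV with ECX=1, which would need to be checked first):

#include <cstdint>

static inline uint64_t read_xinuse()
{
    uint32_t eax, edx;
    // XGETBV with ECX=1 returns the XINUSE state-component bitmap.
    asm volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(1));
    return ((uint64_t)edx << 32) | eax;
}

static bool thread_touched_x87_or_sse()
{
    // Bit 0 tracks x87 state, bit 1 SSE state; a clear bit means that component
    // is still in its initial configuration, so the reset could be skipped.
    return (read_xinuse() & 0x3) != 0;
}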

@wkozaczuk
Collaborator

Applying this patch:

diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh
index e3769540..0a8953ae 100644
--- a/arch/x64/arch-switch.hh
+++ b/arch/x64/arch-switch.hh
@@ -103,6 +103,7 @@ void thread::switch_to()
            [rip]"i"(offsetof(thread_state, rip))
          : "rbx", "rdx", "rsi", "rdi", "r8", "r9",
            "r10", "r11", "r12", "r13", "r14", "r15", "memory");
+    asm volatile("emms");
     processor::fldcw(fpucw);
     processor::ldmxcsr(mxcsr);
 }

seems to have addressed the following issues:

Given that the EMMS instruction costs only 1 cycle, adding it to every task switch does not dramatically increase its cost, at least based on the results of running tests/misc-ctxsw.cc.

It would be nice to add a proper unit test that does FPU/MMX/SSE work in one thread while another thread fills the FPU stack without clearing it.
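A rough sketch of such a test in portable C++ (the thread setup and iteration counts are illustrative only, not OSv's actual test harness):

#include <atomic>
#include <cassert>
#include <thread>

int main()
{
    std::atomic<bool> stop{false};

    // "Polluter": keeps pushing onto the x87 stack without ever popping,
    // so its FPU stack quickly becomes full/overflowed.
    std::thread polluter([&] {
        while (!stop) {
            double one = 1.0;
            asm volatile("fldl %0" : : "m"(one));
        }
    });

    // "Victim": does ordinary long double arithmetic, which on x86-64 uses
    // the x87 FPU. If a context switch leaks the polluter's full stack into
    // this thread, the division starts producing NaNs and the assert fires.
    std::thread victim([&] {
        for (int i = 0; i < 1000000; i++) {
            volatile long double x = 2.0L;
            volatile long double y = x / 2.0L;
            assert(y == 1.0L);
        }
        stop = true;
    });

    victim.join();
    polluter.join();
    return 0;
}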

@wkozaczuk
Collaborator

wkozaczuk commented Dec 4, 2018

I am thinking that fixing this issue does fix #1019. On the other hand, based on the results of testing a patch to address #1019, fixing that one might not necessarily fix #1020 (this one), as we initially seemed to think.

nyh closed this as completed in 25209d8 on Jan 29, 2019