Fix race condition in blas_server_omp.c #1536
Hi @martin-frbg, the CI build failed (https://travis-ci.org/xianyi/OpenBLAS/jobs/370388725). It seems that the gcc version is too low. It doesn't have Would you please upgrade gcc to 4.9 or above, or give me some suggestions for how to fix this? |
Can you try with |
Is there a (preferably simple) test code that shows the problem ? The general impression so far has been that using OpenMP "solves" the locking problems seen elsewhere in the code, and using tools like helgrind to debug OMP threads (as was tried in #1416) will only generate lots of false positives. |
That fixes the missing stdatomic.h issue, but there are still no atomic operation functions, such as: Is some locking mechanism acceptable? This issue can be reproduced by multi thread call |
Falling back to the old "unsafe" code is certainly better than just telling everybody to use a newer/better compiler. Locking is acceptable, but so far the impression has been that OMP already seemed to take care of that. |
OK, then I will fall back to the old code when the compiler's __STDC_VERSION__ < 201112L. |
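The fallback described in this comment can be sketched with a preprocessor guard on the standard `__STDC_VERSION__` macro. This is only an illustration under stated assumptions, not the code from the pull request: `HAVE_C11_ATOMICS`, `slot_flag_t`, `try_claim`, and `release_flag` are hypothetical names, and the non-atomic branch merely stands in for the old "unsafe" behaviour on pre-C11 compilers.

```c
#include <stdbool.h>

/* Use C11 atomics only when the compiler advertises C11 and does not
 * opt out of atomics via the standard __STDC_NO_ATOMICS__ macro. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
#define HAVE_C11_ATOMICS 1          /* hypothetical feature macro */
typedef atomic_bool slot_flag_t;    /* hypothetical type name */
#else
#define HAVE_C11_ATOMICS 0
typedef bool slot_flag_t;
#endif

/* Try to mark a flag as owned; returns true if this caller set it. */
static bool try_claim(slot_flag_t *flag)
{
#if HAVE_C11_ATOMICS
    bool expected = false;
    /* Atomically: if *flag == false, set it to true and succeed. */
    return atomic_compare_exchange_strong(flag, &expected, true);
#else
    /* Pre-C11 fallback: plain check-then-set, NOT safe against races. */
    if (*flag)
        return false;
    *flag = true;
    return true;
#endif
}

/* Give the flag back so another caller can claim it. */
static void release_flag(slot_flag_t *flag)
{
#if HAVE_C11_ATOMICS
    atomic_store(flag, false);
#else
    *flag = false;
#endif
}
```

With a guard like this, older compilers still build the library, at the cost of keeping the race on those toolchains.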
I uploaded a new version of the code; would you please check whether it is OK now?
Because _Atomic is disabled by default when we enable OpenMP: |
Hi @martin-frbg
It isn't because of error. Would you please help me review whether this commit is OK? |
Do you happen to have a test case that fails without your changes, or did you find this by reviewing the current implementation ? |
I use openblas in |
From the current implementation, we can also see that if two threads enter
|
Recent OMP versions allow unlimited hierarchy levels. Rather than holding a per-thread register, let alone an n^2-sized one, one could just rely on OMP thread detection. Usually OMP_NUM_THREADS=N,1 is the default assumption; on a huge or CoD machine one might want the number of NUMA nodes followed by the number of processors per NUMA node. |
Hi @brada4, I'm sorry, but I'm not quite clear on what you mean. |
Hi @martin-frbg , |
Will it handle deep nesting? e.g. OMP_NUM_THREADS=2,2,10,2,1 |
I only see OMP_NUM_THREADS set to a single integer in OpenBLAS, such as: Is there any place in OpenBLAS that uses a deeply nested OMP_NUM_THREADS? |
Hi @martin-frbg Hi @brada4 |
Could someone help me to review this commit please? |
Looks good to me on first glance, but I lack the time to really review it at the moment. Could you add your NUM_PARALLEL parameter to Makefile.rule with a short explanation as well ? |
If we have, say, an 8-way 28-core machine, you create a 1.5MB N^2 structure that does not fit in fast caches. |
@martin-frbg @brada4 |
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
Hi @martin-frbg |
OMP already has such functionality built in, without recompiling library. |
You want to address overwriting array with list of threads. O(n^Inf) complexity is not a solution. |
@martin-frbg Thanks. It's much better now. @brada4 |
The output data itself is independent and will not clash.
|
Seems to me this fixes a use case that was previously "don't do that", without affecting anything else (as long as the new option is not set). |
New code adds only I don't see how omp_get_num_threads could return the application's thread count; I think it only returns the number of threads in the current parallel region. Would you explain it in more detail? |
@martin-frbg if the caffe threads are OpenMP threads then they know (with default being NPROC,1, tunable with variables, not with build static structure....) |
I think we can't be sure that every application that uses OpenBLAS calls the OpenBLAS API from an OpenMP thread (it may call it from a pthread or something else), so we can't make sure Would you please submit code for a better solution? Then I can close this pull request. |
Hi @martin-frbg |
Hi @martin-frbg |
I lack the time (and serious hardware) for appropriate testing at the moment, also waiting for other comments (including answer from xianyi if he wants this to go in before or after the - hopefully - upcoming release) |
How do you call caffe in two threads? If you build OpenBLAS with OpenMP, we assume your application uses OpenMP, too. If your application calls |
Yes, my application starts two pthreads, each of which has its own caffe instance. Both my application and OpenBLAS use OpenMP. |
I see. I will merge this PR. |
Multi-threaded calls to cblas_sgemm cause a race condition in blas_server_omp.c. Maybe #1416 is the same issue, but I'm not sure.
I only hit this issue when I use OpenBLAS in caffe with multi-threading enabled.
This commit tries to separate the buffers for each thread that calls exec_blas in parallel.
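The idea of giving each concurrent caller its own buffer can be sketched as a small pool of slots claimed with a C11 atomic compare-and-exchange. This is a minimal illustration under assumptions, not the code in this pull request: `NUM_PARALLEL_SKETCH`, `SCRATCH_BYTES`, `acquire_slot`, `release_slot`, and `slot_buffer` are hypothetical names, and the slot count only loosely mirrors the NUM_PARALLEL parameter mentioned in the conversation.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_PARALLEL_SKETCH 2   /* hypothetical: max concurrent callers */
#define SCRATCH_BYTES 4096      /* hypothetical per-slot scratch size */

/* Static storage is zero-initialized, so every slot starts out free. */
static atomic_bool slot_in_use[NUM_PARALLEL_SKETCH];
static unsigned char slot_scratch[NUM_PARALLEL_SKETCH][SCRATCH_BYTES];

/* Claim the first free slot; returns its index, or -1 if all are busy. */
static int acquire_slot(void)
{
    for (int i = 0; i < NUM_PARALLEL_SKETCH; i++) {
        bool expected = false;
        /* Only one caller can flip a given flag from false to true. */
        if (atomic_compare_exchange_strong(&slot_in_use[i], &expected, true))
            return i;
    }
    return -1;
}

/* Mark a slot as free again once the caller's work is finished. */
static void release_slot(int i)
{
    atomic_store(&slot_in_use[i], false);
}

/* Scratch memory owned exclusively by the holder of slot i. */
static unsigned char *slot_buffer(int i)
{
    return slot_scratch[i];
}
```

A caller would acquire a slot before dispatching work, use only `slot_buffer(i)` while it runs, and release the slot afterwards. When `acquire_slot` returns -1, it could retry until another caller releases a slot.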