Our code doesn't speed up enough to be competitive with the scipy solver. There are two problems here:
1. The code doesn't scale: the bottleneck is cache/memory pressure caused by multiprocessing. See the last comment (mine) on this post:
http://stackoverflow.com/questions/29358872/inefficient-multiprocessing-of-numpy-based-calculations
The code is therefore limited by memory: for large problems, every process (worker) needs its own copy of all the variables, memory fills up, and the problem stops being CPU-bound (we see no speedup at all).
How bad this gets depends on the function used (some functions need more memory). GMRES seems to make the cache/memory limitation worse, while the sparse direct solve does not.
Multithreading would solve this, because all parallelized steps would share the same variables (shared memory) and only read them, so no access exclusion is needed and a speedup should be visible.
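A minimal sketch of the thread-based alternative, using a synthetic matrix and right-hand sides as stand-ins for the real problem (whether the speedup actually materializes depends on how much time scipy spends in compiled code that releases the GIL):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.sparse import identity, random as sparse_random
from scipy.sparse.linalg import spsolve

# One shared sparse system matrix; threads read it in place, instead of
# each worker process receiving its own pickled copy as with multiprocessing.
n = 200
A = (sparse_random(n, n, density=0.05, random_state=0) + 10 * identity(n)).tocsc()
rhs_list = [np.random.default_rng(i).random(n) for i in range(8)]

def solve_one(b):
    # A is only read here, so no locking is needed between threads.
    return spsolve(A, b)

with ThreadPoolExecutor(max_workers=4) as pool:
    solutions = list(pool.map(solve_one, rhs_list))
```

The key point is that `A` exists once in memory regardless of the number of workers, so the per-process copies that were filling the cache and RAM disappear.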
2. The Python code is not fast enough: the slowest parts are scipy's gmres and the _calculateMatrix function. gmres contains an internal Python loop, which could be why it is slow. A possible solution is using compiled code (or maybe these parts really can't be made faster).
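To check where the time actually goes, the two scipy solvers can be timed on the same system. This is only a sketch with a synthetic, well-conditioned matrix; `_calculateMatrix` is project-specific and not reproduced here:

```python
import time
import numpy as np
from scipy.sparse import identity, random as sparse_random
from scipy.sparse.linalg import gmres, spsolve

# Synthetic diagonally dominant test system (stand-in for the real matrix).
n = 500
A = (sparse_random(n, n, density=0.02, random_state=1) + 10 * identity(n)).tocsc()
b = np.ones(n)

t0 = time.perf_counter()
x_direct = spsolve(A, b)          # direct sparse solve
t_direct = time.perf_counter() - t0

t0 = time.perf_counter()
x_iter, info = gmres(A, b, atol=1e-10)   # info == 0 means GMRES converged
t_iter = time.perf_counter() - t0

print(f"spsolve: {t_direct:.4f}s  gmres: {t_iter:.4f}s")
```

Which solver wins depends heavily on the matrix structure; profiling on the real problem (e.g. with cProfile) would confirm whether the Python-level loop inside gmres is really the dominant cost.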