
Load Imbalance #1

Open
jrhaberstroh opened this issue May 24, 2013 · 3 comments

@jrhaberstroh (Owner)

The NVT equilibration (using cluster_equilibrate) of FMO currently suffers from 400% load imbalance on 72 cores, 12 of which are dedicated to PME. The cause is not known.
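For reference, the run is launched along these lines (the binary name, rank layout, and file names are placeholders, not the actual job script):

```
# 72 MPI ranks total, 12 of them dedicated PME ranks via -npme (placeholder names)
mpirun -np 72 mdrun_mpi -npme 12 -deffnm nvt_equil
```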

@jrhaberstroh (Owner, Author)

This issue branches into several more general optimization questions:

How should I balance PME order and PME cutoff? How many PME threads to use?
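For context, a sketch of the knobs involved (the values shown are typical defaults, not our settings):

```
; real-space / reciprocal-space split for PME (.mdp side)
rcoulomb       = 1.0    ; real-space Coulomb cutoff (nm)
fourierspacing = 0.12   ; PME grid spacing (nm); smaller means more PME work
pme-order      = 4      ; B-spline interpolation order
```

The number of dedicated PME ranks is set on the mdrun side with -npme.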

What is constraints = all-bonds about, and how should I modify lincs-order and lincs-iter to allow for greater parallelization?
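For reference, the relevant .mdp lines (the numbers are the documented defaults, not tuned values):

```
constraints = all-bonds  ; replace all bond lengths, not just X-H bonds, with holonomic constraints
lincs-order = 4          ; expansion order used by LINCS
lincs-iter  = 1          ; extra accuracy iterations
```

I believe the manual's rule of thumb is that lincs-order can be lowered to help domain decomposition as long as (1 + lincs-iter) * lincs-order stays roughly constant.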

What is the maximum number of threads I can use for a certain simulation, and how can I estimate it?
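My rough understanding (please correct me): each domain-decomposition cell has to stay larger than the longest cutoff plus the distance spanned by coupled constraints, so an upper bound on PP ranks is about (box edge / minimum cell size)^3. With made-up numbers, a 7 nm box and a ~1.0 nm minimum cell size would allow at most roughly 7^3 = 343 PP cells before mdrun refuses to decompose the system.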

What is thread-MPI vs. MPI? OpenMP vs. MPI?
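My current understanding, with example command lines (the binary names depend on the local install):

```
# thread-MPI: ranks are threads inside a single mdrun process (single node only)
mdrun -ntmpi 8 -ntomp 4 -deffnm nvt_equil

# real MPI: ranks launched externally, can span nodes; OpenMP threads per rank via -ntomp
mpirun -np 8 mdrun_mpi -ntomp 4 -deffnm nvt_equil
```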

What is the Verlet cutoff scheme vs. the group scheme? The group scheme uses the "charge groups" defined in the topology and has very efficient water loops. Verlet is newer, and works with CUDA and OpenMP.
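For reference, the switch itself is a single .mdp line (Verlet requires 4.6+, as far as I know):

```
cutoff-scheme = Verlet   ; buffered pair lists; needed for GPU acceleration and full OpenMP support
; cutoff-scheme = group  ; legacy scheme built on the topology's charge groups
```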

@jrhaberstroh (Owner, Author)

Increasing from 12 PME cores to 24 PME cores brought the imbalance from 400% down to 200%, but a further increase to 48 PME cores (with 48 MD cores) gave no additional benefit.

Changing pme-cutoff from 0.16 to 0.32 had no effect.
Changing pme-order from 4 to 10 had no effect.
Switching "free-energy = yes" to "free-energy = no" brought the imbalance down to 2%, fixing the issue but leaving me confused. Is it more costly to use the B-state parameters? Is there a way to switch to those parameters without paying this imbalance?
"init-lambda" is no better than "init-lambda-state".

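For concreteness, the block being toggled looks roughly like this (illustrative values, not the production .mdp):

```
free-energy       = yes   ; setting this to "no" is what removed the imbalance
init-lambda-state = 0     ; tried init-lambda instead; no difference
nstdhdl           = 10    ; dH/dlambda output interval
```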
@jrhaberstroh (Owner, Author)

With an average imbalance of 330%:

NOTE: 37.3 % of the available CPU time was lost due to load imbalance
in the domain decomposition.

NOTE: 9.7 % performance was lost because the PME nodes
had less work to do than the PP nodes.
You might want to decrease the number of PME nodes
or decrease the cut-off and the grid spacing.

Maybe nstdhdl = 10 is forcing [-gcom 10]. But that does not make sense, because the load imbalance persists even when equilibrating an excited-state system, where nstdhdl = 0...
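If that guess is right, it should be testable by overriding the global-communication interval directly, though I am not sure whether mdrun will honor a -gcom larger than nstdhdl:

```
# try spacing out global communication (value is arbitrary)
mdrun -gcom 100 -deffnm nvt_equil
```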
