Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run Time Measurements Not Synchronized #1014

Open
bhpayne opened this issue Oct 28, 2023 · 7 comments
Open

Run Time Measurements Not Synchronized #1014

bhpayne opened this issue Oct 28, 2023 · 7 comments
Assignees

Comments

@bhpayne
Copy link

bhpayne commented Oct 28, 2023

When using print-timing-info, the "Run Loop Time" and "Build Time" can add up to a total time larger than the wall clock time.

This is because in start_simulation , start_build is set before the first barrier is called. This creates a situation where the first task is considered to be "simulating" while all other tasks are still building.

@bhpayne bhpayne changed the title "Run Loop Time" and "Build Time" can add up to more than the wall clock time. Run Time Measurements Not Synchronized Oct 28, 2023
@researcherben
Copy link

Update: our team hasn't run tests to ensure this assessment is correct, so the following may be more precise:

This is could be because in start_simulation...

@researcherben
Copy link

In start_simulation , start_build is set before the first barrier is called. This creates a situation where the first task is considered to be "simulating" while all other tasks are still building.

Is that understanding of how things run correct? The assumption is that you can subtract the build time from the total time to get the run time. If there is some part of sst-core metrics that I don't understand then I could be wrong and this isn't a bug.

@feldergast
Copy link
Contributor

The runtime timings in the core were put in to get an approximate idea of how the various phases of the runtime were split up. There is no guarantee that the numbers will add up exactly to the runtime. The new profiling points are there to provide more accurate profiling information. That being said, things could probably be improved with the timings provided with -v and --print-timing-info.

@jwilso
Copy link
Contributor

jwilso commented Mar 28, 2024

@researcherben Is this still an issue?

@researcherben
Copy link

@jwilso I'll check and get back you next week.

@bpswenson
Copy link
Contributor

bpswenson commented Apr 2, 2024

The breakdowns for timings in the code are as follows:

double start = sst_get_cpu_time();
...
double start_build = sst_get_cpu_time();
...
double start_run = sst_get_cpu_time();
info.build_time = start_run - start_build;
...
double end_time = sst_get_cpu_time();
info.run_time = end_time - start_run;
...
double total_end_time = sst_get_cpu_time();
double total_time = total_end_time - start;

Given that you're never going to get total_time = build_time + run_time even with 1 thread on 1 rank. When you throw in multiple ranks, threads it just grabs the max of each of those. max(run_time), max(total_time), max(build_time).

It may be a good idea to modify the output text somewhat to clear up any confusion.

@researcherben
Copy link

It may be a good idea to modify the output text somewhat to clear up any confusion.

The output lines are here https://github.com/sstsimulator/sst-core/blob/master/src/sst/core/main.cc#L1215 but that doesn't seem like a convenient place to provide the context.

things could probably be improved with the timings provided with -v and --print-timing-info.

Would the right place to make edits be https://github.com/sstsimulator/sst-core/blob/master/src/sst/core/config.cc#L891 ?

Maybe change the line

        "print-timing-info", 0, "Print SST timing information", std::bind(&ConfigHelper::setPrintTiming, this, _1),

to be something like

        "print-timing-info", 0, "Print SST timing information. Just an approximate idea of how the various phases of the runtime were split up. There is no guarantee that the numbers will add up exactly to the runtime. The new profiling points provide more accurate profiling information.", std::bind(&ConfigHelper::setPrintTiming, this, _1),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants