Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiler improvements #2724

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

andresovela
Copy link
Contributor

@andresovela andresovela commented Oct 18, 2024

This PR adds a few improvements to the MicroProfiler class.

bug=#2725

First, I made it a template so that it's possible to adjust the maximum number of events the profiler can handle via a template parameter.

The motivation for this change is to save memory. #1835 increased the maximum number of events from 1024 to 4096, which is a huge waste of RAM. The MicroProfiler class has these buffers:

const char* tags_[kMaxEvents];
uint32_t start_ticks_[kMaxEvents];
uint32_t end_ticks_[kMaxEvents];

struct TicksPerTag {
  const char* tag;
  uint32_t ticks;
};

TicksPerTag total_ticks_per_tag[kMaxEvents];

We need 20 bytes per event, so an instance of MicroProfiler uses 80 KB of RAM, even if the model you're profiling has only a few nodes.

This PR also adds the following enum to specify the logging format and remove some duplicated code and clean the API.

enum class MicroProfilerLogFormat {
  HumanReadable,
  Csv,
};

I replaced LogTicksPerTagCsv() with LogGrouped(), which does essentially the same thing, but also prints in a human readable format which looks like this:

Cumulative event times:
Count    Tag                              Ticks        Time
6        DEPTHWISE_CONV_2D                3085702      5.194 ms
2        SUM                              1281206      2.156 ms
66       FULLY_CONNECTED                  934142       1.572 ms
10       CONV_2D                          765357       1.288 ms
67       QUANTIZE                         731923       1.232 ms
20       CONCATENATION                    362212       0.609 ms
76       STRIDED_SLICE                    335828       0.565 ms
3        RELU                             245643       0.413 ms
4        DEQUANTIZE                       156682       0.263 ms
6        PAD                              145834       0.245 ms
29       ADD                              130229       0.219 ms
65       RESHAPE                          49593        0.083 ms
13       MUL                              48852        0.082 ms
4        SUB                              16371        0.027 ms
6        LOGISTIC                         10336        0.017 ms
3        TANH                             4761         0.008 ms
3        UNPACK                           2738         0.004 ms

Total time: 13.985 ms (8307409 ticks)

Finally I also added TicksToUs() to micro_time.h for convenience.

@andresovela andresovela requested a review from a team as a code owner October 18, 2024 13:11
// sections. This can be used in conjunction with running the relevant micro
// benchmark to evaluate end-to-end performance.
template <int MAX_EVENTS>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be template <int MAX_EVENTS = 4096> so that all existing applications using the profiler do not need to be changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 107 to 118
case MicroProfilerLogFormat::Csv:
MicroPrintf("\"Event\",\"Tag\",\"Ms\",\"Ticks\"");
for (int i = 0; i < num_events_; ++i) {
#if defined(HEXAGON) || defined(CMSIS_NN)
int ticks = end_ticks_[i] - start_ticks_[i];
MicroPrintf("%d,%s,%u,%d", i, tags_[i], TicksToMs(ticks), ticks);
#else
uint32_t ticks = end_ticks_[i] - start_ticks_[i];
uint64_t us = TicksToUs(ticks);
MicroPrintf("%d,%s,%" PRIu64 ".%" PRIu64 ",%" PRIu32, i, tags_[i],
us / 1000, us % 1000, ticks);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the CSV format probably won't pass review, as it most likely breaks countless internal Google, and external third-party scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the Ms column

const char* tag;
uint32_t ticks;
uint32_t tag_count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only used by the "human readable" log format.

Copy link
Contributor Author

@andresovela andresovela Oct 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should I do about that? #1290 introduced

  struct TicksPerTag {
    const char* tag;
    uint32_t ticks;
  };
  // In practice, the number of tags will be much lower than the number of
  // events. But it is theoretically possible that each event to be unique and
  // hence we allow total_ticks_per_tag to have kMaxEvents entries.
  TicksPerTag total_ticks_per_tag[kMaxEvents];

which was "only" used by the CSV log format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andresovela

The comment is really just there for @suleshahid to see.

Comment on lines 209 to 219
void SortTagGroups() {
for (int i = 0; i < num_tag_groups_ - 1; ++i) {
for (int j = i + 1; j < num_tag_groups_; ++j) {
if (tag_groups_[j].ticks > tag_groups_[i].ticks) {
TagGroup temp_tag_group = tag_groups_[i];
tag_groups_[i] = tag_groups_[j];
tag_groups_[j] = temp_tag_group;
}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use standard C++ library algorithms (as long as they don't require memory allocation). Here std::sort should be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 36 to 49
virtual uint32_t BeginEvent(const char* tag) override;
uint32_t BeginEvent(const char* tag) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an override of the abstract base class, and should remain as such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 45 to 71
virtual void EndEvent(uint32_t event_handle) override;
void EndEvent(uint32_t event_handle) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an override of the abstract base class, and should remain as such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Prints the profiling information of each of the events in human readable
// form.
void Log() const;
void Log(MicroProfilerLogFormat format) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a default argument of MicroProfilerLogFormt::Csv so that all existing applications continue to compile without modification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why CSV though? This method logs in human readable by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added HumanReadable as default argument since that's what would keep the existing behavior while continuing to compile


// Prints the profiling information of each of the events in CSV (Comma
// Separated Value) form.
void LogCsv() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogCsv should be retained, and should call Log(MicroProfilerFormat::Csv), so that existing applications do not require modification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 159 to 165
case MicroProfilerLogFormat::Csv: {
MicroPrintf("\"Tag\",\"Total ms\",\"Total ticks\"");
for (int i = 0; i < num_tag_groups_; ++i) {
uint64_t us = TicksToUs(tag_groups_[i].ticks);
MicroPrintf("%s, %" PRIu64 ".%" PRIu64 ", %u", tag_groups_[i].tag,
us / 1000, us % 1000, tag_groups_[i].ticks);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the CSV format probably won't pass review, as it most likely breaks countless internal Google, and external third-party scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the Total ms column

@ddavis-2015
Copy link
Member

@andresovela

Thanks for your PR submission (and all the spelling corrections!).

This will require some discussion with @suleshahid. I like the template change for sizing the timing data.

// Prints total ticks for each unique tag in CSV format.
// Output will have one row for each unique tag along with the
// total ticks summed across all events with that particular tag.
void LogTicksPerTagCsv();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method cannot be removed as it breaks existing applications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a policy to never break stuff like this? IMO duplicating APIs like this is also not great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andresovela Let's just say, that at the present time, no one is allowed to break anything. 🙂

@andresovela andresovela closed this Nov 2, 2024
@andresovela andresovela deleted the profiling-improvements branch November 2, 2024 03:32
@andresovela andresovela restored the profiling-improvements branch November 4, 2024 12:16
@andresovela
Copy link
Contributor Author

Oops, closed by mistake.

@andresovela andresovela reopened this Nov 4, 2024
Copy link
Contributor

"This PR is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

@github-actions github-actions bot added the Stale label Dec 15, 2024
@andresovela
Copy link
Contributor Author

Waiting for review

@ddavis-2015 ddavis-2015 removed the Stale label Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants