The NCCL profiler defines a hierarchical event structure: context -> group -> task -> op -> step.
I found that when the parent handle is not set, the sub-events are never invoked. I suggest invoking a sub-event whenever the eActivationMask is set, regardless of whether the parent handle is set.
For instance, take the example profiler.
On the start group event, if the plugin's groupPool is fully used, plan->groupEventHandle will be NULL:
__hidden ncclResult_t watchdogProfilerStartEvent(void* context, void** eHandle, ncclProfilerEventDescr_v1_t* eDescr) {
  *eHandle = NULL;
  struct context* ctx = (struct context *)context;
  if (eDescr->type == ncclProfileGroup) {
    struct group* event;
    int groupId = __atomic_fetch_add(&ctx->groupPoolIndex, 1, __ATOMIC_RELAXED);
    if ((groupId - __atomic_load_n(&ctx->groupPoolBase, __ATOMIC_RELAXED)) < groupPoolSize) {
      // if there are available group events grab one
      // ...
    } else {
      // else drop this event
      __atomic_fetch_sub(&ctx->groupPoolIndex, 1, __ATOMIC_RELAXED);
      return ncclSuccess; //----------------------- it doesn't set *eHandle, so it will be NULL
    }
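The drop path can be reproduced in isolation. The following is a minimal sketch, not the plugin's actual code: the struct layouts, the tiny `GROUP_POOL_SIZE`, and the helper names `acquireGroup` and `startGroupEvent` are assumptions made for illustration, and the return value `0` only stands in for `ncclSuccess`. It shows that a caller sees success together with a NULL handle once the pool is exhausted.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_POOL_SIZE 2  /* hypothetical size, kept tiny on purpose */

/* Hypothetical stand-ins for the example plugin's types. */
struct group { int id; };

struct context {
  int groupPoolIndex;  /* next slot to hand out */
  int groupPoolBase;   /* oldest slot still in use */
  struct group groupPool[GROUP_POOL_SIZE];
};

/* Returns a group slot, or NULL when every slot is in use. */
struct group* acquireGroup(struct context* ctx) {
  int groupId = __atomic_fetch_add(&ctx->groupPoolIndex, 1, __ATOMIC_RELAXED);
  if ((groupId - __atomic_load_n(&ctx->groupPoolBase, __ATOMIC_RELAXED)) < GROUP_POOL_SIZE) {
    struct group* g = &ctx->groupPool[groupId % GROUP_POOL_SIZE];
    g->id = groupId;
    return g;
  }
  /* Pool full: undo the reservation and drop the event. */
  __atomic_fetch_sub(&ctx->groupPoolIndex, 1, __ATOMIC_RELAXED);
  return NULL;
}

/* Same shape as the group branch of startEvent: it always reports success
   and leaves *eHandle NULL when the event was dropped. */
int startGroupEvent(struct context* ctx, void** eHandle) {
  assert(ctx != NULL && eHandle != NULL);
  *eHandle = acquireGroup(ctx);
  return 0;  /* stands in for ncclSuccess */
}
```

With a pool of two slots, the third call to `startGroupEvent` still returns success but hands back a NULL handle, which is the state the caller ends up storing in plan->groupEventHandle.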
On the start task event, because plan->groupEventHandle is NULL, the plugin's start task event is never invoked:
ncclResult_t ncclProfilerStartTaskEvents(struct ncclKernelPlan* plan) {
  TIME_START_EVENT(taskStart);
  if (__builtin_expect(ncclProfiler != NULL, 0)) {
    int enable = eActivationMaskGroup & (ncclProfileProxyOp | ncclProfileProxyStep | ncclProfileColl);
    if (plan->groupEventHandle && enable) { //---------------------- this condition is false
      struct ncclTaskColl* ct = ncclIntruQueueHead(&plan->collTaskQueue);
      while (ct) {
        // ...
        ncclProfiler->startEvent(plan->comm->profilerContext, &ct->eventHandle, &eDescr); // ----------- plugin method not called
        // update collective task with group event activation mask
        ct->eActivationMask = eActivationMaskGroup; //---------------------- activation mask won't be passed down
        ct = ct->next;
      }
      struct ncclTaskP2p* pt = ncclIntruQueueHead(&plan->p2pTaskQueue);
      while (pt) {
        // ...
        ncclProfiler->startEvent(plan->comm->profilerContext, &pt->eventHandle, &eDescr); // ----------- plugin method not called
        // update collective task with group event activation mask
        pt->eActivationMask = eActivationMaskGroup; //---------------------- activation mask won't be passed down
        pt = pt->next;
      }
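To make the suggestion concrete, here is a minimal, self-contained model of the guard. It is a sketch under stated assumptions, not a patch against NCCL: the `task` and `plan` structs, the `startEventFn` callback type, the `evColl`/`evProxyOp`/`evProxyStep` bit values, and the `requireParent` switch are all hypothetical stand-ins. With `requireParent` set, it models the current behavior (no task events when the group handle is NULL); with it cleared, it models the proposed behavior (task events start whenever the activation mask enables them).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real ones come from the NCCL profiler header. */
enum { evColl = 1 << 1, evProxyOp = 1 << 3, evProxyStep = 1 << 4 };

/* Hypothetical, simplified stand-ins for the NCCL task and plan types. */
struct task {
  void* eventHandle;
  int eActivationMask;
  struct task* next;
};

struct plan {
  void* groupEventHandle;  /* NULL when the plugin dropped the group event */
  struct task* tasks;
};

/* Stand-in for the plugin's startEvent: fills *eHandle for a task event. */
typedef int (*startEventFn)(void* parentHandle, void** eHandle);

/* Starts task events and returns how many were started.
   requireParent != 0 models the current guard,
   requireParent == 0 models the proposed one. */
int startTaskEvents(struct plan* plan, int groupMask, startEventFn startEvent, int requireParent) {
  assert(plan != NULL && startEvent != NULL);
  int enable = groupMask & (evProxyOp | evProxyStep | evColl);
  if (!enable) return 0;
  if (requireParent && plan->groupEventHandle == NULL) return 0;
  int started = 0;
  for (struct task* t = plan->tasks; t != NULL; t = t->next) {
    startEvent(plan->groupEventHandle, &t->eventHandle);
    t->eActivationMask = groupMask;  /* mask is passed down to the task */
    started++;
  }
  return started;
}

/* Trivial plugin stand-in that accepts a NULL parent handle. */
int countingStartEvent(void* parentHandle, void** eHandle) {
  static int token;
  (void)parentHandle;
  *eHandle = &token;
  return 0;
}
```

One consequence of this change, if it were adopted, is that a plugin's startEvent could be called with a NULL parent handle, so the plugin would need to tolerate that case.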