-
Notifications
You must be signed in to change notification settings - Fork 66
Conversation
Is this a follow up on "trace events" as introduced in nodejs/node#9304? I'm generally against using the term tracing formally, it can mean so many things. For example we already have:
all of which does some kind of tracing. If this API integrates with |
Yes, the Background section of the proposal doc links to that "trace events" PR. As also noted in the doc, the module name "tracing" was already reserved for this purpose long ago. I don't feel strongly about it personally, certainly we can go with whatever name is agreed by consensus. |
const scopeId = tracing.emit(category, { eventType: 'begin', name: 'myTask' }); | ||
tracing.emit(category, { eventType: 'end', name: 'myTask', id: scopeId }); | ||
|
||
tracing.emit(category, { eventType: 'count', name: 'myCounter', value: 3 }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This emit function is different enough from EventEmitter.emit that I'd prefer it was named differently so as to not confuse new users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should support only the alternate API? All the conditional validation needed for this unified function is a bit worrying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this should not be called emit
at all. Maybe record
?
00X-tracing.md
Outdated
emit( | ||
category: string | string[], | ||
name: string, | ||
args?: { [name: string]: any } | () => { [name: string]: any }): boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should call this instant rather than emit. Multi-signature methods risk unforeseen problems, as the buffer issue has proven.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, the polymorphism of multi-signature methods has implications for optimization, which raises performance concerns. Historically, this has led to problems for APM and other diagnostic solutions that are intended to be embedded in production applications. In general, the simpler / closer to monomorphic the APIs can be made, the better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this method would be better named instant
. (Or perhaps timeStamp
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for instant
because it matches the terms used in the trace format descriptions. It makes it easier to find it elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to instant
in a new commit.
|
||
To support listeners, the `Tracing` class (from the previous section) extends a new | ||
`CategoryEventEmitter` base class. That class is similar to `EventEmitter`, but does not extend | ||
`EventEmitter` because it has different behavior related to the multi-category support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, similar yet different enough from a typical event emitter that I worry it'd be confusing to new users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered trying to make it more different from EventEmitter
, but it felt silly to use different names for operations that are really the same as the normal EventEmitter
methods.
I think the differences probably should not really be a problem. You can in fact use a CategoryEventEmitter
(which the Tracing
module extends) just like a normal EventEmitter
if you don't care about the multi-category support.
Would it help to think of CategoryEventEmitter
as a subclass of EventEmitter
? (I considered literally extending the class, but it didn't provide any benefit since all the methods needed to be overridden to use different data structures from the base class.)
00X-tracing.md
Outdated
tracing.emit(['example2', 'example3'], 'info', 'Hello tracing!'); | ||
``` | ||
|
||
### 3.2 Listern API specification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Listen*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in a new commit.
|
||
The existing Node.js implementation of those Console tracing methods simply logs tracing events to | ||
the console. This proposal extends those methods with a new optional category parameter to enable | ||
emitting categorized tracing events in a way that is backward-compatible with existing console APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of deviating from the behaviour of the console API spec. If they decide to add their own extra argument in the future, suddenly we have problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood; I haven't entirely convinced myself that it is a good idea either. That's why I made it an optional part of the proposal. It might be nice for some scenarios, so I'd like to at least get some feedback about the idea of integrating with the console tracing APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 👎 in adding any complexity to console
.
While I agree that treating those three APIs as separate concerns that can be implemented separately, I think it's helpful to treat emitting and listening of trace data as duals for a single abstraction. As someone who, like @Qard, has worked on APM products in the past, the challenge for me was that there have been a variety of strategies for emitting trace events, but no reliable way to observe that data across platforms and deployment environments (in particular, PaaS and containerization). In practical terms, it feels like it's a lot easier to hit performance and overhead targets for the emitting rather than the listening side, and coming up with an efficient in-process interface for capturing that data would be the biggest win for a whole class of use cases. |
I'd like to see documentation about functionality this introduces that would be impossible to implement on top of async hooks (the async hooks EP is up to date). |
I will update the doc to explain how this proposal relates to async hooks. I expect tracing instrumentation of node core can leverage async hooks for tracing of async operations, so in that regard the features are complementary. Async hooks provide a way to hook into the beginning and end of async operations built on libuv primitives. Of course those async operations are a key subset of performance-related events. This proposal is about a way to capture and/or listen to all kinds of performance-related events, including non-async begin/end, point-in-time, and counters. Additionally and most importantly it provides a very high-performance way to write the events out to a trace log file. |
Is this somehow related to the browser's User Timing API? See https://developer.mozilla.org/en-US/docs/Web/API/User_Timing_API |
@targos It is not directly related, but I'm aware of it. I think it would not be a good fit to implement the User Timing API directly in node, because it only has a subset of the functionality (does not support counters, categories, or any mechanism for turning off the tracing) and it does not have the right event-listener model. If desired, it should be straightforward to create a polyfill for the User Timing API that would use this proposed Node.js tracing API underneath. |
@jasongin I should have clarified. I specifically meant in regards to the JS API. I'm fine placing the trace event hooks in native source. I already have for some testing and they work great. My concern is the JS API. Even if the call does nothing but return immediately there's still enough overhead to be noticeable. Especially when the function call is cross module (V8 has a harder time optimizing those for some reason). For async hooks to incur the least amount of overhead possible in similar situations I had to do some things I'm rather not proud of (see https://git.io/vDJq9 as an example). So I'd like to avoid placing anything directly in JS. I'd like to see if the JS trace events could be optional async hooks that are enabled when those specific trace events are requested. |
The main purpose of a JS API for tracing is to enable people to instrument their JavaScript code. While instrumenting internal async handles and other internal constructs may also provide useful data, that is only a piece of the performance picture and is not the focus of this proposal.
Only if your definition of "noticeable" is extremely generous. When tracing is completely off, any of the tracing event-emitter functions will literally be a no-op function, the equivalent of:
Even if the JIT cannot completely optimize-out the function call, it is still going to be extremely low-cost, I think small enough to not be noticeable in any reasonable sense. Then of course when tracing gets enabled, either by turning on recording or adding a listener for any category, the emit functions will get replaced with functions that need to do a little bit of work, but are still fast enough not to have a significant impact on most applications. |
I spent around 1.5 week figuring out the best way to optimize for calls to timers and nextTick in my async hooks PR. The reason it was so important to improve is because node doesn't code for the average performance case. We code it for the fast case. Which means measuring against a high volume scenario. Take a simple http server and add hooks when resources are created and also add timing around each resource's callback. In a high volume scenario, let's say a proxy (and these numbers are small compared to what F5 pushes through node), and do 45k req/sec. Just tracking the mentioned locations you're going to add ~1.5 million calls/sec. That's ~1.5 million noops in JS that have now been added. Which aren't actually noops because V8 can't completely optimize them out. At that volume I'm not being "extremely generous". I'm being practical. Looking at the code from the link I posted you'd see that all the |
I appreciate the passion for performance. I am working on some benchmarks involving tracing in realistic scenarios. I will follow up when I have data ready to share. |
const category = 'example'; | ||
tracing.enableRecording(category); | ||
|
||
tracing.emit(category, { eventType: 'instant', name: 'myEvent' }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could tracing
expose constants to pass in as eventType instead of expecting particular strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm waiting for this, but I think there are still some nits to be fixed in this EP.
const scopeId = tracing.emit(category, { eventType: 'begin', name: 'myTask' }); | ||
tracing.emit(category, { eventType: 'end', name: 'myTask', id: scopeId }); | ||
|
||
tracing.emit(category, { eventType: 'count', name: 'myCounter', value: 3 }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this should not be called emit
at all. Maybe record
?
* @returns Nonzero event id (either provided or generated) if the event was emitted, or null if | ||
* the event was not emitted because tracing was not enabled for any of the event categories. | ||
*/ | ||
emit(category: string | string[], e: TracingEvent): number | null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is redundant, maybe we can try to avoid overloading as much as possible as it is harder to optimize, and this API should try to be as fast as possible.
* This interface has methods that are similar to EventEmitter, with an important difference: | ||
* events may be emitted under multiple categories simultaneously instead of a single event name, | ||
* and listeners may listen to multiple categories while still only being called once per event. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is similar but different, it should not be called EventEmitter
, but something else.
|
||
The existing Node.js implementation of those Console tracing methods simply logs tracing events to | ||
the console. This proposal extends those methods with a new optional category parameter to enable | ||
emitting categorized tracing events in a way that is backward-compatible with existing console APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 👎 in adding any complexity to console
.
which way may want to adapt to Node.js eventually. And there has been discussion of a more compact | ||
binary format. This is an area that needs some further development, however for performance reasons | ||
the tracing output probably would not be exposed to JavaScript anyway, beyond the listener APIs | ||
that are proposed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some time should be taken to design a processing pipeline for these. I think some tools and documentation to explain how to process this data format. Without interactive and easy-to-understand visualisations, tracing data is extremely hard to understand.
I should be straightforward to plot this data into a flamegraph for example.
https://github.com/brycebaril/transaction-tracer is a proposal that @brycebaril put forward a few years ago coming out the Tracing Summit held a few years ago. Might be good to connect to some of that work. @othiym23 and @trevnorris you were involved in that too weren't you? Was there any other documentation or notes that came out of that which might be useful? |
I'm pretty sure @brycebaril's notes were the most complete; @groundwater acted as our facilitator, and may have recorded some of the outcomes, but I don't know how extensive his notes were, given that he was facilitating. @tjfontaine and @misterdjules were our hosts at the Joyent Vancouver offices and had extensive contributions to make around our efforts to come up with something that merged both APM and dynamic tracing along the lines of DTrace and ETW. I believe @Qard was also at that meeting as well, representing his APM-vendor employers of the time (sorry, Stephen, I'm blanking on what company that was temporarily). I don't think @trevnorris made it to that particular tracing meeting, although he was certainly integral to the discussions that @groundwater, @tjfontaine, and I were having earlier that year. Trevor was at the error-handling "summit" convened by @dshaw earlier that year, at which @davepacheco was also present. As Trevor's involvement in those discussions eventually produced I'll double-check my notebooks from the time, but my recollection is that I didn't keep detailed notes, in part because the discussion was largely focused on establishing the problem space more than solutions, in part because I'd moved on from APM at the time and was more just trying to act as a kind of expert witness. |
FWIW, my notes from the meeting were: General observations
What some APM providers do/needNew Relic:
Strongloop:
Appneta:
Concurix:
General comments:
Conclusion of the "problem discussion"
ImplementationConcerns/Questions:
Observations:
Proposed API by TJ/Jacob (for a simple solution):
Amendments by Jacob:API: do not need to create the probe explicitely. This is done by the API Concerns
|
The challenge that APM vendors faced with coming up with a generic API was that we all had slightly different needs for data collection. That's a big part of why I've been pursuing #56 as a way to apply instrumentation more abstractly. |
Thanks for the detailed notes, @misterdjules, and (belated) apologies for tagging you into the discussion! |
Closing, but by all means re-open or comment if that isn't right. |
@nodejs/diagnostics and anyone else interested, please review this proposal draft.