-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc: add domain post mortem #6159
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,299 @@ | ||
# Domain Module Postmortem | ||
|
||
## Usability Issues | ||
|
||
### Implicit Behavior | ||
|
||
It's possible for a developer to create a new domain and then simply run | ||
`domain.enter()`. Which then acts as a catch-all for any exception in the | ||
future that couldn't be observed by the thrower. Allowing a module author to | ||
intercept the exceptions of unrelated code in a different module. Preventing | ||
the originator of the code from knowing about its own exceptions. | ||
|
||
Here's an example of how one indirectly linked modules can affect another: | ||
|
||
```js | ||
// module a.js | ||
const b = require('./b'); | ||
const c = require('./c'); | ||
|
||
|
||
// module b.js | ||
const d = require('domain').create(); | ||
d.on('error', () => { /* silence everything */ }); | ||
d.enter(); | ||
|
||
|
||
// module c.js | ||
const dep = require('some-dep'); | ||
dep.method(); // Uh-oh! This method doesn't actually exist. | ||
``` | ||
|
||
Since module `b` enters the domain but never exits any uncaught exception will | ||
be swallowed. Leaving module `c` in the dark as to why it didn't run the entire | ||
script. Leaving a potentially partially populated `module.exports`. Doing this | ||
is not the same as listening for `'uncaughtException'`. As the latter is | ||
explicitly meant to globally catch errors. The other issue is that domains are | ||
processed prior to any `'uncaughtException'` handlers, and prevent them from | ||
running. | ||
|
||
Another issue is that domains route errors automatically if no `'error'` | ||
handler was set on the event emitter. There is no opt-in mechanism for this, | ||
and automatically propagates across the entire asynchronous chain. This may | ||
seem useful at first, but once asynchronous calls are two or more modules deep | ||
and one of them doesn't include an error handler the creator of the domain will | ||
suddenly be catching unexpected exceptions, and the thrower's exception will go | ||
unnoticed by the author. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A code sample here would also do wonders to explain this. Aside from that both examples are very good. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added. |
||
|
||
The following is a simple example of how a missing `'error'` handler allows | ||
the active domain to hijack the error: | ||
|
||
```js | ||
const domain = require('domain'); | ||
const net = require('net'); | ||
const d = domain.create(); | ||
d.on('error', (err) => console.error(err.message)); | ||
|
||
d.run(() => net.createServer((c) => { | ||
c.end(); | ||
c.write('bye'); | ||
}).listen(8000)); | ||
``` | ||
|
||
Even manually removing the connection via `d.remove(c)` does not prevent the | ||
connection's error from being automatically intercepted. | ||
|
||
Failures that plagues both error routing and exception handling are the | ||
inconsistencies in how errors are bubbled. The following is an example of how | ||
nested domains will and won't bubble the exception based on when they happen: | ||
|
||
```js | ||
const domain = require('domain'); | ||
const net = require('net'); | ||
const d = domain.create(); | ||
d.on('error', () => console.error('d intercepted an error')); | ||
|
||
d.run(() => { | ||
const server = net.createServer((c) => { | ||
const e = domain.create(); // No 'error' handler being set. | ||
e.run(() => { | ||
// This will not be caught by d's error handler. | ||
setImmediate(() => { | ||
throw new Error('thrown from setImmediate'); | ||
}); | ||
// Though this one will bubble to d's error handler. | ||
throw new Error('immediately thrown'); | ||
}); | ||
}).listen(8080); | ||
}); | ||
``` | ||
|
||
It may be expected that nested domains always remain nested, and will always | ||
propagate the exception up the domain stack. Or that exceptions will never | ||
automatically bubble. Unfortunately both these situations occur, leading to | ||
potentially confusing behavior that may even be prone to difficult to debug | ||
timing conflicts. | ||
|
||
|
||
### API Gaps | ||
|
||
While APIs based on using `EventEmitter` can use `bind()` and errback style | ||
callbacks can use `intercept()`, alternative APIs that implicitly bind to the | ||
active domain must be executed inside of `run()`. Meaning if module authors | ||
wanted to support domains using a mechanism alternative to those mentioned they | ||
must manually implement domain support themselves. Instead of being able to | ||
leverage the implicit mechanisms already in place. | ||
|
||
|
||
### Error Propagation | ||
|
||
Propagating errors across nested domains is not straight forward, if even | ||
possible. Existing documentation shows a simple example of how to `close()` an | ||
`http` server if there is an error in the request handler. What it does not | ||
explain is how to close the server if the request handler creates another | ||
domain instance for another async request. Using the following as a simple | ||
example of the failing of error propagation: | ||
|
||
```js | ||
const d1 = domain.create(); | ||
d1.foo = true; // custom member to make more visible in console | ||
d1.on('error', (er) => { /* handle error */ }); | ||
|
||
d1.run(() => setTimeout(() => { | ||
const d2 = domain.create(); | ||
d2.bar = 43; | ||
d2.on('error', (er) => console.error(er.message, domain._stack)); | ||
d2.run(() => { | ||
setTimeout(() => { | ||
setTimeout(() => { | ||
throw new Error('outer'); | ||
}); | ||
throw new Error('inner') | ||
}); | ||
}); | ||
})); | ||
``` | ||
|
||
Even in the case that the domain instances are being used for local storage so | ||
access to resources are made available there is still no way to allow the error | ||
to continue propagating from `d2` back to `d1`. Quick inspection may tell us | ||
that simply throwing from `d2`'s domain `'error'` handler would allow `d1` to | ||
then catch the exception and execute its own error handler. Though that is not | ||
the case. Upon inspection of `domain._stack` you'll see that the stack only | ||
contains `d2`. | ||
|
||
This may be considered a failing of the API, but even if it did operate in this | ||
way there is still the issue of transmitting the fact that a branch in the | ||
asynchronous execution has failed, and that all further operations in that | ||
branch must cease. In the example of the http request handler, if we fire off | ||
several asynchronous requests and each one then `write()`'s data back to the | ||
client many more errors will arise from attempting to `write()` to a closed | ||
handle. More on this in _Resource Cleanup on Exception_. | ||
|
||
|
||
### Resource Cleanup on Exception | ||
|
||
The script [`domain-resource-cleanup.js`](domain-resource-cleanup.js) | ||
contains a more complex example of properly cleaning up in a small resource | ||
dependency tree in the case that an exception occurs in a given connection or | ||
any of its dependencies. Breaking down the script into its basic operations: | ||
|
||
- When a new connection happens, concurrently: | ||
- Open a file on the file system | ||
- Open Pipe to unique socket | ||
- Read a chunk of the file asynchronously | ||
- Write chunk to both the TCP connection and any listening sockets | ||
- If any of these resources error, notify all other attached resources that | ||
they need to clean up and shutdown | ||
|
||
As we can see from this example a lot more must be done to properly clean up | ||
resources when something fails than what can be done strictly through the | ||
domain API. All that domains offer is an exception aggregation mechanism. Even | ||
the potentially useful ability to propagate data with the domain is easily | ||
countered, in this example, by passing the needed resources as a function | ||
argument. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For what it's worth, most complaints I've seen about deprecating domains are about context and not error handling. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. are you referring to users using domains as a type of continuation local storage? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, very often people ask in StackOverflow for storage for the duration of a request. They complain about having to pass parameters to 6-7 functions. In my opinion passing the parameters around a-la-dependency-injection is a much better and safer idea than 'request/continuation/"thread" local storage' - but I'm just pointing out that people do that. That's just StackOverflow though, maybe it's a lot less common in the wild. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for clarifying. I know it's being used, but only by users. Haven't come across a module that uses it yet (though I also haven't looked very hard). I'll be happy to make a note of it. Think I made a note of it, but was mostly just cumbersome than anything. e.g. if another module did the same within your async branch then the domains would stack and There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. section added for this topic. |
||
|
||
One problem domains perpetuated was the supposed simplicity of being able to | ||
continue execution, contrary to what the documentation stated, of the | ||
application despite an unexpected exception. This example demonstrates the | ||
fallacy behind that idea. | ||
|
||
Attempting proper resource cleanup on unexpected exception becomes more complex | ||
as the application itself grows in complexity. This example only has 3 basic | ||
resources in play, and all of them with a clear dependency path. If an | ||
application uses something like shared resources or resource reuse the ability | ||
to cleanup, and properly test that cleanup has been done, grows greatly. | ||
|
||
In the end, in terms of handling errors, domains aren't much more than a | ||
glorified `'uncaughtException'` handler. Except with more implicit and | ||
unobservable behavior by third-parties. | ||
|
||
|
||
### Resource Propagation | ||
|
||
Another use case for domains was to use it to propagate data along asynchronous | ||
data paths. One problematic point is the ambiguity of when to expect the | ||
correct domain when there are multiple in the stack (which must be assumed if | ||
the async stack works with other modules). Also the conflict between being | ||
able to depend on a domain for error handling while also having it available to | ||
retrieve the necessary data. | ||
|
||
The following is a involved example demonstrating the failing using domains to | ||
propagate data along asynchronous stacks: | ||
|
||
```js | ||
const domain = require('domain'); | ||
const net = require('net'); | ||
|
||
const server = net.createServer((c) => { | ||
// Use a domain to propagate data across events within the | ||
// connection so that we don't have to pass arguments | ||
// everywhere. | ||
const d = domain.create(); | ||
d.data = { connection: c }; | ||
d.add(c); | ||
// Mock class that does some useless async data transformation | ||
// for demonstration purposes. | ||
const ds = new DataStream(dataTransformed); | ||
c.on('data', (chunk) => ds.data(chunk)); | ||
}).listen(8080, () => console.log(`listening on 8080`)); | ||
|
||
function dataTransformed(chunk) { | ||
// FAIL! Because the DataStream instance also created a | ||
// domain we have now lost the active domain we had | ||
// hoped to use. | ||
domain.active.data.connection.write(chunk); | ||
} | ||
|
||
function DataStream(cb) { | ||
this.cb = cb; | ||
// DataStream wants to use domains for data propagation too! | ||
// Unfortunately this will conflict with any domain that | ||
// already exists. | ||
this.domain = domain.create(); | ||
this.domain.data = { inst: this }; | ||
} | ||
|
||
DataStream.prototype.data = function data(chunk) { | ||
// This code is self contained, but pretend it's a complex | ||
// operation that crosses at least one other module. So | ||
// passing along "this", etc., is not easy. | ||
this.domain.run(function() { | ||
// Simulate an async operation that does the data transform. | ||
setImmediate(() => { | ||
for (var i = 0; i < chunk.length; i++) | ||
chunk[i] = ((chunk[i] + Math.random() * 100) % 96) + 33; | ||
// Grab the instance from the active domain and use that | ||
// to call the user's callback. | ||
const self = domain.active.data.inst; | ||
self.cb.call(self, chunk); | ||
}); | ||
}); | ||
}; | ||
``` | ||
|
||
The above shows that it is difficult to have more than one asynchronous API | ||
attempt to use domains to propagate data. This example could possibly be fixed | ||
by assigning `parent: domain.active` in the `DataStream` constructor. Then | ||
restoring it via `domain.active = domain.active.data.parent` just before the | ||
user's callback is called. Also the instantiation of `DataStream` in the | ||
`'connection'` callback must be run inside `d.run()`, instead of simply using | ||
`d.add(c)`, otherwise there will be no active domain. | ||
|
||
In short, for this to have a prayer of a chance usage would need to strictly | ||
adhere to a set of guidelines that would be difficult to enforce or test. | ||
|
||
|
||
## Performance Issues | ||
|
||
A significant deterrent from using domains is the overhead. Using node's | ||
built-in http benchmark, `http_simple.js`, without domains it can handle over | ||
22,000 requests/second. Whereas if it's run with `NODE_USE_DOMAINS=1` that | ||
number drops down to under 17,000 requests/second. In this case there is only | ||
a single global domain. If we edit the benchmark so the http request callback | ||
creates a new domain instance performance drops further to 15,000 | ||
requests/second. | ||
|
||
While this probably wouldn't affect a server only serving a few hundred or even | ||
a thousand requests per second, the amount of overhead is directly proportional | ||
to the number of asynchronous requests made. So if a single connection needs to | ||
connect to several other services all of those will contribute to the overall | ||
latency of delivering the final product to the client. | ||
|
||
Using `AsyncWrap` and tracking the number of times | ||
`init`/`pre`/`post`/`destroy` are called in the mentioned benchmark we find | ||
that the sum of all events called is over 170,000 times per second. This means | ||
even adding 1 microsecond overhead per call for any type of setup or tear down | ||
will result in a 17% performance loss. Granted, this is for the optimized | ||
scenario of the benchmark, but I believe this demonstrates the necessity for a | ||
mechanism such as domain to be as cheap to run as possible. | ||
|
||
|
||
## Looking Ahead | ||
|
||
The domain module has been soft deprecated since Dec 2014, but has not yet been | ||
removed because node offers no alternative functionality at the moment. As of | ||
this writing there is ongoing work building out the `AsyncWrap` API and a | ||
proposal for Zones being prepared for the TC39. At such time there is suitable | ||
functionality to replace domains it will undergo the full deprecation cycle and | ||
eventually be removed from core. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A code sample here would do wonders to explain this to people considering domains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.