cluster + rotating file + large log file = crash #117
I think it's because I'm using cluster with multiple processes... it doesn't happen when I test with 1 process. How would you recommend using bunyan with cluster?
If I name the log files with the pid in them, like so: bunyanStreams = [{ ... }] (see the sketch below), I don't have issues with multiple processes, but now I have three log files, one for each process: the master and the two forked children. This sort of defeats the purpose of having the pid queryable in the log for me. Suggestions?
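A minimal sketch of that per-pid stream configuration (the logger name and paths here are placeholders, not from the original comment):

var bunyan = require('bunyan');

// One rotating-file stream per process, keyed by pid in the file path.
var bunyanStreams = [{
    type   : 'rotating-file',
    path   : '/var/log/myapp/myapp.' + process.pid + '.log',
    period : '1d',   // rotate daily
    count  : 3       // keep 3 back copies
}];

var log = bunyan.createLogger({ name: 'myapp', streams: bunyanStreams });
log.info('process %d logging to its own file', process.pid);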
Turns out naming the file with the pid in it doesn't work either under load. Suggestions?
Found some time to look into this some more... things work fine with small log files, but once you get into bigger log files (hundreds of MB) you get the error. I'm looking into some sort of lock-file solution to sync across the processes, but it's hard because each process sets its own log rotation time independently.
Couldn't get it to do what I want. Now I'm just sending bunyan output to stdout and using a systemd service to send it to the journal.
+1 Started seeing this on our larger scale apps. Cluster: 1 master, 4 children
I'm getting the same issue. It's causing my server to restart each time, resulting in lost sessions.
+1 This (or something very similar) just bit me with cluster and rotating-file:
Running into this issue. It's causing workers to crash and refork.
The best solution I've found so far is to just use a single file for logging, and then set up logrotate with the copytruncate option. Gives you the same end result without pissing off the cluster workers.
It looks like winston supports rotation. How do they avoid this issue?
@gabegorelick the last time I used winston, its log rotation didn't follow normal *nix file rotation conventions. One of many reasons I'm using bunyan now.
What I have in /etc/logrotate.d/my-node-app starts like this (a sketch of a complete copytruncate stanza follows): /path/to/my-node-app/logs/* {
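A sketch of what such a stanza typically looks like with copytruncate (daily rotation and a 7-file retention here are assumptions, not the original commenter's settings):

/path/to/my-node-app/logs/* {
    daily
    rotate 7
    missingok
    notifempty
    compress
    copytruncate
}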
+1 here. This started occurring for me as the app scaled up.
+1 |
+1 |
What is the status of this? Is there a plan to fix this bug? Bunyan is awesome, but if it's going to crash an app every time a file rotates, that's no bueno.
@apriendeau See @larrymyers' comment on setting up logrotate.
Cool, thanks. I will try it out.
Hi all, if those who have hit this could provide some details -- like OS, node version, bunyan version -- that would be helpful. Also, a small script that reproduces this fairly easily would be gold for solving the issue. Personally I don't use cluster, nor Bunyan's own log rotation, that much (I tend to use SmartOS's logadm tool for log rotation). Sorry I haven't been active on this ticket.
Here is a repro script and log run:
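(The actual repro script and log run aren't reproduced above; the following is a sketch of the kind of script that triggers the crash, with the file path, rotation period, and worker count as assumptions.)

var bunyan = require('bunyan');
var cluster = require('cluster');

// Every process (master and each worker) creates its own rotating-file
// stream on the same path, so each one schedules its own rotation.
var log = bunyan.createLogger({
    name: 'rotation-repro',
    streams: [{
        type   : 'rotating-file',
        path   : '/tmp/rotation-repro.log',
        period : '1h',
        count  : 3
    }]
});

if (cluster.isMaster) {
    for (var i = 0; i < 4; i++) cluster.fork();
    cluster.on('exit', function (worker) {
        console.log('worker ' + worker.process.pid + ' died');
    });
} else {
    // Write continuously so the file is large by the time rotation kicks in.
    setInterval(function () {
        log.info({ pid: process.pid }, 'tick');
    }, 5);
}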
I believe the issue is that the master and all the worker processes are trying to do the log rotation. It should just be the master. I'll try to verify I can fix it that way.
This hit us last week and makes sense - we lost all of the workers when they tried to rotate. Our cluster code was poor (our master wasn't handling the error scenario). What we couldn't explain was why all of the workers died - if there was just one left, what process did it have to race against to rotate the logfile? The master.
What if bunyan created a lock file on rotating the current log file? It'd act as a semaphore, so any competing workers would know not to proceed?
Okay, I could "fix" the crashes in this case via:
I.e. only have the master (in a cluster) do file rotation. That means we don't get contention on moving/removing log files. However, we are left with the major problem that only the master then recreates its file handle to the new log file. The worker processes still have a handle open to the now-rotated "foo.log.0" file, and will continue to write to that file while the master writes to the newly created "foo.log". After N rotations the file to which all the workers are writing will have been unlinked. IOW, this is useless. A not very good answer would be to have the master signal the workers after rotation to reopen their streams (perhaps via a signal or a cluster message telling each worker to reopen its file stream).
Basically I think the suggestion is that you don't attempt to use Bunyan's log file rotation with the same file used in multiple processes (i.e. the simple cluster case where the logger instance is created before the workers are forked).
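(For reference, a sketch of what that master-signals-workers approach could look like; this is not code from the issue. It assumes cluster's worker messaging and bunyan's reopenFileStreams() method, and 'reopen-logs' is a made-up message name.)

var bunyan = require('bunyan');
var cluster = require('cluster');

// Each process opens a plain (non-rotating) file stream to the shared path.
var log = bunyan.createLogger({
    name: 'app',
    streams: [{ type: 'file', path: '/var/log/app.log' }]
});

if (cluster.isMaster) {
    for (var i = 0; i < 2; i++) cluster.fork();

    // After the master (or an external tool) rotates the file, broadcast a
    // message so every worker reopens its handle to the new file.
    function notifyWorkersAfterRotation() {
        Object.keys(cluster.workers).forEach(function (id) {
            cluster.workers[id].send('reopen-logs');
        });
    }
} else {
    process.on('message', function (msg) {
        if (msg === 'reopen-logs') {
            log.reopenFileStreams();
        }
    });
}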
@nicholasf I'd prefer not to get into file locking (pretty hard to get right cross-platform, IME) in Bunyan. It would also leave a serious problem. Take this scenario: processes "A" and "C" share the same log file, each with its own independently scheduled rotation timer. Even with a lock file, "A" rotates on schedule and "C" rotates again moments later.
This rotation by "C" is bad because it results in a double rotation of files: "A" will have rotated "foo.log" to "foo.log.0" (and so on), and soon after "C" will rotate "foo.log" (now with only seconds or milliseconds of content) to "foo.log.0".
Added a warning to the readme section on the rotating-file stream type.
@trentm I think it's reasonable to say not to use log rotation when using node cluster. We're thinking we'll set up logrotate. Good call on not using the file locking. I don't think that'd be very elegant in Node, given the race condition.
@trentm, regarding your comment about the master signaling to the workers after file rotation to reopen their stream, would the workers need to reopen their stream if we instead have the master handle the logging and file rotation only? For example, something like this:
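(The original snippet isn't preserved above; the following is a sketch of the idea, assuming the workers are forked with cluster's silent option so their stdout/stderr are piped back to the master, which owns the single rotating-file logger.)

var bunyan = require('bunyan');
var cluster = require('cluster');

if (cluster.isMaster) {
    var log = bunyan.createLogger({
        name: 'app',
        streams: [{ type: 'rotating-file', path: '/var/log/app.log', period: '1d', count: 3 }]
    });

    cluster.setupMaster({ silent: true });  // pipe workers' stdout/stderr to the master

    for (var i = 0; i < 2; i++) {
        var worker = cluster.fork();
        worker.process.stdout.on('data', function (chunk) {
            // Each chunk holds one or more JSON log lines from a worker; the
            // master simply re-logs them through its rotating-file logger.
            chunk.toString().trim().split('\n').forEach(function (line) {
                log.info({ worker_record: line });
            });
        });
    }
} else {
    // Workers log to stdout only: no file handles, so nothing to reopen on rotation.
    var wlog = bunyan.createLogger({ name: 'app', stream: process.stdout });
    wlog.info('worker %d started', process.pid);
}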
If each worker's stdout and stderr feed into the master, the streams wouldn't need to be reopened, right? I'm pretty new to nodejs as it is and am wondering if there are any potential issues I'm missing here. Thanks.
@yousefj There's also the
@gabegorelick, yes, I left that out above but did include this flag as well. Would this theoretically be a solution for the cluster problem encountered above?
What about using a dedicated IPC channel between the workers and the master? For example, using the node-ipc library:

var cluster = require('cluster');
var http = require('http');
var stream = require('stream');
var bunyan = require('bunyan');
var ipc = require('node-ipc');
var numCPUs = require('os').cpus().length;

var name = 'my-app';                   // placeholder logger name
var log_path = '/var/log/my-app.log';  // placeholder log path

if (cluster.isMaster)
{
    ipc.config.id = 'world';
    ipc.config.retry = 1500;
    ipc.config.rawBuffer = true;

    ipc.serve(
        function(){
            // Only the master owns the rotating-file stream.
            var log_configuration =
            {
                streams:
                [{
                    type   : 'rotating-file',
                    path   : log_path,
                    period : '1d', // daily rotation
                    count  : 3     // keep 3 back copies
                }],
                serializers:
                {
                    error    : bunyan.stdSerializers.err,
                    request  : bunyan.stdSerializers.req,
                    response : bunyan.stdSerializers.res
                }
            };
            var log = bunyan.createLogger(Object.assign({ name: name }, log_configuration));

            ipc.server.on(
                'data',
                function(data, socket){
                    // Each raw message is a serialized bunyan record from a worker;
                    // re-log its fields through the master's rotating logger.
                    ipc.log('got a message', data.toString());
                    log.info(JSON.parse(data.toString()));
                }
            );

            // Fork workers.
            for (var i = 0; i < numCPUs; i++)
            {
                cluster.fork();
            }
            cluster.on('exit', function(worker, code, signal)
            {
                console.log('worker ' + worker.process.pid + ' died');
            });
        }
    );
    ipc.server.start();
}
else
{
    ipc.config.id = 'worker-' + cluster.worker.id;
    ipc.config.retry = 1500;
    ipc.config.rawBuffer = true;

    ipc.connectTo(
        'world',
        function(){
            ipc.of.world.on(
                'connect',
                function(){
                    ipc.log('## connected to world ##');

                    // A "raw" bunyan stream: each log record object is handed to
                    // write(), serialized, and sent to the master over IPC.
                    var worker_output = new stream();
                    worker_output.writable = true;
                    worker_output.write = function(data)
                    {
                        // or some other encoding like Thrift or Protocol Buffers
                        ipc.of.world.emit(JSON.stringify(data));
                    };

                    var log_configuration =
                    {
                        streams:
                        [{
                            type   : 'raw',
                            stream : worker_output
                        }],
                        serializers:
                        {
                            error    : bunyan.stdSerializers.err,
                            request  : bunyan.stdSerializers.req,
                            response : bunyan.stdSerializers.res
                        }
                    };
                    var log = bunyan.createLogger(Object.assign({ name: name }, log_configuration));

                    // Workers can share any TCP connection.
                    // In this case it is an HTTP server.
                    http.createServer(function(req, res)
                    {
                        res.writeHead(200);
                        res.end("hello world\n");
                        log.info('Test');
                    })
                    .listen(8000);
                }
            );
        }
    );
}
Why is this closed on the winstonjs link?
Running on Linux... I see this in my journal at exactly midnight:
Dec 19 17:00:00 surespot-node-2 coffee[19539]: events.js:72
Dec 19 17:00:00 surespot-node-2 coffee[19539]: events.js:72
Dec 19 17:00:00 surespot-node-2 coffee[19539]: throw er; // Unhandled 'error' event
Dec 19 17:00:00 surespot-node-2 coffee[19539]: ^
Dec 19 17:00:00 surespot-node-2 coffee[19539]: throw er; // Unhandled 'error' event
Dec 19 17:00:00 surespot-node-2 coffee[19539]: Error: ENOENT, rename 'logs/surespot.log'
Dec 19 17:00:00 surespot-node-2 coffee[19539]: ^
Dec 19 17:00:00 surespot-node-2 coffee[19539]: Error: ENOENT, rename 'logs/surespot.log'
The log files now look like this:
-rw-r--r-- 1 surespot users 10966716 Dec 19 16:59 surespot.log.2
-rw-r--r-- 1 surespot users 0 Dec 19 17:00 surespot.log.0
-rw-r--r-- 1 surespot users 58868418 Dec 20 08:11 surespot.log