cluster + rotating file + large log file = crash #117

Closed
2fours opened this issue Dec 20, 2013 · 32 comments
@2fours commented Dec 20, 2013

Running on Linux... I see this in my journal at exactly midnight:

Dec 19 17:00:00 surespot-node-2 coffee[19539]: events.js:72
Dec 19 17:00:00 surespot-node-2 coffee[19539]: events.js:72
Dec 19 17:00:00 surespot-node-2 coffee[19539]: throw er; // Unhandled 'error' event
Dec 19 17:00:00 surespot-node-2 coffee[19539]: ^
Dec 19 17:00:00 surespot-node-2 coffee[19539]: throw er; // Unhandled 'error' event
Dec 19 17:00:00 surespot-node-2 coffee[19539]: Error: ENOENT, rename 'logs/surespot.log'
Dec 19 17:00:00 surespot-node-2 coffee[19539]: ^
Dec 19 17:00:00 surespot-node-2 coffee[19539]: Error: ENOENT, rename 'logs/surespot.log'

The log files now look like this:

-rw-r--r-- 1 surespot users 10966716 Dec 19 16:59 surespot.log.2
-rw-r--r-- 1 surespot users 0 Dec 19 17:00 surespot.log.0
-rw-r--r-- 1 surespot users 58868418 Dec 20 08:11 surespot.log

@2fours (Author) commented Dec 20, 2013

I think it's because I'm using cluster with multiple processes...it doesn't happen when I test with 1 process. How would you recommend using bunyan with cluster?

@2fours (Author) commented Dec 20, 2013

If I name the log files with the pid in them like so:

bunyanStreams = [{
type: 'rotating-file'
path: "logs/surespot_#{process.pid}.log"
count: 7
level: 'debug'
period: '6h'
}]

I don't have issues with multiple processes, but now I have three log files, one for each process - the master and the two forked children. This sort of defeats the purpose of having the pid queryable in the log for me. Suggestions?

@2fours (Author) commented Dec 21, 2013

Turns out naming the file with the pid in it doesn't work either under load. Suggestions?

@2fours (Author) commented Jan 17, 2014

Found some time to look into this some more... Things work fine with small log files, but once you get into bigger log files (hundreds of MB) you get the error. Looking into using some sort of lock file solution to sync across the processes, but it's hard because each process sets its own log rotation time independently.

@2fours (Author) commented Jan 17, 2014

Couldn't get it to do what I want. Now I'm just sending bunyan output to stdout and using a systemd service to send it to the journal.
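
For reference, a minimal sketch of that setup (the logger name is illustrative) - just point bunyan at stdout and let systemd capture it in the journal:

var bunyan = require('bunyan');

// Log to stdout; systemd (e.g. StandardOutput=journal in the unit file) picks it up from there.
var log = bunyan.createLogger({
    name: 'surespot',        // illustrative
    stream: process.stdout,
    level: 'debug'
});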

2fours closed this as completed Jan 17, 2014
2fours reopened this Jan 17, 2014
@seanzx85

+1 Started seeing this on our larger scale apps.

Cluster: 1 master, 4 children
Log Size: 23KB

{"name":"xxxx","hostname":"server1","pid":24054,"level":50,"msg":"uncaughtException: Error: ENOENT, rename '/logs/node/xxxx.log.4'","time":"2014-02-19T00:00:00.009Z","v":0}
{"name":"xxxx","hostname":"server1","pid":24054,"level":50,"msg":"Error: ENOENT, rename '/logs/node/xxxx.log.4'","time":"2014-02-19T00:00:00.010Z","v":0}

@dnndev commented Feb 27, 2014

I'm getting the same issue. It's causing my server to restart each time, resulting in lost sessions.

@gabegorelick

+1 This (or something very similar) just bit me with cluster and rotating-file:

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: ENOENT, rename '/foo/bar.log.1'

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: ENOENT, rename '/foo/bar.log.1'

@ryanrolds

Running into this issue. It's causing workers to crash and refork.

@larrymyers

The best solution I've found so far is to just use a single file for logging, and then set up logrotate with the copytruncate option. Gives you the same end result without pissing off cluster workers.

@gabegorelick

It looks like winston supports rotation. How do they avoid this issue?

@rosskukulinski

@gabegorelick the last time I used winston their log-rotate didn't follow normal *nix file rotation standards. One of many reasons I'm using bunyan now.

@larrymyers

what I have in /etc/logrotate.d/my-node-app:

/path/to/my-node-app/logs/* {
    daily
    missingok
    rotate 7
    size 10M
    notifempty
    compress
    delaycompress
    copytruncate
}

@apriendeau

+1 here. This started occurring for me as it scaled up.

@mogadanez

+1

@nicholasf

+1

@apriendeau

What is the status of this? Is there a plan to fix this bug? Bunyan is awesome, but if it's going to crash an app every time a file rotates, that's no bueno.

@gabegorelick

@apriendeau See @larrymyers' comment on setting up logrotate.

@apriendeau

cool thanks. I will try it out.

@trentm (Owner) commented Jun 2, 2014

Hi All,

If those who have hit this could provide some details -- like OS, node version, bunyan version -- that would be helpful.

Also, a small script that reproduces this fairly easily would be gold for solving the issue.

Personally I don't use cluster, nor Bunyan's own log rotation that much (I tend to use SmartOS's logadm tool for log rotation). Sorry I haven't been active on this ticket.

@trentm (Owner) commented Jun 2, 2014

Here is a repro script and log run:
https://gist.github.com/trentm/d8e57a6d34084a64bbc8
Specifically you can see cases like:

mv /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.1 /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.2
     mv /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.0 /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.1
   throw er; // Unhandled 'error' event
              ^
mv /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.0
open /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log
Error: ENOENT, rename '/Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.5'

I believe the issue is that the master and all the worker processes are trying to do the log rotation. It should just be the master. I'll try to verify I can fix it that way.

@nicholasf

This hit us last week and makes sense - we lost all of the workers when they tried to rotate. Our cluster code was poor (our master wasn't handling the error scenario). What we couldn't explain was why all of the workers died - if there was just one left, what process did it have to race against to rotate the logfile? The master.

@nicholasf

What if bunyan created a lock file on rotating the current log file? It'd act as a semaphore, so any competing workers would know not to proceed?

@trentm (Owner) commented Jun 2, 2014

Okay, I could "fix" the crashes in this case via:

Index: /Users/trentm/tm/node-bunyan-0.x/lib/bunyan.js
index 1ea6ea4..7e77abe 100644
--- a/lib/bunyan.js
+++ b/lib/bunyan.js
@@ -1006,7 +1006,11 @@ function RotatingFileStream(options) {

     this.rotQueue = [];
     this.rotating = false;
-    this._setupNextRot();
+
+    var cluster = require('cluster');
+    if (cluster.isMaster) {
+        this._setupNextRot();
+    }
 }

I.e. only have the master (in a cluster) do file rotation. That means we don't get contention on moving/removing log files. However we are left with the major problem that only the master then recreates its file handle to the new log file. The worker processes still have a handle open to the now rotated "foo.log.0" file, and will continue to write to that file while the master writes to the newly created "foo.log". After N rotations the file to which all the workers are writing will have been unlinked. IOW, this is useless.

A not very good answer would be to have the master signal the workers after rotation to reopen their streams (perhaps via Logger.prototype.reopenFileStreams). However that has issues:

  • cluster IPC is app-specific so this signaling would intrude into app code
  • the current Bunyan stream design doesn't allow a stream to emit an event (e.g. a "rotated" event) on the Logger. This would be needed for signalling rotation to the master.
  • the file rotation takes time, during which log records can come in. In a single process that is handled on the RotatingFileStream via a rotQueue buffer. To handle that multi-process would require master <--> worker coordination on starting to buffer, then rotate, then reopen streams, then stop buffering. That's not really practical.
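
For illustration only, a rough sketch of what that master-to-worker signaling could look like (the message shape is made up; only Logger.prototype.reopenFileStreams is existing Bunyan API), and it ignores the buffering problem above:

var cluster = require('cluster');
var bunyan = require('bunyan');

// Every process opens the same plain file; only the master would rotate it.
var log = bunyan.createLogger({
    name: 'myapp',
    streams: [{type: 'file', path: '/var/log/myapp.log'}]
});

if (cluster.isMaster) {
    // ...after the master has finished renaming the files, tell each worker:
    Object.keys(cluster.workers).forEach(function (id) {
        cluster.workers[id].send({cmd: 'bunyan-rotated'});   // made-up message format
    });
} else {
    process.on('message', function (msg) {
        if (msg && msg.cmd === 'bunyan-rotated') {
            log.reopenFileStreams();   // drop the handle to the now-rotated file
        }
    });
}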

Basically I think the suggestion is that you don't attempt to use Bunyan's log file rotation with the same file used in multiple processes (i.e. the simple cluster case where the logger instance is created before cluster.fork()ing). Some suggestions:

  1. One idea (as @2fours hit on) is to have a separate file for each process (incorporating either the pid or a worker id into the log file name; a rough sketch follows this list). That means multiple sets of files. Note that the Bunyan CLI does support merging multiple bunyan files in time order, so one could have a cron job that merges rotated files via something like bunyan /var/log/my-app-*.log.0 -o bunyan > /var/log/my-app.log.0; rm -f /var/log/my-app-*.log.0.
  2. Another idea (and my preferred one) is to NOT use Bunyan's built-in file rotation and instead use a system log rotation tool (e.g. logrotate on Linux, logadm on SmartOS/Illumos) which supports copy-truncate rotation.
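
A rough sketch of idea 1, using a per-worker suffix (paths and names here are illustrative):

var cluster = require('cluster');
var bunyan = require('bunyan');

// Each process (master and workers) rotates only its own file, so there is no contention.
var suffix = cluster.isMaster ? 'master' : 'worker' + cluster.worker.id;
var log = bunyan.createLogger({
    name: 'my-app',
    streams: [{
        type: 'rotating-file',
        path: '/var/log/my-app-' + suffix + '.log',
        period: '1d',
        count: 7
    }]
});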

@trentm (Owner) commented Jun 2, 2014

@nicholasf I'd prefer not to get into file locking (pretty hard to get right cross-platform, IME) in Bunyan. It would also leave a serious problem. Take this scenario:

  • process A starts a rotation, acquires the lock
  • process B starts a rotation, but can't get the lock so skips rotation (good)
  • process A completes rotation and releases the lock
  • process C starts a rotation soon after, acquires the lock

This rotation by C is bad because it will result in a double rotation of files. "A" will have rotated "foo.log" to "foo.log.0" (and so on). Soon after "C" will rotate "foo.log" (now with only seconds or milliseconds of content) to "foo.log.0".

@trentm (Owner) commented Jun 2, 2014

Added a warning to the readme section on rotating-file about using cluster and the same rotating log file path for multiple processes. Calling this "done" for now, though it is unsatisfying. At the least I need to do #71 and pass through a stream error on the failing mv so that the process failing rotation doesn't crash. I'll leave that to #71.

trentm closed this as completed Jun 2, 2014
@nicholasf

@trentm I think it's reasonable to say not to use log rotation when using node cluster. We're thinking we'll set up logrotate.

Good call on not using the file locking. I don't think that'd be very elegant with Node in a race condition.

@janajri commented Jul 8, 2014

@trentm, regarding your comment about the master signaling the workers after file rotation to reopen their streams, would the workers need to reopen their streams if we instead have the master handle the logging and file rotation only? For example, something like this:

var log = bunyan.createLogger(opts);

Object.keys(cluster.workers).forEach(function (id) {
    var worker = cluster.workers[id];
    worker.process.stdout.on('data', function (chunk) {
        log.info({ pid: worker.process.pid }, chunk.toString()); // or similar
    });
    worker.process.stderr.on('data', function (chunk) {
        log.warn({ pid: worker.process.pid }, chunk.toString());
    });
});

If each worker's stdout and stderr feed into the master, the streams wouldn't need to be reopened, right? I'm pretty new to nodejs and am wondering if there are any potential issues I'm missing here.

Thanks,
Yousef

@gabegorelick

@yousefj There's also the silent option in cluster.setupMaster. It defaults to false, but setting it to true sends worker output to the parent's stdio.
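
For example:

cluster.setupMaster({ silent: true });        // pipe worker stdout/stderr to the master
var worker = cluster.fork();
worker.process.stdout.pipe(process.stdout);   // or hand it to a bunyan logger as above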

@janajri commented Jul 10, 2014

@gabegorelick, yes I left that out above but did include this flag as well. Would this theoretically be a solution for the cluster problem encountered above?

@catamphetamine

What about using a dedicated logger in the workers that sends messages to the master via some kind of IPC?
Then there would be a single log file, and no one would be forced to rely on third-party tools like logrotate.

For example, using this library:
https://github.com/RIAEvangelist/node-ipc

var cluster = require('cluster');
var http = require('http');
var stream = require('stream');
var numCPUs = require('os').cpus().length;

var bunyan = require('bunyan');
var ipc = require('node-ipc');

var logPath = 'logs/my-app.log';

if (cluster.isMaster) {
    ipc.config.id = 'world';
    ipc.config.retry = 1500;
    ipc.config.rawBuffer = true;

    // The master owns the single rotating-file logger.
    var log = bunyan.createLogger({
        name: 'my-app',
        streams: [{
            type: 'rotating-file',
            path: logPath,
            period: '1d',   // daily rotation
            count: 3        // keep 3 back copies
        }],
        serializers: {
            error: bunyan.stdSerializers.err,
            request: bunyan.stdSerializers.req,
            response: bunyan.stdSerializers.res
        }
    });

    ipc.serve(function () {
        // Workers connect and send serialized log records.
        ipc.server.on('data', function (data, socket) {
            // Naive: assumes each chunk is one complete JSON record.
            log.info(JSON.parse(data.toString()));
        });

        // Fork workers once the IPC server is up.
        for (var i = 0; i < numCPUs; i++) {
            cluster.fork();
        }

        cluster.on('exit', function (worker, code, signal) {
            console.log('worker ' + worker.process.pid + ' died');
        });
    });

    ipc.server.start();
} else {
    ipc.config.id = 'worker-' + cluster.worker.id;
    ipc.config.retry = 1500;
    ipc.config.rawBuffer = true;

    ipc.connectTo('world', function () {
        ipc.of.world.on('connect', function () {
            // A raw bunyan stream that forwards every record to the master.
            var workerOutput = new stream.Writable({
                objectMode: true,
                write: function (rec, encoding, callback) {
                    // or some other serialization like Thrift or Protocol Buffers
                    ipc.of.world.emit(JSON.stringify(rec));
                    callback();
                }
            });

            var log = bunyan.createLogger({
                name: 'my-app',
                streams: [{
                    type: 'raw',
                    stream: workerOutput
                }],
                serializers: {
                    error: bunyan.stdSerializers.err,
                    request: bunyan.stdSerializers.req,
                    response: bunyan.stdSerializers.res
                }
            });

            // Workers can share any TCP connection; in this case it is an HTTP server.
            http.createServer(function (req, res) {
                res.writeHead(200);
                res.end("hello world\n");

                log.info('Test');
            }).listen(8000);
        });
    });
}

@mihir83in

Why is this closed on winstonjs link?
