cluster + rotating file + large log file = crash #117

Closed
2fours opened this issue Dec 20, 2013 · 32 comments
@2fours commented Dec 20, 2013

Running on Linux... I see this in my journal at exactly midnight:

Dec 19 17:00:00 surespot-node-2 coffee[19539]: events.js:72
Dec 19 17:00:00 surespot-node-2 coffee[19539]: events.js:72
Dec 19 17:00:00 surespot-node-2 coffee[19539]: throw er; // Unhandled 'error' event
Dec 19 17:00:00 surespot-node-2 coffee[19539]: ^
Dec 19 17:00:00 surespot-node-2 coffee[19539]: throw er; // Unhandled 'error' event
Dec 19 17:00:00 surespot-node-2 coffee[19539]: Error: ENOENT, rename 'logs/surespot.log'
Dec 19 17:00:00 surespot-node-2 coffee[19539]: ^
Dec 19 17:00:00 surespot-node-2 coffee[19539]: Error: ENOENT, rename 'logs/surespot.log'

The log files now look like this:

-rw-r--r-- 1 surespot users 10966716 Dec 19 16:59 surespot.log.2
-rw-r--r-- 1 surespot users 0 Dec 19 17:00 surespot.log.0
-rw-r--r-- 1 surespot users 58868418 Dec 20 08:11 surespot.log

@2fours (Author) commented Dec 20, 2013

I think it's because I'm using cluster with multiple processes...it doesn't happen when I test with 1 process. How would you recommend using bunyan with cluster?

@2fours (Author) commented Dec 20, 2013

If I name the log files with the pid in them like so:

bunyanStreams = [{
type: 'rotating-file'
path: "logs/surespot_#{process.pid}.log"
count: 7
level: 'debug'
period: '6h'
}]

I don't have issues with multiple processes, but now I have three log files, one for each process - the master and the two forked children. This sort of defeats the purpose of having the pid queryable in the log for me. Suggestions?

@2fours (Author) commented Dec 21, 2013

Turns out naming the file with the pid in it doesn't work either under load. Suggestions?

@2fours (Author) commented Jan 17, 2014

Found some time to look into this some more... Things work fine with small log files, but once you get into bigger log files (hundreds of MB) you get the error. Looking into using some sort of lock file solution to sync across the processes, but it's hard because each process sets its own log rotation time independently.

@2fours (Author) commented Jan 17, 2014

Couldn't get it to do what I want. Now I'm just sending bunyan output to stdout and using a systemd service to send it to the journal.
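
For reference, a minimal sketch of that setup (the logger name is illustrative) - just point bunyan at stdout and let systemd capture it in the journal:

var bunyan = require('bunyan');

// Log to stdout; systemd (e.g. StandardOutput=journal in the unit file) picks it up from there.
var log = bunyan.createLogger({
    name: 'surespot',        // illustrative
    stream: process.stdout,
    level: 'debug'
});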

2fours closed this as completed Jan 17, 2014
2fours reopened this Jan 17, 2014
@seanzx85

+1 Started seeing this on our larger scale apps.

Cluster: 1 master, 4 children
Log Size: 23KB

{"name":"xxxx","hostname":"server1","pid":24054,"level":50,"msg":"uncaughtException: Error: ENOENT, rename '/logs/node/xxxx.log.4'","time":"2014-02-19T00:00:00.009Z","v":0}
{"name":"xxxx","hostname":"server1","pid":24054,"level":50,"msg":"Error: ENOENT, rename '/logs/node/xxxx.log.4'","time":"2014-02-19T00:00:00.010Z","v":0}

@dnndev commented Feb 27, 2014

I'm getting the same issue. It's causing my server to restart each time, resulting in lost sessions.

@gabegorelick

+1 This (or something very similar) just bit me with cluster and rotating-file:

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: ENOENT, rename '/foo/bar.log.1'

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: ENOENT, rename '/foo/bar.log.1'

@ryanrolds

Running into this issue. It's causing workers to crash and refork.

@larrymyers

The best solution I've found so far is to just use a single file for logging, and then set up logrotate with the copytruncate option. Gives you the same end result without pissing off cluster workers.

@gabegorelick

It looks like winston supports rotation. How do they avoid this issue?

@rosskukulinski

@gabegorelick the last time I used winston their log-rotate didn't follow normal *nix file rotation standards. One of many reasons I'm using bunyan now.

@larrymyers

what I have in /etc/logrotate.d/my-node-app:

/path/to/my-node-app/logs/* {
    daily
    missingok
    rotate 7
    size 10M
    notifempty
    compress
    delaycompress
    copytruncate
}

@apriendeau

+1 here. This started occurring for me as it scaled up.

@mogadanez

+1

@nicholasf

+1

@apriendeau

What is the status of this? Is there a plan to fix this bug? Bunyan is awesome, but if it's going to crash an app every time a file rotates, that's no bueno.

@gabegorelick

@apriendeau See @larrymyers' comment on setting up logrotate.

@apriendeau

cool thanks. I will try it out.

@trentm (Owner) commented Jun 2, 2014

Hi All,

If those who have hit this could provide some details -- like OS, node version, bunyan version -- that would be helpful.

Also, a small script that reproduces this fairly easily would be gold for solving the issue.

Personally I don't use cluster, nor Bunyan's own log rotation that much (I tend to use SmartOS's logadm tool for log rotation). Sorry I haven't been active on this ticket.

@trentm (Owner) commented Jun 2, 2014

Here is a repro script and log run:
https://gist.github.com/trentm/d8e57a6d34084a64bbc8
Specifically you can see cases like:

mv /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.1 /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.2
     mv /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.0 /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.1
   throw er; // Unhandled 'error' event
              ^
mv /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.0
open /Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log
Error: ENOENT, rename '/Users/trentm/tm/node-bunyan-0.x/test/log-rotate-with-cluster.log.5'

I believe the issue is that the master and all the worker processes are trying to do the log rotation. It should just be the master. I'll try to verify I can fix it that way.

@nicholasf

This hit us last week and makes sense - we lost all of the workers when they tried to rotate. Our cluster code was poor (our master wasn't handling the error scenario). What we couldn't explain was why all of the workers died - if there was just one left, what process did it have to race against to rotate the logfile? The master.

@nicholasf

What if bunyan created a lock file on rotating the current log file? It'd act as a semaphore, so any competing workers would know not to proceed?

@trentm (Owner) commented Jun 2, 2014

Okay, I could "fix" the crashes in this case via:

Index: /Users/trentm/tm/node-bunyan-0.x/lib/bunyan.js
index 1ea6ea4..7e77abe 100644
--- a/lib/bunyan.js
+++ b/lib/bunyan.js
@@ -1006,7 +1006,11 @@ function RotatingFileStream(options) {

     this.rotQueue = [];
     this.rotating = false;
-    this._setupNextRot();
+
+    var cluster = require('cluster');
+    if (cluster.isMaster) {
+        this._setupNextRot();
+    }
 }

I.e. only have the master (in a cluster) do file rotation. That means we don't get contention on moving/removing log files. However we are left with the major problem that only the master then recreates its file handle to the new log file. The worker processes still have a handle open to the now rotated "foo.log.0" file, and will continue to write to that file while the master writes to the newly created "foo.log". After N rotations the file to which all the workers are writing will have been unlinked. IOW, this is useless.

A not very good answer would be to have the master signal the workers after rotation to reopen their streams (perhaps via Logger.prototype.reopenFileStreams). However that has issues:

  • cluster IPC is app-specific so this signaling would intrude into app code
  • the current Bunyan stream design doesn't allow a stream to emit an event (e.g. a "rotated" event) on the Logger. This would be needed for signalling rotation to the master.
  • the file rotation takes time, during which log records can come in. In a single process that is handled on the RotatingFileStream via a rotQueue buffer. To handle that multi-process would require master <--> worker coordination on starting to buffer, then rotate, then reopen streams, then stop buffering. That's not really practical.
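
For illustration only, a rough sketch of what that master-to-worker signaling could look like (the message shape is made up; only Logger.prototype.reopenFileStreams is existing Bunyan API), and it ignores the buffering problem above:

var cluster = require('cluster');
var bunyan = require('bunyan');

// Every process opens the same plain file; only the master would rotate it.
var log = bunyan.createLogger({
    name: 'myapp',
    streams: [{type: 'file', path: '/var/log/myapp.log'}]
});

if (cluster.isMaster) {
    // ...after the master has finished renaming the files, tell each worker:
    Object.keys(cluster.workers).forEach(function (id) {
        cluster.workers[id].send({cmd: 'bunyan-rotated'});   // made-up message format
    });
} else {
    process.on('message', function (msg) {
        if (msg && msg.cmd === 'bunyan-rotated') {
            log.reopenFileStreams();   // drop the handle to the now-rotated file
        }
    });
}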

Basically I think the suggestion is that you don't attempt to use Bunyan's log file rotation with the same file used in multiple processes (i.e. the simple cluster case where the logger instance is created before cluster.fork()ing). Some suggestions:

  1. One idea (as @2fours hit on) is to have a separate file for each process (incorporating either the pid or a worker id into the log file name; a rough sketch follows this list). That means multiple sets of files. Note that the Bunyan CLI does support merging multiple bunyan files in time order, so one could have a cron job that merges rotated files via something like bunyan /var/log/my-app-*.log.0 -o bunyan > /var/log/my-app.log.0; rm -f /var/log/my-app-*.log.0.
  2. Another idea (and my preferred one) is to NOT use Bunyan's built-in file rotation and instead use a system log rotation tool (e.g. logrotate on Linux, logadm on SmartOS/Illumos) which supports copy-truncate rotation.
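
A rough sketch of idea 1, using a per-worker suffix (paths and names here are illustrative):

var cluster = require('cluster');
var bunyan = require('bunyan');

// Each process (master and workers) rotates only its own file, so there is no contention.
var suffix = cluster.isMaster ? 'master' : 'worker' + cluster.worker.id;
var log = bunyan.createLogger({
    name: 'my-app',
    streams: [{
        type: 'rotating-file',
        path: '/var/log/my-app-' + suffix + '.log',
        period: '1d',
        count: 7
    }]
});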

@trentm (Owner) commented Jun 2, 2014

@nicholasf I'd prefer not to get into file locking (pretty hard to get right cross-platform, IME) in Bunyan. It would also leave a serious problem. Take this scenario:

  • process A starts a rotation, acquires the lock
  • process B starts a rotation, but can't get the lock so skips rotation (good)
  • process A completes rotation and releases the lock
  • process C starts a rotation soon after, acquires the lock

This rotation by C is bad because it will result in a double rotation of files. "A" will have rotated "foo.log" to "foo.log.0" (and so on). Soon after "C" will rotate "foo.log" (now with only seconds or milliseconds of content) to "foo.log.0".

@trentm (Owner) commented Jun 2, 2014

Added a warning to the readme section on rotating-file about using cluster and the same rotating log file path for multiple processes. Calling this "done" for now, though it is unsatisfying. At the least I need to do #71 and pass through a stream error on the failing mv so that the process failing rotation doesn't crash. I'll leave that to #71.

trentm closed this as completed Jun 2, 2014
@nicholasf

@trentm I think it's reasonable to say not to use log rotation when using node cluster. We're thinking we'll set up logrotate.

Good call on not using the file locking. I don't think that'd be very elegant with Node in a race condition.

@janajri commented Jul 8, 2014

@trentm, regarding your comment about the master signaling the workers after file rotation to reopen their streams, would the workers need to reopen their streams if we instead have the master handle the logging and file rotation only? For example, something like this:

var log = bunyan.createLogger(opts);

Object.keys(cluster.workers).forEach(function (id) {
    var worker = cluster.workers[id];
    worker.process.stdout.on('data', function (chunk) {
        log.info({ pid: worker.process.pid }, chunk.toString()); // or similar
    });
    worker.process.stderr.on('data', function (chunk) {
        log.warn({ pid: worker.process.pid }, chunk.toString());
    });
});

If each worker's stdout and stderr feed into the master, the streams wouldn't need to be reopened, right? I'm pretty new to nodejs and am wondering if there are any potential issues I'm missing here.

Thanks,
Yousef

@gabegorelick

@yousefj There's also the silent option in cluster.setupMaster. It defaults to false, but setting it to true sends worker output to the parent's stdio.
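
For example:

cluster.setupMaster({ silent: true });        // pipe worker stdout/stderr to the master
var worker = cluster.fork();
worker.process.stdout.pipe(process.stdout);   // or hand it to a bunyan logger as above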

@janajri commented Jul 10, 2014

@gabegorelick, yes I left that out above but did include this flag as well. Would this theoretically be a solution for the cluster problem encountered above?

@catamphetamine

What about using a dedicated logger in the workers that sends messages to the master via some kind of IPC?
Then there would be a single log file, and no one would be forced to rely on third-party tools like logrotate.

For example, using this library:
https://github.com/RIAEvangelist/node-ipc

var cluster = require('cluster');
var http = require('http');
var stream = require('stream');
var numCPUs = require('os').cpus().length;

var bunyan = require('bunyan');
var ipc = require('node-ipc');

var logPath = 'logs/my-app.log';

if (cluster.isMaster) {
    ipc.config.id = 'world';
    ipc.config.retry = 1500;
    ipc.config.rawBuffer = true;

    // The master owns the single rotating-file logger.
    var log = bunyan.createLogger({
        name: 'my-app',
        streams: [{
            type: 'rotating-file',
            path: logPath,
            period: '1d',   // daily rotation
            count: 3        // keep 3 back copies
        }],
        serializers: {
            error: bunyan.stdSerializers.err,
            request: bunyan.stdSerializers.req,
            response: bunyan.stdSerializers.res
        }
    });

    ipc.serve(function () {
        // Workers connect and send serialized log records.
        ipc.server.on('data', function (data, socket) {
            // Naive: assumes each chunk is one complete JSON record.
            log.info(JSON.parse(data.toString()));
        });

        // Fork workers once the IPC server is up.
        for (var i = 0; i < numCPUs; i++) {
            cluster.fork();
        }

        cluster.on('exit', function (worker, code, signal) {
            console.log('worker ' + worker.process.pid + ' died');
        });
    });

    ipc.server.start();
} else {
    ipc.config.id = 'worker-' + cluster.worker.id;
    ipc.config.retry = 1500;
    ipc.config.rawBuffer = true;

    ipc.connectTo('world', function () {
        ipc.of.world.on('connect', function () {
            // A raw bunyan stream that forwards every record to the master.
            var workerOutput = new stream.Writable({
                objectMode: true,
                write: function (rec, encoding, callback) {
                    // or some other serialization like Thrift or Protocol Buffers
                    ipc.of.world.emit(JSON.stringify(rec));
                    callback();
                }
            });

            var log = bunyan.createLogger({
                name: 'my-app',
                streams: [{
                    type: 'raw',
                    stream: workerOutput
                }],
                serializers: {
                    error: bunyan.stdSerializers.err,
                    request: bunyan.stdSerializers.req,
                    response: bunyan.stdSerializers.res
                }
            });

            // Workers can share any TCP connection; in this case it is an HTTP server.
            http.createServer(function (req, res) {
                res.writeHead(200);
                res.end("hello world\n");

                log.info('Test');
            }).listen(8000);
        });
    });
}

@mihir83in

Why is this closed on winstonjs link?
