frequent socket hangups #116
Comments
Yeah we've run into this very sporadically as well. Not really sure where the blame goes either. |
my gut says node, because this would be completely unacceptable availability for such a service, 1/5 concurrent requests fails, but if it really is s3's backlog denying connections or similar then.. wtf.. lol I know some kernels will silently drop denied sockets without any notice so it could be that |
I just found a "solution" on the nodejs mailing list that I'd never seen before: https://groups.google.com/d/msg/nodejs/kYnfJZeqGZ4/uHVOfFneroAJ If someone gets this reproducibly it's worth trying some of those fixes. |
forgot about this, closing until we're running this portion in prod because it might just be my crappy local canadian connection haha |
still a problem in prod, either knox is busted, or node is busted. I'll try and take a closer look at the packets soon and see wtf is going on |
Some of the recent changes in Node 0.8.20 look related; might be worth giving it a shot. |
oh really? which ones? |
From http://blog.nodejs.org/2013/02/15/node-v0-8-20-stable/
Hmm not sure. |
hmm worth a try ill update our node |
no dice |
This is not just happening with knox but with anything using socket.io. I am not a Node expert by any means, but I can reproduce this error when a client closes the window without warning the socket. For example, on Android v4.0.2, closing a tab that is listening to sockets from the tab manager does not send a disconnect or fire window.onbeforeunload. When a request from another browser then tries to send to that hung-up socket, the server crashes with a "socket hang up" error. |
@7Ds7 that particular case is pretty much expected behavior, as outlined here. If the user hangs up on your socket, you of course will get a socket hang up error. I'm pretty sure Knox either returns event emitters that you can listen to the |
I'm able to reproduce this in our code. Didn't have time to isolate in a test case yet.
|
Wow interesting find Dan On Fri, Feb 22, 2013 at 1:57 PM, Dan Milon [email protected] wrote:
Guillermo Rauch |
I've encountered this as well, with the same timeout error reported by @danmilon. My original use case was piping an HTTP response directly into S3 using putStream. I'd been using that for ~2 months and have never seen this issue. I ran into this for the first time today after I switched to putFile (I needed to add some local pre-processing so I now write to a temp file first). Not sure if there's a difference between putFile and putStream or it's purely coincidence. I'm using somewhat old versions of knox (0.4.2) and Node (0.8.6). I'll update to the latest and let you all know if I see this again. |
I can reproduce this error very reliably using Node v0.8.21, and I think I have nailed the issue: it happens if the maxSockets of the agent is lower than the number of requests we are making. If I set https.globalAgent.maxSockets = 50; and do 50 parallel requests, the error shows up after a few seconds. If I do 40 parallel requests, I am able to download thousands of files from S3. Possible solutions:
|
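For readers who want to check the pool size discussed above, here is a minimal sketch. It only reads and sets `maxSockets` on the global HTTPS agent and makes no requests; the value 50 is the one from the comment. For context, Node 0.8 and 0.10 defaulted to 5 sockets per host, while Node 0.12 and later default to `Infinity`.

```javascript
const https = require('https');

// Per-host socket cap on the shared agent (5 in Node 0.8/0.10, Infinity since 0.12).
console.log('default maxSockets:', https.globalAgent.maxSockets);

// Set the cap explicitly, as in the comment above.
https.globalAgent.maxSockets = 50;
console.log('maxSockets now:', https.globalAgent.maxSockets);
```

Requests beyond the cap are queued by the agent rather than rejected, which is why the number of parallel requests relative to `maxSockets` matters here.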
Possibly node could throw something more meaningful than socket hang up in this special case? |
or the queueing logic is somewhere wrong in node ...? |
@kof, could you share the code to reproduce this? |
its a script with some dependencies to the main project, I need to reduce it to the pure reproducible snippet .... but I can post the original script if somebody wants to play with it. |
setting agent=false solves the issue for me too. |
setting agent=false? Can you explain? @substack says the problem can be solved by calling |
I mean agent=false on request options: https://github.com/LearnBoost/knox/blob/master/lib/client.js#L139 http://nodejs.org/docs/v0.8.21/api/all.html#all_http_request_options_callback There is no documented option pool=false, but agent=false does exactly this: "false: opts out of connection pooling with an Agent, defaults request to Connection: close." This will fix the issue, but it then becomes possible to open an unlimited number of sockets to the same host, so one can run into OS limits. I suppose this is the reason why the maxSockets option exists.
It looks like the right way would be to pass your own Agent instance with a good default maxSockets value that works on all OSs, but when the limit is reached, Node's pooling logic will be an issue again. |
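The two approaches described above can be sketched with plain `https` request options. This is only an illustration: the host, path, and the cap of 20 are hypothetical, and how knox forwards these options is not shown here.

```javascript
const https = require('https');

// Option 1: opt out of pooling for this request (a fresh socket, Connection: close).
const unpooledOptions = {
  host: 'example-bucket.s3.amazonaws.com', // hypothetical host
  path: '/some-key',                       // hypothetical key
  method: 'GET',
  agent: false
};

// Option 2: a dedicated Agent with an explicit per-host cap (20 is arbitrary).
const s3Agent = new https.Agent({ maxSockets: 20 });
const pooledOptions = Object.assign({}, unpooledOptions, { agent: s3Agent });

// Either object can be passed to https.request(options, callback).
console.log(unpooledOptions.agent, pooledOptions.agent.maxSockets);
```

Option 1 trades the queueing behaviour for an unbounded number of sockets, so it is usually paired with some limit on how many requests are started at once.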
I misspoke earlier. { agent: false } is the correct thing. You should basically never, ever use anything other than { agent: false } in any program ever. The default value in core is completely absurd. |
@substack can you link to more information about this? sounds like you're onto something but obviously the casual reader should look more into it before taking your word at face value |
fwiw we're not using node for this anymore but even increasing the max sockets to a very high number fixed nothing, so it seems like the comment about it being related to the pooling logic could be right |
I just had the problem again. I wrote a script which has to process 17k images. Now I got it working without hangups. Hangups do still happen, but I just use retry https://npmjs.org/package/retry Without retry I had up to 80 hangups for 17k requests. I am not sure if knox might want to use retry in the client. But it might be a good idea. |
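The retry approach can be sketched without any dependency. This is not the API of the `retry` package; `retryWithBackoff`, its option names, and the fake upload are all hypothetical and exist only to show the idea of retrying a failed request with an increasing delay.

```javascript
// Retry an async operation with exponential backoff.
// `operation` is any function taking a Node-style callback (err, result).
function retryWithBackoff(operation, options, done) {
  const retries = options.retries;         // retries allowed after the first attempt
  const baseDelayMs = options.baseDelayMs; // delay before the first retry
  let attempt = 0;

  function run() {
    operation(function (err, result) {
      if (!err) return done(null, result);
      if (attempt >= retries) return done(err);
      const delay = baseDelayMs * Math.pow(2, attempt); // 1x, 2x, 4x, ...
      attempt += 1;
      setTimeout(run, delay);
    });
  }
  run();
}

// Usage: a fake upload that hangs up twice before succeeding.
let calls = 0;
function flakyUpload(callback) {
  calls += 1;
  if (calls < 3) return callback(new Error('socket hang up'));
  callback(null, 'uploaded');
}

retryWithBackoff(flakyUpload, { retries: 5, baseDelayMs: 10 }, function (err, result) {
  console.log(err, result, 'after', calls, 'attempts');
});
```

Retrying hides the hangups rather than explaining them, so it is best combined with logging of how often a retry was needed.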
I think Node.js itself maintains a connection pool, and that's part of what the That isn't actually the same problem as in this thread, which discusses "socket hang up" errors. See the OP. That is a classic error when you set an incorrect Content-Length; I'd try asking the multiparty maintainers. Maybe you're trying to upload something smaller than 5 MB? (Not sure how multiparty works.) Sorry to hear you're still having this problem. I wonder if they fixed it in later versions of Node? It'd be worth trying on Node.js 0.10.4 if you can, or at least 0.8.20 per above. |
The crash was our fault. We simply forgot to listen for an error event. I will see if I can get things on 0.8.20, but for now we have something written by one of our guys, https://github.com/jergason/intimidate, which retries with back-off. We hit s3 so often I would not be surprised to see failed requests every so often but this has worked well so far. |
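Why a missing error listener turns a hangup into a crash can be shown with a bare `EventEmitter`. This is a sketch using stand-in emitters, not real knox or `http` request objects, although `http.ClientRequest` follows the same rule.

```javascript
const EventEmitter = require('events');

// No 'error' listener: emit() throws the error instead of delivering it.
const unguarded = new EventEmitter();
try {
  unguarded.emit('error', new Error('socket hang up'));
} catch (err) {
  // Outside a try/catch this would be an uncaught exception and end the process.
  console.log('would have crashed:', err.message);
}

// With a listener attached, the error is delivered to the handler.
const guarded = new EventEmitter();
guarded.on('error', function (err) {
  console.log('handled:', err.message);
});
guarded.emit('error', new Error('socket hang up'));
```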
Thanks @domenic. I am following up on this issue on the multiparty repo |
@addisonj that's awesome! Submit a pull request to add it to our "Beyond Knox" section in the readme :). |
Per discussion in issue Automattic#116, adding a blurb about intimidate, a wrapper for retriable uploads with exponential backoff, to the Beyond Knox section of the readme.
Anyone have a good way to reproduce this? We are getting this issue occasionally (i.e. once every few days on a lightly used production machine), but I can't reproduce deliberately in any environment. I will try some of the fixes here and see if they end the issue, but it would be great to have a way to make sure. |
@stalbot if you can create a reliable reproduce I will jump all over fixing this. |
Disabling the http agent seems like a way for some people to shoot themselves in the foot. If you're not careful and make too many requests (e.g., doing a
And wonder WTF.... Therefore I made sure the agent was re-enabled by doing (e.g., |
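Another way to stay under the file descriptor limit, with or without the agent, is to cap how many requests are in flight at once. A minimal sketch follows; `runLimited` and the fake tasks are hypothetical names for illustration, and a real task would issue the S3 request.

```javascript
// Run `tasks` (functions taking a Node-style callback) with at most `limit` in flight.
function runLimited(tasks, limit, done) {
  const results = new Array(tasks.length);
  let next = 0;         // index of the next task to start
  let active = 0;       // tasks currently running
  let finished = false; // set once `done` has been called

  if (tasks.length === 0) return done(null, results);

  function startNext() {
    while (!finished && active < limit && next < tasks.length) {
      const index = next++;
      active += 1;
      tasks[index](function (err, result) {
        if (finished) return;
        active -= 1;
        if (err) { finished = true; return done(err); }
        results[index] = result;
        if (next === tasks.length && active === 0) {
          finished = true;
          return done(null, results);
        }
        startNext();
      });
    }
  }
  startNext();
}

// Usage: 100 fake uploads, never more than 10 at a time.
let inFlight = 0;
let peak = 0;
const tasks = [];
for (let i = 0; i < 100; i++) {
  tasks.push(function (callback) {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    setTimeout(function () { inFlight -= 1; callback(null, i); }, 1);
  });
}
runLimited(tasks, 10, function (err, results) {
  console.log(err, results.length, 'peak in flight:', peak);
});
```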
I assume that's on a Mac, which has a pretty horrible global limit on file descriptors? |
Yes that was on a Mac. |
@dweinstein just bump your limit for maximum # of sockets |
I can reproduce this by creating 1000 empty text files
is there a limit for concurrency?
|
The only limit I've seen documented is the one mentioned above. But that pertains to reusing a connection, which I'm not doing. |
FWIW s3 has a concurrent access limit on like-named prefixes. I can't find the thread but apparently due to how they store things if you have say "foo-{1,10000}" and try say 1500 concurrent requests many will fail, but if you have "{1,100000}-foo" it should be fine. This is screwing us pretty hard right now, looking at replacing s3 altogether and just using Riak for our primary access and storing in s3 as a backup |
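If the prefix theory above holds, one workaround is to make the leading characters of each key vary. The sketch below prepends a short hash; `spreadKey` is a hypothetical helper, and whether S3 actually throttles this way is not confirmed anywhere in this thread.

```javascript
const crypto = require('crypto');

// Prepend a short hash so keys with a common name ("foo-1", "foo-2", ...)
// no longer share the same leading characters.
function spreadKey(key) {
  const prefix = crypto.createHash('md5').update(key).digest('hex').slice(0, 4);
  return prefix + '-' + key;
}

['foo-1', 'foo-2', 'foo-3'].forEach(function (key) {
  console.log(key, '->', spreadKey(key));
});
```

The mapping is deterministic, so the stored key can always be recomputed from the original name.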
I found this http://stackoverflow.com/questions/27392923/uploading-to-s3-with-node-knox-socket-hang-up when searching for a solution to this problem. Looks like a lot of people have this problem with our good ol' friend S3.
var req = client.putStream(res, elem._id, headers, function (err, s3res) {
  if (err) console.log(err);
  console.log(s3res);
}).end();
This was the last comment. Worked for the guy who posted the solution. |
The following solved it for me:
Some suggest |
I know this isn't StackOverflow, but for anyone coming into this issue in the future, make sure you call IE this will return a socket hangup...
and this will not
A pretty obvious coding error, but one that I usually end up running into when I add Knox to a new project. 😆 |
FWIW, I've had this problem when I was passing the request object into a callback function in Loopback. Long story short, the only thing that "fixed" the problem was to do |
<3 node
looking into it, seems ridiculous to accuse s3 here, but it wouldn't surprise me either way