-
-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Haraka is throwing errors when the client sends QUIT too early #1410
Comments
Hi Dave, nice to hear from you! (for other Haraka devs reading this - I used to work at the company using Knowing a bit about what goes on in hdmailer, it's related to handling We wouldn't process the QUIT from the client if the connection was paused. I thought I had put enough code into hdmailer to support this - I guess not. You can see what is required by looking into the attachment plugin: Alternatively just contact me personally (you have my email) and I'll take As a side note, I'd love to improve all of this hackery. But it's a big On Fri, Apr 8, 2016 at 6:21 PM, Dave Stibrany [email protected]
|
Hey Matt! Great to hear from you as well. I had to end up going in and making some changes to the plugin, because we now support html emails. So we need to try and decide if an attachment is a true attachment or if it's part of an html email and making that decision turns out to be pretty hard. Anyways, looks like this bug was introduced when I added that functionality. I'll try adding some more logging around pause/resume as you suggested. Thanks! Dave |
You could probably just ignore attachments with a content-id header, no? On Mon, Apr 11, 2016 at 10:37 AM, Dave Stibrany [email protected]
|
I think the problem was that some legitimate attachments still have a content-id, like when somebody drags an image into a gmail compose window, gmail treats it like an inline attachment. |
Isn't Content-Disposition more appropriate here. e.g. Content-Disposition: attachment ... |
A lot of clients seem to mark them as 'inline', iPhone's mail app for example. |
Not all clients add the content disposition either. Isn't email fun :)
|
Processing attachments correctly has been crazy. I had to write some heuristic to make a decision, and in cases where we're not sure, get both the html body and the attachments. That way we're at least somewhat guaranteed not to miss anything. |
You'll also want to upgrade to 2.8 as soon as you can because it handles the headers/file names more correctly.
|
Nice, thanks! |
Hey Matt, We're having this issue pop up again. We end up reading the client's QUIT too early and then throw errors. I'm able to reproduce this locally against the stock "attachment.js" plugin via netcat:
See the attached bad.email file. I'm pretending that the file is a zip so that it goes through the unarchive process and also note that I needed to add a \r\n after the final dot to get it to work in netcat. Do you have an suggestions on how to work around this? This issue only happens with 2 or 3 clients, but happens consistently for those clients. Some additional info: |
There's no gap between the headers and the body. Is that intentional?
…On Tue, Mar 21, 2017 at 11:15 AM, Dave Stibrany ***@***.***> wrote:
Hey Matt,
We're having this issue pop up again. We end up reading the client's QUIT
too early and then throw errors. I'm able to reproduce this locally against
the stock "attachment.js" plugin via netcat:
$ cat broken.txt | nc -i 1 localhost 25
See the attached bad.email file. I'm pretending that the file is a zip so
that it goes through the unarchive process and also note that I needed to
add a \r\n after the final dot to get it to work in netcat.
broken.txt <https://github.com/haraka/Haraka/files/858574/broken.txt>
Do you have an suggestions on how to work around this? This issue only
happens with 2 or 3 clients, but happens consistently for those clients.
Some additional info:
OS: Ubuntu 14.04
Node version: v4.6.0
Haraka version: 2.8.12
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1410 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAobY5aTCbUcrxt01yQ-XLOCbOlcIUbKks5rn-l0gaJpZM4IDbym>
.
|
Damn, the no gap was unintentional. With the gap the test case now totally passes both on the stock attachment.js and our own plugin. I'll see if I can find another test case that breaks our app. I pushed a change to how we do pause/resume of the stream, so I'll see if that makes this go away. |
In fairness it shouldn't crash Haraka, even with no gap. Is it doing that?
…On Tue, Mar 21, 2017 at 12:29 PM, Dave Stibrany ***@***.***> wrote:
Damn, the no gap was unintentional. With the gap the test case now totally
passes both on the stock attachment.js and our own plugin. I'll see if I
can find another test case that breaks our app.
I pushed a change to how we do pause/resume of the stream, so I'll see if
that makes this go away.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1410 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAobY9Rfy2PBHgjeqoxPtC5F-jrC16J6ks5rn_rjgaJpZM4IDbym>
.
|
Not a hard crash, just lots of errors:
and then we send a 421 Internal Server Error |
OK, but that error also shouldn't occur. I'll have a look anyway.
…On Tue, Mar 21, 2017 at 1:28 PM, Dave Stibrany ***@***.***> wrote:
Not a hard crash, just lots of errors:
[ERROR] [BA8E40D0-3D19-4233-8C54-386388B00007.1] [core] cmd_quit failed: Error: We are already running hooks! Fatal error!
[ERROR] [-] [core] undefinedError: We are already running hooks! Fatal error!0[ 'Error: We are already running hooks! Fatal error!',
' at Object.plugins.run_hooks (/home/vagrant/.nvm/versions/node/v4.6.0/lib/node_modules/Haraka/plugins.js:396:15)',
' at Connection.cmd_quit (/home/vagrant/.nvm/versions/node/v4.6.0/lib/node_modules/Haraka/connection.js:1279:13)',
' at Connection.process_line (/home/vagrant/.nvm/versions/node/v4.6.0/lib/node_modules/Haraka/connection.js:319:29)',
' at Connection._process_data (/home/vagrant/.nvm/versions/node/v4.6.0/lib/node_modules/Haraka/connection.js:464:18)',
' at Immediate._onImmediate (/home/vagrant/.nvm/versions/node/v4.6.0/lib/node_modules/Haraka/connection.js:680:37)',
' at processImmediate [as _immediateCallback] (timers.js:383:17)' ]
and then we send a 421 Internal Server Error
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1410 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAobY9lp0wFNP8qa7ijJPLKzIPIlziPlks5roAjFgaJpZM4IDbym>
.
|
Awesome, thanks Matt. |
Hey, I've managed to create another minimal test case that breaks both our app and the attachment plugin. The key was having an attachment size >= 16384 bytes, and sending a QUIT right after the final dot. If I reduce the attachment size down to 16383, everything works fine.
|
Bump on this, I think it might be a legitimate issue and am wondering if you guys can confirm. Running the above test case against the stock attachment plugin should reproduce it. |
@dstibrany - looks like an easy fix. Can you change this line https://github.com/haraka/Haraka/blob/master/plugins.js#L402 to
And let me know that it fixes this. |
@smfreegard That fix prevents all of the "'Error: We are already running hooks!" errors from being thrown, but the attachment is then not processed in my test case. Haraka reads the QUIT too early, then disconnects, at which point the data_post hooks do not run. I updated the test case; my previous version was using a content-type of application/pdf which was skipping the unarchive code. The new version has a content-type of application/zip; if you send it in to the attachment plugin as is (with the above fix you provided), it skips the attachment. If you remove one line of text from the body, it processes the attachment and works fine. This issue seems to be related to having a body size > 16k and having the client send the QUIT early. Let me know if there's any more info I can provide or any way that I can help. |
@dstibrany - Ok reading back, your 'test-case' is broken though. You're making the SMTP Client send the ending dot and then immediately pipelining a QUIT straight after which a real SMTP client would never do. What should happen in that case is that Haraka shouldn't deliver the message because the client didn't wait for the response, it shouldn't process the attachment because the remote end has disconnected, so it would be wasting it's time to do so. The 16Kb thing is probably what was already buffered by the kernel in a single packet, so splitting it would mean the QUIT would likely be received in a separate packet, allowing it to proceed through the attachment hooks before the QUIT is seen by Haraka. In your test case - if this were a real SMTP client, it should retry the message again because it doesn't know the message was delivered, because it never received the '250' in response to sending the ending '.', so it should be retried again. So - there's either a bug in your code, or, your attachment hooks are taking so long (e.g. usually this would be > 30s) that the remote clients are dropping the connections on a timeout. Looking at your logs should tell you this e.g.: Apr 11 04:07:06 mail1-ewh haraka[25302]: [NOTICE] [544C61F5-BE96-41AD-BBCD-895F18082605.1] [core] message mid="1820796063.60037570.1491898011346.JavaMail.rocketman@push-dispatcher4" size=89417 rcpts=1/0/0 delay=10.825 code=CONT msg="" or - if you don't have a message entry (this will only fire after hook_data_post completes): Apr 11 04:07:11 mail1-ewh haraka[25302]: [NOTICE] [544C61F5-BE96-41AD-BBCD-895F18082605.1] [core] disconnect ip=50.115.211.91 rdns="mta391s1.r.groupon.com" helo="mta391s1.r.groupon.com" relay=N early=N esmtp=Y tls=Y pipe=N errors=0 txns=1 rcpts=1/0/0 msgs=1/0/0 bytes=89417 lr="" time=20.197 |
I still think this is a bug, not a broken test case. Yes the client SHOULD
wait for the response to final dot, but even if it doesn't we shouldn't be
processing the QUIT command before processing attachments finishes,
assuming the connection is paused/resumed properly.
I haven't had time to look into it yet. Maybe this easter weekend.
…On Tue, Apr 11, 2017 at 4:40 AM, Steve Freegard ***@***.***> wrote:
@dstibrany <https://github.com/dstibrany> - Ok reading back, your
'test-case' is broken though. You're making the SMTP Client send the ending
dot and then immediately pipelining a QUIT straight after which a real SMTP
client would never do. What should happen in that case is that Haraka
shouldn't deliver the message because the client didn't wait for the
response, it shouldn't process the attachment because the remote end has
disconnected, so it would be wasting it's time to do so.
The 16Kb thing is probably what was already buffered by the kernel in a
single packet, so splitting it would mean the QUIT would likely be received
in a separate packet, allowing it to proceed through the attachment hooks
before the QUIT is seen by Haraka.
In your test case - if this were a real SMTP client, it should retry the
message again because it doesn't know the message was delivered, because it
never received the '250' in response to sending the ending '.', so it
should be retried again.
So - there's either a bug in your code, or, your attachment hooks are
taking so long (e.g. usually this would be > 30s) that the remote clients
are dropping the connections on a timeout. Looking at your logs should tell
you this e.g.:
Apr 11 04:07:06 mail1-ewh haraka[25302]: [NOTICE] [544C61F5-BE96-41AD-BBCD-895F18082605.1]
[core] message mid="1820796063.60037570.1491898011346.JavaMail.
***@***.***" size=89417 rcpts=1/0/0 *delay=10.825*
code=CONT msg=""
or - if you don't have a message entry (this will only fire after
hook_data_post completes):
Apr 11 04:07:11 mail1-ewh haraka[25302]: [NOTICE] [544C61F5-BE96-41AD-BBCD-895F18082605.1]
[core] disconnect ip=50.115.211.91 rdns="mta391s1.r.groupon.com" helo="
mta391s1.r.groupon.com" relay=N early=N esmtp=Y tls=Y pipe=N errors=0
txns=1 rcpts=1/0/0 msgs=1/0/0 bytes=89417 lr="" *time=20.197*
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1410 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAobYwYxQ6rKM5zQDRwOPgVSBUo1NH8Vks5ruzx_gaJpZM4IDbym>
.
|
Is it possible that the client is doing something bizarre? This issue does not happen with > 99% of our users, but happens consistently for those ones. Here's an example, the attachment processing took < 1s but the client still appears to be sending the QUIT early, or has already sent it.
|
@dstibrany - that log excerpt shows that the client immediately sends QUIT after the dot, there is not even a seconds delay whilst it waits for the response. Whatever is doing that is terminally busted IMO and isn't a real SMTP client. |
Yep it's super frustrating. Originally we had one customer whose client was behaving this way and I wrote it off as a one-off error that I'll never encounter again; then recently we had a second customer who exhibits the same behaviour so I started looking into it again. What do you mean by "isn't a real SMTP client", though? I'm fairly sure that these users aren't doing anything fancy and are using some sort of email client to send this stuff in (although maybe the client they're using is busted). |
The clients aren't doing anything wrong. See
https://tools.ietf.org/html/rfc2920 section 4 the example even shows
pipelining of the QUIT command.
We have a bug.
…On Tue, Apr 11, 2017 at 3:20 PM, Dave Stibrany ***@***.***> wrote:
Yep it's super frustrating. Originally we had one customer whose client
was behaving this way and I wrote it off as a one-off error that I'll never
encounter again; then recently we had a second customer who exhibits the
same behaviour so I started looking into it again.
What do you mean by "isn't a real SMTP client", though? I'm fairly sure
that these users aren't doing anything fancy and are using some sort of
email client to send this stuff in (although maybe the client they're using
is busted).
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1410 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAobY-ngjl32Ytpp0dyw_hrpPAk571Urks5ru9KQgaJpZM4IDbym>
.
|
See the linked PR. It requires changes to whatever plugin you're using to process attachments - a pause when you enter the end state, and a resume when you're done. Plus there's a minor core change which appears to help. |
Thanks for looking into that, I'll give this a try today. |
Nice, that worked! |
* ignore undefined socket.remoteAddress (#1846) similar to #239 fixes #1835 * URL to manual was 404, point to Plugins.md (#1844) * smtp forward dest split routing (#1847) * split transactions with different forward routing * use tls_ini_section_with_defaults (#1848) * use net_utils.tls_ini_section_with_defaults() fixes #1691 * assure conn/tran still exists before storing results (#1849) * chore(package): update redis to version 2.7.0 (#1854) https://greenkeeper.io/ * fix(package): update ipaddr.js to version 1.3.0 (#1857) https://greenkeeper.io/ * Fix modifier regexp (#1859) * doc typo in config file name (#1865) Actual this is typo in plugin source - it uses "ini" file from another plugin called rabbitmq . So, in fact, no one of the plugins uses config file with name "rabbitmq_amqplib.ini" . This cause error (default values for the plugin) if you trying use the official documentation. * fix(package): update async to version 2.2.0 (#1863) https://greenkeeper.io/ * replace util.inherits with class .. extends (#1862) * replace the discouraged util.inherits with `class Foo extends Bar` syntax * chunkemitter: remove Buffer.concat shim for node < 0.7.11 * fsync_writesteam: moved into ./outbound (only required once) * tls_socket: always set `isServer=true` (an recent merge made it conditional) * removed test for cleartext.getCipher, it *should* always be present * Added RabbitMQ vhost support (#1866) * Add RabbitMQ vhost support Virtual Host. Start with "/". Leave blank or not use if you don't want to use virtual hosts. * Get pool (#1868) * pass cfg object instead, simplifying get_pool * more consistency with variable naming and passing * add integration test: send message w/smtp_client * tls consistency cleanups (#1851) * tls_socket: add options.SNICallback - refresh options.secureContext after setting a different cert in options - plugins/tls: add plugin.loadPemDir - replace some + ‘ ‘ + patterns with string interpolation - replace some `function (a,b) {` patterns with `(a,b) => {` * add very basic tests for tls_socket * move SNI into plugins/tls * revert * tls: remove interim variables (#1871) * - remove interim variables * remove unused _controlReleased * Split delivery relay (#1875) * do not defer relaying clients with split transactions * disable naïve comment stripping (#1876) * disable naïve comment stripping fixes #1874 * tickle travis * fix(package): update async to version 2.3.0 (#1878) https://greenkeeper.io/ * chore(package): update ocsp to version 1.2.0 (#1879) https://greenkeeper.io/ * Added missing 'default' keyword in rspamd plugin. Score wasn't added … (#1856) * Added missing 'default' keyword in rspamd plugin. Score wasn't added properly to header * Logging envelope from / recipients in lowercase in order to avoid changing ES analyzer ( slower performance ) * handle case where OCSP server is unavailable (#1880) this likely fixes #1855 * Add .editorconfig (#1884) * Removing typo from 'relay.md' (#1886) * Fix for issue #1410 (#1887) * Fix for issue #1410 * Remove logging * Confirmed that it works * fixed auth_proxy socket error (#1894) #1893 * Implement the missing upgrade method on SMTPClient. (#1901) * fix(package): update async to version 2.4.0 (#1896) https://greenkeeper.io/ * 1861 binary log (#1902) * Binary log file The log file without that replacement, is interpreted as binary file log issue. * Binary log replace regexp binary replace with code replace * remove variable, line wrap closes 1861 * Outbound split and move into folder (#1850) * Move outbound to its own sub-folder * Start of splitting up outbound into separate files * Progress on splitting * More progress, not working yet * Outbound now seems to work. Tests still fail. * Add missing file * Fix outbound.js test * Fix lint errors * Outbound-Tests for RFC3464 bounce messages reworked (#1889) * fix path to fsync_writestream (./ vs ../) * make sure list_re is defined before access (#1903) * Check error code instead of number. Fixes GH-1895. (#1899) * Bring client pool up to where we were before split (#1915) * Bring client pool up to where we were before split * Add changes from #1848 * Merge in class/extends changes * Fix all logging to be [outbound] prefixed * Fix self logger * Fix net_utils call * fix(package): update sprintf-js to version 1.1.0 (#1906) * F/outbound client certificates (#1908) * Add tls_options and associated tests to queue/smtp_forward * Fix exception when no tls blacklist added. Add tests for smtp_client. * Add documentation for outbound client_certificate * Fix path parsing bug on Windows platform (#1919) * Attempt to add some belt and braces around FIN (#1916) * Attempt to add some belt and braces around FIN * Add logging on write failure * Allow outbound pools to be disabled (#1917) * Allow outbound pools to be disabled * Few changes after actually testing the code * Fix for shutting down clients * Set graceful shutdown off by default (#1927) * Set graceful shutdown off by default Still allow haraka -c <path> --graceful Also sets graceful shutdown to shutdown in max 2xChildren amount of time it sets to shutdown. * Allow gracefulShutdown() to always be graceful Makes the tests pass * Fix comment * Run dumped logs through plugins not console (#1929) * Don't blow the stack on qstat (#1930) * Don't blow the stack on qstat * Should fix another call stack blowing issue * Allow "Unknown Result" and Socket Error to Try Next Host (#1931) * Allow "Unknown Result" and Socket Error to Try Next Host If multiple hosts are listed for clamd and an "unknown result" is returned or a socket error occurs after connection, then attempt to try remaining hosts before returning DENYSOFT. * Remove trailing spaces * Clear off some done tasks (#1928) * Clear off some done tasks * remove completed or abandoned TODO items * Check pool exists before delete (#1937) * Fixes loading outbound under cluster. (#1934) I don't really understand why this stuff broke. It seems to be a bug in node's require() caching, frankly. When loading modules they would return {} before actually being loaded. This appears to be due to recursive requires. This might fix #1933 Please test * Fix cluster messaging for node v6+ (#1938) * Fix undefined variable platformDOT in hmail.js (#1943) * Update qfile.js * Update hmail.js * Update index.js * Fix queue not loaded for single process (#1941) * Fix queue not loaded for single process * Update server.js * Use the sunset keyword in version specific code blocks (#1939) Because `sunset` is the keyword we'll grep for (if we follow our instructions) when doing a semver major release. * Fix PROTOCOL logs that have intermediate \n chars (#1947) Most noted in SpamAssassin output * Load logger in a setImmediate call (#1948) * Bump haraka-results required version (#1949) * Bump haraka-results required version * Update package.json * Fix undefined FsyncWriteStream var (#1953) * Fix undefined FsyncWriteStream var * Update hmail.js * Be more strict in attachment filename matching (#1957) * Be more strict in attachment filename matching This copes with filename="d'euvre" * Add a simple test * Use punycode domain (#1944) * Support SMTPUTF8 properly * Use punycode for mail from too * Enable SMTPUTF8 support * Transaction is null there. Wait until it's set * Add needed utf8 changes to connection.js * punycode for the other addresses * Fix trailing space * Be consistent with todo.domain (#1960) * Minor typo fix (#1963) Thanks * Update package.json (#1968) * Fix unaquired socket errors in test suite (#1971) * Support the new address-rfc2822 (#1970) * Support the new address-rfc2822 New module supports Groups, which have no host method * Force version of rfc2822 required by the fix * Fix missing backslash * RabbitMQ: fix encoding of user and password when creating connection string (#1964) * fix(package): update ipaddr.js to version 1.4.0 (#1972) * Fix link redirection (#1975) * require node LTS version (6+) (#1958) * require node LTS version (6+) * appveyor: install node v6 * whitespace changes for eslint 4 compat (#1979) * whitespace changes for eslint 4 compat * one more ws change * fix(package): update iconv to version 2.3.0 (#1981) * fix(package): update async to version 2.5.0 (#1982) * Add node v8 (#1951) * Add node v8 * enable node v8 testing, permitting v8 failures * import Plugins.md from v3 branch (#1991) * remove logging line (#1985) the prefix is not required in config/plugins as the message states, but it is required for an NPM packaged plugin that requires another NPM packaged plugin and is then spurious * doc: add note that smtp_forward only supports STARTTLS (#1988) * Update qmail-queue.js (#1997) qmail-queue binary Adds a Received: header with bare LF. Haraka sets CRLF by default resulting in mixed line endings which break header processing in some e-mail clients. messagestream can convert CRLF to LF on the fly when calling qmail-send. * rebuild blacklist upon file change (#1990) previously just added new entries fixes 1987 * Remove spurious logs (#1989) * replace plugins/log.elasticsearch with npm packaged (#2004) * replace access with npm packaged haraka-plugin-access (#1992) * use WRITE_EXCL from haraka-constants (#2011) * fix(package): update js-yaml to version 3.9.0 (#2002) * build_todo() is part of the outbound/index.js api (#2016) * rename xclient.hosts to match plugin & docs (#2014) fixes #1265 * correct the config file name to relay.ini (#2012) * fix(package): update semver to version 5.4.0 (#2015) * require node 8 tests to pass (#2017) now that node 8.2+ is on Travis * fixes #1765 (#2013) * Fix auth plugin failure when re-selecting auth method (#2000) When sending the following sequence of commands, the auth plugin fails since it tries to decode an undefined value as base64: AUTH LOGIN Z2lyaXNo 334 UGFzc3dvcmQ6 AUTH LOGIN 500 Unrecognized command # at this point plugin has failed Fixes #1999 * Improve logging UUID tracking in Outbound (#2018) * prep release 2.8.14 (#1932) * prep for next release * update Changes * improve Changes formatting * make the rest of the versions h2 * update Changes
Haraka version
2.7.2
Observed behaviour
I occasionally see clients send 'QUIT' commands before we send '250 Message Queued' replies.
In those situations Haraka throws:
2016-04-08T13:54:53.00749 [ERROR] [EF24362D-72DC-4DBB-8B60-00DAD4A12BEF.1] [core] cmd_quit failed: Error: We are already running hooks! Fatal error! 2016-04-08T13:54:53.00778 [ERROR] [-] [core] undefinedError: We are already running hooks! Fatal error!0[ 'Error: We are already running hooks! Fatal error!', 2016-04-08T13:54:53.00779 ' at Object.plugins.run_hooks (/usr/local/lib/node_modules/Haraka/plugins.js:281:15)', 2016-04-08T13:54:53.00780 ' at Connection.cmd_quit (/usr/local/lib/node_modules/Haraka/connection.js:1193:13)', 2016-04-08T13:54:53.00781 ' at Connection.process_line (/usr/local/lib/node_modules/Haraka/connection.js:280:29)', 2016-04-08T13:54:53.00782 ' at Connection._process_data (/usr/local/lib/node_modules/Haraka/connection.js:423:18)', 2016-04-08T13:54:53.00783 ' at /usr/local/lib/node_modules/Haraka/connection.js:632:41', 2016-04-08T13:54:53.00783 ' at process._tickDomainCallback (node.js:486:13)' ]
Expected behaviour
Steps to reproduce
Here is the last bit of data from the client before the error occurs:
Thanks, let me know if there's any more info I can provide.
Dave
The text was updated successfully, but these errors were encountered: