Move message processing on workers into separate thread #103

DasSkelett · 2023-08-05T17:46:52Z

Problem

Our wgkex workers reconnect to the MQTT broker very frequently, something like every 40 seconds on average.
This appears to be because whenever a burst of messages comes in, the worker is so busy handling them and the heavy netlink processing, all on the main MQTT loop thread, that it may not be able to send the MQTT ping request when it is due (every 5 seconds in our current configuration).
If the pings are not sent (and answered) right away, the MQTT client deems the connection faulty, closes it (TCP RST) and reconnects to the broker with a new one.
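As a rough illustration of what the issue title proposes, the loop thread could hand each message to a queue and let a separate thread do the heavy work, so pings can still go out on time. This is only a sketch under assumptions: `handle_message`, `work_queue` and `start_worker` are hypothetical names, not taken from the wgkex code base, and `on_message` just follows the usual Paho callback shape `(client, userdata, msg)`.

```python
import queue
import threading

# Hypothetical stand-ins, not wgkex code: `handle_message` represents the
# netlink-heavy processing, `processed` only records what was handled.
work_queue: queue.Queue = queue.Queue()
processed = []


def handle_message(topic: str, payload: bytes) -> None:
    # The slow work would happen here, off the MQTT loop thread.
    processed.append((topic, payload))


def on_message(client, userdata, msg) -> None:
    # Runs on the MQTT loop thread: enqueue and return immediately so the
    # loop stays free to send and receive pings.
    work_queue.put((msg.topic, msg.payload))


def worker_loop() -> None:
    # Runs on the separate thread and drains the queue one item at a time.
    while True:
        item = work_queue.get()
        try:
            if item is None:  # sentinel: stop the worker
                return
            handle_message(*item)
        finally:
            work_queue.task_done()


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=worker_loop, daemon=True)
    thread.start()
    return thread
```

With a single consumer thread, messages are still processed in arrival order; only the MQTT network loop is decoupled from the processing time.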

I believe eclipse-paho/paho.mqtt.python#328 also contributes to the problem showing up like this: as far as I can tell, Paho MQTT only waits one loop iteration after the keepalive (= ping interval) timer has expired, and giving it a bit more time would make sense. (The MQTT broker waits 1.5 * keepalive after the last ping before it cuts the connection, i.e. 0.5 * keepalive after it should have received one.)
That said, due to the burstiness of our traffic and the amount of processing required for handling each message, this extra 0.5 * keepalive would not help us much, especially not with a keepalive of 5.
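To put numbers on the timing described above, here is a small sketch of the arithmetic. The function name `keepalive_deadlines` is made up for illustration; the 1.5 factor is the broker-side limit mentioned in the text.

```python
def keepalive_deadlines(keepalive_s: float) -> dict:
    """Seconds after the last ping at which each deadline is reached."""
    return {
        # The client is expected to send its next ping by this point.
        "ping_due": keepalive_s,
        # The broker drops the connection if nothing arrived by this point.
        "broker_disconnect": 1.5 * keepalive_s,
        # Slack between a ping being due and the broker giving up.
        "grace": 0.5 * keepalive_s,
    }


for keepalive in (5, 10, 20):
    print(keepalive, keepalive_deadlines(keepalive))
```

For a keepalive of 5 this gives a grace period of only 2.5 seconds, which a single burst of netlink-heavy messages can easily exceed.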

To investigate, I also played around with the keepalive time. Bumping it should reduce the chance that a ping falls due right after a burst, and increase the average time the worker has for working through each burst before the ping is due.
10 seconds didn't make any difference; 20 seconds helped a bit, reducing the reconnects to maybe 1/3rd or 1/5th.

All MQTT packets in black, RSTs in red. From a packet capture on docker04 where Mosquitto is running. Notice how the resets (4, we have 4 workers/gateways) always come after the bursts.

Packet capture on gw04 only, keepalive at 10:

See also http://www.steves-internet-guide.com/mqtt-keep-alive-by-example/
