Fix LXD lowering host bridge MTU #11919
Merged
When dealing with reasonably complex bridges that are not LXD-managed (in my case, a bridge using VLAN filtering), it's not unusual for some VLANs to use a different MTU than others.
The problem is that LXD currently slams the same MTU on both the host-side and guest-side devices.
This would feel like a good idea overall, were it not for the fact that the Linux kernel will automatically lower the bridge MTU whenever a device with an MTU lower than the bridge's current value is added to it.
In practice what this means is that starting a single instance which has a network interface using an MTU lower than the current one on the bridge (1500 vs 9000 in my case) will cause the bridge MTU to get lowered and break a whole bunch of stuff.
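To make the failure mode concrete, here is a minimal Go sketch of the behavior described above. It is a simplified model, not kernel or LXD code, and `bridgeMTUAfterAdd` is a hypothetical name used only for illustration.

```go
package main

import "fmt"

// bridgeMTUAfterAdd is a hypothetical helper that models the behavior
// described above: adding a port whose MTU is lower than the bridge's
// current MTU pulls the bridge MTU down to the port's value.
func bridgeMTUAfterAdd(bridgeMTU, portMTU uint32) uint32 {
	if portMTU < bridgeMTU {
		return portMTU
	}

	return bridgeMTU
}

func main() {
	bridge := uint32(9000)

	// A port with a matching MTU leaves the bridge untouched.
	bridge = bridgeMTUAfterAdd(bridge, 9000)
	fmt.Println("after adding a 9000 port:", bridge)

	// A single port with MTU 1500 drags the whole bridge down.
	bridge = bridgeMTUAfterAdd(bridge, 1500)
	fmt.Println("after adding a 1500 port:", bridge)
}
```

In this model, one instance with a 1500 MTU interface is enough to lower the bridge from 9000 to 1500, which is the breakage this change is meant to avoid.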
Instead, what we need to do is ensure that the host-side device always has the same MTU as the bridge it's being put into, then set the user-requested MTU on the guest-side device.
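The selection rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `vethMTUs` is a hypothetical name, and treating a requested MTU of 0 as "not set, inherit the bridge MTU" is an assumption made for the example.

```go
package main

import "fmt"

// vethMTUs is a hypothetical helper illustrating the rule described above.
// The host-side device always takes the bridge MTU, so attaching it can
// never lower the bridge. The guest-side device takes the user-requested MTU.
// Assumption: requestedMTU == 0 means the user set nothing, in which case
// the guest side inherits the bridge MTU.
func vethMTUs(bridgeMTU, requestedMTU uint32) (hostMTU, guestMTU uint32) {
	hostMTU = bridgeMTU

	guestMTU = requestedMTU
	if guestMTU == 0 {
		guestMTU = bridgeMTU
	}

	return hostMTU, guestMTU
}

func main() {
	// Bridge at 9000, instance NIC configured with mtu=1500.
	host, guest := vethMTUs(9000, 1500)
	fmt.Printf("host side: %d, guest side: %d\n", host, guest)
}
```

With the host side pinned to the bridge MTU, the bridge stays at 9000 no matter what individual instances request.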
This does cause a small behavior difference, though. As MTU stands for maximum "transmission" unit, the kernel only enforces it on egress, not on ingress. So while the current (buggy) implementation effectively ensures that a container with mtu=1500 will never receive a packet larger than 1500 bytes, the fixed logic technically doesn't prevent it.
With a host bridge MTU of 9000 and a container interface MTU of 1500, the container will now be able to receive packets of up to 9000 bytes, but it won't be able to send any response or any new packet larger than its configured 1500.
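The asymmetry can be modeled in a few lines of Go. This is only a toy model of the egress-only enforcement described above, not a description of the kernel's actual data path; the `iface` type and `canDeliver` function are hypothetical.

```go
package main

import "fmt"

// iface is a hypothetical stand-in for a network interface and its MTU.
type iface struct {
	name string
	mtu  int
}

// canDeliver models egress-only MTU enforcement as described above: the
// packet size is checked against the sender's MTU only, and the receiver's
// MTU is deliberately ignored.
func canDeliver(sender, receiver iface, size int) bool {
	_ = receiver // not consulted: no ingress check in this model
	return size <= sender.mtu
}

func main() {
	hostSide := iface{name: "host-side", mtu: 9000}
	guestSide := iface{name: "guest-side", mtu: 1500}

	// Host to container: a 9000-byte packet passes the host-side egress check.
	fmt.Println("host -> guest, 9000 bytes:", canDeliver(hostSide, guestSide, 9000))

	// Container to host: the same size is rejected by the guest's own MTU.
	fmt.Println("guest -> host, 9000 bytes:", canDeliver(guestSide, hostSide, 9000))

	// Container to host within its configured MTU is fine.
	fmt.Println("guest -> host, 1500 bytes:", canDeliver(guestSide, hostSide, 1500))
}
```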
This certainly isn't ideal but the current behavior is also a grey area as we have a bridge that can be forced back to its full MTU of 9000 while retaining some devices inside of it with an MTU of 1500. This most likely results in packets being dropped somewhere in the kernel, rather than the physical version of this setup where they would just get truncated to match the MTU.