-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
security/p2p: prevent peers who errored being added to the peer_set #9500
Conversation
if peer.GetRemovalFailed() { | ||
return ErrPeerRemoval{} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this check before calling PeerSet.Add
here. I would prefer to do it on line 813 of switch.go before initializing the peer to the individual reactors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will reordering this as you've recommended @cmwaters not change the effectiveness of the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by effectiveness? As in it will no longer fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this check happens anywhere earlier we introduce the exact same race condition we already heaving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can absolutely do it before and if this flag happens to be set we don't do anything but we also do have to do it here as well to be sure we are not in the same position as w/o the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this check happens anywhere earlier we introduce the exact same race condition we already heaving.
I see. I wasn't aware of that. Need to look in more closely then
Co-authored-by: Callum Waters <[email protected]>
p2p/switch.go
Outdated
// This is done to avoid the corner case of adding a peer when this error happens before the | ||
// AddPeer routine has finished. Setting this flag will prevent a peer from being added | ||
// to a node's peer set. | ||
peer.SetRemovalFailed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to do this here without the peer set being locked in the same way it is when calling GetRemovalFailed
here? (the GetRemovalFailed
call happens while the peer set's mutex is locked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had intuitively implemented with a lock in a previous version (although by setting a lock on the peer itself, not on the entire set). But I did not notice the attack being more successful so I opted to not add a mutex into the peer structure itself even though I still do a small window of opportunity for the removal to happen first and then the Add to grab a lock and add the peer before we set the removal failed flag. Thinking about it now, I could actually set the flag inside the Remove function above the line that returns false and not do a lock on the peer itself. I will do a local test with this change to examine the behaviour unless you have objections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on a local test I think it is better to put it inside the Remove. Will confirm on the testnet before pushing the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not change the behaviour much but is definetly more correct for this to be inside Remove. I moved it there now. Tested both locally and on DO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say we should go ahead and merge this. Next steps from here:
- Backport to
v0.37.x
andv0.34.x
, and add aCHANGELOG_PENDING.md
entry to each backport PR. - Run a large-scale testnet (a single test) on
v0.34.x
with this fix and check that things work as expected. - Cut v0.34.22.
- Keep the
v0.37.x
backport branch open until we've cut v0.37.0, then merge it. - Run another large-scale testnet on
v0.37.x
to check that things work before cutting v0.37.1.
Unless anyone has any objections there?
Thanks! I'll create an issue with the items you listed, I think they all make sense are definitely worth doing. Do you think we should do a large testnet test of whether the attacked is prevented? If so, we can add this item to the list and I'll need to patch the testnet scripts to be able to execute the attack. |
The main thing we'd need to know is whether the P2P network is still stable and that the fix doesn't meaningfully affect peer connectivity. I don't think it needs to be a long test. I assume that your previous tests showed that the fix worked in larger-scale testnets? |
I tested that a more aggressive attack does not bring the node down and that the attack is prevented. I did not run a 200node test with the attack. I also don;t think it would reveal additional information. But it was worth testing that when we multiple the attackers the node is still behaving correctly. So I agree with you, we should confirm that this fix does not break something else rather then attack a node within a bigger network. |
…9500) * Mark failed removal of peer to address security bug Co-authored-by: Callum Waters <[email protected]> (cherry picked from commit c0bdb24)
…9500) * Mark failed removal of peer to address security bug Co-authored-by: Callum Waters <[email protected]> (cherry picked from commit c0bdb24)
…backport #9500) (#9516) * security/p2p: prevent peers who errored being added to the peer_set (#9500) * Mark failed removal of peer to address security bug Co-authored-by: Callum Waters <[email protected]> (cherry picked from commit c0bdb24) * Changelong entry and added missing functions for implementations of Peer Co-authored-by: Jasmina Malicevic <[email protected]>
This is an elegant workaround. As Jasmina mentioned, while I agree we should assess the p2p layer as a whole in large testnet, I don't see how a larger environment would affect this form of attack or compromise the solution. |
) **tl;dr DOS mitigation migrated from [tendermint/tendermint/pull/9500](tendermint/tendermint#9500 Validated use LocalNet instructions at [doc/guides/localnet.md](https://github.com/pokt-network/pocket-core/blob/staging/doc/guides/localnet.md) Original PR description: ``` This work is a fix for a bug in the P2P layer. A node can be attacked via the p2p layer by saturating its incoming connection slots and not allowing the node to accept new conditions. This happens when an attacker continuously submits requests to connect with an erroneous message causing the incoming request to error before it has been accepted. The attacked node, tries to remove the peer from its peer set which silently fails (due to the peer not yet being in the peer set). The routine adding a peer into the peer set happens in parallel in the background and will add the peer after the error has been reported. This fix resolves the issue in the following way: We add a field removalAttemptFailed to the Peer datastructure. If removal of this peer fails, we set it to true. When adding a peer into the peer set, the Add function will return an ErrPeerRemoval error if this field was true and not add the peer. Note. This attack does not work if the config flag allow_duplicate_ips is set to false. ```
…216) * security/p2p: prevent peers who errored being added to the peer_set (backport tendermint#9500) (#87) * security/p2p: prevent peers who errored being added to the peer_set (tendermint#9500) * Mark failed removal of peer to address security bug * Added p2p bugfix
This work is a fix for a bug in the P2P layer.
A node can be attacked via the p2p layer by saturating its incoming connection slots and not allowing the node to accept new conditions. This happens when an attacker continuously submits requests to connect with an erroneous message causing the incoming request to error before it has been accepted. The attacked node, tries to remove the peer from its peer set which silently fails (due to the peer not yet being in the peer set). The routine adding a peer into the peer set happens in parallel in the background and will add the peer after the error has been reported.
This fix resolves the issue in the following way:
removalAttemptFailed
to thePeer
datastructure.Add
function will return anErrPeerRemoval
error if this field was true and not add the peer.Note. This attack does not work if the config flag
allow_duplicate_ips
is set tofalse
.PR checklist
CHANGELOG_PENDING.md
updated, or no changelog entry neededdocs/
) and code comments, or nodocumentation updates needed