Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

I'm online messages #2719

Closed
cmichi opened this issue May 29, 2019 · 10 comments
Closed

I'm online messages #2719

cmichi opened this issue May 29, 2019 · 10 comments
Assignees
Labels
J0-enhancement An additional feature request.
Milestone

Comments

@cmichi
Copy link
Contributor

cmichi commented May 29, 2019

This feature is intended to handle the case when validators go offline without malicious intent (i.e. without the intention of attacking the network). We don't want to slash them if there is no malicious intent.

Sending I'm online messages

  • Each validator gossips an I'm online heartbeat every n blocks session.
  • The heartbeat is a signed transaction, which was signed using the session key and includes the recent best block number of the local validators chain as well as the multiaddr (physical node address). It is sent as an Unsigned Transaction (no account nonce/signature/sender).
    • The multiaddr provides a secondary mechanism to look nodes up, since the DHT is flaky.

Interpreting I'm online messages

  • There will be a points system for the token mechanics, which we can also use for tracking if validators are online.
    • A validator is considered online if a threshold number of points is reached.
    • If a validator doesn't hit the threshold it is considered offline.
    • By using a points threshold we determine if a validator is online by taking anything into account which a validator contributed to the protocol (and not just I'm online messages).
    • The threshold is probably going to be 1 (i.e. any points at all signify the authority is online).
  • Create a new srml-im-online module which has a node_online hook.
  • Introduce a validate_transaction trait in the srml-im-online module which interprets and validates the unsigned transaction.
  • srml-im-online contains a map babe_session_key => Option<multiaddr>. Clear this map every time on_session_change() is called.
  • Validators may only produce a single im-online transaction for each session in each chain. Further im-omline messages are ignored once a valid message has been received in the current session for that validator. No slash is made.

Future stuff with the points system

  • Slash when a node is considered online according to the points system, but that node hasn't authored any blocks (or significantly less than expected) in a long time during which it should have authored blocks.
    • This "long time" is calculated by the likelihood that the node should have authored a block by now. If this threshold is below x then slash them. The further we get below this threshold the more we slash.
  • If a validator is considered offline it is ignored as a validator, but not slashed. Kinda like a protection mechanism.

Open questions

  • What of the above info is persisted on chain? Just the one map.
  • How often is the heartbeat sent? Every block? Once per session.
  • Where do we determine that a slash happens? In srml-im-online? Don't bother
@cmichi cmichi self-assigned this May 29, 2019
@rphmeier
Copy link
Contributor

rphmeier commented May 29, 2019

Each validator gossips an I'm online heartbeat every n block

It should be triggered somehow, since it's not supposed to be arbitrary n, but rather all synchronized to the start of a BABE epoch ( == session)

How often is the heartbeat send? Every block?

See above.

When are token mechanics implemented? The interpretation of I'm online messages can't be finished until the points system is available.

Not entirely, but the logic of "do something if X says they're online" can be done, with "do something" being a trait parameter.

The exact percentage of how much we slash:
…in case of duplicate I'm online messages.
…in case of I'm online messages, but no blocks authored for a sufficiently long time

I would mark this as beyond the scope of the goal of getting information about those who say they're online into the runtime. The separation of architecture here will make things easier to audit.

Where do we determine that a slash happens? In srml-im-online?

Definitely not. It should be a trait associated type of im_online::Trait which in practice will be wired up to a Rewards system.

If more than one I'm online message arrives for a block they are slashed.

How do you check this, except to see if there are different signatures? And note that ed25519 signatures are deterministic, so with the same key, you always get the same signature for the same message.

@rphmeier
Copy link
Contributor

I also don't know why the BABE module would be responsible for handling the incoming I'm Online messages if there is an srml-im-online module. We want this functionality to be reusable within Substrate. BABE should definitely have a hook where it can inform other modules that someone has authored a block.

@rphmeier rphmeier added the J0-enhancement An additional feature request. label May 29, 2019
@cmichi cmichi added this to the 2.0-alpha milestone May 29, 2019
@soulofamachine soulofamachine modified the milestones: 2.0-alpha, 2.0-beta Jun 3, 2019
@kianenigma kianenigma modified the milestones: 2.0-alpha, 2.0-kusama Jun 3, 2019
@tomaka
Copy link
Contributor

tomaka commented Jul 1, 2019

includes the recent best block number of the local validators chain as well as the multiaddr (physical node address)

It should also include the PeerId (aka. network key). We preferably don't want to just blindly dial a multiaddress.
And if we have the PeerId, we can also look into the DHT for a more recent list of multiaddresses for that PeerId in case the one on chain is unreachable.

@burdges
Copy link

burdges commented Jul 2, 2019

We do need a mechanism for validators to be offline after session key revocation, but then rejoining requires key change and the incurred delay.

I doubt mere "I'm online" messages address any security concerns however, so can you explain what purpose this serves?

Al and I designed BABE variants that actually do prove your presence by proving exactly how many blocks you missed. We never pushed them and folks opted for the simpler design closer to ouroboros.

Instead, we wanted validators to be slashed for being offline in proportion to the number of validators who were offline simultaneously since the interesting attacks require more participants.

If more than one I'm online message arrives for a block they are slashed.

What does this achieve?

How do you check this, except to see if there are different signatures?

You likely compare what is signed and the signer, but not sure the point. And Ed25519 might not stick around in the session key.

@rphmeier
Copy link
Contributor

rphmeier commented Jul 2, 2019

You likely compare what is signed and the signer

Well, if you sign 2 payloads a and b with a deterministic algorithm, s(a) == s(b) when a == b. We'd need a way to differentiate between "I'm online in Session X" messages.

@burdges
Copy link

burdges commented Jul 2, 2019

You must include the X regardless, so comparing the messages and signers works. Ed25519's determinism is mostly so that test vectors make sense. It gives you nothing else really.

@rphmeier
Copy link
Contributor

rphmeier commented Jul 2, 2019

@burdges I'm not sure that you follow the point, which is that you can't tell the difference between someone signing two "I'm online" messages and someone replaying the one that someone signed the first time.

@burdges
Copy link

burdges commented Jul 3, 2019

I'm saying the only differences come from the message and the signer. A comparison a.signature == b.signature failing conveys zero information while a.msg == b.msg && a.signer == b.signer failing does. There are many valid signatures for a valid message and signer in particular, some of which a third party can produce from a valid signature.

@kianenigma
Copy link
Contributor

@cmichi closed?

@cmichi
Copy link
Contributor Author

cmichi commented Jul 27, 2019

Closed and follow-up integration ToDos are in #3223.

@cmichi cmichi closed this as completed Jul 27, 2019
@gnunicorn gnunicorn modified the milestones: 2.1, Polkadot Mar 4, 2020
melekes added a commit to melekes/substrate that referenced this issue May 29, 2023
Users should use DHT for discovering new nodes. The reason for adding external addresses was
unstable work of authority discovery (see paritytech#2719),
which is now stable. Hence we can safely remove `external_addresses`.

Refs https://github.com/paritytech/polkadot/issues/7181
paritytech-processbot bot pushed a commit that referenced this issue Jun 15, 2023
* [frame/im-online] remove `external_addresses` from heartbeats

Users should use DHT for discovering new nodes. The reason for adding external addresses was
unstable work of authority discovery (see #2719),
which is now stable. Hence we can safely remove `external_addresses`.

Refs https://github.com/paritytech/polkadot/issues/7181

* remove unused import

* run benchmark

* remove external_addresses from offchain NetworkState

* add missing fn to TestNetwork

* Revert "run benchmark"

This reverts commit a282042.

* update weights

* address @bkchr comments

* remove duplicate fn

* cleanup benchmarking.rs

* fix executor tests

* remove peer_id from hearbeat as well

#14251 (comment)

* remove MaxPeerDataEncodingSize

* change storage value type to `()`

#14251 (comment)

* scaffold storage migration

* no need to check the type actually

* remove unnecessary types from v0 mod

* add a test for migration

* expose Config types

+ pre_upgrade and post_upgrade working fn

* fix test

* replace dummy type with ConstU32

* add some comments to migration test

* fix comment

* respond to @bkchr comments

* use BoundedOpaqueNetworkState::default

intead of using default for each field
nathanwhit pushed a commit to nathanwhit/substrate that referenced this issue Jul 19, 2023
)

* [frame/im-online] remove `external_addresses` from heartbeats

Users should use DHT for discovering new nodes. The reason for adding external addresses was
unstable work of authority discovery (see paritytech#2719),
which is now stable. Hence we can safely remove `external_addresses`.

Refs https://github.com/paritytech/polkadot/issues/7181

* remove unused import

* run benchmark

* remove external_addresses from offchain NetworkState

* add missing fn to TestNetwork

* Revert "run benchmark"

This reverts commit a282042.

* update weights

* address @bkchr comments

* remove duplicate fn

* cleanup benchmarking.rs

* fix executor tests

* remove peer_id from hearbeat as well

paritytech#14251 (comment)

* remove MaxPeerDataEncodingSize

* change storage value type to `()`

paritytech#14251 (comment)

* scaffold storage migration

* no need to check the type actually

* remove unnecessary types from v0 mod

* add a test for migration

* expose Config types

+ pre_upgrade and post_upgrade working fn

* fix test

* replace dummy type with ConstU32

* add some comments to migration test

* fix comment

* respond to @bkchr comments

* use BoundedOpaqueNetworkState::default

intead of using default for each field
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
J0-enhancement An additional feature request.
Projects
None yet
Development

No branches or pull requests

7 participants