Sei remote sign hotfix: aggregate tcp chunks before unmarshal proto #1

zarkone · 2024-04-25T15:00:35Z

// tendermint/cometbft proposal:
type Proposal struct {
	Type      SignedMsgType
	Height    int64
	Round     int32
	PolRound  int32
	BlockID   BlockID
	Timestamp time.Time
	Signature []byte
}

// vs sei-tendermint proposal
type Proposal struct {
	Type            SignedMsgType
	Height          int64
	Round           int32
	PolRound        int32
	BlockID         BlockID
	Timestamp       time.Time
	Signature       []byte

	// this is a list, and can be very long...
	TxKeys          []*TxKey
	Evidence        *EvidenceList
	LastCommit      *Commit
	Header          Header
	ProposerAddress []byte
}

Since the sei-tendermint Proposal has TxKeys and other lists, it has variable length. It easily exceeds 1024 bytes if the block has a large number of txs. This is not a problem for canonical tendermint/cometbft implementations: due to its message structure, their Proposal has a fixed max length < 1024 (DATA_MAX_SIZE).

sei-tendermint, when it connects to a remote signer over tcp, sends the proposal split into chunks of DATA_MAX_SIZE (1024) bytes each, which more or less fits the expectations of tmkms. However, tmkms never tries to aggregate chunks. In fact, it is impossible for tmkms to implement aggregation properly without knowing the length beforehand, which is not provided by the tendermint protocol.
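To make the chunking concrete, here is a minimal sketch of how an encoded message ends up as several frames. `split_into_chunks` is a hypothetical helper written for illustration, not a function from sei-tendermint or tmkms, and the 1024 value is the DATA_MAX_SIZE quoted above.

```rust
/// Frame size assumed from the description above (1024 bytes).
const DATA_MAX_SIZE: usize = 1024;

/// Hypothetical helper: split an already encoded message into frames of at
/// most DATA_MAX_SIZE bytes each, the way the sender is described as doing.
fn split_into_chunks(encoded: &[u8]) -> Vec<Vec<u8>> {
    encoded
        .chunks(DATA_MAX_SIZE)
        .map(|chunk| chunk.to_vec())
        .collect()
}

/// Sizes of the frames a message of `encoded_len` bytes would be split into.
fn chunk_sizes(encoded_len: usize) -> Vec<usize> {
    let encoded = vec![0xAB_u8; encoded_len];
    split_into_chunks(&encoded).iter().map(|c| c.len()).collect()
}
```

For example, `chunk_sizes(2500)` gives two full frames and one short one (1024, 1024, 452), while a message of 1024 bytes or less fits in a single frame, which is the only case tmkms handled before this change.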

There might also be some confusion, because all implementations of tendermint send length-delimited protobufs, and tmkms also reads with a "length delimited" function. However, this only means that the protobuf msg is prefixed with its length, so that when tmkms reads 1024 bytes it knows which zeroes are payload and which need to be cut. In other words, it has nothing to do with multi-chunk payloads.
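As an illustration of what "length delimited" means here, the sketch below decodes the protobuf base-128 varint length prefix by hand and cuts the payload out of a zero-padded read buffer. Both function names are hypothetical and exist only for this example; tmkms itself relies on the `decode_length_delimited` call shown in the diff.

```rust
/// Decode a protobuf base-128 varint length prefix.
/// Returns (payload_len, prefix_len), or None if the prefix is incomplete
/// or longer than the 10 bytes a 64-bit varint can occupy.
fn read_length_prefix(buf: &[u8]) -> Option<(usize, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        // The low 7 bits carry data, least significant group first.
        value |= u64::from(byte & 0x7F) << (7 * i);
        // A clear high bit marks the last byte of the prefix.
        if byte & 0x80 == 0 {
            return Some((value as usize, i + 1));
        }
    }
    None
}

/// Cut the payload out of a read buffer that may be padded with zeroes.
/// Returns None if the buffer does not yet hold the whole payload.
fn payload_of(buf: &[u8]) -> Option<&[u8]> {
    let (len, prefix) = read_length_prefix(buf)?;
    buf.get(prefix..prefix.checked_add(len)?)
}
```

With a buffer of `[3, 0xAA, 0, 0xCC, 0, 0, 0, 0]`, the prefix says 3 bytes, so the zero in the middle is payload and the four trailing zeroes are padding. The prefix separates payload from padding inside one read; it does nothing to signal that more chunks will follow.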

This means that sei-tendermint simply doesn't account for the tcp remote signer, and it is impossible to make it work with tmkms without rewriting both and adding this custom protocol of "aggregate chunks until you get the full message length".

--
This code implements aggregation by trying to unmarshal the aggregated message each time it gets a new chunk. I don't think it is a good idea in the long run; however, the alternative would be to adjust both Sei and tmkms, rolling out a new length-aware protocol between them. I'm not sure how sufficient that is, and it definitely needs a discussion. The current solution is compatible with both cometbft/tendermint and sei-tendermint, but it is far less efficient than the original read implementation of tmkms.
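A stripped-down sketch of that retry loop, using only the standard library, looks roughly like this. `try_decode` is a stand-in for `proto::privval::Message::decode_length_delimited` (it only checks the length prefix and returns the raw payload), and `read_message` takes chunks from an iterator instead of a real connection; both are assumptions made for illustration, not the code in this PR.

```rust
const DATA_MAX_SIZE: usize = 1024;

/// Stand-in decoder: succeeds only once the buffer holds the varint length
/// prefix plus the whole payload it announces.
fn try_decode(buf: &[u8]) -> Result<Vec<u8>, String> {
    let mut len: usize = 0;
    for (i, &byte) in buf.iter().enumerate().take(5) {
        len |= usize::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            let start = i + 1;
            return buf
                .get(start..start + len)
                .map(|payload| payload.to_vec())
                .ok_or_else(|| "buffer underflow".to_string());
        }
    }
    Err("incomplete length prefix".to_string())
}

/// Append chunks until the aggregate decodes. A short chunk that still fails
/// to decode is treated as the end of a malformed message.
fn read_message<I>(chunks: I) -> Result<Vec<u8>, String>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut msg_bytes = Vec::new();
    for chunk in chunks {
        let chunk_len = chunk.len();
        msg_bytes.extend_from_slice(&chunk);
        match try_decode(&msg_bytes) {
            Ok(msg) => return Ok(msg),
            Err(e) if chunk_len < DATA_MAX_SIZE => {
                return Err(format!("malformed message packet: {}", e));
            }
            // Full-size chunk: assume the next chunk(s) complete the message.
            Err(_) => {}
        }
    }
    Err("connection closed mid-message".to_string())
}
```

The inefficiency mentioned above is visible here: every new chunk triggers a decode attempt over the whole aggregated buffer, so a message of n chunks is parsed up to n times.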

P.S.: Apart from a custom length-aware protocol, there is another option: implement grpc in tmkms, which seems to be supported by sei-tendermint.

zarkone · 2024-04-25T15:01:10Z

Tested locally on sei / hello; yet to be tested on testnet.

qezz · 2024-04-25T15:28:00Z

I briefly looked through it.

Couple of notes on how to try it on Sei Testnet

On Github stuff:

We probably won't merge it to main, but it's OK to have the PR for discussion purposes. (If we merge into main, we'll need to rebase on top of upstream next time, so these commits will be overwritten anyway)
Once tested, we should make a PR to the upstream.

On how to try it:
(Sent privately)

qezz

Looks good, let's see how it works

qezz · 2024-04-25T15:29:35Z

src/rpc.rs

+use tendermint_p2p::secret_connection::DATA_MAX_SIZE;
 use tendermint_proto as proto;

-// TODO(tarcieri): use `tendermint_p2p::secret_connection::DATA_MAX_SIZE`
-// See informalsystems/tendermint-rs#1356
-const DATA_MAX_SIZE: usize = 262144;
-


We need to confirm that this change doesn't affect other things, once Sei works

yes sure, makes sense! should I revert it for now then?

The reasoning behind this change is:
I might not see the full picture, but the p2p lib reads chunks of this size internally, so it doesn't actually make sense to have a larger buffer here. The comment says the same thing.

qezz · 2024-04-25T15:31:16Z

src/rpc.rs

+        let msg;
+
+        // fix for Sei: collect incoming bytes of Protobuf from incoming msg
+        loop {


Wonder if this logic should go to tendermint_proto, e.g. proto::privval::Message::decode()

maybe not to proto, since proto shouldn't care about transport. Rather to tendermint_p2p, as tmkms author mentioned.

qezz · 2024-04-25T15:37:13Z

src/rpc.rs

+            match proto::privval::Message::decode_length_delimited(msg_bytes.as_ref()) {
+                Ok(m) => {
+                    msg = m.sum;
+                    break;
+                }
+                Err(e) => {
+                    // if chunk_len < DATA_MAX_SIZE (1024) we assume it was the end of the message and it is malformed
+                    if chunk_len < DATA_MAX_SIZE {
+                        return Err(format_err!(
+                            ErrorKind::ProtocolError,
+                            "malformed message packet: {}",
+                            e
+                        )
+                        .into());
+                    }
+                    // otherwise, we go to start of the loop assuming next chunk(s)
+                    // will fill the message
+                }
+            }


So... if decode_length_delimited() fails, we add the new chunk to the data, and try again, correct?
Though if the chunk len is < DATA_MAX_SIZE, we fail?

Correct. The idea is: when we get a chunk that is < 1024, we treat it as the end of the message. If it is = 1024 and we can parse the aggregated bytes, that is the end of the message as well.

This is far from elegant; however, I don't know what else we can do given that tmkms can't know the full length of the incoming chunked message (see also the explanation in the description).

https://github.com/sei-protocol/sei-tendermint/blob/main/privval/secret_connection.go#L210-L219

I would add the GH URL as a comment to explain where this assumption came from. Those few lines explain everything:
https://github.com/cometbft/cometbft/blob/ffd2d3f9475b6f101cc1d4c5ff94a4d928db6bb4/p2p/conn/secret_connection.go#L201-L203

(also please use the cometbft url I pasted, rather than sei-tendermint :) )

(also) there is a grpc endpoint that tmkms will use some day; tony (main maintainer of tmkms) has a PR somewhere to make it work, but it's not there yet.

So this code path will be deprecated one day, but for now it is the main logic that needs to be fixed

qezz · 2024-04-25T15:37:48Z

src/rpc.rs

+                    break;
+                }
+                Err(e) => {
+                    // if chunk_len < DATA_MAX_SIZE (1024) we assume it was the end of the message and it is malformed


For a cleaned up version, I would remove any hardcoded numbers, i.e. 1024. The DATA_MAX_SIZE is enough

zarkone · 2024-04-25T15:55:56Z

We probably won't merge it to main, but it's OK to have the PR for discussion purposes. (If we merge into main, we'll need to rebase on top of upstream next time, so these commits will be overwritten anyway)

Makes sense! Even though it is compatible with the canonical protocol, it is an inefficient quickfix rather than a proper solution. The ideal solution for me would be gRPC support, but we should weigh whether it is worth spending time on implementing such a feature.

qezz · 2024-04-26T09:59:43Z

It works 🎉

mkaczanowski · 2024-04-29T07:23:23Z

noice, to be sure it works, I would play with connection resets / malforming the requests. That is, kill the SEI daemon a few times and check that tmkms doesn't get stuck in the loop and continues to work fine

plus let's test it on some testnets with canonical proposals

mkaczanowski · 2024-11-10T23:17:27Z

this is merged upstream (iqlusioninc repo)

zarkone requested a review from a team April 25, 2024 15:00

zarkone self-assigned this Apr 25, 2024

zarkone force-pushed the aggregate-messages-from-sei-tendermint branch from f38ca1c to 5ab22bb Compare April 25, 2024 15:17

qezz reviewed Apr 25, 2024

mkaczanowski reviewed Apr 29, 2024

zarkone mentioned this pull request Apr 29, 2024

Aggregate SecretConnection chunks with unmarshal protobuf retry iqlusioninc/tmkms#903

Merged

zarkone and others added 2 commits July 18, 2024 10:46

integration test with bigger Proposal

39b4013

rename buffer-overflow-proposal.bin to buffer-underflow-proposal.bin

bf95b8f

mkaczanowski closed this Nov 10, 2024
