adr: Un-Ordered Transaction Inclusion #18553

Merged on Dec 5, 2023 (23 commits). Showing changes from 19 commits.

177 changes: 177 additions & 0 deletions docs/architecture/adr-069-unordered-account.md
@@ -0,0 +1,177 @@
# ADR 070: Un-Ordered Transaction Inclusion

## Changelog

* Dec 4, 2023: Initial Draft

## Status

Proposed

## Abstract

We propose a way to provide replay-attack protection without enforcing transaction order; it doesn't use nonces at all. This makes un-ordered transaction inclusion possible.

## Context

As of today, the nonce value (account sequence number) prevents replay attacks and ensures that transactions from the same sender are included in blocks and executed in sequential order. However, this makes it tricky to reliably send many transactions from the same sender concurrently. IBC relayers and crypto exchanges are typical examples of such use cases.

## Decision

We propose adding a boolean field `unordered` to the transaction body to mark "un-ordered" transactions.

Un-ordered transactions bypass the nonce rules and follow the rules described below instead. In contrast, the default ordered transactions are not affected by this proposal; they follow the nonce rules as before.

When an un-ordered transaction is included in a block, its transaction hash is recorded in a dictionary. New transactions are checked against this dictionary for duplicates. To prevent the dictionary from growing indefinitely, the transaction must specify a `timeout_height` for expiration, so it is safe to remove its hash from the dictionary once it has expired.

The dictionary can be implemented simply as an in-memory golang map. A preliminary analysis shows that the memory consumption won't be too big: for example, `32M = 32 * 1024 * 1024` bytes can hold the raw 32-byte hashes for 1024 blocks where each block contains 1024 unordered transactions. For safety, we should limit the range of `timeout_height` to prevent very long expirations, and limit the size of the dictionary.

Contributor:
A short timeout_height window also ensures a tighter bound on replay protection.
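
The memory estimate above can be reproduced with a few lines of Go. This is a rough sketch under stated assumptions: it counts only the raw 32-byte hash and, optionally, the 8-byte expiry height per entry, and it ignores Go map bookkeeping overhead, so real usage would be higher. The name `estimateBytes` is hypothetical and not part of the proposal.

```go
package main

import "fmt"

const (
	hashSize   = 32 // bytes per tx hash (the ADR uses a [32]byte hash)
	expirySize = 8  // bytes per uint64 expiry height stored as the map value
)

// estimateBytes returns a lower bound on the bytes needed to track every
// un-ordered tx hash for `blocks` blocks with `txsPerBlock` txs each.
// Go map bookkeeping overhead is deliberately ignored (assumption).
func estimateBytes(blocks, txsPerBlock uint64, includeExpiry bool) uint64 {
	perEntry := uint64(hashSize)
	if includeExpiry {
		perEntry += expirySize
	}
	return blocks * txsPerBlock * perEntry
}

func main() {
	const mib = 1024 * 1024
	fmt.Println(estimateBytes(1024, 1024, false)/mib, "MiB for hashes only")
	fmt.Println(estimateBytes(1024, 1024, true)/mib, "MiB including expiry heights")
}
```

Counting the expiry heights as well raises the figure from 32 MiB to 40 MiB for the same window, which is one reason to cap both the TTL and the dictionary size.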


### Transaction Format

```protobuf
message TxBody {
...

  bool unordered = 4;
}
```

### `DedupTxHashManager`

```golang
// ExpireCheckInterval is the number of blocks between expiration checks.
// Increase it to reduce how often we check for expired hashes.
const ExpireCheckInterval = 1

// DedupTxHashManager contains the tx hash dictionary for duplicates checking,
// and expires them as the block number progresses.
type DedupTxHashManager struct {
	// mutex guards hashes (from the "sync" package); CheckTx reads it concurrently
	mutex sync.RWMutex
	// tx hash -> expire block number
	// for duplicates checking and expiration
	hashes map[TxHash]uint64
}

func (dtm *DedupTxHashManager) Contains(hash TxHash) (ok bool) {
	dtm.mutex.RLock()
	defer dtm.mutex.RUnlock()

	_, ok = dtm.hashes[hash]
	return
}

func (dtm *DedupTxHashManager) Size() int {
	dtm.mutex.RLock()
	defer dtm.mutex.RUnlock()

	return len(dtm.hashes)
}

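// Add records the hash; ok is false if the hash was already present.
func (dtm *DedupTxHashManager) Add(hash TxHash, expire uint64) (ok bool) {
	dtm.mutex.Lock()
	defer dtm.mutex.Unlock()

	if _, exists := dtm.hashes[hash]; exists {
		return false
	}
	dtm.hashes[hash] = expire
	return true
}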

// EndBlock removes expired tx hashes; it needs to be wired into the ABCI life cycle.
func (dtm *DedupTxHashManager) EndBlock(ctx sdk.Context) {
	height := uint64(ctx.BlockHeight())
	if height%ExpireCheckInterval != 0 {
		return
	}

	dtm.mutex.Lock()
	defer dtm.mutex.Unlock()

	for k, expire := range dtm.hashes {
		if height > expire {
			delete(dtm.hashes, k)
		}
	}
}
```

Member:
How will this run in endblock? Could we not do a quick check against current time and remove everything behind it? It could be a `DeleteUntil` that checks and removes all previous hashes. This could be done async as well if we wanted.

Collaborator Author:
> how will this run in endblock?

Wire into baseapp.

Yeah, it can run in the background; the end blocker could be a trigger.

It could grab a read lock first and iterate to find the expired hashes, then grab the write lock to do the deletion. That should have maximum parallel performance.

Member:
That sounds good. Since everything under the current height won't be touched by the state machine, do we need locks?

Collaborator Author (@yihuang, Dec 5, 2023):
> that sounds good. Since everything under current height wont be touched by the state machine, do we need locks?

We need locks because check tx runs concurrently, and we also have a background purging loop now.
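
The two-phase purge suggested in the review discussion (scan under a read lock, then delete under the write lock) could look roughly like the sketch below. It is self-contained and does not depend on the SDK; `hashManager`, `newHashManager`, and `Purge` are hypothetical names standing in for `DedupTxHashManager` and its expiration hook, not the merged implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// TxHash mirrors the 32-byte hash type used by the ADR.
type TxHash [32]byte

// hashManager is a hypothetical stand-in for DedupTxHashManager.
type hashManager struct {
	mu     sync.RWMutex
	hashes map[TxHash]uint64 // tx hash -> expiry height
}

func newHashManager() *hashManager {
	return &hashManager{hashes: make(map[TxHash]uint64)}
}

// Add records a hash; it reports false if the hash was already present.
func (m *hashManager) Add(h TxHash, expire uint64) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.hashes[h]; ok {
		return false
	}
	m.hashes[h] = expire
	return true
}

// Contains reports whether the hash is currently tracked.
func (m *hashManager) Contains(h TxHash) bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	_, ok := m.hashes[h]
	return ok
}

// Purge removes every hash whose expiry height is below `height` and returns
// how many were removed. Phase 1 scans under the read lock so concurrent
// Contains calls are not blocked; phase 2 takes the write lock only to delete.
func (m *hashManager) Purge(height uint64) int {
	m.mu.RLock()
	var expired []TxHash
	for h, expire := range m.hashes {
		if height > expire {
			expired = append(expired, h)
		}
	}
	m.mu.RUnlock()

	if len(expired) == 0 {
		return 0
	}

	m.mu.Lock()
	defer m.mu.Unlock()
	removed := 0
	for _, h := range expired {
		// Re-check under the write lock: the map may have changed in between.
		if expire, ok := m.hashes[h]; ok && height > expire {
			delete(m.hashes, h)
			removed++
		}
	}
	return removed
}

func main() {
	m := newHashManager()
	m.Add(TxHash{1}, 10)
	m.Add(TxHash{2}, 20)
	fmt.Println(m.Purge(15), m.Contains(TxHash{1}), m.Contains(TxHash{2}))
}
```

The re-check in the second phase matters because the read lock is released before the write lock is taken, so another goroutine may have modified the map in the gap.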

### Ante Handlers

Bypass the nonce decorator for un-ordered transactions.

```golang
func (isd IncrementSequenceDecorator) AnteHandle(ctx sdk.Context, tx sdk.Tx, simulate bool, next sdk.AnteHandler) (sdk.Context, error) {
	if tx.UnOrdered() {
		return next(ctx, tx, simulate)
	}

	// the previous logic
}
```

A decorator for the new logic.

```golang
type TxHash [32]byte

const (
	// MaxNumberOfTxHash * 32 bytes per hash = 128M max memory usage
	MaxNumberOfTxHash = 1024 * 1024 * 4

	// MaxUnOrderedTTL defines the maximum TTL (in blocks) an un-ordered tx can set
	MaxUnOrderedTTL = 1024
)

type DedupTxDecorator struct {
	m *DedupTxHashManager
}

func (dtd *DedupTxDecorator) AnteHandle(ctx sdk.Context, tx sdk.Tx, simulate bool, next sdk.AnteHandler) (sdk.Context, error) {
	// only apply to un-ordered transactions
	if !tx.UnOrdered() {
		return next(ctx, tx, simulate)
	}

	if tx.TimeoutHeight() == 0 {
		return ctx, errorsmod.Wrap(sdkerrors.ErrLogic, "unordered tx must set timeout-height")
	}

	if tx.TimeoutHeight() > uint64(ctx.BlockHeight())+MaxUnOrderedTTL {
		return ctx, errorsmod.Wrapf(sdkerrors.ErrLogic, "unordered tx ttl exceeds %d", MaxUnOrderedTTL)
	}

	// check for duplicates, both in CheckTx and when the tx is included in a block
	if dtd.m.Contains(tx.Hash()) {
		return ctx, errorsmod.Wrap(sdkerrors.ErrLogic, "tx is duplicated")
	}

	if !ctx.IsCheckTx() {
		// a new tx included in the block, add the hash to the dictionary
		if dtd.m.Size() >= MaxNumberOfTxHash {
			return ctx, errorsmod.Wrap(sdkerrors.ErrLogic, "dedup map is full")
		}
		dtd.m.Add(tx.Hash(), tx.TimeoutHeight())
	}

	return next(ctx, tx, simulate)
}
```
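
The acceptance rules applied by the decorator can also be exercised in isolation from the SDK. The following is a minimal sketch, assuming a hypothetical `validateUnordered` helper in which the `seen` flag stands in for the dictionary lookup; the error values are illustrative and are not the SDK's registered errors.

```go
package main

import (
	"errors"
	"fmt"
)

// maxUnorderedTTL mirrors MaxUnOrderedTTL from the ADR.
const maxUnorderedTTL = 1024

// Illustrative error values (assumption: the real decorator wraps SDK errors).
var (
	errNoTimeout  = errors.New("unordered tx must set timeout-height")
	errTTLTooLong = errors.New("unordered tx ttl exceeds limit")
	errDuplicate  = errors.New("tx is duplicated")
)

// validateUnordered applies the ADR's rules to a single un-ordered tx.
// `seen` is a hypothetical stand-in for the dedup dictionary lookup.
func validateUnordered(currentHeight, timeoutHeight uint64, seen bool) error {
	switch {
	case timeoutHeight == 0:
		// Without an expiry the hash could never be purged.
		return errNoTimeout
	case timeoutHeight > currentHeight+maxUnorderedTTL:
		// Bounds how long a hash has to stay in the dictionary.
		return errTTLTooLong
	case seen:
		// Same hash already included and not yet expired: a replay.
		return errDuplicate
	}
	return nil
}

func main() {
	fmt.Println(validateUnordered(100, 0, false))
	fmt.Println(validateUnordered(100, 2000, false))
	fmt.Println(validateUnordered(100, 150, true))
	fmt.Println(validateUnordered(100, 150, false))
}
```

Only the last call passes: it sets a timeout within the TTL window and its hash has not been seen before.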

### EndBlocker

Wire up the `EndBlock` method of `DedupTxHashManager` into the application's ABCI life cycle.

Contributor:
We could simply purge the map during the FinalizeBlock hook that apps can set. There are many places we can purge; it can be EndBlock too.

Collaborator Author:
Maybe the Commit event; I changed the method name to `OnNewBlock`.

Contributor:
Yes, overall, my point is that I think this is a minor implementation detail in the grand scheme of things 👍


### Start Up

On start up, the node needs to refill the tx hash dictionary of `DedupTxHashManager` by scanning the last `MaxUnOrderedTTL` historical blocks for un-ordered transactions.

An alternative design is to store the tx hash dictionary in a kv store, so there is no need to warm up on start up.
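
The warm-up scan described above might be sketched as follows. The `storedTx` and `storedBlock` types and the `rebuildDedup` function are hypothetical simplifications; how a node actually reads historical blocks is not specified by this ADR.

```go
package main

import "fmt"

// TxHash mirrors the 32-byte hash type used by the ADR.
type TxHash [32]byte

// maxUnorderedTTL mirrors MaxUnOrderedTTL from the ADR.
const maxUnorderedTTL = 1024

// storedTx and storedBlock are hypothetical, simplified views of historical data.
type storedTx struct {
	Hash          TxHash
	Unordered     bool
	TimeoutHeight uint64
}

type storedBlock struct {
	Height uint64
	Txs    []storedTx
}

// rebuildDedup refills the dictionary from the blocks within maxUnorderedTTL
// of latestHeight. Older blocks cannot hold live hashes, because a tx's
// timeout is at most maxUnorderedTTL above the height at which it was included.
func rebuildDedup(blocks []storedBlock, latestHeight uint64) map[TxHash]uint64 {
	hashes := make(map[TxHash]uint64)
	var oldest uint64
	if latestHeight > maxUnorderedTTL {
		oldest = latestHeight - maxUnorderedTTL
	}
	for _, b := range blocks {
		if b.Height < oldest || b.Height > latestHeight {
			continue
		}
		for _, tx := range b.Txs {
			// Skip ordered txs and hashes that have already expired.
			if !tx.Unordered || latestHeight > tx.TimeoutHeight {
				continue
			}
			hashes[tx.Hash] = tx.TimeoutHeight
		}
	}
	return hashes
}

func main() {
	blocks := []storedBlock{
		{Height: 5, Txs: []storedTx{{Hash: TxHash{1}, Unordered: true, TimeoutHeight: 8}}},
		{Height: 9, Txs: []storedTx{{Hash: TxHash{2}, Unordered: true, TimeoutHeight: 50}}},
		{Height: 10, Txs: []storedTx{{Hash: TxHash{3}, Unordered: false}}},
	}
	hashes := rebuildDedup(blocks, 10)
	fmt.Println(len(hashes), hashes[TxHash{2}])
}
```

In this example only the second transaction is restored: the first has already expired at height 10 and the third is an ordered transaction.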

## Consequences

### Positive

* Support un-ordered and concurrent transaction inclusion.

### Negative

* Start-up overhead to scan historical blocks.

Member:
Should we use the filesystem to write the known hashes at shutdown? On start we would populate from the file.

Collaborator Author (@yihuang, Dec 4, 2023):
As long as we write to disk, we need to handle commits and rollbacks the same way as for the other state machine state, to keep it consistent. So it is better to use the existing non-IAVL kvstore directly.
But we can cache the whole thing in memory on start up, so duplicate checking is fast.

Member (@tac0turtle, Dec 4, 2023):
We do something like this in store, but I'm not sure it's exposed to things outside of store currently. The main worry I have is around upgrades, where everyone stops and starts, and then the map would be empty. Looking back at the last x blocks would be best, but since we don't know if everyone has them, we could end up in a bad situation.

We could create a simple txhash store for store v1 that handles this and is exposed through baseapp.

Collaborator Author (@yihuang, Dec 5, 2023):
yeah, we need some support from storage layer here.

Contributor:
The consequences section should include the negative aspect of the nonce max number becoming 2^63, as mentioned in the existing comments.


## References

* https://github.com/cosmos/cosmos-sdk/issues/13009