
Refactor the SyncService to more closely implement the specification #552

Merged: 32 commits, May 26, 2021

Conversation

@tynes (Contributor) commented Apr 22, 2021

Description

Replaces #477

Implements the SyncService spec:
https://github.com/ethereum-optimism/specs/blob/main/l2-geth/transaction-ingestor.md

The spec was designed to prevent double ingestion or skipped ingestion of transactions. The function names in the spec should map 1:1 to the methods on the SyncService. The logic as implemented differs slightly from the spec in how it handles its break conditions, but is functionally equivalent.

  • The sequencer queries for transaction batches and compares the individual transactions to what it has locally (a first step toward reorgs based on L1)
  • The verifier syncs using transaction batches
  • Query-param-based calls to the DTL to query for L1 or L2 transactions (explicit rather than implicit DTL configuration)
  • Added an additional db index for the latest BatchIndex because that was simpler and less bug-prone

Adds a new config option:

  • ROLLUP_BACKEND - must be set to L1 or L2. This sets the query param used by the RollupClient when querying the DTL
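The backend selection above can be sketched as follows. This is a hedged illustration, not the real RollupClient code: the endpoint path and the `backend` query parameter name are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDTLQuery sketches how a rollup client might attach the backend
// query parameter when calling the DTL. The path and parameter name
// here are hypothetical; only the L1/L2 choice comes from the PR.
func buildDTLQuery(base, path, backend string) (string, error) {
	if backend != "l1" && backend != "l2" {
		return "", fmt.Errorf("ROLLUP_BACKEND must be L1 or L2, got %q", backend)
	}
	u, err := url.Parse(base + path)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("backend", backend)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, _ := buildDTLQuery("http://localhost:7878", "/transaction/index/0", "l1")
	fmt.Println(s)
}
```

Making the backend an explicit query parameter keeps the DTL stateless with respect to which chain the caller wants, instead of baking the choice into DTL configuration.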

Additional context
#477 was broken up; this is the PR that includes most of the logic around the SyncService refactor.

@changeset-bot commented Apr 22, 2021

🦋 Changeset detected

Latest commit: 112f0e8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:
@eth-optimism/l2geth (Patch)

@tynes force-pushed the l2geth/sync-spec branch from 61e4190 to 895cb81 on April 22, 2021 01:05
@tynes added the A-geth and C-feature (Category: features) labels and removed the P-DO-NOT-MERGE label on Apr 22, 2021
@tynes marked this pull request as ready for review on April 22, 2021 21:53
@tynes changed the title from "WIP: implement syncservice spec" to "geth: implement sync service spec" on Apr 22, 2021
l2geth/rollup/sync_service.go (outdated review thread, resolved)
@tynes tynes added the P-high label Apr 23, 2021
@snario snario removed the P-high label Apr 26, 2021
@gakonst gakonst self-assigned this Apr 28, 2021
@gakonst (Contributor) left a comment

I've rebased this to master and resolved the conflicts.

High-level logic looks good. I think we need more unit tests before we merge this, e.g. checks that batches get applied correctly, that the tip gets synced, and that the timestamps are correct after a mix of batch/queue transactions becomes available.

l2geth/cmd/utils/flags.go (outdated review thread, resolved)
l2geth/rollup/sync_service.go (outdated review thread, resolved)
l2geth/rollup/sync_service.go (review thread, resolved)
l2geth/rollup/sync_service.go (review thread, resolved)
```go
if err != nil {
	return fmt.Errorf("Cannot get enqueue transaction; %w", err)
}
if err := s.applyTransaction(tx); err != nil {
```
Contributor:

Should there be any validation on the transaction here, similar to ValidateAndApplySequencerTransaction?

Contributor Author (@tynes):

The enqueue transactions are validated by the L1 contracts, so we can assume that they have already been validated by this point.

Contributor:

Got it, let's add this as a comment for clarity. Something like: "We do not need to do validation checks here because they've already been done in the CTC" (with a link to the code).

l2geth/rollup/types.go Show resolved Hide resolved
Comment on lines +582 to +583:

```go
s.SetLatestL1Timestamp(ts)
s.SetLatestL1BlockNumber(bn.Uint64())
```

Contributor:

Is this desired behavior? Why? Maybe obvious, but the side effects on the block number and timestamp should be noted in a comment.

Contributor Author (@tynes):

If the incoming transaction's timestamp is greater than the latest timestamp, the latest timestamp is raised to the incoming transaction's timestamp. This prevents monotonicity errors.
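The update rule described above can be sketched as follows. This is an illustration with assumed names, not the real SyncService fields: the local values only ever move forward, so the sequence of observed timestamps stays monotonically non-decreasing.

```go
package main

import "fmt"

// syncState is a hypothetical stand-in for the SyncService's locally
// maintained L1 timestamp and block number.
type syncState struct {
	latestL1Timestamp   uint64
	latestL1BlockNumber uint64
}

// observe advances the local values only when the incoming transaction's
// timestamp is strictly ahead of the local one; older or equal
// timestamps leave the state untouched, preserving monotonicity.
func (s *syncState) observe(txTimestamp, txBlockNumber uint64) {
	if txTimestamp > s.latestL1Timestamp {
		s.latestL1Timestamp = txTimestamp
		s.latestL1BlockNumber = txBlockNumber
	}
}

func main() {
	s := &syncState{latestL1Timestamp: 100, latestL1BlockNumber: 10}
	s.observe(120, 12) // newer: both values advance
	s.observe(110, 11) // older: ignored
	fmt.Println(s.latestL1Timestamp, s.latestL1BlockNumber)
}
```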

l2geth/rollup/sync_service.go (review thread, resolved)
l2geth/rollup/sync_service.go (review thread, resolved)
Comment on lines 610 to 613:

```go
s.txFeed.Send(core.NewTxsEvent{Txs: txs})
// Block until the transaction has been added to the chain
log.Debug("Waiting for transaction to be added to chain", "hash", tx.Hash().Hex())
<-s.chainHeadCh
```

Contributor:

Just to clarify: is this the reason transactions go through the full txpool/miner codepath?

Contributor Author (@tynes):

They do not go through the txpool, but they do go through the miner. This blocks until the transaction sent over the feed is added to the chain. Without it, calling reorganize breaks with an unknown-ancestor error, because many transactions sit in the channel; a reorg would need to empty the channel before reorg'ing if we went with that approach. There may be other complications, for example on shutdown we would need to wait for the channel to fully empty, otherwise transactions may be skipped. This PR does not include the reorganize logic, but it needs to be added in the future.
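The synchronization pattern discussed in this thread can be sketched with a toy producer/consumer. The shapes below are assumptions for illustration, not the real l2geth types: a send into the feed, then a blocking receive on a chain-head channel until the miner signals inclusion.

```go
package main

import "fmt"

// miner is a toy stand-in: it consumes transactions from a feed and
// signals a new chain head after each one is "mined".
type miner struct {
	txFeed      chan string
	chainHeadCh chan struct{}
}

func (m *miner) run(done chan struct{}) {
	for tx := range m.txFeed {
		fmt.Println("mined block containing", tx)
		m.chainHeadCh <- struct{}{} // new chain head event
	}
	close(done)
}

// applyTransaction mirrors the pattern from the PR: send the tx, then
// block until the chain head advances. Without the blocking receive, a
// reorg could run while transactions still sit in the channel.
func (m *miner) applyTransaction(tx string) {
	m.txFeed <- tx
	<-m.chainHeadCh
}

func main() {
	m := &miner{txFeed: make(chan string), chainHeadCh: make(chan struct{})}
	done := make(chan struct{})
	go m.run(done)
	m.applyTransaction("tx1")
	m.applyTransaction("tx2")
	close(m.txFeed)
	<-done
}
```

The cost of this design, as the thread notes, is that each apply call is serialized on block production; the benefit is that the sync service never races ahead of the chain it is about to reorganize.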

@gakonst assigned tynes and unassigned gakonst on Apr 29, 2021
@snario changed the title from "geth: implement sync service spec" to "Refactor the SyncService to more closely implement the specification" on Apr 29, 2021
@github-actions added the M-dtl label on May 13, 2021
tynes and others added 6 commits May 13, 2021 13:49
* l2geth: fix get last confirmed enqueue

* chore: add changeset

* client: return error correctly
* batch-submitter: backwards compatible configuration

* chore: add changeset

* deps: update

* js: move bcfg interface to core-utils

* batch-submitter: parse USE_SENTRY and add to env example

* chore: add changeset

* batch-submitter: parse as float instead of int

* batch-submitter: better error logging
Co-authored-by: Georgios Konstantopoulos <[email protected]>
Co-authored-by: Georgios Konstantopoulos <[email protected]>
@github-actions removed the A-op-batcher (Area: op-batcher), A-ts-packages, and A-pkg-core-utils (Area: packages/core-utils) labels on May 13, 2021
l2geth/cmd/geth/main.go (review thread, resolved)
l2geth/cmd/utils/flags.go (review thread, resolved)
```go
@@ -69,3 +69,24 @@ func WriteHeadVerifiedIndex(db ethdb.KeyValueWriter, index uint64) {
		log.Crit("Failed to store verifier index", "err", err)
	}
}

// ReadHeadBatchIndex will read the known tip of the processed batches
func ReadHeadBatchIndex(db ethdb.KeyValueReader) *uint64 {
```
Contributor:

The name (Read/Write)HeadBatchIndex is a bit confusing; consider updating it.

Contributor Author (@tynes):

{Read/Write}HeadTransactionBatchIndex? Open to a suggestion.

l2geth/core/rawdb/rollup_indexes.go (review thread, resolved)
l2geth/core/rawdb/rollup_indexes.go (review thread, resolved)
l2geth/rollup/sync_service.go (outdated review thread, resolved)
```go
// If the timestamp of the transaction is greater than the sync
// service's locally maintained timestamp, update the timestamp and
// blocknumber to equal that of the transaction's. This should happen
// with `enqueue` transactions.
```

Contributor:

You say that this should happen with enqueue transactions. Should we add a sanity check that tx.QueueOrigin().Uint64() == uint64(types.QueueOriginL1ToL2) in this case then? Or is it guaranteed that tx.L1Timestamp() != 0 only in the case that the transaction is an enqueue?

Contributor Author (@tynes):

It is guaranteed that tx.L1Timestamp() == 0 only for types.QueueOriginSequencer transactions, assuming no bugs in the upstream software.
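The sanity check suggested in this thread could look roughly like the sketch below. The constants and function are illustrative stand-ins, not the real l2geth types: a zero L1 timestamp should only ever appear on sequencer-origin transactions.

```go
package main

import "fmt"

// Queue origins mirroring the distinction discussed above; the values
// are illustrative, not the real l2geth constants.
const (
	QueueOriginSequencer = iota
	QueueOriginL1ToL2
)

// checkTimestampInvariant sketches the proposed check: flag any
// non-sequencer (enqueue) transaction that arrives with a zero L1
// timestamp, which the author says should never happen absent upstream bugs.
func checkTimestampInvariant(l1Timestamp uint64, queueOrigin int) error {
	if l1Timestamp == 0 && queueOrigin != QueueOriginSequencer {
		return fmt.Errorf("enqueue transaction with zero L1 timestamp")
	}
	return nil
}

func main() {
	fmt.Println(checkTimestampInvariant(0, QueueOriginSequencer) == nil)
	fmt.Println(checkTimestampInvariant(0, QueueOriginL1ToL2) == nil)
}
```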

```go
if tx.GetMeta().Index == nil {
	index := s.GetLatestIndex()
	if index == nil {
		tx.SetIndex(0)
```

Contributor:

TBH I'm somewhat averse to having the client mutate transactions directly like this. It feels error-prone. For instance, verifiers should never be mutating transactions, right? Could that be an invariant we try to enforce?

Contributor Author (@tynes):

The transaction index must be set by the SyncService (both sequencer and verifier), as the index depends on the current blockchain. It is set and then indexed.
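The indexing rule in the snippet above reduces to a simple function, sketched here with assumed names: the very first transaction gets index 0, and every later one gets the latest known index plus one.

```go
package main

import "fmt"

// nextIndex sketches the rule from the diff: a nil latest index means
// nothing has been indexed yet, so the first transaction gets index 0;
// otherwise the next transaction follows directly after the latest one.
func nextIndex(latest *uint64) uint64 {
	if latest == nil {
		return 0
	}
	return *latest + 1
}

func main() {
	fmt.Println(nextIndex(nil)) // genesis case: nothing indexed yet
	five := uint64(5)
	fmt.Println(nextIndex(&five))
}
```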

```go
txs := types.Transactions{tx}
s.txFeed.Send(core.NewTxsEvent{Txs: txs})
// Block until the transaction has been added to the chain
log.Trace("Waiting for transaction to be added to chain", "hash", tx.Hash().Hex())
<-s.chainHeadCh
```

Contributor:

Will this cause sendTransaction RPC requests to hang until all previous transactions have been fully executed and included in the chain?

Contributor Author (@tynes):

Good question. I believe it will, which is not ideal; we need to implement a queue ASAP that transactions are added to as they come in. Currently we block once the txFeed channel is full, and on shutdown all of the txs waiting to get into the channel, or already in it, are lost. Short term, we need a queue that txs are added to from the RPC handler, with the tx hash returned to the user immediately, while the sync service pulls from that queue continuously. The longer-term plan is for a single database to be the source of truth, with transactions indexed and pulled in one by one.

Right now there isn't a good way to avoid this synchronization mechanism if we want to do reorgs. I have a follow-up PR that removes it by taking the miner out of the hot code path and using a custom consensus.Engine.

Contributor:

I'm OK with this change (removed here) as long as we include the miner change ASAP.
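The short-term fix described above, an ingest queue that decouples the RPC handler from block production, can be sketched as follows. This is a toy with assumed names, not the proposed implementation; a buffered channel stands in for the queue, and the real design would also need persistence so queued transactions survive shutdown.

```go
package main

import "fmt"

// txQueue sketches the proposed sequencer FIFO: the RPC handler pushes
// a transaction and returns its hash to the user immediately, while the
// sync service drains the queue continuously in arrival order.
type txQueue struct {
	ch chan string
}

func newTxQueue(size int) *txQueue {
	return &txQueue{ch: make(chan string, size)}
}

// Enqueue is what the RPC handler would call; it does not wait for the
// transaction to be executed or included in a block.
func (q *txQueue) Enqueue(txHash string) { q.ch <- txHash }

func main() {
	q := newTxQueue(16)
	q.Enqueue("0xaaa")
	q.Enqueue("0xbbb")
	close(q.ch)
	// The sync service side: drain and apply in FIFO order.
	for h := range q.ch {
		fmt.Println("applying", h)
	}
}
```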

Comment on lines +599 to +600:

```go
// the L1 block that holds the transaction. This should never happen but is
// a sanity check to prevent fraudulent execution.
```

Contributor:

Should this be a panic, if it should never happen?

Contributor Author (@tynes):

We cannot panic here. It would be ideal, but we would need a db healer after the panic.

l2geth/core/rawdb/rollup_indexes.go (review thread, resolved)
@gakonst (Contributor) left a comment

LGTM. Next steps to make this code better are to 1) remove the miner codepath (#875) and 2) introduce the sequencer FIFO queue (like a mempool for the sequencer txs), in separate PRs.

@tynes merged commit a25acbb into develop on May 26, 2021
@tynes deleted the l2geth/sync-spec branch on May 26, 2021 16:24
@snario (Contributor) commented May 26, 2021

🥳🥳🥳🥳
