Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add runtime support for offchain data sources & templates #3791

Merged
merged 33 commits into from
Aug 25, 2022

Conversation

Theodus
Copy link
Member

@Theodus Theodus commented Jul 29, 2022

This implements most of the runtime support for handling offchain data sources. Unfortunately there's a lot of boilerplate here, but I think it's best for a first pass since my initial approach at something more cohesive became a massive time sink.

@Theodus Theodus changed the title Add offchain data sources & templates Add runtime support for offchain data sources & templates Aug 1, 2022
@Theodus Theodus marked this pull request as ready for review August 1, 2022 20:07
@Theodus Theodus requested a review from leoyvens August 1, 2022 20:07
@Theodus Theodus force-pushed the theodus/offchain-data-sources branch 2 times, most recently from 44c1db9 to ca43c4a Compare August 4, 2022 20:58
graph/src/data_source/mod.rs Outdated Show resolved Hide resolved

/// Trigger data as parsed from the triggers adapter.
type TriggerData: TriggerData + Ord + Send + Sync + Debug;

/// Decoded trigger ready to be processed by the mapping.
/// New implementations should have this be the same as `TriggerData`.
type MappingTrigger: MappingTrigger + Debug;
type MappingTrigger: Send + Sync + Debug;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just move the ToAscPtr bound here, for convenience.

.instance
.process_trigger(
&self.logger,
&Arc::default(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not great, though it does seem to meet our initial requirements. Worth a comment that this will need further refactoring at some point.

core/src/subgraph/instance.rs Outdated Show resolved Hide resolved
core/src/subgraph/runner.rs Outdated Show resolved Hide resolved
@leoyvens
Copy link
Collaborator

leoyvens commented Aug 8, 2022

Great stuff!

One thing I hadn't previously considered is handling reverts. In the DB we get that for free, but we need to also revert the in-memory state. It might be that this

pub(crate) fn revert_data_sources(&mut self, reverted_block: BlockNumber) {
just happens to be sufficent, but do we additionally need any revert handling in the offchain monitor?

@leoyvens leoyvens force-pushed the theodus/offchain-data-sources branch 6 times, most recently from 8e1de3c to 962e080 Compare August 15, 2022 13:19
@leoyvens leoyvens force-pushed the theodus/offchain-data-sources branch 2 times, most recently from 76d856c to f85a33f Compare August 19, 2022 12:54
@leoyvens
Copy link
Collaborator

leoyvens commented Aug 22, 2022

This PR is now sufficient to to merge for internal testing, after review and integration cluster of course.

There is a severe though infrequent known bug, which is that if a file data source is created on block B1, but the files are only found on a later block B2 and B2 is reverted, then the entities will be gone forever. The intended fix is to write the entities for file data sources on the creation block rather than the current block, this way they will only be reverted if the data source also is. This will be the first follow up to this PR in case it isn't done in time for merging.

@leoyvens leoyvens force-pushed the theodus/offchain-data-sources branch from 52dc36e to 4dc8d1c Compare August 23, 2022 13:01
Copy link
Member Author

@Theodus Theodus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Forbidding static file data sources for now seems like a sensible decision.

@leoyvens leoyvens force-pushed the theodus/offchain-data-sources branch 2 times, most recently from 4be1ed3 to 5825395 Compare August 23, 2022 17:36
Copy link
Collaborator

@lutter lutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked at dynds/private.rs, but that all makes sense to me.

// Data source deduplication enforces this invariant.
// See also: data-source-is-duplicate-of
return Err(constraint_violation!(
"expected to remove at most one offchain data source but removed {}, ds: {:?}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this will abort the txn, I would phrase the error as expected to remove at most one offchain data source but would have removed ... to make it a little less scary (nothing was lost in this case)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well noted!

self.qname
);
// Onchain data sources have the causality region explicitly set to 0, while offchain
// data sources have an unique causality region assigned from the sequence.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment would match the code better if was something like

// Offchain data sources have a unique causality region assigned from a sequence in the database,
// while onchain data sources always have causality region 0

Bound::Included(block) => Some(block),

// Should never happen.
Bound::Excluded(_) | Bound::Unbounded => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're dealing with integers, wouldn't Bound::Excluded(block) be the same as Bound::Included(block+1)? It probably won't happen but isn't really fatal.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we don't use or plan to use excluded initial bounds, so we can't know what they would mean. And our usage of lower(block_range) assumes an included bound, so it's best to blow up if an excluded bound is ever found.

@leoyvens leoyvens force-pushed the theodus/offchain-data-sources branch from e975314 to eaea37f Compare August 23, 2022 20:11
@leoyvens leoyvens force-pushed the theodus/offchain-data-sources branch from eaea37f to 149a14d Compare August 24, 2022 12:59
@leoyvens leoyvens merged commit fc4cd68 into master Aug 25, 2022
@leoyvens leoyvens deleted the theodus/offchain-data-sources branch August 25, 2022 11:11
@leoyvens
Copy link
Collaborator

Integration cluster shows no divergences, merging!

kaiwetlesen pushed a commit to kaiwetlesen/graph-node that referenced this pull request Sep 7, 2022
…ocol#3791)

* Refactor manifest data sources

* Refactor manifest data source templates

* Start offchain monitors for static sources

* Run offchain handlers

* offchain: dont expect `manifest_idx` in the manifest

* offchain: add `match_and_decode`, persist normally, require source

* trigger processor: take block ptr from the trigger

* offchain: Return cid to dataSource.address host fn

* runner: transact modifications of offchain events

* ethereum: fix test build

* ipfs: Set a default maximum file size

* ipfs: Add env var for max concurrent requests

* ipfs: Share ipfs service across subgraphs

* offchain: move `ready_offchain_events` to `OffchainMonitor`

* runner: Clarify comments

* core: Remove unecessary params from `add_dynamic_data_source`

* core: Move poi_version out of the instance

* core: Move `mod instance` under `mod context`

This should clarify the SubgraphInstance responsiblity of keeping
track of the hosts.

* core: Refactor OffchainMonitor::add_data_source

* offchain: Better handling of duplicates

* offchain: Bump max ipfs concurrent requests to 100

* refactor: Expose RuntimeHost data source

* offchain: Remove dses that have been processed

* refactor: Extract ReadStore out of WritableStore

* test: Add graphql queries to end-to-end tests

* feat(file ds): Bump max spec version to 0.0.7

* test: Add basic file data sources e2e test

* runner: Isolate offchain data sources

* offchain: Forbid static file data sources

* store: Assign separate causality region for offchain dses

* graph: Fix release build

* tests: yarn upgrade, add file ds to the workspace

* fix: Update comments

Co-authored-by: Leonardo Yvens <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants