New DB representation for data sources #3405
Comments
This looks good. A few small comments:
|
The About |
That's pretty good, how will migrations be handled? |
My current thinking is to not migrate existing subgraphs, so graph node will continue to support the existing schema. |
Another option for migration would be to do it during |
It would make the code cleaner, though it adds risk to the migration itself. One thing to consider is that when doing the migration we need to block all indexing that could be happening concurrently, and make sure the index nodes are updated before continuing. And the downtime this migration could impose on hosted, due to the number of subgraphs, might not be acceptable. |
That's why I suggested to do it in |
Ah I see now. Then a remaining issue would be if we should have code to revert the migration ready, or if we should rely on the integration testing to give us enough confidence to migrate without the option of going back. |
My concern is on the backwards compatibility with previous graph node versions. It's been a while since we've made a change that breaks the possibility of downgrading. And it's dangerous if downgrading corrupts the DB. The migration doesn't really need to be reverted since it's not destructive, it only copies data to a new table. One way to maintain backwards compatibility would be to write ethereum data sources both to the legacy |
I think this data is only relevant during indexing, so could this be handled by apiVersioning? (file data sources will necessitate a new apiVersion) |
This change can be made without changing any existing behaviour, so it doesn't need a new api version. |
Yes, but the functionality which requires it (file data sources) will require a new api version. If we want to maintain backwards compatibility, either the new Graph Node versions write to the legacy table, or versioning determines which table is written to for a given subgraph (older Graph Node versions won't be able to index the new apiVersion anyway, so we don't need to worry about rolling back) |
That's possible, but makes it less likely that we'll ever remove the code path which supports the legacy table. |
After discussions with @lutter, the latest plan is:
|
This is implemented for new deployments. The remaining work tracked by this issue would be to migrate existing deployments to the new schema. |
It's been a year and no particular need to actively move subgraphs to the new schema has been identified, so I'm closing. |
The DB storage of dynamic data sources needs to change to reflect that graph node is now multichain and to fulfill the requirements of file data sources.
Dynamic data sources are currently stored in `subgraphs.dynamic_ethereum_contract_data_source`. This proposes that each subgraph will gain a `sgd*.data_source` table under its own DB namespace. The proposed schema for this table is:

Each field is explained and justified below. This is a good time to bikeshed on all of their names.
- `vid`: Just a primary key, has no particular meaning.
- `block_range`: The use of a block range is meant to allow an efficient implementation of data source removal, which is not currently a feature but quickly becomes necessary for use cases such as replacing the IPFS file associated with a user profile. Since we anticipate that the query "which data sources are live at this block" will be relevant, a range makes sense compared to separate start and end block fields.
- `access_group`: Works as described here in the 'data source independency' section. This will also be added to entity tables. We let the DB handle generating new access groups from a sequence when necessary, since these need not be deterministic.
- `manifest_idx`: Serves the purpose that `name` currently serves, to look up the definition in the manifest. This more efficient representation stores the position in the manifest rather than the name.
- `id`: An optional user-given id, in case the user needs to refer to a specific data source after creation, such as for removal.
- `param`: The stored representation of the `params` argument passed on creation. Examples of stored data: address for Ethereum, account id for NEAR, file CID for file data sources. A `bytea` representation is optimal for current uses, and we can always add a `jsonb` alternative if required in the future.
- `context`: Optional data source context, as it exists today.
- `parent`: The data source which triggered the creation of this data source. Tracking parent-child relationships will likely be required for features such as IPFS data sources created from other data sources. It's assumed in this design that each data source has a single parent.

Unblocks:
#3072
#3202
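
To illustrate why a block range suits the "which data sources are live at this block" query and makes removal cheap, here is a minimal in-memory sketch in Python. It is not graph node's implementation: the names `DataSourceRow`, `is_live`, `live_at` and `remove_at` are hypothetical, the range is split into `block_start`/`block_end` only for the sake of the model, and the half-open convention `[start, end)` with `None` meaning "unbounded" is an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class DataSourceRow:
    """Hypothetical in-memory model of one data source row."""
    vid: int                          # primary key, no particular meaning
    block_start: int                  # inclusive lower bound of block_range
    block_end: Optional[int]          # exclusive upper bound, None = unbounded (assumed)
    manifest_idx: int                 # position of the definition in the manifest
    param: Optional[bytes] = None     # e.g. an address or a file CID, stored as bytes
    id: Optional[str] = None          # optional user-given id
    context: Optional[dict] = None    # optional data source context
    parent: Optional[int] = None      # vid of the data source that created this one


def is_live(row: DataSourceRow, block: int) -> bool:
    """True if `block` falls inside the row's half-open range [start, end)."""
    return row.block_start <= block and (row.block_end is None or block < row.block_end)


def live_at(rows: List[DataSourceRow], block: int) -> List[DataSourceRow]:
    """Answer 'which data sources are live at this block'."""
    return [row for row in rows if is_live(row, block)]


def remove_at(row: DataSourceRow, block: int) -> DataSourceRow:
    """Model removal by closing the range at `block`; the row itself is kept."""
    if block < row.block_start:
        raise ValueError("cannot remove a data source before it was created")
    return replace(row, block_end=block)


rows = [
    DataSourceRow(vid=1, block_start=100, block_end=None, manifest_idx=0,
                  param=bytes.fromhex("ab" * 20)),
    DataSourceRow(vid=2, block_start=120, block_end=None, manifest_idx=1,
                  param=b"QmExampleCid", parent=1),
]
# Remove the second data source as of block 150.
rows[1] = remove_at(rows[1], 150)

live_130 = [row.vid for row in live_at(rows, 130)]
live_150 = [row.vid for row in live_at(rows, 150)]
```

In this model removal only clamps the upper bound, so the data source remains visible to queries at earlier blocks, which is the property a single range column gives over deleting the row.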
Ping @lutter @maoueh for review.