Migrate from Postgres to ClickHouse #29
I probably won’t have time to work on this for a while, but I just wanted to document what I’m currently thinking is the solution here: manually-refreshed materialized views.

I think we’ll need to migrate each table to a view which either directly references a column in an underlying table or falls back to the function. On startup, the agent would detect any configuration changes in “query acceleration columns” and add the column(s), changing the respective view to reference the new columns after they have been added. (Maybe a separate […].) So for example, […].

Each enabled column should also add triggers as necessary to any tables which affect the column. (For best performance, should triggers be combined so that each table only has a single column-computation trigger, avoiding multiple lookups?)

Finally, some columns can be computed by the agent to save DB work, e.g. `fee_satoshis` and other computed fields.

Management of this strategy will probably require that we move away from having Hasura manage the DB schema and instead have the agent actively manage it (such that the agent can detect configuration options and make changes where needed to the tables/views/triggers at startup). Most existing Hasura computed fields would also become normal fields, since the function would be called by the view. (Added benefit: this would clean up the unnecessary computed-field parts of the schema docs which don’t need to be exposed to end users, e.g. […].)

The primary benefit of this strategy is that it offers the best possible performance, doesn’t waste computation/storage on columns which a particular use case doesn’t require, and it offers a consistent interface regardless of which “accelerated columns” are enabled – we won’t have to maintain or generate multiple Hasura metadata configurations, and SQL queries written for Chaingraph will always work regardless of configuration (albeit more slowly for queries requiring lots of unaccelerated, expensive computations).
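A minimal Postgres sketch of that view-swap, assuming (hypothetically – these are not Chaingraph's actual names) an underlying `transaction_internal` table and a `transaction_fee_satoshis` function which, like Hasura computed fields, takes the table's row type:

```sql
-- Hypothetical names for illustration only.
-- Default configuration: the view exposes the field by calling the function.
CREATE VIEW transaction AS
SELECT t.*, transaction_fee_satoshis(t) AS fee_satoshis
FROM transaction_internal t;

-- When the agent detects that the fee_satoshis acceleration column has been
-- enabled, it adds a real column (to be backfilled by triggers/the agent)
-- and repoints the view at it. The view's column list is unchanged, so
-- existing queries keep working in either configuration.
ALTER TABLE transaction_internal ADD COLUMN fee_satoshis bigint;
CREATE OR REPLACE VIEW transaction AS
SELECT t.* FROM transaction_internal t; -- t.* now includes fee_satoshis
```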
Some important movement on this issue in the past few months: Hasura now supports connecting to a ClickHouse database as of v2.33.0. In general, ClickHouse is an excellent choice for our workload (the schema requires no mutations, even during re-org events). With support now in Hasura, we could get far better performance out of a ClickHouse-backed Chaingraph than from the heavily-optimized Postgres-based solution described above. A migration would not be trivial, but I think it could be made mostly transparent to end users. Query performance should be incomparably better, especially for block explorer/analytics-type queries which aren't accelerated by the indexes Chaingraph currently creates. I also expect the total database size to be significantly reduced, as ClickHouse should be able to heavily compress even small values (and ClickHouse is also much more conservative with row/column overhead).
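To illustrate why an append-only workload is such a good fit, a sketch in ClickHouse SQL (the column set is hypothetical, not the real Chaingraph schema):

```sql
-- Hypothetical columns for illustration only. MergeTree tables are
-- append-only by design: inserts are cheap, rows are sorted and compressed
-- along the ORDER BY key, and per-column codecs squeeze small/repetitive
-- values far more aggressively than Postgres row storage can.
CREATE TABLE transaction
(
    hash FixedString(32),
    size_bytes UInt32,
    fee_satoshis UInt64 CODEC(T64, ZSTD),
    locktime UInt32 CODEC(Delta, ZSTD)
)
ENGINE = MergeTree
ORDER BY hash;
```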
In early iterations (on Postgres 12), I experimented with expression indexes for `fee_satoshis` and other computed fields. I found that expression indexes didn't improve performance as much as I'd hoped, and they are surprisingly expensive to build and maintain (both in terms of DB latency and storage space). It would be valuable to try again using Postgres 14 and the latest version of Hasura – ideally, it would be possible to simply create expression indexes for any computed fields which will be commonly needed by a Chaingraph instance.
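For reference, this is the kind of expression index in question – a sketch with hypothetical names. One caveat: Postgres only accepts IMMUTABLE functions in index expressions, so a fee computation that aggregates over other tables has to be wrapped in a function declared IMMUTABLE even though it technically isn't, which is part of what makes these indexes fragile:

```sql
-- Hypothetical names for illustration only. The expression result is
-- precomputed at write time (every INSERT/UPDATE re-evaluates it), which is
-- where the build and maintenance cost comes from.
CREATE INDEX transaction_fee_satoshis_index
    ON transaction_internal ((transaction_fee_satoshis(transaction_internal)));
```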
However, I think expression indexes are likely to be insufficient for use cases like block explorers or blockchain analysis systems, where we want most of these computed fields to be accessible in a single table row. (If a query has to hit 8 different expression indexes to return all the computed fields for a particular transaction or block, it's still going to be expensive.)
For these use cases, it would be valuable to somehow support a "storage-heavy" mode, where Chaingraph alters tables to include new columns for any computed values needed by the application. (It would be great if the particular columns were configurable, but if it makes maintenance significantly easier, we could also just have a single storage-heavy mode which would enable everything. Wallet backends might use the default install, while block explorers would choose the heavy option.)
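A sketch of what enabling the heavy mode could involve, again with hypothetical table/column names (a real implementation would batch the backfill to avoid long-running transactions):

```sql
-- Hypothetical names for illustration only.
-- Add real columns for each enabled computed value.
ALTER TABLE transaction_internal
    ADD COLUMN IF NOT EXISTS output_value_satoshis bigint,
    ADD COLUMN IF NOT EXISTS input_value_satoshis bigint,
    ADD COLUMN IF NOT EXISTS fee_satoshis bigint;

-- Backfill output totals; already-filled rows are skipped, so the agent can
-- resume after an interruption.
UPDATE transaction_internal t
   SET output_value_satoshis = o.total
  FROM (SELECT transaction_hash, sum(value_satoshis) AS total
          FROM output
         GROUP BY transaction_hash) o
 WHERE o.transaction_hash = t.hash
   AND t.output_value_satoshis IS NULL;
```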
It would be great if Chaingraph could move back and forth between these modes – if the agent detects that "heavy" mode has been enabled, it would alter all the required tables, then start intelligently filling columns (building up results in dependency order, e.g. computing `input_value_satoshis` and `output_value_satoshis` before `fee_satoshis`). Once enabled, the additional values could be maintained by Postgres triggers, by computations in the agent, or by both.

This mode would also have to configure Hasura to read from the added table columns rather than use the computed field functions. (That might require us to ship two versions of Chaingraph's Hasura metadata.)
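A sketch of the trigger-based half of that maintenance, assuming a hypothetical `input` table that carries a resolved `value_satoshis` and the `transaction_hash` of the spending transaction:

```sql
-- Hypothetical schema for illustration only. Keeps the parent transaction's
-- running totals current as input rows arrive; fee_satoshis is derived from
-- the freshly-updated input total. (On the right-hand side of an UPDATE,
-- column references still hold their pre-update values.)
CREATE FUNCTION maintain_input_value() RETURNS trigger AS $$
BEGIN
  UPDATE transaction_internal
     SET input_value_satoshis = coalesce(input_value_satoshis, 0)
                                + NEW.value_satoshis,
         fee_satoshis = coalesce(input_value_satoshis, 0) + NEW.value_satoshis
                        - coalesce(output_value_satoshis, 0)
   WHERE hash = NEW.transaction_hash;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER input_value_trigger
    AFTER INSERT ON input
    FOR EACH ROW EXECUTE FUNCTION maintain_input_value();
```

Combining all of a table's column computations into a single trigger (as suggested in the earlier comment) would just mean folding these SET clauses into one function per affected table.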