
Pipeline store writes #3177

Merged: 22 commits merged into master from lutter/async-write on Apr 12, 2022
Conversation

@lutter (Collaborator) commented Jan 21, 2022

This PR makes it possible to perform database writes in parallel with the rest of processing during indexing. The size of the write queue can be set with the environment variable GRAPH_STORE_WRITE_QUEUE, which defaults to 5 write or revert operations. Setting it to 0 makes writes synchronous and reinstates the previous behavior, bypassing a good deal of the new code.
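A minimal sketch of how that knob behaves, assuming a hypothetical helper (write_queue_size is not the actual graph-node function; only the GRAPH_STORE_WRITE_QUEUE variable and its default of 5 come from this PR):

use std::env;

/// Hypothetical helper: read the write queue size, defaulting to 5 queued
/// write/revert operations; 0 means "write synchronously, bypass the queue".
fn write_queue_size() -> usize {
    env::var("GRAPH_STORE_WRITE_QUEUE")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(5)
}

fn main() {
    match write_queue_size() {
        0 => println!("writes are synchronous; the queue is bypassed"),
        n => println!("buffering up to {} write/revert operations", n),
    }
}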

@leoyvens (Collaborator) commented Feb 11, 2022

We had agreed to do point 2 of #3084 (comment) where the instance manager keeps track of its subgraph pointer rather than querying the writable agent for it. Does that still sound like a good idea to you?

@lutter (Collaborator, Author) commented Feb 12, 2022

> We had agreed to do point 2 of #3084 (comment) where the instance manager keeps track of its subgraph pointer rather than querying the writable agent for it. Does that still sound like a good idea to you?

It does, I just didn't want to hold up this PR by adding that in, too. It'll require returning the subgraph pointer/cursor from start_subgraph_deployment and some changes in the instance manager to thread that through, IIRC.

@leoyvens (Collaborator) left a review:
Big brain PR 🧠

// work since that has its own call hierarchy, and using the
// foreground metrics will lead to incorrect nesting of sections
let stopwatch =
    StopwatchMetrics::new(logger, queue.store.site.deployment.clone(), registry);
@leoyvens (Collaborator):
This will end up double counting on the unknown section and in the total deployment_sync_secs metric. A solution might be for StopwatchMetrics to support adding a prefix to all metrics, so each pipeline stage would have its own prefix.

@lutter (Collaborator, Author):
I made it so that you can now add a 'stage' to StopwatchMetrics. LMK what you think

@leoyvens (Collaborator):
A label is very smart!
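To illustrate the idea above, here is a hedged sketch of a stage label. This is not the actual StopwatchMetrics API; the metric name comes from the earlier comment, while the prometheus crate usage, the label names, and the deployment id are assumptions:

use prometheus::{CounterVec, Opts, Registry};

fn main() -> Result<(), prometheus::Error> {
    let registry = Registry::new();
    // One time-spent metric, partitioned by deployment and pipeline stage.
    let sync_secs = CounterVec::new(
        Opts::new("deployment_sync_secs", "time spent syncing, by stage"),
        &["deployment", "stage"],
    )?;
    registry.register(Box::new(sync_secs.clone()))?;

    // The foreground pipeline and the background writer report under
    // different `stage` values, so neither inflates the other's totals.
    sync_secs.with_label_values(&["QmDeployment", "main"]).inc_by(0.25);
    sync_secs.with_label_values(&["QmDeployment", "writer"]).inc_by(0.10);
    Ok(())
}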

@lutter (Collaborator, Author) commented Feb 23, 2022

Addressed all review comments

firehose_cursor,
} => queue
.store
.revert_block_operations(block_ptr.clone(), firehose_cursor.as_deref()),
@leoyvens (Collaborator):
Now that this is called from a non-blocking tokio task, transact_block_operations and revert_block_operations should be asyncified.

@lutter (Collaborator, Author):
I tried asyncifying transact_block_operations for a bit, but it requires that DeploymentStore.transact_block_operations clone all of its arguments so that it owns them: the arguments get moved into a background task, and Rust doesn't understand that the with_conn(|_, _| ...).await that uses them has to return before transact_block_operations returns, so it demands references with a 'static lifetime.

Rather than copy large amounts of data, I think we should hold off on asyncifying.
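A minimal sketch of the lifetime issue described above, with simplified stand-in types (nothing here is the actual DeploymentStore code): anything moved into a spawned task must be 'static, so borrowed arguments would first have to be cloned into owned values.

async fn transact(ops: &[String]) -> usize {
    // Moving `ops` directly into a spawned task does not compile, because
    // the borrow is not 'static:
    //
    //     tokio::spawn(async move { ops.len() }).await.unwrap()
    //
    // The data has to be cloned so the task owns it:
    let owned: Vec<String> = ops.to_vec();
    tokio::task::spawn_blocking(move || owned.len())
        .await
        .unwrap()
}

#[tokio::main]
async fn main() {
    let ops = vec!["create".to_string(), "update".to_string()];
    println!("transacted {} operations", transact(&ops).await);
}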

@leoyvens (Collaborator):
Then we should spawn the writer in a blocking task.

@lutter (Collaborator, Author) commented Mar 18, 2022:
I didn't put the whole writer into a blocking task since that blocks the process from exiting (very annoying for tests); rather, I run the actual store operations as a blocking task now.

queue.iter().rev().fold(init, f)
}

pub async fn clear(&self) {
@leoyvens (Collaborator):
The way we use it wouldn't manifest this, but in principle calling clear and pop concurrently could cause pop to panic, since clear will empty the queue without first acquiring the permits. If it tried to acquire the permits first we'd have a different potential bug, where the permit count may change between it being checked and the permits actually being acquired, and then clear could deadlock.

I'm thinking that BoundedQueue shouldn't intrinsically support clear, but rather the writer should call pop in a loop until the queue is empty when clearing it. This way we can justify the assumption that neither pop nor push are being called concurrently to this.

@lutter (Collaborator, Author):
I added a try_pop (non-blocking pop) and made clear a wrapper around calling that in a loop; since there's no await on that code path now, there shouldn't be any possibility for races.
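A hedged sketch of that shape, not the actual graph/src/util/bounded_queue.rs implementation (the semaphore bookkeeping here is an assumption): clear never removes entries in bulk, it just calls the non-blocking try_pop until nothing is left.

use std::collections::VecDeque;
use std::sync::Mutex;
use tokio::sync::Semaphore;

pub struct BoundedQueue<T> {
    items: Mutex<VecDeque<T>>,
    /// Permits for entries currently in the queue; `pop` waits on this.
    filled: Semaphore,
    /// Permits for free slots; `push` waits on this.
    empty: Semaphore,
}

impl<T> BoundedQueue<T> {
    pub fn with_capacity(cap: usize) -> Self {
        Self {
            items: Mutex::new(VecDeque::with_capacity(cap)),
            filled: Semaphore::new(0),
            empty: Semaphore::new(cap),
        }
    }

    pub async fn push(&self, item: T) {
        self.empty.acquire().await.unwrap().forget();
        self.items.lock().unwrap().push_back(item);
        self.filled.add_permits(1);
    }

    pub async fn pop(&self) -> T {
        self.filled.acquire().await.unwrap().forget();
        let item = self.items.lock().unwrap().pop_front().unwrap();
        self.empty.add_permits(1);
        item
    }

    /// Non-blocking pop: returns `None` if the queue is currently empty.
    pub fn try_pop(&self) -> Option<T> {
        let permit = self.filled.try_acquire().ok()?;
        permit.forget();
        let item = self.items.lock().unwrap().pop_front().unwrap();
        self.empty.add_permits(1);
        Some(item)
    }

    /// Clearing is just `try_pop` in a loop; there is no await on this
    /// path, so it cannot race the way a bulk removal could.
    pub fn clear(&self) {
        while self.try_pop().is_some() {}
    }
}

#[tokio::main]
async fn main() {
    let q: BoundedQueue<u32> = BoundedQueue::with_capacity(3);
    q.push(1).await;
    q.push(2).await;
    q.clear();
    assert!(q.try_pop().is_none());
    println!("queue cleared");
}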

@lutter (Collaborator, Author) commented Mar 18, 2022

Addressed all comments; I'll rebase after the review to make it not too hard to follow the convo.

let store = queue.store.cheap_clone();
let stopwatch = stopwatch.cheap_clone();
let res =
    graph::spawn_blocking(
@leoyvens (Collaborator):
The naming of the graph::task_spawn methods is not very consistent, but you should use graph::spawn_blocking_allow_panic here, which takes a closure, rather than this one, which takes a future.

Or, if you'd rather abort on panic, you'd need to add a new helper that takes an FnOnce and does that, though catching the panic and propagating it as an error seems like what we'd want.

@lutter (Collaborator, Author):
I didn't realize that - I just updated the code. Panics are now turned into a StoreError, which seems like a much better way to handle them anyway.
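A small sketch of what that conversion can look like; StoreError here is a simplified stand-in rather than the real graph-node type, and run_write is a hypothetical helper. The panic inside the blocking closure surfaces as a JoinError when the task is awaited and is mapped into an error instead of being re-thrown.

#[derive(Debug)]
enum StoreError {
    Panic(String), // stand-in variant, for illustration only
}

async fn run_write<F, T>(work: F) -> Result<T, StoreError>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // A panic inside `work` shows up as a JoinError here; turn it into
    // a StoreError instead of letting it propagate.
    tokio::task::spawn_blocking(work)
        .await
        .map_err(|e| StoreError::Panic(e.to_string()))
}

#[tokio::main]
async fn main() {
    // A write whose closure panics comes back as Err(..) rather than
    // taking the writer down.
    let res = run_write(|| -> usize { panic!("simulated write failure") }).await;
    println!("{:?}", res);
}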

@lutter force-pushed the lutter/async-write branch from 44bbae5 to 68b71c7 on March 23, 2022
@lutter (Collaborator, Author) commented Mar 23, 2022

Rebased to latest master

@lutter force-pushed the lutter/async-write branch 3 times, most recently from 4af3183 to 703294a on March 30, 2022
@lutter (Collaborator, Author) commented Mar 30, 2022

@leoyvens I made some changes to this PR since you reviewed it. Could you have a look at the last 4 commits? I think the other ones didn't change, they were just rebased.

@leoyvens (Collaborator) left a review:
Seems like it would be helpful to rename Request::Revert to Request::RevertTo.

}

/// Return `true` if a write at this block will be visible, i.e., not
/// reverted by a previous queue entry
fn visible(&self, block_ptr: &BlockPtr) -> bool {
- self.revert > block_ptr.number
+ self.revert >= block_ptr.number
@leoyvens (Collaborator):
Maybe it's just me but it feels like this is in Yoda order, and it would be more natural to write block_ptr.number <= self.revert.

@lutter (Collaborator, Author):
Hah .. Yoda order :) Yes, reversing the comparison looks better
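For reference, a tiny self-contained version of the agreed form (BlockTracker and the field types are stand-ins for illustration; only the comparison itself comes from the diff above):

// Stand-in types so the comparison reads on its own; not the real structs.
struct BlockPtr { number: i32 }
struct BlockTracker { revert: i32 }

impl BlockTracker {
    /// Return `true` if a write at this block will be visible, i.e., not
    /// reverted by a previous queue entry.
    fn visible(&self, block_ptr: &BlockPtr) -> bool {
        block_ptr.number <= self.revert
    }
}

fn main() {
    let tracker = BlockTracker { revert: 10 };
    assert!(tracker.visible(&BlockPtr { number: 10 }));
    assert!(!tracker.visible(&BlockPtr { number: 11 }));
    println!("ok");
}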

@lutter force-pushed the lutter/async-write branch from 703294a to 86a6e79 on March 31, 2022
@lutter (Collaborator, Author) commented Mar 31, 2022

Added a commit with the two suggestions and rebased to latest master

@lutter force-pushed the lutter/async-write branch 3 times, most recently from 5204771 to 1f6b8ad on April 11, 2022
lutter added 6 commits on April 12, 2022:

- We will need to make sure that we do not see changes that may or may not have been written yet by the background writer for `get`, `get_many`, and `load_dynamic_data_sources`. We do this by passing in an explicit block constraint.
- Pipelining/buffering of writes can be turned off by setting GRAPH_STORE_WRITE_QUEUE to 0.
lutter added 16 commits on April 12, 2022:

- This avoids a potential deadlock in flushing the write queue.
- Rather than clearing the queue by removing entries in bulk, which could race against a pop at the same time, clear the queue by popping one entry at a time.
- Tests wait for the queue to be empty to see the result of changes; the code previously emptied the queue before a possible error had been recorded, which would cause test failures.
- Since we now call StopwatchMetrics::new twice for each deployment, we need to go through the global_ .. mechanisms in the registry; otherwise, only one set of metrics actually gets kept.
@lutter force-pushed the lutter/async-write branch from 1f6b8ad to 764f5c1 on April 12, 2022
@lutter merged commit 764f5c1 into master on Apr 12, 2022
@lutter deleted the lutter/async-write branch on April 13, 2022
@lutter mentioned this pull request on Apr 13, 2022