persist: reintroduce in-mem blob cache

Originally introduced in MaterializeInc#19614 but reverted in MaterializeInc#19945 because we were seeing segfaults in the lru crate it was using. I've replaced that crate with a new, simple implementation of an LRU cache.

This is particularly interesting to revisit now because we might soon be moving to a world in which each machine has an attached disk, and this is a useful stepping stone to a disk-based cache that persists across process restarts (and thus helps rehydration).

The original motivation is as follows.

A one-time (skunkworks) experiment showed that an environment running our demo "auction" source + mv got 90%+ cache hits with a 1 MiB cache. This doesn't scale up to prod data sizes and doesn't help with multi-process replicas, but the memory usage seems unobjectionable enough to have it for the cases where it does help.

Possibly, a decent chunk of why this is true is pubsub. With the low pubsub latencies, we might write some blob to S3, then within milliseconds notify everyone in-process who is interested in that blob, waking them up to fetch it. This means even a very small cache is useful, because blobs stay in it just long enough to be fetched by everyone that immediately needs them.

1 MiB is enough to fit things like state rollups, remap shard writes, and likely many MVs (probably less so for sources, but atm those still happen in another cluster).
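For readers unfamiliar with the technique, here is a minimal sketch of a weight-bounded LRU cache in safe Rust. It is not the implementation from this commit: the `WeightedLru` type, its method names, and the `HashMap` plus `BTreeMap` layout are all assumptions made for illustration. Each entry carries a weight (for a blob cache, presumably its size in bytes), and inserts evict the least recently used entries until the total weight fits the capacity.

```rust
use std::collections::{BTreeMap, HashMap};
use std::hash::Hash;

/// Hypothetical weight-bounded LRU cache (illustration only, not the
/// implementation in this commit).
pub struct WeightedLru<K, V> {
    capacity: usize,
    total_weight: usize,
    next_tick: u64,
    // key -> (value, weight, tick of last access)
    entries: HashMap<K, (V, usize, u64)>,
    // tick of last access -> key; the smallest tick is least recently used
    by_tick: BTreeMap<u64, K>,
}

impl<K: Clone + Eq + Hash, V> WeightedLru<K, V> {
    pub fn new(capacity: usize) -> Self {
        WeightedLru {
            capacity,
            total_weight: 0,
            next_tick: 0,
            entries: HashMap::new(),
            by_tick: BTreeMap::new(),
        }
    }

    pub fn total_weight(&self) -> usize {
        self.total_weight
    }

    /// Returns the value, if present, and marks it most recently used.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        let tick = self.next_tick;
        let entry = self.entries.get_mut(key)?;
        self.next_tick += 1;
        self.by_tick.remove(&entry.2);
        entry.2 = tick;
        self.by_tick.insert(tick, key.clone());
        Some(&entry.0)
    }

    /// Inserts a value, replacing any existing entry for the key, then
    /// evicts least recently used entries until the capacity is respected.
    pub fn insert(&mut self, key: K, value: V, weight: usize) {
        self.remove(&key);
        if weight > self.capacity {
            // Too large to ever fit; caching it would evict everything else.
            return;
        }
        let tick = self.next_tick;
        self.next_tick += 1;
        self.by_tick.insert(tick, key.clone());
        self.entries.insert(key, (value, weight, tick));
        self.total_weight += weight;
        while self.total_weight > self.capacity {
            let Some((_, lru_key)) = self.by_tick.pop_first() else { break };
            if let Some((_, w, _)) = self.entries.remove(&lru_key) {
                self.total_weight -= w;
            }
        }
    }

    /// Removes an entry and returns its value, if it was present.
    pub fn remove(&mut self, key: &K) -> Option<V> {
        let (value, weight, tick) = self.entries.remove(key)?;
        self.by_tick.remove(&tick);
        self.total_weight -= weight;
        Some(value)
    }
}
```

In this sketch, tracking recency with a monotonically increasing tick in a `BTreeMap` trades some speed for simplicity and avoids the unsafe pointer juggling of a hand-rolled linked list. A zero-weight entry is stored like any other, so an insert followed by a get of the same key returns the value.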
danhhz · Jan 5, 2024 · fa64c99
1 parent e941223
commit fa64c99
Showing 2 changed files with 607 additions and 42 deletions.
diff --git a/src/persist-client/proptest-regressions/internal/cache.txt b/src/persist-client/proptest-regressions/internal/cache.txt
@@ -0,0 +1,7 @@
+# Seeds for failure cases proptest has generated in the past. It is
+# automatically read and these particular cases re-run before any
+# novel cases are generated.
+#
+# It is recommended to check this file in to source control so that
+# everyone who runs the test benefits from these saved cases.
+cc 520a1ce380cba2b6a303454a884b5feecbf32e3628eae0f2840b793c9a75b78a # shrinks to state = [Insert { key: 235, weight: 0 }, Get { key: 235 }]