Optimize repeated apollo-cache-inmemory reads by caching partial query results. #3394
Conversation
```ts
const args = argumentsObjectFromField(field, variables);

const info: ExecInfo = {
  resultKey: resultKeyNameFromField(field),
```
I'm curious if this would be faster if we cached `resultKey` as a property on `field`.
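For illustration, the suggestion might look something like this sketch, memoizing the computed key directly on the AST node (the `__resultKey` property name is hypothetical, not part of this PR):

```ts
import { FieldNode } from 'graphql';

// Hypothetical sketch: cache the computed result key on the FieldNode
// itself, so repeated reads skip the alias/name lookup.
interface KeyedFieldNode extends FieldNode {
  __resultKey?: string;
}

export function cachedResultKeyNameFromField(field: KeyedFieldNode): string {
  if (field.__resultKey === undefined) {
    // Same logic as resultKeyNameFromField: prefer the alias, if any.
    field.__resultKey = field.alias ? field.alias.value : field.name.value;
  }
  return field.__resultKey;
}
```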
This is a fantastic idea. I brought up this issue as it pertains to rendering in #2895, but this will also solve other performance problems. Great work!
```diff
 return diffQueryAgainstStore({
-  store: this.config.storeFactory(this.extract(query.optimistic)),
+  store: store,
```
The `: store` can be omitted (object property shorthand).
```ts
  options?: OptimisticWrapOptions,
): OptimisticWrapperFunction<T>;
defaultMakeCacheKey(...args: any[]): any;
} = require('optimism'); // tslint:disable-line
```
Why `require` instead of `import` + an ambient type declaration file for `optimism`?
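One way to do that, sketched here from the snippet above rather than from the full `optimism` typings, is an ambient declaration file:

```ts
// optimism.d.ts — a hypothetical ambient declaration file (sketch only;
// the signatures are reconstructed from the snippet above).
declare module 'optimism' {
  export interface OptimisticWrapOptions {
    max?: number;
    makeCacheKey?: (...args: any[]) => any;
  }

  export type OptimisticWrapperFunction<T extends (...args: any[]) => any> =
    T & {
      // Mark the cached result for these arguments as stale.
      dirty: (...args: Parameters<T>) => void;
    };

  export function wrap<T extends (...args: any[]) => any>(
    originalFunction: T,
    options?: OptimisticWrapOptions,
  ): OptimisticWrapperFunction<T>;

  export function defaultMakeCacheKey(...args: any[]): any;
}
```

With that file in the compilation, the `require` call could become a plain `import { wrap, defaultMakeCacheKey } from 'optimism';`.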
```diff
 // we should only merge if it's an object of the same type
 // otherwise, we should delete the generated object
 if (typenameChanged) {
-  store.delete(generatedKey);
+  store.delete(escapedId.id);
```
Should probably port this fix too (apollo-client/packages/apollo-cache-inmemory/src/writeToStore.ts, lines 456 to 461 in 2e7f191):

```ts
// remove the old generated value in case the old value was
// inlined and the new value is not, which is indicated by
// the old id being generated and the new id being real
if (!generated) {
  store.delete(generatedKey);
}
```
Force-pushed from 35f549f to 4d5a851.
```ts
public hasDepTrackingCache() {
  return this.data instanceof DepTrackingCache;
}

protected broadcastWatches() {
  // Skip this when silenced (like inside a transaction)
  if (this.silenceBroadcast) return;

  // right now, we invalidate all queries whenever anything changes
```
Probably want to remove this comment since it's no longer true.
I gave this branch a try tonight, but ran into some issues. First, I encountered the problem I described over here: #3300 (comment). After patching my way around this issue, I ran into an infinite recursion bug. I'll try to pull together a Gist with a query and payload so you can reproduce it.
Alright, here's a repro: https://gist.github.com/jamesreggio/eedd17511a3d64d1ba1613cbc08d78c5. It includes the original GraphQL document + variables, the parsed GraphQL document, the resulting data from the server, and the error. I have a hunch that changes to
Any movement on this? This is an incredibly exciting improvement.
Force-pushed from 4d5a851 to 0cc85c1.
Restoring non-enumerability of the ID_KEY Symbol in #3544 made ID_KEY slightly more hidden from application code, at the cost of slightly worse performance (because of Object.defineProperty), but tests were still broken because Jest now includes Symbol keys when checking object equality (even the non-enumerable ones). Fortunately, given all the previousResult refactoring that has happened in PR #3394, we no longer need to store ID_KEY properties at all, which completely side-steps the question of whether ID_KEY should be enumerable or not, and avoids any problems due to Jest including Symbol keys when checking deep equality. If we decide to bring this ID metadata back in the future, we could use a WeakMap to associate result objects with their IDs, so that we can avoid modifying the result objects.
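A minimal sketch of that hypothetical WeakMap approach (the names below are illustrative, not from the commit):

```ts
// Associate result objects with their store IDs without defining any
// properties (Symbol or otherwise) on the objects themselves.
const resultIds = new WeakMap<object, string>();

function setResultId(result: object, id: string): void {
  // Leaves the result object untouched, so equality checks that include
  // Symbol keys (like Jest's) never see this metadata.
  resultIds.set(result, id);
}

function getResultId(result: object): string | undefined {
  return resultIds.get(result);
}
```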
After #3444 removed `Map`-based caching for `addTypenameToDocument` (in order to fix memory leaks), the `InMemoryCache#transformDocument` method now creates a completely new `DocumentNode` every time it's called (assuming `this.addTypename` is true, which it is by default). This commit uses a `WeakMap` to cache calls to `addTypenameToDocument` in `InMemoryCache#transformDocument`, so that repeated cache reads will no longer create an unbounded number of new `DocumentNode` objects. The benefit of the `WeakMap` is that it does not prevent its keys (the original `DocumentNode` objects) from being garbage collected, which is another way of preventing memory leaks. Note that `WeakMap` may have to be polyfilled in older browsers, but there are many options for that. This optimization will be important for #3394, since the query document is involved in cache keys used to store cache partial query results. cc @hwillson @jbaxleyiii @brunorzn
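A minimal sketch of that `WeakMap` memoization, assuming the `addTypenameToDocument` helper exported by `apollo-utilities`:

```ts
import { DocumentNode } from 'graphql';
import { addTypenameToDocument } from 'apollo-utilities';

// Keys are garbage-collectable, so caching here cannot leak documents.
const typenameDocumentCache = new WeakMap<DocumentNode, DocumentNode>();

function transformDocument(document: DocumentNode): DocumentNode {
  let result = typenameDocumentCache.get(document);
  if (!result) {
    result = addTypenameToDocument(document);
    typenameDocumentCache.set(document, result);
    // Transforming an already-transformed document is also a cache hit.
    typenameDocumentCache.set(result, result);
  }
  return result;
}
```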
Force-pushed from 0cc85c1 to 8b2ab9b.
Not all environments where WeakMap must be polyfilled do so reliably: #3394 (comment)
The previousResult option was originally a way to ensure referential identity of structurally equivalent cache results, before the result caching system was introduced in #3394. It worked by returning previousResult whenever it was deeply equal to the new result. The result caching system works a bit differently, and in particular never needs to do a deep comparison of results.

However, there were still a few (test) cases where previousResult seemed to have a positive effect, and removing it seemed like a breaking change, so we kept it around. In the meantime, the equality check has continued to waste CPU cycles, and the behavior of previousResult has undermined other improvements, such as freezing cache results (#4514). Even worse, previousResult effectively disabled an optimization that allowed InMemoryCache#broadcastWatches to skip unchanged queries (see comments I removed if curious). This commit restores that optimization.

I realized eliminating previousResult might finally be possible while working on PR #5617, which made the result caching system more precise by depending on IDs+fields rather than just IDs. This additional precision seems to have eliminated the few remaining cases where previousResult had any meaningful benefit, as evidenced by the lack of any test changes in this commit... even among the many direct tests of previousResult in __tests__/diffAgainstStore.ts!

The removal of previousResult is definitely a breaking change (appropriate for Apollo Client 3.0), because you can still contrive cases where some never-before-seen previousResult object just happens to be deeply equal to the new result. Also, it's fair to say that this removal will strongly discourage disabling the result caching system (which is still possible for diagnostic purposes), since we rely on result caching to get the benefits that previousResult provided.
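For context, the retired behavior amounted to a deep-equality fallback along these lines (a simplified sketch, not the actual implementation):

```ts
import { isEqual } from 'lodash';

function maybeReusePrevious<T>(newResult: T, previousResult?: T): T {
  // Return the previous object when it is deeply equal to the freshly
  // computed one, preserving referential identity at the cost of a
  // deep comparison on every read.
  return previousResult !== undefined && isEqual(previousResult, newResult)
    ? previousResult
    : newResult;
}
```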
The result caching system introduced by #3394 gained the ability to cache optimistic results (rather than just non-optimistic results) in #5197, but since then has suffered from unnecessary cache key diversity during optimistic updates, because every EntityStore.Layer object (corresponding to a single optimistic update) counts as a distinct cache key, which prevents cached results from being reused if they were originally read from a different Layer object.

This commit introduces the concept of a CacheGroup, store.group, which manages dependency tracking and also serves as a source of keys for the result caching system. While the Root object has its own CacheGroup, Layer objects share a CacheGroup object, which is the key to limiting diversity of cache keys when more than one optimistic update is pending. This separation allows the InMemoryCache to enjoy the full benefits of result caching for both optimistic (Layer) and non-optimistic (Root) data, separately.
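A minimal sketch of the grouping described above (illustrative names, not the exact EntityStore internals):

```ts
class CacheGroup {
  // Dependency tracking and result-cache key making live here (elided).
}

class Root {
  // Non-optimistic data gets its own group...
  public readonly group = new CacheGroup();
  // ...while all optimistic layers share a single group, so results
  // read through different Layer objects can reuse the same cache keys.
  private readonly layerGroup = new CacheGroup();

  public addLayer(id: string): Layer {
    return new Layer(id, this.layerGroup);
  }
}

class Layer {
  constructor(
    public readonly id: string,
    public readonly group: CacheGroup,
  ) {}
}
```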
When an object is evicted from the cache, common intuition says that any dangling references to that object should be proactively removed from elsewhere in the cache. Thankfully, this intuition is misguided, because a much simpler and more efficient approach to handling dangling references is already possible, without requiring any new cache features.

As the tests added in this commit demonstrate, the cleanup of dangling references can be postponed until the next time the affected fields are read from the cache, simply by defining a custom read function that performs any necessary cleanup, in whatever way makes sense for the logic of the particular field. This lazy approach is vastly more efficient than scanning the entire cache for dangling references would be, because it kicks in only for fields you actually care about, the next time you ask for their values.

For example, you might have a list of references that should be filtered to exclude the dangling ones, or you might want the dangling references to be nullified in place (without filtering), or you might have a single reference that should default to something else if it becomes invalid. All of these options are matters of application-level logic, so the cache cannot choose the right default strategy in all cases. By default, references are left untouched unless you define custom logic to do something else.

It may actually be unwise/destructive to remove dangling references from the cache, because the evicted data could always be written back into the cache at some later time, restoring the validity of the references. Since eviction is not necessarily final, dangling references represent useful information that should be preserved by default after eviction, but filtered out just in time to keep them from causing problems. Even if you ultimately decide to prune the dangling references, proactively finding and removing them is way more work than letting a read function handle them on-demand.

This system works because the result caching system (#3394, #5617) tracks hierarchical field dependencies in a way that causes read functions to be reinvoked any time the field in question is affected by updates to the cache, even if the changes are nested many layers deep within the field. It also helps that custom read functions are consistently invoked for a given field any time that field is read from the cache, so you don't have to worry about dangling references leaking out by other means.
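A minimal sketch of such a read function, using the `InMemoryCache` field policy API (the `Query.books` field here is hypothetical):

```ts
import { InMemoryCache, Reference } from '@apollo/client';

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        books: {
          read(existing: Reference[] | undefined, { canRead }) {
            // Lazily filter out references whose entities have been
            // evicted; the unfiltered list stays in the store, so the
            // references become valid again if the data is written back.
            return existing ? existing.filter(ref => canRead(ref)) : existing;
          },
        },
      },
    },
  },
});
```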
The makeVar method was originally attached to InMemoryCache so that we could call cache.broadcastWatches() whenever the variable was updated. See #5799 and #5976 for background. However, as a number of developers have reported, requiring access to an InMemoryCache to create a ReactiveVar can be awkward, since the code that calls makeVar may not be colocated with the code that creates the cache, and it is often desirable to create and initialize reactive variables before the cache has been created.

As this commit shows, the ReactiveVar function can infer the current InMemoryCache from a contextual Slot, when called without arguments (that is, when reading the variable). When the variable is updated (by passing a new value to the ReactiveVar function), any caches that previously read the variable will be notified of the update. Since this logic happens at variable access time rather than variable creation time, makeVar can be a free-floating global function, importable directly from @apollo/client.

This new system allows the variable to become associated with any number of InMemoryCache instances, whereas previously a given variable was only ever associated with one InMemoryCache. Note: when I say "any number" I very much mean to include zero, since a ReactiveVar that has not been associated with any caches yet can still be used as a container, and will not trigger any broadcasts when updated.

The Slot class that makes this all work may seem like magic, but we have been using it ever since Apollo Client 2.5 (#3394, via the optimism library), so it has been amply battle-tested. This magic works.
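A minimal sketch of the free-floating usage this enables (the `cartItems` field is a made-up example):

```ts
import { makeVar, InMemoryCache } from '@apollo/client';

// Created before, and independently of, any cache instance.
export const cartItemsVar = makeVar<string[]>([]);

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        cartItems: {
          // Reading the variable here associates it with this cache, so
          // later updates broadcast to this cache's watchers.
          read: () => cartItemsVar(),
        },
      },
    },
  },
});

// Updating the variable notifies every cache that has read it; a
// variable never read by any cache is just a plain container.
cartItemsVar([...cartItemsVar(), 'item-1']);
```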
Reading repeatedly from `apollo-cache-inmemory` using either `readQueryFromStore` or `diffQueryAgainstStore` currently returns a newly-computed object each time, even if no data IDs in the cache have changed. Passing the `previousResult` option can improve application performance by ensuring that equivalent results are `===` to each other, but the presence of `previousResult` only makes the cache reading computation more expensive, because new objects are still created and then thrown away if they are structurally equivalent to the `previousResult`.

This PR is a work-in-progress with the goal of returning previous results (including nested result objects, not just the top-level result) immediately, without any unnecessary recomputation, as long as the underlying data IDs involved in the original computation have not been modified in the meantime.

This functionality is based on an npm package called `optimism` that I wrote to salvage rebuild performance for Meteor 1.4.2, by avoiding unnecessarily rereading files from the file system. It is not an overstatement to say that Meteor would no longer exist as a project without this powerful caching technique.

The `optimism` library allows caching the results of functions based on (a function of) their arguments, while also keeping track of any other cached functions that were called in the process of evaluating the result, so that the result can be invalidated (or "dirtied") when any of the results of those other functions are dirtied. Dirtying is a very cheap, idempotent operation, since it does not force immediate recomputation, but simply marks the dirtied result as needing to be recomputed the next time the cached function is called with equivalent arguments.

If this approach is successful, it should effectively close the performance gap between `apollo-cache-inmemory` and https://github.com/convoyinc/apollo-cache-hermes, at least as far as cache reads are concerned, without sacrificing exactness. Cache write performance should also benefit dramatically, since much of the cost of writing to the cache comes from broadcasting new results for existing queries, which requires first rereading those results from the updated cache.
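A minimal sketch of the `optimism` pattern described above, using the library's `wrap` export (simplified usage; the file-reading functions are stand-ins):

```ts
import { wrap } from 'optimism';

// Results are cached per argument list.
const readFile = wrap((path: string) => {
  return `contents of ${path}`; // stand-in for an expensive read
});

const readDir = wrap((dir: string) => {
  // Calling readFile here records it as a dependency of readDir(dir).
  return ['a.txt', 'b.txt'].map(name => readFile(`${dir}/${name}`));
});

readDir('/project'); // computes and caches
readDir('/project'); // cache hit: no recomputation

// Dirtying is cheap and idempotent: it only marks results stale.
readFile.dirty('/project/a.txt');
readDir('/project'); // recomputes, because a dependency was dirtied
```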
Along the way, I have taken many opportunities to refactor and simplify the `apollo-cache-inmemory` code. For example, the first few commits in this PR eliminate the use of `graphql-anywhere` to read from the local store, which unlocks a number of optimization opportunities by removing a relatively opaque layer of abstraction.

I will try to add comments to the commits below to highlight areas of special interest.