Optimize repeated apollo-cache-inmemory reads by caching partial query results. #3394
Conversation
```ts
const args = argumentsObjectFromField(field, variables);

const info: ExecInfo = {
  resultKey: resultKeyNameFromField(field),
```
I'm curious if this would be faster if we cached `resultKey` as a property on `field`.
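For illustration, the suggestion might look something like this sketch, memoizing the computed key directly on the AST node (the `__resultKey` property name is hypothetical, not part of this PR):

```ts
import { FieldNode } from 'graphql';

// Hypothetical sketch: cache the computed result key on the FieldNode
// itself, so repeated reads skip the alias/name lookup.
interface KeyedFieldNode extends FieldNode {
  __resultKey?: string;
}

export function cachedResultKeyNameFromField(field: KeyedFieldNode): string {
  if (field.__resultKey === undefined) {
    // Same logic as resultKeyNameFromField: prefer the alias, if any.
    field.__resultKey = field.alias ? field.alias.value : field.name.value;
  }
  return field.__resultKey;
}
```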
This is a fantastic idea. I brought up this issue as it pertains to rendering in #2895, but this will also solve other performance problems. Great work!
```diff
 return diffQueryAgainstStore({
-  store: this.config.storeFactory(this.extract(query.optimistic)),
+  store: store,
```
The `: store` can be omitted (object property shorthand).
```ts
  options?: OptimisticWrapOptions,
): OptimisticWrapperFunction<T>;
defaultMakeCacheKey(...args: any[]): any;
} = require('optimism'); // tslint:disable-line
```
Why `require` instead of `import` + an ambient type declaration file for `optimism`?
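One way to do that, sketched here from the snippet above rather than from the full `optimism` typings, is an ambient declaration file:

```ts
// optimism.d.ts — a hypothetical ambient declaration file (sketch only;
// the signatures are reconstructed from the snippet above).
declare module 'optimism' {
  export interface OptimisticWrapOptions {
    max?: number;
    makeCacheKey?: (...args: any[]) => any;
  }

  export type OptimisticWrapperFunction<T extends (...args: any[]) => any> =
    T & {
      // Mark the cached result for these arguments as stale.
      dirty: (...args: Parameters<T>) => void;
    };

  export function wrap<T extends (...args: any[]) => any>(
    originalFunction: T,
    options?: OptimisticWrapOptions,
  ): OptimisticWrapperFunction<T>;

  export function defaultMakeCacheKey(...args: any[]): any;
}
```

With that file in the compilation, the `require` call could become a plain `import { wrap, defaultMakeCacheKey } from 'optimism';`.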
```diff
 // we should only merge if it's an object of the same type
 // otherwise, we should delete the generated object
 if (typenameChanged) {
-  store.delete(generatedKey);
+  store.delete(escapedId.id);
```
Should probably port this fix too (apollo-client/packages/apollo-cache-inmemory/src/writeToStore.ts, lines 456 to 461 in 2e7f191):

```ts
// remove the old generated value in case the old value was
// inlined and the new value is not, which is indicated by
// the old id being generated and the new id being real
if (!generated) {
  store.delete(generatedKey);
}
```
Force-pushed from 35f549f to 4d5a851.
```ts
public hasDepTrackingCache() {
  return this.data instanceof DepTrackingCache;
}

protected broadcastWatches() {
  // Skip this when silenced (like inside a transaction)
  if (this.silenceBroadcast) return;

  // right now, we invalidate all queries whenever anything changes
```
Probably want to remove this comment since it's no longer true.
I gave this branch a try tonight, but ran into some issues. First, I encountered the problem I described over here: #3300 (comment). After patching my way around this issue, I ran into an infinite recursion bug. I'll try to pull together a Gist with a query and payload so you can reproduce it.
Alright, here's a repro: https://gist.github.com/jamesreggio/eedd17511a3d64d1ba1613cbc08d78c5. It includes the original GraphQL document + variables, the parsed GraphQL document, the resulting data from the server, and the error. I have a hunch that changes to
Any movement on this? This is an incredibly exciting improvement.
Force-pushed from 4d5a851 to 0cc85c1.
Restoring non-enumerability of the ID_KEY Symbol in #3544 made ID_KEY slightly more hidden from application code, at the cost of slightly worse performance (because of Object.defineProperty), but tests were still broken because Jest now includes Symbol keys when checking object equality (even the non-enumerable ones). Fortunately, given all the previousResult refactoring that has happened in PR #3394, we no longer need to store ID_KEY properties at all, which completely side-steps the question of whether ID_KEY should be enumerable or not, and avoids any problems due to Jest including Symbol keys when checking deep equality. If we decide to bring this ID metadata back in the future, we could use a WeakMap to associate result objects with their IDs, so that we can avoid modifying the result objects.
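A minimal sketch of that hypothetical WeakMap approach (the names below are illustrative, not from the commit):

```ts
// Associate result objects with their store IDs without defining any
// properties (Symbol or otherwise) on the objects themselves.
const resultIds = new WeakMap<object, string>();

function setResultId(result: object, id: string): void {
  // Leaves the result object untouched, so equality checks that include
  // Symbol keys (like Jest's) never see this metadata.
  resultIds.set(result, id);
}

function getResultId(result: object): string | undefined {
  return resultIds.get(result);
}
```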
After #3444 removed `Map`-based caching for `addTypenameToDocument` (in order to fix memory leaks), the `InMemoryCache#transformDocument` method now creates a completely new `DocumentNode` every time it's called (assuming `this.addTypename` is true, which it is by default). This commit uses a `WeakMap` to cache calls to `addTypenameToDocument` in `InMemoryCache#transformDocument`, so that repeated cache reads will no longer create an unbounded number of new `DocumentNode` objects. The benefit of the `WeakMap` is that it does not prevent its keys (the original `DocumentNode` objects) from being garbage collected, which is another way of preventing memory leaks. Note that `WeakMap` may have to be polyfilled in older browsers, but there are many options for that. This optimization will be important for #3394, since the query document is involved in cache keys used to store cache partial query results. cc @hwillson @jbaxleyiii @brunorzn
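A minimal sketch of that `WeakMap` memoization, assuming the `addTypenameToDocument` helper exported by `apollo-utilities`:

```ts
import { DocumentNode } from 'graphql';
import { addTypenameToDocument } from 'apollo-utilities';

// Keys are garbage-collectable, so caching here cannot leak documents.
const typenameDocumentCache = new WeakMap<DocumentNode, DocumentNode>();

function transformDocument(document: DocumentNode): DocumentNode {
  let result = typenameDocumentCache.get(document);
  if (!result) {
    result = addTypenameToDocument(document);
    typenameDocumentCache.set(document, result);
    // Transforming an already-transformed document is also a cache hit.
    typenameDocumentCache.set(result, result);
  }
  return result;
}
```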
Force-pushed from 0cc85c1 to 8b2ab9b.
Not all environments where WeakMap must be polyfilled do so reliably: #3394 (comment)
The previousResult option was originally a way to ensure referential identity of structurally equivalent cache results, before the result caching system was introduced in #3394. It worked by returning previousResult whenever it was deeply equal to the new result. The result caching system works a bit differently, and in particular never needs to do a deep comparison of results.

However, there were still a few (test) cases where previousResult seemed to have a positive effect, and removing it seemed like a breaking change, so we kept it around. In the meantime, the equality check has continued to waste CPU cycles, and the behavior of previousResult has undermined other improvements, such as freezing cache results (#4514). Even worse, previousResult effectively disabled an optimization that allowed InMemoryCache#broadcastWatches to skip unchanged queries (see comments I removed if curious). This commit restores that optimization.

I realized eliminating previousResult might finally be possible while working on PR #5617, which made the result caching system more precise by depending on IDs+fields rather than just IDs. This additional precision seems to have eliminated the few remaining cases where previousResult had any meaningful benefit, as evidenced by the lack of any test changes in this commit... even among the many direct tests of previousResult in __tests__/diffAgainstStore.ts!

The removal of previousResult is definitely a breaking change (appropriate for Apollo Client 3.0), because you can still contrive cases where some never-before-seen previousResult object just happens to be deeply equal to the new result. Also, it's fair to say that this removal will strongly discourage disabling the result caching system (which is still possible for diagnostic purposes), since we rely on result caching to get the benefits that previousResult provided.
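For context, the retired behavior amounted to a deep-equality fallback along these lines (a simplified sketch, not the actual implementation):

```ts
import { isEqual } from 'lodash';

function maybeReusePrevious<T>(newResult: T, previousResult?: T): T {
  // Return the previous object when it is deeply equal to the freshly
  // computed one, preserving referential identity at the cost of a
  // deep comparison on every read.
  return previousResult !== undefined && isEqual(previousResult, newResult)
    ? previousResult
    : newResult;
}
```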
The result caching system introduced by #3394 gained the ability to cache optimistic results (rather than just non-optimistic results) in #5197, but since then has suffered from unnecessary cache key diversity during optimistic updates, because every EntityStore.Layer object (corresponding to a single optimistic update) counts as a distinct cache key, which prevents cached results from being reused if they were originally read from a different Layer object.

This commit introduces the concept of a CacheGroup, store.group, which manages dependency tracking and also serves as a source of keys for the result caching system. While the Root object has its own CacheGroup, Layer objects share a CacheGroup object, which is the key to limiting diversity of cache keys when more than one optimistic update is pending. This separation allows the InMemoryCache to enjoy the full benefits of result caching for both optimistic (Layer) and non-optimistic (Root) data, separately.
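A minimal sketch of the grouping described above (illustrative names, not the exact EntityStore internals):

```ts
class CacheGroup {
  // Dependency tracking and result-cache key making live here (elided).
}

class Root {
  // Non-optimistic data gets its own group...
  public readonly group = new CacheGroup();
  // ...while all optimistic layers share a single group, so results
  // read through different Layer objects can reuse the same cache keys.
  private readonly layerGroup = new CacheGroup();

  public addLayer(id: string): Layer {
    return new Layer(id, this.layerGroup);
  }
}

class Layer {
  constructor(
    public readonly id: string,
    public readonly group: CacheGroup,
  ) {}
}
```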
When an object is evicted from the cache, common intuition says that any dangling references to that object should be proactively removed from elsewhere in the cache. Thankfully, this intuition is misguided, because a much simpler and more efficient approach to handling dangling references is already possible, without requiring any new cache features.

As the tests added in this commit demonstrate, the cleanup of dangling references can be postponed until the next time the affected fields are read from the cache, simply by defining a custom read function that performs any necessary cleanup, in whatever way makes sense for the logic of the particular field. This lazy approach is vastly more efficient than scanning the entire cache for dangling references would be, because it kicks in only for fields you actually care about, the next time you ask for their values.

For example, you might have a list of references that should be filtered to exclude the dangling ones, or you might want the dangling references to be nullified in place (without filtering), or you might have a single reference that should default to something else if it becomes invalid. All of these options are matters of application-level logic, so the cache cannot choose the right default strategy in all cases. By default, references are left untouched unless you define custom logic to do something else.

It may actually be unwise/destructive to remove dangling references from the cache, because the evicted data could always be written back into the cache at some later time, restoring the validity of the references. Since eviction is not necessarily final, dangling references represent useful information that should be preserved by default after eviction, but filtered out just in time to keep them from causing problems. Even if you ultimately decide to prune the dangling references, proactively finding and removing them is way more work than letting a read function handle them on-demand.

This system works because the result caching system (#3394, #5617) tracks hierarchical field dependencies in a way that causes read functions to be reinvoked any time the field in question is affected by updates to the cache, even if the changes are nested many layers deep within the field. It also helps that custom read functions are consistently invoked for a given field any time that field is read from the cache, so you don't have to worry about dangling references leaking out by other means.
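A minimal sketch of such a read function, using the `InMemoryCache` field policy API (the `Query.books` field here is hypothetical):

```ts
import { InMemoryCache, Reference } from '@apollo/client';

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        books: {
          read(existing: Reference[] | undefined, { canRead }) {
            // Lazily filter out references whose entities have been
            // evicted; the unfiltered list stays in the store, so the
            // references become valid again if the data is written back.
            return existing ? existing.filter(ref => canRead(ref)) : existing;
          },
        },
      },
    },
  },
});
```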
The makeVar method was originally attached to InMemoryCache so that we could call cache.broadcastWatches() whenever the variable was updated. See #5799 and #5976 for background. However, as a number of developers have reported, requiring access to an InMemoryCache to create a ReactiveVar can be awkward, since the code that calls makeVar may not be colocated with the code that creates the cache, and it is often desirable to create and initialize reactive variables before the cache has been created.

As this commit shows, the ReactiveVar function can infer the current InMemoryCache from a contextual Slot, when called without arguments (that is, when reading the variable). When the variable is updated (by passing a new value to the ReactiveVar function), any caches that previously read the variable will be notified of the update. Since this logic happens at variable access time rather than variable creation time, makeVar can be a free-floating global function, importable directly from @apollo/client.

This new system allows the variable to become associated with any number of InMemoryCache instances, whereas previously a given variable was only ever associated with one InMemoryCache. Note: when I say "any number" I very much mean to include zero, since a ReactiveVar that has not been associated with any caches yet can still be used as a container, and will not trigger any broadcasts when updated.

The Slot class that makes this all work may seem like magic, but we have been using it ever since Apollo Client 2.5 (#3394, via the optimism library), so it has been amply battle-tested. This magic works.
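A minimal sketch of the free-floating usage this enables (the `cartItems` field is a made-up example):

```ts
import { makeVar, InMemoryCache } from '@apollo/client';

// Created before, and independently of, any cache instance.
export const cartItemsVar = makeVar<string[]>([]);

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        cartItems: {
          // Reading the variable here associates it with this cache, so
          // later updates broadcast to this cache's watchers.
          read: () => cartItemsVar(),
        },
      },
    },
  },
});

// Updating the variable notifies every cache that has read it; a
// variable never read by any cache is just a plain container.
cartItemsVar([...cartItemsVar(), 'item-1']);
```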
Reading repeatedly from `apollo-cache-inmemory` using either `readQueryFromStore` or `diffQueryAgainstStore` currently returns a newly-computed object each time, even if no data IDs in the cache have changed. Passing the `previousResult` option can improve application performance by ensuring that equivalent results are `===` to each other, but the presence of `previousResult` only makes the cache reading computation more expensive, because new objects are still created and then thrown away if they are structurally equivalent to the `previousResult`.

This PR is a work-in-progress with the goal of returning previous results (including nested result objects, not just the top-level result) immediately, without any unnecessary recomputation, as long as the underlying data IDs involved in the original computation have not been modified in the meantime.

This functionality is based on an npm package called `optimism` that I wrote to salvage rebuild performance for Meteor 1.4.2, by avoiding unnecessarily rereading files from the file system. It is not an overstatement to say that Meteor would no longer exist as a project without this powerful caching technique.

The `optimism` library allows caching the results of functions based on (a function of) their arguments, while also keeping track of any other cached functions that were called in the process of evaluating the result, so that the result can be invalidated (or "dirtied") when any of the results of those other functions are dirtied. Dirtying is a very cheap, idempotent operation, since it does not force immediate recomputation, but simply marks the dirtied result as needing to be recomputed the next time the cached function is called with equivalent arguments.

If this approach is successful, it should effectively close the performance gap between `apollo-cache-inmemory` and https://github.com/convoyinc/apollo-cache-hermes, at least as far as cache reads are concerned, without sacrificing exactness. Cache write performance should also benefit dramatically, since much of the cost of writing to the cache comes from broadcasting new results for existing queries, which requires first rereading those results from the updated cache.
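A minimal sketch of the `optimism` pattern described above, using the library's `wrap` export (simplified usage; the file-reading functions are stand-ins):

```ts
import { wrap } from 'optimism';

// Results are cached per argument list.
const readFile = wrap((path: string) => {
  return `contents of ${path}`; // stand-in for an expensive read
});

const readDir = wrap((dir: string) => {
  // Calling readFile here records it as a dependency of readDir(dir).
  return ['a.txt', 'b.txt'].map(name => readFile(`${dir}/${name}`));
});

readDir('/project'); // computes and caches
readDir('/project'); // cache hit: no recomputation

// Dirtying is cheap and idempotent: it only marks results stale.
readFile.dirty('/project/a.txt');
readDir('/project'); // recomputes, because a dependency was dirtied
```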
Along the way, I have taken many opportunities to refactor and simplify the `apollo-cache-inmemory` code. For example, the first few commits in this PR eliminate the use of `graphql-anywhere` to read from the local store, which unlocks a number of optimization opportunities by removing a relatively opaque layer of abstraction.

I will try to add comments to the commits below to highlight areas of special interest.