
Referential Integrity issue causes perf issues in Apollo 3 #6202

Closed
3nvi opened this issue Apr 27, 2020 · 5 comments

3nvi commented Apr 27, 2020

Whenever I use fetchMore to fetch more items from the server, I couple it with updateQuery in order to provide the merge strategy for the new data. A common practice is to "append" the new items onto the existing list, like so: [...prevResult.data.items, ...fetchMoreResult.data.items].
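
Roughly, the pattern looks like this (a sketch; the query, field names, and offset-based pagination are placeholders, not my real schema):

import { gql, useQuery } from '@apollo/client';

const GET_ITEMS = gql`
  query GetItems($offset: Int!) {
    items(offset: $offset) {
      text
    }
  }
`;

function ItemList() {
  const { data, fetchMore } = useQuery(GET_ITEMS, { variables: { offset: 0 } });
  if (!data) return null;

  const loadMore = () =>
    fetchMore({
      variables: { offset: data.items.length },
      // Append the newly fetched page onto the list we already have
      // (the exact shape of prevResult depends on the query).
      updateQuery: (prevResult, { fetchMoreResult }) => ({
        ...prevResult,
        items: [...prevResult.items, ...fetchMoreResult.items],
      }),
    });

  return null; // placeholder: render data.items and a button that calls loadMore
}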

I would expect this to retain the referential equality of the existing items, since that's what happens in vanilla JS. Unfortunately, this doesn't happen when using Apollo, since the items array contains objects that are all referentially different from the previous iteration. For example, if I had an array of one item:

[{ text: 'first' }]

and then, by fetching more, I received another new item

[{ text: 'second' }]

and I merged them in updateQuery using the typical practice

return [...prevResult.data.items, ...fetchMoreResult.data.items]

then the { text: 'first' } object would not be the same between renders, which means that React would have to re-render it and any React.memo would be rendered useless. In a scenario where we have 100 or 200 expensive components that unnecessarily re-render every time something gets added to the array, this may cause real issues.
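
To illustrate the cost (a sketch; Item here is a stand-in for those expensive components):

import React from 'react';

// React.memo skips re-rendering only when the `item` prop is referentially
// equal (===) to the one from the previous render.
const Item = React.memo(function Item({ item }) {
  return <div>{item.text}</div>;
});

function List({ items }) {
  // If every object in `items` is a fresh reference after each fetchMore,
  // every memoized <Item> re-renders even though its contents didn't change.
  return items.map((item, index) => <Item key={index} item={item} />);
}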

Intended outcome:

I would expect the objects not to be changed in any shape or form, meaning that Apollo would not "clone" or in any way manipulate what I return from the updateQuery handler.

Actual outcome:

Apollo alters the data somehow, and sadly the items lose their referential integrity.

How to reproduce the issue:

In this codesandbox you can clearly see the issue if you open the console.

Clicking "fetch more" continuously appends items to a list. You would expect the first item to be the same between renders (since we're only adding to the list), but unfortunately it's referentially different on each render pass.

https://codesandbox.io/s/apollo-reference-equality-1exes?file=/src/App.js
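
A check along these lines shows the problem (a sketch of the idea, not the exact sandbox code):

import { useEffect, useRef } from 'react';

// Logs whether the first item kept its object identity since the last render.
function useFirstItemIdentityLog(items) {
  const prevFirst = useRef();
  useEffect(() => {
    if (prevFirst.current !== undefined) {
      console.log('first item kept identity:', prevFirst.current === items[0]);
    }
    prevFirst.current = items[0];
  });
}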

Versions

System:
OS: macOS 10.15.2
Binaries:
Node: 10.16.0 - /usr/local/bin/node
Yarn: 1.19.0 - /usr/local/bin/yarn
npm: 6.14.2 - /usr/local/bin/npm
Browsers:
Chrome: 81.0.4044.122
Firefox: 72.0.2
Safari: 13.0.4
npmPackages:
@apollo/client: 3.0.0-beta.41 => 3.0.0-beta.41
apollo-link-error: ^1.1.12 => 1.1.12

3nvi changed the title Referential Integrity issue during fetchMore causes perf issues in Apollo 3 → Referential Integrity issue causes perf issues in Apollo 3 Apr 27, 2020
hwillson added this to the Release 3.0 milestone May 21, 2020
hwillson assigned hwillson and benjamn and unassigned hwillson May 26, 2020

benjamn commented May 27, 2020

Quick solution: if you want referential equality here, you need to give those { text } objects a __typename and id so they will have stable object references across repeated reads.
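
For example (a sketch, assuming the server can return an id for each item):

import { gql } from '@apollo/client';

const GET_ITEMS = gql`
  query GetItems {
    items {
      id    # with an id (plus __typename, which the client adds automatically),
      text  # each item is normalized and keeps a stable object reference
    }
  }
`;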

You seem to be working under the impression that the objects you write into the cache are the very same objects that the cache gives you back as result objects, as if the cache simply holds onto those objects temporarily, and later returns them to you. This is an incorrect interpretation of how the result caching system (#3394) works.

The goal/promise of the result caching system is to give you the same (===) result objects when you read a particular selection set repeatedly from the cache without changing any of the field values that were involved in the previous read. Since field values are identified by the parent object ID plus the field name (#5617), the only result objects eligible for reuse are those that have IDs.

In your reproduction, the only object with an ID is the ROOT_QUERY object, because you have not given your { text } objects IDs of their own. After you write the new getItems list, obviously the cache cannot return exactly the same { getItems: ... } result object as before, because the field identified by (ROOT_QUERY, getItems) has indeed changed in a meaningful way (its length increased by one). This change triggers rereading starting from ROOT_QUERY and continuing down to any other normalized (ID-having) entity objects that are encountered. Since there are no other normalized objects involved in this query, the rereading produces an entirely fresh tree of objects.
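
Roughly speaking, the two cache shapes look like this (a sketch of cache.extract() output, not your exact data):

// Without item IDs: the objects are embedded under ROOT_QUERY, so rereading
// the getItems field rebuilds the whole array of result objects.
{
  ROOT_QUERY: {
    __typename: "Query",
    getItems: [
      { __typename: "Item", text: "first" },
      { __typename: "Item", text: "second" }
    ]
  }
}

// With item IDs: each item is normalized and stored by reference, so items
// that did not change can be returned by reference on the next read.
{
  ROOT_QUERY: {
    __typename: "Query",
    getItems: [{ __ref: "Item:1" }, { __ref: "Item:2" }]
  },
  "Item:1": { __typename: "Item", id: "1", text: "first" },
  "Item:2": { __typename: "Item", id: "2", text: "second" }
}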

All of which is to say: if you value referential identity preservation, you should make sure the objects in question are normalized in the cache. In situations where you don't care so much about fine-grained referential identity preservation, you can omit the necessary __typename and key fields (e.g. id, or whatever keyFields you specify), and the cache will still give you correctly-shaped (just not ===) data.

You could imagine the cache somehow canonicalizing unrelated result objects that happen to be deeply equal to each other, returning the same object reference automatically. This is one of the reasons I'm excited about the Record and Tuple proposal that TC39 is currently considering. However, please note that this canonicalization step takes roughly the same amount of time as a deep equality check, which is something you could do in application-land just as quickly. The beauty of the current result caching system is that it completely avoids any rereading work as long as the previous result remains valid, without doing any hidden/extra work to guarantee === equality when the previous result is no longer valid.
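
For instance, an application-level helper along these lines (a sketch, not something Apollo Client ships) reuses previous item references whenever the new items are deeply equal:

import isEqual from 'lodash/isEqual';

// Map each new item back to the previous item at the same index when the two
// are deeply equal, so memoized components keep receiving === props.
// Assumes an append-only list where existing items keep their positions.
function preserveItemIdentity(prevItems = [], nextItems) {
  return nextItems.map((item, index) =>
    isEqual(prevItems[index], item) ? prevItems[index] : item
  );
}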

Happy to answer questions about this system, since these are subtle issues/tradeoffs, but I believe it is working as designed.


3nvi commented May 27, 2020

Wow, thanks a lot for the detailed answer. I was indeed not aware of some of the logic involved in the cache.

if you value referential identity preservation, you should make sure the objects in question are normalized in the cache

Perfect, good to know!

which is something you could do in application-land just as quickly.

With the new trend toward hooks, components tend to get bloated with memoization logic, so I'd prefer to handle it outside the "context" of a React component.

In our app we fetch a paginated list of heavy, read-only data that I'd explicitly like not to normalize, since it has no reason to be normalized (e.g. server logs displayed in a table). Based on what you said, the only way to have referential equality is to bring normalization into the game, but I feel that this contradicts the benefits described in the related section here. Specifically, this part:

Although normalization makes sense in most cases, normalizing certain types of data can negatively affect performance. This is especially true if you’re caching large quantities of transient data (such as metrics that are identified by a timestamp and never receive updates).

I understand that an app-land solution for memoization can do the trick, but I'm just wondering if this is indeed out of the scope of Apollo or just a limitation that's tied to the implementation.

Thanks again


benjamn commented May 27, 2020

I do think some sort of final memoization/canonicalization step is in scope for future versions of Apollo Client. I'd like to see some more progress on the Record and Tuple proposal, so we can start recommending/providing a lightweight polyfill implementation. It's also not out of scope to implement similar behavior ourselves; it's just more code for us to ship (and for you to bundle).


benjamn commented May 27, 2020

As for the performance tradeoffs of normalization, I think you're right to point out that contradiction. Specifically, I think our recommendation about disabling normalization for performance may have been somewhat speculative, and could easily be trumped by the downstream performance benefits of reference equality.

Nowadays, I think of disabling normalization (with keyFields: false) as a way of expressing "this object belongs to its parent, so don't bother storing it separately." While that kind of control can be valuable in some cases, I would be surprised if it significantly affects performance.
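
For reference, disabling normalization for a type looks roughly like this (a sketch; Item is a placeholder type name):

import { InMemoryCache } from '@apollo/client';

const cache = new InMemoryCache({
  typePolicies: {
    Item: {
      // Embed Item objects inside their parent instead of storing them as
      // separate normalized entities.
      keyFields: false,
    },
  },
});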


3nvi commented May 27, 2020

Gotcha,

I'll run some tests, compare timings, and get back to you if something seems off. For the time being, since what I originally described is expected behavior on your side, I think it's fair to close this issue.
