fix: CollectDirtyChildren causing app to hang when saving object with many pointers #1589

shlusiak · 2021-01-05T07:15:49Z

Please see parse-community/Parse-SDK-Android#1058 for the Android PR and parse-community/Parse-SDK-Android#1006 (comment) for the analysis of the issue.

The collectDirtyChildren forgets seen siblings when traversing, which causes extra traversals if those siblings point to the same objects and causes lots of garbage to be created in memory. It also has the potential of adding the same dirty object twice to the dirtyChildren collection, which may or may not be a set.

Imagine the simplified setup of an Object A with an attribute which is an Array containing two pointers to object B. When traversing A, the array will be looped over. For each object found, the seen set would be duplicated and B would be added and B would be traversed. When backing out and iterating over the next attribute in that array, the original seen variable would be used and B would be traversed again and all of it's child pointers.

Removing the copy of the seen set may improve performance significantly.

Closes: #1588
Possibly closes: #982

cbaker6 · 2021-01-05T13:56:18Z

This may be related to #1588. My suggestion would be to port the test cases I listed first (in this PR), then if that utility method is broken, add the fix (it’s in the comment also). I say this because saving child objects relies on batch saves and looks like the batching decision for large arrays is messed up.

If the Android SDK is using the same algorithm for the batch calculation, it will probably need to be fixed also.

cbaker6 · 2021-01-05T17:36:20Z

Parse/Parse/PFObject.m

@@ -207,7 +207,7 @@ + (BFTask *)_enqueue:(BFTask *(^)(BFTask *toAwait))taskStart forObjects:(NSArray
 + (BOOL)collectDirtyChildren:(id)node
                    children:(NSMutableSet *)dirtyChildren
                       files:(NSMutableSet *)dirtyFiles
-                        seen:(NSSet *)seen
+                        seen:(NSMutableSet *)seen


Thinking about this more, this may or not be the right solution. This is a recursive call, depending on how it's used this may or may not need to be mutable.

Does the original implementation work fine on a number of smaller children? If it does, I'm assuming it's less than 50, and the issue comes up over 50. If so, the problem might be the batching calculation I mentioned.

When I wrote those test cases, when it went over the set amount to batch (50), it returned only the first 50 objects every time which sounds like a similar symptom that you are reporting.

I don't think this is a batching issue as batching will happen after all dirty children are calculated. In my situation this recursive method spins without finishing, consuming all CPU. It does indeed work fine for small objects but explodes quickly. There is no batching involved in my case as I know there is only one dirty object, the root object, but this method will run into a seemingly endless loop trying to traverse all child objects.

This recursive call is internal and only called in one place passing in a NSMutableSet (changing this to require a mutable set only failed in this one call site).

The problem here is triggered by our complex schema with a couple of classes, where each unfortunately is linked back to a root parent class. Now when we use arrays that are traversed and at the end many pointers point to the same parent, this parent is visited many, many, many times, and this explodes very quickly.

p = [Parent] a.parent = p b.parent = p c.parent = p d.parent = p r = [Root] r.objects = [a, b, c, d] // saving this root will visit each of the children a, b, c, d, // but the algorithm will not remember that p has already been visited and p will be visited 4 times, // including all of p's children. r.saveInBackground()

This explodes exponentially when you have layers of nested pointers to the common object or if you imagine the parent having pointers to other objects. This happens before the dirty children are returned, so before any batching could occur, so I don't think #1588 is necessarily related.

In my case this method never returns in reasonable time (I think it will return eventually, I cannot detect an endless loop in there, but that may take years).

p = [Parent]
a.parent = p
b.parent = p
c.parent = p
d.parent = p
r = [Root]
r.objects = [a, b, c, d]
// saving this root will visit each of the children a, b, c, d,
// but the algorithm will not remember that p has already been visited and p will be visited 4 times,
// including all of p's children.
r.saveInBackground()

I agree that it seems like it should remember p after visiting a.

It also seems like this would be remedied if p points to all it's children and if the children really want to know their relation to their parent p, they just add a field referencing p's objectId, not pointing back up to p, allowing everything to traverse. This was the issue that was coming up in the other issue you referenced.

cbaker6 · 2021-01-05T18:20:50Z

Eyeballing android, it seems to have may the batch count issue also https://github.com/parse-community/Parse-SDK-Android/blob/a5cfa1f9c6654281c76367ad8b3252e2499bb821/parse/src/main/java/com/parse/Lists.java#L67

cbaker6 · 2021-01-06T19:05:46Z

Can you rebase this with the master branch to run the CI?

Also, it sounds like you have a possible testcase to verify your bug fix, can you add that also?

shlusiak · 2021-01-07T03:27:34Z

Can you rebase this with the master branch to run the CI?

Also, it sounds like you have a possible testcase to verify your bug fix, can you add that also?

Rebased to master. I don't have a test case written and my knowledge of ObjC is limited and don't think I'd be able to write one. I'd also probably not write a test case to catch this bug (which is an infinite loop given the right schema), but a test case to verify that the fix doesn't break the intended behavior. It would be unfortunate if there were no such test cases yet, but I have not checked.

Does my explaination of the flaw in the algorithm make sense though?

cbaker6 · 2021-01-07T04:15:02Z

Does my explaination of the flaw in the algorithm make sense though?

The explanation makes sense, but I need to think more if the case you presented should work and not introduce other issues. The problems listed in #982 look like circular dependencies to me that the algorithm couldn't detect. In those examples, it doesn't seem like they setup the schema in a way conducive to Parse. What have you seen after your change? Are you using the change in production?

Do you have any sample code you can show that demonstrates the issue?

cbaker6 · 2021-01-07T04:35:57Z

@dplewis @mtrezza @TomWFox what do you all think? This was also brought up in the Android repo. Sounds like it can be a big problem. The questions I have is “what should the algorithm do in this scenario?”. “Does the change fix the problem without introducing other problems?”. I left some other comments above also...

shlusiak · 2021-01-07T05:06:34Z

Yes, we are using this fix in production now as it has fixed the issue we were seeing of the CPU spinning when trying to save a dirty object that contains an array of pointers to other objects, which point to quite a lot of other objects.

I at first also thought about circular dependencies that this algorithm wouldn't detect and that led me down to this code here, where objects would be visited multiple times. Now of course we have circular dependencies everywhere, but the algorithm is 1) detecting the only problematic case just time (two new objects pointing to each other) and 2) is supposed to stop when an object has already been seen. Now this second part, where objects are traversed multiple times smells like an infinite loop, but I couldn't proof that and believe the algorithm is actually meant to terminate eventually.

In our case it has been working fine without issues but a recent addition of another level of pointers to our schema made the whole thing spin forever. Again, I cannot proof that this is stuck in an infinite loop or only for 200+ years.

We have the same fix in production in Android since a few month and have not seen any issues related to this, but our saving usually only saves a single dirty object in one operation.

shlusiak · 2021-01-07T05:15:38Z

Forgive my limited ObjC / Swift / iOS skills, but a simple test case would be:

let parent = PFObject(className: "parent")
        let obj = PFObject(className: "object")
        let ch1 = PFObject(className: "child")
        let ch2 = PFObject(className: "child")
        ch1["parent"] = parent
        ch2["parent"] = parent
        
        obj["values"] = [
            ch1,
            ch2
        ]
        
        try! obj.save()

No circular dependencies. This call should succeed in both cases, with or without my fix, but without my fix parent and all it's children will be visited twice, with my fix only once. I don't know how you'd cement that into a test case as the result is exactly the same, just one is more efficient.

cbaker6 · 2021-01-07T05:29:23Z

No circular dependencies. This call should succeed in both cases, with or without my fix, but without my fix parent and all it's children will be visited twice, with my fix only once. I don't know how you'd cement that into a test case as the result is exactly the same, just one is more efficient.

in this particular framework, every PFObject is a class (reference type) and since you didn’t save the parent first it might have trouble hashing to a unique value (It doesn’t have an objectId yet) causing the parent not to be recognized as the same object. If you really wanted to do this it seems you should save the parent first (Gets an objectId), then make the children point to the parent after the parent is saved

cbaker6 · 2021-01-07T05:35:49Z

Two objects attempting to point to the same object that is unsaved and therefore not unique can cause a problem

cbaker6 · 2021-01-07T05:38:31Z

If PFObjects were structs and equatable then there probably wouldn’t be an issue with this, but they are not here

shlusiak · 2021-01-07T05:43:07Z

This would have disastrous consequences already as you'd end up with multiple dirty instances if that was the case. Also I'm pretty sure that because PFObject is a class and not a struct, two pointers to the parent will be identical (reference) and the hash will be identical and the object would only be added to the set once.

But this is not the point I am trying to make, my algorithm change will not change what happens when two objects point to the same unsaved object.

cbaker6 · 2021-01-07T13:03:02Z

Classes don’t hash to the same value unless you make them conform to hashable/equatable or do some bitwise comparison. Since PFObject subclasses NSObject, you may be getting some help, but that’s not something I would personally depend on. Parse considers an object “Identfiable” when it has an objectId, in your case, none of them do yet but are pointing to the same object. You are claiming they should be identifiable because they are referencing the same local object, but these objects are being encoded and sent to the server with no objectId. Two objects can look exactly the same in encoded form, does that make them the same object? No way to tell, this can either be resolved as a circular dependency or treated as separate ParseObjects...

Your coded example looks like a corner case circular dependency when using the current design of the SDK and it seems to be the case with the infinite loop you are getting. I gave a couple of examples of how to offset this, at least when it comes to saving to the server. I don’t know much about how save to local storage work as I assume they get some localId before saving.

My questions above about will this change fix the problem without introducing others still remain. If the algorithm is indeed missing this corner case or can be improved to mitigate it, it should.

shlusiak · 2021-01-08T11:56:29Z

In my example the parent object is the same instance. This should have the same hash because it is literally the same instance, not another object of the same class qith the same values. The two child classes are different and will get different objectIds. Also there is no circle and this is no edge case. Feel free to assume that all objects are already persisted and have objectIds and that there is no local storage. The algorithm is simply inefficient and with larger data sets is likely to quickly churn on CPU and may appear to never finish even if none of the objects are dirty.

cbaker6 · 2021-01-08T12:24:28Z

Feel free to assume that all objects are already persisted and have objectIds and that there is no local storage.

This was not the example you presented, it’s completely different. You presented saving 3 “new” objects, that means no objectIds. Your latest comment says assume objectIds... if the children have Parse Ponters to “p”, then their shouldn’t be a traversal issue. I’m assuming in your new scenario you have all keys included, having the complete object of p pointed to which is why p’s keys are even being looked at from the children...

At any rate, in order to understand the possible improvements or issues your change may introduce you should post the updated code depicting the scenario that’s causing the issue as closely as possible.

shlusiak · 2021-01-09T05:15:57Z

I fail to see why my example is not sufficient enough to outline what is happening in the algorithm and what I believe is a flaw. Whether objects are new or not is not relevant as they will be traversed regardless, to find circular dependencies of new objects. Neither does my example contain a circular dependency, to keep it simple.

Can we agree that the algorithm would be expected to:

Traverse all objects and find no cycle
come up with all objects being dirty / new
persist parent first
persist the two ch objects, once parent has an objectId
persist the obj instance last once the ch objects have an objectId

The algorithm does all that and there is no issue.
Except that it will traverse the parent object twice. And that is should not do that since that might really blow out the search tree.

My fix is a way to stop the algorithm from visiting nodes it has already seen. Now this will ideally not change the end result of what the algorithm is producing.

Whether my fix is negatively impacting other aspects of this algorithm other than outlined above would be up for debate and I'd like to focus on that rather than batching issues outside the scope of the affected function.

My simplified test case is of course not suitable to find regressions my fix could introduce but I was hoping we could get back to discuss my proposed changes and whether we can agree on them having positive or negative effect on the algorithm.

eeallen1 · 2021-05-26T00:19:16Z

Is there any update on this? We have the occasional issue when saving complex objects, and I suspect it's related.

stale · 2021-07-13T02:00:57Z

This issue has been automatically marked as stale because it has not had recent activity. If you believe it should stay open, please let us know! As always, we encourage contributions, check out the Contributing Guide

shlusiak · 2021-07-13T02:16:11Z

This project is stale. :(

stale · 2021-10-02T02:12:01Z

This issue has been automatically marked as stale because it has not had recent activity. If you believe it should stay open, please let us know! As always, we encourage contributions, check out the Contributing Guide

parse-github-assistant · 2023-06-26T04:15:58Z

I will reformat the title to use the proper commit message syntax.

shlusiak · 2023-06-26T04:17:07Z

Rebased onto master, as this issue was never fixed. Is there any progress on this or has this project become stale?

mtrezza · 2023-06-26T12:18:27Z

@shlusiak Could you try to write up a test case? A test case is usually required to sustainably fix a bug. Maybe someone from @parse-community/ios-sdk could help. You could look at existing test cases, choose and that is most similar to what you're trying to test, then copy and modify it. Also maybe take a look at the related issue in the Android SDK if there is any test case you can use here.

codecov · 2023-06-26T12:35:47Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

mtrezza · 2023-06-26T12:46:47Z

@stephanboner Would you be interested in joining the iOS SDK team there for occasional reviews?

stephanboner · 2023-06-27T12:17:59Z

@stephanboner Would you be interested in joining the iOS SDK team there for occasional reviews?

Hey @mtrezza, thanks for asking! Unfortunately I'm not working as a software developer anymore and therefore I no longer have the required infrastructure (a Mac with Xcode). Thank you though!

mtrezza · 2023-06-27T13:14:53Z

@stephanboner Thanks for replying, wish you all the best in your new career!

…ng when saving object with many pointers

mtrezza · 2024-10-23T17:28:01Z

@shlusiak I noticed you made some commits; what is the state of this PR?

shlusiak · 2024-10-23T23:05:51Z

@shlusiak I noticed you made some commits; what is the state of this PR?

Thanks for reaching out. This PR is still open and valid, and we rely on this branch for our production environment. I had to rebase this branch to get it to compile using XCode 16, but getting this into master would be very much appreciated.

mtrezza · 2024-10-25T13:33:57Z

Do you think it's possible to write up a short test for this change? Then we can go ahead and merge this.

mtrezza · 2024-10-25T13:41:54Z

Maybe something like this, added to ParseObjectTests.m?

(void)testCollectDirtyChildrenWithManyPointers {
    // Create a root object
    ParseObject *rootObject = [ParseObject objectWithClassName:@"RootObject"];

    // Add a large number of child pointers to simulate heavy usage
    for (int i = 0; i < 1000; i++) {
        ParseObject *childObject = [ParseObject objectWithClassName:@"ChildObject"];
        childObject[@"data"] = @(i); // Mock data
        [rootObject addObject:childObject forKey:@"children"];
    }

    // Attempt to save the root object, expecting no hang or excessive recursion
    XCTestExpectation *expectation = [self expectationWithDescription:@"Object saved without hang"];
    
    [rootObject saveInBackgroundWithBlock:^(BOOL succeeded, NSError *error) {
        XCTAssertTrue(succeeded, @"Root object with many pointers should save successfully.");
        XCTAssertNil(error, @"No error should occur.");
        [expectation fulfill];
    }];
    
    [self waitForExpectationsWithTimeout:10 handler:nil];
}

shlusiak · 2024-10-26T12:19:24Z

@mtrezza I'm not actually an iOS developer, and unfortunately don't know Objective C, so even finding the fix was hard enough. If you are happy for me to just commit that test you provided, I can certainly do that. I'm a bit hesitant though to add tests that expect certain timings of functions, as success failure now may depend on the speed of the agents executing these tests, further I don't think unit tests are meant to stress test performance of functions. Are there sufficient unit tests covering PFObject to say with confidence that my change did not break existing business logic?

mtrezza · 2024-10-27T15:23:59Z

Could you try adding the test and running it without the fix, just so we know whether it even fails. If it doesn't fail then there's no need to add it.

Can you confirm that this fix worked for you? How did you try this out?

shlusiak · 2024-10-27T23:50:37Z

I'm sorry, this is very much out of me league. How do I run tests? None of them compile in my XCode, I do not know how to write ObjC code.

I'm very happy to explain the change to the algorithm, why the previous algorithm was very inefficient, caused us great performance problems in our production app, and why my proposed change fixes that for us. The same change was made for Android, and the issue is explained in parse-community/Parse-SDK-Android#1006 (comment)

We are running our production app from this custom branch here, and please believe me that the performance problems went away.

I understand you might want a usecase so you can reproduce and verify the issue, but I would need some help adding and running the tests.

…o-hang

mtrezza · 2024-10-30T00:06:34Z

Sure, maybe you want to take a look at the existing tests in ObjectUnitTests.m, in the last section Dirty. Is there any test case that looks similar to the scenario you are describing?

…o-hang

cbaker6 reviewed Jan 5, 2021

View reviewed changes

shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch from fc60a5a to 1f26e35 Compare January 7, 2021 03:25

stephanboner approved these changes Jan 27, 2021

View reviewed changes

eeallen1 added a commit to eeallen1/Parse-SDK-iOS-OSX that referenced this pull request Jun 10, 2021

Incorporate changes from parse-community#1589

af7215b

stale bot added the Stale label Jul 13, 2021

stale bot removed the Stale label Jul 13, 2021

stale bot added the Stale label Oct 2, 2021

mtrezza removed the Stale label Oct 14, 2021

shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch 2 times, most recently from 7db9be5 to 42f8143 Compare June 26, 2023 04:10

parse-github-assistant bot changed the title ~~fix: collectDirtyChildren causing app to hang when saving object with many pointers~~ fix: CollectDirtyChildren causing app to hang when saving object with many pointers Jun 26, 2023

shlusiak requested review from cbaker6 and stephanboner June 26, 2023 04:17

mtrezza requested review from a team and removed request for stephanboner, cbaker6 and a team June 26, 2023 12:09

shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch from d7625ba to 0dc8128 Compare October 18, 2023 11:47

shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch 2 times, most recently from 86c1353 to aac4bdc Compare October 23, 2024 11:33

fix: extra child traversal in collectDirtyChildren to cause app to ha…

717081a

…ng when saving object with many pointers

shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch from aac4bdc to 717081a Compare October 23, 2024 13:31

Merge branch 'master' into remove-extra-child-traversal-causing-app-t…

7e087f8

…o-hang

Merge branch 'master' into remove-extra-child-traversal-causing-app-t…

b578116

…o-hang

fix: CollectDirtyChildren causing app to hang when saving object with many pointers #1589

Are you sure you want to change the base?

fix: CollectDirtyChildren causing app to hang when saving object with many pointers #1589

Conversation

shlusiak commented Jan 5, 2021 • edited Loading

cbaker6 commented Jan 5, 2021 • edited Loading

cbaker6 Jan 5, 2021

Choose a reason for hiding this comment

shlusiak Jan 6, 2021

Choose a reason for hiding this comment

cbaker6 Jan 7, 2021

Choose a reason for hiding this comment

cbaker6 Jan 7, 2021

Choose a reason for hiding this comment

cbaker6 commented Jan 5, 2021 • edited Loading

cbaker6 commented Jan 6, 2021

shlusiak commented Jan 7, 2021

cbaker6 commented Jan 7, 2021

cbaker6 commented Jan 7, 2021 • edited Loading

shlusiak commented Jan 7, 2021

shlusiak commented Jan 7, 2021

cbaker6 commented Jan 7, 2021 • edited Loading

cbaker6 commented Jan 7, 2021

cbaker6 commented Jan 7, 2021

shlusiak commented Jan 7, 2021

cbaker6 commented Jan 7, 2021 • edited Loading

shlusiak commented Jan 8, 2021

cbaker6 commented Jan 8, 2021

shlusiak commented Jan 9, 2021

eeallen1 commented May 26, 2021

stale bot commented Jul 13, 2021

shlusiak commented Jul 13, 2021

stale bot commented Oct 2, 2021

parse-github-assistant bot commented Jun 26, 2023

shlusiak commented Jun 26, 2023

mtrezza commented Jun 26, 2023 • edited Loading

codecov bot commented Jun 26, 2023 • edited Loading

Welcome to Codecov 🎉

mtrezza commented Jun 26, 2023 • edited Loading

stephanboner commented Jun 27, 2023

mtrezza commented Jun 27, 2023 • edited Loading

mtrezza commented Oct 23, 2024

shlusiak commented Oct 23, 2024

mtrezza commented Oct 25, 2024 • edited Loading

mtrezza commented Oct 25, 2024 • edited Loading

shlusiak commented Oct 26, 2024

mtrezza commented Oct 27, 2024

shlusiak commented Oct 27, 2024

mtrezza commented Oct 30, 2024

shlusiak commented Jan 5, 2021 •

edited

Loading

cbaker6 commented Jan 5, 2021 •

edited

Loading

cbaker6 commented Jan 5, 2021 •

edited

Loading

cbaker6 commented Jan 7, 2021 •

edited

Loading

cbaker6 commented Jan 7, 2021 •

edited

Loading

cbaker6 commented Jan 7, 2021 •

edited

Loading

mtrezza commented Jun 26, 2023 •

edited

Loading

codecov bot commented Jun 26, 2023 •

edited

Loading

mtrezza commented Jun 26, 2023 •

edited

Loading

mtrezza commented Jun 27, 2023 •

edited

Loading

mtrezza commented Oct 25, 2024 •

edited

Loading

mtrezza commented Oct 25, 2024 •

edited

Loading