Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: CollectDirtyChildren causing app to hang when saving object with many pointers #1589

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

shlusiak
Copy link

@shlusiak shlusiak commented Jan 5, 2021

Please see parse-community/Parse-SDK-Android#1058 for the Android PR and parse-community/Parse-SDK-Android#1006 (comment) for the analysis of the issue.

The collectDirtyChildren forgets seen siblings when traversing, which causes extra traversals if those siblings point to the same objects and causes lots of garbage to be created in memory. It also has the potential of adding the same dirty object twice to the dirtyChildren collection, which may or may not be a set.

Imagine the simplified setup of an Object A with an attribute which is an Array containing two pointers to object B. When traversing A, the array will be looped over. For each object found, the seen set would be duplicated and B would be added and B would be traversed. When backing out and iterating over the next attribute in that array, the original seen variable would be used and B would be traversed again and all of it's child pointers.

Removing the copy of the seen set may improve performance significantly.

Closes: #1588
Possibly closes: #982

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 5, 2021

This may be related to #1588. My suggestion would be to port the test cases I listed first (in this PR), then if that utility method is broken, add the fix (it’s in the comment also). I say this because saving child objects relies on batch saves and looks like the batching decision for large arrays is messed up.

If the Android SDK is using the same algorithm for the batch calculation, it will probably need to be fixed also.

@@ -207,7 +207,7 @@ + (BFTask *)_enqueue:(BFTask *(^)(BFTask *toAwait))taskStart forObjects:(NSArray
+ (BOOL)collectDirtyChildren:(id)node
children:(NSMutableSet *)dirtyChildren
files:(NSMutableSet *)dirtyFiles
seen:(NSSet *)seen
seen:(NSMutableSet *)seen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, this may or not be the right solution. This is a recursive call, depending on how it's used this may or may not need to be mutable.

Does the original implementation work fine on a number of smaller children? If it does, I'm assuming it's less than 50, and the issue comes up over 50. If so, the problem might be the batching calculation I mentioned.

When I wrote those test cases, when it went over the set amount to batch (50), it returned only the first 50 objects every time which sounds like a similar symptom that you are reporting.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a batching issue as batching will happen after all dirty children are calculated. In my situation this recursive method spins without finishing, consuming all CPU. It does indeed work fine for small objects but explodes quickly. There is no batching involved in my case as I know there is only one dirty object, the root object, but this method will run into a seemingly endless loop trying to traverse all child objects.

This recursive call is internal and only called in one place passing in a NSMutableSet (changing this to require a mutable set only failed in this one call site).

The problem here is triggered by our complex schema with a couple of classes, where each unfortunately is linked back to a root parent class. Now when we use arrays that are traversed and at the end many pointers point to the same parent, this parent is visited many, many, many times, and this explodes very quickly.

p = [Parent]
a.parent = p
b.parent = p
c.parent = p
d.parent = p
r = [Root]
r.objects = [a, b, c, d]
// saving this root will visit each of the children a, b, c, d, 
// but the algorithm will not remember that p has already been visited and p will be visited 4 times,
// including all of p's children. 
r.saveInBackground()

This explodes exponentially when you have layers of nested pointers to the common object or if you imagine the parent having pointers to other objects. This happens before the dirty children are returned, so before any batching could occur, so I don't think #1588 is necessarily related.

In my case this method never returns in reasonable time (I think it will return eventually, I cannot detect an endless loop in there, but that may take years).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p = [Parent]
a.parent = p
b.parent = p
c.parent = p
d.parent = p
r = [Root]
r.objects = [a, b, c, d]
// saving this root will visit each of the children a, b, c, d,
// but the algorithm will not remember that p has already been visited and p will be visited 4 times,
// including all of p's children.
r.saveInBackground()

I agree that it seems like it should remember p after visiting a.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems like this would be remedied if p points to all it's children and if the children really want to know their relation to their parent p, they just add a field referencing p's objectId, not pointing back up to p, allowing everything to traverse. This was the issue that was coming up in the other issue you referenced.

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 5, 2021

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 6, 2021

Can you rebase this with the master branch to run the CI?

Also, it sounds like you have a possible testcase to verify your bug fix, can you add that also?

@shlusiak shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch from fc60a5a to 1f26e35 Compare January 7, 2021 03:25
@shlusiak
Copy link
Author

shlusiak commented Jan 7, 2021

Can you rebase this with the master branch to run the CI?

Also, it sounds like you have a possible testcase to verify your bug fix, can you add that also?

Rebased to master. I don't have a test case written and my knowledge of ObjC is limited and don't think I'd be able to write one. I'd also probably not write a test case to catch this bug (which is an infinite loop given the right schema), but a test case to verify that the fix doesn't break the intended behavior. It would be unfortunate if there were no such test cases yet, but I have not checked.

Does my explaination of the flaw in the algorithm make sense though?

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 7, 2021

Does my explaination of the flaw in the algorithm make sense though?

The explanation makes sense, but I need to think more if the case you presented should work and not introduce other issues. The problems listed in #982 look like circular dependencies to me that the algorithm couldn't detect. In those examples, it doesn't seem like they setup the schema in a way conducive to Parse. What have you seen after your change? Are you using the change in production?

Do you have any sample code you can show that demonstrates the issue?

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 7, 2021

@dplewis @mtrezza @TomWFox what do you all think? This was also brought up in the Android repo. Sounds like it can be a big problem. The questions I have is “what should the algorithm do in this scenario?”. “Does the change fix the problem without introducing other problems?”. I left some other comments above also...

@shlusiak
Copy link
Author

shlusiak commented Jan 7, 2021

Yes, we are using this fix in production now as it has fixed the issue we were seeing of the CPU spinning when trying to save a dirty object that contains an array of pointers to other objects, which point to quite a lot of other objects.

I at first also thought about circular dependencies that this algorithm wouldn't detect and that led me down to this code here, where objects would be visited multiple times. Now of course we have circular dependencies everywhere, but the algorithm is 1) detecting the only problematic case just time (two new objects pointing to each other) and 2) is supposed to stop when an object has already been seen. Now this second part, where objects are traversed multiple times smells like an infinite loop, but I couldn't proof that and believe the algorithm is actually meant to terminate eventually.

In our case it has been working fine without issues but a recent addition of another level of pointers to our schema made the whole thing spin forever. Again, I cannot proof that this is stuck in an infinite loop or only for 200+ years.

We have the same fix in production in Android since a few month and have not seen any issues related to this, but our saving usually only saves a single dirty object in one operation.

@shlusiak
Copy link
Author

shlusiak commented Jan 7, 2021

Forgive my limited ObjC / Swift / iOS skills, but a simple test case would be:

let parent = PFObject(className: "parent")
        let obj = PFObject(className: "object")
        let ch1 = PFObject(className: "child")
        let ch2 = PFObject(className: "child")
        ch1["parent"] = parent
        ch2["parent"] = parent
        
        obj["values"] = [
            ch1,
            ch2
        ]
        
        try! obj.save()

No circular dependencies. This call should succeed in both cases, with or without my fix, but without my fix parent and all it's children will be visited twice, with my fix only once. I don't know how you'd cement that into a test case as the result is exactly the same, just one is more efficient.

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 7, 2021

No circular dependencies. This call should succeed in both cases, with or without my fix, but without my fix parent and all it's children will be visited twice, with my fix only once. I don't know how you'd cement that into a test case as the result is exactly the same, just one is more efficient.

in this particular framework, every PFObject is a class (reference type) and since you didn’t save the parent first it might have trouble hashing to a unique value (It doesn’t have an objectId yet) causing the parent not to be recognized as the same object. If you really wanted to do this it seems you should save the parent first (Gets an objectId), then make the children point to the parent after the parent is saved

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 7, 2021

Two objects attempting to point to the same object that is unsaved and therefore not unique can cause a problem

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 7, 2021

If PFObjects were structs and equatable then there probably wouldn’t be an issue with this, but they are not here

@shlusiak
Copy link
Author

shlusiak commented Jan 7, 2021

This would have disastrous consequences already as you'd end up with multiple dirty instances if that was the case. Also I'm pretty sure that because PFObject is a class and not a struct, two pointers to the parent will be identical (reference) and the hash will be identical and the object would only be added to the set once.

But this is not the point I am trying to make, my algorithm change will not change what happens when two objects point to the same unsaved object.

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 7, 2021

Classes don’t hash to the same value unless you make them conform to hashable/equatable or do some bitwise comparison. Since PFObject subclasses NSObject, you may be getting some help, but that’s not something I would personally depend on. Parse considers an object “Identfiable” when it has an objectId, in your case, none of them do yet but are pointing to the same object. You are claiming they should be identifiable because they are referencing the same local object, but these objects are being encoded and sent to the server with no objectId. Two objects can look exactly the same in encoded form, does that make them the same object? No way to tell, this can either be resolved as a circular dependency or treated as separate ParseObjects...

Your coded example looks like a corner case circular dependency when using the current design of the SDK and it seems to be the case with the infinite loop you are getting. I gave a couple of examples of how to offset this, at least when it comes to saving to the server. I don’t know much about how save to local storage work as I assume they get some localId before saving.

My questions above about will this change fix the problem without introducing others still remain. If the algorithm is indeed missing this corner case or can be improved to mitigate it, it should.

@shlusiak
Copy link
Author

shlusiak commented Jan 8, 2021

In my example the parent object is the same instance. This should have the same hash because it is literally the same instance, not another object of the same class qith the same values. The two child classes are different and will get different objectIds. Also there is no circle and this is no edge case. Feel free to assume that all objects are already persisted and have objectIds and that there is no local storage. The algorithm is simply inefficient and with larger data sets is likely to quickly churn on CPU and may appear to never finish even if none of the objects are dirty.

@cbaker6
Copy link
Contributor

cbaker6 commented Jan 8, 2021

Feel free to assume that all objects are already persisted and have objectIds and that there is no local storage.

This was not the example you presented, it’s completely different. You presented saving 3 “new” objects, that means no objectIds. Your latest comment says assume objectIds... if the children have Parse Ponters to “p”, then their shouldn’t be a traversal issue. I’m assuming in your new scenario you have all keys included, having the complete object of p pointed to which is why p’s keys are even being looked at from the children...

At any rate, in order to understand the possible improvements or issues your change may introduce you should post the updated code depicting the scenario that’s causing the issue as closely as possible.

@shlusiak
Copy link
Author

shlusiak commented Jan 9, 2021

I fail to see why my example is not sufficient enough to outline what is happening in the algorithm and what I believe is a flaw. Whether objects are new or not is not relevant as they will be traversed regardless, to find circular dependencies of new objects. Neither does my example contain a circular dependency, to keep it simple.

Can we agree that the algorithm would be expected to:

  • Traverse all objects and find no cycle
  • come up with all objects being dirty / new
  • persist parent first
  • persist the two ch objects, once parent has an objectId
  • persist the obj instance last once the ch objects have an objectId

The algorithm does all that and there is no issue.
Except that it will traverse the parent object twice. And that is should not do that since that might really blow out the search tree.

My fix is a way to stop the algorithm from visiting nodes it has already seen. Now this will ideally not change the end result of what the algorithm is producing.

Whether my fix is negatively impacting other aspects of this algorithm other than outlined above would be up for debate and I'd like to focus on that rather than batching issues outside the scope of the affected function.

My simplified test case is of course not suitable to find regressions my fix could introduce but I was hoping we could get back to discuss my proposed changes and whether we can agree on them having positive or negative effect on the algorithm.

@eeallen1
Copy link
Contributor

Is there any update on this? We have the occasional issue when saving complex objects, and I suspect it's related.

eeallen1 added a commit to eeallen1/Parse-SDK-iOS-OSX that referenced this pull request Jun 10, 2021
@stale
Copy link

stale bot commented Jul 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you believe it should stay open, please let us know! As always, we encourage contributions, check out the Contributing Guide

@stale stale bot added the Stale label Jul 13, 2021
@shlusiak
Copy link
Author

This project is stale. :(

@stale stale bot removed the Stale label Jul 13, 2021
@stale
Copy link

stale bot commented Oct 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you believe it should stay open, please let us know! As always, we encourage contributions, check out the Contributing Guide

@stale stale bot added the Stale label Oct 2, 2021
@mtrezza mtrezza removed the Stale label Oct 14, 2021
@shlusiak shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch 2 times, most recently from 7db9be5 to 42f8143 Compare June 26, 2023 04:10
@parse-github-assistant
Copy link

I will reformat the title to use the proper commit message syntax.

@parse-github-assistant parse-github-assistant bot changed the title fix: collectDirtyChildren causing app to hang when saving object with many pointers fix: CollectDirtyChildren causing app to hang when saving object with many pointers Jun 26, 2023
@shlusiak
Copy link
Author

Rebased onto master, as this issue was never fixed. Is there any progress on this or has this project become stale?

@mtrezza mtrezza requested review from a team and removed request for stephanboner, cbaker6 and a team June 26, 2023 12:09
@mtrezza
Copy link
Member

mtrezza commented Jun 26, 2023

@shlusiak Could you try to write up a test case? A test case is usually required to sustainably fix a bug. Maybe someone from @parse-community/ios-sdk could help. You could look at existing test cases, choose and that is most similar to what you're trying to test, then copy and modify it. Also maybe take a look at the related issue in the Android SDK if there is any test case you can use here.

@codecov
Copy link

codecov bot commented Jun 26, 2023

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@mtrezza
Copy link
Member

mtrezza commented Jun 26, 2023

@stephanboner Would you be interested in joining the iOS SDK team there for occasional reviews?

@stephanboner
Copy link

@stephanboner Would you be interested in joining the iOS SDK team there for occasional reviews?

Hey @mtrezza, thanks for asking! Unfortunately I'm not working as a software developer anymore and therefore I no longer have the required infrastructure (a Mac with Xcode). Thank you though!

@mtrezza
Copy link
Member

mtrezza commented Jun 27, 2023

@stephanboner Thanks for replying, wish you all the best in your new career!

@shlusiak shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch from d7625ba to 0dc8128 Compare October 18, 2023 11:47
@shlusiak shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch 2 times, most recently from 86c1353 to aac4bdc Compare October 23, 2024 11:33
@shlusiak shlusiak force-pushed the remove-extra-child-traversal-causing-app-to-hang branch from aac4bdc to 717081a Compare October 23, 2024 13:31
@mtrezza
Copy link
Member

mtrezza commented Oct 23, 2024

@shlusiak I noticed you made some commits; what is the state of this PR?

@shlusiak
Copy link
Author

@shlusiak I noticed you made some commits; what is the state of this PR?

Thanks for reaching out. This PR is still open and valid, and we rely on this branch for our production environment. I had to rebase this branch to get it to compile using XCode 16, but getting this into master would be very much appreciated.

@mtrezza
Copy link
Member

mtrezza commented Oct 25, 2024

Do you think it's possible to write up a short test for this change? Then we can go ahead and merge this.

@mtrezza
Copy link
Member

mtrezza commented Oct 25, 2024

Maybe something like this, added to ParseObjectTests.m?

(void)testCollectDirtyChildrenWithManyPointers {
    // Create a root object
    ParseObject *rootObject = [ParseObject objectWithClassName:@"RootObject"];

    // Add a large number of child pointers to simulate heavy usage
    for (int i = 0; i < 1000; i++) {
        ParseObject *childObject = [ParseObject objectWithClassName:@"ChildObject"];
        childObject[@"data"] = @(i); // Mock data
        [rootObject addObject:childObject forKey:@"children"];
    }

    // Attempt to save the root object, expecting no hang or excessive recursion
    XCTestExpectation *expectation = [self expectationWithDescription:@"Object saved without hang"];
    
    [rootObject saveInBackgroundWithBlock:^(BOOL succeeded, NSError *error) {
        XCTAssertTrue(succeeded, @"Root object with many pointers should save successfully.");
        XCTAssertNil(error, @"No error should occur.");
        [expectation fulfill];
    }];
    
    [self waitForExpectationsWithTimeout:10 handler:nil];
}

@shlusiak
Copy link
Author

@mtrezza I'm not actually an iOS developer, and unfortunately don't know Objective C, so even finding the fix was hard enough. If you are happy for me to just commit that test you provided, I can certainly do that. I'm a bit hesitant though to add tests that expect certain timings of functions, as success failure now may depend on the speed of the agents executing these tests, further I don't think unit tests are meant to stress test performance of functions. Are there sufficient unit tests covering PFObject to say with confidence that my change did not break existing business logic?

@mtrezza
Copy link
Member

mtrezza commented Oct 27, 2024

Could you try adding the test and running it without the fix, just so we know whether it even fails. If it doesn't fail then there's no need to add it.

Can you confirm that this fix worked for you? How did you try this out?

@shlusiak
Copy link
Author

I'm sorry, this is very much out of me league. How do I run tests? None of them compile in my XCode, I do not know how to write ObjC code.

I'm very happy to explain the change to the algorithm, why the previous algorithm was very inefficient, caused us great performance problems in our production app, and why my proposed change fixes that for us. The same change was made for Android, and the issue is explained in parse-community/Parse-SDK-Android#1006 (comment)

We are running our production app from this custom branch here, and please believe me that the performance problems went away.

I understand you might want a usecase so you can reproduce and verify the issue, but I would need some help adding and running the tests.

@mtrezza
Copy link
Member

mtrezza commented Oct 30, 2024

Sure, maybe you want to take a look at the existing tests in ObjectUnitTests.m, in the last section Dirty. Is there any test case that looks similar to the scenario you are describing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants