Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Immer patches for Array are correct, yet inefficient. Suggesting a new algorithm for optimized patches #1052

Open
yoavaa opened this issue Jun 23, 2023 · 1 comment
Labels

Comments

@yoavaa
Copy link

yoavaa commented Jun 23, 2023

I am thinking as offering it as a pull request - let me know what you think

🚀 Feature Proposal

Today when computing a patch on arrays, immer compares objects by index, which results in all array change in case of unshift or splice usage. An example is the following test taken from immer testing -

https://github.com/immerjs/immer/blob/main/__tests__/patch.js#L543-L559

describe("arrays - multiple prepend", () => {
	runPatchTest(
		{x: [1, 2, 3]},
		d => {
			d.x.unshift(4)
			d.x.unshift(5)
			// 4,5,1,2,3
		},
		[
			{op: "replace", path: ["x", 0], value: 5},
			{op: "replace", path: ["x", 1], value: 4},
			{op: "replace", path: ["x", 2], value: 1},
			{op: "add", path: ["x", 3], value: 2},
			{op: "add", path: ["x", 4], value: 3}
		]
	)
})

the expected result should be instead

		[
			{op: "add", path: ["x", 0], value: 5},
			{op: "add", path: ["x", 1], value: 4}
		]

Motivation

I am working on a new framework that also serializes patches, and the above is a performance issue.

Consider the above case - for an array of object, lets say an array of 100 objects with 10 attributes each, the current algorithm will generate 1000 patches, while my suggestion is to produce only 1 per added / removed object

Can this be solved in user-land code?

no

Example

See the example above.

Usage: immer -> serialize -> deserialize -> patched object

suggested algorithm

(pseudo code)

the problem with json diff is that given two arrays, without any additional knowledge, to figure out if an item was
pushed to the front, we have to do a === comparison between the array items, which is O(n^2) complexity.

for (let a = 0; a < A.length; a++) {
    for (let b = 0; b < A.length; b++) {
        if (A[a] === B[b])
            // compute the diff    
    }    
}

However, we can create an algorithm focused on small changes that has complexity O(n) with a cutoff.
The algorithm makes the assumption that small changes can be found fast and require small number of mutations to describe.
The cutoff is set at let limit = log(min(A.length, B.length)) and the algorithm has complexity O(limit^2) ~ O(n)

The algorithm -

let limit = log(min(A.length, B.length));
let a=0, b=0;
mainLoop: while (a < A.length && b < B.length) {
    if (A(a) === B(b)) {
        a += 1;
        b += 1
    }
    continue;
    for (let seekSize = 1; seekSize < limit; seekSize++) {
        for (let index = 1; index <= seekSize; index++) {
            if (A[a+index] === B[b+seekSize-index])
                // we have a match, the elements from A between (a..a+index) and replace them with the items from B between (b..b+seekSize-index)
            continue mainLoop;    
        }
    }
    // revert to direct comparison of attributes json diff model (the standard model used by all other algorithms)
}
@yoavaa
Copy link
Author

yoavaa commented Jun 25, 2023

Update:

Reading the Immer source again, I can see that for objects there is the state.assigned_ member which marks which object properties have been updated or deleted.

I suggest to extend the idea for arrays to also include added and removed array items, those making the above algorithm redundant and making for a simpler solution.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant