Optimize diff() for bigger data sets #5013

Reinmar · 2019-01-11T10:16:33Z

When researching engine's performance issues I found out that calling the diff() function with a and b of length 1000 (use data set from #1398) will take diff() 10s to finish. That's a terrible result.

There's no way to significantly speed up the diff() function itself. It's a highly optimal minimum diff function. Its implementation is based on a popular paper and I don't think it's reasonable to look deeper into this. Especially that we need to improve it at least 100x.

What we could do, though, is resorting to a dumb but fast function when a and b are longer than some threshold (e.g. 100 items). We already have fastDiff() but it accepts a bit different params than diff() at the moment. If we'd align fastDiff() to diff() we could make a simple change in the diff() function:

if ( a.length > 100 && b.length > 100 ) {
    return fastDiff( a, b, comp );
}

RFC.

The text was updated successfully, but these errors were encountered:

f1ames · 2019-01-11T10:29:57Z

There's no way to significantly speed up the diff() function itself. It's a highly optimal minimum diff function. Its implementation is based on a popular paper and I don't think it's reasonable to look deeper into this. Especially that we need to improve it at least 100x.

What about the fast-diff implementation I have mentioned in https://github.com/ckeditor/ckeditor5-engine/issues/403#issuecomment-375282791? It is significantly faster so maybe we can switch from diff to this one. And then for longer strings which still takes too long use our simplified fastDiff() (btw. I think it should be called partialDiff(), it's faster because it only compares part of the strings).

From what I have posted in https://github.com/ckeditor/ckeditor5-engine/issues/403#issuecomment-375282791 it seems fast-diff is like 30x to 100x times faster than our current diff() (the longer the modified parts the greater the difference). However, it will still need some performance checks.

f1ames · 2019-01-11T10:40:11Z

Realted to https://github.com/ckeditor/ckeditor5-typing/issues/145.

pjasiun · 2019-01-11T10:41:14Z

Actually, we could do it even smarter. Diff algorithm if very fast (linear) if arrayA == arrayB and takes longer and longer if arrayA has less and less common parts with arrayB. For instance:

diff( 'aaaaaaaaaa', 'aaaaaaaaaa' ); // takes 10 comparisons
diff( 'aaaaaaaaaa', 'bbbbbbbbbb' ); // takes 100 comparisons
diff( 'aaaa', 'bbbb' ); // takes 16 comparisons

So it is not the matter of the string length, but the length of the different part.

Fortunately, how diff works is:

check if strings are the same,
check if it can find a match with 1 different character,
check if it can find a match with 2 different characters,
check if it can find a match with 3 different characters,
...

We can basically stop it at some point and say that if there is more than (for instance) 30 different characters, fast diff should be used.

Reinmar · 2019-01-14T11:14:31Z

We can think about this in the future, but KISS now. We need a patch ASAP.

f1ames · 2019-02-06T17:28:57Z

I did some testing when looking for proper threshold when to switch to fastDiff() checking execution time of the whole diff() function and number of internal snake() function calls (which is the main mechanism in diff()). It looks like this:

text1 length	text2 length	`diff()` times (ms)	avg. `diff()` time (ms)	`snake()` calls
100	90	11, 7, 5, 3, 4, 12, 6, 7, 4, 5	6.4	3111
499	450	218, 342, 232, 238, 300, 245, 178, 190, 189, 167	229.9	73656
599	570	368, 1368, 1314, 1435, 1437, 1424, 1383, 1353, 1364, 1335	1278.1	111680
699	680	2359, 2370, 2461, 2378, 2323, 2317, 2499, 2332, 2334, 2327	2370	152400
799	777	3536, 4644, 3845, 3982, 4395, 4069, 3737, 3608, 3466, 3639	3892.1	196128
200	1281	1132, 971, 904, 917, 913, 890, 895, 884, 921, 915	934.2	41366

So there is clearly an issue when both compared texts are long, the number of snake() calls seems to increase exponentially when length of both texts increases. I was thinking about having a safe threshold somewhere around 600 characters, but then there could be situations like 500 + 1500 or 400 + 2000 which still will take loooooooong to diff. So I think we should go with something like if smaller length is above 400 the sum above 1200 triggers a switch to fastDiff() (I have to still check those numbers and adjust them, but you get the idea).

One more thing about snake() function - each snake() call uses cmp() function (comparator function passed to diff()) at least once. While I have used === in tests (comparing strings), for DOM nodes/elements we use sameNodes() function is more complex and more costly itself so this should also be taken into account (I will check if it has some significant execution time itself).

f1ames · 2019-02-06T17:42:37Z

It seems the sameNodes() takes only a fraction of snake() execution time which seems acceptable:

I also found one more interesting thing. It looks like the most of the exec time is spend in renderer._updateChildrenMappings() method which ofc calls diff() internally:

Without renderer._updateChildrenMappings() when setting very long content (data from #1398) there is only one diff() call on rather not complex content (estimated by number of snake() calls):

diff - snakes 2209

when I enabled renderer._updateChildrenMappings() and run the same scenario again (setting mentioned content two times in a row):

diff - snakes 2209
diff - snakes 1
diff - snakes 1
diff - snakes 9
diff - snakes 1
diff - snakes 1
diff - snakes 9
diff - snakes 1
diff - snakes 1
diff - snakes 9
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 1
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 1
diff - snakes 1
diff - snakes 144
diff - snakes 1
diff - snakes 1
diff - snakes 4225
diff - snakes 1
diff - snakes 1
diff - snakes 9
diff - snakes 1
diff - snakes 564001
diff - snakes 1
diff - snakes 1
diff - snakes 9
diff - snakes 1
diff - snakes 521284
diff - snakes 1
diff - snakes 1
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 1
diff - snakes 9
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 4
diff - snakes 1
diff - snakes 25
diff - snakes 1
diff - snakes 1
diff - snakes 1
diff - snakes 1
diff - snakes 4
diff - snakes 1

most calls are just comparing single nodes (like p), but there are two significant ones diff - snakes 521284 and diff - snakes 564001 which seems to be comparing the whole content. Not sure if we can do something about it (well, optimizing diff is ofc one of the solutions), but this looks like the main cause of this performance issue.

Reinmar · 2019-02-11T09:22:39Z

Not sure if we can do something about it (well, optimizing diff is ofc one of the solutions), but this looks like the main cause of this performance issue.

Optimising the rendering algorithm is, of course, one of the things we could look into, but I expect that it won't be easy. You know how much time we can spend fixing the renderer – months. That's why I didn't even want to think about it. Especially that renderer is just too complex already. Adding some optimisations there would most likely make it even more so.

f1ames · 2019-02-11T10:00:58Z

Optimising the rendering algorithm is, of course, one of the things we could look into, but I expect that it won't be easy. You know how much time we can spend fixing the renderer – months...

That's true and also a reason we have https://github.com/ckeditor/ckeditor5-engine/issues/1456 :) I think going with fastDiff() is a good step for improving performance here and then, maybe one day, a time will come for more complex renderer optimization.

f1ames · 2019-02-11T11:05:00Z

So I think we should go with something like: if smaller length is above 400 and the sum above 1200, switch to fastDiff().

I did some profiling and it looks like below:

Chrome

Testing... (t1 length - t2 length - avg time - snake() calls - times)
l1: 300 l2: 700 avg: 254ms snakes(): 66276 (15) [279, 226, 222, 202, 415, 423, 285, 198, 233, 218, 221, 221, 268, 197, 200]
l1: 350 l2: 700 avg: 321ms snakes(): 81600 (15) [269, 307, 323, 333, 304, 297, 297, 401, 402, 301, 289, 313, 452, 264, 266]
l1: 400 l2: 700 avg: 354ms snakes(): 94464 (15) [364, 333, 474, 321, 369, 356, 349, 360, 351, 341, 358, 352, 323, 317, 339]
l1: 450 l2: 700 avg: 460ms snakes(): 109691 (15) [422, 403, 643, 738, 418, 404, 411, 429, 397, 403, 475, 526, 409, 429, 393]
l1: 300 l2: 800 avg: 266ms snakes(): 69996 (15) [252, 293, 239, 240, 284, 254, 285, 367, 238, 242, 277, 240, 254, 280, 241]
l1: 350 l2: 800 avg: 335ms snakes(): 87016 (15) [314, 355, 343, 318, 345, 319, 328, 364, 320, 321, 351, 344, 328, 354, 316]
l1: 400 l2: 800 avg: 464ms snakes(): 104400 (15) [412, 449, 413, 416, 425, 410, 425, 419, 419, 466, 541, 452, 550, 557, 604]
l1: 450 l2: 800 avg: 540ms snakes(): 121475 (15) [559, 566, 538, 553, 528, 504, 527, 534, 536, 548, 510, 517, 602, 544, 527]
l1: 300 l2: 900 avg: 319ms snakes(): 74836 (15) [298, 352, 307, 307, 312, 334, 299, 312, 325, 362, 306, 326, 306, 335, 310]
l1: 350 l2: 900 avg: 438ms snakes(): 94944 (15) [416, 455, 416, 453, 429, 448, 416, 457, 442, 427, 437, 458, 419, 490, 407]
l1: 400 l2: 900 avg: 565ms snakes(): 112224 (15) [579, 530, 590, 553, 552, 556, 516, 612, 581, 626, 565, 533, 590, 548, 539]
l1: 450 l2: 900 avg: 732ms snakes(): 132559 (15) [699, 742, 691, 730, 692, 755, 760, 749, 726, 727, 744, 724, 715, 752, 770]
l1: 300 l2: 1000 avg: 435ms snakes(): 79101 (15) [445, 436, 399, 433, 448, 410, 421, 441, 434, 436, 457, 435, 439, 461, 426]
l1: 350 l2: 1000 avg: 618ms snakes(): 99584 (15) [583, 549, 585, 549, 564, 634, 626, 587, 541, 560, 599, 736, 741, 694, 725]
l1: 400 l2: 1000 avg: 799ms snakes(): 124369 (15) [800, 753, 766, 775, 793, 800, 906, 853, 858, 802, 762, 786, 824, 731, 769]
l1: 450 l2: 1000 avg: 934ms snakes(): 142464 (15) [957, 913, 932, 897, 893, 948, 928, 933, 941, 944, 942, 943, 951, 961, 933]
l1: 300 l2: 1200 avg: 758ms snakes(): 84796 (15) [755, 785, 758, 741, 764, 759, 750, 755, 776, 753, 777, 692, 764, 757, 787]
l1: 350 l2: 1200 avg: 1037ms snakes(): 108819 (15) [1069, 1033, 1136, 1139, 1086, 988, 1020, 1046, 1011, 1054, 1020, 992, 978, 961, 1015]
l1: 400 l2: 1200 avg: 1321ms snakes(): 133764 (15) [1355, 1285, 1327, 1355, 1331, 1313, 1310, 1312, 1332, 1308, 1268, 1336, 1306, 1316, 1355]
l1: 450 l2: 1200 avg: 1707ms snakes(): 162976 (15) [1697, 1736, 1719, 1623, 1662, 1668, 1745, 1706, 1769, 1685, 1714, 1747, 1707, 1720, 1701]
l1: 300 l2: 1500 avg: 1494ms snakes(): 82225 (15) [1285, 1339, 1326, 1384, 1316, 1547, 1786, 1505, 1449, 1785, 1360, 1321, 1356, 1814, 1836]
l1: 350 l2: 1500 avg: 2335ms snakes(): 111600 (15) [1861, 2905, 2154, 1924, 1997, 2433, 2334, 2921, 2279, 1835, 1995, 2603, 2269, 2910, 2608]
l1: 400 l2: 1500 avg: 2690ms snakes(): 150429 (15) [2544, 2545, 3293, 2704, 2636, 2609, 2610, 2448, 2425, 3155, 2811, 2770, 2863, 2523, 2413]
l1: 450 l2: 1500 avg: 2991ms snakes(): 177304 (15) [2960, 3392, 2853, 2934, 3338, 2862, 2791, 2800, 2784, 2816, 3026, 2905, 3180, 3288, 2935]

Firefox

Testing... (t1 length - t2 length - avg time - snake() calls - times)
l1: 300 l2: 700 avg: 633ms snakes(): 66276 Array(15) [ 943, 771, 444, 1007, 344, 894, 396, 843, 544, 459, … ]
l1: 350 l2: 700 avg: 757ms snakes(): 81600 Array(15) [ 867, 1124, 802, 723, 643, 1031, 653, 758, 635, 689, … ]
l1: 400 l2: 700 avg: 1094ms snakes(): 94464 Array(15) [ 1007, 854, 817, 1017, 827, 1030, 1283, 946, 1229, 822, … ]
l1: 450 l2: 700 avg: 1014ms snakes(): 109691 Array(15) [ 1151, 1436, 1318, 1037, 1089, 951, 885, 1154, 1098, 1203, … ]
l1: 300 l2: 800 avg: 673ms snakes(): 69996 Array(15) [ 714, 453, 573, 540, 815, 845, 682, 502, 520, 621, … ]
l1: 350 l2: 800 avg: 834ms snakes(): 87016 Array(15) [ 748, 819, 846, 652, 993, 761, 587, 1541, 752, 804, … ]
l1: 400 l2: 800 avg: 984ms snakes(): 104400 Array(15) [ 919, 1167, 1120, 857, 931, 1120, 860, 911, 1032, 909, … ]
l1: 450 l2: 800 avg: 1170ms snakes(): 121475 Array(15) [ 1466, 1220, 1198, 1016, 1311, 1065, 1134, 1146, 1301, 1145, … ]
l1: 300 l2: 900 avg: 714ms snakes(): 74836 Array(15) [ 652, 792, 500, 626, 713, 901, 673, 548, 887, 579, … ]
l1: 350 l2: 900 avg: 976ms snakes(): 94944 Array(15) [ 1123, 672, 1200, 1073, 938, 907, 894, 1113, 731, 1221, … ]
l1: 400 l2: 900 avg: 1128ms snakes(): 112224 Array(15) [ 1170, 941, 1079, 1247, 1122, 966, 1268, 1107, 1188, 1251, … ]
l1: 450 l2: 900 avg: 1339ms snakes(): 132559 Array(15) [ 1447, 1461, 1320, 1386, 1401, 1182, 1481, 1340, 1137, 1330, … ]
l1: 300 l2: 1000 avg: 785ms snakes(): 79101 Array(15) [ 644, 714, 814, 821, 846, 776, 704, 843, 665, 886, … ]
l1: 350 l2: 1000 avg: 1062ms snakes(): 99584 Array(15) [ 819, 1250, 843, 1162, 1338, 775, 1397, 1081, 842, 1070, … ]
l1: 400 l2: 1000 avg: 1296ms snakes(): 124369 Array(15) [ 1365, 1272, 1218, 1142, 1362, 1309, 1208, 1230, 1229, 1368, … ]
l1: 450 l2: 1000 avg: 1440ms snakes(): 142464 Array(15) [ 1386, 1530, 1505, 1435, 1254, 1495, 1549, 1452, 1409, 1418, … ]
l1: 300 l2: 1200 avg: 1085ms snakes(): 84796 Array(15) [ 864, 1305, 1010, 967, 979, 808, 1196, 1000, 815, 1260, … ]
l1: 350 l2: 1200 avg: 1700ms snakes(): 108819 Array(15) [ 1344, 1981, 1639, 1615, 974, 2583, 1815, 955, 2315, 1720, … ]
l1: 400 l2: 1200 avg: 2246ms snakes(): 133764 Array(15) [ 3217, 1958, 2203, 1847, 2141, 2843, 2669, 2197, 2395, 1840, … ]
l1: 450 l2: 1200 avg: 2381ms snakes(): 162976 Array(15) [ 2249, 2318, 2137, 2112, 2000, 2846, 2527, 2584, 2213, 3091, … ]
l1: 300 l2: 1500 avg: 1304ms snakes(): 82225 Array(15) [ 999, 1390, 1826, 1904, 944, 1561, 1487, 988, 1502, 815, … ]
l1: 350 l2: 1500 avg: 1703ms snakes(): 111600 Array(15) [ 1644, 1687, 1678, 2404, 1342, 2041, 1983, 2394, 1974, 1174, … ]
l1: 400 l2: 1500 avg: 1824ms snakes(): 150429 Array(15) [ 1848, 1366, 1961, 1473, 2245, 1653, 2190, 1421, 1849, 1731, … ]
l1: 450 l2: 1500 avg: 1985ms snakes(): 177304 Array(15) [ 1657, 2082, 2124, 2002, 1734, 2048, 1924, 1814, 1738, 2172, … ]

Safari

Testing... (t1 length - t2 length - avg time - snake() calls - times)
l1: 300 – "l2: 700" – "avg: 106ms" – "snakes(): 66276" – [150, 147, 107, …] (15)
l1: 350 – "l2: 700" – "avg: 124ms" – "snakes(): 81600" – [134, 120, 125, …] (15)
l1: 400 – "l2: 700" – "avg: 156ms" – "snakes(): 94464" – [176, 160, 163, …] (15)
l1: 450 – "l2: 700" – "avg: 183ms" – "snakes(): 109691" – [190, 184, 185, …] (15)
l1: 300 – "l2: 800" – "avg: 109ms" – "snakes(): 69996" – [113, 109, 110, …] (15)
l1: 350 – "l2: 800" – "avg: 137ms" – "snakes(): 87016" – [150, 144, 139, …] (15)
l1: 400 – "l2: 800" – "avg: 174ms" – "snakes(): 104400" – [180, 175, 174, …] (15)
l1: 450 – "l2: 800" – "avg: 216ms" – "snakes(): 121475" – [220, 217, 222, …] (15)
l1: 300 – "l2: 900" – "avg: 123ms" – "snakes(): 74836" – [133, 123, 125, …] (15)
l1: 350 – "l2: 900" – "avg: 167ms" – "snakes(): 94944" – [175, 173, 170, …] (15)
l1: 400 – "l2: 900" – "avg: 202ms" – "snakes(): 112224" – [213, 206, 201, …] (15)
l1: 450 – "l2: 900" – "avg: 263ms" – "snakes(): 132559" – [266, 255, 255, …] (15)
l1: 300 – "l2: 1000" – "avg: 167ms" – "snakes(): 79101" – [160, 145, 158, …] (15)
l1: 350 – "l2: 1000" – "avg: 198ms" – "snakes(): 99584" – [225, 202, 197, …] (15)
l1: 400 – "l2: 1000" – "avg: 261ms" – "snakes(): 124369" – [258, 269, 279, …] (15)
l1: 450 – "l2: 1000" – "avg: 321ms" – "snakes(): 142464" – [311, 327, 324, …] (15)
l1: 300 – "l2: 1200" – "avg: 235ms" – "snakes(): 84796" – [229, 226, 228, …] (15)
l1: 350 – "l2: 1200" – "avg: 345ms" – "snakes(): 108819" – [309, 318, 329, …] (15)
l1: 400 – "l2: 1200" – "avg: 422ms" – "snakes(): 133764" – [529, 411, 411, …] (15)
l1: 450 – "l2: 1200" – "avg: 518ms" – "snakes(): 162976" – [604, 499, 499, …] (15)
l1: 300 – "l2: 1500" – "avg: 334ms" – "snakes(): 82225" – [331, 339, 325, …] (15)
l1: 350 – "l2: 1500" – "avg: 492ms" – "snakes(): 111600" – [452, 480, 497, …] (15)
l1: 400 – "l2: 1500" – "avg: 698ms" – "snakes(): 150429" – [665, 677, 697, …] (15)
l1: 450 – "l2: 1500" – "avg: 852ms" – "snakes(): 177304" – [754, 755, 767, …] (15)

One visible thing is that the diff() time differs for different browsers (with Safari being the fastest 🤔) so we should also take this into consideration. I would consider exec time below 1500ms to be acceptable (taking into consideration that it is only diff() processing so there is other processing apart from that (whole conversion pipeline) and also the fact that since it runs in the browser it depends also on the end machine performance).

Looking at the data above and earlier findings (that running for text with both 600+ chars takes 1500ms+) I would go with threshold (switching to fastDiff()) when shorter string/array has above 400 elements and the length sum for both is above 1300 or if the length sum is above 2000 (for scenarios like 300 - 1800 or 200 - 2000 etc.).

Other: Optimized `diff()` function to use `fastDiff()` function internally for large data sets. Closes #269.

Reinmar assigned f1ames Jan 11, 2019

Reinmar closed this as completed in ckeditor/ckeditor5-utils#274 Feb 13, 2019

Reinmar referenced this issue in ckeditor/ckeditor5-utils Feb 13, 2019

Merge pull request #274 from ckeditor/t/269

ee9bed0

Other: Optimized `diff()` function to use `fastDiff()` function internally for large data sets. Closes #269.

mlewand added this to the iteration 22 milestone Oct 9, 2019

mlewand added status:discussion type:improvement This issue reports a possible enhancement of an existing feature. package:utils labels Oct 9, 2019

mlewand transferred this issue from ckeditor/ckeditor5-utils Oct 9, 2019

f1ames mentioned this issue Oct 11, 2019

Optimized diff() function which now uses fastDiff() function internally for large data sets ckeditor/ckeditor5-utils#274

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize diff() for bigger data sets #5013

Optimize diff() for bigger data sets #5013

Reinmar commented Jan 11, 2019

f1ames commented Jan 11, 2019

f1ames commented Jan 11, 2019

pjasiun commented Jan 11, 2019

Reinmar commented Jan 14, 2019

f1ames commented Feb 6, 2019

f1ames commented Feb 6, 2019 •

edited

Loading

Reinmar commented Feb 11, 2019

f1ames commented Feb 11, 2019

f1ames commented Feb 11, 2019 •

edited

Loading

Optimize diff() for bigger data sets #5013

Optimize diff() for bigger data sets #5013

Comments

Reinmar commented Jan 11, 2019

f1ames commented Jan 11, 2019

f1ames commented Jan 11, 2019

pjasiun commented Jan 11, 2019

Reinmar commented Jan 14, 2019

f1ames commented Feb 6, 2019

f1ames commented Feb 6, 2019 • edited Loading

Reinmar commented Feb 11, 2019

f1ames commented Feb 11, 2019

f1ames commented Feb 11, 2019 • edited Loading

Chrome

Firefox

Safari

f1ames commented Feb 6, 2019 •

edited

Loading

f1ames commented Feb 11, 2019 •

edited

Loading