Optimize diff() for bigger data sets #5013
From what I have posted in https://github.com/ckeditor/ckeditor5-engine/issues/403#issuecomment-375282791 it seems that switching to `fastDiff()` for longer inputs should help.
Actually, we could do it even smarter. The diff algorithm is very fast (linear) when arrayA equals arrayB and takes longer and longer the less arrayA has in common with arrayB. So it is not a matter of the string length, but of the length of the differing part. Fortunately, thanks to how diff works, we can basically stop it at some point and say that if there are more than (for instance) 30 differing characters, fast diff should be used.
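A minimal sketch of one way such a cutoff could be approximated without running the full algorithm, assuming `diff()` and `fastDiff()` from `@ckeditor/ckeditor5-utils`; the `estimateChangedRegion()` helper, the `smartDiff()` wrapper and the value 30 are illustrative only. Instead of stopping the minimal diff mid-run, it trims the common prefix and suffix and uses the size of what is left as an upper bound on the differing part:

```js
import diff from '@ckeditor/ckeditor5-utils/src/diff';
import fastDiff from '@ckeditor/ckeditor5-utils/src/fastdiff';

// Upper bound on the size of the differing part: everything that is left
// after trimming the common prefix and the common suffix.
function estimateChangedRegion( a, b ) {
	let prefix = 0;

	while ( prefix < a.length && prefix < b.length && a[ prefix ] === b[ prefix ] ) {
		prefix++;
	}

	let suffix = 0;

	while (
		suffix < a.length - prefix &&
		suffix < b.length - prefix &&
		a[ a.length - 1 - suffix ] === b[ b.length - 1 - suffix ]
	) {
		suffix++;
	}

	return Math.max( a.length, b.length ) - prefix - suffix;
}

// Use the minimal diff only when the (estimated) differing part is small.
function smartDiff( a, b ) {
	const CHANGED_PART_CUTOFF = 30; // Example value from the comment above.

	return estimateChangedRegion( a, b ) > CHANGED_PART_CUTOFF ?
		fastDiff( a, b ) :
		diff( a, b );
}
```

It is only an upper bound, so many small changes scattered through a long text would still fall back to `fastDiff()`, which errs on the fast side.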
We can think about this in the future, but KISS now. We need a patch ASAP.
I did some testing when looking for a proper threshold for switching to `fastDiff()`. There is clearly an issue when both compared texts are long and the number of differences between them is high.

I also found one more interesting thing: it looks like most of the exec time is spent in the rendering, where most of the calls are just comparing single nodes.
Optimising the rendering algorithm is, of course, one of the things we could look into, but I expect that it won't be easy. You know how much time we can spend fixing the renderer – months. That's why I didn't even want to think about it, especially since the renderer is already too complex; adding optimisations there would most likely make it even more so.
That's true, and it is also a reason why we have https://github.com/ckeditor/ckeditor5-engine/issues/1456 :) I think going with `fastDiff()` above a threshold is the way to go.
I did some profiling in Chrome, Firefox and Safari. Looking at that data and the earlier findings (that running it for texts with 600+ characters on both sides takes 1500ms+), I would go with a threshold for switching to `fastDiff()`.
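For reference, a rough sketch (not the benchmark actually used for the numbers above) of how such timings could be gathered: it times `diff()` and `fastDiff()` on strings of growing length that have nothing in common, the worst case for the minimal diff. The import paths and the generated test data are assumptions here:

```js
import diff from '@ckeditor/ckeditor5-utils/src/diff';
import fastDiff from '@ckeditor/ckeditor5-utils/src/fastdiff';

for ( const length of [ 100, 200, 400, 600, 800, 1000 ] ) {
	// Worst case for the minimal diff: the strings have nothing in common.
	const a = 'a'.repeat( length );
	const b = 'b'.repeat( length );

	let start = performance.now();
	diff( a, b );
	console.log( `diff(), ${ length } chars: ${ Math.round( performance.now() - start ) }ms` );

	start = performance.now();
	fastDiff( a, b );
	console.log( `fastDiff(), ${ length } chars: ${ Math.round( performance.now() - start ) }ms` );
}
```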
Other: Optimized `diff()` function to use `fastDiff()` function internally for large data sets. Closes #269.
When researching the engine's performance issues I found out that calling the `diff()` function with `a` and `b` of length 1000 (use the data set from #1398) takes `diff()` 10s to finish. That's a terrible result.

There's no way to significantly speed up the `diff()` function itself. It's a highly optimal minimal diff function. Its implementation is based on a popular paper and I don't think it's reasonable to look deeper into this, especially since we need to improve it at least 100x.

What we could do, though, is resort to a dumb but fast function when `a` and `b` are longer than some threshold (e.g. 100 items). We already have `fastDiff()`, but it accepts a bit different params than `diff()` at the moment. If we aligned `fastDiff()` to `diff()`, we could make a simple change in the `diff()` function (sketched below).

RFC.
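A possible shape of that change, assuming `fastDiff()` has already been aligned to accept the same parameters and return the same `'equal'` / `'insert'` / `'delete'` change array as `diff()`, as proposed above. The threshold constant and its value are placeholders, not a decision:

```js
import fastDiff from '@ckeditor/ckeditor5-utils/src/fastdiff';

// Example value only; the real threshold should come from profiling.
const FAST_DIFF_THRESHOLD = 100;

export default function diff( a, b, cmp ) {
	// Default comparator, as in the current implementation.
	cmp = cmp || ( ( itemA, itemB ) => itemA === itemB );

	if ( a.length > FAST_DIFF_THRESHOLD || b.length > FAST_DIFF_THRESHOLD ) {
		// Dumb but fast path for big data sets, assuming fastDiff() returns
		// the same change format as diff().
		return fastDiff( a, b, cmp );
	}

	// ... the existing minimal diff algorithm continues here unchanged.
}
```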