lfs: optimize path filtering #355

Merged 1 commit into iterative:main from lfs/collect-objects-perf on May 7, 2024

Collaborator

@sisp sisp commented May 3, 2024

I've optimized LFS path filtering as we discussed in #338. The first optimization implements the suggestion in #338 (comment) to short-circuit the case of a single-path include filter. The tests for the regex that extracts the path prefix are at https://regex101.com/r/wBjHf0/1. Note that the extra \n on regex101.com is only there to allow one test case per line; the actual regex doesn't need it. The second optimization combines the regexes derived from the Unix filename patterns into a single pre-compiled regex and matches each path against it, which is faster than matching against each Unix filename pattern individually. It also replaces the intermediate list materialization with a streaming filter.
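For readers who don't want to open the diff, here is a rough sketch of the idea behind the second optimization. It is not the code from this PR: the function name filter_paths_streaming and its signature are made up for illustration, and I'm assuming Python 3.7+, where fnmatch.translate() returns a self-contained pattern that can be joined with "|".

```python
import re
from fnmatch import translate
from typing import Iterable, Iterator


def filter_paths_streaming(
    paths: Iterable[str], include: Iterable[str]
) -> Iterator[str]:
    """Yield the paths that match at least one Unix filename pattern.

    Hypothetical helper for illustration; not the function changed in this PR.
    """
    patterns = list(include)
    if not patterns:
        # Assumption: with no include patterns, nothing is selected.
        return
    # Translate each Unix filename pattern to a regex and join them into
    # one alternation, so each path is tested against a single compiled regex.
    combined = re.compile("|".join(f"(?:{translate(p)})" for p in patterns))
    for path in paths:
        # Generator: matches are yielded one by one, no intermediate list.
        if combined.match(path):
            yield path


paths = ["a.bin", "src/x.py", "data/y.csv", "docs/readme.md"]
print(list(filter_paths_streaming(paths, ["*.bin", "data/*"])))
# → ['a.bin', 'data/y.csv']
```

Because the function is a generator, the caller decides whether to materialize the result; iterating over it directly keeps memory use flat for large path sets.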

As the two commits implement independent optimizations, I intend this PR to be rebase-merged without commit squashing.

Partially fixes #338.

@shcheklein shcheklein merged commit bd11bec into iterative:main May 7, 2024
13 checks passed
@sisp sisp deleted the lfs/collect-objects-perf branch May 7, 2024 18:04
Development

Successfully merging this pull request may close these issues.

lfs: internal _filter_paths() function is prohibitively slow