Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

finalize gix-merge integration #5722

Merged
merged 11 commits into from
Dec 13, 2024

Conversation

Byron
Copy link
Collaborator

@Byron Byron commented Dec 2, 2024

Replace the remaining uses of git2::merge_trees with the respective gix function.
This was made possible thanks to two additions:

Follow-up to #5416.

Merging/Cheery-picking/Rebasing is now 3.1x faster! I'd also expect it to be more correct, this means it should be able to resolve more unambiguous conflicts.

Tasks

  • preliminary gix update
  • make API improvements for nicer integration
  • no test-double-run
  • replace all remaining usage of git2::merge_trees()
    • cherry-pick
    • edit-mode
    • repo/rebase.rs
    • get_commit_index
    • test-project/merge
  • once again evaluate the @~10 variant and find out what's going on there. Manual comparisons should be easier then.
    • Turns out there was one now fixed shortcoming/bug and another bug that was there already, to be addressed separately.
  • Enter resolve conflict mode - fix issue with diff problem
  • figure out why one of the conflicts isn't actually a conflict when trying to resolve it
  • update to latest gix on main

Notes for the reviewer

  • Thanks to (even) more exhaustive testing, merges are now of better quality particularly when a directory gets replaced with a file, or vice-versa.
  • There is one known niche condition in conjunction with tree-to-file or file-to-tree replacements where detected renames can throw of the resolution, leading to worse outcomes than what Git would produce. It's the only known case of this happening and it's unclear how it can be fixed. While writing this, I had an idea on how to fix it maybe, let's hope this comment doesn't have to stay long. - the issue was due to rename-detection in empty blobs, which now is deactivated by default if these renames happen to individual files.
  • I noticed that the stack tests run serially, which was due to a shortcoming in how gix-testtools locks its script before execution. However, as these script invocations run in their own tempdir, nothing needs to be locked, so everything can run in parallel. When testing this, it showed that running the gitbutler-cli in debug mode in parallel with 16 cores isn't great, and the runtime increased to 25s compared to 22s while running it serially, while massively increasing the used CPU. Most of the time was kernel time due to filesystem access, so it seems the filesystem doesn't like what's happening. This won't be an issue as gix-testtools v0.16 is current, while we use v0.15 here. I also didn't release the patch yet, so there can't be any surprises. At some point though, one would probably try to make these tests better in creating their initial state.

Validation

  • When testing the rebasing/cherry-picking, I used jgit, and set the base of the target-branch in virtual-branches.toml to 7b955048eb86e1c12114554beacb27b329252b15, about 500 commits in the past. Then I applied the 6.10 branch and hit the update button. This causes a lot of conflicting commits, which were then output to with gitbutler-cli branch status > git2|gix.rom for comparison.
    • Both look plausible, and seemingly are the same. Change-ids and dates make comparisons harder as commit-ids change strongly. The listing of paths seems different at times, and the UI can show duplicates.
    • I also don't understand how a single commit can become about 500 after applying, as in theory it should figure out that the tag is already merged mostly.
    • So it's somewhat inconclusive, but doesn't look wrong.
  • Actually, when going only 10 commits into the past, the baseline version shows conflicts, even lists them in the UI for a fraction of a second, and then all but one commits disappear. This disappearance doesn't happen when using the gix-merge version.
    • Interestingly this only happened once, and both versions seem to be exactly the same, i.e. git2 and gix.

Questions

  • How are conflicted_files used? It's a list of ancestor, ours and their files taken directly from the index, but I can't imagine what these duplicate sets of files are good for. In theory, these wouldn't have to be stored - instead one re-does the merge without merge-strategy and then builds the conflicting index for presentation to the user.

Performance

Cherry-Pick

The test setup is a jgit repository which rebases 500 commits by using a workspace update. The branch used is stable-6.10 and the 'base' commit (in virtual_branches.toml) is 7b955048eb86e1c12114554beacb27b329252b15.

Before

integrate_upstream takes 13.80s. It's notable that the cherry-pick merge runs twice.

Screenshot 2024-12-05 at 21 10 00
After

integrate_upstream is down to 4.43s, a 3.1x improvement.

Screenshot 2024-12-05 at 21 10 49

And according to the numbers with an updated diffing engine we land at 3.8s, a 3.6x improvement.

Screenshot 2024-12-09 at 07 05 18

Copy link

vercel bot commented Dec 2, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
gitbutler-components ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 13, 2024 1:24pm
gitbutler-web ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 13, 2024 1:24pm

Copy link

vercel bot commented Dec 2, 2024

@Byron is attempting to deploy a commit to the GitButler Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions bot added the rust Pull requests that update Rust code label Dec 2, 2024
@Byron Byron mentioned this pull request Dec 2, 2024
8 tasks
Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Dec 2, 2024
Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Dec 2, 2024
@Byron Byron force-pushed the finalize-merge-integration branch from f873049 to 34e6042 Compare December 4, 2024 06:24
@Byron Byron force-pushed the finalize-merge-integration branch from 34e6042 to 5856b9d Compare December 4, 2024 16:44
Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Dec 5, 2024
Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Dec 5, 2024
Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Dec 5, 2024
@Byron Byron force-pushed the finalize-merge-integration branch from 5856b9d to b40b347 Compare December 5, 2024 20:12
@Byron Byron force-pushed the finalize-merge-integration branch from b40b347 to a306692 Compare December 6, 2024 16:09
@Byron Byron force-pushed the finalize-merge-integration branch from a306692 to 85bf792 Compare December 6, 2024 16:40
@Byron Byron force-pushed the finalize-merge-integration branch from 809e06a to 997720b Compare December 7, 2024 15:31
@Byron Byron force-pushed the finalize-merge-integration branch from 997720b to 8dfbdc7 Compare December 7, 2024 17:47
@Caleb-T-Owens
Copy link
Contributor

How are conflicted_files used? It's a list of ancestor, ours and their files taken directly from the index, but I can't imagine what these duplicate sets of files are good for. In theory, these wouldn't have to be stored - instead one re-does the merge without merge-strategy and then builds the conflicting index for presentation to the user.

conflicted_files are used on the commit card and edit mode for listing what files are conflicted (because they likely will not be in the diff), and also for informing the user how a particular file is conflicted. IE ours deleted and theirs modified...

There is no good reason why we couldn't do the merge on the fly, and display the information that way. Especially because the result can be easily cached in the frontend.

@Caleb-T-Owens
Copy link
Contributor

Other than a few small questions, this is incredibly accurate. Amazing stuff

@Byron
Copy link
Collaborator Author

Byron commented Dec 9, 2024

Thanks a lot for the review - thanks to it I was able to make a change which makes the code more readable.

Copy link
Collaborator Author

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the review - thanks to it I was able to make a change which makes the code more readable.

Comment on lines 245 to 254
let conflicting_entries = index.entries().iter().filter_map(|e| {
let stage = e.stage();
if stage == gix::index::entry::Stage::Unconflicted {
None
} else {
Some((stage, e))
}
});
for (stage, entry) in conflicting_entries {
let storage = match stage {
Stage::Unconflicted => {
unreachable!("BUG: filtered above to not contain unconflicted entries")
}
Stage::Base => &mut ancestor_entries,
Stage::Ours => &mut our_entries,
Stage::Theirs => &mut their_entries,
};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A great catch, I changed it and it's so much easier to read.

crates/gitbutler-repo/src/rebase.rs Show resolved Hide resolved
crates/gitbutler-repo/src/rebase.rs Show resolved Hide resolved
crates/gitbutler-repo/src/rebase.rs Show resolved Hide resolved
crates/gitbutler-stack/tests/mod.rs Show resolved Hide resolved
@Caleb-T-Owens
Copy link
Contributor

All looks good. We're going to merge after we release next

@Byron
Copy link
Collaborator Author

Byron commented Dec 10, 2024

Great! Let me add #5753 to the merge-queue then.

Copy link
Member

@krlvi krlvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release is out, so let's merge this one too so that we can test it in the nightly :)

Byron added 11 commits December 13, 2024 14:20
Note that it also uses rebasing, but that's not supported natively in `gix` yet.
In many places, a test-module is defined to collect all integration tests.
However, this doesn't prevent each of these loose modules to become their
own test module by default.

`autotest = false` prevents this automation.
That way one won't accidentally access private APIs, while staying
consistent with a standard Rust crate layout.

Personally, I really want to keep modules small and the `src/` tree
free of any extras.
Having them makes it harder to search for and remove those added
by oneself.
This change affects the entire edit mode, and every rebase.
It has minimal dependencies and can be used everywhere.
* simplify iteration in `extract_conflicted_files`
@Byron Byron force-pushed the finalize-merge-integration branch from c64b840 to 2610251 Compare December 13, 2024 13:23
@Byron Byron enabled auto-merge December 13, 2024 13:23
@Byron Byron merged commit eef6f56 into gitbutlerapp:master Dec 13, 2024
20 of 21 checks passed
@Byron Byron deleted the finalize-merge-integration branch December 13, 2024 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants