Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repo: use is_repo_import to check for imports in outs_graph #9925

Merged
merged 1 commit into from
Sep 8, 2023

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Sep 8, 2023

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Fixes #9912

@pmrowla pmrowla added the bugfix fixes bug label Sep 8, 2023
@pmrowla pmrowla self-assigned this Sep 8, 2023
for dep in stage.deps:
if dep.fs_path is None:
# RepoDependency don't have a path
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer valid since repo deps now have a dvcfs fs_path

Comment on lines +161 to +162
if stage.is_repo_import:
continue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check could actually be if stage.is_import here since outs can't overlap with external dep paths now?

(But I left the check as repo imports only for now to match the original behavior, and since import-url deps won't cause the erepo git cloning issue)

@pmrowla pmrowla requested a review from skshetry September 8, 2023 05:27
@pmrowla pmrowla changed the title repo: use is_repo_import to check for imports repo: use is_repo_import to check for imports in outs_graph Sep 8, 2023
@codecov
Copy link

codecov bot commented Sep 8, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed Coverage
dvc/repo/graph.py 100.00%

πŸ“’ Thoughts on this report? Let us know!.

@skshetry skshetry merged commit 408d23f into iterative:main Sep 8, 2023
@skshetry
Copy link
Member

skshetry commented Sep 8, 2023

Thanks @pmrowla.

@pmrowla pmrowla deleted the outs-graph-repo-imports branch September 8, 2023 10:57
Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pmrowla for the quick fix! Curious what is your take on adding tests on issues like this (and probably for bug fixes in general?). (I had a conversation with @efiop and our opinions differ)- I wanted to calibrate with you and probably with the whole team as well :) (clearly it goes case by case, and there are very valid exception from any policy).

@pmrowla
Copy link
Contributor Author

pmrowla commented Sep 11, 2023

I'm not really sure that a test for this would be particularly useful (i.e. having a func test for "does dvc add or outs_graph trigger an erepo clone").

This issue was really an outs_graph performance regression due to the new behavior that didn't exist when outs_graph was first implemented. The unexpected erepo clone behavior was introduced because the underlying support for repo imports has been improved (so they can now be treated like a regular dependency that has a path). But at the end of the day, both dvc add and the outs_graph worked correctly both before and after this patch.

This is something that I would expect to be caught via higher level QA and/or benchmarks. It would be better for us to have more useful real-world test cases in dvc-bench, but IMO this isn't something that needs to be tested at the DVC func or unit test level

@skshetry
Copy link
Member

I see this as a design issue with the current fs.path thing. It should not be this easy to have unwanted side effects (especially on a DVCS and a build system where you need controlled side-effects).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix fixes bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unnecessary remote repository fetches
3 participants