Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add: fix issue when adding already tracked symlinked files #4778

Merged
merged 6 commits into from
Oct 27, 2020

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Oct 23, 2020

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Will fix #4654.

  • dvc add <symlinked_file> now works as expected. The DVC file will be placed in the same directory as the original symlink
  • dvc add'ing symlinked a directory or a file inside a symlinked directory is now forbidden.

Docs PR: iterative/dvc.org#1890

@pmrowla pmrowla added the bugfix fixes bug label Oct 23, 2020
@pmrowla pmrowla self-assigned this Oct 23, 2020
@@ -367,7 +367,7 @@ def resolve_paths(repo, out):
if (
not scheme
and abspath.isin_or_eq(repo.root_dir)
and not contains_symlink_up_to(abspath, repo.root_dir)
and not contains_symlink_up_to(dirname, repo.root_dir)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to place the .dvc files into cwd() when the path leading up to the output contains a link, rather than when the output itself is a link, so we only need to check up to dirname here rather than abspath.

If the output itself is a symlink, it should not matter to DVC, as we will end up copying the file contents into DVC cache, and then replacing the original link with our own link or copy to the DVC cache entry. In that case, it is still safe for us to place the .dvc into the same directory as the original symlink.

@pmrowla pmrowla changed the title 4654 add symlink add: fix issue when adding symlinked files Oct 23, 2020
@pmrowla pmrowla changed the title add: fix issue when adding symlinked files add: fix issue when adding already tracked symlinked files Oct 23, 2020
@pmrowla pmrowla requested review from efiop, pared and skshetry October 23, 2020 09:49
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!


# Test that subsequent add succeeds
# See https://github.com/iterative/dvc/issues/4654
dvc.add(os.path.join("dir", "foo"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pmrowla pmrowla changed the title add: fix issue when adding already tracked symlinked files [WIP] add: fix issue when adding already tracked symlinked files Oct 24, 2020
Comment on lines -425 to -470
class SymlinkAddTestBase(TestDvc):
def _get_data_dir(self):
raise NotImplementedError

def _prepare_external_data(self):
data_dir = self._get_data_dir()

data_file_name = "data_file"
external_data_path = os.path.join(data_dir, data_file_name)
with open(external_data_path, "w+") as f:
f.write("data")

link_name = "data_link"
System.symlink(data_dir, link_name)
return data_file_name, link_name

def _test(self):
data_file_name, link_name = self._prepare_external_data()

ret = main(["add", os.path.join(link_name, data_file_name)])
self.assertEqual(0, ret)

stage_file = data_file_name + DVC_FILE_SUFFIX
self.assertTrue(os.path.exists(stage_file))

d = load_yaml(stage_file)
relative_data_path = posixpath.join(link_name, data_file_name)
self.assertEqual(relative_data_path, d["outs"][0]["path"])


class TestShouldAddDataFromExternalSymlink(SymlinkAddTestBase):
def _get_data_dir(self):
return self.mkdtemp()

def test(self):
self._test()


class TestShouldAddDataFromInternalSymlink(SymlinkAddTestBase):
def _get_data_dir(self):
return self.DATA_DIR

def test(self):
self._test()


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cases are covered by the new pytest based tests

@pmrowla
Copy link
Contributor Author

pmrowla commented Oct 26, 2020

After discussion with @efiop, the prior behavior regarding adding symlinked dirs or files inside symlinked dirs was too ambiguous and is now forbidden. This use-case should instead be handled by implementing proper support for adding external data directly to cache - see #4520.

@pmrowla pmrowla changed the title [WIP] add: fix issue when adding already tracked symlinked files add: fix issue when adding already tracked symlinked files Oct 26, 2020
):
raise DvcException(
"Cannot add files inside symlinked directories to DVC."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we should make it a bit more helpful. Maybe a link to troubleshooting doc?

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! πŸ™

@efiop efiop merged commit b2495aa into iterative:master Oct 27, 2020
@pmrowla pmrowla deleted the 4654-add-symlink branch October 27, 2020 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix fixes bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot dvc add an already tracked file that lies in a folder
3 participants