check link target exists #542

dberenbaum · 2024-08-06T15:16:31Z

Closes iterative/dvc#10500. Open to other ideas for how to solve it, but right now linking to a nonexistent target raises no error and can cause downstream problems like in that issue.
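
For illustration, here is the underlying OS behavior the description refers to, shown with the standard library only. This is a minimal sketch, not code from this PR; the paths are made up, and it assumes a POSIX system where creating symlinks needs no special privileges.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical cache object path that is never created.
    missing_target = os.path.join(tmp, "cache", "ab", "cdef")
    link_path = os.path.join(tmp, "data.txt")

    # Creating the link succeeds even though the target does not exist.
    os.symlink(missing_target, link_path)

    link_created = os.path.islink(link_path)    # the link itself is there
    target_exists = os.path.exists(link_path)   # follows the link to the target

    # The failure only shows up later, when something reads through the link.
    try:
        open(link_path).close()
        open_error = None
    except FileNotFoundError as exc:
        open_error = exc
```

The error surfaces at read time rather than at link time, which is consistent with the "No such file or directory" failure reported in the linked issue.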

skshetry · 2024-08-06T15:23:33Z

@dberenbaum, can we check if the cache.dir exists or not (preferably in checkout or _checkout)? Would that work?

I don't want to do this on every individual transfer because checkout performance is already bad enough. In fact, I was hoping to get rid of the exists() call above that we are already doing in _remove(), IIRC.

dberenbaum · 2024-08-06T15:49:21Z

@skshetry I think that's the right idea, but I either don't follow what you mean or don't see how to do it. I don't see a higher-level cache path to check anywhere. The best I can see is to check the first path in the diff.

skshetry · 2024-08-06T15:59:07Z

cache is an ObjectDB/HashFileDB and it should have a .path property.

dvc-data/src/dvc_data/hashfile/checkout.py, line 235 in de16710:

    cache,

https://github.com/iterative/dvc-objects/blob/4ec9be99b9c96d54c74559d9f3203e3d38c666f2/src/dvc_objects/db.py#L43
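
A rough sketch of what this suggestion amounts to: one existence check per checkout instead of one per file. `FakeCache` and `checkout_all` are hypothetical stand-ins, not dvc-data's API; only the idea of a `.path` attribute on the cache is taken from the comment above.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FakeCache:
    """Hypothetical stand-in for an object DB that exposes `.path`."""
    path: str


def checkout_all(cache, entries):
    """Check the cache directory once, then link every entry unchecked."""
    if not os.path.isdir(cache.path):
        raise FileNotFoundError(f"cache directory does not exist: {cache.path}")
    for name, dest in entries:
        # No per-file exists() call here; that is the performance point.
        os.symlink(os.path.join(cache.path, name), dest)
    return len(entries)


with tempfile.TemporaryDirectory() as tmp:
    cache = FakeCache(path=os.path.join(tmp, "cache"))
    os.mkdir(cache.path)  # the cache directory exists, but it is empty
    dest = os.path.join(tmp, "out.bin")

    linked = checkout_all(cache, [("missing-object", dest)])
    dangling = os.path.islink(dest) and not os.path.exists(dest)
```

The single check is cheap, but it only guards against a missing cache directory. A missing object inside an existing cache still produces a dangling link, which is the objection raised in the next reply.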

dberenbaum · 2024-08-06T16:03:43Z

The cache itself does exist though, so that will not solve the problem unless I'm still not understanding what you mean.

shcheklein · 2024-08-06T20:29:14Z

@dberenbaum any idea behind the reason for this to come to this place (no file exists, but it's trying to checkout it)? can it be reproduced with dvc pull --allow-missing && dvc checkout for example (assuming some files are missing)?

dberenbaum · 2024-08-06T21:05:20Z

> @dberenbaum any idea behind the reason for this to come to this place (no file exists, but it's trying to checkout it)?

Yes. iterative/dvc#10388 tries to checkout the imported data first (to avoid unnecessary downloads) and falls back to downloading if the checkout fails.

You can reproduce it with the script from iterative/dvc#10500, although you can skip the part that sets up a custom cache dir.
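
To make the interaction concrete, here is a small sketch of a "checkout first, download on failure" flow. All names are hypothetical, and catching `FileNotFoundError` is an assumption for the example; the actual exception handling in iterative/dvc#10388 is not shown in this thread.

```python
def fetch_import(checkout, download):
    """Try the cheap checkout first; download only if checkout raises."""
    try:
        checkout()
    except FileNotFoundError:
        download()
        return "downloaded"
    return "checked out"


def silent_checkout():
    # Mimics a link step that creates a dangling link without raising.
    pass


def strict_checkout():
    # Mimics a link step that verifies the target and raises when it is missing.
    raise FileNotFoundError("cache object is missing")


calls = []
silent_result = fetch_import(silent_checkout, lambda: calls.append("download"))
strict_result = fetch_import(strict_checkout, lambda: calls.append("download"))
```

Under these assumptions, the fallback only runs when the checkout step actually raises. A link step that silently succeeds on a missing target would leave the download untriggered, which is presumably how the dangling link ends up in the workspace.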

shcheklein · 2024-08-06T23:36:06Z

@dberenbaum okay, I see.

> Yes. iterative/dvc#10388 tries to checkout the imported data first (to avoid unnecessary downloads) and falls back to downloading if the checkout fails.

will it redownload the whole dataset (even if let's say some parts exist)? just to make sure ..

in this specific PR - can we make it less general by passing something like "strict: true|false" to the Link object and use the strict mode in the context of imports only?

dberenbaum · 2024-08-07T18:54:02Z

> in this specific PR - can we make it less general by passing something like "strict: true|false" to the Link object and use the strict mode in the context of imports only?

I think that could work, although I'm not sure it's better than iterative/dvc#10501 since we are introducing a new option and additional checks for a single narrow scenario.
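
For reference, the "strict" idea discussed above could look roughly like the following. This is a hypothetical sketch: the `link` function and its `strict` parameter are invented for illustration and are not the Link object's real signature in dvc-data.

```python
import os
import tempfile


def link(src, dst, strict=False):
    """Symlink dst -> src; with strict=True, refuse a missing source."""
    if strict and not os.path.exists(src):
        raise FileNotFoundError(f"link target does not exist: {src}")
    os.symlink(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing-object")

    # Default mode: no extra check, so a dangling link is created silently.
    link(missing, os.path.join(tmp, "lenient.bin"))
    lenient_created = os.path.islink(os.path.join(tmp, "lenient.bin"))

    # Strict mode: one extra exists() call, paid only by callers that opt in.
    try:
        link(missing, os.path.join(tmp, "strict.bin"), strict=True)
        strict_raised = False
    except FileNotFoundError:
        strict_raised = True
    strict_created = os.path.lexists(os.path.join(tmp, "strict.bin"))
```

The trade-off matches the one raised in the thread: the extra check stays off the common checkout path, at the cost of a new option that exists for one narrow scenario.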

check link target exists

b5b7c77

dberenbaum requested a review from skshetry August 6, 2024 15:16

dberenbaum mentioned this pull request Aug 6, 2024

dvc==3.53.0 import fails with No such file or directory when cache.dir configured and cache.type symlink iterative/dvc#10500

Closed

dberenbaum closed this Aug 15, 2024

skshetry deleted the link-check-from-path branch August 15, 2024 14:13
