-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check link target exists #542
Conversation
@dberenbaum, can we check if the I don't want to do this on every individual transfer because performance for checkout is already bad enough. In fact, I was hoping to get rid of |
@skshetry I think that's the right idea, but I either don't follow what you mean or don't see how to do it. I don't see a higher-level cache path to check anywhere. The best I can see is to check the first path in the diff. |
dvc-data/src/dvc_data/hashfile/checkout.py Line 235 in de16710
|
The cache itself does exist though, so that will not solve the problem unless I'm still not understanding what you mean. |
@dberenbaum any idea behind the reason for this to come to this place (no file exists, but it's trying to checkout it)? can it be reproduced with |
Yes. iterative/dvc#10388 tries to checkout the imported data first (to avoid unnecessary downloads) and falls back to downloading if the checkout fails. You can reproduce it with the script from iterative/dvc#10500, although you can skip the part that sets up a custom cache dir. |
@dberenbaum okay, I see.
will it redownload the whole dataset (even if let's say some parts exist)? just to make sure .. in this specific PR - can we make it less general by passing something like "strict: true|false" to the Link object and use the strict mode in the context of imports only? |
I think that could work, although I'm not sure it's better than iterative/dvc#10501 since we are introducing a new option and additional checks for a single narrow scenario. |
Closes iterative/dvc#10500. Open to other ideas for how to solve it, but right now linking to a nonexistent target raises no error and can cause downstream problems like in that issue.