import: add a mechanism to lock external dependencies #2139
Related #1487
Summary of changes

The most recent PR #2160 and a few previously merged PRs have introduced a mechanism to reuse (import) a data artifact (a model, a data file, or a directory) from one GitHub repository into another. The import creates a DVC-file that has a dependency on the artifact in the source repository:

md5: 39c9c563afc97491c86db391afd9cf82
wdir: .
deps:
- md5: 3863d0e317dee0a55c4e59d2ec0eef33
  path: model.pkl
  repo:
    url: https://github.com/iterative/example-get-started
outs:
- md5: 3863d0e317dee0a55c4e59d2ec0eef33
  path: model.pkl
  cache: true
  metric: false
  persist: false

This command also automatically "executes" this DVC-file; more specifically, it brings the [...] The existing [...]

What is the "lock" mechanism about?

Now that we have an external dependency in some sense, the question is what we should do when we run [...] After a few discussions we have decided that we need common semantics for all external dependencies - [...] It means that if we, say, in one month decide to do [...] The same for [...]

Thus we need to introduce some "lock" mechanism for these "import" DVC-files. They need to capture somewhere all the context necessary to get the same data when it's needed. It does not necessarily mean that we need new fields; we might end up reusing what we already have and just changing the behavior.

Implementation options
...
repo:
  url: https://github.com/iterative/example-get-started
  rev-lock: 123456dfdfgdfgdf
...

So, each DVC-file is self-contained and has all the information needed to restore the specific version of the repo that the initial data was taken from.

Cons/pros for this option:
✅ No additional files or concepts, easier to implement (at least initially);
✅ Repo update becomes easier - a single place to update.

Anything else I forgot, @efiop @dmpetrov @Suor?

@MrOutis @pared @jorgeorpinel @villasv @ei-grad @sotte let's discuss, share your opinions and votes. Let me know if something is not clear - a lot of information to digest :)
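To make the per-DVC-file option a bit more concrete, here is a minimal Python sketch of what "locking" could look like when the DVC-file is treated as a plain dictionary. This is not DVC's actual implementation: the helpers `lock_import` and `revision_to_fetch` and the `update` flag are hypothetical names for illustration, and the behavior "use the locked revision unless an update is explicitly requested" is an assumption about the intended semantics. Only the `rev-lock` field name and the sample values come from the example above.

```python
import copy


def lock_import(dvc_file: dict, rev: str) -> dict:
    """Return a copy of the DVC-file with every repo dependency pinned to `rev`."""
    locked = copy.deepcopy(dvc_file)  # leave the caller's dict untouched
    for dep in locked.get("deps", []):
        repo = dep.get("repo")
        if repo is not None:
            repo["rev-lock"] = rev
    return locked


def revision_to_fetch(dep: dict, latest_rev: str, update: bool = False) -> str:
    """Pick the revision to read data from.

    Assumption: the locked revision wins unless an update is explicitly
    requested or no lock has been recorded yet.
    """
    repo = dep["repo"]
    if update or "rev-lock" not in repo:
        return latest_rev
    return repo["rev-lock"]


# DVC-file contents mirroring the example in this issue (md5 fields omitted).
dvc_file = {
    "wdir": ".",
    "deps": [
        {
            "path": "model.pkl",
            "repo": {"url": "https://github.com/iterative/example-get-started"},
        }
    ],
    "outs": [{"path": "model.pkl", "cache": True}],
}

locked = lock_import(dvc_file, "123456dfdfgdfgdf")
dep = locked["deps"][0]

# "abcdef0123456789" stands in for a newer upstream revision (made up).
pinned = revision_to_fetch(dep, latest_rev="abcdef0123456789")
refreshed = revision_to_fetch(dep, latest_rev="abcdef0123456789", update=True)
```

With this shape, re-fetching the data a month later would still resolve to the recorded revision, and moving to a newer revision is an explicit step that rewrites `rev-lock` in that one DVC-file.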
Isn't it the same as doing a [...]
Yep, but it can potentially touch a lot of files. Imagine you would have to update every single Python code file that has [...] Also, what should I do if some of them are not in sync? Should I update all of them anyway?
What do you mean by not in sync, @shcheklein? Also, why is DVC allowing files to be out of sync? Is it useful? Is it like a submodule of [...]
Yep, you got it right! Since we essentially have a lock per DVC-file, nothing prevents us from locking two files to two different versions (revisions) of the same repository. It's hard to predict how useful that is. But if we go with a decentralized approach (and it looks like we do), it means that we have this feature out of the box. The question is how much we will have to pay for this down the road :)
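As a rough illustration of the "not in sync" situation, the sketch below groups DVC-files by the repository they import from and reports repositories that are locked at more than one revision. It assumes DVC-files are already loaded as dictionaries with a `rev-lock` field as in the example above; the function names, the file names `model.pkl.dvc` and `data.xml.dvc`, and the second revision are made up for the example and are not part of DVC.

```python
from collections import defaultdict


def locked_revisions(dvc_files: dict) -> dict:
    """Map each repo URL to {locked revision: [names of DVC-files using it]}."""
    by_repo = defaultdict(lambda: defaultdict(list))
    for name, content in dvc_files.items():
        for dep in content.get("deps", []):
            repo = dep.get("repo")
            if repo and "rev-lock" in repo:
                by_repo[repo["url"]][repo["rev-lock"]].append(name)
    return {url: dict(revs) for url, revs in by_repo.items()}


def out_of_sync(dvc_files: dict) -> dict:
    """Keep only the repositories that are locked at more than one revision."""
    return {
        url: revs
        for url, revs in locked_revisions(dvc_files).items()
        if len(revs) > 1
    }


URL = "https://github.com/iterative/example-get-started"

# Two hypothetical import DVC-files pointing at the same repository.
dvc_files = {
    "model.pkl.dvc": {
        "deps": [{"path": "model.pkl",
                  "repo": {"url": URL, "rev-lock": "123456dfdfgdfgdf"}}],
    },
    "data.xml.dvc": {
        "deps": [{"path": "data.xml",
                  "repo": {"url": URL, "rev-lock": "abcdef0123456789"}}],
    },
}

report = out_of_sync(dvc_files)
```

A check like this would not forbid mixed revisions; it would only make them visible, so whether to update all files together remains the user's decision.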
Fixes iterative#1774 Fixes iterative#2139 Fixes iterative#2201 Signed-off-by: Ruslan Kuprieiev <[email protected]>
From #2012