How to manage external data efficiently? #5154
Hi, I add it as an external dependency using the command: I now define a `dvc run` stage which copies the data from D:\Temp\Data and keeps it in my workspace. It produces a .yaml and a .lock file. I then change the original data in D:\Temp\Data by adding new files to it. When I run `dvc status`, it shows that files have changed, but I don't know how to add those to version control. I tried `dvc import`, but it works only with DVC/Git-based repos, not with data on local storage or S3. I do not want to push the data because it bloats the storage; I only want to track it as an external dependency. Any help would be appreciated :)
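For readers wondering why a status check reports a change after new files land in the external directory, here is a minimal sketch of the general idea: record a content hash per file, then compare a fresh snapshot against the recorded one. This is only an illustration, not DVC's actual implementation or lock-file format; the names `snapshot` and `diff_snapshots` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to a hash of its contents.

    Hypothetical helper for illustration; not part of DVC.
    """
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff_snapshots(old: dict[str, str], new: dict[str, str]):
    """Return (added, removed, modified) relative paths between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, modified


if __name__ == "__main__":
    # A temporary directory stands in for the external data folder.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        (data / "a.csv").write_text("1,2,3\n")
        recorded = snapshot(data)  # what was recorded at the last run

        (data / "b.csv").write_text("4,5,6\n")  # new file added later
        print(diff_snapshots(recorded, snapshot(data)))
        # prints (['b.csv'], [], [])
```

The recorded snapshot stays stale until it is explicitly refreshed, which is why the change keeps showing up until the tracked version is updated.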
Replies: 2 comments 17 replies
@piseabhijeet Looks like you are looking for #4520, which is on our todo list right now. The current workflow to achieve that is to just copy the file you wish to add into your workspace and then
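The copy step in that workflow could be scripted roughly as follows. This is a hedged sketch, not something from the DVC maintainers: `copy_new_files` is a hypothetical helper, and it only copies files that are missing from the workspace (it does not refresh files that already exist there). The DVC command that follows the copy is not shown in the thread, so it is left out here as well.

```python
import shutil
import tempfile
from pathlib import Path


def copy_new_files(source: Path, workspace: Path) -> list[str]:
    """Copy files present under source but missing under workspace.

    Hypothetical helper for illustration. Returns the relative paths copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = workspace / rel
        if dst.exists():
            continue  # already in the workspace; left untouched
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied


if __name__ == "__main__":
    # Temporary directories stand in for the external folder and the workspace.
    with tempfile.TemporaryDirectory() as ext, tempfile.TemporaryDirectory() as ws:
        external, workspace = Path(ext), Path(ws)
        (external / "old.csv").write_text("old\n")
        (workspace / "old.csv").write_text("old\n")
        (external / "new.csv").write_text("new\n")
        print(copy_new_files(external, workspace))
        # prints ['new.csv']
```

In practice `source` would be the external folder (D:\Temp\Data in the question) and `workspace` the tracked data directory inside the repository.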
To add the new version to DVC and update `Data.dvc` you would just rerun (This is the same way that you update regular local workspace data which has been tracked via `dvc add`).

Using external data is a fairly advanced DVC feature and generally not recommended. Can you share a bit more about your specific needs/use case here?