One of my utilities pulls in data from various sources and builds a large hierarchy in which versioning may or may not be required. Ideally, the hierarchy should allow users to simply change files in certain directories from time to time and version only the appropriate subdirectory of the data.
At first I wanted a single command to track changes, e.g. dvc add rootdir. But that seems to require reindexing the entire hierarchy, which is not suitable for my use case. Ideally, the data import process would dvc add each subdirectory itself when creating it, so I'd like to do that from Python.
Otherwise, it would be useful if running dvc add at the root could somehow avoid reindexing the whole hierarchy.
As a feature request, I would ask for either better documentation describing how to achieve this approach, or documentation for calling the DVC API from Python (without running shell commands), so that I can work better with this style of data organization.
Thanks for the prompt response, folks. This seems simple enough. Perhaps you could recommend a good strategy for versioning the individual directories in a hierarchy.
For instance, suppose I run os.walk over a directory hierarchy from the bottom up, meaning that I dvc add a directory full of items and then add that directory's parent directory. Would dvc allow this, and would it attempt to reindex the child directory again? Or would it instead somehow point to the child directory's .dvc file?
@JoeyCarson No, it won't allow it, as outputs of your stages will overlap that way. Do you need to ignore some dirs? If so, have you considered .dvcignore functionality?
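Since adding both a directory and its parent would produce overlapping outputs, one possible workaround is to add only directories that cannot contain one another. The sketch below is a rough illustration, not an official recipe: it picks the leaf directories (those with files but no subdirectories) and adds each one separately. The helper names `leaf_directories`, `add_with_cli`, `add_with_repo_api` and `track_leaves` are hypothetical. The `dvc.repo.Repo` import is DVC's internal Python class rather than a documented public API, so treat that variant as an assumption that may change between DVC versions.

```python
import os
import subprocess
from typing import Callable, List


def leaf_directories(root: str) -> List[str]:
    """Return directories under root that hold files but no subdirectories.

    Leaf directories never contain one another, so adding each of them
    separately cannot create overlapping outputs.
    """
    leaves = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into DVC or Git internals.
        dirnames[:] = [d for d in dirnames if d not in {".dvc", ".git"}]
        if not dirnames and filenames:
            leaves.append(dirpath)
    return sorted(leaves)


def add_with_cli(path: str) -> None:
    """Track one directory by invoking the dvc command-line tool."""
    subprocess.run(["dvc", "add", path], check=True)


def add_with_repo_api(path: str) -> None:
    """Track one directory without a subprocess.

    Assumption: relies on DVC's internal Repo class, which is not a
    documented public API and may differ between versions.
    """
    from dvc.repo import Repo  # imported lazily so the module loads without dvc

    Repo(".").add(path)


def track_leaves(root: str, add: Callable[[str], None] = add_with_cli) -> List[str]:
    """Add every leaf directory under root and return the paths that were added."""
    tracked = []
    for path in leaf_directories(root):
        add(path)
        tracked.append(path)
    return tracked
```

Calling `track_leaves("rootdir")` from the import process would add each leaf directory on its own, so changing files in one subdirectory only requires re-adding that subdirectory. Files that sit next to subdirectories in a non-leaf directory are not covered by this sketch and would need to be added individually.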