Reconfigurable modules #5922
A possible interface to reconfigurable modules is shown below. This example is based on https://dvc.org/doc/get-started

```
# Clone a repository and pull its data (for reconfigurable modules that ship data)
$ dvc clone https://github.com/iterative/so-dataset-posts-25K
$ dvc run -d prepare.py -d so-dataset-posts-25K/data.xml \
    -o data.tsv -o data-test.tsv \
    python prepare.py so-dataset-posts-25K/data.xml
$ dvc clone https://github.com/iterative/text-to-bag-of-words
# Run the cloned module instead of:
# dvc run -d featurization.py -d data.tsv -o matrix.pkl \
#   python featurization.py data.tsv matrix.pkl
# -d1: pass a file as the module's first input/dependency (a module can have several)
# -o1: instantiate the module's first output as a data file (by creating a hardlink)
$ dvc sub text-to-bag-of-words -d1 data.tsv -o1 matrix.pkl \
    -p columns=1,2 -p lowercase=true -p max_features=9000
# Just a regular run
$ dvc run -d train.py -d matrix.pkl \
    -o model.pkl \
    python train.py matrix.pkl model.pkl
```

The module should not be executed.

Connection to the build cache issue

The module's unique suffix can be based on the module instance config file (not shown in the example above) and the set of params. This way DVC can easily identify similar runs, and they can be reused as the build cache (#1234) for regular runs (not modules).
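The suffix idea can be sketched in a few lines of shell. This is a minimal illustration under stated assumptions, not how DVC computes any identifier: `module_suffix`, the `--params--` separator, and the 8-character length are hypothetical choices, and the script assumes GNU coreutils' `sha256sum` (on macOS, `shasum -a 256` is the usual substitute).

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): derive a short, stable suffix for a
# module instance from its config file and its set of params.

module_suffix() {
  config_file=$1   # module instance config file ("" if there is none)
  shift            # the remaining arguments are key=value params

  {
    # Include the config file contents, if one was given.
    if [ -n "$config_file" ]; then
      cat "$config_file"
    fi
    # Separator so config bytes and params cannot run together.
    printf '\n--params--\n'
    # Sort the params so the suffix depends on the *set* of params,
    # not on the order they were passed on the command line.
    printf '%s\n' "$@" | LC_ALL=C sort
  } | sha256sum | cut -c1-8
}

# Example: the params from the `dvc sub` call above, with no config file.
module_suffix "" columns=1,2 lowercase=true max_features=9000
```

Because the params are sorted before hashing, `-p max_features=9000 -p columns=1,2` and `-p columns=1,2 -p max_features=9000` produce the same suffix, so a repeated run could be matched to an earlier result, while changing any value (say, `max_features=5000`) produces a different suffix.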
@kskyten I'd love to hear your feedback on this.
This is a comment from #1462 (comment), extracted as a separate feature request. Also, @kskyten opened the submodels issue #301.
Users should be able to create a "library" of reconfigurable pipelines (#1462) and reuse them across projects. A pipeline could be imported by copying it, through Git submodules, or with git clone https://my-dvc-repo.
An analogy with programming:
UPDATE: Added a link to @kskyten's issue.