Granular pipeline dependency status #9431
Labels: A: pipelines, A: status, p2-medium
Feature Request
When I run `dvc repro`, DVC detects which dependencies have changed and therefore which stages need to be reproduced. I would like to access the granular changes of all the dependencies for a stage since it was last reproduced.
Example usage
Change in dependencies for stage `preprocess`:
Motivation
This feature would be very useful for pipelines which process many independent samples and take a long time to run.
Imagine the following simple data setup where samples get preprocessed and stored in a new folder.
And the corresponding simple pipeline.
With this feature the pipeline stage code could check which samples have changed (new/modified/deleted) and only process those. It could also detect that the code has changed and reprocess all samples.
This would save me a lot of time since we have a long and slow pipeline where the raw data gets updated quite often.
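To make the request concrete, here is a minimal sketch of the per-sample change detection described above, done by hand outside DVC. It assumes the stage can keep a snapshot (relative path to content hash) from its previous run; the names `snapshot` and `diff_snapshots` are hypothetical helpers for illustration and are not part of DVC's API.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under `root` (relative POSIX path) to its content hash."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, modified, or deleted between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "deleted": sorted(old.keys() - new.keys()),
    }
```

A stage script could call `snapshot()` on its input folder, compare the result with the snapshot saved after the last successful run, and process only the `added` and `modified` samples. The feature requested here would let DVC supply that comparison itself, so the stage would not need to maintain its own bookkeeping.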
Link to extended Discord discussion: https://discord.com/channels/485586884165107732/1093361005754585109
Link to another discussion of the same problem: #5917