Weekly Processing?
#10652
Hi @davies-w. A few options:
I'd like to be able to run a DVC pipeline that maintains a weekly state.
For example, the input is:
s3:week-1.df.gz
s3:week-2.df.gz
Intermediate Output:
dvc_data/intermediate/week-1-processed.df
dvc_data/intermediate/week-2-processed.df
Final Output:
dvc_data/final/combined-formatted.dataset
When a new s3:week-3.df.gz appears, DVC should run on just that file and produce:
dvc_data/intermediate/week-3-processed.df
and then combine all the processed weeks to produce an updated:
dvc_data/final/combined-formatted.dataset
Extra credit if it can read in the previous version of dvc_data/final/combined-formatted.dataset and merge it with dvc_data/intermediate/week-3-processed.df.
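The desired behavior can be sketched in plain Python. This is only an illustration under assumptions, not DVC configuration or a DVC API: the function names (`pending_weeks`, `process_week`, `combine`, `run`) are hypothetical, the inputs are passed in as a list of names rather than listed from S3, and `process_week` is a placeholder for the real per-week transformation.

```python
import re
from pathlib import Path

# Matches names like "s3:week-3.df.gz" and captures the week number.
WEEK_RE = re.compile(r"week-(\d+)\.df\.gz$")


def pending_weeks(input_names, intermediate_dir):
    """Return week numbers that have an input but no processed file yet."""
    pending = []
    for name in input_names:
        match = WEEK_RE.search(name)
        if match is None:
            continue  # not a weekly input, ignore it
        week = int(match.group(1))
        out = Path(intermediate_dir) / f"week-{week}-processed.df"
        if not out.exists():
            pending.append(week)
    return sorted(pending)


def process_week(week, intermediate_dir):
    """Hypothetical placeholder: a real step would fetch and transform the data."""
    out = Path(intermediate_dir) / f"week-{week}-processed.df"
    out.write_text(f"processed week {week}\n")
    return out


def combine(intermediate_dir, final_path):
    """Concatenate every processed week, in week order, into the final output."""
    parts = sorted(
        Path(intermediate_dir).glob("week-*-processed.df"),
        key=lambda p: int(p.name.split("-")[1]),
    )
    final_path = Path(final_path)
    final_path.parent.mkdir(parents=True, exist_ok=True)
    final_path.write_text("".join(p.read_text() for p in parts))


def run(input_names, intermediate_dir, final_path):
    """Process only the new weeks, then rebuild the combined output if needed."""
    Path(intermediate_dir).mkdir(parents=True, exist_ok=True)
    new_weeks = pending_weeks(input_names, intermediate_dir)
    for week in new_weeks:
        process_week(week, intermediate_dir)
    if new_weeks:
        combine(intermediate_dir, final_path)
    return new_weeks
```

Calling `run` a second time with week 3 added to the input list processes only week 3 and rebuilds the combined file from all three intermediates. The "extra credit" variant would instead append the new week's output to the existing combined file; this sketch rebuilds from the intermediates because that keeps the final output reproducible from its inputs.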