Release 3.0 #7093
Two more proposals from my side:
😅 https://discord.com/channels/485586884165107732/1111557378735865856/1111636564183883876 It's not clear to me whether the stage-level vars are needed there. @skshetry Maybe you have an idea for how else to handle that use case?
I've come around to not minding the explicit …
@skshetry It turns out they aren't needed in that case, so I'm on board to drop them.
One last thing we forgot to include is to stop validating the cache on every minor operation. This is a leftover from the days when symlinks/hardlinks were the default and you ran a greater risk of corrupting your cache. These days we default to reflink/copy, so there is really no reason to validate the cache on every operation (like …). In practical terms, I don't think we document that certain operations check the cache, so this is not really tied to the 3.0 release and we can stop doing it whenever.
Why don't we create a separate issue for it, then, and make it high priority so we get to it soon?
Stage-level vars:
stages:
  train_model:
    foreach: [us, uk]
    do:
      wdir: ${item}
      vars: [params.yaml] # uk/params.yaml, us/params.yaml, etc.
      cmd: cat params.yaml
But I don't find it very elegant: it is cumbersome, adds a lot of complexity, and I haven't seen anyone use it. Hydra support may have made it redundant.
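To make the example above concrete, here is a rough model of how a foreach stage with a per-item wdir and stage-level vars expands. It is a hypothetical Python sketch, not DVC's resolver: the helper expand_foreach is invented for illustration, and the only behaviour it encodes is the `<stage>@<item>` naming of generated stages plus the comment above saying each item's params.yaml is read from that item's own wdir.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def expand_foreach(name: str, items: list[str], vars_file: str = "params.yaml") -> dict[str, dict[str, str]]:
    """Hypothetical model (not DVC code): one concrete stage per foreach item."""
    stages = {}
    for item in items:
        wdir = item  # wdir: ${item}
        stages[f"{name}@{item}"] = {
            "wdir": wdir,
            # vars: [params.yaml] is looked up relative to the stage's wdir
            "vars_path": str(PurePosixPath(wdir) / vars_file),
            # cmd runs inside wdir, so the relative path reaches the same file
            "cmd": f"cat {vars_file}",
        }
    return stages


for stage_name, spec in expand_foreach("train_model", ["us", "uk"]).items():
    print(stage_name, spec["vars_path"])
```

Each generated stage thus reads a different params file without the stage body itself changing, which is the convenience stage-level vars provide here.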
We actually use this feature extensively. Our pipelines tend to have a lot of this:
stages:
  preprocess_feature_one:
    foreach: ${sessions}
    do:
      vars:
        - preprocessing_key: feature_one
          input_dir: load
      <<: &stage_preprocess
        cmd: >-
          python src/preprocess.py
          --params-file=params.yaml
          ${preprocessing_key}
          data/interim/${input_dir}/${key}
          data/interim/${preprocessing_key}/${key}
        params:
          - preprocess.${preprocessing_key}
        deps:
          - ../../poetry.lock
          - data/interim/${input_dir}/${key}
          - src/preprocess.py
        outs:
          - data/interim/${preprocessing_key}/${key}
  preprocess_feature_two:
    foreach: ${sessions}
    do:
      vars:
        - preprocessing_key: feature_two
          input_dir: feature_one
      <<: *stage_preprocess
While removing this functionality might not completely block us from upgrading to DVC 3.x, since one could copy/paste the stage definition with small changes, it would definitely increase the maintenance burden of our pipelines and make mistakes much easier to make (oops, forgot to change X in all the stages). Is it possible that I'm overlooking some functionality made possible by Hydra? I might have even gloated about our use of this pattern in my last DVC office hours 😅
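The pattern above combines two mechanisms: a YAML merge key (`<<: *stage_preprocess`) that copies the shared cmd/deps/outs block into each stage, and per-stage vars whose values fill the `${...}` placeholders. The following hypothetical Python sketch models only that combination (cmd and outs, one made-up session key `s1`), using `string.Template` because its `${name}` syntax happens to match. It is not how DVC resolves templates internally, and it simplifies vars from a list of mappings to a single dict.

```python
from __future__ import annotations

from string import Template

# Shared block behind `&stage_preprocess` (params/deps omitted for brevity).
STAGE_PREPROCESS = {
    "cmd": (
        "python src/preprocess.py --params-file=params.yaml ${preprocessing_key} "
        "data/interim/${input_dir}/${key} data/interim/${preprocessing_key}/${key}"
    ),
    "outs": ["data/interim/${preprocessing_key}/${key}"],
}


def merge_key(anchor: dict, local: dict) -> dict:
    """YAML `<<:` semantics: keys written locally override merged ones."""
    return {**anchor, **local}


def render(stage: dict, key: str) -> dict:
    """Fill ${...} placeholders from the stage's vars plus the foreach key."""
    values = {**stage["vars"], "key": key}

    def sub(text: str) -> str:
        return Template(text).substitute(values)

    return {"cmd": sub(stage["cmd"]), "outs": [sub(o) for o in stage["outs"]]}


feature_one = merge_key(STAGE_PREPROCESS, {"vars": {"preprocessing_key": "feature_one", "input_dir": "load"}})
feature_two = merge_key(STAGE_PREPROCESS, {"vars": {"preprocessing_key": "feature_two", "input_dir": "feature_one"}})

for stage in (feature_one, feature_two):
    print(render(stage, "s1")["cmd"])
```

Only the vars differ between the two stages; everything else comes from the single anchored block, which is why dropping stage-level vars would force the block to be duplicated.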
@dberenbaum Created #9561
@sjawhar That's a really creative way to do things! If we were to implement #5172, do you think that would solve the same use case? From what I can tell, you are limited by needing to loop over both sessions and features.
@skshetry WDYT about the comment from @sjawhar above? What do you think would be the best way to do it?
@dberenbaum I will try to create …
@sjawhar, that's an interesting way to do nested looping with merge keys. How many sessions do you run? If there aren't that many, I'd suggest using a (duplicated) map/dictionary and looping over that.
# params.yaml
sessions:
  feature1_session1:
    key: feature_one
    input_dir: feature_one
  feature1_session2:
    key: feature_one
    input_dir: feature_one
  feature2_session1:
    key: feature_two
    input_dir: feature_two
  feature2_session2:
    key: feature_two
    input_dir: feature_two
This might be more readable, although you can still use the same merge-key trick to avoid duplication:
sessions:
  feature1_session1: &feature1
    key: feature_one
    input_dir: feature_one
  feature1_session2: *feature1
  ...
  feature2_session1: &feature2
    key: feature_two
    input_dir: feature_two
  feature2_session2: *feature2
  ...
#5172 might be a better solution to this.
@skshetry When we use DVC in a data registry pattern, there could be hundreds or thousands of sessions. Yes, though, you're right that we could have an additional step that one runs before running …
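The additional pre-processing step mentioned above could, assuming the feature specs are known up front, generate the flattened sessions map from the earlier params.yaml example instead of having it written by hand. This is a minimal Python sketch; FEATURES, the session names, and flatten_sessions are hypothetical, and writing the result into params.yaml (e.g. with a YAML library) is left out. JSON is printed only for inspection.

```python
from __future__ import annotations

import itertools
import json

# Hypothetical feature specs, mirroring the params.yaml example above.
FEATURES = {
    "feature1": {"key": "feature_one", "input_dir": "feature_one"},
    "feature2": {"key": "feature_two", "input_dir": "feature_two"},
}


def flatten_sessions(
    features: dict[str, dict[str, str]], sessions: list[str]
) -> dict[str, dict[str, str]]:
    """Build the '<feature>_<session>' map as the cross product of features x sessions."""
    return {
        f"{feature}_{session}": dict(spec)  # copy so entries stay independent
        for (feature, spec), session in itertools.product(features.items(), sessions)
    }


flat = flatten_sessions(FEATURES, ["session1", "session2"])
print(json.dumps({"sessions": flat}, indent=2))
```

With thousands of sessions only the session list grows; the stage definition would then need a single loop over the generated map.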
Data Management
Experiments and Pipelines
checkpoints #9221
Deprecations
Other release blockers
data status --not-in-remote: not working #9541
Studio Readiness for 3.0 release