Support multiline command? #4373

skshetry · 2020-08-11T06:06:13Z

Something like done in Github actions with run and script?

stages:
  stage1:
    cmd: |
      cmd1
      cmd2

Or, perhaps:

stages:
  stage1:
    cmd:
      - cmd1
      - cmd2

Or, maybe both?

Need to double check if we will be able to keep the dump in this shape or form.

Why?

Better expression

Current Workaround:

stages:
  stage1:
    cmd: >-
      cmd1
      && cmd2
      && cmd3

efiop · 2020-11-16T22:09:35Z

For the record:

stages:
  stage1:
    cmd: |
      cmd1
      cmd2

this is already supported and will be passed as is to the shell (we use shell=True in Popen). I would keep it as is.

stages:
  stage1:
    cmd:
      - cmd1
      - cmd2

This looks great and I think it should behave like cmd1 && cmd2 && cmd3, meaning that if one command fails then the rest is not executed and results in a stage failure.

ClementWalter · 2020-11-21T16:57:26Z

I have started to look at the code. @efiop is it ok to add a self.cmd = " && ".join(cmd) if isintance(cmd, list) else cmd in Stage.__init__ or should the Stage be able to truly handle a list of commands?

efiop · 2020-11-21T19:40:27Z

Hi @ClementWalter !

Thanks for looking into it! What advantages/disadvantages do you see for doing a join like that?

efiop · 2020-12-03T10:50:17Z

Keeping open for the question about multiline cmd handling raised by @skshetry

jorgeorpinel · 2020-12-28T19:01:53Z

Hi. I'm curious what's the use case for this. Is there a good example of multiple commands that share deps and outs? (to improve the docs)

skshetry · 2020-12-29T08:37:07Z

Hi. I'm curious what's the use case for this.

This is more of a convenience/ergonomical feature. So, anything you can write in a single line can be written in multiple lines or vice versa but might be easier than just cramming it in one with &&.

Is there a good example of multiple commands that share deps and outs? (to improve the docs)

You can use Github search for this (about usage of &&). But, why does it have to share deps/outs at all?

Here's a list of what I could find:

building a model and testing it with a different script
building each model in a separate virtual environment and evaluating them
building each model in a separate directory
cd ../directory && python script.py - looks like they are keeping dvc repo and code separately.
fetching a pre-trained model and training them on local datasets (wget, etc).
cleaning up randomly generated files after building/training/evaluating.
setting some env vars, PythonPath, etc.
cd directory && python test.py - maybe cd is more convenient than wdir as cd only affects during runtime.

jorgeorpinel · 2020-12-29T09:08:29Z

Thanks! I'm aware of POSIX &&. My question is about a real user need that this issue initially tried to cover. I'm getting the idea that this is basically for nice formatting (alt. to && basically).

Also thanks for the specific example ideas! I think that most of them may be better served by other forms thouogh:

building a model and testing it with a different script

fetching a pre-trained model and training them on local datasets (wget, etc).

cleaning up randomly generated files after building/training/evaluating.

Multiple stages

building each model in a separate virtual environment and evaluating them

building each model in a separate directory

Stage groups (foreach syntax in DVC 2.0)

cd ../directory && python script.py

cd directory && python test.py

wdir (or just keeping &&)

setting several env vars seems like the only instance where this is really convenient, to avoid an ugly string full of &&.

why does it have to share deps/outs at all?

Initially I was assuming that would be the point of having several commands in a stage, since having each command have separate deps/outs and listing those in a single stage doesn't sound very clean (again, better to use 2 stages instead?) Although I suppose if the stages are parallel, it could be a nice use of this feature, along with a desc or # comment about that perhaps.

Anyway, my Q is answered. Will check with Ivan whether it's worth improving the docs for this. Thanks again!

skshetry · 2020-12-29T09:14:48Z

Multiple stages

You are thinking of an ideal situation. I won't be creating a separate stage just to clean the logfile the script creates. It's a convenience feature.

building each model in a separate virtual environment and evaluating them

building each model in a separate directory

Stage groups (foreach syntax in DVC 2.0)

How do you create a virtual environment and run the script with foreach? It's a different thing.

cd ../directory && python script.py

cd directory && python test.py

wdir (or just keeping &&)

Like I said, cd definitely has its place. wdir applies to all outs/deps whereas cd is a runtime component.

jorgeorpinel · 2020-12-29T09:16:05Z

It all seems very hypothetical to me but yes, you may be right. It's nice to have this option anyway.

skshetry · 2020-12-29T09:16:55Z

It all seems very hypothetical to me but yes, you may be right.

All of the examples are from Github's search. How could that be hypothetical?

jorgeorpinel · 2020-12-30T23:17:30Z

Oh I meant not from DVC users.

shcheklein · 2020-12-30T23:23:24Z

@jorgeorpinel

Oh I meant not from DVC users.

I guess, they are from DVC users since they come from Github repos that use DVC.

jorgeorpinel · 2020-12-30T23:26:50Z

Ah, I didn't realize those GitHub repos use DVC! I thought it was from generic repos. Will try to find some some of them...

jorgeorpinel · 2020-12-30T23:42:48Z

Too bad that

You can't use the following wildcard characters as part of your search query: . , : ; / \ ` ' " = * ! ? # $ & + ^ | ~ < > ( ) { } [ ]. The search will simply ignore these symbols.

i.e. I can't actually search for && - according to GH search docs.

So far I've been able to find one interesting example (python -m ... && fdupes ...) in my search but @skshetry if you have a better search can you share it please? I'll keep looking for now.

johnnychen94 · 2020-12-31T06:47:31Z

building each model in a separate virtual environment and evaluating them

Here's the Julia example:

each stage has a separate virtual environment: --project=prepare. Package dependencies are provided by prepare/Project.toml and tracked by git.
Pkg.instantiate() sets up the project environment (like pip install -r requirements.txt)

prepare:
  foreach: ${models}
  do:
    cmd: >-
      julia --project=prepare -e "using Pkg; Pkg.instantiate()" &&
      julia --project=prepare \
        prepare/main.jl data/train/rawdata/ data/train/prepared/${item.name}.h5 \
        --patch-stride ${item.data.patch_stride} \
        --patch-size ${item.data.patch_size} \
        --augment-times ${item.data.augment_times}
    deps:
    - data/train/rawdata
    - prepare
    outs:
    - data/train/prepared/${item.name}.h5

I'm thinking of dvc as a build system like make, so setting up a virtual environment has real merits for reproducibility. Version compatibility is the root of many hidden problems.

skshetry · 2020-12-31T07:04:13Z

@johnnychen94, we do support list now if you are using from the master.

So, you could write that as:

cmd:
  - julia --project=prepare -e "using Pkg; Pkg.instantiate()"
  - julia --project=prepare \
        prepare/main.jl data/train/rawdata/ data/train/prepared/${item.name}.h5 \
        --patch-stride ${item.data.patch_stride} \
        --patch-size ${item.data.patch_size} \
        --augment-times ${item.data.augment_times}

I see that you use \ at the end which should not be required as you are using folded style string (with >-), which will make it even cleaner.

cmd:
  - julia --project=prepare -e "using Pkg; Pkg.instantiate()"
  - >-
    julia --project=prepare
        prepare/main.jl data/train/rawdata/ data/train/prepared/${item.name}.h5
        --patch-stride ${item.data.patch_stride}
        --patch-size ${item.data.patch_size}
        --augment-times ${item.data.augment_times}

skshetry · 2020-12-31T07:27:23Z

i.e. I can't actually search for && - according to GH search docs.

@jorgeorpinel, I use a simple script to fetch all the files matching a query using Github's token and search locally. There aren't that many, so should be quick. Try the script in wsl.

jorgeorpinel · 2021-01-03T20:09:39Z

Thanks for sharing your real like example @johnnychen94 ! I'm not familiar with Julia so forgive me if this is obvious but why do you run using Pkg; Pkg.instantiate() separately instead of having it in prepare/main.jl?

I'll try your script and lyk @skshetry! Thanks

johnnychen94 · 2021-01-04T00:43:50Z

I'm not familiar with Julia so forgive me if this is obvious but why do you run using Pkg; Pkg.instantiate() separately instead of having it in prepare/main.jl?

There are no standard rules about it. It's doable but I deliberately kept them apart because I think the environment initialization part has nothing to do with the real code. We normally run the real code multiple times (for developing or debugging purposes) while only initialize the environment once.

This is also something that the whole Julia community tries to commit when dealing with CI. For example: https://github.com/JuliaImages/juliaimages.github.io/blob/cbd83915f5dfb0259fedaa3d1d14c67f34d983eb/.travis.yml#L14-L16

@skshetry

Implements an approach proposed by @skshetry. Fixes iterative#4373

@skshetry

Implements an approach proposed by @skshetry. Fixes #4373

efiop added feature request Requesting a new feature p3-nice-to-have It should be done this or next sprint labels Aug 11, 2020

efiop mentioned this issue Nov 22, 2020

Allow user to pass a list of commands for one single stage #4934

Merged

2 tasks

ClementWalter mentioned this issue Nov 29, 2020

Update the cmd entry of the dvc.yaml file to add cmd as list option iterative/dvc.org#1980

Merged

efiop closed this as completed in #4934 Dec 3, 2020

efiop reopened this Dec 3, 2020

jorgeorpinel mentioned this issue Dec 28, 2020

Misc. updates (2.0ish) iterative/dvc.org#2062

Merged

jorgeorpinel mentioned this issue Jan 3, 2021

guide: multi-cmd example in a dvc.yaml sample iterative/dvc.org#2070

Closed

efiop self-assigned this Jan 4, 2021

efiop added a commit to efiop/dvc that referenced this issue Jan 4, 2021

dvc: handle multiline command as list

08384d0

Implements an approach proposed by @skshetry. Fixes iterative#4373

efiop mentioned this issue Jan 4, 2021

dvc: handle multiline command as list #5194

Merged

2 tasks

efiop closed this as completed in #5194 Jan 4, 2021

efiop added a commit that referenced this issue Jan 4, 2021

dvc: handle multiline command as list (#5194)

7df53e5

Implements an approach proposed by @skshetry. Fixes #4373

jorgeorpinel mentioned this issue Feb 4, 2021

support adding/transfering data straight to cache/remote #4520

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support multiline command? #4373

Support multiline command? #4373

skshetry commented Aug 11, 2020

efiop commented Nov 16, 2020

ClementWalter commented Nov 21, 2020 •

edited

Loading

efiop commented Nov 21, 2020

efiop commented Dec 3, 2020

jorgeorpinel commented Dec 28, 2020

skshetry commented Dec 29, 2020 •

edited

Loading

jorgeorpinel commented Dec 29, 2020 •

edited

Loading

skshetry commented Dec 29, 2020

jorgeorpinel commented Dec 29, 2020 •

edited

Loading

skshetry commented Dec 29, 2020

jorgeorpinel commented Dec 30, 2020

shcheklein commented Dec 30, 2020

jorgeorpinel commented Dec 30, 2020

jorgeorpinel commented Dec 30, 2020

johnnychen94 commented Dec 31, 2020 •

edited

Loading

skshetry commented Dec 31, 2020

skshetry commented Dec 31, 2020 •

edited

Loading

jorgeorpinel commented Jan 3, 2021 •

edited

Loading

johnnychen94 commented Jan 4, 2021

Support multiline command? #4373

Support multiline command? #4373

Comments

skshetry commented Aug 11, 2020

Why?

Current Workaround:

efiop commented Nov 16, 2020

ClementWalter commented Nov 21, 2020 • edited Loading

efiop commented Nov 21, 2020

efiop commented Dec 3, 2020

jorgeorpinel commented Dec 28, 2020

skshetry commented Dec 29, 2020 • edited Loading

jorgeorpinel commented Dec 29, 2020 • edited Loading

skshetry commented Dec 29, 2020

jorgeorpinel commented Dec 29, 2020 • edited Loading

skshetry commented Dec 29, 2020

jorgeorpinel commented Dec 30, 2020

shcheklein commented Dec 30, 2020

jorgeorpinel commented Dec 30, 2020

jorgeorpinel commented Dec 30, 2020

johnnychen94 commented Dec 31, 2020 • edited Loading

skshetry commented Dec 31, 2020

skshetry commented Dec 31, 2020 • edited Loading

jorgeorpinel commented Jan 3, 2021 • edited Loading

johnnychen94 commented Jan 4, 2021

ClementWalter commented Nov 21, 2020 •

edited

Loading

skshetry commented Dec 29, 2020 •

edited

Loading

jorgeorpinel commented Dec 29, 2020 •

edited

Loading

jorgeorpinel commented Dec 29, 2020 •

edited

Loading

johnnychen94 commented Dec 31, 2020 •

edited

Loading

skshetry commented Dec 31, 2020 •

edited

Loading

jorgeorpinel commented Jan 3, 2021 •

edited

Loading