Simplify execution of various runs with different params #7891

Closed
behrica opened this issue Jun 13, 2022 · 15 comments
Labels
A: experiments Related to dvc exp A: params Related to dvc params feature request Requesting a new feature

Comments

@behrica

behrica commented Jun 13, 2022

As an alternative to repeated (manual) executions of:

dvc exp run -S a=1
dvc exp run -S a=2

it might be useful (and clean, I think) to allow some way to "pass several param files" (from a folder, maybe),
which would then "auto-queue" the corresponding runs.

@dtrifiro dtrifiro added feature request Requesting a new feature A: experiments Related to dvc exp labels Jun 13, 2022
@Houstonwp

This is what Facebook's Hydra does, and it really is intuitive and clean.

@daavoo
Contributor

daavoo commented Jul 28, 2022

> This is what Facebook's Hydra does, and it really is intuitive and clean.

We are currently exploring different ways to integrate with Hydra. This feature is part of that scope.

@daavoo daavoo added the A: params Related to dvc params label Aug 22, 2022
@daavoo
Contributor

daavoo commented Sep 6, 2022

#8187 is adding support for using Hydra syntax in --set-param, so:

dvc exp run -S 'a=1,2' --queue

will put 2 experiments in the queue, which can later be executed with dvc queue start.
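
To make the expansion concrete, here is a simplified Python model of how comma-separated sweep overrides multiply into separate runs (a Cartesian product, as with Hydra's basic sweeper). It is an illustration only, not DVC's or Hydra's code: `expand_sweeps` is a hypothetical helper, and it ignores Hydra features such as `range()` or quoted values.

```python
from itertools import product


def expand_sweeps(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated sweep overrides into one override list per run.

    Simplified model: splits values on "," only; no Hydra range()/quoting support.
    """
    keys: list[str] = []
    choices: list[list[str]] = []
    for override in overrides:
        key, _, values = override.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    # Cartesian product: every combination of values becomes one queued run.
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*choices)
    ]


if __name__ == "__main__":
    for run in expand_sweeps(["db=mysql,postgresql", "schema=warehouse,school"]):
        print(run)
```

Two sweeps with two values each give four runs; adding more swept parameters multiplies the count, which is why large grids grow quickly.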

@behrica
Author

behrica commented Sep 7, 2022

My main use case for this feature request would be to auto-generate such parameter files.
#8187 does not allow this. The parameters are interwoven with the rest of the exp run command.

This would allow using any algorithm to calculate the concrete parameters, without having to include all such algorithms in dvc itself.

@dberenbaum
Copy link
Collaborator

@behrica Would you be interested in a Python API to do this?

My feeling is that if you need to auto-generate all parameter combinations, you may as well call dvc exp run --queue from your code for each parameter combination (have you tried this as a workaround?). Saving them all to different files in a folder seems somewhat against DVC expectations, since each experiment is assumed to contain only its own parameters. It also doesn't seem to work for adaptive algorithms, where not every parameter combination to be tried is known from the start.

A Python API could add more parameter combinations as you go. It would also make more complex operations possible, like randomly selecting a parameter from an interval. Maybe we could support that in a way that is broadly useful across any search algorithm?
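
A minimal sketch of the suggested workaround, assuming Python and a `dvc` executable on the `PATH` inside a DVC repo: generate every combination from a grid and issue one `dvc exp run --queue -S key=value ...` call per combination. `queue_commands` and `queue_all` are hypothetical helper names; with `dry_run=True` the commands are only printed.

```python
import shlex
import subprocess
from itertools import product


def queue_commands(grid: dict[str, list]) -> list[list[str]]:
    """Build one `dvc exp run --queue` argv per parameter combination."""
    keys = list(grid)
    commands = []
    for values in product(*(grid[k] for k in keys)):
        cmd = ["dvc", "exp", "run", "--queue"]
        for key, value in zip(keys, values):
            # Assumption: values contain no commas (with Hydra syntax a comma means a sweep).
            cmd += ["-S", f"{key}={value}"]
        commands.append(cmd)
    return commands


def queue_all(grid: dict[str, list], dry_run: bool = True) -> None:
    for cmd in queue_commands(grid):
        if dry_run:
            print(shlex.join(cmd))  # show the command without running it
        else:
            subprocess.run(cmd, check=True)  # needs dvc installed and a DVC repo


if __name__ == "__main__":
    queue_all({"lr": [0.01, 0.1], "epochs": [10, 20]})
```

After queueing, the runs can be started with `dvc queue start`. Because the grid lives in ordinary code, an adaptive search could call `queue_commands` again with new values between batches.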

@behrica
Author

behrica commented Sep 8, 2022

I am very happy that DVC is language independent. I use it from Clojure.
So I would favor a command-line option that takes a file with all the parameter combinations I want.

Then I could generate such a file from Clojure.

@behrica
Author

behrica commented Sep 8, 2022

But this makes it rather static, indeed.

@behrica
Author

behrica commented Sep 8, 2022

But the workaround you mentioned is feasible as well.

@behrica
Author

behrica commented Sep 8, 2022

I think the general question to decide is this:

Should dvc itself start to provide various algorithms to "statically calculate" concrete parameters from "a user-supplied parameter space"?
Yes/no

It seems to me that #8187 is a first step in this direction. The user gives the space, and dvc calculates all combinations.
(taking a random subset of this would be another algorithm)
(using a Sobol sequence, https://en.wikipedia.org/wiki/Sobol_sequence, would be another optimization)
Both only take a subset of all combinations or work with continuous intervals and split them smartly.
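
To illustrate the two alternatives mentioned here, the sketch below picks a random subset of a full grid and draws points from continuous intervals, using only Python's standard library. The function names are hypothetical. A Sobol sequence would replace the uniform draws with low-discrepancy points (for example via `scipy.stats.qmc.Sobol` in SciPy 1.7+), which is not shown here.

```python
import random
from itertools import product


def random_subset(grid: dict[str, list], n: int, seed: int = 0) -> list[dict]:
    """Pick up to n distinct combinations uniformly at random from the full grid."""
    keys = list(grid)
    combos = list(product(*(grid[k] for k in keys)))  # materializes the whole grid
    rng = random.Random(seed)
    picked = rng.sample(combos, k=min(n, len(combos)))  # sampling without replacement
    return [dict(zip(keys, c)) for c in picked]


def sample_intervals(
    bounds: dict[str, tuple[float, float]], n: int, seed: int = 0
) -> list[dict]:
    """Draw n points uniformly from continuous [low, high] intervals."""
    rng = random.Random(seed)
    return [
        {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    print(random_subset({"a": [1, 2, 3], "b": ["x", "y"]}, n=3))
    print(sample_intervals({"lr": (0.001, 0.1)}, n=2))
```

Fixing the seed keeps the selection reproducible, so the same subset is queued if the script is rerun.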

Allowing a "parameter file" would externalize this and keep it out of dvc.
But then maybe #8187 should not be merged.

This does not yet address the question of doing this non-statically, for example using past training results.

@behrica
Author

behrica commented Sep 8, 2022

I see the "user interface" as very similar to #8187:

$ dvc exp run -Sfile "param_combinations.csv" --queue     # file in some tabular format, maybe CSV
Queueing with '{'params.yaml': ['db=mysql', 'schema=warehouse']}'.
Queued experiment '5ab98b8' for future execution.
Queueing with '{'params.yaml': ['db=mysql', 'schema=school']}'.
Queued experiment '57c2fb6' for future execution.
Queueing with '{'params.yaml': ['db=postgresql', 'schema=warehouse']}'.
Queued experiment 'b9d6391' for future execution.
Queueing with '{'params.yaml': ['db=postgresql', 'schema=school']}'.
Queued experiment '145cd55' for future execution.
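
A sketch of how the proposed table-driven queueing could be emulated outside DVC: read a CSV whose header row names the parameters and whose data rows are individual experiments, then build one list of `-S` overrides per row. The `-Sfile` flag above is only a proposal in this thread; `rows_to_overrides` is a hypothetical helper that works on CSV text (for a real file, read it with `open(path, newline="")`).

```python
import csv
import io


def rows_to_overrides(csv_text: str) -> list[list[str]]:
    """Turn each CSV data row into a list of `key=value` overrides.

    The header row holds parameter names; each data row is one experiment.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[f"{k.strip()}={v.strip()}" for k, v in row.items()] for row in reader]


example = """db,schema
mysql,warehouse
mysql,school
postgresql,warehouse
postgresql,school
"""

if __name__ == "__main__":
    for overrides in rows_to_overrides(example):
        args = ["dvc", "exp", "run", "--queue"]
        for override in overrides:
            args += ["-S", override]
        print(args)  # pass to subprocess.run(args, check=True) to actually queue
```

Unlike a sweep, each row is used as-is, so the file can hold any subset of combinations produced by an external tool.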

@dberenbaum
Collaborator

> it might be useful (and clean, I think) to allow some way to "pass several param files" (from a folder maybe)

By the way, this is already possible to do with Hydra. You would save them as YAML files in your conf directory and then select each conf file like dvc exp run -S conf_file=file1,file2. There's a simple example in https://github.com/dberenbaum/hydra-dvc-multirun.

@behrica
Author

behrica commented Oct 20, 2022

> it might be useful (and clean, I think) to allow some way to "pass several param files" (from a folder maybe)

> By the way, this is already possible to do with Hydra. You would save them as YAML files in your conf directory and then select each conf file like dvc exp run -S conf_file=file1,file2. There's a simple example in https://github.com/dberenbaum/hydra-dvc-multirun.

This syntax does not work for me:

[hydra-dvc-multirun]$ dvc exp run  --queue -S conf_file=one.yaml,two.yaml 
ERROR: unexpected error - Could not override 'conf_file'.             
To append to your config use +conf_file=one.yaml: Key 'conf_file' is not in struct
    full_key: conf_file
    object_type=dict

@dberenbaum
Collaborator

Sorry, there is some Hydra-specific syntax. You have to use group (since that's the directory inside conf where the files are stored), and you can optionally drop the .yaml extension. See the README of that repo:

$ dvc exp run --queue -S group=one,two
Queueing with overrides '{'params.yaml': ['group=one']}'.
Queued experiment '634a8fa' for future execution.
Queueing with overrides '{'params.yaml': ['group=two']}'.
Queued experiment '0c283dc' for future execution.
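
Since each file in `conf/group/` becomes one sweep value, the override can be generated from the directory listing instead of typed by hand. A minimal sketch, assuming the `conf/group/*.yaml` layout described above; `group_overrides` is hypothetical, and splitting the files into batches (each batch passed to its own `dvc exp run --queue -S ...` call) is an assumption meant to keep individual command lines short.

```python
from pathlib import Path


def group_overrides(conf_dir: str, group: str = "group", batch_size: int = 100) -> list[str]:
    """Build `group=a,b,...` sweep overrides from the YAML files in <conf_dir>/<group>/.

    File stems are used because Hydra lets you drop the .yaml extension.
    The list is split into batches so no single command line grows too long.
    """
    names = sorted(p.stem for p in Path(conf_dir, group).glob("*.yaml"))
    return [
        f"{group}=" + ",".join(names[i:i + batch_size])
        for i in range(0, len(names), batch_size)
    ]


if __name__ == "__main__":
    for override in group_overrides("conf"):
        print(["dvc", "exp", "run", "--queue", "-S", override])
```

The batch size of 100 is arbitrary; the practical limit depends on the operating system and on the length of the file names.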

@behrica
Author

behrica commented Oct 21, 2022

I tried it out, and that might work.
My use case would be massive grid searches, so I might generate a few thousand files.
I could give all of them random names and pass them as one very long list... (probably reaching the maximum command-line length)

I have now done it in a completely different way, which works as well and doesn't use Hydra at all.

Basically, I loop over all my parameter combinations in code and, for each one:

  1. write param.yaml to disk
  2. shell out and run dvc exp run --queue
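
The loop above can be sketched in Python as follows. Assumptions: a flat parameter dict, DVC's default `params.yaml` file name (the comment says `param.yaml`), and that `dvc exp run --queue` captures the workspace as it is at queue time, which is what lets the file be rewritten between calls. Values are written as JSON scalars, which are also valid YAML, so no YAML library is needed; the file is overwritten, so parameters not in the grid are lost. `write_params` and `queue_grid` are hypothetical names.

```python
import json
import subprocess
from itertools import product
from pathlib import Path
from typing import Callable


def write_params(path: Path, params: dict) -> None:
    """Write a flat dict as YAML; JSON scalars (numbers, "strings", true) are valid YAML."""
    lines = [f"{key}: {json.dumps(value)}" for key, value in params.items()]
    path.write_text("\n".join(lines) + "\n")


def queue_grid(
    grid: dict[str, list],
    params_file: Path = Path("params.yaml"),
    run: Callable[[list[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> int:
    """For each combination: overwrite params_file, then queue one experiment."""
    keys = list(grid)
    count = 0
    for values in product(*(grid[k] for k in keys)):
        write_params(params_file, dict(zip(keys, values)))  # 1. write params to disk
        run(["dvc", "exp", "run", "--queue"])                 # 2. shell out and queue
        count += 1
    return count
```

The `run` parameter defaults to calling `dvc` via `subprocess`, but can be swapped for a stub to check the generated files without touching a real repository.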

This may even be good enough to close this issue.

@dberenbaum
Collaborator

Makes sense @behrica! Yeah, there are too many different ways to do this to have them all be "built in," but glad you found a pattern that works for you.

5 participants