Document checkpoints without make_checkpoint() #2292

Closed
1 task
dberenbaum opened this issue Mar 11, 2021 · 15 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions

Comments

@dberenbaum
Contributor

dberenbaum commented Mar 11, 2021

Our documentation so far in https://dvc.org/doc/command-reference/exp/run (and maybe elsewhere?) assumes that checkpoints only work with make_checkpoint() or a signal file. However, checkpoints still work without make_checkpoint() if the stage is designed to make a single checkpoint instead of multiple checkpoints. The final output will still be saved for the following iteration as long as it is marked as checkpoint: true in the dvc.yaml.

This would be great to document because:

  1. It is language agnostic (without resorting to clunky workarounds) and therefore more consistent with the rest of DVC
  2. It provides a simpler starting point for using checkpoints

See https://github.com/iterative/dvc-checkpoints-mnist/tree/python_agnostic for an example.
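A minimal sketch of what such a single-checkpoint stage could look like, assuming a hypothetical `train.py` whose output `model.json` is marked `checkpoint: true` in dvc.yaml. The file names, the stage name, and the toy "training" step are illustrative only and are not taken from the linked branch. The script makes no DVC API calls: it resumes from the previous output if one exists and writes the result back once per run.

```python
# train.py (hypothetical): one increment of "training" per run, no DVC API calls.
# Assumed dvc.yaml entry (names are illustrative):
#   stages:
#     train:
#       cmd: python train.py
#       outs:
#         - model.json:
#             checkpoint: true
import json
import os


def load_state(path):
    """Return the state left by the previous run, or a fresh one."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fobj:
            return json.load(fobj)
    return {"epoch": 0, "weight": 0.0}


def train_one_epoch(state):
    """Stand-in for real training: advance the epoch and nudge the weight."""
    return {"epoch": state["epoch"] + 1, "weight": state["weight"] + 0.5}


def main(path="model.json"):
    # Resume from the existing output, train once, and overwrite the output.
    state = train_one_epoch(load_state(path))
    with open(path, "w", encoding="utf-8") as fobj:
        json.dump(state, fobj)
    return state


if __name__ == "__main__":
    main()
```

Because the output is only read and rewritten as a plain file, the same pattern should carry over to any language.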

@pmrowla @jorgeorpinel @dmpetrov

@pmrowla
Contributor

pmrowla commented Mar 12, 2021

This is just using a regular output as a checkpoint. I guess the distinction is that in order for the "resume" feature in exp run to work, you still have to specify the checkpoint flag, but otherwise this is just doing the regular

dvc repro
git add .
git commit
<repeat>

workflow (with the results stored in an experiment ref instead of a regular git branch).

Maybe we should just consider having an explicit flag to extend ("resume") an existing experiment branch (that may or may not have checkpoint outputs) instead?

edit: although in this case exp run --reset behavior would not work the same as it does now (unless the user still specifies checkpoint outs)

@dberenbaum
Contributor Author

This is just using a regular output as a checkpoint.

Exactly, which makes it easier to grasp for existing DVC users and more consistent with the rest of DVC (no need to inject DVC API functions into the user code). Not that make_checkpoint() isn't useful, it just seems like we are jumping straight into the advanced use case.

@iesahin
Contributor

iesahin commented Mar 13, 2021

So make_checkpoint() is for creating checkpoints manually, but without it dvc exp run already saves a checkpoint after running the command. Right?

If checkpoint: true is set in dvc.yaml, it's as if we were running the experiments in a loop. We can use

while true; do
  dvc exp run
done

to run the experiments continuously.

BTW, yes, this works as I expected, but quitting the loop is very difficult. I had to close the tmux pane. 😅
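One way to avoid the hard-to-quit loop is to bound it. The following is a rough sketch, not an official DVC recipe: the helper name `run_iterations` and its `run` parameter are hypothetical, and it assumes `dvc exp run` returns a non-zero exit code on failure.

```python
import subprocess


def run_iterations(max_iterations, run=subprocess.run):
    """Call `dvc exp run` up to max_iterations times; stop early on failure.

    `run` is injectable so the loop can be exercised without DVC installed.
    Returns the number of runs that completed successfully.
    """
    completed = 0
    for _ in range(max_iterations):
        result = run(["dvc", "exp", "run"])
        if result.returncode != 0:
            break  # stop instead of retrying forever
        completed += 1
    return completed
```

For example, `run_iterations(10)` would attempt at most ten checkpoint iterations and then return.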

@pmrowla
Contributor

pmrowla commented Mar 14, 2021

So make_checkpoint() is for creating checkpoints manually, but without it dvc exp run already saves a checkpoint after running the command. Right?

Yes, DVC will always generate a final checkpoint commit after running the command (assuming the workspace state has actually changed and there are actually changes to commit).

@jorgeorpinel
Contributor

jorgeorpinel commented Mar 19, 2021

checkpoints still work without make_checkpoint() if the stage is designed to make a single checkpoint (per exp run)
great to document because: ... It is language agnostic ... provides a simpler starting point

Yeah I didn't even realize this was possible. Sure!

just using a regular output as a checkpoint

But the experiments (checkpoints) are added into a branch this way, which is quite different from regular experiments. Kind of the main difference with checkpoints!

it's as if running the experiments in a loop

BTW that connects with iterative/dvc/issues/5608

@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions labels Mar 19, 2021
@dberenbaum
Contributor Author

We also probably need an example in the docs and/or in https://github.com/iterative/dvc-checkpoints-mnist that shows how to use signal files to generate checkpoints.

@jorgeorpinel
Contributor

jorgeorpinel commented Mar 22, 2021

how to use signal files to generate checkpoints

Meaning without make_checkpoint(), or (both that and) without Python?

@dberenbaum
Contributor Author

I mean manually doing the steps in https://dvc.org/doc/api-reference/make_checkpoint#description. Torn on whether to do it in Python for consistency or in another language to show how that works. Maybe we can start with Python and possibly translate into another language later.

@dberenbaum
Contributor Author

I mean manually doing the steps in https://dvc.org/doc/api-reference/make_checkpoint#description

I put an example in https://github.com/iterative/dvc-checkpoints-mnist/tree/signal_file.
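For readers who do not open the branch, here is a rough sketch of the manual signal-file steps. It is written from my recollection of the make_checkpoint() description linked above, so treat the environment variable names (`DVC_CHECKPOINT`, `DVC_ROOT`) and the signal file path (`.dvc/tmp/DVC_CHECKPOINT`) as assumptions to verify against that page. The function name `signal_checkpoint` and the `timeout` parameter are hypothetical additions, and the code is not copied from the signal_file branch.

```python
import os
import time


def signal_checkpoint(root_dir=None, poll_seconds=0.1, timeout=None):
    """Ask DVC to record a checkpoint by writing a signal file, then wait.

    Assumptions (to be checked against the make_checkpoint() docs):
    DVC sets DVC_CHECKPOINT and DVC_ROOT during `dvc exp run`, and deletes
    .dvc/tmp/DVC_CHECKPOINT once the checkpoint has been captured.
    Returns True if the signal was acknowledged, False otherwise.
    """
    # Step 1: only signal when running under `dvc exp run`.
    if os.getenv("DVC_CHECKPOINT") is None:
        return False
    root = root_dir or os.getenv("DVC_ROOT", ".")
    signal_file = os.path.join(root, ".dvc", "tmp", "DVC_CHECKPOINT")
    os.makedirs(os.path.dirname(signal_file), exist_ok=True)
    # Step 2: create an empty signal file and force it to disk.
    with open(signal_file, "w", encoding="utf-8") as fobj:
        fobj.write("")
        fobj.flush()
        os.fsync(fobj.fileno())
    # Step 3: wait until the file is removed (optionally give up after timeout).
    deadline = None if timeout is None else time.monotonic() + timeout
    while os.path.exists(signal_file):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

A training loop could call `signal_checkpoint()` after writing its checkpoint outputs at the end of each epoch. Since the steps are only an environment check, a file write, and a wait, they should translate directly to other languages.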

Next step is probably to reference this repo in the docs or develop a more robust tutorial out of this or another scenario. Seems like checkpoints in general are not featured that prominently yet, and the different ways to implement them are not laid out clearly.

Any thoughts on an approach here?

@jorgeorpinel
Contributor

jorgeorpinel commented Apr 4, 2021

Agree. Just not sure what the priority is. We can wait and see if people seem to need this guidance (those who don't find the signal_file branch of https://github.com/iterative/dvc-checkpoints-mnist first).

@dberenbaum
Contributor Author

IMHO a first step would be to reference https://github.com/iterative/dvc-checkpoints-mnist in the docs (I think it's only mentioned in dvclive now) and explain the different branches.

@dberenbaum
Contributor Author

We can wait and see if people seem to need this guidance (those who don't find the signal_file branch of https://github.com/iterative/dvc-checkpoints-mnist first).

I'm worried that users aren't very aware of checkpoints or how to use them until we better highlight and document them. As a first step, what about referencing the mnist examples in https://dvc.org/doc/user-guide/experiment-management#checkpoints-in-source-code?

@jorgeorpinel
Contributor

Makes sense to wait a little before assigning a priority.

what about referencing the mnist examples in https://dvc.org/doc/user-guide/experiment-management#checkpoints-in-source-code?

Sure, we can add a paragraph. Or would it be more useful in #2373? (I can contribute to the branch).

@jorgeorpinel
Contributor

UPDATE: for now please see #2381 and iterative/dvc-checkpoints-mnist#3

@dberenbaum
Contributor Author

#2381 seems sufficient for now. Closing this one.

4 participants