dvc studio login: setup auto pushing experiments #10137

Closed
dberenbaum opened this issue Dec 5, 2023 · 14 comments · Fixed by #10345
Labels: A: cli (Related to the CLI), A: experiments (Related to dvc exp), p1-important (Important, aka current backlog of things to do)

Comments

@dberenbaum
Collaborator

dberenbaum commented Dec 5, 2023

See #5029 (edit: iterative/dvc.org#5029) and the related issues linked there for background.

Rather than document the environment variables to auto push experiments, we could make this part of the studio login workflow since auto-pushing experiments is mostly useful when using studio rather than keeping experiments local. We would need to:

  1. Make config options like exp.auto_push and exp.git_remote
  2. During studio login, ask to set these options. The UI could look something like this:
$ dvc studio login
...
Authentication successful. The token will be available as risen-geum in Studio profile.
Do you want to push experiments automatically when they are completed [Y/n]?
Enter the Git remote to use [origin]:
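
As a rough illustration of the prompt flow proposed above, here is a minimal Python sketch. It is not DVC's implementation: the helper name `ask_auto_push_options` and the flat dictionary of dotted keys are assumptions made for this example, and only the option names `exp.auto_push` and `exp.git_remote` and the prompt wording come from the proposal.

```python
from typing import Callable, Dict


def ask_auto_push_options(ask: Callable[[str], str] = input) -> Dict[str, object]:
    """Hypothetical helper: ask the two questions from the proposed login UI.

    `ask` is injectable so the flow can be exercised without a terminal.
    """
    answer = ask(
        "Do you want to push experiments automatically "
        "when they are completed [Y/n]? "
    )
    # An empty answer accepts the capitalized default, i.e. "yes".
    auto_push = answer.strip().lower() in ("", "y", "yes")
    options: Dict[str, object] = {"exp.auto_push": auto_push}
    if auto_push:
        # Only ask for a Git remote when auto push was accepted.
        remote = ask("Enter the Git remote to use [origin]: ").strip()
        options["exp.git_remote"] = remote or "origin"
    return options
```

Pressing Enter at both prompts would yield `exp.auto_push` enabled with `origin` as the Git remote; answering `n` skips the second question entirely.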
@dberenbaum dberenbaum added the p1-important Important, aka current backlog of things to do label Dec 5, 2023
@dberenbaum dberenbaum added this to DVC Dec 5, 2023
@dberenbaum dberenbaum moved this to Backlog in DVC Dec 5, 2023
@skshetry
Member

skshetry commented Dec 5, 2023

Since this is a studio login, I'd rather not have any prompts and enable everything by default. But we should notify users that these options are enabled, provide hints on how to disable them, and support an argument to disable this behaviour.

@dberenbaum
Collaborator Author

cc @iterative/vs-code since this should also impact the vs code flow

@dberenbaum
Collaborator Author

@skshetry Do you mean to enable them during studio login or some other time? Auto-pushing is pretty connected to the Studio workflow since it's where the pushed experiments appear, and I don't think it's worthwhile to auto-push them without Studio.

Regardless, I think we can do the first step of adding config options. Having to set environment variables every time to auto push doesn't make much sense.

@skshetry
Member

skshetry commented Dec 5, 2023

Do you mean to enable them during studio login or some other time?

Enabling them automatically during studio login (unless they are already disabled by other means).

@dberenbaum
Collaborator Author

Not that strong an opinion, but gh auth login has prompts. While they can be clumsy, in this case there is already some interaction needed, so I didn't think prompts would be bad UX. What's your concern?

@skshetry
Member

skshetry commented Dec 5, 2023

From a new user perspective, it might be confusing and unclear what to choose. "Do you want to push experiments?" - maybe, maybe not, idk. What's experiments? etc.

It'd definitely lead to choice paralysis for me if I were using it for the first time. 😅

It's better to make a choice for them here. But the message should be clear that we are doing that.
We want as few interactions as possible, and as few decisions for the user to make as possible.

@dberenbaum
Collaborator Author

We also need a way to auto push on exp save for dvclive-only experiments. DVC_EXP_AUTO_PUSH does not do this now.

@dberenbaum
Collaborator Author

dberenbaum commented Dec 13, 2023

Thoughts on this approach?

  • Once you log in to studio, everything will be pushed automatically unless you set it to offline, and we can make clear during login how to toggle offline mode
  • We can show a notification before starting the push making clear that if you don't want to wait, it's safe to cancel and you can always upload later with exp push

@dberenbaum dberenbaum added A: experiments Related to dvc exp A: cli Related to the CLI labels Dec 22, 2023
@dberenbaum dberenbaum moved this from Backlog to Todo in DVC Dec 22, 2023
@dberenbaum dberenbaum moved this from Todo to Backlog in DVC Jan 4, 2024
@dberenbaum
Collaborator Author

Not a requirement but nice to have would be to incorporate #8843 when doing this. If we can push the dvc-tracked data at the end of each stage, and include the run cache, it can help in scenarios like recovery from failed runners but also break up the pushes during the experiment run so the final push may not feel so painful.

@dberenbaum dberenbaum moved this from Backlog to Todo in DVC Feb 12, 2024
@dberenbaum dberenbaum moved this from Todo to Backlog in DVC Feb 12, 2024
@AlexandreKempf AlexandreKempf self-assigned this Feb 20, 2024
@AlexandreKempf AlexandreKempf moved this from Backlog to Todo in DVC Feb 20, 2024
@dberenbaum
Collaborator Author

dberenbaum commented Feb 20, 2024

Tasks for this issue:

  • Confirm DVC_EXP_AUTO_PUSH works as expected
  • Make DVC_EXP_AUTO_PUSH default to use git remote origin (currently requires DVC_EXP_GIT_REMOTE)
  • Make DVC_EXP_AUTO_PUSH work on dvc exp save
  • Add config options for dvc config exp.auto_push and dvc config exp.git_remote
  • Handle errors if no dvc or git remote
  • Enable during dvc studio login with instructions or option to opt out
  • During push, show useful messages in case it's slow (it's safe to cancel, how to upload later, how to disable push)
  • Handle case where remote doesn't exist
  • Simplify ways to set git remote url
  • Make auto push work with queue
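
To make the intended resolution order in the task list concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not DVC's code: the function name `resolve_auto_push`, the flat dictionary of dotted config keys, the set of accepted truthy strings, and the rule that environment variables override config values are all assumptions. Only the names `DVC_EXP_AUTO_PUSH`, `DVC_EXP_GIT_REMOTE`, `exp.auto_push`, `exp.git_remote`, and the `origin` default come from the list above.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumption: which strings count as "enabled" is chosen for this example only.
TRUTHY = {"1", "true", "yes", "y", "on"}


def resolve_auto_push(
    config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[bool, str]:
    """Hypothetical helper: return (auto_push_enabled, git_remote).

    Assumed precedence: environment variable, then config option, then default.
    """
    env = os.environ if env is None else env
    raw = env.get("DVC_EXP_AUTO_PUSH", config.get("exp.auto_push", "false"))
    enabled = str(raw).strip().lower() in TRUTHY
    # Fall back to "origin" so the Git remote no longer has to be set explicitly.
    remote = env.get("DVC_EXP_GIT_REMOTE") or config.get("exp.git_remote") or "origin"
    return enabled, remote
```

With this ordering, a user who only sets `exp.auto_push` in config would push to `origin`, while a CI job could still override either value through the environment.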

Out of scope:

@dberenbaum
Collaborator Author

@skshetry I updated the checklist above for what's left to do here.

@skshetry
Member

skshetry commented Mar 5, 2024

@dberenbaum, any thoughts on how to simplify?

@skshetry skshetry pinned this issue Mar 5, 2024
@dberenbaum
Collaborator Author

We could also make studio.repo_url an alias for exp.git_remote and deprecate it, so you can specify either a URL or a git remote name.

Originally posted by @skshetry in iterative/dvc.org#5165 (comment)

@skshetry This suggestion makes sense to me.

@dberenbaum
Collaborator Author

Added "Make auto push work with queue" to the checklist. Currently, queued experiments fail because origin is not set in the queued repo:

$ dvc exp run --run-all
Following logs for all queued experiments. Use Ctrl+C to stop following logs (experiment execution will continue).

Reproducing experiment 'sober-daze'
Running stage 'train':
> python src/stages/train.py --config=params.yaml
WARNING: Failed to validate remotes. Disabling auto push: 'origin' is not a valid Git remote or URL

Ran experiment(s):
To apply the results of an experiment to your workspace run:

        dvc exp apply <exp>
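
The warning in the log above says the value must be a valid Git remote or a URL. As a rough sketch of what such a check could look like (not DVC's actual validation), the Python below accepts a value if it names a remote configured in the repository or looks like a URL. The function name `is_valid_git_remote`, the list of URL schemes, and the SCP-style pattern are assumptions for this example.

```python
import re
from typing import Iterable
from urllib.parse import urlparse

# Assumption: SCP-style addresses such as git@github.com:org/repo.git count as URLs.
_SCP_LIKE = re.compile(r"^[\w.-]+@[\w.-]+:.+$")
# Assumption: an illustrative, not exhaustive, set of schemes.
_URL_SCHEMES = {"http", "https", "ssh", "git", "file"}


def is_valid_git_remote(value: str, known_remotes: Iterable[str]) -> bool:
    """Hypothetical check: is `value` a configured remote name or a URL?"""
    if value in set(known_remotes):
        return True
    if _SCP_LIKE.match(value):
        return True
    parsed = urlparse(value)
    return parsed.scheme in _URL_SCHEMES and bool(parsed.netloc or parsed.path)
```

Under this sketch, `is_valid_git_remote("origin", [])` is false, which mirrors the queued case: the name `origin` is only meaningful if the repository running the experiment has that remote configured, whereas a full URL would pass regardless.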

@skshetry skshetry unpinned this issue Mar 12, 2024
@skshetry skshetry moved this from In Progress to Done in DVC Mar 23, 2024
Status: Done
3 participants