cloud versioning: Using dvc commit to track updates doesn't work as expected #8828

Closed
daavoo opened this issue Jan 16, 2023 · 2 comments · Fixed by #8873
Labels: A: cloud-versioning (Related to cloud-versioned remotes), bug (Did we break something?), p1-important (Important, aka current backlog of things to do)

Comments

@daavoo
Contributor

daavoo commented Jan 16, 2023

Setup

I am tracking a directory with cloud versioning enabled:

$ dvc remote modify myremote version_aware true
$ dvc remote modify myremote worktree true
$ mkdir data
$ echo foo > data/foo
$ echo bar > data/bar
$ dvc add data
$ dvc push
$ cat data.dvc
outs:
- md5: 168fd6761b9c3f2c081085b51fd7bf15.dir
  size: 8
  nfiles: 2
  path: data
  files:
  - size: 4
    etag: 08deb086f8d8ccfc021001
    md5: c157a79031e1c40f85931829bc5fc552
    relpath: bar
  - size: 4
    etag: 08a48b8df8d8ccfc021001
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo

And performing an update:

Working as expected with dvc add

If I run dvc add data, it works as expected: the .dvc file preserves the "cloud-versioning" format, and dvc push uploads only the modified file and creates a new version for it alone:

$ echo bar2 > data/bar
$ dvc add data
$ cat data.dvc
outs:
- md5: 43f06a6517d71d0267bed23cfb1a53fb.dir
  size: 9
  nfiles: 2
  path: data
  files:
  - size: 5
    md5: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
  - size: 4
    etag: 08a48b8df8d8ccfc021001
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
$ dvc push
1 file pushed
$ cat data.dvc
outs:
- md5: 43f06a6517d71d0267bed23cfb1a53fb.dir
  size: 9
  nfiles: 2
  path: data
  files:
  - size: 5
    md5: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
    etag: 08eca3ebfddaccfc021001
  - size: 4
    etag: 08f0f286fcdaccfc021001
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo

Unexpected behavior with dvc commit

However, if I instead run dvc commit to track the updates, the .dvc file first loses the "cloud-versioning" format (the per-file entries under files disappear). The next dvc push then uploads every file, creating redundant versions on the remote:

$ echo bar3 > data/bar
$ dvc commit
outputs ['data'] of stage: 'data.dvc' changed. Are you sure you want to commit it? [y/n] y
$ cat data.dvc
outs:
- md5: a41477be72fa9617d877592a0b22542a.dir
  size: 9
  nfiles: 2
  path: data
$ dvc push
2 files pushed
$ cat data.dvc
outs:
- md5: a41477be72fa9617d877592a0b22542a.dir
  size: 9
  nfiles: 2
  path: data
  files:
  - size: 5
    etag: 0898e6edffdaccfc021001
    md5: f19345bd4a711b82a13e55c61cad54e3
    relpath: bar
  - size: 4
    etag: 08bfe9e5ffdaccfc021001
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
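
To make the difference between the two cases concrete, here is a minimal Python sketch. It is an illustrative model only, not DVC's actual push logic: the helper name `files_needing_push` is hypothetical, and it assumes that a file entry without an `etag` (or a missing `files` list altogether) means the file has no known version on the remote and must be uploaded.

```python
def files_needing_push(out, tracked):
    """Return the relpaths that would need an upload under this simplified model.

    `out` is one entry of the `outs` list from a .dvc file, as a plain dict.
    `tracked` is the list of relpaths currently inside the tracked directory.
    """
    files = out.get("files")
    if files is None:
        # The per-file entries are gone, so no remote version is known
        # for any file: everything looks new.
        return sorted(tracked)
    # Only entries without a recorded etag look new.
    return sorted(f["relpath"] for f in files if "etag" not in f)


# State of data.dvc after `dvc add` (before the push), taken from the issue.
after_add = {
    "path": "data",
    "files": [
        {"size": 5, "md5": "042967ff088f7f436015fbebdcad1f28", "relpath": "bar"},
        {"size": 4, "etag": "08a48b8df8d8ccfc021001",
         "md5": "d3b07384d113edec49eaa6238ad5ff00", "relpath": "foo"},
    ],
}

# State of data.dvc after `dvc commit` (before the push): no `files` list.
after_commit = {"path": "data"}

print(files_needing_push(after_add, ["bar", "foo"]))
print(files_needing_push(after_commit, ["bar", "foo"]))
```

Under this model the first call reports only `bar`, matching "1 file pushed", while the second reports both files, matching "2 files pushed".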
@daavoo daavoo added bug Did we break something? A: cloud-versioning Related to cloud-versioned remotes labels Jan 16, 2023
@daavoo
Contributor Author

daavoo commented Jan 16, 2023

@dberenbaum @pmrowla is this by design?

Are there any problems with making the merge_versioned argument of stage.save default to True? It should do nothing if there are no cloud-versioned outputs.
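
For readers unfamiliar with the idea, the following is a rough sketch of what merging version information could look like. It is not DVC's implementation of `merge_versioned`: the function name `carry_over_versions` and the matching rule (same `relpath` and same `md5`) are assumptions made for illustration.

```python
def carry_over_versions(old_files, new_files):
    """Copy the etag from an old file entry to the matching new entry.

    An entry matches when it has the same relpath and the same md5,
    i.e. the file content did not change. Changed or new files get no
    etag, so only they would look like they need a new remote version.
    """
    old_by_path = {f["relpath"]: f for f in old_files}
    merged = []
    for new in new_files:
        entry = dict(new)  # do not mutate the caller's data
        old = old_by_path.get(new["relpath"])
        if old is not None and "etag" in old and old.get("md5") == new.get("md5"):
            entry["etag"] = old["etag"]
        merged.append(entry)
    return merged


# Entries recorded in data.dvc before running `dvc commit` (from the issue).
old_files = [
    {"size": 5, "md5": "042967ff088f7f436015fbebdcad1f28",
     "relpath": "bar", "etag": "08eca3ebfddaccfc021001"},
    {"size": 4, "etag": "08f0f286fcdaccfc021001",
     "md5": "d3b07384d113edec49eaa6238ad5ff00", "relpath": "foo"},
]

# Entries computed from the working tree after `echo bar3 > data/bar`.
new_files = [
    {"size": 5, "md5": "f19345bd4a711b82a13e55c61cad54e3", "relpath": "bar"},
    {"size": 4, "md5": "d3b07384d113edec49eaa6238ad5ff00", "relpath": "foo"},
]

merged = carry_over_versions(old_files, new_files)
for entry in merged:
    print(entry["relpath"], entry.get("etag"))
```

With no versioned entries in `old_files`, the function returns the new entries unchanged, which is the "should do nothing" case described above.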

daavoo added a commit that referenced this issue Jan 16, 2023
@daavoo daavoo linked a pull request Jan 16, 2023 that will close this issue
@daavoo daavoo self-assigned this Jan 17, 2023
@daavoo daavoo added this to DVC Jan 17, 2023
@daavoo daavoo moved this from Backlog to In Progress in DVC Jan 17, 2023
@github-project-automation github-project-automation bot moved this to Backlog in DVC Jan 17, 2023
@dberenbaum
Collaborator

> @dberenbaum @pmrowla is this by design?

No, it's not by design. Good catch.

@dberenbaum dberenbaum added the p1-important Important, aka current backlog of things to do label Jan 19, 2023
daavoo added a commit that referenced this issue Jan 24, 2023
If we want `add`, `commit` and `move` to work for cloud versioning, option must be always True so it is the same as not having it.

Closes #8828
(We were not passing `merge_versioned=True` in `dvc commit`)
daavoo added a commit that referenced this issue Jan 25, 2023
If we want `add`, `commit` and `move` to work for cloud versioning, option must be always True so it is the same as not having it.

Closes #8828
(We were not passing `merge_versioned=True` in `dvc commit`)
@github-project-automation github-project-automation bot moved this from In Progress to Done in DVC Jan 25, 2023