
Pushing artifacts via WebDAV results in a 411 Length Required response #4796

Closed
LucaButera opened this issue Oct 27, 2020 · 40 comments
Labels
fs: webdav (Related to the Webdav filesystem), research

Comments

@LucaButera
Contributor

Bug Report

I am trying to connect to a remote via WebDAV. I can correctly set up the user and password along with the URL, but when I try to push artifacts I get a 411 Length Required response. How can I solve the missing Content-Length header problem?

Please provide information about your setup

DVC version: 1.9.0 (brew)

Platform: Python 3.9.0 on macOS-10.15.7-x86_64-i386-64bit
Supports: azure, gdrive, gs, http, https, s3, ssh, oss, webdav, webdavs
Cache types: reflink, hardlink, symlink
Repo: dvc, git

@efiop
Contributor

efiop commented Oct 27, 2020

Hi @LucaButera

Please show the full log for dvc push -v.

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Oct 27, 2020
@LucaButera
Contributor Author

2020-10-27 18:30:54,485 DEBUG: Check for update is enabled.
2020-10-27 18:30:54,487 DEBUG: fetched: [(3,)]
2020-10-27 18:30:54,487 DEBUG: Checking if stage 'as' is in 'dvc.yaml'
2020-10-27 18:30:54,494 DEBUG: Assuming '/Users/lucabutera/bolt_datasets/.dvc/cache/67/560cedfa23a09b3844c3278136052f.dir' is unchanged since it is read-only
2020-10-27 18:30:54,495 DEBUG: Assuming '/Users/lucabutera/bolt_datasets/.dvc/cache/67/560cedfa23a09b3844c3278136052f.dir' is unchanged since it is read-only
2020-10-27 18:30:54,510 DEBUG: Preparing to upload data to 'https://<user>@drive.switch.ch/remote.php/dav/files/<user>/datasets'
2020-10-27 18:30:54,510 DEBUG: Preparing to collect status from https://<user>@drive.switch.ch/remote.php/dav/files/l<user>/datasets
2020-10-27 18:30:54,510 DEBUG: Collecting information from local cache...
2020-10-27 18:30:54,511 DEBUG: Assuming '/Users/lucabutera/bolt_datasets/.dvc/cache/67/560cedfa23a09b3844c3278136052f.dir' is unchanged since it is read-only
2020-10-27 18:30:54,511 DEBUG: Assuming '/Users/lucabutera/bolt_datasets/.dvc/cache/a7/1ba7ec561a112e0af205674a767b7a' is unchanged since it is read-only
2020-10-27 18:30:54,511 DEBUG: Collecting information from remote cache...
2020-10-27 18:30:54,511 DEBUG: Querying 1 hashes via object_exists
2020-10-27 18:30:55,791 DEBUG: Matched '0' indexed hashes
2020-10-27 18:30:56,302 DEBUG: Estimated remote size: 256 files
2020-10-27 18:30:56,303 DEBUG: Querying '2' hashes via traverse
2020-10-27 18:30:58,349 DEBUG: Uploading '.dvc/cache/a7/1ba7ec561a112e0af205674a767b7a' to 'https://<user>@drive.switch.ch/remote.php/dav/files/<user>/datasets/a7/1ba7ec561a112e0af205674a767b7a'
2020-10-27 18:30:59,678 ERROR: failed to upload '.dvc/cache/a7/1ba7ec561a112e0af205674a767b7a' to 'https://<user>@drive.switch.ch/remote.php/dav/files/<user>/datasets/a7/1ba7ec561a112e0af205674a767b7a' - Request to https://<user>@drive.switch.ch/remote.php/dav/files/<user>/datasets/a7/1ba7ec561a112e0af205674a767b7a failed with code 411 and message: b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>411 Length Required</title>\n</head><body>\n<h1>Length Required</h1>\n<p>A request of the requested method PUT requires a valid Content-length.<br />\n</p>\n<hr>\n<address>Apache/2.4.18 (Ubuntu) Server at a01.drive.switch.ch Port 80</address>\n</body></html>\n'
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/cache/local.py", line 32, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/tree/base.py", line 356, in upload
    self._upload(  # noqa, pylint: disable=no-member
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/tree/webdav.py", line 243, in _upload
    self._client.upload_to(buff=chunks(), remote_path=to_info.path)
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/webdav3/client.py", line 66, in _wrapper
    res = fn(self, *args, **kw)
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/webdav3/client.py", line 438, in upload_to
    self.execute_request(action='upload', path=urn.quote(), data=buff)
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/webdav3/client.py", line 226, in execute_request
    raise ResponseErrorCode(url=self.get_url(path), code=response.status_code, message=response.content)
webdav3.exceptions.ResponseErrorCode: Request to https://<user>@drive.switch.ch/remote.php/dav/files/<user>/datasets/a7/1ba7ec561a112e0af205674a767b7a failed with code 411 and message: b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>411 Length Required</title>\n</head><body>\n<h1>Length Required</h1>\n<p>A request of the requested method PUT requires a valid Content-length.<br />\n</p>\n<hr>\n<address>Apache/2.4.18 (Ubuntu) Server at a01.drive.switch.ch Port 80</address>\n</body></html>\n'
------------------------------------------------------------
2020-10-27 18:30:59,680 DEBUG: failed to upload full contents of 'as', aborting .dir file upload
2020-10-27 18:30:59,680 ERROR: failed to upload '.dvc/cache/67/560cedfa23a09b3844c3278136052f.dir' to 'https://<user>@drive.switch.ch/remote.php/dav/files/<user>/datasets/67/560cedfa23a09b3844c3278136052f.dir'
2020-10-27 18:30:59,680 DEBUG: fetched: [(1925,)]
2020-10-27 18:30:59,682 ERROR: failed to push data to the cloud - 2 files failed to upload
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/command/data_sync.py", line 50, in run
    processed_files_count = self.repo.push(
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/repo/__init__.py", line 51, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/repo/push.py", line 35, in push
    return len(used_run_cache) + self.cloud.push(used, jobs, remote=remote)
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/data_cloud.py", line 65, in push
    return self.repo.cache.local.push(
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/remote/base.py", line 15, in wrapper
    return f(obj, named_cache, remote, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/cache/local.py", line 427, in push
    return self._process(
  File "/usr/local/Cellar/dvc/1.9.0/libexec/lib/python3.9/site-packages/dvc/cache/local.py", line 396, in _process
    raise UploadError(fails)
dvc.exceptions.UploadError: 2 files failed to upload
------------------------------------------------------------
2020-10-27 18:30:59,687 DEBUG: Analytics is disabled.

This is the full output, minus the user field, which was replaced with <user> for privacy reasons.
Hope this helps.

@efiop
Contributor

efiop commented Oct 27, 2020

Any info about the server? At first glance it seems like the server does not understand the chunked upload. Might be missing something though. CC @iksnagreb

@efiop efiop added the research label Oct 27, 2020
@LucaButera
Contributor Author

The server is Switch Drive, a cloud storage service based on ownCloud. I would assume the WebDAV server is the same as ownCloud's, but I don't have any further info.

@efiop
Contributor

efiop commented Oct 27, 2020

@LucaButera Thanks! So it might be a bug in https://github.com/ezhov-evgeny/webdav-client-python-3; we need to take a closer look.

@efiop efiop removed the awaiting response we are waiting for your reply, please respond! :) label Oct 27, 2020
@LucaButera
Contributor Author

@efiop After browsing the source code, that seems plausible to me. Mind that I am able to connect to the server through the macOS Finder, so it doesn't seem to be a server issue.

Sadly, the upload_to method in the webdav library does not allow passing headers, even though the underlying method that performs the request accepts custom headers. They even have an open issue on this subject: ezhov-evgeny/webdav-client-python-3#80

One solution might be to emulate the upload_to method and call execute_request directly on the Client object (sketched below), while hoping for the resolution of the mentioned issue. Otherwise, one could open a PR on https://github.com/ezhov-evgeny/webdav-client-python-3 so that custom headers can be passed directly to the upload_to method.
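
For illustration, a minimal sketch of that workaround (upload_unchunked is a hypothetical helper, not DVC code; execute_request and Urn are the webdavclient3 pieces visible in the traceback above):

from webdav3.urn import Urn

def upload_unchunked(client, local_path, remote_path):
    """Hypothetical helper: PUT a file without chunked Transfer-Encoding."""
    urn = Urn(remote_path)
    with open(local_path, "rb") as fd:
        # With a plain file object as the body, the underlying requests
        # session can determine the size and send Content-Length itself
        # instead of falling back to Transfer-Encoding: chunked.
        client.execute_request(action="upload", path=urn.quote(), data=fd)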

I am willing to help: I could download both the DVC and webdav-client source code and try out these modifications myself, just to report whether adding the header fixes the issue. I just don't know how to trigger dvc push using the modified source.

@iksnagreb
Contributor

Any info about the server? At first glance it seems like the server does not understand the chunked upload. Might be missing something though. CC @iksnagreb

@efiop @LucaButera Can we try to figure out whether it is really (only) the chunked upload and not something else?

@LucaButera If you have a copy of the dvc repository and some time to try something: it should be quite easy to change the _upload method of the WebDAVTree to use the upload_file method, which IIRC does no chunking of the file.

https://github.com/iterative/dvc/blob/master/dvc/tree/webdav.py#L243

You would have to change the last line
self._client.upload_to(buff=chunks(), remote_path=to_info.path)
to
self._client.upload_file(local_path=from_file, remote_path=to_info.path)

If this modification lets you upload files, we can be pretty sure it is the chunking or a bug in the webdavclient upload_to method. Note that this will disable the progress bar, so it might seem as if it is hanging...

I assume you have no valid dvc cache at the remote yet (as uploading does not work at all)? So you cannot check whether downloading is working?

Before trying to upload the file, the parent directories should be created (e.g. datasets/a7). Could you please check whether this was successful?
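
For example, with webdavclient3 directly (a quick sketch; client is assumed to be an already configured webdav3.client.Client, and the paths are the ones from the log above):

# `client` is assumed to be a configured webdav3.client.Client
print(client.check("datasets"))     # the remote base directory
print(client.check("datasets/a7"))  # a per-hash subdirectory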

@LucaButera
Contributor Author

@efiop @iksnagreb I will try to modify the source in the afternoon and report to you.

Concerning the creation of the base folders: yes, they get created, so the connection to the server should be working.

@skshetry
Member

skshetry commented Oct 28, 2020

@LucaButera, to see if chunked upload is the issue, you could also try sending a curl request with chunked upload:

$ curl --upload-file test.txt https://<user>@drive.switch.ch/remote.php/dav/files/<user>/test.txt -vv --http1.1 --header "Transfer-Encoding: chunked"

Also, check without that header. If the file is uploaded successfully in both cases, something's wrong with the library. If only the request without the header succeeds, chunked upload might have been forbidden on the server entirely.
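
If curl is inconvenient, here is a rough Python equivalent with requests (illustrative only; the URL and credentials are placeholders). requests sends a Content-Length header for a bytes body but switches to Transfer-Encoding: chunked for an iterator body:

import requests

url = "https://drive.switch.ch/remote.php/dav/files/<user>/test.txt"
auth = ("<user>", "<password>")

with open("test.txt", "rb") as fd:
    body = fd.read()

plain = requests.put(url, data=body, auth=auth)            # Content-Length set
chunked = requests.put(url, data=iter([body]), auth=auth)  # chunked PUT
print(plain.status_code, chunked.status_code)              # e.g. 2xx vs 411 here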

@LucaButera
Contributor Author

@skshetry I tried your suggestion, which seemed quicker. Without the header it correctly uploads the file, while with chunked upload it returns 411, just as with dvc.

@iksnagreb @efiop Is there any way to perform non-chunked upload in DVC? Or do I have no choice but to contact the provider and hope they can somehow enable chunked upload?

@iksnagreb
Contributor

iksnagreb commented Oct 28, 2020

Is there any way to perform non-chunked upload in DVC?

Hm, I do not think this is possible right now, at least for the WebDAV remote. It should be possible to implement an option to enable non-chunked upload; the problem I see is that this would also disable the progress bar (without chunking, we cannot count progress), which is not obvious and might confuse users. @efiop Are there options for disabling chunking for other remotes, and if so, how do they handle that problem?

Or do I have no choice but to contact the provider and hope they can somehow enable chunked upload?

I think selecting chunked/non-chunked upload could be a configuration option (if we can find a way to handle this conveniently); there are probably other cloud providers disallowing chunked upload as well...

@skshetry
Member

@LucaButera, did you try @iksnagreb's suggestion? If that works, we could provide a config for disabling it.

If that didn't work, I am afraid there's no easier solution than contacting the provider. Nextcloud/Owncloud does support a non-standard WebDAV extension for chunked upload for these kinds of situations, but it's unlikely we are going to support it.

@LucaButera
Contributor Author

@iksnagreb actually, it could be intuitive to have an option on dvc push for non-chunked upload, for example by setting jobs to 0.

@skshetry I am trying it; I just downloaded the dvc source and I'm figuring it out. Will report back soon.

@LucaButera
Contributor Author

@skshetry I can confirm that @iksnagreb's suggestion works: I have been able to push and pull from the WebDAV storage. Moreover, the progress bar still works, but it updates less frequently, probably once per file upload.

What should I do next?

@iksnagreb
Contributor

Then let's think about implementing something like dvc remote modify <remote> chunked_upload false (I think true should be the default). Maybe chunked_transfer or just chunked would be a better name, as this might apply to download as well?

@LucaButera
Contributor Author

@iksnagreb I think chunked is a good choice. Otherwise, as I suggested previously, it could be an idea to have non-chunked behavior with --jobs 0, as I imagine having multiple jobs only works when using chunks. But I might be wrong.

@iksnagreb
Contributor

iksnagreb commented Oct 29, 2020

Hm, I think jobs just controls how many upload processes to start in parallel; each of these could then be a chunked or non-chunked transfer. You might be right that more jobs make more sense with chunking (as it allows transmitting and reading from disk in parallel), so there is probably not much performance benefit from a single chunked job. But I do not know much about the jobs mechanism (@efiop?).

However, I think of chunking more as a choice at the communication/transmission level between server and client (where the client needs to match what the server can understand). Furthermore, chunking allowed implementing the progress bar per file; IIRC that was the reason to use chunked upload in the first place.

@LucaButera
Contributor Author

@iksnagreb then I think having something like chunked false, with true as the default, should be a nice solution.

It could also be overridden by a command option on dvc push and dvc pull.
This would allow the user to change the option on the fly in a non-permanent way.
I don't know if it has use cases, but it should be easy to implement.

@efiop
Contributor

efiop commented Oct 29, 2020

Seems like adding a config option for it would greatly worsen the UI. Not having a progress bar is a very serious thing. I also don't like the idea of introducing a CLI option, because that seems out of place. Plus, it potentially breaks future scenarios in which dvc would push automatically.

I'm genuinely surprised that this problem even exists; hope we are not simply missing some info here.

If I understand the situation correctly, introducing that option in any way will also result in people running into timeout errors for big files. This is unacceptable for dvc: we store files without chunking them (at least for now; there are some plans in #829), so webdav uploads would break for big files (files might be gigabytes or much bigger), which is our core use case. This is a dealbreaker.

As pointed out by @skshetry, this is likely a provider problem, so I would look for a solution there. I didn't look deeply into https://docs.nextcloud.com/server/15/developer_manual/client_apis/WebDAV/chunking.html, but that seems like a feature request for our webdav library and not for dvc, right? Or am I missing something?

@iksnagreb
Contributor

[...] it will also result in people running into timeout errors for big files [...]

Uff, yes, did not even think about this yet... You probably do not want to adjust the timeout config depending on your expected file size, so chunked transmission is the only solution to avoid timeouts per request.

@LucaButera
Contributor Author

@efiop I think you are right about the large-files issue. Tell me if I got this straight: the problem here is not chunking being enabled or not, but rather the fact that chunking is handled in a peculiar way by this provider's webdav. Is this correct?

Mind that this platform is based on ownCloud and not Nextcloud; I don't know if that is relevant.

@skshetry
Member

skshetry commented Oct 30, 2020

I'm also facing a similar but slightly different issue with "Nextcloud + mod_fcgi" (which is a bug in httpd2), in which files are uploaded empty.

The original issue might be due to that bug (not fixed yet), or to this bug, which was only fixed 2 years ago (OP's server is Apache 2.4.18, whereas the most recent one is 2.4.46).

Sabredav's wiki has a good insight into these bugs:

Finder (On OS X) uses Transfer-Encoding: Chunked in PUT request bodies. This is a little-used HTTP feature, and therefore not implemented in a bunch of web servers. The only server I've seen so far that handles this reasonably well is Apache + mod_php. Nginx and Lighttpd respond with 411 Length Required, which is completely ignored by Finder. This was seen on Nginx 0.7.63. It was recently reported that a development release (1.3.8) no longer had this issue.

When using this with Apache + FastCGI PHP completely drops the request body, so it will seem as if the PUT request was successful, but the file will end up empty.

So, the best thing to do is either drop "chunked" requests on PUT or introduce a config to disable them.

Not having a progress bar is a very serious thing

@efiop, as webdavclient3 uses streaming upload, we can still support progress bars:

with open(file, "rb") as fd:
    with Tqdm.wrapattr(fd, "read", ...) as wrapped:
        self._client.upload_to(buff=wrapped, remote_path=to_info.path)

Look here for the change: the chunks() generator inside _upload in dvc/tree/webdav.py.
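
Fleshed out a bit, the non-chunked _upload could look roughly like this (a sketch only: it assumes Tqdm.wrapattr forwards reads to the underlying file so the HTTP layer can still derive Content-Length, and that the method signature matches the DVC 1.x tree code):

import os
from dvc.progress import Tqdm

def _upload(self, from_file, to_info, name=None, no_progress_bar=False):
    total = os.path.getsize(from_file)
    with open(from_file, "rb") as fd:
        with Tqdm.wrapattr(
            fd, "read", desc=name, total=total, disable=no_progress_bar
        ) as wrapped:
            # A seekable file-like body lets the HTTP client send a
            # Content-Length header instead of chunked Transfer-Encoding.
            self._client.upload_to(buff=wrapped, remote_path=to_info.path)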

but that seems like a feature request for our WebDAV library and not for DVC, right? Or am I missing something?

The Owncloud Chunking (NG) might be too slow for our use case, as it needs a separate request for each chunk (and then a "MOVE" request that joins all the chunks, which is again expensive). So, unless we change our upload strategy to parallelize chunk upload rather than file upload, we would make it 3-4x slower, just for the sake of having a progress bar.
And it seems it's possible to have a progress bar without it.
Not to mention, it's not a WebDAV standard, so it's unsupported outside of Nextcloud and Owncloud.

it will also result in people running into timeout errors

I don't think there is any way around timeout errors, especially if we talk about PHP-based WebDAV servers (they have a set max_execution_time). The Owncloud Chunking NG exists for this very reason.

Though, we could just chunk on upload and then reassemble during pull (roughly as sketched below). I think this is what rclone chunker does.
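
A back-of-the-envelope sketch of that idea (purely illustrative; the names and chunk size are made up, and this is not how DVC or rclone actually implement it):

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB, arbitrary

def split(path):
    """Yield (suffix, data) pairs small enough to upload one per request."""
    with open(path, "rb") as fd:
        index = 0
        while chunk := fd.read(CHUNK_SIZE):
            yield f".part{index:04d}", chunk
            index += 1

def join(part_paths, out_path):
    """On pull, reassemble the downloaded parts in order."""
    with open(out_path, "wb") as out:
        for part in sorted(part_paths):
            with open(part, "rb") as fd:
                out.write(fd.read())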

To close this issue, we could just disable chunked upload via a config or by default.

@LucaButera
Contributor Author

@skshetry it would be wonderful to have a simple solution like that.

On the other hand, a more robust solution like the "reassemble on pull" approach also seems like a nice feature in the long run.

I have never contributed to open-source projects, but I am willing to help if needed, as I think DVC is really a much-needed tool.

@skshetry
Member

skshetry commented Nov 1, 2020

I am willing to help if needed

@LucaButera, that'd be great. See if the above snippet works. Also, make sure you test a few scenarios manually (we lack tests for webdav, though those will be added soon).

If you face any issues, please comment here or ask on #dev-talk on the Discord. Thanks.

@LucaButera
Contributor Author

@skshetry Ok, I'll test a few scenarios, namely:

  1. Loading many small files.
  2. Loading a really large file.
  3. Loading a realistic folder.

Just a question: do you need me to simply test a few cases with the snippet above, or do I need to open a PR implementing the snippet and the related config needed to use it?

@skshetry
Member

skshetry commented Nov 2, 2020

@LucaButera, it'd be great if you could make a PR. Thanks. Check the contributing guide for setup.

the related config needed to use it?

Maybe there's no need for the config, but we can decide that in the PR discussion.

LucaButera pushed a commit to LucaButera/dvc that referenced this issue Nov 2, 2020
@gstrauss

gstrauss commented Nov 5, 2020

@LucaButera @skshetry FYI: lighttpd has supported PUT with Transfer-Encoding: chunked since lighttpd 1.4.44, released almost 4 years ago. lighttpd 1.4.54, released a year and a half ago, has major performance enhancements for lighttpd mod_webdav and large files.

What version of lighttpd are you having trouble with?

@skshetry
Member

skshetry commented Nov 5, 2020

@gstrauss, thanks for participating and for the info. I was quoting from Sabredav's wiki, which is more than 6 years old, so it might not be up to date. And we were not using lighttpd; @LucaButera's server is Apache 2.4.18, which is ~5 years old, whereas mine is Apache 2.4.46.

But we'll keep bumping into old web servers, so we have to err on the side of caution and just remove chunked upload (are there any disadvantages or a performance hit to that?)

@gstrauss

gstrauss commented Nov 5, 2020

just remove chunked upload (are there any disadvantages or a performance hit to that?)

If you already know the content length on the client side, then there should be no performance hit.

If the upload is generated content, then the content would have to first be cached locally on the client to be able to determine the content length when Transfer-Encoding: chunked is not being used. There can be a performance hit and additional local resource usage to do so.
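
To make that concrete: a client uploading generated content without chunking would first have to buffer it to determine its length, along these lines (an illustrative sketch, not taken from any particular client):

import tempfile

def buffered_body(gen):
    """Spool a generator to disk so Content-Length can be sent up front."""
    tmp = tempfile.SpooledTemporaryFile(max_size=1 << 20)  # spill to disk past 1 MiB
    for chunk in gen:
        tmp.write(chunk)
    size = tmp.tell()
    tmp.seek(0)
    return tmp, size  # hand both to the HTTP client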

@LucaButera
Contributor Author

@gstrauss @skshetry So are you suggesting completely removing the option for chunked upload? Doesn't that pose an issue with the upload of large files?

@skshetry
Member

skshetry commented Nov 6, 2020

@LucaButera, we stream-upload the file, so it does not affect memory usage. There should not be any issues with this approach that were not already there.

LucaButera pushed a commit to LucaButera/dvc that referenced this issue Nov 6, 2020
LucaButera pushed a commit to LucaButera/dvc that referenced this issue Nov 6, 2020
LucaButera pushed a commit to LucaButera/dvc that referenced this issue Nov 9, 2020
I159 added a commit to I159/dvc that referenced this issue Nov 12, 2020
…limit

* 'master' of github.com:iterative/dvc:
  dag: add --outs option (iterative#4739)
  Add test server and tests for webdav (iterative#4827)
  Simpler param updates with python-benedict (iterative#4780)
  checkpoints: set DVC_ROOT environment variable (iterative#4877)
  api: add support for simple wildcards (iterative#4864)
  tests: mark azure test as flaky (iterative#4881)
  setup.py: limit responses version for moto (iterative#4879)
  remote: avoid chunking on webdav. Fixes iterative#4796 (iterative#4828)
  checkpoints: `exp run` and `exp res[ume]` refactor (iterative#4855)
@jdonzallaz

Hi @LucaButera and @skshetry, sorry to intervene on this closed issue, but I don't understand how it was solved.

I have exactly the same problem: I try to push to the drive.switch.ch server and get the same error (411 Length Required).

How did you configure your remote?

@skshetry
Member

skshetry commented Feb 16, 2021

@jdonzallaz, could you please share the dvc version output and the full traceback of the error (try adding -v on the command that you are using)?

This change should not require any configuration on the user's side.

@LucaButera
Contributor Author

@jdonzallaz as @skshetry said, after the fix I didn't need any particular configuration. Note that I configured the username and password in the config file rather than being prompted for them.

@jdonzallaz

Thanks for the fast replies.

DVC version gives:

DVC version: 1.11.16 (pip)
---------------------------------
Platform: Python 3.8.7 on Windows-10-10.0.19041-SP0
Supports: http, https, webdav, webdavs
Cache types: hardlink, symlink
Caches: local
Remotes: https
Repo: dvc, git

And the dvc push -v:

dvc push -v
2021-02-16 12:13:17,841 DEBUG: Check for update is enabled.
2021-02-16 12:13:18,012 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2021-02-16 12:13:18,072 DEBUG: Spawned '['daemon', '-q', 'updater']'
2021-02-16 12:13:18,084 DEBUG: fetched: [(3,)]
2021-02-16 12:13:18,093 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' is unchanged since it is read-only
2021-02-16 12:13:18,096 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' is unchanged since it is read-only
2021-02-16 12:13:18,112 DEBUG: Preparing to upload data to 'https://<username>@drive.switch.ch/remote.php/dav/files/<username>/'
2021-02-16 12:13:18,114 DEBUG: Preparing to collect status from https://<username>@drive.switch.ch/remote.php/dav/files/<username>/
2021-02-16 12:13:18,116 DEBUG: Collecting information from local cache...
2021-02-16 12:13:18,117 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\6e\75a7d28ffe827ca5c2763afd79d14e' is unchanged since it is read-only
2021-02-16 12:13:18,119 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' is unchanged since it is read-only
2021-02-16 12:13:18,121 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\3d\8f7869e657486809713fea3cdc23f1' is unchanged since it is read-only
2021-02-16 12:13:18,122 DEBUG: Collecting information from remote cache...
2021-02-16 12:13:18,123 DEBUG: Querying 1 hashes via object_exists
Enter a password for host 'drive.switch.ch' user '<username>':
  0% Querying remote cache|                                                              |0/1 [00:00<?,     ?file/s] 2021-02-16 12:14:56,953 DEBUG: Matched '0' indexed hashes
2021-02-16 12:14:56,954 DEBUG: Querying 3 hashes via object_exists
2021-02-16 12:14:58,964 DEBUG: Uploading 'C:\code\testdvc\.dvc\cache\6e\75a7d28ffe827ca5c2763afd79d14e' to 'https://<username>@drive.switch.ch/remote.php/dav/files/<username>/6e/75a7d28ffe827ca5c2763afd79d14e'
2021-02-16 12:14:58,964 DEBUG: Uploading 'C:\code\testdvc\.dvc\cache\3d\8f7869e657486809713fea3cdc23f1' to 'https://<username>@drive.switch.ch/remote.php/dav/files/<username>/3d/8f7869e657486809713fea3cdc23f1'
2021-02-16 12:14:59,072 ERROR: failed to upload 'C:\code\testdvc\.dvc\cache\3d\8f7869e657486809713fea3cdc23f1' to 'https://<username>@drive.switch.ch/remote.php/dav/files/<username>/3d/8f7869e657486809713fea3cdc23f1' - '411 Length Required'
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python38\lib\site-packages\dvc\remote\base.py", line 35, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "c:\python38\lib\site-packages\dvc\tree\base.py", line 377, in upload
    self._upload(  # noqa, pylint: disable=no-member
  File "c:\python38\lib\site-packages\dvc\tree\http.py", line 207, in _upload
    raise HTTPError(response.status_code, response.reason)
dvc.exceptions.HTTPError: '411 Length Required'
------------------------------------------------------------
2021-02-16 12:14:59,079 DEBUG: failed to upload full contents of 'data', aborting .dir file upload
2021-02-16 12:14:59,081 ERROR: failed to upload 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' to 'https://<username>@drive.switch.ch/remote.php/dav/files/<username>/4c/832f3bfe6edb1b0e7947d657185a97.dir'
2021-02-16 12:15:04,860 ERROR: failed to upload 'C:\code\testdvc\.dvc\cache\6e\75a7d28ffe827ca5c2763afd79d14e' to 'https://<username>@drive.switch.ch/remote.php/dav/files/<username>/6e/75a7d28ffe827ca5c2763afd79d14e' - '411 Length Required'
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python38\lib\site-packages\dvc\remote\base.py", line 35, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "c:\python38\lib\site-packages\dvc\tree\base.py", line 377, in upload
    self._upload(  # noqa, pylint: disable=no-member
  File "c:\python38\lib\site-packages\dvc\tree\http.py", line 207, in _upload
    raise HTTPError(response.status_code, response.reason)
dvc.exceptions.HTTPError: '411 Length Required'
------------------------------------------------------------
2021-02-16 12:15:04,868 DEBUG: fetched: [(6,)]
2021-02-16 12:15:04,879 ERROR: failed to push data to the cloud - 3 files failed to upload
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python38\lib\site-packages\dvc\command\data_sync.py", line 50, in run
    processed_files_count = self.repo.push(
  File "c:\python38\lib\site-packages\dvc\repo\__init__.py", line 54, in wrapper
    return f(repo, *args, **kwargs)
  File "c:\python38\lib\site-packages\dvc\repo\push.py", line 35, in push
    return len(used_run_cache) + self.cloud.push(used, jobs, remote=remote)
  File "c:\python38\lib\site-packages\dvc\data_cloud.py", line 65, in push
    return remote.push(
  File "c:\python38\lib\site-packages\dvc\remote\base.py", line 56, in wrapper
    return f(obj, *args, **kwargs)
  File "c:\python38\lib\site-packages\dvc\remote\base.py", line 432, in push
    return self._process(
  File "c:\python38\lib\site-packages\dvc\remote\base.py", line 401, in _process
    raise UploadError(fails)
dvc.exceptions.UploadError: 3 files failed to upload
------------------------------------------------------------
2021-02-16 12:15:04,893 DEBUG: Analytics is enabled.
2021-02-16 12:15:04,897 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', 'C:\\Users\\<user>\\AppData\\Local\\Temp\\tmpwxuf_fuq']'
2021-02-16 12:15:04,961 DEBUG: Spawned '['daemon', '-q', 'analytics', 'C:\\Users\\<user>\\AppData\\Local\\Temp\\tmpwxuf_fuq']'

I tried with the password both configured and prompted.
Maybe the problem is in the URL? (https://drive.switch.ch/remote.php/dav/files/<username>/)
Also note that my username is my email address, so it contains an "@". I'm unable to change it, though.

@skshetry
Member

skshetry commented Feb 16, 2021

@jdonzallaz, you'd need to add the URL as:

webdavs://<username>@drive.switch.ch/remote.php/dav/files/<username>
# or, set `user` to get rid of `<username>@` in the front, though you still need the one in the end.

Look for webdav in the docs here: https://dvc.org/doc/command-reference/remote/add#supported-storage-types

@jdonzallaz

Ok, that fixed the problem, thank you.

Now when I push, I randomly get the following error: <s:exception>Sabre\\DAV\\Exception\\Forbidden</s:exception>

Forbidden
2021-02-16 12:55:03,612 DEBUG: Check for update is enabled.
2021-02-16 12:55:03,786 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2021-02-16 12:55:03,844 DEBUG: Spawned '['daemon', '-q', 'updater']'
2021-02-16 12:55:03,856 DEBUG: fetched: [(3,)]
2021-02-16 12:55:03,862 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' is unchanged since it is read-only
2021-02-16 12:55:03,863 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' is unchanged since it is read-only
2021-02-16 12:55:03,907 DEBUG: Preparing to upload data to 'https://drive.switch.ch/remote.php/dav/files/[email protected]/jobroom/'
2021-02-16 12:55:03,909 DEBUG: Preparing to collect status from https://drive.switch.ch/remote.php/dav/files/[email protected]/jobroom/
2021-02-16 12:55:03,910 DEBUG: Collecting information from local cache...
2021-02-16 12:55:03,912 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\3d\8f7869e657486809713fea3cdc23f1' is unchanged since it is read-only
2021-02-16 12:55:03,914 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\6e\75a7d28ffe827ca5c2763afd79d14e' is unchanged since it is read-only
2021-02-16 12:55:03,918 DEBUG: Assuming 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' is unchanged since it is read-only
2021-02-16 12:55:03,919 DEBUG: Collecting information from remote cache...
2021-02-16 12:55:03,920 DEBUG: Querying 1 hashes via object_exists
2021-02-16 12:55:06,117 DEBUG: Matched '0' indexed hashes
2021-02-16 12:55:06,493 DEBUG: Estimated remote size: 256 files
2021-02-16 12:55:06,494 DEBUG: Querying '3' hashes via traverse
2021-02-16 12:55:12,098 DEBUG: Uploading 'C:\code\testdvc\.dvc\cache\3d\8f7869e657486809713fea3cdc23f1' to 'https://drive.switch.ch/remote.php/dav/files/[email protected]/jobroom/3d/8f7869e657486809713fea3cdc23f1'
2021-02-16 12:55:16,411 DEBUG: Uploading 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' to 'https://drive.switch.ch/remote.php/dav/files/[email protected]/jobroom/4c/832f3bfe6edb1b0e7947d657185a97.dir'

2021-02-16 12:55:18,986 ERROR: failed to upload 'C:\code\testdvc\.dvc\cache\4c\832f3bfe6edb1b0e7947d657185a97.dir' to 'https://drive.switch.ch/remote.php/dav/files/[email protected]/jobroom/4c/832f3bfe6edb1b0e7947d657185a97.dir' - Request to https://drive.switch.ch/remote.php/dav/files/jonathan.donzallaz%40edu.hefr.ch/jobroom/4c/832f3bfe6edb1b0e7947d657185a97.dir failed with code 403 and message: b'<?xml version="1.0" encoding="utf-8"?>\n<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">\n  <s:exception>Sabre\\DAV\\Exception\\Forbidden</s:exception>\n  <s:message/>\n</d:error>\n'
------------------------------------------------------------
Traceback (most recent call last):
File "c:\python38\lib\site-packages\dvc\remote\base.py", line 35, in wrapper
  func(from_info, to_info, *args, **kwargs)
File "c:\python38\lib\site-packages\dvc\tree\base.py", line 377, in upload
  self._upload(  # noqa, pylint: disable=no-member
File "c:\python38\lib\site-packages\dvc\tree\webdav.py", line 234, in _upload
  self._client.upload_to(
File "c:\python38\lib\site-packages\webdav3\client.py", line 66, in _wrapper
  res = fn(self, *args, **kw)
File "c:\python38\lib\site-packages\webdav3\client.py", line 438, in upload_to
  self.execute_request(action='upload', path=urn.quote(), data=buff)
File "c:\python38\lib\site-packages\webdav3\client.py", line 226, in execute_request
  raise ResponseErrorCode(url=self.get_url(path), code=response.status_code, message=response.content)
webdav3.exceptions.ResponseErrorCode: Request to https://drive.switch.ch/remote.php/dav/files/jonathan.donzallaz%40edu.hefr.ch/jobroom/4c/832f3bfe6edb1b0e7947d657185a97.dir failed with code 403 and message: b'<?xml version="1.0" encoding="utf-8"?>\n<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">\n  <s:exception>Sabre\\DAV\\Exception\\Forbidden</s:exception>\n  <s:message/>\n</d:error>\n'
------------------------------------------------------------
2021-02-16 12:55:18,995 DEBUG: fetched: [(9,)]
2021-02-16 12:55:19,005 ERROR: failed to push data to the cloud - 1 files failed to upload
------------------------------------------------------------
Traceback (most recent call last):
File "c:\python38\lib\site-packages\dvc\command\data_sync.py", line 50, in run
  processed_files_count = self.repo.push(
File "c:\python38\lib\site-packages\dvc\repo\__init__.py", line 54, in wrapper
  return f(repo, *args, **kwargs)
File "c:\python38\lib\site-packages\dvc\repo\push.py", line 35, in push
  return len(used_run_cache) + self.cloud.push(used, jobs, remote=remote)
File "c:\python38\lib\site-packages\dvc\data_cloud.py", line 65, in push
  return remote.push(
File "c:\python38\lib\site-packages\dvc\remote\base.py", line 56, in wrapper
  return f(obj, *args, **kwargs)
File "c:\python38\lib\site-packages\dvc\remote\base.py", line 432, in push
  return self._process(
File "c:\python38\lib\site-packages\dvc\remote\base.py", line 401, in _process
  raise UploadError(fails)
dvc.exceptions.UploadError: 1 files failed to upload
------------------------------------------------------------
2021-02-16 12:55:19,015 DEBUG: Analytics is enabled.
2021-02-16 12:55:19,020 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', 'C:\\Users\\JONATH~1.DON\\AppData\\Local\\Temp\\tmphvkdy0qe']'
2021-02-16 12:55:19,086 DEBUG: Spawned '['daemon', '-q', 'analytics', 'C:\\Users\\JONATH~1.DON\\AppData\\Local\\Temp\\tmphvkdy0qe']'

Apparently, it is related to whether the folder is open in my browser or not. Weird.
When the push succeeds, it sometimes says 2 files pushed or 3 files pushed. And when I pull, it says 1 file modified.
I added the whole data folder to dvc, which contains 2 files.

But the files seem to be correctly uploaded/pushed and downloaded/pulled.

@LucaButera
Contributor Author

@jdonzallaz regarding the Forbidden issue: it happens to me as well. If you push again, eventually all the files get pushed without errors. I don't really know why that happens.

@skshetry
Member

@LucaButera @jdonzallaz, it's hard to say from the client side what's wrong, as there's no error message there (server logs would have helped here).

Try using --jobs 1 and see if it fixes the issue. If it does not, the server might be cleaning things up (if you deleted some files shortly before trying to push again). So, please try pushing again after a certain interval.

If it still comes up randomly, maybe the firewall/antivirus is to blame on the server side (but there could be many more reasons).

@jdonzallaz

Indeed, it worked when I retried.
Thanks for your help.
