Pushing artifacts via WebDAV results in a 411 Length Required response #4796
Hi @LucaButera Please show the full log for …
This is the full output, minus the user field, which is changed to … for privacy reasons.
Any info about the server? At first glance it seems like the server is not understanding the chunked upload. Might be missing something though. CC @iksnagreb
The server is a Switch Drive, which is a cloud storage provider based on ownCloud. I would assume the WebDAV server is the same as ownCloud's, but I don't have further info.
@LucaButera Thanks! So it might be a bug in https://github.com/ezhov-evgeny/webdav-client-python-3 ; need to take a closer look.
@efiop After browsing the source code it seems plausible to me. Mind that I am able to connect to the server through macOS Finder, so it doesn't seem to be a server issue. Sadly the … One solution might be to emulate the … I am willing to help: I might download both the DVC and webdav-client source code and try out these modifications myself, just to report whether adding the header fixes the issue. I just don't know how to trigger the …
@efiop @LucaButera Can we try to figure out whether it is really (only) the chunked upload and not something else? @LucaButera If you have a copy of the dvc repository and some time to try something: it should be quite easy to change the … (https://github.com/iterative/dvc/blob/master/dvc/tree/webdav.py#L243). You would have to change the last line … If this modification lets you upload files, we can be pretty sure it is the chunking or a bug in the webdavclient. I assume you have no valid dvc cache at the remote yet (as uploading does not work at all), so you cannot check whether downloading is working? Before trying to upload the file, the parent directories should be created, e.g. …
@efiop @iksnagreb I will try to modify the source in the afternoon and report back to you. Concerning the creation of the base folders: yes, they get created, so the connection to the server should be working.
@LucaButera, to see if the chunked upload is the issue, you could also try sending a curl request with chunked upload:

```
curl --upload-file test.txt https://<user>@drive.switch.ch/remote.php/dav/files/<user>/test.txt -vv --http1.1 --header "Transfer-Encoding: chunked"
```

Also check without that header. If the files are uploaded successfully in both cases, something's wrong with the library. If only the chunked request fails, chunked upload might have been forbidden on the server entirely.
@skshetry I tried your suggestion, which seemed quicker. Actually, without the header it correctly uploads the file, while with chunked upload it returns 411, as with dvc. @iksnagreb @efiop Is there any way to perform a non-chunked upload in DVC? Or do I have no choice but to contact the provider and hope they can somehow enable chunked upload?
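Since webdavclient3 streams uploads over HTTP, the failure mode above can be reproduced locally without any WebDAV server. The sketch below (all names are ours, for illustration) runs a toy HTTP server that, like the server in this issue, answers 411 whenever a PUT arrives without a `Content-Length` header, and shows the same bytes succeeding once the length is declared:

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class StrictHandler(BaseHTTPRequestHandler):
    """Toy server that, like the one in this issue, insists on a
    Content-Length header for PUT requests."""

    def _drain_chunked(self):
        # Minimal chunked-body reader (no trailers), only so the socket
        # closes cleanly in this single-threaded demo.
        while True:
            size = int(self.rfile.readline().split(b";")[0], 16)
            if size == 0:
                self.rfile.readline()  # final CRLF
                return
            self.rfile.read(size)
            self.rfile.readline()  # CRLF after each chunk

    def do_PUT(self):
        length = self.headers.get("Content-Length")
        if length is None:
            # A real server answers before reading the body; we drain it
            # first just to keep the connection in a clean state.
            self._drain_chunked()
            self.send_error(411, "Length Required")
            return
        self.rfile.read(int(length))  # drain the declared body
        self.send_response(201)
        self.end_headers()

    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), StrictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def put(body, *, chunked):
    conn = HTTPConnection("127.0.0.1", server.server_port)
    if chunked:
        # An iterable body with no known length makes http.client send
        # Transfer-Encoding: chunked, like a streaming upload does.
        conn.request("PUT", "/file", iter([body]), encode_chunked=True)
    else:
        conn.request("PUT", "/file", body)  # Content-Length set for bytes
    status = conn.getresponse().status
    conn.close()
    return status

chunked_status = put(b"hello", chunked=True)
plain_status = put(b"hello", chunked=False)
server.shutdown()
print(chunked_status, plain_status)  # 411 201
```

This mirrors the curl experiment: the only difference between the failing and succeeding requests is whether the body length is declared up front.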
Hm, I do not think this is possible right now, at least for the WebDAV remote. It should be possible to implement an option to enable non-chunked upload; the problem I see is that this would also disable the progress bar (without chunking, we cannot count progress), which is not obvious and might confuse users. @efiop Are there options for disabling chunking for other remotes, and if so, how do they handle that problem?
I think selecting chunked/non-chunked upload could be a configuration option (if we can find a way to handle this conveniently); there are probably other cloud providers disallowing chunked upload as well.
@LucaButera, did you try @iksnagreb's suggestion? If that works, we could provide a config for disabling it. If that didn't work, I am afraid there's no other easy solution than to contact the provider. Nextcloud/ownCloud does support a non-standard WebDAV extension for chunked upload for these kinds of situations, but it's unlikely we are going to support it.
@iksnagreb actually it could be intuitive to have an option on … @skshetry I am trying it; I just downloaded the dvc source and I'm trying to figure it out. Will report back soon.
@skshetry I can confirm that @iksnagreb's suggestion works; I have been able to push and pull from the WebDAV storage. Moreover, I must say that the progress bar works, but it updates less frequently, probably once per file upload. What should I do next?
Then let's think about implementing something like …
@iksnagreb I think …
Hm, I think jobs just control how many upload processes to start in parallel; each of these could then be a chunked or non-chunked transfer. You might be right that more jobs make sense with chunking (as it allows for more parallel transmitting and reading from disk), so there is probably not much (performance) benefit from a single chunked job. But I do not know much about the jobs thing (@efiop?). However, I think of the chunking more as an option at the communication/transmission level between server and client (where the client needs to match what the server can understand). Furthermore, chunking allowed us to implement the progress bar per file; iirc that was the reason to use chunked upload in the first place.
@iksnagreb then I think having something like … It could also be overridden by a command option on …
Seems like adding a config option for it would greatly worsen the UI. Not having a progress bar is a very serious thing. I also don't like the idea of introducing a CLI option, because that seems out of place. Plus it potentially breaks future scenarios in which dvc would push automatically. I'm genuinely surprised that this problem even exists; hope we are not simply missing some info here.

If I understand the situation correctly, if we introduce that option in any way, it will also result in people running into timeout errors for big files. This is unacceptable for dvc, as we are storing files without chunking them (at least for now, there are some plans #829), and so webdav uploads will break for big files (files might be gigabytes and much bigger), which is our core use case. This is a dealbreaker.

As pointed out by @skshetry, this is likely a provider problem, so I would look for a solution there. I didn't look deeply into https://docs.nextcloud.com/server/15/developer_manual/client_apis/WebDAV/chunking.html , but that seems like a feature request for our webdav library and not for dvc, right? Or am I missing something?
Uff, yes, did not even think about this yet... You probably do not want to adjust the timeout config depending on your expected file size, so chunked transmission is the only solution to avoid per-request timeouts.
@efiop I think you are right about the large-files issue. Tell me if I got this straight: the problem here is not chunking being enabled or not, but rather the fact that chunking is implemented in a peculiar way in this provider's WebDAV. Is this correct? Mind that this platform is based on ownCloud, not Nextcloud. Don't know if that is relevant.
I'm also facing a similar but slightly different issue with "Nextcloud + mod_fcgi" (which is a bug in httpd2), in which files are uploaded empty. The original issue might be due to that bug (not fixed yet) or this bug, which was only fixed 2 years ago (OP's server is …). Sabredav's wiki has a good insight into these bugs: …
So, the best thing to do is to either drop "chunked" requests on PUT or introduce a config to disable it.
@efiop, as the webdavclient3 uses streaming upload, we can still support progress bars:

```python
with open(file, "rb") as fd:
    with Tqdm.wrapattr(fd, "read", ...) as wrapped:
        self._client.upload_to(buff=wrapped, remote_path=to_info.path)
```

Look here for the change: `dvc/tree/webdav.py`, line 224 in f827d64.
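The pattern in the snippet above, progress driven by intercepting `read()` while the total size stays known up front, can be sketched with stdlib pieces only. The `ReadProgress` class below is ours, standing in for dvc's `Tqdm.wrapattr`, and the read loop stands in for webdavclient3's streaming upload:

```python
import io

class ReadProgress:
    """Minimal stand-in for Tqdm.wrapattr: counts bytes as the consumer
    reads them, so any streaming uploader reports progress for free."""

    def __init__(self, fobj, total):
        self.fobj = fobj
        self.total = total
        self.done = 0

    def read(self, size=-1):
        chunk = self.fobj.read(size)
        self.done += len(chunk)
        # A real progress bar would redraw here; we just expose .done.
        return chunk

data = b"x" * 100_000
wrapped = ReadProgress(io.BytesIO(data), total=len(data))

# Stand-in for upload_to(buff=wrapped, ...): the uploader only needs
# .read(), and the total size was known before the request started, so
# it can carry Content-Length instead of Transfer-Encoding: chunked.
uploaded = bytearray()
while True:
    chunk = wrapped.read(64 * 1024)
    if not chunk:
        break
    uploaded.extend(chunk)

print(wrapped.done == len(data), bytes(uploaded) == data)  # True True
```

The key point is that progress reporting and chunked transfer framing are independent: wrapping `read()` gives the former without requiring the latter.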
The ownCloud Chunking (NG) might be too slow for our use case, as it needs a separate request for each chunk (and then a "MOVE" that joins all the chunks, which is again expensive). So, unless we change our upload strategy to parallelize chunk upload rather than file upload, we will make it 3-4x slower, just for the sake of having a progress bar.
I don't think there is any way around timeout errors, especially if we talk about PHP-based WebDAV servers (they have a set …). Though, we could just chunk on upload and then assemble during … For closing this issue, we could just disable chunked upload via a config or by default.
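The "chunk on upload, assemble later" idea can be sketched in isolation; the chunk size and helper names below are invented for illustration and are not DVC's actual scheme:

```python
CHUNK_SIZE = 4  # tiny for the demo; a real remote would use megabytes

def split(blob: bytes, size: int = CHUNK_SIZE):
    """Push side: cut the file into parts, each small enough that a
    per-request server limit (e.g. PHP's execution time) is never hit."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]

def assemble(parts) -> bytes:
    """Pull side: rejoin the parts in order."""
    return b"".join(parts)

blob = b"0123456789abcdef!"
parts = split(blob)
print(len(parts))               # 5 (four full chunks and a 1-byte tail)
print(assemble(parts) == blob)  # True
```

Each part would become its own short PUT, which is what keeps individual requests under a server-side timeout; the cost is the extra bookkeeping on pull.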
@skshetry it would be wonderful to have a simple solution like that. On the other hand, a more reliable solution like the "assemble on pull" one also seems a nice feature in the long run. I have never contributed to open source projects, but I am willing to help if needed, as I think DVC is really a much-needed tool.
@LucaButera, that'd be great. See if the snippets above work. Also, make sure you test a few scenarios manually (we lack tests for webdav, though those will be added soon). If you face any issues, please comment here or ask in #dev-talk on the Discord. Thanks.
@skshetry Ok, I'll test a few scenarios, namely: …
Just a question, do you need me to simply test a few cases with the snippet above or do I need to open a PR implementing the snippet and the relative config needed to use it? |
@LucaButera, it'd be great if you could make a PR. Thanks. Check the contributing guide for setup.
Maybe there's no need for the config, but we can decide that in the PR discussion.
@LucaButera @skshetry FYI: lighttpd supports PUT with … What version of lighttpd are you having trouble with?
@gstrauss, thanks for participating and for the info. I was quoting from Sabredav's wiki, which is more than 6 years old, so it might not be up to date. And we were not using … But we'll bump into old web servers, so we have to err on the side of caution and just remove chunked upload (are there any disadvantages or performance hits to that?).
If you already know the content length on the client side, then there should be no performance hit. If the upload is generated content, then the content would have to first be cached locally on the client to be able to determine the content length when …
@LucaButera, we stream-upload the file, so it does not affect the memory usage. There should not be any issues that were not already there with this approach. |
…limit

* 'master' of github.com:iterative/dvc:
  dag: add --outs option (iterative#4739)
  Add test server and tests for webdav (iterative#4827)
  Simpler param updates with python-benedict (iterative#4780)
  checkpoints: set DVC_ROOT environment variable (iterative#4877)
  api: add support for simple wildcards (iterative#4864)
  tests: mark azure test as flaky (iterative#4881)
  setup.py: limit responses version for moto (iterative#4879)
  remote: avoid chunking on webdav. Fixes iterative#4796 (iterative#4828)
  checkpoints: `exp run` and `exp res[ume]` refactor (iterative#4855)
Hi @LucaButera and @skshetry, sorry to intervene on this closed issue, but I don't understand how it was solved. I have exactly the same problem, I try to push to the drive.switch.ch server and get the same error (411 Length Required). How did you configure your remote? |
@jdonzallaz, could you please share the … This change should not require any configuration on the user's side.
@jdonzallaz as @skshetry said, after the fix I didn't need any particular configuration. Note that I configured the username and password from the config file rather than being prompted for them.
Thanks for the fast replies. `dvc version` gives: …
And the `dvc push -v`: …
I tried with both the password configured and prompted.
@jdonzallaz, you'd need to add the url as: …
Look for the …
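For reference, assuming a Nextcloud/ownCloud-style endpoint, the resulting section of `.dvc/config` would look roughly like this (the remote name and the `dvc-storage` sub-folder are placeholders; the essential part is the `remote.php/dav/files/<user>` prefix in the url):

```ini
['remote "mywebdav"']
    url = webdavs://drive.switch.ch/remote.php/dav/files/<user>/dvc-storage
```

Credentials can then be set per-user with `dvc remote modify --local`, so they stay out of the committed config.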
Ok, that fixed the problem, thank you. Now when I push, I randomly get the following error: Forbidden
Apparently, it is related to whether the folder is open in my browser or not. Weird. But the files seem to be correctly uploaded/pushed and downloaded/pulled.
@jdonzallaz regarding the …
@LucaButera @jdonzallaz, it's hard to say from the client side what's wrong, as there's no error message there (server logs would have helped here). Try using … If it still comes up randomly, maybe a firewall/antivirus is to blame on the server side (but there could be many more reasons).
Indeed it works when I retried. |
Bug Report
I am trying to connect to a remote via WebDAV. I can correctly set up user and password along with the url, but when I try to push the artifacts I get a
411 Length Required
response. How can I solve the missing header problem?

Please provide information about your setup
DVC version: 1.9.0 (brew)
Platform: Python 3.9.0 on macOS-10.15.7-x86_64-i386-64bit
Supports: azure, gdrive, gs, http, https, s3, ssh, oss, webdav, webdavs
Cache types: reflink, hardlink, symlink
Repo: dvc, git