Permission Denied when writing to HDFS #132
Comments
@jcrist , do containers created by |
Yes. The delegation token for the default filesystem is provided in each container, and picked up automatically by libhdfs (the backend for pyarrow's hdfs reader), and maybe libhdfs3 (haven't tested). Currently skein doesn't handle delegation token renewal, so this will stop working after the token expires, but that should only matter for long-running jobs (> 1 day). |
Thanks @jcrist . |
The only reason I tried it with knit was that I wanted to have a quick and easy test, but if it does not work, then I'll try skein. One more question, though: when I set the permission 777 for everything, I received another error (see the update to my original question). I was using hdfs3 at the time. I also tried pyarrow, but it complained about not being able to load |
Oops, actually, upon reading your error message it looks like you're using simple authentication instead of Kerberos. In that case, no, skein has the same bug under simple authentication. I've been putting off fixing it in favor of other things, but I'll track that down today. Should be a simple fix.
Pyarrow should work fine on any system, but you may need to set some environment variables, see: https://arrow.apache.org/docs/python/filesystems.html#hadoop-file-system-hdfs |
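For anyone checking their setup against the page linked above: as far as I recall, it names `JAVA_HOME`, `HADOOP_HOME` and `CLASSPATH` (the Hadoop jars) as the variables libhdfs needs, with `ARROW_LIBHDFS_DIR` as an optional override for the location of `libhdfs.so`. A minimal sketch of a pre-flight check, assuming that list is right for your Arrow version; `missing_hdfs_env` is a hypothetical helper, not part of pyarrow:

```python
import os

# Names as I recall them from the Arrow HDFS documentation linked above.
# ARROW_LIBHDFS_DIR is described as optional there, so it is not required here.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "CLASSPATH")


def missing_hdfs_env(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_hdfs_env()
    if missing:
        print("Set these before connecting to HDFS:", ", ".join(missing))
    else:
        print("HDFS-related environment variables look set.")
```

On a YARN cluster the check has to pass inside the worker containers too, not only on the machine that submits the job.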
@TomAugspurger , did something change in pandas |
What version of pandas? 0.23.0 introduced a couple of bugs when using compression: pandas-dev/pandas#21144 and pandas-dev/pandas#17778, and I think the fixes for those introduced another, which is being addressed in 0.23.2. |
Ah, correct, this seems to be in the ZIP branch (https://github.com/pandas-dev/pandas/blob/master/pandas/io/formats/csvs.py#L176), but the dask file object also handles compression internally. Not sure what to do about that. @yuriy-davygora , you could try passing |
@TomAugspurger: conda installed 0.23.1 automatically; I did not specify the pandas version explicitly. I will try 0.23.2 tomorrow. @jcrist Thank you for your answer, I will give pyarrow another go tomorrow. |
FYI 0.23.2 isn't released yet. Probably a few days.
|
@yuriy-davygora, dask-yarn (https://dask-yarn.readthedocs.io/en/latest/) has been released and now uses Skein (https://jcrist.github.io/skein/index.html), a more robust library for python/yarn interaction. The above permissions issue has been addressed there. As for the to_csv issue, I'm not sure what needs to be done here. This has nothing to do with hadoop stuff necessarily, and is more a bug(?) in dask's bytes handling/to_csv functions. @TomAugspurger, @martindurant any ideas on what if anything needs to be done here? |
I haven't looked closely at the error. Pandas 0.23.2 will be out shortly (next few days) and will hopefully fix this class of bugs.
|
I am trying out a basic distributed "Hello World" job using Dask on a YARN cluster. Basically, I am reading some data from HDFS, mapping some columns, and then writing them to a different HDFS folder. However, the last step does not work; I receive the following error:
I have googled for this error, and, apparently, it occurs when I try to write to HDFS as the 'yarn' user and not as my own user. I haven't found anything in the documentation or in the source code about setting the user. I tried initializing DaskYARNCluster with user='yuriyd', but I still got the same error.
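A note for readers with the same symptom under simple (non-Kerberos) authentication: Hadoop clients generally take the acting user from the `HADOOP_USER_NAME` environment variable when it is set, and otherwise fall back to the OS user of the process, which in a YARN container may be 'yarn'. A minimal sketch, assuming the variable can be passed into the environment of the worker processes; `env_with_hadoop_user` is a hypothetical helper, and how the resulting environment reaches the containers depends on the deployment tool:

```python
import os


def env_with_hadoop_user(user, base_env=None):
    """Return a copy of base_env with HADOOP_USER_NAME set to user.

    Only meaningful under simple authentication; with Kerberos the
    identity comes from the ticket or delegation token instead.
    """
    if not user:
        raise ValueError("user must be a non-empty string")
    env = dict(os.environ if base_env is None else base_env)
    env["HADOOP_USER_NAME"] = user
    return env
```

The helper copies the environment rather than mutating it, so the submitting process keeps its own identity and only the processes launched with the returned mapping act as the given user.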
Any assistance or advice will be greatly appreciated.
Here is my code:
UPDATE: I have temporarily granted every user write permissions (hadoop fs -chmod -R 777 ...), but now I get a different error: