
add dataproxy access to ebrains_drive #15

Merged

13 commits merged into HumanBrainProject:master from feat_dataproxyMVP on Aug 15, 2022

Conversation

xgui3783
Contributor

@xgui3783 xgui3783 commented Jun 30, 2022

TODO:

  • code documentation
  • add test coverage

@xgui3783 xgui3783 marked this pull request as ready for review July 11, 2022 14:36
@xgui3783 xgui3783 changed the title from "draft: add dataproxy access to ebrains_drive" to "add dataproxy access to ebrains_drive" Jul 11, 2022
@xgui3783
Contributor Author

@apdavison or @appukuttan-shailesh I have tentatively added dataproxy support in ebrains-drive.

Is there any appetite to incorporate this functionality into ebrains-drive, or would it, in your opinion, be more suitable to create a new package?

@appukuttan-shailesh
Collaborator

I personally think it makes sense to include it here.
@Cracky5457 might also want to weigh in?

(I was myself intending to implement this at some point, so I am happy to see this contribution! Thanks @xgui3783!)

@xgui3783
Contributor Author

xgui3783 commented Jul 12, 2022

We will also add support for HDG datasets in future commits to this PR.

I will convert this PR to a draft for the time being.

Please feel free to provide feedback all the same!

I will ping you all when the PR is ready for review.

@xgui3783 xgui3783 marked this pull request as draft July 12, 2022 14:11
@xgui3783 xgui3783 marked this pull request as ready for review July 13, 2022 06:19
@xgui3783
Contributor Author

xgui3783 commented Jul 13, 2022

@i-Zaak I have pushed a commit that should add support for HDG (see the README example).

I would welcome your review of the PR as a user as well.

Whilst not in scope, would support for public {collab,dataset} buckets be something we are interested in?

@Cracky5457
Collaborator

Hello, thanks for these developments.

We just need to ensure that the lib works properly in the different environments (prod, int, dev); putting the dataproxy and buckets on int is under discussion too.

@xgui3783
Contributor Author

We just need to ensure that the lib works properly in the different environments (prod, int, dev); putting the dataproxy and buckets on int is under discussion too.

Would be lovely if so.

I don't think I have access to the -int/-dev environments of data-proxy, so currently the client raises an exception if an env other than the empty string is passed.

@Cracky5457
Collaborator

Cracky5457 commented Jul 21, 2022

Were you able to try your changes, and do you want a new release version for them?

@appukuttan-shailesh
Collaborator

@xgui3783: I notice that you have provided some examples in the README.md file. But could you also list out all the functionality available inside doc.md?

@appukuttan-shailesh
Collaborator

Also, I do not use buckets much, so it is probably best to have someone who actively uses them test this out. Thanks again.

@xgui3783
Contributor Author

@xgui3783: I notice that you have provided some examples in the README.md file. But could you also list out all the functionality available inside doc.md?

I will add some documentation there. I had not noticed doc.md.

Also, I do not use buckets much, so it is probably best to have someone who actively uses them test this out. Thanks again.

I believe @i-Zaak uses the buckets? He promised me that he would check it once he is back from vacation =)

Were you able to try your changes, and do you want a new release version for them?

I will update doc.md per @appukuttan-shailesh's suggestion, and then I would love to have the changes merged and released.

@xgui3783
Contributor Author

Dear @appukuttan-shailesh and @Cracky5457, doc.md has been updated.

Let's wait for feedback from @i-Zaak on the functionality (or lack thereof) of the buckets.

@i-Zaak
Copy link

i-Zaak commented Jul 22, 2022

Hey, sorry for the delays. I've tested the buckets and datasets, and I think this is definitely going in the right direction!

I've tested the following (see the sketch after this list):

  • opening an existing private bucket (works)
  • listing files (works)
    • we might later consider adding switches for more "file-system-like" behavior, such as omitting the prefix or listing only files with an exact prefix (files in a "directory"), etc.
  • getting the file content (works)
    • I'd also suggest adding a function wrapping the content in BytesIO to simplify things like np.load(io.BytesIO(file_handle.get_content()), allow_pickle=True)
  • opening an unrestricted dataset, and listing files (works)
  • opening a dataset behind HDG: here I ran into issues
    • first I tried without asking for access; an exception is thrown as expected
    • then I ran with request_access=True; access was requested and an email was sent (I clicked the link and gained access)
    • but now I'm getting 401 again - even for the public dataset, which worked a few moments ago o_O
    • I can access all the datasets in the data-proxy gui
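
For reference, here is a minimal sketch of the kind of usage I tested. The token handling and the bucket/listing method names (buckets.get_bucket, ls, get_file) are my loose reading of the README examples and may not match the final API exactly:

import io

import numpy as np
from ebrains_drive import BucketApiClient

# token obtained elsewhere, e.g. from the Collaboratory
client = BucketApiClient(token="<EBRAINS token>")

# open an existing private bucket and list its files
bucket = client.buckets.get_bucket("my-private-bucket")  # bucket name is illustrative
for f in bucket.ls():
    print(f.name)

# get the file content; wrapping it in BytesIO makes it easy to feed into np.load
file_handle = bucket.get_file("path/to/array.npy")
data = np.load(io.BytesIO(file_handle.get_content()), allow_pickle=True)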

PS: there is a missing dependency for version_query

@xgui3783
Contributor Author

Re: missing dependency, that was there before I got here =S

I will investigate the HDG issue again.

Thanks for spending time on this! Greatly appreciated!

@xgui3783
Contributor Author

I have had a chance to look into the code again; re @i-Zaak's comments:

but now I'm getting 401 again - even for the public dataset, which worked a few moments ago o_O

Unfortunately, I am not able to reproduce this, but I have a few suspicions as to how it could have happened:

  • token expiring at the wrong time
  • dataproxy/CSCS object storage being unstable

I added a new commit that checks token expiry and raises when the token has expired. Unfortunately, there's not much I can do about the second one oO

we might later consider adding switches for more "file-system-like" behavior, such as omitting the prefix or listing only files with an exact prefix (files in a "directory"), etc.

Certainly reasonable. I would suggest adding this as an iterative improvement in a separate PR?

I can imagine we could even create a directory abstraction for data-proxy buckets.

I'd also suggest adding a function wrapping the content in BytesIO to simplify things like
np.load(io.BytesIO(file_handle.get_content()), allow_pickle=True)

For this I will need a little help: should we use the content type to decide how to decode the file, or the file extension, or a mixture?

I feel much more comfortable letting the user of the library determine how the files should be parsed.

Another approach might be to maintain a registry of functions: given such a file handle, this is how you decode the file content (rough sketch below).
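
A minimal sketch of what I mean, with purely hypothetical decoder choices keyed by content type or extension:

import io
import json

import numpy as np

# hypothetical registry: map a content type (or file extension) to a decoder function
DECODERS = {
    "application/json": lambda raw: json.loads(raw.decode("utf-8")),
    "text/plain": lambda raw: raw.decode("utf-8"),
    ".npy": lambda raw: np.load(io.BytesIO(raw), allow_pickle=True),
}

def decode_content(raw: bytes, key: str):
    # fall back to returning the raw bytes, so the user can still decide themselves
    return DECODERS.get(key, lambda b: b)(raw)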

PS: there is a missing dependency for version_query

Ping @appukuttan-shailesh: I think this dependency comes from the original Seafile Python client? I think it is only used when building the PyPI artefact and not needed otherwise? Still, it causes issues when users try to install the package via git, for example. Could we explicitly add it as a dependency?

@appukuttan-shailesh
Collaborator

I believe this dependency was added by Axel here.

I think we can/should add it to the deps to avoid such problems.

@i-Zaak

i-Zaak commented Jul 26, 2022

I just revisited my experiments, and it looks like "token expiring at the wrong time" was the cause. However, I suspect that the "wrong time" might not be completely independent of the interaction with HDG. Is the token perhaps reset after the user is granted access to an HDG dataset (I wasn't able to reproduce this)?

Is there a way to get some info on the current token (e.g. expiry date etc.)?

@xgui3783
Contributor Author

Is there a way to get some info on the current token (e.g. expiry date etc.)?

The latest commit now also checks the token expiry and will raise if the token has expired.

Note that this does not check the authenticity of the token, only its expiry.

I don't think new tokens are generated before or after granting access to HDG. Access is controlled purely at the dataproxy level, assuming a valid token is presented.
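
For reference, this is roughly how such an expiry check (without verifying authenticity) can work; it is a sketch, not necessarily the exact code in the commit:

import base64
import json
import time

def is_token_expired(token: str) -> bool:
    # JWTs are three base64url-encoded segments: header.payload.signature
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore the padding stripped by base64url
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    # compare the "exp" claim against the current time; the signature is not checked
    return payload["exp"] < time.time()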

@i-Zaak

i-Zaak commented Jul 26, 2022

My bad - I saw the code but misread it. I've seen that the tokens have a ~2.5 h lifetime, so it is easy to get caught out if one plays with the code for a while...

@appukuttan-shailesh
Collaborator

@xgui3783, @i-Zaak: A token lifetime of ~2.5 hours seems unusually short! I don't have much experience using the dataproxy... does it use the same token as the Seafile client? i.e. would a token obtained via the DriveApiClient be usable with the BucketApiClient? I don't recall the former having problems with token expiry (but it has possibly not been tested over longer durations).

@Cracky5457: is there a known workaround for getting longer-duration tokens?
I know that for JavaScript apps, different authentication processes led to tokens of different lifetimes (from hours to days... see PR and this commit). @xgui3783: can something along these lines be done here?

@i-Zaak

i-Zaak commented Jul 26, 2022

To correct myself, the expiry is actually 30 min; I was getting the token for testing from https://kg-editor.humanbrainproject.eu/ (a bad habit from previous times). A token from the Collaboratory is substantially more durable (almost a week).

@xgui3783
Contributor Author

i.e. would a token obtained via the DriveApiClient be usable with the BucketApiClient?

DriveApiClient does not currently fetch tokens. The user needs to obtain the token explicitly from somewhere (lab.ebrains.eu, kg.ebrains.eu, etc.). If the token has the correct scopes (I think clb.drive:write and clb.drive:read), it will be usable by DriveApiClient.

Side note: it is probably feasible to add a scope check to DriveApiClient.

can something along these lines be done here?

Token expiry depends entirely on how the token is requested; kg creates tokens with a short expiry. Since the client does not fetch tokens, there is not much I can do.

@appukuttan-shailesh
Collaborator

appukuttan-shailesh commented Jul 26, 2022

From what I remember, DriveApiClient can authenticate and get tokens via this.
Isn't that what happens when we do:

client = ebrains_drive.connect('hbp_username', 'password')

The same should be possible for BucketApiClient.

@xgui3783
Contributor Author

From what I remember, DriveApiClient can authenticate and get tokens via this. Isn't that what happens when we do:

client = ebrains_drive.connect('hbp_username', 'password')

The same should be possible for BucketApiClient.

IMO, authentication is out of scope for this PR. But since the auth logic has been moved to ClientBase, I believe the following code should work:

from ebrains_drive import BucketApiClient

client = BucketApiClient(username="user", password="password")

I did not modify the connect method for backwards-compatibility reasons. I could export a method such as connect_dataproxy, but I find using the constructor directly no more verbose than such a method would be.

@appukuttan-shailesh
Collaborator

That's fine. The above is what I was proposing.

Could @i-Zaak check whether the token expiration problem persists when the token is fetched via the above method, i.e. logging in with credentials (username, password), as opposed to providing a token obtained from elsewhere?

@i-Zaak

i-Zaak commented Jul 28, 2022

Hello, yes - the token obtained through username and password has a much longer lifetime (a week-ish). For my use case, I'm mainly looking into using the tokens obtained within the Collaboratory (clb_nb_utils.oauth.get_token) to avoid usage patterns that lead people to save their password in notebooks or REPL history.
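
For example, something along these lines, assuming the token keyword argument accepted by the client base class:

# inside a Collaboratory notebook: reuse the lab-provided token instead of a password
from clb_nb_utils import oauth
from ebrains_drive import BucketApiClient

token = oauth.get_token()
client = BucketApiClient(token=token)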

And one more thing - to be able to call

client = ebrains_drive.connect('hbp_username', 'password')

I had to move the _set_env call to before the super().__init__ call to avoid access to the non-existent iam_url. Bug or feature? :)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/izaak/local_repos/nostromo/ebrains-drive/ebrains_drive/client.py", line 131, in __init__
    super().__init__(username, password, token, env)
  File "/home/izaak/local_repos/nostromo/ebrains-drive/ebrains_drive/client.py", line 28, in __init__
    self._get_token()
  File "/home/izaak/local_repos/nostromo/ebrains-drive/ebrains_drive/client.py", line 48, in _get_token
    self.iam_url+'/auth/realms/hbp/protocol/openid-connect/token',
AttributeError: 'BucketApiClient' object has no attribute 'iam_url'

@xgui3783
Contributor Author

I had to move the _set_env call to before the super().__init__ call to avoid access to the non-existent iam_url. Bug or feature? :)

haha oops... Lemme fix that 😅

@xgui3783
Contributor Author

Just pushed a commit that should fix the iam_url issue.
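
For context, a runnable sketch (illustrative only, not the actual ebrains_drive code) of why the ordering mattered: the base constructor reads self.iam_url while fetching a token, so the subclass has to set it before calling super().__init__().

class ClientBase:
    def __init__(self, username=None, password=None, token=None, env=""):
        if token is None:
            token = self._get_token()  # touches self.iam_url
        self.token = token

    def _get_token(self):
        # stand-in for the real HTTP request to the IAM server
        return f"token-from-{self.iam_url}"


class BucketApiClient(ClientBase):
    def __init__(self, username=None, password=None, token=None, env=""):
        self._set_env(env)  # must run first, otherwise AttributeError: no iam_url
        super().__init__(username, password, token, env)

    def _set_env(self, env):
        suffix = f"-{env}" if env else ""
        self.iam_url = f"https://iam{suffix}.ebrains.eu"


client = BucketApiClient(username="user", password="password")
print(client.token)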

@appukuttan-shailesh
Collaborator

Hello, yes - the token obtained through username and password has a much longer lifetime (a week-ish). For my use case, I'm mainly looking into using the tokens obtained within the Collaboratory (clb_nb_utils.oauth.get_token) to avoid usage patterns that lead people to save their password in notebooks or REPL history.

That's great... as good as it gets, I guess. So the token issue seems resolved 👍

@xgui3783
Contributor Author

@appukuttan-shailesh and @Cracky5457, is there anything else we are waiting on for this PR?

I would love to see this PR merged and, hopefully, a new PyPI package published, so we can update our dependency appropriately.

@appukuttan-shailesh
Collaborator

All OK from my side (I haven't had a chance to test it, but @i-Zaak seems to have tried it out sufficiently).
@Cracky5457 handles the PRs and PyPI releases.

@Cracky5457 Cracky5457 merged commit 2d9f32a into HumanBrainProject:master Aug 15, 2022
@Cracky5457
Collaborator

Thanks to all of you, version 0.5.0 is released. I hope it works well ^^

@xgui3783 xgui3783 deleted the feat_dataproxyMVP branch August 15, 2022 08:33