Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Datatx driver #1242

Closed
wants to merge 40 commits into from
Closed

[WIP] Datatx driver #1242

wants to merge 40 commits into from

Conversation

redblom
Copy link
Contributor

@redblom redblom commented Oct 12, 2020

See sciencemesh/sciencemesh#140

Datatx driver pkg setup.
note: Datatx interface methods' signatures still indefinite.

@redblom redblom requested a review from labkode as a code owner October 12, 2020 10:38
labkode
labkode previously approved these changes Oct 19, 2020
@labkode
Copy link
Member

labkode commented Oct 19, 2020

@redblom good progress!

@redblom
Copy link
Contributor Author

redblom commented Oct 28, 2020

This now implements rclone driver DoTransfer, GetTransferStatus.
There are also 2 cli commands available for testing this: rclone-do-transfer, rclone-get-transfer-status
For tests you'll need a running rclone server and an rconfig file with the remotes/users specified.
Transfer eg. a cli triggered transfer between two running reva instances:

$ ./cmd/reva/reva rclone-do-transfer -senderEndpoint http://rclone-reva:5572/ -srcEndpoint reva-1 -srcPath /test/1gb.txt -srcToken marie -destEndpoint reva-2 -destPath /myfiles/1gb1.txt -destToken marie

This will echo the rclone job id.
Notes:

  • src/destEndpoint - the remotes' names as specified in the config
  • the tokens are not used but required by the cli

Get status eg.:

$ ./cmd/reva/reva rclone-get-transfer-status -senderEndpoint http://rclone-reva:5572/ -transferID 68

This will echo the transfer status.

@redblom
Copy link
Contributor Author

redblom commented Nov 4, 2020

All 3 methods (DoTransfer, GetTransferStatus, CancelTransfer) of rclone driver implemented.

Use reva cli rclone transfer commands:
(do transfer)
./reva rclone-do-transfer -senderEndpoint http://rclone-reva:5572 -srcEndpoint reva -srcPath /test/test01.txt -srcToken marie -destEndpoint reva-2 -destPath /myfiles/02112020-01.txt -destToken marie

(get transfer status)
./reva rclone-get-transfer-status -senderEndpoint http://rclone-reva:5572 -transferID 1

(cancel transfer)
./reva rclone-cancel-transfer -senderEndpoint http://rclone-reva:5572 -transferID 1

Copy link
Contributor

@ishank011 ishank011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The driver is supposed to be included in the GRPC service, not invoked directly by clients. Please take a look at the other GRPC services and make sure that this also follows that approach https://github.com/cs3org/reva/blob/master/internal/grpc/services/userprovider/userprovider.go

@redblom
Copy link
Contributor Author

redblom commented Nov 11, 2020

@ishank011 yeah I know, I'm working on those right now. These rclone clients are for driver testing purposes only, I don't expect them to be part of the final PR.

cmd/reva/main.go Outdated Show resolved Hide resolved
cmd/reva/transfer-create.go Outdated Show resolved Hide resolved
cmd/reva/transfer-create.go Show resolved Hide resolved
internal/grpc/services/datatx/datatx.go Outdated Show resolved Hide resolved
pkg/datatx/driver/rclone/rclone.go Outdated Show resolved Hide resolved
pkg/datatx/driver/rclone/rclone.go Outdated Show resolved Hide resolved
pkg/datatx/driver/config/config.go Outdated Show resolved Hide resolved
pkg/datatx/driver/txdriver.go Outdated Show resolved Hide resolved
pkg/datatx/driver/rclone/rclone.go Outdated Show resolved Hide resolved
Copy link
Contributor

@ishank011 ishank011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments. I'm not sure I fully understand how the token situation works. In OCM we transfer a token when sharing a file which is used to access resources on the file owner's provider, not sure how that translates here and how it connects to that, since these will be enabled only once an OCM share has been created, right? (https://wiki.cs3mesh4eosc.eu/wps/4/task44stuff/apis)

@@ -0,0 +1,25 @@
// Copyright 2018-2020 CERN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a folder called manager and move the loader, registry and rclone folders inside it. Refer how we do it in other packages.

@@ -33,14 +39,18 @@ func init() {
rgrpc.Register("datatx", New)
}

type config struct {
func (c *config) init() {
// set sane defaults
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set default value for Driver to rclone

// (token is username for now, it is used like that in the current rclone driver implementation)
// TODO: re-implement when the connection-string-remote syntax is available in rclone
// TODO: implement proper token use
srcToken := u.GetUsername()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do these tokens work? For the source token, we use the username and for destination, we use the grantee's ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the destination token should be available here, returned by OCM share request. But this is not implemented yet, see below: #1242 (comment).
Apparently a source token for the 'local' source file to be transferred should (at this point) be created somehow (see https://wiki.cs3mesh4eosc.eu/wps/4/task44stuff/apis#data-transfer 'generate token to read Hugo's file'). Maybe you could shed some light on (how to do) that.

So yes, the driver.DoTransfer token parameters are being 'abused' here. We need actual tokens.

txID := req.TxId.OpaqueId

// TODO find the jobID belonging to the transfer id; let transfer id be the jobID for now
var jobID int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to declare jobID before initializing

}

// rcloneCancelTransferReqJSON the rclone job/stop method json request
type rcloneCancelTransferReqJSON struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a generic struct for rcloneCancelTransferReqJSON and rcloneStatusReqJSON, since they have the same fields? (same for rcloneCancelTransferResJSON)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will declare them within the function there used in, that's actually what I wanted but thought it was invalid to declare a struct inside a method (my lack of golang knowledge).

@redblom
Copy link
Contributor Author

redblom commented Jan 6, 2021

Added a few comments. I'm not sure I fully understand how the token situation works. In OCM we transfer a token when sharing a file which is used to access resources on the file owner's provider, not sure how that translates here and how it connects to that, since these will be enabled only once an OCM share has been created, right? (https://wiki.cs3mesh4eosc.eu/wps/4/task44stuff/apis)

With data transfer a special transfer OCM share request (which will be a to be defined special datatx OCM protocol type) is done towards the anticipated destination and at destination a token is created and returned so the token is available at source site for use when doing the actual transfers (transfers are instigated at the source site so it must have a valid token).
Actually this is not described in the notes in CreateTransfer method of grpc service datatx.go. I will do that; should make it clear.

Copy link
Contributor

@ishank011 ishank011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Persistency should be implemented in the same interface, there shouldn't be the need for a new one. For example, refer how we do it for OCM invites https://github.com/cs3org/reva/blob/master/pkg/ocm/invite/manager/json/json.go

@redblom
Copy link
Contributor Author

redblom commented Mar 2, 2021

When the (datatx) share is received in the ocmd we know it's a datatx share through the protocol options parameter. However we currently cannot persist it as such. Share objects have no tx specifics. So the receiver does not know it's a datatx share either. How do we go about that?

@ishank011
Copy link
Contributor

Why can't we persist it? On the recipient side, the request would look like as described in https://wiki.cs3mesh4eosc.eu/wps/4/task44stuff/apis, so it would know it's a datatx share.

@redblom
Copy link
Contributor Author

redblom commented Mar 2, 2021

The datatx transfer ends up as a 'marshalled' Share object in the received_shares section of the shares file at the receiver's end. What options do I have to define there that it is a datatx share; so that the receiver (and the system) when accepting this share know that after reading this share from the shares file that this is a datatx share and the transfer can commence?

In other words, in the shares file, how does a datatx share differ from a regular share?

@ishank011
Copy link
Contributor

Add one more parameter to the schema?

@redblom
Copy link
Contributor Author

redblom commented Mar 2, 2021

That is one solution I was also thinking of.

So simplest:

cs3.sharing.ocm.v1beta1.Share { 
   ... ;
   bool is_data_transfer_share = 10;
}

Or maybe share_type in case there there could be other future types, don't know, maybe stream type ...

@ishank011
Copy link
Contributor

Sure, share_type would be better. Please create a PR and in the description part in the proto file, mention this use case

@redblom
Copy link
Contributor Author

redblom commented Mar 3, 2021

cs3org/cs3apis#109

@redblom
Copy link
Contributor Author

redblom commented Mar 4, 2021

How about using the share ID instead of transfer ID in cs3 datatx api. The share ID is already there. That would simplify things.

@redblom
Copy link
Contributor Author

redblom commented Mar 12, 2021

@ishank011 : I believe pkg.ocm.share manager interface method Save() needs an additional share type arg.

@ishank011
Copy link
Contributor

@redblom sure, feel free to make changes as you need.

@redblom
Copy link
Contributor Author

redblom commented Mar 23, 2021

@ishank011 please review #1578

@redblom
Copy link
Contributor Author

redblom commented Mar 24, 2021

@ishank011 : implementing this (#1578) I'm thinking we should not use CreateTransferRequest at all at the origin's end for a transfer but use CreateOCMShareRequest directly. An OCM share will always be the starting point anyway. So using reva cmd ocm-share-create would look something like:

cmd/reva/reva -insecure ocm-share-create -shareType transfer -grantee jarek1234 -granteeType user -idp softwaremind.com /home/test

I think it makes more sense.

Then we change the CS3 datatx CreateTransferRequest message to contain Share (or at the minimum the share id) (and optional opaque) fields only. *Note that the actual share id will be the transfer identifier.

Now when UpdateReceivedOCMShare is called at the recipient's end the CreateTransferRequest can follow with the share specified. And then the actual transfer can be initiated with a driver call:

pkg.datatx.TxDriver.CreateTransfer(shareID, ...src..., ...dest...)

TODO: the transfer driver needs a destination path. In case this is user defined it should be specified in the CreateTransferRequest message as well. The user interaction needed for this needs to be taken into account in the design.
2 options:
. through CreateTransferRequest by extending this message with a destination path
. through UpdateReceivedOCMShare and persist the destination path together with the share (Share needs to be extended). CreateTransferRequest does not need to be extended with destination path because it will be part of the persisted share.

@ishank011
Copy link
Contributor

When UpdateReceivedOCMShare is called, the reference is created in the shares directory https://github.com/cs3org/reva/blob/master/internal/grpc/services/gateway/ocmshareprovider.go#L260 -> https://github.com/cs3org/reva/blob/master/internal/grpc/services/gateway/ocmshareprovider.go#L291-L349, where the default path is /home/Shares/name-of-shared-resource. We can use the same path as the destination.

For WebDAV share references, we have the target webdav://token@idp?name=share-name. For transfers, we don't really need a target as the whole resource will be located here and we can just pass this path to rclone.

These references are statted in the gateway storageprovider service https://github.com/cs3org/reva/blob/master/internal/grpc/services/gateway/storageprovider.go#L1369-L1395. We'll need to add a case for transfer refs to just be returned as it is. We have all the existing functionalities that will allow users to make changes to any files/folders located inside this reference.

@redblom
Copy link
Contributor Author

redblom commented Mar 30, 2021

@ishank011 I'd like to see this merged first (create transfer type OCM share) #1578

@redblom
Copy link
Contributor Author

redblom commented Apr 8, 2021

When UpdateReceivedOCMShare is called, the reference is created in the shares directory https://github.com/cs3org/reva/blob/master/internal/grpc/services/gateway/ocmshareprovider.go#L260 -> https://github.com/cs3org/reva/blob/master/internal/grpc/services/gateway/ocmshareprovider.go#L291-L349, where the default path is /home/Shares/name-of-shared-resource. We can use the same path as the destination.

@ishank011 What would be the resulting location here, eg. marie/name-of-shared-resource ? And then ./reva stat /home/Shares/name-of-shared-resource would work?

@ishank011
Copy link
Contributor

ishank011 commented Apr 8, 2021

You're passing the /remote.php/webdav endpoints for both PUT and GET to rclone, right? Then you just need to use the path /home/Shares/name-of-shared-resource everywhere. Reva will resolve that based on the user in context.

@redblom
Copy link
Contributor Author

redblom commented Apr 8, 2021

You're passing the /remote.php/webdav endpoints for both PUT and GET to rclone, right? Then you just need to use the path /home/Shares/name-of-shared-resource everywhere. Reva will resolve that based on the user in context.

@ishank011 I use single endpoints for each request. For transfer the rclone service is called through 2 possible endpoints, operations/copyfile to transfer a file and sync/copy to transfer a folder structure. These calls will asynchronous and immediately return a jobID (to be used for status/cancel requests).
The requests' data is in the form of a single string with all necessary parameters included (https://rclone.org/docs/#connection-strings). A folder transfer example of such a req data string would be:
'{"_async":true,"srcFs":":webdav,bearer_token=\"...token...\",url=\"http://reva-datatx:20080/webdav\":/big-data","dstFs":":webdav,bearer_token=\"...token...\",url=\"http://reva-2:20080/webdav\":/big-data"}'
This would be a transfer to a 'local' path, eg marie/big-data/...

I experimented with accepted shares where I gave rclone the share path like url=\"http://reva-2:20080/webdav\":/MyShares/big-data but that did not quite work. I was expecting these files to exist somewhere in (my case) marie/big-data/... but for the user at /MyShares/big-data/.... So first I want to know if this is indeed what's expected.

@ishank011
Copy link
Contributor

Which service handles the requests at the /webdav endpoint?

@redblom
Copy link
Contributor Author

redblom commented Apr 8, 2021

Which service handles the requests at the /webdav endpoint?

../../internal/http/services/owncloud/ocdav/ocdav.go

@ishank011 I'm also not sure what should be returned here for the transfer case (https://github.com/cs3org/reva/blob/master/internal/grpc/services/gateway/storageprovider.go#L1369-L1395) exactly? Simply the provided resourceInfo? How should we establish here that this is a transfer ref?

@ishank011
Copy link
Contributor

Use the path /remote.php/webdav/home/MyShares/xyz. It should work

Simply the provided resourceInfo?

Yes

How should we establish here that this is a transfer ref?

Check the scheme

@redblom
Copy link
Contributor Author

redblom commented Apr 8, 2021

Use the path /remote.php/webdav/home/MyShares/xyz. It should work

Yes, that seems to work!

Simply the provided resourceInfo?

Yes

How should we establish here that this is a transfer ref?

Check the scheme

And that should read datatx I suppose, because now it's webdav. That makes sense.

@redblom
Copy link
Contributor Author

redblom commented Apr 12, 2021

Simply the provided resourceInfo?
Yes

How should we establish here that this is a transfer ref?

Check the scheme

And that should read datatx I suppose, because now it's webdav. That makes sense.

@ishank011 : I can set the scheme to datatx here:

TargetUri: fmt.Sprintf("webdav://%s@%s?name=%s", token, share.Creator.Idp, share.Name),

Then I can determine it at https://github.com/cs3org/reva/blob/master/internal/grpc/services/gateway/storageprovider.go#L1369-L1395 for the new case. Is that correct?

@redblom
Copy link
Contributor Author

redblom commented Dec 13, 2021

This PR is superseded by #2052

@redblom redblom closed this Dec 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants