
support adding/transferring data straight to cache/remote #4520

Closed
Tracked by #5367
efiop opened this issue Sep 2, 2020 · 40 comments

@efiop
Contributor

efiop commented Sep 2, 2020

We often see people trying to use --external to add some big dataset that they have on an external drive, where they also have an external cache dir. People often do that because they can't or don't want to copy their data into the dvc repo to dvc add it normally, e.g. because their HDD/SSD physically can't fit two copies of that dataset.

The same goes for s3/gs/etc., where people want to move data straight to their remote without having to download/add/push it, because, again, it might not even fit on their local machine.

That's why it would be great to introduce a feature (or features) to move (or copy) data straight to the cache/remote from its original location. Potentially this is useful not only for dvc add but also for dvc import[-url], where you want to use some data in your project (e.g. through streaming with our API) that won't fit on your machine.

Related to #3920

@efiop efiop added feature request Requesting a new feature p2-medium Medium priority, should be done, but less important labels Sep 2, 2020
@jorgeorpinel
Contributor

jorgeorpinel commented Nov 3, 2020

And yes, it seems like this feature could be valuable. It sounds like a "lightweight" data registry (see https://dvc.org/doc/use-cases/data-registries): a way for people to use DVC as an intermediary between data already hosted on cloud storage and other projects, so they can access the data with a unified interface (DVC CLI or API).

@isidentical
Contributor

Hey @efiop, will this copy/transfer operation be a new command, or part of an already existing one?

Also, may I request 2 sample command-group workflows (some sort of small, self-contained replication of the whole process, so that I can test it in my local environment)? One showing the current workflow, where people try to do this across 2 different local file system locations, and one showing what they will execute once this issue is resolved.

@efiop
Contributor Author

efiop commented Dec 14, 2020

@isidentical I'm not 100% sure whether this fits into some existing command; would love to hear your thoughts on it.

Sure, here are some examples. Please make sure you are familiar with our workflow already (see our get-started guide).

straight-to-remote scenario:

Imagine having some file/directory on s3 (or another cloud) that you want to add to your dvc repo and push to your remote (say it is an s3 remote too, e.g. dvc remote add -d mys3 s3://bucket/dvc-remote). You would need to:

aws s3 cp s3://bucket/path/to/data data
dvc add data
dvc push

but now what if data is so big that it doesn't fit as a whole on your machine? You won't be able to perform the same set of commands. DVC is able to pull partial data (e.g. if data is a directory: dvc pull data/subdir) or provide streaming access to the data on the remote (e.g. if data is a file, you could use api.open/read), but we need to somehow shove that giant file into the remote first. So we need some way to put that data into the remote without downloading it fully locally.

I have to stress that this new operation, when it is done, should leave the dvc repo in a state indistinguishable from the regular workflow described above, with the exception that the data won't be in your local cache or in your workspace, only in the remote. (You'll have data.dvc though, so if you ever move to another machine that can fit the whole file, dvc pull should be able to bring it in and check it out as normal.)

Regarding the particular command name and semantics - it is up to you to research and suggest the CLI design for it 😉
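
For reference, a minimal sketch of the partial-pull and streaming access mentioned above, once the data only exists on the remote (paths here are hypothetical):

# granular pull: bring down only part of a tracked directory
dvc pull data/subdir
# streaming read of a tracked file via the Python API, without a full download
python -c "
import dvc.api
with dvc.api.open('data/file.csv') as f:
    print(f.readline())
"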

straight-to-cache scenario:

Now imagine you have a giant HDD and a small SSD that you keep your dvc repo (aka workspace) on. Since the SSD is too small, you set up a dvc cache on the HDD (see https://dvc.org/doc/use-cases/shared-development-server) (say /storage/dvc-cache) and use symlinks to link from the cache into your local workspace (see https://dvc.org/doc/user-guide/large-dataset-optimization). Now say you have a giant dataset in /storage/some/path/data that you want to add to your repo, as if you were able to run:

cp /storage/some/path/data data
dvc add data

Your first instinct might be to dvc add /storage/some/path/data, but that file is outside of the dvc repo, so dvc will complain. At this point people often misuse --external and run dvc add /storage/some/path/data --external, but that makes /storage/some/path/data a part of your workspace (dvc will manage it in place) instead of having data in your dvc repo. So it would be nice to have a way to transfer that file into your workspace. The solution for this might be related to the straight-to-remote solution, or it might be a separate command/flag; it is unclear right now. If you wish, we can focus exclusively on either straight-to-remote or straight-to-cache for now, to narrow the scope, but it is useful to keep in mind that these might have a common solution/interface.
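
A minimal sketch of the setup described above (paths are only examples):

dvc cache dir /storage/dvc-cache    # keep the cache on the big HDD
dvc config cache.type symlink       # link from the cache into the SSD workspace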

@isidentical
Contributor

isidentical commented Dec 16, 2020

As a command-line model, I am thinking of something like this:

dvc transfer to-remote [-r] <from> <file>
dvc transfer to-cache <from>
# straight-to-remote example
dvc transfer to-remote s3://bucket/path/to/data data
# straight-to-cache example
dvc transfer to-cache /storage/some/path/data

@isidentical
Contributor

I am a bit blocked on the user interface; even though I could start writing tests and migrate them later, I'd prefer to work on the interface first. Here are the ideas that came and went during meetings:

  • dvc export-to-remote <url> <name>? -r <remote>
  • dvc add <name> --straight-to-remote --from-url <url> -r <remote>
  • dvc import-url <url> <name> --straight-to-remote -r <remote>

I see amending dvc add as the least likely option, since it would be too complicated to adjust it with 2 mandatory flags where one doesn't work without the other (--straight-to-remote, --from-url).

@efiop said import-url tracks the original location, which we would ignore when the straight-to-remote functionality is activated, so it might also not be the ideal option.

For export-to-remote, even though it is a new command, @shcheklein suggested that the semantics are not clear from the name, which I definitely agree with. Maybe we could re-use the import keyword, something like import-to-remote (since, as @efiop already stated, this is what it does), though import is already associated with both track + download operations.

@efiop
Contributor Author

efiop commented Jan 6, 2021

Straight-to-remote would actually be useful both in the dvc add scenario and in dvc import-url, and in the latter it should still track the original location as it does right now. This makes me wonder if it should indeed be a flag for both dvc add and dvc import-url. E.g.

dvc add s3://bucket/path -o mydata --straight-to-remote -r myremote  # would create mydata.dvc or path.dvc if no -o specified
dvc import-url s3://bucket/path mydata --straight-to-remote -r myremote  # would create mydata.dvc or path.dvc if no out is provided

The name of the flag looks a bit ugly though 🙂 And I'm not sure how intuitive it will be for users, though those same users find --external somehow and misuse it all the time 🙂

Notice how I didn't use --from-url in dvc add, because it seems unintuitive, and users already misuse --external with dvc add s3://bucket/path --external, so dvc add s3://bucket/path might be the intuitive way to add something from an external location.

Regarding dvc export-to-remote: it creates an association with uploading dvc-tracked data to an external location, at least for me. Kind of like when you need to deploy a model to s3 from your dvc repo.

The con of --straight-to-remote for add/import-url is that it overloads the add/import-url CLI, which might be too much. There might be a better way, but if not, this one seems acceptable, even if just in experimental mode, until we find a better name.

@efiop
Contributor Author

efiop commented Jan 6, 2021

For the record: while discussing this with @isidentical, we found that in the straight-to-cache scenario something like

dvc add /external/file -o data

would do what the user expects: it will cache the file (aka transfer it to the cache) and make it known as data in the root of the repo. If the user has configured an external cache with, say, symlinks, then data will be a symlink too, so you can add a giant file to your project this way.
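
A sketch of the full flow, assuming -o behaves as described and the external cache/symlink setup from the straight-to-cache scenario above:

dvc cache dir /storage/dvc-cache
dvc config cache.type symlink
dvc add /storage/some/path/data -o data   # transfers the file to the cache and tracks it as ./data
ls -l data                                # data -> /storage/dvc-cache/... (a link, not a second copy)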

There is a weird association with --no-pull for dvc add/import-url when discussing --straight-to-remote, but that's not quite it. Maybe --transfer? Not finding the perfect name/ui for it so far 🙁

@isidentical
Contributor

isidentical commented Jan 6, 2021

There is a weird association with --no-pull for dvc add/import-url when discussing --straight-to-remote, but that's not quite it. Maybe --transfer? Not finding the perfect name/ui for it so far 🙁

After a bit more discussion, we decided that just a bare --to-remote (without straight) might also be an option.

The issue is that combining it with -r looks a bit odd: dvc add s3://bucket/file --to-remote -r my_other_remote.
We could potentially reduce both --to-remote and --remote into just --to-remote <remote (optional)>, but that wouldn't be very consistent with other actions (pull, push, etc.).
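
Side by side, the two shapes under discussion (hypothetical syntax):

dvc add s3://bucket/file --to-remote -r my_other_remote    # flag plus separate -r
dvc add s3://bucket/file --to-remote my_other_remote       # remote name as an optional argument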

@jorgeorpinel
Contributor

jorgeorpinel commented Jan 13, 2021

Hi! Re-discovered this from docs (and mentioned in demo). Some comments and questions:

but also for dvc import[-url]
import-url tracks the original location where we would ignore it when straight-to-remote... something like import-to-remote
it should still track the original location

@efiop Wouldn't #4527 be a requirement for applying this to imports?
Otherwise indeed, it doesn't seem to make sense to enable this for import[-url] (if the result is exactly the same as for dvc add).

Also, the import-to-remote only sounds correct for the import case, what if you don't want to save the original location ("just add" case)? Cc @isidentical Personally, I would focus on add only at first, but up to you.

straight-to-remote scenario... aws s3 cp s3://bucket/path/to/data data; dvc add data; dvc push

Again, wouldn't #4527 cover that somewhat? Maybe you could do import --no-exec with --store somehow to reach the same "straight-to-remote" outcome.

straight-to-cache scenario

This is only meaningful when you have a custom cache outside the DVC project, right? Does that have any special implementation implications? (Prob not.)

dvc transfer
dvc export-to-remote

@isidentical those names sound more like a disconnected utility, similar to dvc get (cc @shcheklein). If this is about tracking data in the process (just skipping placing it in the cache and/or workspace), then I agree (as implemented) that it makes more sense as a flag for dvc add. (No longer sure, because add would include push here, which may be too much...)

add --straight-to-remote --from-url ... too complicated

Agree on avoiding --from-url by implying --external in --to-remote, if I understand this correctly.

users find --external somehow and misuse it all the time

  • DVC gives a hint to try that, I think 🙂 BTW would we be able to give a hint about --to-remote when needed?

dvc add /external/file -o data would do what the user expects

  • Except that you can't check it out to the workspace, right? But def. there are several similar operations, and maybe we should reconsider them to see how they can be consolidated... It probably won't be obvious to most users which one is appropriate.

Maybe --no-pull or --transfer? Not finding the perfect name/ui for it so far

I think just add --to-remote is short and accurate enough. Another idea is to have a flag for each case: --no-cache or --skip-cache [straight to remote] AND --no/skip-local/workspace/wdir [straight to cache] (the former includes the latter).

We could potentially reduce both --to-remote and --remote

I definitely agree on letting --to-remote take an optional remote name as argument instead.

@jorgeorpinel
Contributor

So I answered some of my questions (scratched above) while reviewing the UI in #5198. I have one thing to add though, after realizing that add --to-remote includes push.

  1. Is this too much functionality in add? Maybe a standalone command for this is best after all... (dvc store/backup [--external]?)
  2. What would be the result if pushing fails (does dvc push complete the operation after fixing the remote, or do you have to try again with -f)?
  3. Is there any alternative workflow, e.g. import --no-commit first, and push separately (again, maybe via import: Allow pushing imported files to remote #4527)? I think it would be ideal to understand all the ways we deal with external data and try to consolidate them, because it's getting confusing 🙂

@jorgeorpinel
Contributor

Hey sorry for the delay.

The current behavior says they are not in cache, which is actually true. I guess the only thing we can do is somehow make it explicit? Like not in local cache?

I like the idea of a new status/message for this, yes. I was thinking more about the changed outs: grouping, as they're not "changed". Maybe transferred:? Idk if it complicates things too much.

@efiop
Contributor Author

efiop commented Feb 2, 2021

Support for remote status in regular dvc status is something that we've been talking about for a while (#5369). It will naturally support transferred data once we implement it. Def not part of this issue.

@efiop
Contributor Author

efiop commented Feb 2, 2021

@isidentical #5343 is the last part we are missing (not counting for docs) until we can close this ticket, right? Just checking.

@isidentical
Contributor

@isidentical #5343 is the last part we are missing (not counting for docs) until we can close this ticket, right? Just checking.

Yes

@jorgeorpinel
Contributor

jorgeorpinel commented Feb 2, 2021

Support for remote status in regular dvc status is something that we've been talking about... Def not part of this issue.

Sure, I'm not suggesting that. I'm saying that the "changed outs" message is misleading: no outputs have changed. But I guess that can happen in other circumstances too, so it's out of scope anyway. Still, do you think we should try to address this (separately)?

@efiop
Contributor Author

efiop commented Feb 2, 2021

@jorgeorpinel Sorry, didn't mean to dismiss it like that 🙁 That was just for the record, no bad intention.

Sure, we should reconsider the status output; it is quite obsolete. There don't seem to be good, simple action points there though; we'll need a general redesign of the output to address the multiple complaints we've collected over the years. Looks like it is better to create an epic to collect all the complaints instead of creating yet another ticket. Though we can always repurpose this one as an epic, no problem.

@efiop efiop closed this as completed Feb 3, 2021
@jorgeorpinel
Contributor

Howdy again! Something came up in https://discord.com/channels/485586884165107732/563406153334128681/806673366416883733 about this recently: what happens if you set up a remote, then set it as the external cache, e.g. with dvc config cache.local (or any other kind), and finally add/import something straight --to-remote? I should try it before asking, but I don't have time this instant. Seems like an unintended confusion, or maybe even a potentially problematic circular situation.
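
For concreteness, a sketch of the setup I mean (names and paths are hypothetical):

dvc remote add nas /mnt/nas/dvc-storage    # a "local remote"...
dvc config cache.local nas                 # ...also used as the external cache
dvc add s3://bucket/data --to-remote -r nas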

@efiop
Contributor Author

efiop commented Feb 3, 2021

@jorgeorpinel Nothing, it all gets transferred to the remote directly. So those options don't affect --to-remote at all.

@jorgeorpinel
Contributor

jorgeorpinel commented Feb 4, 2021

OK, it's just confusing because the result is that the data gets transferred to the remote, which happens to be your (external) cache, so the data gets "cached" (even though you told DVC not to) but without calculating hashes. I wonder if dvc pull then skips the download (the data is already "in cache") and no hashes are ever calculated, or something like that.

I guess you can just use dvc checkout? Sorry, I'm just thinking out loud. I should QA this properly... The point is that there may be unintended side effects to keep in mind.

@efiop
Contributor Author

efiop commented Feb 4, 2021

@jorgeorpinel With dvc pull you'd need to download it from the remote first.

@alealv

alealv commented Feb 4, 2021

Hi folks,

I was the one discussing with @jorgeorpinel on the Discord channel. I used DVC in the past, so I have some experience with it.
On this new project, I face the straight-to-cache issue mentioned above.

We use a server to train ML models with many GPUs, hence I configured DVC as a Shared Development Server. The project/workspace is on a small SSD and the DVC cache is on a big NAS HDD.

The problem is that the pipeline I created to process the raw dataset (~3TB) and generate the training data (~6TB) is too much for the SSD, and I need/want to have it inside the project folder. The solution is to have a symlink pointing to the data on the NAS.

Tracking data directly

Before using a DVC pipeline, I tracked the data with a data.dvc file and achieved what I intended in the following way:

dvc config cache.type hardlink,symlink
dvc add /mnt/nas/data
dvc move /mnt/nas/data ~/project/data

I guess the straight-to-cache solution you were proposing would simplify the process.


The rest isn't related to this issue so I hid it.

Using a pipeline

But now I have created a pipeline which downloads (the raw data comes from public datasets) and processes the data. In this scenario I have a dvc.yaml file similar to this one (I don't include the original because it has 158 lines):

stages:
  download-data:
    cmd: wget https://<some-repo> -P /mnt/nas/raw_data
    outs:
      - /mnt/nas/raw_data:
          cache: false
          persist: true

  proc-data:
    cmd: ./proc_data.sh --input /mnt/nas/raw_data --output /mnt/nas/waves
    outs:
      - /mnt/nas/waves

This works fine, but it tracks the /mnt/nas/waves directory, which is outside the project. This brings some problems:

  1. Given that I'm on a Shared Development Server, different users will not be able to be on different commits at the same time.
  2. The training scripts expect the data in a specific folder inside the project. I can solve this by creating a symlink by hand, but the data will still be tracked outside the repository.

If I do the following:

...
  proc-data:
    cmd: ./proc_data.sh --input /mnt/nas/raw_data --output waves
    outs:
      - waves

It will track the data where I intended, with a symlink waves -> /mnt/nas/dvc-cache/0f/<hash>.dir, which is what I want. But not before generating all the wave files on the SSD and then transferring them to the external cache.

That's why the straight-to-cache option would be optimal. But as I understand it, you are considering this for some commands but not for pipelines. Am I wrong?

Having a local remote as cache

I also asked @jorgeorpinel about this thread, and about configuring the cache as a remote, because the documentation says:

cache.local - name of a local remote to use as a custom cache directory. (Refer to dvc remote for more information on "local remotes".) This will overwrite the value provided to dvc config cache.dir or dvc cache dir.
https://dvc.org/doc/command-reference/config

I guess with that and this feature it would behave as I want.

To be honest, the difference between a local remote and the cache confuses me a little. AFAIK, if I have a local remote and a cache on the same disk but in different folders, I would have two copies of the data. Am I right? I haven't considered merging these two, or what the implications of that would be.

Sorry for the long explanation, but I wanted to be very clear.

@jorgeorpinel
Contributor

Hey Ale, nice chatting with you, and thanks for posting here instead, for visibility. My comments:

Before using the DVC pipeline
dvc add /mnt/nas/data
dvc move /mnt/nas/data ~/project/data

I think you meant dvc add --external there. BTW that's an interesting use of add + move! 👍

I guess the straight-to-cache solution you were proposing would simplify the process

I think that import-url by itself (no need for --to-cache) already achieves the same result: it copies the data to the (external) cache first, then links it locally (assuming file links are configured in DVC and supported by the FS).
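
For concreteness, a sketch of that flow (hypothetical paths, assuming an external cache with links configured):

dvc config cache.dir /mnt/nas/dvc-cache
dvc config cache.type symlink
dvc import-url /mnt/nas/data data   # copies into the external cache, then links ./data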

Or maybe you meant setting up a remote with the same path as your external cache (error-prone) and then using add --to-remote. As you can see, you would still need 2+ steps, and things get a little confusing. So not sure... It's def. another interesting command interaction! But probably not the intended use here (hacky).

That's why the straight-to-cache option would be optimal. But as I understand it, you are considering this for some commands but not for pipelines. Am I wrong?

Correct, this issue doesn't affect pipelines at all. But I guess it's a good idea to consider implementing to-cache/to-remote for dvc.yaml outs. Thoughts, @efiop @isidentical?


The rest isn't related to this issue so I hid it.

now, I created a pipeline
tracking the /mnt/nas/waves directory, which is outside the project brings some problems...

  1. Actually, there should be no problem for users to check out different versions from the same cache. That's what the shared cache pattern is for 🙂 (The only issue could be file system errors if users try to write the same file at once, which is unlikely.)
  2. Is there absolutely no way for your scripts to read from /mnt/nas/waves instead? Otherwise, my recs are to add a stage, or an extra command in the training stage (cmd: can contain a list) that creates the link for now.

proc_data.sh --input /mnt/nas/raw_data --output waves will track the data where I intended, but not before generating all the wave files on the SSD

Yes, DVC doesn't change your code and can't capture/redirect file system write operations, at least for now (we accept feature requests).

the documentation says:
cache.local - name of a local remote... this confuses me a little

Yeah that doc needs an update, thanks for the heads-up! I'm starting to review it in iterative/dvc.org/pull/2154

if I have a local remote and a cache on the same disk but different folders, I would have a double copy of the data, Am I right?

You are correct.

merging these two, and what are the implications of that

Not a good idea 🙂 — the cache could potentially get corrupted, and it's just confusing! We do sin a little in that we kind of do just that (merge the remote and cache concepts) for external outputs, which is part of the reason why we don't really recommend that anymore.

@alealv

alealv commented Feb 5, 2021

I think you meant dvc add --external there. BTW that's an interesting use of add + move!

Yes, my bad. I forgot the --external.
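
For the record, the corrected sequence (a sketch based on the discussion above):

dvc config cache.type hardlink,symlink
dvc add --external /mnt/nas/data
dvc move /mnt/nas/data ~/project/data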

Actually, there should be no problem for users to check out different versions from the same cache. That's what the shared cache pattern is for 🙂 (The only issue could be file system errors if users try to write the same file at once, which is unlikely.)

I think there will be, because when checking out, the folder being tracked (aka /mnt/nas/data) will change, and it is shared across all users.

Thanks for solving my other doubts.

I could modify the training script, but it's not a good idea. What I will do instead is manually symlink the data folder inside the project.
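
A one-line sketch of that manual workaround (paths as in my pipeline example above):

ln -s /mnt/nas/waves ~/project/waves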

@jorgeorpinel
Contributor

when checking out, the folder being tracked (aka /mnt/nas/data) will change

dvc checkout doesn't alter the cache at all. It reads .dvc files and dvc.yaml/lock files and creates the appropriate links FROM the cache into your workspace 🙂

@alealv

alealv commented Feb 7, 2021

dvc checkout doesn't alter the cache at all. It reads .dvc files and dvc.yaml/lock files and creates the appropriate links FROM the cache into your workspace 🙂

Exactly, I wasn't referring to the cache. /mnt/nas/data will be changed, and it is shared across all users. If I'm training and somebody checks out a different commit, it will corrupt my training.

@jorgeorpinel
Contributor

Ah, I see what you mean. Yes, the external data itself should not be shared by several projects! External outputs are considered part of the extended workspace, and project workspaces can't overlap, naturally. That's another reason why we don't recommend external outputs except for very specific uses where absolutely no other option is available.

I guess https://dvc.org/doc/user-guide/managing-external-data shouldn't link to https://dvc.org/doc/use-cases/shared-development-server#configure-the-external-shared-cache for instructions on configuring an external cache, as shared external caches are not compatible with external outputs. Moved this to iterative/dvc.org#654 (comment).

@shcheklein
Member

@alealv there is a workaround for this problem: https://github.com/PeterFogh/dvc_dask_use_case. You could use the remote://something notation, and ask each person to have their own --local, --global, or --system config for that remote, defining the exact location personally for them.
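
A minimal sketch of that pattern (names and paths are hypothetical):

dvc remote add nas /mnt/shared/nas                     # default definition, committed in .dvc/config
dvc remote modify --local nas url /mnt/my_nas_mount    # per-user override, kept out of git
# stages/outs can then reference remote://nas/<path>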
