guide: cleanup some md links (#2534)
* ref: -c option typos

* start: typo
per #2507 (comment)

* test: md link style (1)

* guide: refactor md links in external data page

* start: undo typo fix

* ref: md ref link

* api: use ref links
jorgeorpinel authored Jul 2, 2021
1 parent cb14045 commit dfdf445
Showing 3 changed files with 39 additions and 29 deletions.
36 changes: 21 additions & 15 deletions content/docs/api-reference/get_url.md
@@ -31,22 +31,26 @@ specified by its `path` in a `repo` (<abbr>DVC project</abbr>), is stored.
The URL is formed by reading the project's
[remote configuration](/doc/command-reference/config#remote) and the `dvc.yaml`
or `.dvc` file where the given `path` is found (`outs` field). The schema of the
URL returned depends on the
[type](/doc/command-reference/remote/add#supported-storage-types) of the
`remote` used (see the [Parameters](#parameters) section).
URL returned depends on the [type][storage-types] of the `remote` used (see the
[Parameters](#parameters) section).

If the target is a directory, the returned URL will end in `.dir`. Refer to
[Structure of cache directory](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
and `dvc add` to learn more about how DVC handles data directories.
[Structure of cache directory] and `dvc add` to learn more about how DVC handles
data directories.

⚠️ This function does not check for the actual existence of the file or
directory in the remote storage.

💡 Having the resource's URL, it should be possible to download it directly with
an appropriate library, such as
[`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.download_fileobj)
or
[`paramiko`](https://docs.paramiko.org/en/stable/api/sftp.html#paramiko.sftp_client.SFTPClient.get).
an appropriate library, such as [`boto3`] or [`paramiko`].

[storage-types]: /doc/command-reference/remote/add#supported-storage-types
[structure of cache directory]:
/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory
[`boto3`]:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.download_fileobj
[`paramiko`]:
https://docs.paramiko.org/en/stable/api/sftp.html#paramiko.sftp_client.SFTPClient.get

## Parameters

@@ -88,21 +92,23 @@ The script above prints
`https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355`

This URL represents the location where the data is stored, and is built by
reading the corresponding `.dvc` file
([`get-started/data.xml.dvc`](https://github.com/iterative/dataset-registry/blob/master/get-started/data.xml.dvc))
where the `md5` file hash is stored,
reading the corresponding `.dvc` file ([`get-started/data.xml.dvc`]) where the
`md5` file hash is stored,

```yaml
outs:
- md5: a304afb96060aad90176268345e10355
path: get-started/data.xml
```

and the project configuration
([`.dvc/config`](https://github.com/iterative/dataset-registry/blob/master/.dvc/config))
where the remote URL is saved:
and the project configuration ([`.dvc/config`]) where the remote URL is saved:

```ini
['remote "storage"']
url = https://remote.dvc.org/dataset-registry
```

[`.dvc/config`]:
https://github.com/iterative/dataset-registry/blob/master/.dvc/config
[`get-started/data.xml.dvc`]:
https://github.com/iterative/dataset-registry/blob/master/get-started/data.xml.dvc
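
For illustration only (not part of this commit): the example URL above appears to be the remote `url` from `.dvc/config` joined with the `md5` hash, with the first two characters of the hash used as a directory name. A minimal Python sketch of that composition follows. The helper name `cache_url` is hypothetical, and the two-character split is an assumption inferred from the values shown in this file. It may not hold for other remote types or DVC versions.

```python
def cache_url(remote_url: str, md5: str) -> str:
    """Join a remote URL and an md5 hash the way the example URL is laid out.

    Hypothetical helper: the layout is inferred from the example values,
    not taken from DVC's implementation.
    """
    # First two hex characters form a directory, the rest is the file name.
    return f"{remote_url.rstrip('/')}/{md5[:2]}/{md5[2:]}"


# Values copied from the `.dvc` file and `.dvc/config` excerpts above.
url = cache_url(
    "https://remote.dvc.org/dataset-registry",
    "a304afb96060aad90176268345e10355",
)
print(url)
```

With those inputs the sketch reproduces the URL printed by the script in the example. In practice `dvc.api.get_url()` should be preferred, since it reads the project's actual configuration.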
3 changes: 3 additions & 0 deletions content/docs/command-reference/destroy.md
@@ -23,6 +23,9 @@ set to an
in your project, DVC will replace them with the latest versions of the actual
files and directories first, so that your data is intact after destruction.

[external cache]:
/doc/use-cases/shared-development-server#configure-the-external-shared-cache

> Refer to [Project Structure](/doc/user-guide/project-structure) for more
> details on the directories and files deleted by this command.
29 changes: 15 additions & 14 deletions content/docs/user-guide/managing-external-data.md
@@ -2,12 +2,13 @@

> ⚠️ This is an advanced feature for very specific situations and not
> recommended except if there's absolutely no other alternative. In most cases
> alternatives like the
> [to-cache](/doc/command-reference/add#example-transfer-to-the-cache) or
> [to-remote](/doc/command-reference/add#example-transfer-to-remote-storage)
> strategies of `dvc add` and `dvc import-url` are more convenient. **Note**
> that external outputs are not pushed or pulled from/to
> [remote storage](/doc/command-reference/remote).
> alternatives like the [to-cache] or [to-remote] strategies of `dvc add` and
> `dvc import-url` are more convenient. **Note** that external outputs are not
> pushed or pulled from/to [remote storage].
[to-cache]: /doc/command-reference/add#example-transfer-to-the-cache
[to-remote]: /doc/command-reference/add#example-transfer-to-remote-storage
[remote storage]: /doc/command-reference/remote

There are cases when data is so large, or its processing is organized in such a
way, that its impossible to handle it in the local machine disk. For example
@@ -39,16 +40,17 @@ their remote URLs or external paths to `dvc add`, or put them in `dvc.yaml`
> external cache, because it may cause data collisions: the hash of an external
> output could collide with that of a local file with different content.
> Note that [remote storage](/doc/command-reference/remote) is a different
> feature.
> Note that [remote storage] is a different feature.
## Setting up an external cache

DVC requires that the project's <abbr>cache</abbr> is configured in the same
external location as the data that will be tracked (external outputs). This
avoids transferring files to the local environment and enables
[file linking](/doc/user-guide/large-dataset-optimization) within the external
storage.
avoids transferring files to the local environment and enables [file links]
within the external storage.

[file links]:
/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache

As an example, let's create a directory external to the workspace and set it up
as cache:
@@ -183,9 +185,8 @@ custom cache location for local paths outside of your project.

> Except for external data on different storage devices or partitions mounted on
> the same file system (e.g. `/mnt/raid/data`). In that case please setup an
> external cache in that same drive to enable
> [file links](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
> and avoid copying data.
> external cache in that same drive to enable [file links] and avoid copying
> data.
```dvc
$ dvc add --external /home/shared/existing-data