IPIP-0445: Option to Skip Raw Blocks in Gateway Responses #445

Draft
wants to merge 4 commits into
base: main
36 changes: 32 additions & 4 deletions src/http-gateways/trustless-gateway.md
@@ -10,9 +10,21 @@ editors:
- name: Marcin Rataj
github: lidel
url: https://lidel.org/
affiliation:
name: Protocol Labs
url: https://protocol.ai/
- name: Henrique Dias
github: hacdias
url: https://hacdias.com/
affiliation:
name: Protocol Labs
url: https://protocol.ai/
- name: Hugo Valtier
github: Jorropo
url: https://jorropo.net/
affiliation:
name: Protocol Labs
url: https://protocol.ai/
xref:
- url
- path-gateway
@@ -183,6 +195,22 @@ returned:
returned to the client, the HTTP status code has already been sent to the
client.

### :dfn[skip-raw-blocks] (request query parameter)

The optional `skip-raw-blocks` parameter is available only for CAR requests.

It specifies whether blocks with the multicodec `raw` (`0x55`) MUST be present in
the CAR response.

It accepts two values:
- `y`: Blocks with `raw` multicodec MUST NOT be returned.
- `n`, or missing (unspecified): no-op, no special handling of `raw` blocks.

When not specified, a gateway implementation MUST assume `n`.

A Gateway MUST return HTTP error 400 Bad Request when `skip-raw-blocks=y` is
sent for a content path with a root CID with the `raw` multicodec.
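
The request-side rules above can be sketched as follows. This is a minimal illustration using only Python's standard library, not a reference implementation. `check_skip_raw_blocks`, `cid_codec` and `read_varint` are hypothetical helper names, the CID decoder handles only CIDv0 and base32 CIDv1 strings, and answering 400 for values other than `y` or `n` is an assumption, since the text above does not say how to treat them.

```python
import base64
from urllib.parse import parse_qs

RAW_CODEC = 0x55  # multicodec code for `raw`
DAG_PB_CODEC = 0x70  # multicodec code for `dag-pb`


def read_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned varint; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, offset
        shift += 7


def cid_codec(cid: str) -> int:
    """Return the multicodec of a CID string (CIDv0 or base32 CIDv1 only)."""
    if len(cid) == 46 and cid.startswith("Qm"):
        return DAG_PB_CODEC  # CIDv0 is always dag-pb
    if not cid.startswith("b"):
        raise ValueError("sketch only handles base32 (multibase 'b') CIDs")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that multibase omits
    data = base64.b32decode(body)
    version, offset = read_varint(data)
    if version != 1:
        raise ValueError("unsupported CID version")
    codec, _ = read_varint(data, offset)
    return codec


def check_skip_raw_blocks(root_cid: str, query: str) -> tuple[int, bool]:
    """Return (http_status, skip_raw) for a CAR request."""
    value = parse_qs(query).get("skip-raw-blocks", ["n"])[0]  # missing means `n`
    if value not in ("y", "n"):
        return 400, False  # assumption: not covered by the spec text
    skip_raw = value == "y"
    if skip_raw and cid_codec(root_cid) == RAW_CODEC:
        return 400, False  # raw root CID combined with skip-raw-blocks=y
    return 200, skip_raw
```

The codec check needs only the CID string from the content path, so the gateway can reject the request before fetching any block.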

# HTTP Response

Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway].
@@ -212,10 +240,10 @@ The Body hash MUST match the Multihash from the requested CID.

# CAR Responses (application/vnd.ipld.car)

A CAR stream for the requested
[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
content type (with optional `order` and `dups` params), path and optional
`dag-scope` and `entity-bytes` URL parameters.
A CAR stream ([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
with optional `order` and `dups` content type parameters) for the requested
content path (and optional `dag-scope`, `entity-bytes` and/or `skip-raw-blocks`
URL parameters).

## CAR version

176 changes: 176 additions & 0 deletions src/ipips/ipip-0445.md
@@ -0,0 +1,176 @@
---
title: "IPIP-0445: Option to Skip Raw Blocks in Gateway Responses"
date: 2023-10-09
ipip: open
editors:
- name: Hugo Valtier
github: Jorropo
url: https://jorropo.net/
affiliation:
name: Protocol Labs
url: https://protocol.ai/
- name: Marcin Rataj
github: lidel
url: https://lidel.org/
affiliation:
name: Protocol Labs
url: https://protocol.ai/
relatedIssues:
- https://github.com/ipfs/specs/issues/444
order: 445
tags: ['ipips']
---

## Summary

Introduce the `skip-raw-blocks` flag for the :cite[trustless-gateway].

## Motivation

Allow clients to read a stream that contains only proofs for a bottom-heavy
graph that uses the `raw` codec for its leaves.

This is useful for UnixFS features like webseeds
([ipfs/specs#444](https://github.com/ipfs/specs/issues/444)), where metadata
about a DAG is fetched from a trustless gateway, but the actual raw data can be
fetched from any source that supports either the trustless gateway specification
or plain HTTP Range Requests, allowing for trustless and verifiable data
retrieval from plain HTTP (non-IPFS) data sources.

## Detailed design

The `skip-raw-blocks` URL query parameter on :cite[trustless-gateway]
allows clients to download an entity without the blocks that use the multicodec
`raw` (`0x55`).

- When set to `y`, the parameter instructs the gateway not to transmit
blocks referenced with a CID with the `raw` multicodec.
- If set to `n`, or left unspecified, there is no special handling of `raw`
multicodec blocks (the existing default behavior remains the same).

Importantly, unless explicitly specified as `y`, the default operational
mode of the gateway MUST assume the value of `skip-raw-blocks` to be `n`.
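
The behavior above can be sketched as a gateway-side traversal. This is a toy model under stated assumptions, not how any existing gateway is implemented: `Cid`, `Block`, `CountingStore` and `walk` are hypothetical names, the `Cid` class stands in for a real CID by exposing only its codec, and the walk does no deduplication.

```python
from dataclasses import dataclass
from typing import Iterator

RAW_CODEC = 0x55  # multicodec code for `raw`
DAG_PB_CODEC = 0x70  # multicodec code for `dag-pb`


@dataclass(frozen=True)
class Cid:
    """Toy stand-in for a CID: the codec is readable without fetching the block."""
    codec: int
    name: str


@dataclass
class Block:
    data: bytes
    links: tuple[Cid, ...] = ()  # child CIDs, in order


class CountingStore:
    """Hypothetical blockstore that records which CIDs were fetched."""

    def __init__(self, blocks: dict[Cid, Block]):
        self.blocks = blocks
        self.fetched: list[Cid] = []

    def get(self, cid: Cid) -> Block:
        self.fetched.append(cid)
        return self.blocks[cid]


def walk(store: CountingStore, root: Cid, skip_raw: bool) -> Iterator[tuple[Cid, bytes]]:
    """Depth-first walk yielding (cid, data) pairs for the CAR body."""
    stack = [root]
    while stack:
        cid = stack.pop()
        if skip_raw and cid.codec == RAW_CODEC:
            continue  # decided from the CID alone; the block is never fetched
        block = store.get(cid)
        yield cid, block.data
        stack.extend(reversed(block.links))  # keep left-to-right DFS order
```

With `skip_raw=False` the walk visits every block, which matches the existing default. With `skip_raw=True` only the non-raw blocks are fetched and emitted.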

## Design rationale

### User Benefit

Implementing the `skip-raw-blocks` parameter offers several benefits to users:

1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received
files in their deserialized form without necessitating the transmission of
raw blocks from the gateway.

2. **Incremental Download:** Clients can incrementally download files in
deserialized forms from non-IPFS servers. This allows applications to share
distribution between IPFS and non-IPFS clients.

3. **Efficient Block Discovery:** With the `skip-raw-blocks` option enabled,
clients can quickly discover numerous candidate blocks without being
bottlenecked by the gateway's transmission of raw blocks.

4. **Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed
over HTTP in deserialized form can now act as a source for specific block
byte ranges, without having to support any IPFS-specific APIs. Plain HTTP
Range Requests can be used for fetching the remaining raw block data, and the
metadata read via `skip-raw-blocks=y` is enough for a client to verify that the
remaining raw block byte ranges fetched from non-IPFS systems match the expected
CIDs.
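
The client-side check described in the list above can be sketched as follows. This is only an illustration: `Leaf`, `range_header` and `verify_range` are hypothetical names, the sketch assumes the raw leaf CIDs use sha2-256 multihashes, and it assumes the client has already mapped each raw leaf to a contiguous byte range of the deserialized file using the metadata blocks.

```python
import hashlib
from typing import NamedTuple


class Leaf(NamedTuple):
    """One raw leaf, as learned from the metadata-only CAR response."""
    offset: int    # position of the leaf's bytes in the deserialized file
    length: int    # size of the raw block in bytes
    sha256: bytes  # digest taken from the multihash of the leaf's CID


def range_header(leaf: Leaf) -> str:
    """Value of the HTTP Range header for this leaf (the end offset is inclusive)."""
    return f"bytes={leaf.offset}-{leaf.offset + leaf.length - 1}"


def verify_range(leaf: Leaf, payload: bytes) -> bool:
    """Check bytes fetched from a plain HTTP server against the expected digest."""
    if len(payload) != leaf.length:
        return False
    return hashlib.sha256(payload).digest() == leaf.sha256
```

A client would send `range_header(leaf)` to the non-IPFS mirror and accept the response body only when `verify_range` returns `True`.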
Comment on lines +72 to +78
Contributor Author

I was trying to hint at that in 1 Verification Flexibility, this text goes in much more detail and I think should be merged with the first entry.


### Compatibility

Setting the default value of the `skip-raw-blocks` parameter to `n` ensures
backward compatibility with existing clients and systems that are unaware
of this new flag.

### Alternatives
Member

ℹ️ updated this section with paths not taken, lmk if anything requires more elaboration.


An alternative approach would be to request blocks individually.
However, that adds extra round trips and more per-request HTTP overhead,
and is thus undesirable.

#### Why not `dag-scope=skip-raw-blocks` ?

The existing `dag-scope` parameter determines the overall range of blocks to retrieve,
while `skip-raw-blocks` selectively filters specific blocks across all scopes and ranges.
Combining them under one parameter would restrict their combined utility.

For example:
- A client is streaming a video from a webseed and the user seeks through the
video, then the client would send `dag-scope=entity&entity-bytes=42:1337`
with `skip-raw-blocks=y` to download the proofs for the required section of the
video, and then fetches remaining raw data byte ranges from a faster CDN.
- A client is verifying an OOB transferred directory in deserialized form,
then `dag-scope=all` with `skip-raw-blocks=y` makes sense.
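
The two requests in the examples above can be sketched as follows. `car_request` is a hypothetical helper and `https://gateway.example` is a placeholder host. The `/ipfs/{cid}` path and the `application/vnd.ipld.car` content type come from the specification text.

```python
from typing import Optional
from urllib.parse import urlencode


def car_request(
    gateway: str,
    cid: str,
    *,
    dag_scope: Optional[str] = None,
    entity_bytes: Optional[str] = None,
    skip_raw_blocks: bool = False,
) -> tuple[str, dict[str, str]]:
    """Build the URL and headers for a trustless gateway CAR request."""
    query: dict[str, str] = {}
    if dag_scope:
        query["dag-scope"] = dag_scope
    if entity_bytes:
        query["entity-bytes"] = entity_bytes
    if skip_raw_blocks:
        query["skip-raw-blocks"] = "y"  # omitting the parameter means `n`
    url = f"{gateway}/ipfs/{cid}"
    if query:
        url += "?" + urlencode(query, safe=":")  # keep `42:1337` readable
    return url, {"Accept": "application/vnd.ipld.car"}


# Proofs for the byte range a video player is seeking to:
seek_url, headers = car_request(
    "https://gateway.example", "bafy-example",
    dag_scope="entity", entity_bytes="42:1337", skip_raw_blocks=True,
)

# Proofs for verifying a whole directory received out of band:
dir_url, _ = car_request(
    "https://gateway.example", "bafy-example",
    dag_scope="all", skip_raw_blocks=True,
)
```

Because every option lands in the URL query, each combination yields a distinct URL, so the two parameters can be combined freely.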

#### Why not CAR content type parameter ?

The optional parameters of the CAR content type
([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)),
such as `order` and `dups`, affect how data is represented when returned as a
CAR stream, but do not modify the scope of the data itself: they neither add
data to the response nor remove data from it.

The scope of the data is controlled by URL content path and optional
`dag-scope`, `entity-bytes` URL parameters. This is where `skip-raw-blocks`
belongs.

This is not just a matter of aesthetics: the URL path and query parameters
allow for caching of different subsets of a DAG in a way that is interoperable
with existing HTTP tools and clients, and minimize the risk of caching an
incomplete DAG response due to HTTP cache misconfiguration. Because
`skip-raw-blocks` is in the URL query, CAR responses without `raw` blocks will be
cached under a different key than full responses (just like the already existing
`dag-scope` and `entity-bytes`).

#### Why not generic `skip-leaves` that skips all leaves, not just `raw` blocks?

Prevention of amplification attacks and efficient server operation.

Because the `raw` (`0x55`) codec is part of the CID, servers can trivially
determine whether to fetch or skip a block without having to fetch it first.

If we framed this feature around skipping all leaf nodes, that would require
the server to fetch the leaves to learn whether they have any child nodes. This
would force the server to fetch data that is never returned to the client.

Although `skip-raw-blocks` is more limited and not able to handle UnixFS files
chunked without the `--raw-leaves` option, it allows both the client and server to
trivially verify that a block must not be fetched. This prevents amplification
issues, where a server might need to fetch orders of magnitude more data than
it returns to the client when executing the request.

## Security

This IPIP does not impact the security model of the trustless gateway.

## Test fixtures

:::issue

TODO: update below section with CIDs or CARs from conformance tests

Scenarios we should check:
- [ ] request for `/ipfs/cid` with `skip-raw-blocks=y` where the CID has the `raw` codec MUST return HTTP 400 (Bad Request)
- [ ] reuse existing UnixFS DAG that has raw-leaves, request it with
`skip-raw-blocks=n`, confirm the response includes expected raw leaves' CIDs
- [ ] create a new CAR fixture that only has non-raw blocks. Request it with
`skip-raw-blocks=y`, confirm the response includes expected CIDs and does not
include raw blocks referenced by parents.
- the important part is creating the CAR fixture by hand and ensuring the raw
blocks are NEVER announced anywhere: generate a fixture with random data, add
it to ipfs with the raw-leaves option, then export the DAG without `raw` blocks
(use go-car's [`filter`](https://github.com/ipld/go-car/tree/master/cmd/car#readme)
or similar)
- Why? This goes the extra mile, but ensures every conformant gateway
implementation is not doing the useless work of fetching raw blocks which are
not required for fulfilling `skip-raw-blocks=y` requests. We did a
similar thing for `entity-bytes` and it was the only way we could show
bugs in Saturn project's cache implementation at the time.

:::
Comment on lines +150 to +172
Member

@Jorropo this is the minimal set of tests I've identified for this IPIP, lmk if you think it is sufficient, or if we need more.

The way we did this in the past, was to update test fixtures section at the very end:

  1. create reference implementation in Boxo & Kubo PRs
  2. then create a gateway-conformance PR with relevant fixtures that pass against branches from (1)
  3. after reference implementation and tests are merged in respective repos, we update IPIP with final CID or link to a CAR fixture (examples: 428, 412)


### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).