RFC: Link encoding in IPLD #70

mikeal · 2018-08-28T21:14:23Z

This is a bit different than what we initially discussed in ipld/ipld#44

After implementing dag-json I felt comfortable enough writing up a solid set of recommendations for codec implementations.

I think this strikes the right balance of flexibility and interoperability. It avoids restricting a developer's ability to use language and encoding features but still requires enough support for a canonical serialization that we can trans-encode nodes between codecs.

Stebalien

I'm really not happy baking this into IPLD. {'/': ...} was a hack to get JSON working.

This will need some strong arguments/motivations.

Stebalien · 2018-08-29T03:16:27Z

Links.md

+--------------------+             +---------------------+
+```
+
+A codec may represent object types and tree structures any way it wishes.


"s/codec/format"?

Stebalien · 2018-08-29T03:17:04Z

Links.md

+etc) or even new custom serializations. We will refer to this as the
+**representation**.
+
+Therefor, a **format** is the standardized representation of IPLD Links and Paths in a given **representation**.


Currently, the "format" describes how to translate between structured data and binary.

Stebalien · 2018-08-29T03:22:09Z

Links.md

+# Canonical Link Representation
+
+Codec **serializers** MUST reserve the following canonical
+representation of link encoding. The canonical representation is an object with a single key of `"/"` and a base encoded string of the link's CID.


We originally said that no objects can have slashes in keys (keys must be valid path components) but backed off when we realized that wasn't going to work. At this point, I'm not sure if we can introduce a restriction like this. CBOR objects definitely can have a single "/" key.

We really do need to sit down and think through what can and can't go into an IPLD object because I think we're getting closer and closer to "everything goes". That might be fine but we need to address it explicitly...

Also note, this was never intended to be the canonical representation. It was a hack to get JSON working.

I was somewhat aware of the history. But I think that we do need some form of canonical representation that can be represented in pure JSON in order to open a path for people to encode objects from one format to another.

However, I don't like the default use of the canonical representation in the deserializer, and I am actively trying to change it in dag-cbor. It's a horrible pain to work with and, while I want to reserve it for interop, I don't want it to be in common use but instead buried in the implementations.

We originally said that no objects can have slashes in keys (keys must be valid path components) but backed off when we realized that wasn't going to work

It is worth noting that CIDs encoded in base64 will have slashes in them. I am wondering whether it was really a good idea to allow that encoding; it will mess up paths like /ipfs/<base 64 cid>/file.txt.
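
The point about base64 can be checked with a few lines of Node.js. This is only a sketch: the three bytes below are an arbitrary example, not a real CID, and the path is built purely for illustration.

```javascript
// Three 0xff bytes map onto the last entry of the base64 alphabet, which is "/".
const bytes = Buffer.from([0xff, 0xff, 0xff])

const standard = bytes.toString('base64')    // standard alphabet
const urlSafe = bytes.toString('base64url')  // URL-safe alphabet ("-" and "_")

console.log(standard) // "////"
console.log(urlSafe)  // "____"

// Once the encoded value contains "/", splitting the path on "/" loses it.
const path = `/ipfs/${standard}/file.txt`
const components = path.split('/').filter(Boolean)
console.log(components) // [ 'ipfs', 'file.txt' ]
```

The URL-safe alphabet avoids the problem for this input, which is one reason a path-friendly encoding matters here.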

Stebalien · 2018-08-29T03:40:45Z

So, it feels like this is trying to work around the fact that js-ipld has a single dag.put(anythingGoes) function. That is a nice function but I wonder if there's a better solution.

In go, we have typed nodes. dag.put(...) takes typed node (with a codec attached, etc.) and moves on from there. JavaScript could also do that. That is, one could say that untyped nodes are assumed to be raw objects. If the node has bytes() and cid() methods (on the prototype), the dag will use those instead (there may be more "javascripty" ways to do this).
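
A minimal sketch of that dispatch idea in JavaScript, under stated assumptions: `prepareForPut`, `ExampleNode`, and `encodeRaw` are hypothetical names made up for illustration, not part of js-ipld, and no real CID is computed.

```javascript
// Hypothetical dispatch for a dag.put()-style entry point.
// encodeRaw is an assumed fallback encoder for untyped values.
function prepareForPut (node, encodeRaw) {
  const isTyped = node !== null && typeof node === 'object' &&
    typeof node.bytes === 'function' && typeof node.cid === 'function'
  if (isTyped) {
    // Typed node: trust the node's own serialization and identifier.
    return { kind: 'typed', bytes: node.bytes(), cid: node.cid() }
  }
  // Untyped value: treat it as a raw object; no CID is computed in this sketch.
  return { kind: 'raw', bytes: encodeRaw(node), cid: null }
}

// Hypothetical typed node with both methods on the prototype.
class ExampleNode {
  constructor (data, cid) {
    this.data = data
    this._cid = cid
  }

  bytes () { return Buffer.from(JSON.stringify(this.data)) }
  cid () { return this._cid }
}

const encodeRaw = (value) => Buffer.from(JSON.stringify(value))
console.log(prepareForPut(new ExampleNode({ a: 1 }, 'QmId'), encodeRaw).kind) // "typed"
console.log(prepareForPut({ a: 1 }, encodeRaw).kind) // "raw"
```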

mikeal · 2018-08-29T17:39:23Z

So, it feels like this is trying to work around the fact that js-ipld has a single dag.put(anythingGoes) function. That is a nice function but I wonder if there's a better solution.

I'm not actually thinking much about the dag.put() function right now. Most of what I've been doing lately is writing graphs with a whole bunch of nodes in memory and then dumping them all out to the block store.

What I'm mostly thinking about is how to define interop between implementations, specifically dag-json and dag-cbor, and how we can create better APIs for creating and working with nodes.

In go, we have typed nodes. dag.put(...) takes typed node (with a codec attached, etc.) and moves on from there. JavaScript could also do that. That is, one could say that untyped nodes are assumed to be raw objects. If the node has bytes() and cid() methods (on the prototype), the dag will use those instead (there may be more "javascripty" ways to do this).

What I've tried to do is leave the door totally open to people doing this kind of stuff in the serializer/deserializer. What I'm not comfortable with is defining the types that must be used in a particular language or serializer/deserializer. There are a whole lot of preferences and opinions I'd prefer to just not step on or potentially exclude.

I really don't like working with the {"/": cid-string} representation but I find that it does provide a very nice compatibility vector across implementations without having to use a custom object or type and force all the implementations to use it.

For instance, dag-json's deserializer uses CID instances for links but provides a stringify() function that returns stringified JSON using the canonical representation. It's nice to know that I can always do something along these lines:

let node = dagJSON.from(block || buffer)
let transcoded = dagSomeFormat.serialize(JSON.parse(dagJSON.stringify(node)))

That only works if we have some canonical representation each serializer/deserializer has reserved. If we want to just completely give up on that, we can, but we won't have a good way to transcode nodes.

mikeal · 2018-08-29T17:46:48Z

Pushed some fixes for the other comments.

I also removed the yaml example because I find that it just complicates the messaging. The purpose of the form reservation isn't for expressing in the DSL but for expression in code between codecs.

warpfork · 2018-08-30T18:34:46Z

and how we can create better APIs for creating and working with nodes.

FWIW, on that front: I've been playing with some fresh takes on go-ipld APIs in a little sandbox off to the side, and one experiment that might have merit turned up these ideas:

ipldcbor.Node implements an interface and has roughly what you'd expect.
ipldbind.Node implements that same interface, and works by binding to an existing Go type.
- it can be traversed for reading, like other nodes...
- attempts to mutate it via the Node API might work, or might be rejected: you won't be capable of putting an int into a field that's of string type on the struct{..} that's bound, for obvious reasons.
- and most interestingly... it doesn't have a serializable form. You'd have to convert it to another kind of Node which is serializable if you wanted to generate a CID for it.

Now, like I said, that's just in a little toy experiment somewhere, and I don't actually know if it's a good idea. But maybe it's interesting food for thought, as another example of how {the way we operate on the data} versus {the codec we use for the data and hashing it} can be distinct.

/2c, I'll go back to lurking now :)

mikeal · 2018-09-10T21:59:23Z

I don't think that @Stebalien and @diasdavid are in alignment about the future of the canonical JSON representation. @diasdavid could you please weigh in so that we can move forward?

daviddias

Overall I'm on board, but I would like to see some examples to ensure that we and our future selves are on the same page.

daviddias · 2018-09-22T22:00:58Z

Links.md

+implementation of `dag-json` includes a method called `stringify()` which
+returns a standard JSON string with links encoded in the canonical format.
+This makes trans-encoding of nodes into other formats much easier since
+they are required to accept the canonical format.


@mikeal can you add a few examples to this RFC that show how objects with links will be serialized and deserialized (and then again serialized and deserialized) by the dag-json and dag-cbor formats?

It will provide a ton of clarity to implementers and users about the expected behavior and about how dag-json differs from dag-cbor and from plain JavaScript objects (!== JSON).

mikeal · 2018-09-24T18:25:58Z

Ok, I think we're closer to alignment now, but after some recent conversations I'm thinking about re-naming/re-scoping this document.

Essentially, what we care about here is a JSON representation that can be used to convert between implementations. It's not just about links; we may want to reserve space for converting between other types in the future. To that end, I'd like to re-name to something like "Canonical JSON Representation" and also take a crack at standardizing a binary form, possibly something along the lines of {"/":{type: "binary", "base64": base64}}.
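
To make the shape of that possible binary form concrete, here is a rough sketch. Nothing here is standardized: the `type` and `base64` keys are taken from the tentative form above, and `encodeBinary` and `decodeBinary` are hypothetical helper names.

```javascript
// Wrap raw bytes in the tentative {"/": {type, base64}} form (an assumption,
// not an agreed format).
function encodeBinary (bytes) {
  return { '/': { type: 'binary', base64: Buffer.from(bytes).toString('base64') } }
}

// Recover the bytes, rejecting anything that does not match the tentative form.
function decodeBinary (value) {
  const inner = value !== null && typeof value === 'object' ? value['/'] : undefined
  if (!inner || inner.type !== 'binary' || typeof inner.base64 !== 'string') {
    throw new TypeError('not in the binary form')
  }
  return Buffer.from(inner.base64, 'base64')
}

const wrapped = encodeBinary(Buffer.from('hello'))
console.log(JSON.stringify(wrapped)) // {"/":{"type":"binary","base64":"aGVsbG8="}}
console.log(decodeBinary(wrapped).toString()) // "hello"
```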

Stebalien · 2018-09-26T01:23:26Z

let node = dagJSON.from(block || buffer)
let transcoded = dagSomeFormat.serialize(JSON.parse(dagJSON.stringify(node)))

The issue here is that you're using the normal JSON parser. A dagJSON deserializer should, IMO, turn the CIDs into a special link type that the dagSomeFormat serializer would understand.

One could write:

let serialized = dagJSON.stringify({
  "thing": new Cid("QmId"),
})
assert(dagJSON.parse(serialized)["thing"] instanceof Cid)
let transcoded = dagSomeFormat.serialize(dagJSON.parse(serialized))

(Cid could even have a toJSON method that converts it to {"/": ...})
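
A minimal sketch of that toJSON idea, with loud assumptions: the `Cid` class below is a hypothetical stand-in that only stores a string (a real CID class would parse and validate), and `reviveLinks` is a made-up name for a `JSON.parse` reviver.

```javascript
// Hypothetical stand-in for a CID class.
class Cid {
  constructor (str) { this.str = str }

  // JSON.stringify calls toJSON automatically, so links come out as {"/": ...}.
  toJSON () { return { '/': this.str } }
}

const serialized = JSON.stringify({ thing: new Cid('QmId') })
console.log(serialized) // {"thing":{"/":"QmId"}}

// Reviver that turns the reserved single-key form back into Cid instances.
function reviveLinks (key, value) {
  const isLink = value !== null && typeof value === 'object' &&
    !Array.isArray(value) &&
    Object.keys(value).length === 1 && typeof value['/'] === 'string'
  return isLink ? new Cid(value['/']) : value
}

const parsed = JSON.parse(serialized, reviveLinks)
console.log(parsed.thing instanceof Cid) // true
```

With a reviver like this, the plain JSON parser never hands the reserved form to calling code; it only ever sees link instances.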

That only works if we have some canonical representation each serializer/deserializer has reserved. If we want to just completely give up on that, we can, but we won't have a good way to transcode nodes.

I see. So you're not saying that the format necessarily needs to use this, just that if I hand {"/": Cid} to, e.g. the CBOR serializer, it should turn it into a normal CID?

This is really looking like a JavaScript UX issue, not an IPLD format issue. We do need a consistent way to represent IPLD objects in-memory in JavaScript, but that doesn't have to conform to DagJSON.

Stebalien · 2018-09-26T03:18:54Z

At the end of the day, my objection to this is in the motivation. If we had an "IPLD needs this" motivation and we couldn't find a reasonable alternative, I'd be fine with it (albeit really unhappy as it (the {"/": ...} syntax) really is an ugly hack). However, the current motivation is ~~"JavaScript wants this"~~ "JavaScript developers expect to work with JSON". That is, JavaScript developers expect the following workflow:

1. Fetch a JSON blob from some API endpoint.
2. Deserialize the JSON blob with JSON.parse.
3. Stick the JSON blob into some datastore/database.

The catch is that we're working with DagJSON, not JSON.

Brainstorming solutions:

1. Bare JavaScript objects use the DagJSON format. To use a "/" key, one would have to write something like: new Node({"/": "not a cid", "link": new Link("QmId...")}). The dag would have to detect if a node is a raw javascript object and handle it appropriately.
2. Move away from JSON. That is, come up with an IPLD textual format. This will probably just piss off users but it'll definitely get rid of the confusion.
3. Teach users to use DagJSON.parse(...). Kind of a footgun but possible.

Personally, I prefer option 1 but there are probably more we haven't considered.

FYI:

still requires enough support for a canonical serialization that we can trans-encode nodes between codecs.

In general, we still won't be able to transcode between formats until we get a type system. There was an endeavor to try to find a set of primitives to allow for this (see: #56) but this hit a dead-end (see the comment I just added). Basically, we agreed on a set of primitives and then realized that they wouldn't quite cut it, rinse, repeat, until we realized it just wasn't going to work. Unfortunately, without a concrete set of primitives, translating between formats isn't going to happen.

mikeal · 2018-09-26T16:28:20Z

The issue here is that you're using the normal JSON parser.

You're right, this is my mistake and I shouldn't have done this.

Rather, what this should be is something close to a standard toJSON() method, which does not return a string but instead returns a value encoded only into native types that can be encoded into JSON.

I see. So you're not saying that the format necessarily needs to use this, just that if I hand {"/": Cid} to, e.g. the CBOR serializer, it should turn it into a normal CID?

Exactly. How the codec decides to encode links is completely at the codec's discretion. The codec is also free to take any object it could interpret as a Link and encode it into the Link format it chooses. All we're asking is that, if any codec serializer sees this representation, {"/": String}, it should also interpret it as a Link and encode it into the standard internal format that serializer is using for Links.

To recap:

- We are not dictating a codec's internal representation.
- We are not limiting the forms a codec serializer interprets as a Link.
- We are not dictating the form a codec deserializer uses for Links.
- We are asking that codec serializers interpret a particular expression of a Link in simple types, {"/": String}, as a Link.

However, because we are reserving the interpretation of this form in the serializer it will necessarily make it impossible to use the same form to represent something that is not a Link.
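
A rough sketch of what reserving the form in a serializer could look like, under stated assumptions: `Link`, `isReservedLinkForm`, and `normalizeLinks` are hypothetical names, and the walk only handles plain JSON-like values (objects, arrays, primitives).

```javascript
// Hypothetical internal link type a serializer might use.
class Link {
  constructor (cid) { this.cid = cid }
}

// True only for an object with exactly one key, "/", holding a string.
function isReservedLinkForm (value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) return false
  const keys = Object.keys(value)
  return keys.length === 1 && keys[0] === '/' && typeof value['/'] === 'string'
}

// Step a serializer could run before encoding: accept its own Link type
// and the reserved simple-types form, and leave everything else alone.
function normalizeLinks (value) {
  if (value instanceof Link) return value
  if (isReservedLinkForm(value)) return new Link(value['/'])
  if (Array.isArray(value)) return value.map(normalizeLinks)
  if (value !== null && typeof value === 'object') {
    const out = {}
    for (const [key, child] of Object.entries(value)) out[key] = normalizeLinks(child)
    return out
  }
  return value
}

const normalized = normalizeLinks({
  name: 'example',
  child: { '/': 'QmId' },
  list: [{ '/': 'QmOther' }, 1]
})
console.log(normalized.child instanceof Link) // true
console.log(normalized.list[0] instanceof Link) // true

// The cost of the reservation: this object can no longer mean anything else.
console.log(normalizeLinks({ '/': 'not a cid' }) instanceof Link) // true
```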

However, the current motivation is "JavaScript wants this" "JavaScript developers expect to work with JSON".

Again, my apologies for relying on the standard parser in my example.

I think that this use case, transcoding nodes from one codec to another, is a broader need than just in JS.

The closest thing we have to a cross-language basic type system is JSON. Every language supports JSON and has a way to represent JSON types as types native in that language and encode those same types back into JSON. In a way, this isn't actually a "canonical JSON representation" it's a "canonical simple types representation." We're saying, "the language you write a serializer in will support these basic types, please interpret this encoding of links in those simple types as a link."

If we said that "IPLD Types: Level 0" is just the types that are in JSON, we would describe this with language along the lines of "this is how you describe a Link in strictly L0 types."

I hope that clears things up. This conversation makes it clear that this particular document needs to be scrapped as the approach the document has taken is confusing.

Instead, I'm going to define IPLD terminology generally in a document, which will include the definitions at the top of this document related to codec serializers and formats. Once that lands I'll take a pass at an RFC for how to support transcoding links (and maybe binary).

mikeal · 2018-09-28T18:22:27Z

Closing.

Canonical representations are out. ipld/ipld#50

spec: adding link encoding RFC

cec230e

mikeal mentioned this pull request Aug 28, 2018

dag-json VS dag-cbor ipld/ipld#44

Closed

Stebalien suggested changes Aug 29, 2018

fix: suggestions from @Stebalien and removal of yaml example

bd12835

mikeal mentioned this pull request Sep 10, 2018

BREAKING CHANGE: Serialize and de-serialize links to CID instances. ipld/js-ipld-dag-cbor#76

Merged

mikeal mentioned this pull request Sep 13, 2018

What is the purpose of toJSON()? multiformats/js-cid#57

Open

mikeal mentioned this pull request Sep 21, 2018

[WIP] Draft Specification ipld/legacy-unixfs-v2#12

Merged

kevina mentioned this pull request Sep 22, 2018

Should the JSON represenetation of CID always be formatted as links. ipfs/go-cid#76

Open

daviddias suggested changes Sep 22, 2018

mikeal closed this Sep 28, 2018

pepoospina mentioned this pull request Nov 25, 2019

Add multicodec to the validation of objects hashes. Possibly extend multicodec to handle holochain entries. uprtcl/spec#2

Open
