Update ipld prime, use proper code-gen #5

hannahhoward · 2020-09-22T00:08:52Z

Goals

Get ipld-prime proto to a version of ipld-prime near master (specifically including byte buffer fix)

Implementation

The changes to code gen seemed to have gotten to the point where a more significant change to the generation script was needed, so I went ahead and implemented it, matching the styles used by the tests in node/gendemo of go-ipld-prime.

Then, since I was relying on private members of the node structures that had changed (#oops) I switched over to only accessing data through public methods.

Assembling nodes for the PBNode decoder was EXTREMELY verbose without fluent, so I went ahead and used it along with fluent.Recover, hopefully in a way that isn't too far outside the design goals.

codecov · 2020-09-22T00:10:39Z

Codecov Report

Merging #5 into master will decrease coverage by 7.02%.
The diff coverage is 14.06%.

@@            Coverage Diff             @@
##           master       #5      +/-   ##
==========================================
- Coverage   23.22%   16.20%   -7.02%     
==========================================
  Files           7       11       +4     
  Lines         797     1945    +1148     
==========================================
+ Hits          185      315     +130     
- Misses        596     1591     +995     
- Partials       16       39      +23

Impacted Files	Coverage Δ
tBytes.go	`6.75% <6.75%> (ø)`
tInt.go	`12.36% <12.36%> (ø)`
tPBLink.go	`12.73% <12.73%> (ø)`
tPBNode.go	`13.00% <13.00%> (ø)`
tPBLinks.go	`13.13% <13.13%> (ø)`
tString.go	`13.19% <13.19%> (ø)`
tLink.go	`13.49% <13.49%> (ø)`
tRawNode.go	`14.61% <14.61%> (ø)`
coding.go	`76.48% <82.36%> (-6.24%)`	⬇️
node_builder_chooser.go	`81.82% <100.00%> (-1.51%)`	⬇️
... and 10 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38f5640...18d8669. Read the comment docs.

rvagg · 2020-09-22T03:46:31Z

Nice.

I'm currently trying to finalise the new JS dag-pb implementation adhering strictly to the schema and I'm keen to see if we can make the same happen here and I don't think this is quite there--I don't quite understand how ipld-prime deals with "absent" elements though, so maybe it is even with the "AssignX" calls.

Specifically, when decoding a block using go-merkledag the following should happen:

If Data is nil then it should be absent in the Data Model. Data can be a zero or more length byte array but a nil indicates its absence entirely which is valid.
If Links is nil or an empty array, it should be absent in the Data Model. It must have one or more elements. (This happens in PB, zero-length doesn't even encode, it's just absent, so you get that guaranteed on unmarshal).
If Name or Tsize come out as nil then they should be absent in the Data Model. Name can be zero or more characters long, Tsize can be 0 or other, just not nil as that indicates it's not present at all.

Further, Hash should be defined and be a valid CID, but it's probably not necessary to impose that strictly here yet.
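
The node-level rules above can be sketched roughly as follows. This is a minimal illustration using only the standard library; `pbNode`, `pbLink`, and `presentFields` are hypothetical names invented here, not the go-merkledag or ipld-prime types.

```go
package main

import "fmt"

// pbLink and pbNode are hypothetical stand-ins for the unmarshalled
// protobuf structs; they are not the go-merkledag types.
type pbLink struct {
	Hash []byte
}

type pbNode struct {
	Links []pbLink
	Data  []byte
}

// presentFields reports which top-level fields would appear in the Data
// Model under the rules above. Absent fields are omitted, never set to null.
func presentFields(n pbNode) map[string]bool {
	present := map[string]bool{}
	if n.Data != nil { // a zero-length but non-nil Data is still present
		present["Data"] = true
	}
	if len(n.Links) > 0 { // nil and empty Links both mean absent
		present["Links"] = true
	}
	return present
}

func main() {
	fmt.Println(presentFields(pbNode{}))
	fmt.Println(presentFields(pbNode{Data: []byte{}}))
	fmt.Println(presentFields(pbNode{Links: []pbLink{{Hash: []byte{0x01}}}}))
}
```

The point of the sketch is only the distinction between `Data != nil` (presence) and `len(Links) > 0` (non-emptiness); how absence is expressed through ipld-prime assemblers is a separate question.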

I'm adding all of these details into the spec notes under the schema @ ipld/specs#297 and pointing to this repo as the new-generation DAG-PB implementation (even though it doesn't have a full builder impl yet).

I'd really like to make sure that the optional fields in the schema are actually treated as such, but my grasp of ipld-prime isn't good enough to know from looking at this how that plays out. None of the properties can be Null (in the IsNull() sense) and I suspect that assigning a nil value might make it Null, but maybe it makes it absent? -- in that case, great!

warpfork · 2020-09-22T07:31:54Z

coding.go

-		if link.d.Name.Maybe == schema.Maybe_Value {
-			tmp := link.d.Name.Value.x
+		if link.FieldName().Exists() {
+			tmp := link.FieldName().Must().String()


I'd write a review about how all this causes unfortunate allocations, but... it's unavoidable in interacting with protobufs, as I understand it. Ah well.

(Fun fact if we ever come back to optimize this though, without just tearing deep through and parsing the protobuf ourselves completely: I think we could bump these tmp things into a struct and make sure the whole thing escapes to heap at once, and it'll shave a constant factor off.)
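
A rough sketch of the "bump the tmp things into a struct" idea, assuming the protobuf struct takes pointers for its optional fields. `pbLink`, `linkScratch`, and `fillLink` are hypothetical names for illustration; this is not the code in this PR.

```go
package main

import "fmt"

// pbLink is a hypothetical stand-in for a generated protobuf struct whose
// optional fields are pointers.
type pbLink struct {
	Name  *string
	Tsize *uint64
}

// linkScratch groups the temporaries so that a single heap object backs
// both pointers, instead of one escaping local per field.
type linkScratch struct {
	name  string
	tsize uint64
}

// fillLink copies the values into one scratch struct and hands the
// protobuf struct interior pointers into it.
func fillLink(name string, tsize uint64) *pbLink {
	s := &linkScratch{name: name, tsize: tsize}
	return &pbLink{Name: &s.name, Tsize: &s.tsize}
}

func main() {
	l := fillLink("a.txt", 42)
	fmt.Println(*l.Name, *l.Tsize)
}
```

One trade-off worth noting: an interior pointer keeps the whole scratch struct alive, so this only pays off when the fields share a lifetime anyway. Whether it actually saves allocations here would need profiling.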

warpfork · 2020-09-22T07:34:27Z

coding.go

-		if link.d.Hash.Maybe == schema.Maybe_Value {
-			cid := link.d.Hash.Value.x.(cidlink.Link).Cid
+		if link.FieldHash().Exists() {
+			cid := link.FieldHash().Must().Link().(cidlink.Link).Cid


~~If this field really is optional according to the schema, a Must here is fragile, isn't it?~~ This is fine. Exists is checked right above the Must. (Silly pinhole-optimizer brain of mine...) Leaving the comment here to remind myself I already checked this.
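
For readers unfamiliar with the pattern, here is a minimal sketch of why a Must guarded by Exists is safe. `maybeString` is a hypothetical type loosely modelled on the accessors in the diff; it is not the generated ipld-prime type.

```go
package main

import "fmt"

// maybeString is a hypothetical optional value with Exists/Must accessors.
type maybeString struct {
	present bool
	value   string
}

func (m maybeString) Exists() bool { return m.present }

// Must returns the value, or panics if the value is absent.
func (m maybeString) Must() string {
	if !m.present {
		panic("maybeString: value is absent")
	}
	return m.value
}

// nameOrDefault only calls Must behind an Exists check, so the panic
// branch in Must is unreachable from here.
func nameOrDefault(m maybeString, def string) string {
	if m.Exists() {
		return m.Must()
	}
	return def
}

func main() {
	fmt.Println(nameOrDefault(maybeString{present: true, value: "a"}, "none"))
	fmt.Println(nameOrDefault(maybeString{}, "none"))
}
```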

warpfork · 2020-09-22T08:16:08Z

To @rvagg 's comments about ensuring we understand and standardize some of this stuff so that it's behaviorally consistent with other IPLD libraries that handle dag-pb --

I think there's not anything drastically wrong going on here. In general, nil just plain isn't used in any of the code around go-ipld-prime Nodes, specifically to avoid this potential confusion: ipld.Null is a singleton ipld.Node, and so is ipld.Absent: and if an actual nil is floating around, we generally don't check for that at all and will panic as soon as we try to do something on it. And as I checked through the DecodeDagProto method in particular, there are no nils wafting through there.
All the observations about how nil's in protobuf should be transformed to absence in IPLD Data Model are spot on, though.
... and I did have some questions about how that's going on, because it seems to be happening implicitly in some of the functions from the protobuf decoder. So maybe we should follow that a bit deeper.

I'll understand if @hannahhoward doesn't want to follow us on these details in this PR today though. Getting this repo this much closer to master on these interfaces is a big step forward and nice to do as quickly as possible.

warpfork

Hey this is kind of awesome.

Is it true that the generated files now have no manual patches inside them? If so that's super great -- means this'll be way easier to maintain going forward.

It also looks like the Encode* and Decode* methods attached to the types here are... not actually required to be attached to the types anymore: we could apply them on almost any ol' ipld.Node (except that a few shortcut methods are used that are specialized to the codegen types). So it would be pretty trivial to hoist this into a general codec function that people could use with other Node implementations if they really wanted to! That's cool. Not important to do today, of course. But super super cool that it's now just about trivial to take that step.

warpfork · 2020-09-22T07:37:31Z

coding.go

+// from RawNode__Builders
+func (nb *_RawNode__Builder) DecodeDagRaw(r io.Reader) error {
+	return fluent.Recover(func() {
+		fnb := fluent.WrapAssembler(nb)


Will definitely cause at least one extra alloc, and idk if it's paying for itself in simplicity added in this case. But I'm not complaining enough to demand a change, either. Probably not a big issue unless we profile it and find out that it is.
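
The general shape of the recover-and-return pattern being discussed can be sketched with the standard library alone. `assemblyError`, `recoverAssembly`, and `decode` are hypothetical names; this only loosely mirrors what fluent.Recover is used for in the diff and makes no claim about its internals.

```go
package main

import (
	"errors"
	"fmt"
)

// assemblyError is a hypothetical wrapper type. Only panics carrying it are
// converted into returned errors; any other panic keeps propagating.
type assemblyError struct{ Err error }

// recoverAssembly runs fn and turns an assemblyError panic into an error.
func recoverAssembly(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			ae, ok := r.(assemblyError)
			if !ok {
				panic(r) // not ours: re-panic unchanged
			}
			err = ae.Err
		}
	}()
	fn()
	return nil
}

// decode shows the call site: the body can bail out with a panic instead
// of threading an error check through every assembly step.
func decode(input []byte) error {
	return recoverAssembly(func() {
		if len(input) == 0 {
			panic(assemblyError{Err: errors.New("unmarshal failed: empty input")})
		}
		// ... assemble the node here ...
	})
}

func main() {
	fmt.Println(decode(nil))
	fmt.Println(decode([]byte{0x0a}))
}
```

The closure passed to the wrapper is the likely source of the extra allocation mentioned above; the payoff is a decoder body without per-call error handling.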

warpfork · 2020-09-22T07:50:16Z

coding.go

+	return fluent.Recover(func() {
+		fb := fluent.WrapAssembler(nb)
+		fb.CreateMap(0, func(fmb fluent.MapAssembler) {
+			fmb.AssembleEntry("Links").CreateList(len(pbn.Links), func(flb fluent.ListAssembler) {


So I think @rvagg 's comment about the links list here suggests there should be a branch on if len(pbn.Links) == 0 where we then skip assembling a data-model Links entry at all.

I can't decide if I think that's weird or not, though, now that I think about it. I'm hedging on if I think we should update our IPLD Schema describing this to just say that the list is required... and then if it's zero-length, sure, we can encode it to the proto wire format as missing.

That's a discussion that probably @rvagg and I should hash out on our specs documents about it though. If this is the behavior this code has had already, I think it's fine to keep it today. (I don't want to force you through testing a change like this in either direction today...)

warpfork · 2020-09-22T07:56:21Z

coding.go

-		hash, err := cid.Cast(link.GetHash())
+	return fluent.Recover(func() {
+		fb := fluent.WrapAssembler(nb)
+		fb.CreateMap(0, func(fmb fluent.MapAssembler) {


I'd mildly rather that these be -1 when they're saying "shrug idk deal with it". The contract about this isn't strict, but there are some logic branches that look at 0 as a "ah, expect 0" and look at -1 as "ah, take a guess". It's also helpful for reading.

... but none of those codepaths apply here in a codegen'd struct, so, this is a nit.
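
To make the 0 versus -1 distinction concrete, here is a hypothetical sketch of how an implementation might branch on a size hint. `newEntries` and the fallback capacity of 8 are invented for illustration and do not describe go-ipld-prime's actual behavior.

```go
package main

import "fmt"

// newEntries interprets a size hint: a negative hint means "unknown, take a
// guess", while zero or more means "expect exactly this many entries".
func newEntries(sizeHint int) []string {
	if sizeHint < 0 {
		return make([]string, 0, 8) // arbitrary guess for the unknown case
	}
	return make([]string, 0, sizeHint)
}

func main() {
	fmt.Println(cap(newEntries(-1)), cap(newEntries(0)), cap(newEntries(3)))
}
```

Under a scheme like this, passing 0 when the real meaning is "no idea" would start from zero capacity, which is why -1 reads more honestly at the call site.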

I need to improve the godoc on this, I've realized. Mea culpa.

warpfork · 2020-09-22T08:03:57Z

coding.go

-				},
-			},
+					if err != nil {
+						panic(fluent.Error{Err: fmt.Errorf("unmarshal failed. %v", err)})


@rvagg : I went deep on exactly what happens here if the protobuf has a nil value for hash, because of its relevance to our other discussions about documenting this behavior:

tl;dr: a lack of hash here is not actually tolerated by this code: we get an error returned here.

(In detail: cid.Cast passes the nil down until some varint code tries to read from it, and at that point we get cid.ErrVarintBuffSmall and that bubbles back up from here.)

So I think this is a pretty solid argument that we don't have to consider Hash an optional in our IPLD Schema describing this.
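
The failure mode described above, a varint read on an empty buffer, can be reproduced with the standard library. This is an analogy rather than go-cid's code: `readVarint` and `errBufTooSmall` are hypothetical names, and only `binary.Uvarint` is a real API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errBufTooSmall is a hypothetical sentinel playing the role of the
// "buffer too small" error mentioned above.
var errBufTooSmall = errors.New("reading varint: buffer too small")

// readVarint decodes a leading unsigned varint. binary.Uvarint reports
// n == 0 when the buffer is too small, which includes nil and empty input.
func readVarint(buf []byte) (uint64, error) {
	v, n := binary.Uvarint(buf)
	if n == 0 {
		return 0, errBufTooSmall
	}
	if n < 0 {
		return 0, errors.New("reading varint: overflow")
	}
	return v, nil
}

func main() {
	_, err := readVarint(nil)
	fmt.Println(err)
	v, err := readVarint([]byte{0x01})
	fmt.Println(v, err)
}
```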

warpfork · 2020-09-22T08:07:03Z

coding.go

+					}
+					flb.AssembleValue().CreateMap(0, func(fmb fluent.MapAssembler) {
+						fmb.AssembleEntry("Hash").AssignLink(cidlink.Link{Cid: hash})
+						fmb.AssembleEntry("Name").AssignString(link.GetName())


I'm moderately confused what the protobuf library is doing here -- we had to hand the protobuf lib a pointer to the name string elsewhere, because it's optional in protobuf's mind, right? But here it's returning a string (no pointer). Does this panic if the string is missing? Or silently coerce it to an empty string?

Yeah, it silently coerces it to an empty string. Which is weird behavior, except that's exactly what go-merkledag uses in IPFS. My basic approach is to follow it as closely as possible so as not to break things.

Okay, cool. I appreciate that, thanks for talking through it with us so we have it also as input to the other standardization quests @rvagg mentioned :)

warpfork · 2020-09-22T08:07:42Z

coding.go

+					flb.AssembleValue().CreateMap(0, func(fmb fluent.MapAssembler) {
+						fmb.AssembleEntry("Hash").AssignLink(cidlink.Link{Cid: hash})
+						fmb.AssembleEntry("Name").AssignString(link.GetName())
+						fmb.AssembleEntry("Tsize").AssignInt(int(link.GetTsize()))


Same question as above about name strings. Does this panic if the field isn't in the protobuf? Or silently coerce it to a zero?

rvagg · 2020-09-22T09:27:15Z

btw here's a deep-dive into the forms DAG-PB can take encoded and how go-merkledag represents it in the structs: ipfs/go-merkledag#58

So, there's an unfortunate behaviour with the getters in go-merkledag, they handle nil as a special case. I think we should just reach directly into the struct fields and pull them out and we can properly handle the nil case as omitted.

PBLink#GetTsize() returns 0 if Tsize is nil - it dereferences Tsize where it exists too so you lose the pointer
PBLink#GetName() returns "" if Name is nil - it also dereferences Name so you lose the pointer
PBNode#GetData(), PBNode#GetLinks() and PBLink#GetHash() do the right thing and expose the underlying value.

I see no good reason to use the getters here unless there's a chance your PBLink or PBNode reference might itself be nil, because the getters handle that case, but just check for that locally.
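
A small sketch of the difference between a nil-coercing getter and direct field access. `pbLink`, `GetName`, and `nameField` are written here for illustration, following the getter behavior described above; they are not copied from go-merkledag.

```go
package main

import "fmt"

// pbLink is a hypothetical stand-in with a pointer-typed optional field.
type pbLink struct {
	Name *string
}

// GetName mimics the getter behavior described above: a nil Name (or a nil
// receiver) comes back as "", so absence is indistinguishable from empty.
func (l *pbLink) GetName() string {
	if l != nil && l.Name != nil {
		return *l.Name
	}
	return ""
}

// nameField reads the field directly, checks the receiver locally, and
// reports presence alongside the value.
func nameField(l *pbLink) (string, bool) {
	if l == nil || l.Name == nil {
		return "", false
	}
	return *l.Name, true
}

func main() {
	empty := ""
	withEmpty := &pbLink{Name: &empty}
	without := &pbLink{}

	fmt.Println(withEmpty.GetName() == without.GetName())
	_, ok1 := nameField(withEmpty)
	_, ok2 := nameField(without)
	fmt.Println(ok1, ok2)
}
```

With direct access, a decoder can skip the Name entry entirely when the second return value is false, which is what treating the field as absent requires.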

hannahhoward · 2020-09-22T19:12:09Z

re: detaching from types -- Encode does still use some of the Fields$$$ methods, so not quite -- but still very close. Probably it'd just be a bit more verbose.

feat(deps): update ipld prime, use proper code-gen

18d8669

warpfork reviewed Sep 22, 2020

warpfork approved these changes Sep 22, 2020

hannahhoward merged commit 1a794fc into master Sep 22, 2020

rvagg mentioned this pull request Jan 22, 2021

Replacing go-ipld-prime-proto ipld/go-codec-dagpb#6

Closed

aschmahmann mentioned this pull request Feb 18, 2021

Release v0.8.0 ipfs/kubo#7707

Closed

73 tasks
