
Lists shared between graphs are not correctly serialized #357

Closed
lanthaler opened this issue Jul 23, 2014 · 18 comments

@lanthaler
Member

_Reported by @afs:_

We are encountering an issue when converting RDF Datasets to JSON-LD.

The problem is with blank nodes that are shared between graphs and lists.

In TriG (yes, this is a synthetic reduced test case that captures a
smaller example that might appear for real):

# Bnode references across graph and lists
PREFIX :        <http://www.example.com/>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

:G {
   # Written in short form it would be:
   # :z :q ("cell-A" "cell-B")
   # but we want to share the tail ("cell-B")

   :z :q _:z0 .

   _:z0   rdf:first "cell-A" .
   _:z0   rdf:rest  _:z1 .

   _:z1   rdf:first "cell-B" .
   _:z1   rdf:rest rdf:nil .
}

:G1 {
    # This references the tail  ("cell-B")
    :x :p _:z1 .
}

The triple in :G1 references into the list in :G.

But as we understand the conversion algorithm, section 4 only considers
each graph in turn and so does not see the cross graph sharing.

Is this a correct reading of the spec text?

Part 4 of the conversion algorithm has
"For each name and graph object in graph map: "

so 4.3.3.* walks back up the list in one graph only.

(Conversion generated by jsonld-java : it does not matter if compaction
is applied or not):

{
   "@graph" : [ {
     "@graph" : [ {
       "@id" : ":z",
       ":q" : {
         "@list" : [ "cell-A", "cell-B" ]
       }
     } ],
     "@id" : ":G"
   }, {
     "@graph" : [ {
       "@id" : ":x",
       ":p" : {
         "@id" : "_:b1"
       }
     } ],
     "@id" : ":G1"
   } ],
   "@context" : {
     "@base" : "http://www.example.com/",
     "" : "http://www.example.com/",
     "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   }
}

There is no _:b1 in :G to refer to because the algorithm generated @list
and its implicit bNodes don't have labels.
This is a different dataset with no shared bNode.

If it is all the same graph (s/:G1/:G/), the RDF dataset structure is
correctly serialized.

Andy
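
The per-graph scoping Andy describes can be illustrated with a short sketch. This is not the spec algorithm, just a minimal model (hypothetical names throughout) showing why counting a list cell's references within one graph makes _:z1 look safe to fold into @list, while the whole dataset shows it is shared:

```python
# Minimal illustrative sketch (NOT the spec algorithm): a dataset as a
# dict of graph name -> list of (subject, predicate, object) triples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

dataset = {
    ":G": [
        (":z", ":q", "_:z0"),
        ("_:z0", RDF + "first", '"cell-A"'),
        ("_:z0", RDF + "rest", "_:z1"),
        ("_:z1", RDF + "first", '"cell-B"'),
        ("_:z1", RDF + "rest", RDF + "nil"),
    ],
    ":G1": [
        (":x", ":p", "_:z1"),
    ],
}

def usage_count(node, graph_triples):
    """Count how many triples reference `node` as object across the
    given iterable of triple lists."""
    return sum(1 for triples in graph_triples
               for (_, _, o) in triples if o == node)

# Per-graph view (roughly what step 3.5.8.2 sees): _:z1 is referenced
# once inside :G, so it looks like an ordinary list cell ...
per_graph = usage_count("_:z1", [dataset[":G"]])
# ... but across the whole dataset it is shared, so folding it into
# @list silently drops the cross-graph link from :G1.
whole_dataset = usage_count("_:z1", dataset.values())

print(per_graph, whole_dataset)  # 1 2
```
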

@gkellogg
Member

Thanks @afs, this appears to be a bug in the Serialize RDF as JSON-LD Algorithm in JSON-LD 1.0 Processing Algorithms and API [1]. In particular, step 4.3.3 checks that a given node has exactly one rdf:first member and one rdf:rest member, that the node is a BNode, and that the value of rdf:first is a BNode, by looking in the usages member of the list node. The usages member is set up in step 3.5.8.2, but only within the context of a specific graph, not the dataset as a whole.

One way to fix this would be to insert a step between steps 3 and 4 to consolidate the usage maps for nodes across the dataset, or to extract the usage collection outside of an individual graph context.
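
The consolidation idea above can be sketched roughly as follows. This is a hypothetical illustration (the names `dataset`, `usages`, and `foldable` are mine, not the spec's): collect every usage across all graphs first, and only fold a list cell into @list if it is referenced exactly once in the entire dataset:

```python
from collections import defaultdict

# Illustrative sketch of consolidating node usages across the dataset
# before the list-folding step (hypothetical names, not spec text).
dataset = {
    ":G": [(":z", ":q", "_:z0"), ("_:z0", "rdf:rest", "_:z1")],
    ":G1": [(":x", ":p", "_:z1")],
}

# object node -> list of (graph, subject, predicate) referring to it
usages = defaultdict(list)
for graph_name, triples in dataset.items():
    for s, p, o in triples:
        usages[o].append((graph_name, s, p))

def foldable(node):
    """A list cell is only safe to fold into @list if it is referenced
    exactly once in the *entire* dataset."""
    return len(usages[node]) == 1

print(foldable("_:z0"), foldable("_:z1"))  # True False
```

With per-graph usage maps, _:z1 would also report a single usage and be folded, reproducing the bug.
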

I think what needs to happen is to add new tests to the fromRdf test suite that cover this use case, propose and develop an update to the algorithm, and get buy-in from the developer community that this is a reasonable fix, after which the algorithm update can be added to the errata for this document.

IMO, the appropriate solution will serialize to JSON-LD using rdf:first/rest/nil, rather than using the @list syntax, as that is the intention of step 4.3.3. As with Turtle lists, the JSON-LD @list structure is intended for fully conforming RDF Collections.

@dlongley
Member

+1 to what @gkellogg said.

gkellogg added a commit that referenced this issue Jul 27, 2014
gkellogg added a commit that referenced this issue Jul 27, 2014
@gkellogg
Member

I added a test, and updated spec/latest/json-ld-api/index.html; however, this currently won't show on the web site, as it redirects to the recommendation (through .htaccess, I believe).

I think we may want to remove that redirection, as the latest version does reference it; or we should come up with another location for edited specs, although the fact that it's /latest seems fairly clear to me.

The fix to the fromRdf algorithm is probably not the most elegant, but it is simple and does the job.

dlongley added a commit to digitalbazaar/jsonld.js that referenced this issue Jul 29, 2014
dlongley added a commit to digitalbazaar/pyld that referenced this issue Jul 29, 2014
dlongley added a commit to digitalbazaar/php-json-ld that referenced this issue Jul 29, 2014
@lanthaler
Member Author

This is a tricky question. It is certainly a bug in the sense that the result isn’t what we would like it to be. Nevertheless, the JSON-LD 1.0 spec algorithm is quite clear about how this is supposed to be processed, so all JSON-LD 1.0 conformant processors do exactly the same thing. Are we now changing this while the stable 1.0 spec says something else? Do we introduce a new processing mode (1.1? 2.0?)? I think we should really discuss this further on the mailing list (sending mail in a minute).

@gkellogg
Member

It's an erratum; we have to be able to fix obvious bugs, although the spec won't be revised until there's a new WG with that in its charter.

For my part, I think we should keep the "latest" versions on json-ld.org up to date with corrections. Changing these in no way invalidates the REC. AFAIK, there's still a CG, and the community can decide what to do with our version of the specs, as well as the other unreleased specs such as normalization and framing.

lanthaler added a commit that referenced this issue Sep 15, 2014
The algorithm update in 3f31c3c doesn't fully address the issue.

/cc @gkellogg @dlongley
lanthaler added a commit that referenced this issue Sep 15, 2014
lanthaler added a commit to lanthaler/JsonLD that referenced this issue Sep 15, 2014
@lanthaler
Member Author

As fromRdf-0021 in fc0e6bf shows, the algorithm updates that have been made are not sufficient to fix this bug. I addressed it slightly differently in lanthaler/JsonLD@10a03f7#diff-86a207a446f1f808e29f4a32dba82be1R2305: instead of storing usages for every node, I do that just for rdf:nil nodes. For all other nodes, the references are stored in what Gregg called the node usages map, under a key of the form graphName|subject|predicate. Please have a look and let me know if you are OK with that approach. If so, I'll go ahead and update the spec accordingly.

@lanthaler lanthaler reopened this Sep 15, 2014
@dlongley
Member

I left some comments that may simplify things in your diff. I'd also prefer the @usages map to be declared separately from the graphs var in the algorithm spec text because it isn't a graph itself and doesn't belong in the graphs var, IMO. It may also confuse some people, despite the fact that it's unlikely to cause an actual issue.

dlongley added a commit to digitalbazaar/jsonld.js that referenced this issue Sep 15, 2014
@dlongley
Member

@lanthaler, If you look at digitalbazaar/jsonld.js@6f0cc8d, you can see a simplified implementation (see the referencedOnce map usage) via what I described in comments on your diff.
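
The referencedOnce idea can be sketched in a few lines. This is an illustration of the general technique only, not the actual jsonld.js code: keep at most one usage record per node, and overwrite the entry with False the moment a second reference appears, so a simple truthiness check later decides whether @list folding is allowed:

```python
# Illustrative sketch of a `referencedOnce`-style map (NOT the actual
# jsonld.js implementation): node -> its single usage, or False once
# the node has been referenced more than once.
referenced_once = {}

def record_usage(node, usage):
    if node not in referenced_once:
        referenced_once[node] = usage   # first reference: remember it
    else:
        referenced_once[node] = False   # second reference: disqualify

record_usage("_:z0", (":G", ":z", ":q"))
record_usage("_:z1", (":G", "_:z0", "rdf:rest"))
record_usage("_:z1", (":G1", ":x", ":p"))

# Only nodes still holding a usage record are @list candidates.
print(referenced_once["_:z0"])  # (':G', ':z', ':q')
print(referenced_once["_:z1"])  # False
```

Compared with keeping a full usage list per node, this stores constant space per node while preserving the only fact the folding step needs.
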

@lanthaler
Member Author

Thanks for the feedback @dlongley. I fully agree that in the spec we shouldn’t put this information in the graphs object. I just had a quick look at your changes in jsonld.js and was wondering whether it handles duplicate triples in the input RDF document (both in the same and in different graphs) properly!?

@dlongley
Member

It passes all the tests! :)

@dlongley
Member

Yeah, it might need to check the existing entry (for a match) before just setting it to false.

@dlongley
Member

@lanthaler, hmm, can an abstract RDF dataset have duplicate triples? Because our algorithm's input is an RDF dataset. My processor automatically strips out duplicate triples before it ever gets to the fromRDF algorithm.

@lanthaler
Member Author

No, an abstract dataset can’t, but a serialized one can. In the spec we do not filter duplicate triples as far as I remember (will check tomorrow) and thus we need to take that into consideration I’d say.

@dlongley
Member

What I'm saying is that by the time the dataset reaches the fromRDF algorithm, it has already been deserialized. My processor, for instance, has already parsed the nquads and removed any duplicates.

The fromRDF algorithm doesn't say anything about the serialization format of the RDF dataset, rather it's an abstract dataset at that point, isn't it? Do you think we should say, in the lead up to the algorithm, that any duplicate triples have already been removed in the RDF dataset, as a precaution?

@dlongley
Member

In the spec we say that the input to the fromRDF algorithm is an RDF Dataset (defined by RDF Concepts) which is:

An RDF dataset is a collection of RDF graphs

And an RDF graph is:

An RDF graph is a set of RDF triples.

A set cannot contain duplicates. Now, this doesn't rule out triples that appear in different graphs, but in that case, a @list shouldn't be used as in one of the tests you just added for this issue (IOW, the object therein should be counted as being referenced more than once). So I don't think we have any issue here -- other than deciding whether or not we feel the need to explicitly reiterate that RDF datasets can't have duplicates in the prose before the fromRDF algorithm.
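
The set semantics above can be made concrete with a small sketch (illustrative only, hypothetical names): parsing quads into per-graph sets collapses duplicates within a graph automatically, while the same triple appearing in two different graphs survives as two distinct usages and must disqualify @list folding:

```python
# Illustrative sketch: an RDF graph is a *set* of triples, so a
# duplicate within one graph collapses on parse -- but the same triple
# in two different graphs is two distinct usages.
graphs = {":G": set(), ":G1": set()}

quads = [
    (":x", ":p", "_:z1", ":G"),
    (":x", ":p", "_:z1", ":G"),   # duplicate within :G -- collapses
    (":x", ":p", "_:z1", ":G1"),  # same triple, different graph -- kept
]
for s, p, o, g in quads:
    graphs[g].add((s, p, o))

total_usages = sum(1 for triples in graphs.values()
                   for (_, _, o) in triples if o == "_:z1")
print(len(graphs[":G"]), total_usages)  # 1 2
```
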

@gkellogg
Member

My implementation also passes all these tests, and I did use a global node usages map. I tried to keep changes to the algorithm minimal, but perhaps this was too minimal; still, my implementation, from which the algorithm updates were taken, does pass these tests. In fact, the amended text did say that the node usages map is initialized separately from the graphs. I'm a little unclear about exactly what the issue with the spec text is. Do you have alternate wording?

@gkellogg
Member

gkellogg commented Aug 1, 2018

Closed in favor of w3c/json-ld-api#13.

@gkellogg gkellogg closed this as completed Aug 1, 2018
@gkellogg
Member

gkellogg commented Aug 1, 2018

I agree with @dlongley that duplicates are handled by virtue of the RDF Dataset description, so the NQuads input would be normalized into a Dataset first, which would handle all triples. No need to duplicate this logic to deal with a serialization that may contain duplicates.
