Lists shared between graphs are not correctly serialized #357
Thanks @afs, this appears to be a bug in the Serialize RDF as JSON-LD Algorithm in JSON-LD 1.0 Processing Algorithms and API [1]. In particular, step 4.3.3 checks that there is exactly one member each of rdf:first and rdf:rest for a given node, that the node is a BNode, and that the value of rdf:first is a BNode, by looking in the node usage map, which is built per graph.

One way to fix this would be to insert a step between steps 3 and 4 to consolidate the usage maps for nodes across the dataset, or to extract the usage collection outside of an individual graph context. I think what needs to happen is to add new tests to the fromRdf test suite that cover this use case, propose and develop an update to the algorithm, and get buy-in from the developer community that this is a reasonable fix, after which the algorithm update can be added to the errata for this document.

IMO, the appropriate solution will serialize to JSON-LD using rdf:first/rest/nil, rather than using the @list syntax, as that is the intention of step 4.3.3. As with Turtle lists, the JSON-LD @list structure is intended for fully conforming RDF Collections.
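For illustration only (this fragment is invented here, not taken from the spec or from the reported dataset), a list node that cannot be folded into @list would simply remain in the expanded JSON-LD output as explicit list triples, e.g.:

```json
{
  "@id": "_:b1",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#first": [{ "@value": "a" }],
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest": [{ "@id": "_:b2" }]
}
```

A consumer can still follow the rdf:rest chain by blank node identifier, which is exactly what is lost once the chain is collapsed into @list.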
+1 to what @gkellogg said.
I added a test, and updated spec/latest/json-ld-api/index.html; however, this currently won't show on the web site, as it redirects to the recommendation (through .htaccess, I believe). I think we may want to remove that redirection, as the latest version does reference it, or we should come up with another location for edited specs, although the fact that it's /latest seems fairly clear to me. The fix to the fromRdf algorithm is probably not the most elegant, but it is simple and does the job.
This is a tricky question. It is certainly a bug in the sense that the result isn’t what we would like it to be. Nevertheless, the JSON-LD 1.0 spec algorithm is quite clear about how this is supposed to be processed, so all JSON-LD 1.0 conformant processors do exactly the same thing. Are we now changing this while the stable 1.0 spec says something else? Do we introduce a new processing mode (1.1? 2.0?)? I think we should really discuss this further on the mailing list (sending a mail in a minute).
It's an erratum; we have to be able to fix obvious bugs, although the spec won't be revised until there's a new WG with that in its charter. For my part, I think we should keep the "latest" versions on json-ld.org up to date with corrections. Changing these in no way invalidates the REC. AFAIK, there's still a CG, and the community can decide what to do with our version of the specs, as well as the other unreleased specs such as normalization and framing.
I left some comments that may simplify things in your diff. I'd also prefer the …
Addresses spec bug: json-ld/json-ld.org#357.
@lanthaler, if you look at digitalbazaar/jsonld.js@6f0cc8d, you can see a simplified implementation (see the …)
Thanks for the feedback @dlongley. I fully agree that in the spec we shouldn’t put this information in the graphs object. I just had a quick look at your changes in jsonld.js and was wondering whether it handles duplicate triples in the input RDF document (both in the same graph and in different graphs) properly.
It passes all the tests! :)
Yeah, it might need to check the existing entry (for a match) before just setting it to false.
@lanthaler, hmm, can an abstract RDF dataset have duplicate triples? Because our algorithm's input is an RDF dataset. My processor automatically strips out duplicate triples before it ever gets to the fromRDF algorithm.
No, an abstract dataset can’t, but a serialized one can. In the spec we do not filter duplicate triples, as far as I remember (will check tomorrow), and thus we need to take that into consideration, I’d say.
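To illustrate the point (an invented example, not one from the thread): a serialization such as N-Quads can repeat a quad textually, but once parsed into an abstract RDF dataset the repetition disappears, because a graph is a set of triples:

```nquads
<http://example/s> <http://example/p> "a" <http://example/G> .
<http://example/s> <http://example/p> "a" <http://example/G> .
```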
What I'm saying is that by the time the dataset reaches the fromRDF algorithm, it has already been deserialized. My processor, for instance, has already parsed the N-Quads and removed any duplicates. The fromRDF algorithm doesn't say anything about the serialization format of the RDF dataset; rather, it's an abstract dataset at that point, isn't it? Do you think we should say, in the lead-up to the algorithm, that any duplicate triples have already been removed from the RDF dataset, as a precaution?
In the spec we say that the input to the fromRDF algorithm is an RDF Dataset (defined by RDF Concepts), which is a collection of RDF graphs comprising exactly one default graph and zero or more named graphs. And an RDF graph is a set of RDF triples.
A set cannot contain duplicates. Now, this doesn't rule out triples that appear in different graphs, but in that case, a …
My implementation also passes all these tests, and I did use a global node usages map. I tried to keep changes to the algorithm minimal, but perhaps this was too minimal. However, my implementation, from which the algorithm updates were taken, does pass these tests. In fact, the amended text did say that the node usages map is initialized separately from the graphs. I'm a little unclear about exactly what the issue with the spec text is. Do you have an alternate wording?
Closed in favor of w3c/json-ld-api#13. |
I agree with @dlongley that duplicates are handled by virtue of the RDF Dataset description, so the NQuads input would be normalized into a Dataset first, which would handle all triples. No need to duplicate this logic to deal with a serialization that may contain duplicates. |
_Reported by @afs:_
We are encountering an issue when converting RDF Datasets to JSON-LD.
The problem is with blank nodes that are shared between graphs and lists.
In TriG (yes, this is a synthetic reduced test case that captures a
smaller example that might appear for real):
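The TriG from the original report was not carried over into this copy of the issue; the following is a hedged reconstruction of a dataset of the shape described (the prefix, node names, and list values are invented for illustration):

```trig
@prefix :    <http://example/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

:G {
  :s :p _:b1 .
  _:b1 rdf:first "a" ;
       rdf:rest  _:b2 .
  _:b2 rdf:first "b" ;
       rdf:rest  rdf:nil .
}

:G1 {
  # this triple reaches into the list asserted in :G
  :x :q _:b1 .
}
```

Here _:b1 heads a two-element RDF collection asserted inside :G, while :G1 holds a single triple whose object is that same blank node.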
The triple in :G1 references into the list in :G.
But as we understand the conversion algorithm, section 4 only considers
each graph in turn and so does not see the cross-graph sharing.
Is this a correct reading of the spec text?
Part 4 of the conversion algorithm has
"For each name and graph object in graph map: "
so 4.3.3.* walks back up the list in one graph only.
(Conversion generated by jsonld-java; it does not matter whether compaction
is applied or not):
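The actual jsonld-java output is likewise not reproduced here; an illustrative reconstruction of the kind of result described (reusing the invented names from the TriG sketch above, not the original output) might be:

```json
[
  {
    "@id": "http://example/G",
    "@graph": [
      {
        "@id": "http://example/s",
        "http://example/p": [
          { "@list": [ { "@value": "a" }, { "@value": "b" } ] }
        ]
      }
    ]
  },
  {
    "@id": "http://example/G1",
    "@graph": [
      {
        "@id": "http://example/x",
        "http://example/q": [ { "@id": "_:b1" } ]
      }
    ]
  }
]
```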
There is no _:b1 in :G to refer to because the algorithm generated @list
and its implicit bNodes don't have labels.
This is a different dataset with no shared bNode.
If it is all the same graph (s/:G1/:G/), the RDF dataset structure is
correctly serialized.
Andy