-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up code in jena.ext.xerces #2828
Comments
Great analysis! Your plan looks good and doesn't need to be coupled to Jena6, nor deprecation classes.
It has been valuable to have all the details of datatypes implemented in an efficient manner. Jena forked off the Xerces codebase when it switched to using the JDK parser, or user choice of XML parser, and then Jena stopped shipping a Xerces jar and ICU4J (which is huge). There is a use of I've opened the task #2833 to extract the regex engine for SPARQL/ARQ's use. The only use in the datatypes area of I experimented by passing null from
The build passed just fine. |
@afs Thanks! That clears a lot of things up. I will then start making some PRs for this in my spare time. |
This is very much a case of "less is more". No rush, doesn't need to be all done at once. |
Issue: apache#2828 Remove the code for validating datatypes that according to the RDF 1.1 spec SHOULD NOT be used: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS. Also remove code for validating XSD lists and unions.
Issue: apache#2828 Remove the code for validating datatypes that according to the RDF 1.1 spec SHOULD NOT be used: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS. Also remove code for validating XSD lists and unions.
Issue: #2828 Remove the code for validating datatypes that according to the RDF 1.1 spec SHOULD NOT be used: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS. Also remove code for validating XSD lists and unions.
@afs could you reopen this? GitHub closed this automatically, but it will need 1–2 more PRs to complete. I can't reopen this myself. |
Change
This is a continuation of #2797 – trimming out the bloat in datatyped literal validations.
That issue tackled the worst offender – allocating two hashmaps per each validated literal. This is however not the whole story. For every literal validation we still allocate a new
ValidationState
:jena/jena-core/src/main/java/org/apache/jena/datatypes/xsd/XSDDatatype.java
Lines 267 to 271 in c7eaf83
This makes no sense, because that object is never really mutated (or, to be precise, does not need to be mutable). At the same time, it has quite a few fields. Although the JVM can probably figure out how to handle this efficiently with escape analysis, this is still unnecessary bloat.
The entire
org.apache.jena.ext.xerces
package contains a lot of unused code carried over from xerces. A lot of the infrastructure is not needed in Jena, because the more complex XML features make no sense in the context of RDF literals. For example, a large part of the original job ofValidationState
was to check if ID and IDREF attributes are correct with respect to one another. This, along with a few other quirky XML thingies "SHOULD NOT" be used in RDF according to the spec, so I think we can safely remove this.The plan
ValidationState
is allocated once per literal, not once per document. And what would a "document" even mean in RDF?ValidationState
per Jena instance, or something to a similar effect.How do I do this?
What is the process for making breaking changes to Jena APIs? I assume that even if a public class is not used in the Jena codebase, it can be removed only in a MAJOR release. So, should I do something like:
?
Notes on unused code
ValidationState
are not used anywhere in the Jena codebase:set*
methods, exceptsetEntityState
, which is used only inValidationManager
, which is in turn an unused class.resetIDTables
,reset
,useNamespaces
ConfigurableValidationState
classDatatypeValidator
interfaceEntityState
interface – it is only used in unused methods ofValidationState
andValidationManager
SchemaDVFactory
methods:createTypeRestriction
,createTypeList
,createTypeUnion
– this refers to some advanced XSD features... I think? Anyway, this doesn't seem to be used in RDF.BaseSchemaDVFactory
andBaseDVFactory
) are, by extension, also unused.XSDDatatype
: there is a huge comment block section that probably should be removed. The inner static classXSDGenericType
is also unused.ValidatedInfo
methods:stringValue
,isComparable
,copyFrom
NamespaceContext
fields:XML_URI
,XMLNS_URI
and methodspushContext
,popContext
,declarePrefix
,getDeclaredPrefixCount
,getDeclaredPrefixAt
NamespaceContext
, but this must be always null.Are you interested in contributing a pull request for this task?
Yes
The text was updated successfully, but these errors were encountered: