Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue while converting xml to json #537

Closed
venkat-natsoft opened this issue Jul 18, 2020 · 9 comments · Fixed by #542
Closed

Issue while converting xml to json #537

venkat-natsoft opened this issue Jul 18, 2020 · 9 comments · Fixed by #542

Comments

@venkat-natsoft
Copy link

venkat-natsoft commented Jul 18, 2020

xml: Hypersegmented | Bld-Ser-Pla

Caller method:
JSONObject xmlJSONObj = XML.toJSONObject(xml, true);

Error:
Caused by: java.lang.NumberFormatException: For input string: "X7C"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[na:1.8.0_201]
at java.lang.Integer.parseInt(Integer.java:580) ~[na:1.8.0_201]
at java.lang.Integer.parseInt(Integer.java:615) ~[na:1.8.0_201]
at org.json.XML.unescape(XML.java:200) ~[json-20170516.jar:na]
at org.json.XML.parse(XML.java:398) ~[json-20170516.jar:na]
at org.json.XML.parse(XML.java:403) ~[json-20170516.jar:na]
at org.json.XML.toJSONObject(XML.java:485) ~[json-20170516.jar:na]

Error at calle:
at org.json.XML. public static String unescape(String string)

Error prone code piece
if (entity.charAt(1) == 'x') {
// hex encoded unicode
cp = Integer.parseInt(entity.substring(2), 16);
} else {
// decimal encoded unicode
cp = Integer.parseInt(entity.substring(1));
}

@stleary
Copy link
Owner

stleary commented Jul 18, 2020

Can you provide some context? Perhaps a small XML doc that demonstrates the problem?

@venkat-natsoft
Copy link
Author

venkat-natsoft commented Jul 20, 2020

"<tag/>Neutrophils.Hypersegmented &#X7C; Bld-Ser-Plas</tag>"
Full document url:
https://clinicaltrials.gov/ct2/show/NCT03874338?term=NCT02652975&rank=1&displayxml=true

@stleary
Copy link
Owner

stleary commented Jul 20, 2020

Is the problem that the XML-to-JSON parser does not properly handle hex-encoded XML escape sequences?

@johnjaylward
Copy link
Contributor

@stleary , I believe that @venkat-natsoft is pointing out that our parser is case sensitive when it should not be.

if (e.charAt(1) == 'x') {

Current code:

if (e.charAt(1) == 'x') {

Should be:

if (e.charAt(1) == 'x' || e.charAt(1) == 'X') {

@johnjaylward
Copy link
Contributor

@stleary, I'm not able to reproduce with just the

<tag>Neutrophils.Hypersegmented &#X7C; Bld-Ser-Plas</tag>

sample.

It appears we already lowercase the entity:

sb.append(Character.toLowerCase(c));

@stleary
Copy link
Owner

stleary commented Jul 20, 2020

Maybe OP is using an old JSON-Java release?
I get correct results as well, and don't see the reported exception.

@johnjaylward
Copy link
Contributor

johnjaylward commented Jul 20, 2020

ok, I used the full file, and got a slightly different issue:

org.junit.ComparisonFailure: string values should be equal for actual: Neutrophils.Hypersegmented &#X7C; Bld-Ser-Plas expected: Neutrophils.Hypersegmented | Bld-Ser-Plas expected:<...hils.Hypersegmented [|] Bld-Ser-Plas> but was:<...hils.Hypersegmented [&#X7C;] Bld-Ser-Plas>
	at org.junit.Assert.assertEquals(Assert.java:115)
	at org.json.junit.Util.compareActualVsExpectedObjects(Util.java:118)
	at org.json.junit.Util.compareActualVsExpectedJsonArrays(Util.java:53)
	at org.json.junit.Util.compareActualVsExpectedObjects(Util.java:95)
	at org.json.junit.Util.compareActualVsExpectedJsonObjects(Util.java:72)
	at org.json.junit.Util.compareActualVsExpectedObjects(Util.java:89)
	at org.json.junit.Util.compareActualVsExpectedJsonObjects(Util.java:72)
	at org.json.junit.XMLTest.testIssue537CaseSensitiveHexEscapeFullFile(XMLTest.java:939)

So for some reason it parsed it properly in the smaller sample, but in the larger sample it left it as the encoded entity.

johnjaylward pushed a commit to johnjaylward/JSON-java that referenced this issue Jul 20, 2020
@johnjaylward
Copy link
Contributor

johnjaylward commented Jul 20, 2020

ok, the issue with my test was that the source file had &amp;#X7c; in it, which when decoded becomes &#X7C; .

At this time, for random XML input I don't think we need any change. However, since XML.unescape is a public method which calls XMLTokener.unescapeEntity (non-public), I suggest we update the compare to be case insensitive at the source (XMLTockener) so that outside callers don't have issues.

If you agree, I'll make the PR. At this time I have a branch with tests showing that there is no error on parsing.

johnjaylward pushed a commit to johnjaylward/JSON-java that referenced this issue Jul 20, 2020
johnjaylward pushed a commit to johnjaylward/JSON-java that referenced this issue Jul 20, 2020
@venkat-natsoft
Copy link
Author

ok, I used the full file, and got a slightly different issue:

org.junit.ComparisonFailure: string values should be equal for actual: Neutrophils.Hypersegmented &#X7C; Bld-Ser-Plas expected: Neutrophils.Hypersegmented | Bld-Ser-Plas expected:<...hils.Hypersegmented [|] Bld-Ser-Plas> but was:<...hils.Hypersegmented [&#X7C;] Bld-Ser-Plas>
	at org.junit.Assert.assertEquals(Assert.java:115)
	at org.json.junit.Util.compareActualVsExpectedObjects(Util.java:118)
	at org.json.junit.Util.compareActualVsExpectedJsonArrays(Util.java:53)
	at org.json.junit.Util.compareActualVsExpectedObjects(Util.java:95)
	at org.json.junit.Util.compareActualVsExpectedJsonObjects(Util.java:72)
	at org.json.junit.Util.compareActualVsExpectedObjects(Util.java:89)
	at org.json.junit.Util.compareActualVsExpectedJsonObjects(Util.java:72)
	at org.json.junit.XMLTest.testIssue537CaseSensitiveHexEscapeFullFile(XMLTest.java:939)

So for some reason it parsed it properly in the smaller sample, but in the larger sample it left it as the encoded entity.

Exactly that's what happening with a smaller portion and the whole document. Please trace the issue .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants