Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: fix and optimize marshal for quoted string #34127

Closed
wants to merge 1 commit into from

Conversation

breml
Copy link
Contributor

@breml breml commented Sep 5, 2019

Since Go 1.2 every string can be marshaled to JSON without error even if it
contains invalid UTF-8 byte sequences. Therefore there is no need to use
Marshal again for the only reason of enclosing the string in double quotes.
Not using Marshal here also removes the error check as there has not been a
way for Marshal to fail anyway.

name old time/op new time/op delta
Issue34127-4 360ns ± 3% 200ns ± 3% -44.56% (p=0.008 n=5+5)

name old alloc/op new alloc/op delta
Issue34127-4 56.0B ± 0% 40.0B ± 0% -28.57% (p=0.008 n=5+5)

name old allocs/op new allocs/op delta
Issue34127-4 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.008 n=5+5)

Fixes #34154

@googlebot googlebot added the cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change. label Sep 5, 2019
@gopherbot
Copy link
Contributor

This PR (HEAD: d4bf400) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/193604 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Daniel Martí:

Patch Set 1:

Any optimization must include benchmark numbers via benchstat. For example, look at recent speed-ups in the json package. BenchmarkCodeEncoder might be a good start.


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@breml
Copy link
Contributor Author

breml commented Sep 6, 2019

@mvdan Thanks for your hint about the benchmarks. From my point of view, the change was not primary driven by performance optimization in mind, this came more as a welcome side effect.

My motivation for the proposed change is:

  1. remove the unnecessary error case
  2. fix encoding/json: string option (struct tag) on string field with SetEscapeHTML(false) escapes anyway #34154

Nevertheless I will try to have a look at benchstat and provide some benchmark statistics for the improved performance as well.

@breml breml force-pushed the optimize-json-quoted-string branch from d4bf400 to d417e71 Compare September 6, 2019 19:57
@gopherbot
Copy link
Contributor

This PR (HEAD: d417e71) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/193604 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@breml
Copy link
Contributor Author

breml commented Sep 6, 2019

@mvdan I had a look at the benchmarks in encoding/json and non of them contains struct fields with the string option set in the struct tags.

The change affects the encoding of struct like:

type Foo struct {
	Bar string `json:"bar,string"`
}

While looking through the commit messages for the recent changes on encoding/json I found this one from you:

...
This change should slightly reduce the amount of work when decoding into
json.Number, though that isn't very common nor part of any current
benchmarks.
...

I would argue, that this change also slightly improves encoding performance and that it is not very common nor part of any existing benchmarks. Would this be fine as well or do you insist on benchmarks for this particular case?

@mvdan
Copy link
Member

mvdan commented Sep 6, 2019

Please reply on Gerrit. I'd rather not keep the discussion in two places.

@gopherbot
Copy link
Contributor

Message from Daniel Martí:

Patch Set 2: Code-Review-1

There were some replies on GitHub - hopefully the author can post them here. Until then, -1 as per my comment above.


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Lucas Bremgartner:

Patch Set 2:

Patch Set 1:

Any optimization must include benchmark numbers via benchstat. For example, look at recent speed-ups in the json package. BenchmarkCodeEncoder might be a good start.

Copy'n'paste from #34127 (comment)

@mvdan Thanks for your hint about the benchmarks. From my point of view, the change was not primary driven by performance optimization in mind, this came more as a welcome side effect.

My motivation for the proposed change is:

remove the unnecessary error case
fix #34154
Nevertheless I will try to have a look at benchstat and provide some benchmark statistics for the improved performance as well.

Copy'n'paste from #34127 (comment)

@mvdan I had a look at the benchmarks in encoding/json and non of them contains struct fields with the string option set in the struct tags.

The change affects the encoding of struct like:

type Foo struct {
Bar string json:"bar,string"
}
While looking through the commit messages for the recent changes on encoding/json I found this one from you:

...
This change should slightly reduce the amount of work when decoding into
json.Number, though that isn't very common nor part of any current
benchmarks.
...

I would argue, that this change also slightly improves encoding performance and that it is not very common nor part of any existing benchmarks. Would this be fine as well or do you insist on benchmarks for this particular case?


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Lucas Bremgartner:

Patch Set 2:

Patch Set 2: Code-Review-1

There were some replies on GitHub - hopefully the author can post them here. Until then, -1 as per my comment above.

Please also consider the issue #34154, because it is related to this PR (in fact the issue is solved by this PR).


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Lucas Bremgartner:

Patch Set 3: Commit message was updated.


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@breml breml force-pushed the optimize-json-quoted-string branch from d417e71 to b4da9fd Compare September 9, 2019 20:45
@gopherbot
Copy link
Contributor

This PR (HEAD: b4da9fd) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/193604 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@breml breml force-pushed the optimize-json-quoted-string branch from b4da9fd to 96aea0f Compare September 9, 2019 21:01
@gopherbot
Copy link
Contributor

This PR (HEAD: 96aea0f) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/193604 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Daniel Martí:

Patch Set 8:

(5 comments)

Thanks, I didn't know there was an issue to fix here. My previous CL didn't include benchmark numbers because it wasn't a performance optimization. I was simply deleting dead code, so I imagined that the edge case would be a bit faster.

If this CL is about making JSON encoding follow an option correctly, and that's the bug you filed, then no need to add the benchmark. I only insisted on that because the original CL was written in a way that seemed like you were optimizing the code :)


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@breml breml changed the title encoding/json: optimize marshal for quoted string encoding/json: fix and optimize marshal for quoted string Sep 10, 2019
Since Go 1.2 every string can be marshaled to JSON without error even if it
contains invalid UTF-8 byte sequences. Therefore there is no need to use
Marshal again for the only reason of enclosing the string in double quotes.
Not using Marshal here also removes the error check as there has not been a
way for Marshal to fail anyway.

name old time/op new time/op delta
Issue34127-4 360ns ± 3% 200ns ± 3% -44.56% (p=0.008 n=5+5)

name old alloc/op new alloc/op delta
Issue34127-4 56.0B ± 0% 40.0B ± 0% -28.57% (p=0.008 n=5+5)

name old allocs/op new allocs/op delta
Issue34127-4 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.008 n=5+5)

Fixes golang#34154
@breml breml force-pushed the optimize-json-quoted-string branch from 96aea0f to 9b0ac1d Compare September 10, 2019 18:51
@gopherbot
Copy link
Contributor

This PR (HEAD: 9b0ac1d) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/193604 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Lucas Bremgartner:

Patch Set 10:

(5 comments)

I have updated the commit message and integrated the test into an existing table driven test.

I have also added the ideas, I considered regarding the implementation without make+append.


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@andig
Copy link
Contributor

andig commented Sep 10, 2019

For some reason I cannot comment on the CL. I have a stupid suggestion that might be worth trying. In https://go-review.googlesource.com/c/go/+/194338 an encodeBuffer from Pool is used to optimize allocations. Being in the same package it might also lend itself here instead of the bytes array. However, it contains overhead and cannot be allocated to the desired size. May not be worth it in the end or not even the cleanest thing to do.

@gopherbot
Copy link
Contributor

Message from Andreas Goetz:

Patch Set 10:

Patch Set 10:

(5 comments)

I have updated the commit message and integrated the test into an existing table driven test.

I have also added the ideas, I considered regarding the implementation without make+append.

Couldn't 607 just be like this:

	e.WriteByte('"')
	e.string(v.String(), opts.escapeHTML)
	e.WriteByte('"')

without the additional []byte allocation?


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Lucas Bremgartner:

Patch Set 10:

Patch Set 10:

Patch Set 10:

(5 comments)

I have updated the commit message and integrated the test into an existing table driven test.

I have also added the ideas, I considered regarding the implementation without make+append.

Couldn't 607 just be like this:

  e.WriteByte('"')
  e.string(v.String(), opts.escapeHTML)
  e.WriteByte('"')

without the additional []byte allocation?

This was my very first try as well, when I approached the problem. Unfortunately, this does not work, because e.string adds the double quotes as well (see:

e.WriteByte('"')
). So you end up with invalid json like this, because the inner quotes are not escaped properly:

 { "foo": ""bar"" }

Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Andreas Goetz:

Patch Set 10:

Patch Set 10:

Couldn't 607 just be like this:

	e.WriteByte('"')
	e.string(v.String(), opts.escapeHTML)
	e.WriteByte('"')

without the additional []byte allocation?

This was my very first try as well, when I approached the problem. Unfortunately, this does not work, because e.string adds the double quotes as well (see:

e.WriteByte('"')
). So you end up with invalid json like this, because the inner quotes are not escaped properly:

 { "foo": ""bar"" }

Ok, please tell me off if I'm confusing the discussion. Isn't then your patch really equal to:

e.string(v.String(), opts.escapeHTML)

and basically means that the entire optQuoted block could really be removed?


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Lucas Bremgartner:

Patch Set 10:

Patch Set 10:

Patch Set 10:

Couldn't 607 just be like this:

  e.WriteByte('"')
  e.string(v.String(), opts.escapeHTML)
  e.WriteByte('"')

without the additional []byte allocation?

This was my very first try as well, when I approached the problem. Unfortunately, this does not work, because e.string adds the double quotes as well (see:

e.WriteByte('"')
). So you end up with invalid json like this, because the inner quotes are not escaped properly:

 { "foo": ""bar"" }

Ok, please tell me off if I'm confusing the discussion. Isn't then your patch really equal to:

e.string(v.String(), opts.escapeHTML)

and basically means that the entire optQuoted block could really be removed?

No, the optQuoted flag is controlled by the "string" option in the json struct tags. The docs (https://golang.org/pkg/encoding/json/#Marshal) says:

The "string" option signals that a field is stored as JSON inside a JSON-encoded string. It applies only to fields of string, floating point, integer, or boolean types.

This means, that any string, that is encoded, is put into additional (escaped) quotes. E.g.

{ "foo": "\"bar\"" }

I suggest, you consult the added test case (https://go-review.googlesource.com/c/go/+/193604/10/src/encoding/json/stream_test.go) to see the expected output from a given input.


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Daniel Martí:

Patch Set 10: Run-TryBot+1 Code-Review+2

Thanks for the throrough explanation, Lucas! LGTM.


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gobot Gobot:

Patch Set 10:

TryBots beginning. Status page: https://farmer.golang.org/try?commit=ed63614a


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gobot Gobot:

Patch Set 10: TryBot-Result+1

TryBots are happy.


Please don’t reply on this GitHub thread. Visit golang.org/cl/193604.
After addressing review feedback, remember to publish your drafts!

gopherbot pushed a commit that referenced this pull request Sep 11, 2019
Since Go 1.2 every string can be marshaled to JSON without error even if it
contains invalid UTF-8 byte sequences. Therefore there is no need to use
Marshal again for the only reason of enclosing the string in double quotes.
Not using Marshal here also removes the error check as there has not been a
way for Marshal to fail anyway.

name          old time/op    new time/op    delta
Issue34127-4     360ns ± 3%     200ns ± 3%  -44.56%  (p=0.008 n=5+5)

name          old alloc/op   new alloc/op   delta
Issue34127-4     56.0B ± 0%     40.0B ± 0%  -28.57%  (p=0.008 n=5+5)

name          old allocs/op  new allocs/op  delta
Issue34127-4      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)

Fixes #34154

Change-Id: Ib60dc11980f9b20d8bef2982de7168943d632263
GitHub-Last-Rev: 9b0ac1d
GitHub-Pull-Request: #34127
Reviewed-on: https://go-review.googlesource.com/c/go/+/193604
Reviewed-by: Daniel Martí <[email protected]>
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
@gopherbot
Copy link
Contributor

This PR is being closed because golang.org/cl/193604 has been merged.

@gopherbot gopherbot closed this Sep 11, 2019
Dirbaio pushed a commit to sqlbunny/json that referenced this pull request Sep 12, 2019
Since Go 1.2 every string can be marshaled to JSON without error even if it
contains invalid UTF-8 byte sequences. Therefore there is no need to use
Marshal again for the only reason of enclosing the string in double quotes.
Not using Marshal here also removes the error check as there has not been a
way for Marshal to fail anyway.

name          old time/op    new time/op    delta
Issue34127-4     360ns ± 3%     200ns ± 3%  -44.56%  (p=0.008 n=5+5)

name          old alloc/op   new alloc/op   delta
Issue34127-4     56.0B ± 0%     40.0B ± 0%  -28.57%  (p=0.008 n=5+5)

name          old allocs/op  new allocs/op  delta
Issue34127-4      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)

Fixes #34154

Change-Id: Ib60dc11980f9b20d8bef2982de7168943d632263
GitHub-Last-Rev: 9b0ac1d4c5318b6bf9ed7930320f2bd755f9939c
GitHub-Pull-Request: golang/go#34127
Reviewed-on: https://go-review.googlesource.com/c/go/+/193604
Reviewed-by: Daniel Martí <[email protected]>
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
e-nikolov pushed a commit to e-nikolov/json that referenced this pull request Nov 25, 2021
Since Go 1.2 every string can be marshaled to JSON without error even if it
contains invalid UTF-8 byte sequences. Therefore there is no need to use
Marshal again for the only reason of enclosing the string in double quotes.
Not using Marshal here also removes the error check as there has not been a
way for Marshal to fail anyway.

name          old time/op    new time/op    delta
Issue34127-4     360ns ± 3%     200ns ± 3%  -44.56%  (p=0.008 n=5+5)

name          old alloc/op   new alloc/op   delta
Issue34127-4     56.0B ± 0%     40.0B ± 0%  -28.57%  (p=0.008 n=5+5)

name          old allocs/op  new allocs/op  delta
Issue34127-4      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)

Fixes #34154

Change-Id: Ib60dc11980f9b20d8bef2982de7168943d632263
GitHub-Last-Rev: 9b0ac1d4c5318b6bf9ed7930320f2bd755f9939c
GitHub-Pull-Request: golang/go#34127
Reviewed-on: https://go-review.googlesource.com/c/go/+/193604
Reviewed-by: Daniel Martí <[email protected]>
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
e-nikolov pushed a commit to e-nikolov/json that referenced this pull request Nov 25, 2021
Since Go 1.2 every string can be marshaled to JSON without error even if it
contains invalid UTF-8 byte sequences. Therefore there is no need to use
Marshal again for the only reason of enclosing the string in double quotes.
Not using Marshal here also removes the error check as there has not been a
way for Marshal to fail anyway.

name          old time/op    new time/op    delta
Issue34127-4     360ns ± 3%     200ns ± 3%  -44.56%  (p=0.008 n=5+5)

name          old alloc/op   new alloc/op   delta
Issue34127-4     56.0B ± 0%     40.0B ± 0%  -28.57%  (p=0.008 n=5+5)

name          old allocs/op  new allocs/op  delta
Issue34127-4      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)

Fixes #34154

Change-Id: Ib60dc11980f9b20d8bef2982de7168943d632263
GitHub-Last-Rev: 9b0ac1d4c5318b6bf9ed7930320f2bd755f9939c
GitHub-Pull-Request: golang/go#34127
Reviewed-on: https://go-review.googlesource.com/c/go/+/193604
Reviewed-by: Daniel Martí <[email protected]>
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
e-nikolov pushed a commit to e-nikolov/json that referenced this pull request Nov 25, 2021
Since Go 1.2 every string can be marshaled to JSON without error even if it
contains invalid UTF-8 byte sequences. Therefore there is no need to use
Marshal again for the only reason of enclosing the string in double quotes.
Not using Marshal here also removes the error check as there has not been a
way for Marshal to fail anyway.

name          old time/op    new time/op    delta
Issue34127-4     360ns ± 3%     200ns ± 3%  -44.56%  (p=0.008 n=5+5)

name          old alloc/op   new alloc/op   delta
Issue34127-4     56.0B ± 0%     40.0B ± 0%  -28.57%  (p=0.008 n=5+5)

name          old allocs/op  new allocs/op  delta
Issue34127-4      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)

Fixes #34154

Change-Id: Ib60dc11980f9b20d8bef2982de7168943d632263
GitHub-Last-Rev: 9b0ac1d4c5318b6bf9ed7930320f2bd755f9939c
GitHub-Pull-Request: golang/go#34127
Reviewed-on: https://go-review.googlesource.com/c/go/+/193604
Reviewed-by: Daniel Martí <[email protected]>
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

encoding/json: string option (struct tag) on string field with SetEscapeHTML(false) escapes anyway
5 participants