Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected 0-th value enum JSON serialization #235

Closed
antongrbin opened this issue Oct 20, 2022 · 9 comments · Fixed by #238
Closed

Unexpected 0-th value enum JSON serialization #235

antongrbin opened this issue Oct 20, 2022 · 9 comments · Fixed by #238

Comments

@antongrbin
Copy link
Contributor

antongrbin commented Oct 20, 2022

The testing message is defined as follows:

syntax = "proto2";

message MessageWithEnum {
    enum EnumType {
      FOO = 0;
      BAR = 1;
    }
    optional EnumType value = 1;
}

We observe the following serialization with pbandk:

Source message current JSON serialization expected JSON serialization
MessageWithEnum() {"value":null} {}
MessageWithEnum(value=FOO) {} {"value":"FOO"}
MessageWithEnum(value=BAR) {"value":"BAR"} OK!

The bug is reproduced in a unit test here:
#234

I know that proto2 doesn't really have a JSON spec defined, but other protobuf implementations (java, python, swift) would all behave as expected in this example.

In proto3, we observed the following serialization behavior:

Source message current JSON serialization expected JSON serialization
TestAllTypesProto3() {} OK!
TestAllTypesProto3(optionalNestedEnum = FOO) {} {"value":"FOO"}
TestAllTypesProto3(optionalNestedEnum = BAR) {"value":"BAR"} OK!
  1. Do you agree that the current behavior is unexpected?
  2. Would you be open for us contributing a fix for it?

Thank you for your time :)

@antongrbin antongrbin changed the title Possible bug in proto2 enum JSON serialization Unexpected proto2 enum JSON serialization Oct 20, 2022
@antongrbin antongrbin changed the title Unexpected proto2 enum JSON serialization Unexpected 0-th value enum JSON serialization Oct 20, 2022
@garyp
Copy link
Collaborator

garyp commented Oct 20, 2022

The proto2 behavior definitely seems like a bug. As you mentioned, proto2 doesn't officially have a JSON spec defined and so pbandk doesn't officially support JSON with proto2. But we do try to match the behavior of the official protobuf libraries where possible. We would gladly accept a PR with a fix.

The proto3 behavior I would argue is compliant with the spec. According to https://developers.google.com/protocol-buffers/docs/proto3#json:

If a field has the default value in the protocol buffer, it will be omitted in the JSON-encoded data by default to save space. An implementation may provide options to emit fields with default values in the JSON-encoded output.

pbandk does have the JsonConfig.outputDefaultValues option available if you want the output to be {"value":"FOO"}.

@antongrbin
Copy link
Contributor Author

Thank you for the quick turnaround time :)

Happy to hear that we can move forward with the fix for proto2.

If you agree, please review #234 which only adds unit tests that show the current behavior. This will make our follow-up fix PR clearer.

--

I looked into implementing the fix for proto2 and it looks like most effort will go into determining whether the message is proto2 or proto3 with the information that JsonMessageEncoder has in runtime. Did I miss anything here?

A simpler approach is to say that field presence is what's important, not proto2 vs proto3. In that approach we would change the behavior of all optional enum fields with default values, regardless of whether they are in proto2 or proto3.

Digging deeper, I came across Application note: Field presence which suggests that our fix here might not be enum-specific. From that doc:

* No presence discipline:
  * Default values are not serialized.
* Explicit presence discipline:
  * Explicitly set values are always serialized, including default values.

This suggests that we should serialize default values of all optional fields (not only enums).

--

Having all of the above, we can apply the fix with 3 scopes:

  1. fix serialization of default values for proto2 enums
  2. fix serialization of default values for all optional enums
  3. fix serialization of default values for all optional fields

My hunch would be to move forward with 3) since it makes the code cleaner and is more aligned with other implementations. I can provide more details here if needed :)

Looking forward to your input!

@garyp
Copy link
Collaborator

garyp commented Oct 23, 2022

fix serialization of default values for all optional fields

I agree that this would be the preferred route.

In general, it's better to avoid writing code that makes decisions based on proto2 vs proto3 (when possible) and instead write code that makes decisions based on the field's presence discipline (as described in the doc you linked to).

This suggests that we should serialize default values of all optional fields (not only enums).

Yes agreed. Just keep in mind the distinction between "no presence" (the default behavior for non-repeated proto3 fields) and "explicit presence" (used in proto2 and proto3 for fields that include the optional keyword).

@antongrbin
Copy link
Contributor Author

Excellent. Thank you for the guidance!

@garyp will you be the reviewer for the changes? This will be 2 small PRs, first one with a unit test and second one with a fix.

The first one is here: #234

@garyp
Copy link
Collaborator

garyp commented Oct 24, 2022

Yep, I'll review them. I should have time later today to take a look at the PRs.

@antongrbin
Copy link
Contributor Author

Thank you for the review on the unit test PR.

The fix PR is here: #238

garyp pushed a commit that referenced this issue Oct 25, 2022
# Motivation

See #235 (comment).

This PR only adds the unit tests that document the current behavior. We will implement the fix as discussed in #235 in the follow-up PR, after this one is merged.

# Changes

* Added a unit test to document the behavior that will be fixed in a follow-up PR.
* After changing the `test_proto2.proto` I executed `./gradlew generateProtos` to generate the `TestProto.java` and `test_proto2.kt`

# Tested

Executed unit tests locally and they pass.
@garyp garyp added this to the 0.15.0 milestone Oct 25, 2022
garyp pushed a commit that referenced this issue Oct 25, 2022
…#238)

# Motivation

Fixes #235 

# Changes

* Change the behavior for JSON serialization for fields with explicit presence.
* If unset -> don't emit on the wire.
* If set -> emit on the wire, even if the value is default (regardless of the `jsonConfig.outputDefaultValues`)

This is consistent with the following:
https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#semantic-differences

This PR also changes behavior of `outputDefaultValues` for unset nested message fields. Before we were outputting `null` and now we don't output any value. This is consistent with c++, python and java, as explained here: noom/protobuf#5

# Tested

Run all unit tests in `runtime`
@garyp
Copy link
Collaborator

garyp commented Oct 25, 2022

@antongrbin Thank you for the thorough fix and tests!

@garyp garyp modified the milestones: 0.15.0, 0.14.2 Oct 25, 2022
@antongrbin
Copy link
Contributor Author

@garyp is there anything blocking the 0.14.2 release? Milestone looks all closed: https://github.com/streem/pbandk/milestone/12

Thank you for your time :)

@garyp
Copy link
Collaborator

garyp commented Nov 18, 2022

@antongrbin It's released now!

garyp pushed a commit that referenced this issue Mar 7, 2023
See #235 (comment).

This PR only adds the unit tests that document the current behavior. We will implement the fix as discussed in #235 in the follow-up PR, after this one is merged.

* Added a unit test to document the behavior that will be fixed in a follow-up PR.
* After changing the `test_proto2.proto` I executed `./gradlew generateProtos` to generate the `TestProto.java` and `test_proto2.kt`

Executed unit tests locally and they pass.
garyp pushed a commit that referenced this issue Mar 7, 2023
…#238)

Fixes #235

* Change the behavior for JSON serialization for fields with explicit presence.
* If unset -> don't emit on the wire.
* If set -> emit on the wire, even if the value is default (regardless of the `jsonConfig.outputDefaultValues`)

This is consistent with the following:
https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#semantic-differences

This PR also changes behavior of `outputDefaultValues` for unset nested message fields. Before we were outputting `null` and now we don't output any value. This is consistent with c++, python and java, as explained here: noom/protobuf#5

Run all unit tests in `runtime`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants