Change the Ruby code generator to emit a serialized proto instead of the DSL #12319
All but two of the conformance failures are fixed.
This avoids putting non-printable characters into the source, which makes some tooling unhappy.
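To make the point about non-printable characters concrete, here is a minimal sketch of rendering arbitrary bytes as a printable Ruby string literal. The helper name `escape_bytes` is hypothetical and is not taken from the generator in this PR; it only shows that such a literal can be limited to printable ASCII and still evaluate back to the original bytes.

```ruby
# Hypothetical helper (not the generator's code): render a binary string as a
# double-quoted Ruby literal that contains only printable ASCII characters.
def escape_bytes(data)
  body = data.bytes.map do |b|
    case b
    when 0x22 then '\\"'          # double quote
    when 0x5c then '\\\\'         # backslash
    when 0x23 then '\\#'          # avoid accidental "#{...}" interpolation
    when 0x20..0x7e then b.chr    # other printable ASCII stays readable
    else format('\\x%02x', b)     # everything else becomes a two-digit hex escape
    end
  end.join
  "\"#{body}\""
end

raw = "\n\x03pkg\x88\x01".b
literal = escape_bytes(raw)
puts literal
```

Always emitting two hex digits matters here: Ruby reads at most two digits after `\x`, so a following character such as `a` cannot be absorbed into the escape.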
This looks reasonable in general. (Thank you in particular for using an escaped string, instead of `__END__` -- that's been a headache for the current `descriptor_pb.rb`.)
Are there golden outputs where we can actually see what this looks like (including the compatibility code)?
@haberman What's the next step on this?
I'm going to try to improve the error message for the case that the fallback code kicks in.
I improved the error message when the compatibility code is triggered. The message now looks something like this:
LGTM in general. Couple minor nits on the warning message.
end
end
warn "Each proto file must use a consistent fully-qualified name."
warn "This will become an error in the next version."
Suggestion: "next major version"
parsed.clear_dependency
serialized = parsed.class.encode(parsed)
file = pool.add_serialized_file(serialized)
warn "Warning: Protobuf detected import path issue while loading generated file #{__FILE__}"
grammar nit: "an import path issue" or "import path issues"?
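For readers following the thread, the fallback under review can be modeled with a toy sketch. `ToyFile`, `ToyPool`, and `add_with_fallback` are hypothetical stand-ins written for illustration; they are not the google-protobuf API and not the code in this PR. The sketch assumes only what the snippet above shows: a strict load that can fail on an unknown dependency name, followed by a retry with the dependency list cleared and a warning.

```ruby
# Toy stand-ins (hypothetical, NOT the google-protobuf API) that mimic the
# strict dependency check and the lenient fallback discussed in this thread.
ToyFile = Struct.new(:name, :dependencies)

class ToyPool
  class MissingDependency < StandardError; end

  def initialize
    @files = {}
  end

  # Strict path: every listed dependency must already be loaded under the
  # exact same name.
  def add_file(file)
    missing = file.dependencies.reject { |dep| @files.key?(dep) }
    unless missing.empty?
      raise MissingDependency,
            "Depends on file '#{missing.first}', but it has not been loaded"
    end
    @files[file.name] = file
  end

  def loaded?(name)
    @files.key?(name)
  end
end

# Lenient wrapper: on failure, warn and retry without the dependency list.
def add_with_fallback(pool, file)
  pool.add_file(file)
rescue ToyPool::MissingDependency
  warn "Warning: Protobuf detected an import path issue while loading #{file.name}"
  pool.add_file(ToyFile.new(file.name, []))
end

pool = ToyPool.new
# Loaded under its full path...
pool.add_file(ToyFile.new("foo/bar.proto", []))
# ...but imported under a shorter path, so the strict check cannot match it.
add_with_fallback(pool, ToyFile.new("foo/baz.proto", ["bar.proto"]))
```

The toy pool compares dependency names as plain strings, which is why `bar.proto` and `foo/bar.proto` are treated as different files even though they refer to the same source.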
…the DSL (#12319)

This PR removes the DSL from the code generator, in anticipation of splitting the DSL out into a separate package.

Given a .proto file like:

```proto
syntax = "proto3";

package pkg;

message TestMessage {
  optional int32 i32 = 1;
  optional TestMessage msg = 2;
}
```

Generated code before:

```ruby
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: protoc_explorer/main.proto

require 'google/protobuf'

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("test.proto", :syntax => :proto3) do
    add_message "pkg.TestMessage" do
      proto3_optional :i32, :int32, 1
      proto3_optional :msg, :message, 2, "pkg.TestMessage"
    end
  end
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```

Generated code after:

```ruby
# frozen_string_literal: true
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: test.proto

require 'google/protobuf'

descriptor_data = "\n\ntest.proto\x12\x03pkg\"S\n\x0bTestMessage\x12\x10\n\x03i32\x18\x01 \x01(\x05H\x00\x88\x01\x01\x12\"\n\x03msg\x18\x02 \x01(\x0b\x32\x10.pkg.TestMessageH\x01\x88\x01\x01\x42\x06\n\x04_i32B\x06\n\x04_msgb\x06proto3"

begin
  Google::Protobuf::DescriptorPool.generated_pool.add_serialized_file(descriptor_data)
rescue TypeError => e
  # <compatibility code, see below>
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```

This change fixes nearly all remaining conformance problems that existed previously. This is a side effect of moving from the DSL (which is lossy) to a serialized descriptor (which preserves all information).

## Backward Compatibility

This change should be 100% compatible with Ruby Protobuf >= 3.18.0, released in Sept 2021. Additionally, it should be compatible with all existing users and deployments. However there is some special compatibility code I inserted to achieve this level of backward compatibility.

Without the compatibility code, there is an edge case that could break backward compatibility. The existing code is lax in a way that the new code would be more strict.

When we use a full serialized descriptor, it will contain a list of all `.proto` files imported by this file (whereas the DSL never added dependencies properly):

https://github.com/protocolbuffers/protobuf/blob/dfb71558a2226718dc3bcf5df27cbc11c1f72382/src/google/protobuf/descriptor.proto#L65-L66

`add_serialized_file` will verify that all dependencies listed in the descriptor were previously added with `add_serialized_file`. Generally that should be fine, because the generated code will contain Ruby `require` statements for all dependencies, and the descriptor will fail to load anyway if the types we depend on were not previously defined in the DescriptorPool.

But there is a potential for problems if there are ambiguities around file paths. For example, consider the following scenario:

```proto
// foo/bar.proto

syntax = "proto2";

message Bar {}
```

```proto
// foo/baz.proto

syntax = "proto2";

import "bar.proto";

message Baz {
  optional Bar bar = 1;
}
```

If you invoke `protoc` like so, it will work correctly:

```
$ protoc --ruby_out=. -Ifoo foo/bar.proto foo/baz.proto
$ RUBYLIB=. ruby baz_pb.rb
```

However if you invoke `protoc` like so, and didn't have any compatibility code, it would fail to load:

```
$ protoc --ruby_out=. -I. -Ifoo foo/baz.proto
$ protoc --ruby_out=. -I. -Ifoo foo/bar.proto
$ RUBYLIB=foo ruby foo/baz_pb.rb
foo/baz_pb.rb:10:in `add_serialized_file': Unable to build file to DescriptorPool: Depends on file 'bar.proto', but it has not been loaded (Google::Protobuf::TypeError)
	from foo/baz_pb.rb:10:in `<main>'
```

The problem is that `bar.proto` is being referred to by two different canonical names: `bar.proto` and `foo/bar.proto`. This is a user error: each import should always be referred to by a consistent full path. Hopefully user errors of this sort are rare, but it is hard to know without trying.

The code in this PR prints a warning using `warn` if we detect that this edge case has occurred. We will plan to remove this compatibility code in the next major version.

Closes #12319

COPYBARA_INTEGRATE_REVIEW=#12319 from haberman:ruby-gencode-binary 5c0e8f2
FUTURE_COPYBARA_INTEGRATE_REVIEW=#12319 from haberman:ruby-gencode-binary 5c0e8f2
PiperOrigin-RevId: 524059189
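As an illustration of what the serialized descriptor carries, the sketch below walks the top-level fields of the sample `descriptor_data` string using only the Ruby standard library. The helpers `read_varint` and `top_level_fields` are hypothetical names written for this note, not part of google-protobuf, and the reader deliberately supports only the two wire types that appear at the top level of this sample. The field numbers come from `FileDescriptorProto` in `descriptor.proto` (1 = name, 2 = package, 3 = dependency, 4 = message_type, 12 = syntax).

```ruby
# Decode one base-128 varint starting at pos; returns [value, next_pos].
def read_varint(bytes, pos)
  result = 0
  shift = 0
  loop do
    byte = bytes[pos]
    pos += 1
    result |= (byte & 0x7f) << shift
    return [result, pos] if byte < 0x80
    shift += 7
  end
end

# Hypothetical helper: list [field_number, value] pairs for the top level of
# a serialized message. Only varint (0) and length-delimited (2) wire types
# are handled, which is an assumption that holds for the sample below.
def top_level_fields(data)
  bytes = data.bytes
  pos = 0
  fields = []
  while pos < bytes.size
    tag, pos = read_varint(bytes, pos)
    number = tag >> 3
    wire_type = tag & 0x7
    case wire_type
    when 0 # varint
      value, pos = read_varint(bytes, pos)
    when 2 # length-delimited: strings, bytes, nested messages
      length, pos = read_varint(bytes, pos)
      value = bytes[pos, length].pack("C*")
      pos += length
    else
      raise ArgumentError, "unsupported wire type #{wire_type}"
    end
    fields << [number, value]
  end
  fields
end

descriptor_data = "\n\ntest.proto\x12\x03pkg\"S\n\x0bTestMessage\x12\x10\n\x03i32\x18\x01 \x01(\x05H\x00\x88\x01\x01\x12\"\n\x03msg\x18\x02 \x01(\x0b\x32\x10.pkg.TestMessageH\x01\x88\x01\x01\x42\x06\n\x04_i32B\x06\n\x04_msgb\x06proto3"

fields = top_level_fields(descriptor_data)
fields.each { |number, value| puts "field #{number}: #{value.bytesize} bytes" }
```

This sample has no imports, so no field 3 entries appear. A file with imports would carry one `dependency` string per import, and those strings are the names that `add_serialized_file` checks against files already in the pool.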