Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Immitate protobuf.js for oneof and non-scalar behavior re: field optionality #74

Closed
philikon opened this issue May 29, 2020 · 25 comments
Closed

Comments

@philikon
Copy link
Contributor

philikon commented May 29, 2020

I don't find ts-proto's generated types particularly helpful because it forces me to always provide all fields (even if I set it to undefined), which is especially annoying when you have huge nested oneofs.

As an example, consider the following proto:

message Person {
  string name = 1;
  uint32 age = 2;
  oneof legal_document {
    Passport passport = 3;
    DriversLicense dl = 4;
    GovtIssuedId govt_id  = 5;
  }
}

ts-proto translates this to

export interface Person {
  name: string;
  age: number;
  passport: Passport | undefined;
  dl: DriversLicense | undefined;
  GovtIssuedId: govtId | undefined;
}

If I wanted to encode one of these messages, I'd have specify all fields:

Person.encode({
  name: "Bob",
  age: 37,
  passport: undefined,
  dl: undefined,
  govId: undefined,
})

That is tedious and you can imagine with large messages, especially nested oneofs, it becomes entirely impractical. I'm also debating whether we should need to specify age and name at this stage. proto3 says that all fields are essentially optional, though google.protobuf.*Value is the de-facto way to explicitly declare an optional value.

(Also, I personally would also consider code that explicitly sets a value to undefined to be bad form. IMHO, undefined is a fallback value that the JS/TS language provides when the thing you're trying to access cannot be found or was not defined. Code that explicitly wants to provide a null value should use null. That's my interpretation of JS's built-in nullish types.)

I think pbts actually gets this right. It would translate the above to

                interface IPerson {
                    name?: (string|null);
                    age?: (number|null);
                    passport?: (Passport|null);
                    dl?: (DriversLicense|null);
                    govtId?: (GovtIssuedId|null);
                }

which is the interface that consumed by Person.encode(). This means I can write

Person.encode({name: "Bob"})

and it's perfectly valid. We can debate whether name and age should be optional and nullable (probably not), but all the fields in the oneof definitely should be! And for clarity, I would argue that any value that's not a basic type should be optional too, to save tedious typing, having to set them to null (or, worse, undefined, which as said above, is not a value I'd ever expect application code to have to set, it's something to check for).

I'm happy to help with the implementation, and I'm also happy to hide any changes behind a compiler flag if that's preferable.

@stephenh
Copy link
Owner

Yeah, understood that is a preference. For the projects I built ts-proto for, we purposefully wanted that rigor. We also had generally-not-huge messages.

I'd be cool with a config flag to turn on optionals. useOptionals? Something like that.

Happy to give some pointers if you need any, otherwise a PR would be great. (My main ask would be a new integration-tests example that enables your new flag and then shows what all of the generated output now looks like, maybe has a super-simple test case that shows it works, etc.)

Thanks!

@timostamm
Copy link
Contributor

The drawback to making all fields optional in typescript is that it breaks strictNullChecks.

This would not compile:

function getName( person:Person ): string {
  return person.name;
}

To create a Person with default values, you can use the fromPartial method:

Person.fromPartial({
  name: "Bob"
});

@stephenh
Copy link
Owner

Ah shoot, that is a great point @timostamm . I'd forgotten about both the "now things are optional on read" aspect as well as fromPartial being what you should probably do instead.

@philikon does ^ work for you? I can add an FAQ entry to the home page around "btw, if you have a whole lot of message field, you should look at fromPartial?

@philikon
Copy link
Contributor Author

philikon commented May 30, 2020

The drawback to making all fields optional in typescript is that it breaks strictNullChecks.

I don't quite follow. Can you elaborate with an example?

I think you guys have clarified an important distinction though. fromPartial would allow me to do the equivalent of Person.encode({name: 'Bob'}), as @timostamm explains. (It's a pity that fromPartial is only generated when JSON methods are generated -- that's part of the reason why I missed that potential solution. Perhaps that's something we could change with a compiler flag? I might want fromPartial but none of the toJSON and fromJSON stuff.)

But fromPartial is still different from what I was actually proposing but did a poor job defining well: making optional and nullable (a) message fields, (b) fields within a oneof clause, if the nullableOptionals=true compiler flag were set. I have this working locally and this proto message

message SimpleWithWrappers {
  google.protobuf.StringValue name = 1;
  google.protobuf.Int32Value age = 2;
  google.protobuf.BoolValue enabled = 3;
  repeated google.protobuf.Int32Value coins = 6;
  repeated google.protobuf.StringValue snacks = 7;
}

translates to this

export interface SimpleWithWrappers {
  name?: string | null;
  age?: number | null;
  enabled?: boolean | null;
  coins: number[];
  snacks: string[];
}

const baseSimpleWithWrappers: object = {
};

// and the usual encode/decode functions

I'd be very happy with that as I'd still get type checking on all the properties that are supposed to be there, but I can either not specify at all or pass null for those that are clearly optional.

@philikon
Copy link
Contributor Author

philikon commented May 30, 2020

Hmm, should

  coins: number[];
  snacks: string[];

really be

  coins: Array<number | null>;
  snacks: Array<string | null>;

perhaps? Not that it's actually possible to encode a "hole" in a repeated field currently, so perhaps the answer is no.

For instance, if I do this using ts-proto's default options:

    const s = SimpleWithWrappers.encode({
      name: undefined,
      age: undefined,
      enabled: undefined,
      coins: [undefined, 3],
      snacks: [],
    }).finish()
    const d = SimpleWithWrappers.decode(s)

I get d = { coins: [ 0, 3 ], snacks: [] } as a result, so it seems that number[] is probably more correct than Array<number | undefined> even with default compiler settings?

@timostamm
Copy link
Contributor

@philikon

I don't quite follow. Can you elaborate with an example?

Let's assume you have this message interface:

interface IPerson {
   name?: (string|null);
   age?: (number|null);
   passport?: (Passport|null);
   dl?: (DriversLicense|null);
   govtId?: (GovtIssuedId|null);
}

Creating objects with this interface is easy:

let IPerson = {
  name: 'foo'
};

But in most real world use cases, you are not only creating objects, you are reading from them.
Let's say you have this function:

function showPersonalMessage(name: string, message: string)

And you want to pass the name to it:

let x:IPerson = {
  age: 23
};

showPersonalMessage(x.name, 'hello')

Do you see the Problem? You are passing a string|undefined|null to a function that accepts only string. The function might save this value to a NOT NULL database table and crash your application. If you have the typescript compiler option strictNullChecks enabled, this will fail to compile.

So what you should do is code like that:

let name = x.name === null || x.name === undefined ? "" : x.name;
showPersonalMessage(name, 'hello');

This tends to get very verbose very quickly. And protobuf does not have this behaviour in other languages. In C# for example, this will never fail:

var p = new Person {
   Age = 23
};
void ShowPersonalMessage(name: string, message:string);
ShowPersonalMessage(p.Name);

Same for PHP and C++, because they all have an empty string as default value. proto-ts enforces this default value for strings by having name:string in the interface.

I have some background on this specific matter. We have an application where hundreds of protobuf message are used in Typescript (Angular), PHP, C# and C++.

Angular receives the data as JSON, and I have Typescript interfaces that are generated pretty much the same way that you are proposing. Every field is optional. The reason in this case is that Protobuf PHP has no option to output JSON with default values, so a string txt=1 has to be a txt?: string.

And the outcome of that is that I have to use tedious checks in many places in typescript that are not necessary at all in any of the other languages.

That's just my use case of course - only elaborating as requested. But let's not forget that protobuf is intended to work across platforms and languages and there is a great benefit when implementations behave in a similar way.

@philikon
Copy link
Contributor Author

philikon commented May 30, 2020

Thanks @timostamm. Yes, if all fields were optional, you'd have to do a lot of x.name ?? 'fallback' type checks and I agree that would get very verbose.

As mentioned above, I think my initial post wasn't as clear as it could have been. I propose the nullableOptionals compiler flag (as implemented in the PR, please have a look) makes optional and nullable

  • message fields
  • fields within oneof

With that, a variation of my original example

message Person {
  string name = 1;
  google.protobuf.Uint32Value age = 2;
  oneof legal_document {
    Passport passport = 3;
    DriversLicense dl = 4;
    GovtIssuedId govt_id  = 5;
  }
}

would translate to

export interface Person {
  name: string;
  age?: number | null;
  passport?: Passport | null;
  dl?: DriversLicense | null;
  GovtIssuedId?: govtId | null;
}

So I can do Person.encode({name: 'Bob', passport: {...}}) but I can't do Person.encode({passport: {...}}) because name is a required field. Hope that makes sense.

@timostamm
Copy link
Contributor

Thanks for the clarification, @philikon. I would love to see optional properties as the default for non-scalar proto fields.

But: While I agree with your thoughts about undefined vs null, I am not sure your suggestion plays out so well. When you do

const p:Person = { name: 'pete' };

p.passport will be undefined - not null.

Optional properties implicitly become union types with undefined. With strictNullChecks=true, the following function will not compile.

function getPersonPassport( p:Person ): Passport | null {
  // TS2322: Type 'Passport | null | undefined' is not assignable to type 'Passport | null'
  return p.passport; 
}

The correct return type for the function would be Passport | null | undefined. And you have to carry this type through your code.

A pragmatic solution might be to mark the properties (only non-scalar proto fields) as optional, without any union:

export interface Person {
  name: string;
  age?: number;
  passport?: Passport;
  dl?: DriversLicense;
  govtId?: GovtIssuedId;
}

passport is Passport | undefined in this case. I agree it should technically be Passport | null, but on the other hand this declaration does look very readable to me and usage would be consistent.

What do you think?

@philikon
Copy link
Contributor Author

I also care about type correctness, including strictNullChecks. I'm less concerned about the practical implications regarding checking null and undefined but I hear your point. I'd be happy with your proposal. I'll send a PR.

You mention giving non-scalars (and I'm assuming also fields within oneof clauses, which is one my motivators) this treatment. Does that mean you'd also advocate that it should apply to repeated fields? I've been going back and forth on that personally, but left them to be always non-optional and not-nullable for now (as I mentioned above, ts-proto in default mode actually has a bit of an inconsistency regarding the array value type IMHO.)

@philikon philikon changed the title Immitate protobuf.js for oneof behavior, field optionality, and null values Immitate protobuf.js for oneof and non-scalar behavior re: field optionality May 31, 2020
@timostamm
Copy link
Contributor

@philikon scalar types cannot be null. string, bool, numeric types, enums are all scalar types.

For protobuf, a repeated, bytes or map<> field also cannot be null. Basically, a field cannot be null unless it is a message.

Members of repeated fields and maps cannot be null, regardless whether the element type is a message or scalar.

This is enforced by the official code generated by protoc, even if the language supports null values. I you try to put a null value into a string field in C#, which allows for null strings, there will be an exception.

I would generally recommend to stick to this (documented) convention. ts-proto does so for strings and bools, not sure about the others.

There is one peculiarity: the wellknown type struct can store arbitrary data very similar to JSON. It can store null and also arrays that include a null value. This is implemented by wrapping the scalar types into messages. All of those messages have custom JSON serialization rules and in JSON, a StringValue looks exactly the same as a scalar string. There is also a NullValue that maps to JSON null. There is nothing that prevents us from using a wrapper message in our messages declarations to have for example a string field that can also be null. But those are all messages, even if they look identical to the primitive types in JSON. As far as I have seen, ts-proto does not seem to support those types very well.

Regarding oneof: It does not change the null semantics of fields. It just means that the message guarantees that only at most one of its members is non-empty. The C#, PHP, etc. generated code will unset all other oneof members if one is set, and it also provides an additional "case" field, that you can use to conveniently discover which of the fields is set.

The oneof in this message:

message Demo {
   oneof result {
     string error = 1;
     int value = 2;
  }
}

Can be used like this in C#:

demo.Error = "message"; // will set value to 0
demo.Value = 123; // will set error to ""

switch (demo.ResultCase) {
  case Demo.ResultOneofCase.Error:
     // demo.Error is not ""
  case Demo.ResultOneofCase.Value:
     // demo.Value is not 0
  case Demo.ResultOneofCase.None:
     // demo.Error is "" and demo.Value is 0
}

For typescript: Since ts-proto uses interfaces, it is not possible to unset other oneof members automatically when setting a member.

I don't think oneof members should be made optional. It would make construction of message easier, but you would still have to take care that only one member is set.

It would be possible to enforce oneof behaviour in the interface with discriminated union types:

interface Demo {
   result: {
      oneofCase: 'error',
      error: string
    } | {
      oneofCase: 'value',
      value: number
    } | {
       oneofCase: 'none'
    }
}

if (demo.result.oneofCase === 'error') {
   // the compiler should have narrowed down the type here
   console.log(demo.result.error);
}

With the drawback that the message members "error" and "value" would no longer be direct properties of the interface.

TLDR: I fully agree with making all message types optional properties. I am not convinced about oneof.

@stephenh
Copy link
Owner

stephenh commented Jun 1, 2020

(Fwiw just chiming in that I'm really heads down at work for ~last week/through this week, so haven't been able to pay attention to ts-proto; I love the discussion you guys are having though, really appreciate it.)

@philikon
Copy link
Contributor Author

philikon commented Jun 1, 2020

I like the way you're going with that, @timostamm. I was going to open another ticket + PR for adding a member to decoded objects that would tell you which oneof case is populated. I had originally thought of something like this:

interface Demo {
  result: 'error' | 'value' | null,
  error?: string,
  value?: number,
}

but your proposed solution is superior because it links which result is populated to its value type.

I propose two separate compiler flags then:

  • useOptionals=true turns all non-scalar fields into optional properties rather than typing them as xyz | undefined (doesn't touch oneof fields)
  • oneof=unions emits oneofs as you proposed. The default would be oneof=fields, the current behavior.

Members of repeated fields and maps cannot be null, regardless whether the element type is a message or scalar.

Yeah, which I think means types like https://github.com/stephenh/ts-proto/blob/master/integration/simple/simple.ts#L70-L71 are not correct and should probably be fixed. Should we file a separate ticket for that?

@timostamm
Copy link
Contributor

Sounds perfect, @philikon.

Could you open a ticket for Array<number | undefined>? I will post some reproducible example code there.

@philikon
Copy link
Contributor Author

philikon commented Jun 7, 2020

@timostamm your proposal was this:

interface Demo {
   result: { oneofCase: 'error', error: string }
         | { oneofCase: 'value', value: number }
         | { oneofCase: 'none' },
}

I find oneofCase a bit wordy, any reason we can't just call it case? It reads quite nicely IMHO:

interface Demo {
   result: { case: 'error', error: string }
         | { case: 'value', value: number }
         | { case: 'none' },
}

But more importantly, I'm not sure I like the magic string 'none'. It might very well be a valid field name within a oneof clause! We'd have a clash.

I see two possible solutions:
(1) The oneofCase values come from an enum we define, similar to how google-protobuf does it:

interface Demo {
  enum ResultCase {
    NOT_SET = 0,
    ERROR = 1, // this would correspond to the protobuf field index,
    VALUE = 2, // so it might be anything except 0
  }
  result: { case: ResultCaseERROR, error: string }
        | { case: ResultCase.VALUE, value: number },
        | { case: ResultCase.NOT_SET },
}

(2) Instead of another type union, the result property is made optional:

interface Demo {
   result?: { case: 'error', error: string }
          | { case: 'value', value: number },
}

I personally much prefer the latter due because of its brevity. Yes, consumers will have to check that demo.result isn't null-ish before switch (demo.result.case), but that seems acceptable to me. It could be as easy as switch (demo.result?.case).

Your thoughts?

@timostamm
Copy link
Contributor

@philikon, case certainly looks better than oneofCase.

The idea was that oneofCase is much more unlikely to clash with a field name. But thinking about it, I would rather use case as the discriminator, and if a field is named "case", rewrite that name during code generation.

Regarding case none:

You are right about the clash of course, I missed that. I don't see a good reason for the enum and prefer (2) as well. But does the compiler narrow down demo.result?.case correctly?

Possible solution (3): case: undefined

@stephenh
Copy link
Owner

stephenh commented Jun 7, 2020

I might want fromPartial but none of the toJSON and fromJSON stuff

Fwiw this seems like a good point and worth a separate/simple PR if you'd like @philikon . I'm forgetting why we'd ever not want to include fromPartials...

Oh, yeah, it was b/c the person who wanted to include fromJSON also wanted essentially an "onlyTypes" option, so I think we cheated a bit and let fromPartial get skipped when outputJsonMethods was false...

So, @philikon maybe a PR that does: a) add a new config option of outputFromPartialMethods (default true), b) for convenience to the original "only types" use case, add a new config option of onlyTypes doesn't directly change the output but instead sets outputFromPartialMethods, outputEncodeMethods, outputJsonMethods, and outputClientImpl all to false? wdyt?

@stephenh
Copy link
Owner

stephenh commented Jun 7, 2020

I like the oneof handling discussion. Fwiw that is one of ts-protos todos: "Model OneOfs as an ADT" :-) so would be great to have this.

Let's split the oneofs/ADTs out from the optional option? I.e. a useOptionals that defaults to false and on true sets all non-primitive fields to firstName?: string instead of firstName: string | undefined seems good as a dedicated PR. (Or alternatively an avoidOptionals that defaults to true to maintain backwards compatibility.)

@stephenh
Copy link
Owner

stephenh commented Jun 7, 2020

case certainly looks better than oneofCase.

Fwiw I've seen kind generally/conventionally used for these sorts of type unions.

I think I'd prefer the kind: "none" being included in the type-union, b/c usually these are handled via exhaustive switches i.e. with assertNevers, and it seems nice to have all N clauses of "could be kind A or kind B or kind C or kind none" all handled at the "same level". But no strong feelings on that.

@cliedeman
Copy link
Contributor

I am also interested in

Fwiw that is one of ts-protos todos: "Model OneOfs as an ADT" :-) so would be great to have this.

I have a one of with +25 options and would definitetly love some better erogonomics to work with this.

The idea was that oneofCase is much more unlikely to clash with a field name. But thinking about it, I would rather use case as the discriminator, and if a field is named "case", rewrite that name during code generation.

You can easily sidestep collisions by using a special character like $case or something.

Accoriding to the specification only alphanum are acceptable identifiers
https://developers.google.com/protocol-buffers/docs/reference/proto3-spec#identifiers

@timostamm

But does the compiler narrow down demo.result?.case correctly?

Yes, both versions do

@philikon
Copy link
Contributor Author

philikon commented Jun 7, 2020

Thanks everyone for your comments!

Let's split the oneofs/ADTs out from the optional option? I.e. a useOptionals that defaults to false and on true sets all non-primitive fields to firstName?: string instead of firstName: string | undefined seems good as a dedicated PR.

Agreed. I already have that pulled out and will send a separate PR.

@timostamm wrote:

The idea was that oneofCase is much more unlikely to clash with a field name.

@stephenh wrote:

Fwiw I've seen kind generally/conventionally used for these sorts of type unions.

@cliedeman wrote:

You can easily sidestep collisions by using a special character like $case or something.

All good points! 'case' might clash with a field name just as much as 'none' might. I do like the idea of $case, that's a brilliant way of using an identifier that's allowed in JS/TS but not in proto! I'd be open to $kind too but part of the reason I like the word "case" is because of its close association with the switch statement. I don't feel strongly about it, though.

Possible solution (3): case: undefined

That's basically the same as an optional field, since demo.result?.$kind would resolve to undefined if demo.result was not provided.

and it seems nice to have all N clauses of "could be kind A or kind B or kind C or kind none" all handled at the "same level"

The optional field would still allow you to do that:

interface Demo1 {
   result?: { $case: 'error', error: string }
          | { $case: 'value', value: number },
}

function process1(demo: Demo1) {
    switch (demo.result?.$case) {
        case 'error':
            break;
        case 'value':
            break;
        case undefined:
            break;
        default:
            throw new Error('Should not get here!');
            break;
    }
}

I will work on a PR formulating the above approach.

philikon added a commit to philikon/ts-proto that referenced this issue Jun 7, 2020
philikon added a commit to philikon/ts-proto that referenced this issue Jun 7, 2020
philikon added a commit to philikon/ts-proto that referenced this issue Jun 7, 2020
stephenh pushed a commit that referenced this issue Jun 8, 2020
… properties (#88)

* Don't generate default entries in the base object when the value would be undefined

* Compiler flag useOptionals=true turns non-scalar fields into optional properties

Addresses first part of #74

* Make messageToTypeName 'options' parameter required
@stephenh
Copy link
Owner

stephenh commented Jun 8, 2020

Released @philikon useOptionals option in v1.25.0. Thanks!

@cliedeman
Copy link
Contributor

cliedeman commented Jun 17, 2020

I will work on a PR formulating the above approach.

@philikon Any luck with this?

@philikon
Copy link
Contributor Author

@cliedeman yeah, sorry, too many distractions, but PR is up now.

philikon added a commit to philikon/ts-proto that referenced this issue Jun 19, 2020
philikon added a commit to philikon/ts-proto that referenced this issue Jun 21, 2020
philikon added a commit to philikon/ts-proto that referenced this issue Jun 21, 2020
philikon added a commit to philikon/ts-proto that referenced this issue Jun 25, 2020
stephenh pushed a commit that referenced this issue Jul 11, 2020
@stephenh
Copy link
Owner

I released your oneof=unions as v1.26.0. Thanks again!

Is this issue good to close out now?

@philikon
Copy link
Contributor Author

It is, thanks a bunch!

zfy0701 added a commit to sentioxyz/ts-proto that referenced this issue Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants