Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative naming for has_extern keyword #3986

Closed
jonmeow opened this issue May 24, 2024 · 29 comments
Closed

Alternative naming for has_extern keyword #3986

jonmeow opened this issue May 24, 2024 · 29 comments
Labels
leads question A question for the leads team

Comments

@jonmeow
Copy link
Contributor

jonmeow commented May 24, 2024

Summary of issue:

#3980 is looking for a keyword to mark that an entity has an extern declaration. Tentatively it uses has_extern; is there a different name that leads would prefer?

Details:

So far, options I think have been brought up look similar:

  • has_extern
  • is_extern
  • externed

I'm currently going with has_extern because it's unambiguous, although the _ seems atypical for keywords. is_extern seems to have the wrong implication. externed seems more on the side of confusion.

Not marking is an option, although as noted in #3980 it brings benefits for validation. There may also be some other form of marking that would work rather than a simple modifier keyword.

Any other information that you want to share?

No response

@jonmeow jonmeow added the leads question A question for the leads team label May 24, 2024
@ilg-ul
Copy link

ilg-ul commented May 24, 2024

... going with has_extern because it's unambiguous, although the _ seems atypical for keywords.

has extern, in two words, is not a possible choice?

jonmeow added a commit to jonmeow/carbon-lang that referenced this issue May 24, 2024
@jonmeow
Copy link
Contributor Author

jonmeow commented May 24, 2024

has extern would be an option, although to note a couple concerns:

  • has here is essentially behaving as a modifier to the extern modifier keyword, something that I don't think we currently do.
  • This would require lexing has as a keyword, restricting its use for other identifiers (which would need to then write r#has, which would probably discourage use). To give an idea of impact, here's a quick search.
    • Note, is_extern has less hits but still a few, has_extern and externed are negligible.
    • While some other new keywords have similar overlap with C++ code, there's probably incremental friction with each decision that forces developers to change identifier names,.
  • If the intent of separating has is to allow its use with other similar keywords, it's probably worth considering how this works with repetition, e.g. has extern has something has else fn Foo(); -- the repetition may make it harder to understand, although that could probably be addressed with has(extern) syntax.

@jonmeow jonmeow closed this as completed May 24, 2024
@jonmeow jonmeow reopened this May 24, 2024
jonmeow added a commit to jonmeow/carbon-lang that referenced this issue May 24, 2024
@chandlerc
Copy link
Contributor

To be clear -- I think the leads need to decide both on a name, but also on whether they want a keyword marking this at all. While there has been some discussion of this marker being desirable in some ways, I think it's worth getting the leads do clearly decide that the benefits outweigh the costs.

(FWIW, as a lead, I'm currently ambivalent on whether its worth having this marker. There are very clear benefits, but there are also drawbacks and I'm having a hard time getting to a high-confidence stance here.)

@jonmeow
Copy link
Contributor Author

jonmeow commented May 28, 2024

@chandlerc Do the arguments in #3980 accurately capture your perspective on the trade-offs of whether the keyword should be provided? If not, perhaps you could detail the trade-offs that you're seeing? i.e.:

The modifier offers a benefit for being able to verify the association between
extern and has_extern declarations, and offers additional parity in
modifiers. It also makes it easy for a tool to know if it's missing a
declaration.

@zygoloid
Copy link
Contributor

I think we have a few options in front of us:

  • No marker in the library that there is an extern declaration.
  • Marker on the first declaration in the library.
  • Marker on the definition.
  • Marker on every declaration in the library.

We've recently said we want to avoid repeating annotations between declaration and definition, which suggests we don't want the fourth option. For the third option, we could use extern itself as the marker, but perhaps that's too subtle / clever. The second option might give some benefits to the toolchain, by letting it perform different operations for a declaration that expects to have a prior extern declaration versus one that doesn't, but I don't know that there's actually a difference in practice. The first option doesn't allow us to do any validation that extern declarations are intended to be permitted, but I'm not sure how much value that has, given that it doesn't affect whether we can do any validation for extern declarations that are not imported at the point where the type is declared in its owning library, which I'd expect to be the common case.

We have previously talked about a model where, for types with extern declarations, a direct import of the owning library is required for the type to be complete. The idea was to avoid an action-at-a-distance problem, where a distant library removing an import could otherwise change whether such a type is considered complete. If we still want to consider that rule, then I think we should consider providing an annotation on the definition that says "a direct import is necessary to use this definition", with an error if extern is used on a type without that annotation. (In particular, this wouldn't be a "has extern" annotation, because it would be valid and meaningful even if there is no extern declaration.)

@jonmeow
Copy link
Contributor Author

jonmeow commented May 29, 2024

The second option might give some benefits to the toolchain, by letting it perform different operations for a declaration that expects to have a prior extern declaration versus one that doesn't, but I don't know that there's actually a difference in practice. The first option doesn't allow us to do any validation that extern declarations are intended to be permitted, but I'm not sure how much value that has, given that it doesn't affect whether we can do any validation for extern declarations that are not imported at the point where the type is declared in its owning library, which I'd expect to be the common case.

I think if you have the code:

library "a";
extern fn F();
library "b";
fn F();

Then this is going to be a linker error. I'd been expecting that, when extern fn F() is declared, it'll emit something that F() only provides if it sees the extern directly. e.g., maybe there's some parallel "F-as-extern" symbol for this purpose.

If has_extern existed, we could emit a compile error in the absence of an import of the extern. To your point about it not affecting whether validation can be done, I think that's correct, but I think it affects the quality of diagnostics to promote linker errors to compiler errors.

@zygoloid
Copy link
Contributor

If has_extern existed, we could emit a compile error in the absence of an import of the extern.

When compiling "a", we don't know where the owning declaration is or whether it's marked has_extern, and when compiling "b", we don't know there's an extern declaration. How can we emit the compile error before link time?

@jonmeow
Copy link
Contributor Author

jonmeow commented May 29, 2024

If has_extern existed, we could emit a compile error in the absence of an import of the extern.

When compiling "a", we don't know where the owning declaration is or whether it's marked has_extern, and when compiling "b", we don't know there's an extern declaration. How can we emit the compile error before link time?

The example was intended for a "has_extern does not exist" case; the difference is that if "b" were instead doing has_extern fn F(), i.e., that the user is only forgetting the import not the explicit "I mean for there to be an extern", there would be a compile error. This could help in cases of both adding a new extern, as well as incorrectly moving an extern between files without fixing imports. [ed: although at the cost of a compile-time diagnostic for "forgot has_extern"]

@zygoloid
Copy link
Contributor

zygoloid commented Jun 4, 2024

I see. Yes, there's a benefit here that a declaration annotated with has_extern is safer under refactorings that might unintentionally lose the import of the extern declaration -- we can diagnose that during compilation.

There's perhaps an overarching principle that we might want to consider adopting here: we could try to guarantee that if your library still compiles when you remove a non-exported import, then it is safe to remove that import. This guarantee would be very useful for tooling seeking to automatically minimize the imports of a file. The has_extern keyword then helps us give that guarantee, as does the rule that a direct import is required for a has_extern type to be complete. (One other aspect of this we'd need to think about is impact of this guarantee on which types are considered complete during template instantiation.)

jonmeow added a commit to jonmeow/carbon-lang that referenced this issue Jun 6, 2024
@chandlerc
Copy link
Contributor

@zygoloid and I discussed this and I think we have a thought of what might be a reasonable direction here, unless someone finds a problem with it.

The key is that we'd prefer to have the syntax reflect some fundamental property of the type rather than the existence of some other thing (that there exists an extern declaration).

We think we can get there though, the meaning we're trying to confer is:

  • This type requires a direct import (or the same file) to have its definition available -- a transitive import is not sufficient.
  • Somewhat as a consequence of this, it allows us to have a forward declaration in another library.

Combined, they enable more sparse and minimal dependencies when desirable.

We don't have a great name for this property, but we have been using extern as essentially a familiar mnemonic for it. That seems like a reasonable starting position, and rotating from keyword A to keyword B if we come up with a better name should be an easy change to make.

To finish satisfying the idea of having the syntax reflect a fundamental property, we'd like to suggest requiring that all the declarations, including the definition, have the extern modifier. And its meaning on a definition is that there must be exactly one preceding forward declaration, which may be in a separate library, but if so must be imported.

This does suggest a small extension to #3986 which is to allow in a single API file:

extern class C;
extern class C {
  // ...
}

Or to allow an API file:

package Widget;

extern class C;

With an implementation file:

impl package Widget;

extern class C {
  // ...
}

These would not take advantage of the allowed separate library for their extern declaration, but would be effectively reserving the option of adding such a separate library in the future.

We would still avoid the need to do any merging here, as for an extern declaration in one library to be the same entity as an extern definition in another library, the defining library must import the declaring library.

And we would still get most of the validation in the face of refactorings and other changes as they would when removing an import result in no preceding extern forward declaraiton, and there must be at least one.

The main downside that I see here (beyond that this keyword isn't ideal or obviously tied to the semantic as mentioned above), is that there may be some confusion when writing a definition of a class with extern attached. That wouldn't make much sense coming from the C/C++ use of this mnemonic. However, I think I (and my impression is @zygoloid, but good to get confirmation) am more comfortable with that syntactic confusion than the more predicate-like formulation of has_extern or similar.

What do others think about this?

@josh11b
Copy link
Contributor

josh11b commented Jun 26, 2024

In the case where you have an extern declaration in the API file of library "a", a forward declaration in the API file of library "b", and a definition in the impl file of library "b", are you saying that both API files would have the same extern fn F(); text? That seems to make it very hard to tell which library owns it from the text of the API file.

Is the statement that what you can do with the name is the same, so users should not care about the difference, since the only factors that matter are whether you can see a declaration and whether you can see a definition?

@chandlerc
Copy link
Contributor

In the case where you have an extern declaration in the API file of library "a", a forward declaration in the API file of library "b", and a definition in the impl file of library "b", are you saying that both API files would have the same extern fn F(); text? That seems to make it very hard to tell which library owns it from the text of the API file.

Given that we require the API file of "b" to import "a", could we disallow adding the second forward declaration there?

Is the statement that what you can do with the name is the same, so users should not care about the difference, since the only factors that matter are whether you can see a declaration and whether you can see a definition?

More, whether you directly import a definition.

@jonmeow
Copy link
Contributor Author

jonmeow commented Jun 26, 2024

To be sure I understand correctly, in the proposal, this is essentially the "Allow up to two declarations total" option. Previously that had been declined because it prevents adding non-extern impls in the API file when the definition is in an implementation file. There had been concerns that disallowing that would make it harder to separate build dependencies. To be sure, that support is no longer important?

@josh11b
Copy link
Contributor

josh11b commented Jun 26, 2024

I also thought we expected documentation of entities in the owning library's API file even if there was an extern declaration in another library?

@zygoloid
Copy link
Contributor

This whole situation of an extern declaration in one library and a definition in an implementation file of a different library seems strange to me, and it's not clear to me how this comes about or what semantics we'd want it to have. I think a more concrete example would help clarify my thinking. Can we come up with one?

@josh11b
Copy link
Contributor

josh11b commented Jun 26, 2024

The most interesting example to me is when interface implementation is involved. So something like:

library "i";
interface I {}
library "e";
import library "i";
extern class C;
extern impl C as I;
library "o";
import library "e";
has_extern class C { }
has_extern impl C as I;
impl library "o";
has_extern impl C as I { }

The idea here is that you generally only need to know that a type implements an interface, not see the definition of the implementation, to be able to pass values and objects of that type to functions requiring that interface.

@zygoloid
Copy link
Contributor

Should the class definition be in "o" here? Or is the point to demonstrate an issue with "extern impl" rather than "extern class"?

@josh11b
Copy link
Contributor

josh11b commented Jun 26, 2024

Point was to show the issue with extern impl.

@zygoloid
Copy link
Contributor

Following a bunch of discussion, here's one possibility:

  • The non-owning declaration is written extern library "owning_library" class C;
  • The owning declarations are all written extern class C; / extern class C { ... }
  • We do require the owning library to import the extern declaration
  • We do not require there to be an extern declaration

An extern class must be imported directly to be complete, per #4025.

When an extern library "blah" type is imported, we check whether library "blah" is also imported.

  • No => the type is incomplete.
  • Yes => library "blah" had better contain a redeclaration of it, and that declaration is used instead.

The hope is that this will give us better diagnostics, better human readability and understandability of the code, a single meaning for extern, and only one new keyword. Compared to other options we considered, the extra verbosity goes on the forward declaration in the non-owning library, which we expect to be read less often.

How well does this work for people?

@jonmeow
Copy link
Contributor Author

jonmeow commented Jun 28, 2024

There was discussion about this in the open discussion yesterday. There aren't many notes, and since discussion is relevant here anyways, I'm going to try to summarize what I recall here.

I think there's consensus now regarding the interface issue noted by josh11b above due to concerns about how the impl's library ownership is expressed, particularly if a developer tried to move the declaration of class C to the implementation file (please correct me if I'm incorrectly summarizing that).

My understanding from the meeting is that there's still a desire to express that a declaration must be directly imported for types where all declarations are in the same library (i.e., not extern) as was noted by chandlerc above.

A few options we went over in the meeting were:

  • In the non-owning library, add a keyword, such as extern unowned.
    • A concern with this is that in the owning library, extern leaves it ambiguous as to whether there's an unowned declaration. The validation of import is desired.
  • In the owning library, add a keyword.
    • This second keyword could be used independently of extern to mean "must be directly imported".
    • Several options discussed were: extern direct, extern owned, extern nontransitive, extern iwyu (import what you use). I think people weren't really excited about any of these options.

If I'm missing anything in this summary, let me know.

@jonmeow
Copy link
Contributor Author

jonmeow commented Jun 28, 2024

Regarding extern library, I'm primarily concerned about the verbosity. I'd expect library names to typically be fairly long, similar to #include's in C++, although somewhat reduced due to the package name. Suppose they frequently run about 30 characters though, extern library "..." together would be about 45 characters; I think that would hide the class TypeName at the end visually, especially if there are several extern types creating a vertically repeated block, as in:

import library "diagnostics/diagnostic_emitter";

extern library "diagnostics/diagnostic_emitter" class DiagnosticEmitter;
extern library "diagnostics/diagnostic_emitter" class DiagnosticAnnotationScope;
extern library "diagnostics/diagnostic_emitter"
    fn MakeDiagnosticAnnotationScope(...);

That could be partly addressed by scoping (extern library "Foo" { class C; class D; class E; }) but I think there's generally been reticence towards scoping names this way. I understand the cost of a new keyword, and the restriction to have the extern decl in a specific library has its upsides, but the syntax cost seems substantial.

@chandlerc
Copy link
Contributor

FWIW, the extern library "..." approach is growing on me.

That said, I definitely see the point about a risk of verbosity. I think some of this depends on whether these end up useful in very narrow, targeted cases, or if we end up needing lots of types to use the pattern. If the latter, the verbosity seems like it will quickly be a problem.

If it does, I think Jon's suggestion here is great:

That could be partly addressed by scoping (extern library "Foo" { class C; class D; class E; }) but I think there's generally been reticence towards scoping names this way.

While there has been reticence, I think for a case like this, and notably "forward declarations only", it would be a reasonable way to address verbosity. It's not clear we need it right away or whether to keep it in our back pocket until we can see a bit more how widely we want to use this feature in practice.

Overall, I feel like even if we end up tweaking (as above, or in some other way) the syntax to address verbosity, this direction provides a nice starting point with strong semantic properties to let us move forward.

@chandlerc
Copy link
Contributor

After checking with leads, let's call this decided with the extern library "..." direction, which should unblock the proposal in this space.

@jonmeow
Copy link
Contributor Author

jonmeow commented Jul 2, 2024

@chandlerc For the proposal, can you please provide the rationale for why leads are choosing extern library "..."?

@chandlerc
Copy link
Contributor

chandlerc commented Jul 3, 2024

From @zygoloid's comment above:

The hope is that this will give us better diagnostics, better human readability and understandability of the code, a single meaning for extern, and only one new keyword. Compared to other options we considered, the extra verbosity goes on the forward declaration in the non-owning library, which we expect to be read less often.

For the verbosity issue:

If this is used with repeated types that make the verbosity a problem, we should explore some factoring or grouping structure that factors out the verbose library component, such as extern library "..." { <forward declarations> }. But that can be done as a follow-up with experience in how this is used.

@jonmeow
Copy link
Contributor Author

jonmeow commented Jul 3, 2024

Okay, I'll try to figure out how to map this to project goals.

@jonmeow
Copy link
Contributor Author

jonmeow commented Jul 3, 2024

Sorry, one point of clarification... Is this allowed:

library "a";

// The existence of `extern library` is not required.
// This is the version that requires direct imports.
extern class C {}

I want to be sure, as a consequence is that arbitrary libraries may later add an extern library declaration.

We do require the owning library to import the extern declaration
We do not require there to be an extern declaration

This is why I'm inferring this, but I want to be sure that the desire to restrict use is a feature.

Similarly, it sounds like multiple extern library declarations are allowed?

@chandlerc
Copy link
Contributor

Sorry, one point of clarification...

Clarifications are good!

Is this allowed:

library "a";

// The existence of `extern library` is not required.
// This is the version that requires direct imports.
extern class C {}

I want to be sure, as a consequence is that arbitrary libraries may later add an extern library declaration.

We do require the owning library to import the extern declaration
We do not require there to be an extern declaration

This is why I'm inferring this, but I want to be sure that the desire to restrict use is a feature.

I think there is a missing "not" before "restrict" here.

But yes, I think "We do not require there to be an extern declaration" is actually saying "We do not require there to be an extern [library] declaration".

And this would allow an API file to require direct-import-to-use without factoring out a forward declaration to a separate library. The idea behind this is that enabling that requirement is a breaking change for consumers of the library and so might be made independently or in advance of actually creating a separate-library declaration.

Similarly, it sounds like multiple extern library declarations are allowed?

They all need to be imported into the specified library, but otherwise, unless we encounter a reason to restrict them when implementing them, I would allow them. If we do end up finding reasons to restrict them, I think that would likely be fine, but I would wait until they show up.

And tagging @zygoloid to make sure I'm not misrepresenting / remembering anything ehre.

@zygoloid
Copy link
Contributor

zygoloid commented Jul 3, 2024

They all need to be imported into the specified library, but otherwise, unless we encounter a reason to restrict them when implementing them, I would allow them. If we do end up finding reasons to restrict them, I think that would likely be fine, but I would wait until they show up.

I think the desire to have a single canonical declaration for everything and to not need to do any merging at import time are good and known reasons for a restriction here. We'd need to do some non-trivial import-time merging work to get the right behavior with multiple imported extern library declarations and no owning declaration.

(Otherwise I agree.)

jonmeow added a commit to jonmeow/carbon-lang that referenced this issue Jul 3, 2024
jonmeow added a commit to jonmeow/carbon-lang that referenced this issue Jul 9, 2024
jonmeow added a commit to jonmeow/carbon-lang that referenced this issue Jul 15, 2024
jonmeow added a commit to jonmeow/carbon-lang that referenced this issue Jul 17, 2024
jonmeow added a commit to jonmeow/carbon-lang that referenced this issue Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
leads question A question for the leads team
Projects
None yet
Development

No branches or pull requests

5 participants