Underscore characters may start an identifier #690

Open
FichteFoll opened this issue Aug 7, 2024 · 1 comment

@FichteFoll

Via SublimeText/Terraform#43, it was brought to my attention that underscore characters (_, U+005F) are apparently allowed to start an identifier, although the current spec does not say so.

hcl/hclsyntax/spec.md, lines 90 to 111 in 360ae57:

### Identifiers
Identifiers name entities such as blocks, attributes and expression variables.
Identifiers are interpreted as per [UAX #31][uax31] Section 2. Specifically,
their syntax is defined in terms of the `ID_Start` and `ID_Continue`
character properties as follows:
```ebnf
Identifier = ID_Start (ID_Continue | '-')*;
```
The Unicode specification provides the normative requirements for identifier
parsing. Non-normatively, the spirit of this specification is that `ID_Start`
consists of Unicode letter and certain unambiguous punctuation tokens, while
`ID_Continue` augments that set with Unicode digits, combining marks, etc.
The dash character `-` is additionally allowed in identifiers, even though
that is not part of the unicode `ID_Continue` definition. This is to allow
attribute names and block type names to contain dashes, although underscores
as word separators are considered the idiomatic usage.
[uax31]: http://unicode.org/reports/tr31/ "Unicode Identifier and Pattern Syntax"
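To make the consequence of the grammar concrete, here is a minimal sketch of `Identifier = ID_Start (ID_Continue | '-')*` read literally. It is not the HCL scanner. The function names are hypothetical, and `ID_Start` / `ID_Continue` are approximated from the Unicode General_Category alone, so `Other_ID_Start`, `Other_ID_Continue` and the `Pattern_Syntax` / `Pattern_White_Space` exclusions are deliberately left out.

```python
import unicodedata

# Assumption: General_Category only. Other_ID_Start, Other_ID_Continue and
# the Pattern_Syntax / Pattern_White_Space exclusions are not modelled.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch: str) -> bool:
    """Approximate ID_Start: letters and letter numbers."""
    return unicodedata.category(ch) in START_CATEGORIES


def is_id_continue(ch: str) -> bool:
    """Approximate ID_Continue: ID_Start plus marks, digits, connectors."""
    return unicodedata.category(ch) in CONTINUE_CATEGORIES


def matches_spec_identifier(text: str) -> bool:
    """Check text against: Identifier = ID_Start (ID_Continue | '-')*"""
    if not text or not is_id_start(text[0]):
        return False
    return all(is_id_continue(ch) or ch == "-" for ch in text[1:])


for candidate in ("foo", "foo-bar", "foo_bar", "_foo", "1foo"):
    print(candidate, matches_spec_identifier(candidate))
```

Under this reading, `foo_bar` and `foo-bar` are accepted while `_foo` is rejected, because the underscore can only appear after the first character.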

As cited, the spec only allows an identifier to start with a character that has the ID_Start Unicode property, which UAX #31 defines as:

ID_Start characters are derived from the Unicode General_Category of uppercase letters, lowercase letters, titlecase letters, modifier letters, other letters, letter numbers, plus Other_ID_Start, minus Pattern_Syntax and Pattern_White_Space code points.

In UnicodeSet notation:
[\p{L}\p{Nl}\p{Other_ID_Start}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]

None of these properties includes the underscore character. (It is, however, included in ID_Continue through the general category Pc, connector punctuation.)
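This can be checked quickly with Python's standard `unicodedata` module. The sketch below only looks at the General_Category part of the definition (`\p{L}` and `\p{Nl}`); the `describe` helper is a hypothetical name, and the small `Other_ID_Start` list is not consulted here.

```python
import unicodedata

# General categories behind \p{L} and \p{Nl} in the UnicodeSet above.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def describe(ch: str) -> tuple[str, str, bool]:
    """Return (name, General_Category, whether that category feeds ID_Start)."""
    category = unicodedata.category(ch)
    return unicodedata.name(ch), category, category in ID_START_CATEGORIES


for ch in ("a", "_", "-"):
    print(repr(ch), describe(ch))
```

The underscore (LOW LINE) reports category `Pc`, which is outside the letter and letter-number categories that feed `ID_Start`.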

@crw
Contributor

crw commented Aug 7, 2024

Thanks for this report!

@crw crw added the bug label Aug 7, 2024