Wrong info about 0XE+2 #3

Open
pmor13 opened this issue Jun 9, 2022 · 1 comment

Comments

pmor13 commented Jun 9, 2022

0XE+2 should evaluate to 16, however, both gcc and clang give an error: invalid suffix "+2" on integer constant. Both bugs are known: gcc, clang. MSVC handles it correctly. This may be due to the definition of pp-numbers and is mentioned in the standard https://eel.is/c++draft/lex.pptoken#example-2.

This is wrong. Here is why:

  1. By default (see below), 0XE+2 should not evaluate to 16. Per C11 5.1.1.2 (Translation phases), in phase 7 "Each preprocessing token is converted into a token". 0XE+2 is a pp-number (6.4.8 Preprocessing numbers), because the rule "pp-number E sign" absorbs the E+ sequence, and a pp-number is a kind of preprocessing token. Hence the preprocessing token 0XE+2 must be converted into a single token 0XE+2, which is not a valid constant.
  2. Both gcc and clang correctly produce a diagnostic message (e.g. an error).
    Note: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3885 has status "RESOLVED INVALID".
  3. Neither of the linked reports describes a bug; both are requests for better diagnostics (enhancements).
  4. MSVC handles it incorrectly: neither its C nor its C++ compiler produces a diagnostic message. Note: preprocessing tokens similar to 0XE+2 exist in Microsoft system header files.

Extra:

  1. As an extension, I think an implementation may convert a single preprocessing token into multiple tokens. However, a diagnostic is still required. It could look like this:
warning: preprocessing token “0x1e+1” converted into multiple tokens “0x1e”, “+”, “1”
  1. Rationale for C (6.4.8 Preprocessing numbers):

The notion of preprocessing numbers was introduced to simplify the description of
preprocessing. It provides a means of talking about the tokenization of strings that look like
numbers, or initial substrings of numbers, prior to their semantic interpretation. In the interests
of keeping the description simple, occasional spurious forms are scanned as preprocessing
numbers. For example, 0x123E+1 is a single token under the rules. The C89 Committee felt
that it was better to tolerate such anomalies than burden the preprocessor with a more exact, and
exacting, lexical specification. It felt that this anomaly was no worse than the principle under
which the characters a+++++b are tokenized as a ++ ++ + b (an invalid expression), even
though the tokenization a ++ + ++ b would yield a syntactically correct expression. In both
cases, exercise of reasonable precaution in coding style avoids surprises.

  1. My personal view: cases of “valid pp-number (C11, 6.4.8 Preprocessing numbers) => invalid constant (C11, 6.4.4 Constants)” are the price of “simplification of the description of preprocessing”. These cases may be seen as language defects. I would probably have chosen to "burden the preprocessor with a more exact, and exacting, lexical specification".

pmor13 commented Jun 9, 2022

  1. Some C compilers diagnose it differently than GCC/LLVM. Examples:
xxx: error: cannot convert preprocessing token into valid token
Tendra: Error: Can't convert '0xe+1' to a number.
ICC: error: extra text after expected end of number
Chibicc: invalid numeric constant
TCC: error: invalid number

pmor13 changed the title from "Fix wrong info about 0XE+2" to "Wrong info about 0XE+2" on Jun 13, 2022