Wrong info about 0XE+2 #3

pmor13 · 2022-06-09T18:01:18Z

0XE+2 should evaluate to 16, however, both gcc and clang give an error: invalid suffix "+2" on integer constant. Both bugs are known: gcc, clang. MSVC handles it correctly. This may be due to the definition of pp-numbers and is mentioned in the standard https://eel.is/c++draft/lex.pptoken#example-2.

This is wrong. Here is why:

By default (see below), 0XE+2 should not evaluate to 16. Per C11 5.1.1.2 Translation phases, in phase 7 "Each preprocessing token is converted into a token". 0XE+2 is a pp-number (6.4.8 Preprocessing numbers), which is a kind of preprocessing-token. Hence, the preprocessing token 0XE+2 is required to be converted into a single token 0XE+2, which is not a valid constant.
Both gcc and clang correctly produce a diagnostic message (e.g. an error).
Note: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3885 has status "RESOLVED INVALID".
Neither report describes a bug; both are requests for a diagnostics improvement / enhancement.
MSVC handles it incorrectly (no diagnostic message is produced). This applies to both its C and C++ compilers. Note: preprocessing tokens similar to 0XE+2 exist in Microsoft system header files.
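
The phase 7 argument above depends on 0XE+2 being scanned as one pp-number. Here is a minimal sketch of the C11 6.4.8 grammar that shows why. The function name `pp_number_len` is my own, and the sketch assumes plain ASCII input and ignores universal character names:

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest pp-number at the start of s, or 0 if s does not
   start with one. Simplified sketch of C11 6.4.8: universal character
   names and implementation-defined identifier characters are ignored. */
static size_t pp_number_len(const char *s)
{
    size_t i;

    /* A pp-number starts with a digit, or with '.' followed by a digit. */
    if (isdigit((unsigned char)s[0]))
        i = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[i];

        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') &&
            (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;   /* pp-number followed by e/E/p/P and a sign */
        else if (isalnum(c) || c == '_' || c == '.')
            i += 1;   /* digit, identifier-nondigit, or '.' */
        else
            return i; /* anything else ends the pp-number */
    }
}
```

With this scanner, `pp_number_len("0XE+2")` is 5, so the whole spelling is one preprocessing token, while `pp_number_len("0XF+2")` and `pp_number_len("0XE +2")` are both 3, so those spellings split before the `+` and are valid expressions.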

Extra:

As an extension, I think an implementation can convert a single preprocessing token into multiple tokens. However, a diagnostic is still required. It could be something like the following:

warning: preprocessing token “0x1e+1” converted into multiple tokens “0x1e”, “+”, “1”
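
To make the idea concrete, here is a rough sketch of how such an extension might split the token and build that warning. Everything in it is hypothetical: `hex_split_point` and `format_split_warning` are names I made up, only the hexadecimal `e`/`E` case is handled, and the part after the sign is not validated. It is not how any existing compiler works:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: if pp has the shape
   0x<hex digits ending in e or E><sign><rest>, return the index of the
   sign; otherwise return 0. */
static size_t hex_split_point(const char *pp)
{
    size_t i = 2;

    if (pp[0] != '0' || (pp[1] != 'x' && pp[1] != 'X'))
        return 0;
    while (isxdigit((unsigned char)pp[i]))
        i++;
    if (i == 2)
        return 0;                       /* no hex digits after the prefix */
    if (pp[i] != '+' && pp[i] != '-')
        return 0;                       /* no sign to split at */
    if (pp[i - 1] != 'e' && pp[i - 1] != 'E')
        return 0;                       /* sign was not absorbed via e/E */
    if (pp[i + 1] == '\0')
        return 0;                       /* nothing after the sign */
    return i;
}

/* Hypothetical helper: write the proposed warning into out.
   Returns 1 if pp was split, 0 if it was left alone. */
static int format_split_warning(const char *pp, char *out, size_t out_size)
{
    size_t k = hex_split_point(pp);

    if (k == 0)
        return 0;
    snprintf(out, out_size,
             "warning: preprocessing token \"%s\" converted into multiple "
             "tokens \"%.*s\", \"%c\", \"%s\"",
             pp, (int)k, pp, pp[k], pp + k + 1);
    return 1;
}
```

A valid floating constant such as `1e+1` is left alone by this sketch, because it does not start with `0x` or `0X`.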

Rationale for C (6.4.8 Preprocessing numbers):

The notion of preprocessing numbers was introduced to simplify the description of
preprocessing. It provides a means of talking about the tokenization of strings that look like
numbers, or initial substrings of numbers, prior to their semantic interpretation. In the interests
of keeping the description simple, occasional spurious forms are scanned as preprocessing
numbers. For example, 0x123E+1 is a single token under the rules. The C89 Committee felt
that it was better to tolerate such anomalies than burden the preprocessor with a more exact, and
exacting, lexical specification. It felt that this anomaly was no worse than the principle under
which the characters a+++++b are tokenized as a ++ ++ + b (an invalid expression), even
though the tokenization a ++ + ++ b would yield a syntactically correct expression. In both
cases, exercise of reasonable precaution in coding style avoids surprises.

My personal view: I think that cases “valid pp-number (C11, 6.4.8 Preprocessing numbers) => invalid constant (C11, 6.4.4 Constants)” are the price of “simplification of the description of preprocessing”. These cases may be seen as language defects. I'd probably "burden the preprocessor with a more exact, and exacting, lexical specification".


pmor13 · 2022-06-09T20:10:22Z

Some C compilers diagnose it differently than GCC/LLVM. Examples:

xxx: error: cannot convert preprocessing token into valid token
Tendra: Error: Can't convert '0xe+1' to a number.
ICC: error: extra text after expected end of number
Chibicc: invalid numeric constant
TCC: error: invalid number

pmor13 changed the title ~~Fix wrong info about 0XE+2~~ Wrong info about 0XE+2 Jun 13, 2022
