Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize filecheck directives #128018

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented Jul 20, 2024

Typically in the LLVM repo, filecheck prefixes will be uppercase (this is always true) and start with CHECK- (true a bit more than half the time). Currently we using exact revision names as filecheck directives, in the same case, so they don't really look like filecheck directives in code:

    // CHECK-LABEL: foo
    // x86: pop    rbp
    // aarch64: add     sp, sp, #32
    // CHECK-NEXT ret

Some tests use uppercase revisions, which I suspect this is likely to make them look more like typical filecheck.

So, do the following:

  • In compiletest, change the check prefixes to always be CHECK-{revision.to_uppercase()} rather than just revision
  • For any tests that currently have uppercase revision names, change them to lowercase
  • Update all revision directives to uppercase and start with CHECK-
  • For anything that used an invalid prefix as a way to skip a check (e.g. FIXME-CHECK:), change those to use COM: so we can eventually check for invalid check prefixes.

The result looks more consistent and makes it easier to identify and grep for directives. As a bonus, some editors make them stand out with highlighting (which currently happens for most directives, just not those from revisions):

image image

I used the following to verify there aren't any directives that missed an update:

rg --pcre2 '^\s*//\s*(?!(CHECK|COM))\S*:(?!(//|:))' \
    tests/codegen tests/assembly/ tests/mir-opt/

And this to identify uppercase revisions:

rg '//@\s*revisions:.*[A-Z]'

Marking as a draft since I did the changes, but still need to go through by hand and verify there is nothing that looks out of place.

This should get a .git-blame-ignore-revs entry.

r? @jieyouxu

Typically, filecheck prefixes will be uppercase (always true) and start
with `CHECK-` (almost always true). Currently we allow using revision
names as filecheck directives, but they are passed directly.  That means
that our directives are exactly what the revision name is, in the same
case, so they only look like filecheck directives if the revision name
is uppercase (usually they are lowercase).

Update this so that we always uppercase revision names and prefix them
with `CHECK-` when used as directives. This is better for consistency,
makes it easier to identify directives in the tests, and has the nice
side effect that some editors will make the directive stand out by
highlighting it (which currently happens for most directives, just not
those from revisions).
@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc PG-exploit-mitigations Project group: Exploit mitigations S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 20, 2024
@workingjubilee
Copy link
Member

obvious hazard: a revision named next

@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel I want this general idea but I'm not sure exactly this way is right.

Comment on lines -116 to -120
// FIXME-CHECK-DAG: %[[EQ:.+]] = icmp eq i16 %[[A0]], %[[B0]]
// FIXME-CHECK-DAG: %[[CMP0:.+]] = icmp sge i16 %[[A0]], %[[B0]]
// FIXME-CHECK-DAG: %[[CMP1:.+]] = icmp uge i16 %[[A1]], %[[B1]]
// FIXME-CHECK: %[[R:.+]] = select i1 %[[EQ]], i1 %[[CMP1]], i1 %[[CMP0]]
// FIXME-CHECK: ret i1 %[[R]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

silly option to instead make this valid: emit FIXME-CHECK as a directive but check that it's failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh I like that idea. I guess we would have to do something like transform it to CHECK-NOT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that would cause problems if e.g. one line matches. Maybe there could be an implicit FIXME rev and then assert that it fails.

I am sort of working on a followup that will (hopefully) let us locate filecheck directives and make sure we don't have anything unexpected, that would probably be easier to do with that available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I would assume it would be an implicit rev and run as a separate test.

Comment on lines +325 to +326
// CHECK-X86_64: mov r{{[a-z0-9]+}}, r{{[a-z0-9]+}}
// CHECK-I686: mov e{{[a-z0-9]+}}, e{{[a-z0-9]+}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I feel like I would prefer CHECK-x86_64 and CHECK-i686? i.e. exact match of rev name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uppercasing was some of the point of this since it matches builtin directives, and is what LLVM uses (209 completely lowercase vs. 24.4k completely uppercase, mixed < 1k). So I think it is some of what people are used to, which I think is why we seem to have a mix of uppercase and lowercase revision names.

(plus, at least in my editor (helix), it only highlights if the string leading up to the : is all uppercase, so it looks more consistent. Obviously that is the about the furthest thing from a good reason to do anything, and I have no idea about other editors, but bonuses are nice).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can get used to it, probably, the other concern about ordering of the rev and CHECK would rate higher if any.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like X86_64 or CHECK-x86_64. When I encounter unfamiliar terms, all-uppercase is difficult to understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with the revision name exact matching. I'm not sure if in compiletest we currently treat revision names as case-insensitive (if we don't, then we probably should), but IMO an exact case match against a revision name is less confusing. For target names especially, sometimes the casing contributes to more familiarity.

I don't have a strong opinion on this stylistically, anyway.

Copy link
Contributor Author

@tgross35 tgross35 Jul 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like X86_64 or CHECK-x86_64. When I encounter unfamiliar terms, all-uppercase is difficult to understand.

I would counter that it is more likely for someone newish to understand that CHECK-X86_64: pertains to x86_64, than it is for someone newish to understand that X86_64: is indicating a pattern that gets checked.

@@ -51,7 +51,7 @@ pub fn square_packed_full(x: Simd<f32, 3>) -> FullSimd<f32, 3> {
pub fn square_packed(x: Simd<f32, 3>) -> Simd<f32, 3> {
// CHECK-NEXT: start
// CHECK-NEXT: load <3 x float>
// noopt-NEXT: load <3 x float>
// CHECK-NOOPT-NEXT: load <3 x float>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was made slightly harder to read. The rev name was sorta intentional, to make it stand out more, but now it blurs into the other CHECK*NEXTs and becomes easier to overlook. maybe it should be "{rev}-CHECK"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I wonder if FileCheck would accept [{REV}]-CHECK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I wonder if FileCheck would accept [{REV}]-CHECK?

Any string similar to a variable name is ok.

@tgross35 tgross35 force-pushed the normalize-filecheck-directives branch 3 times, most recently from fb33fb8 to 6b8b6b7 Compare July 20, 2024 23:20
Since revisions are now always passed as `CHECK-{rev.to_uppercase()}`,
update test cases to use this format.

There were also some test files that used uppercase names, presumably so
the directives would match filecheck. Normalize these revision names to
lowercase.

Lastly, remove use of invalid labels (such as `FIXME-CHECK:`) to skip
filecheck directives. (`COM: CHECK ...` is the syntax for a comment).

The below query can be used to ensure that there are no missing
directives:

    rg --pcre2 '^\s*//\s*(?!(CHECK|COM))\S*:(?!(//|:))' \
        tests/codegen tests/assembly/ tests/mir-opt/

(Note that the above may report some non-filecheck FIXMEs and other
comments).
@tgross35 tgross35 force-pushed the normalize-filecheck-directives branch from 6b8b6b7 to 144fa44 Compare July 20, 2024 23:23
@jieyouxu
Copy link
Member

To me normalizing filecheck directives like this makes sense, but I'm not the most familiar with filecheck, so I'll request a sanity check + vibe check early from people who do:

cc @scottmcm since I know you are quite familiar with filecheck and codegen tests, does normalizing filecheck directives like this make the test reading/writing experience better/worse?

@rustbot ping llvm
Dear wg-llvm members, does normalizing llvm filecheck directives in our tests seem like a good idea? From what I can tell, here we are trying to:

  • Use a more consistent filecheck prefix (make sure CHECK is present).
  • Use a more consistent casing (always upper case).

@rustbot rustbot added the ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group label Jul 21, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jul 21, 2024

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @ayazhafiz @camelid @comex @cuviper @DianQK @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @luqmana @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @vertexclique

@jieyouxu
Copy link
Member

jieyouxu commented Jul 21, 2024

Err I'm so sorry, I think that's not the intended llvm group I wanted to ping... maybe cc @rust-lang/wg-llvm? But if anyone pinged has any filecheck experience and has feedback for our test suites' llvm filecheck syntax changes here, that is also very welcomed.

@tgross35
Copy link
Contributor Author

tgross35 commented Jul 21, 2024

^ wow does it just ping everybody related to LLVM? More people than I would have expected lol. That definitely seems like an unfortunately large ping group to have named llvm (opened a Zulip topic to discuss changing it).

In any case good call, thanks for the follow up.

@jieyouxu jieyouxu removed the ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group label Jul 21, 2024
Copy link
Member

@DianQK DianQK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we use uppercase to enhance readability here.

if let Some(rev) = self.revision {
filecheck.arg("--check-prefix").arg(rev);
filecheck.arg(format!("--check-prefix=CHECK-{}", rev.to_uppercase()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's appropriate to add CHECK- to all prefixes. For me, uppercase is enough to emphasize that this is not an ordinary comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't mind using CHECK- for everything either, but since it's a hidden convention, we at least need documentation to explain it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did update the in-repo docs as part of this https://github.com/rust-lang/rust/pull/128018/files#diff-b40381314bcd0d5dd401af44ac13150c8129d97c97c6a76cbae9e3841a551329 :) and I would follow this with a dev guide update

Comment on lines +325 to +326
// CHECK-X86_64: mov r{{[a-z0-9]+}}, r{{[a-z0-9]+}}
// CHECK-I686: mov e{{[a-z0-9]+}}, e{{[a-z0-9]+}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like X86_64 or CHECK-x86_64. When I encounter unfamiliar terms, all-uppercase is difficult to understand.

@@ -3,44 +3,44 @@
//
//@ needs-sanitizer-address
//@ needs-sanitizer-memory
//@ revisions:ASAN ASAN-RECOVER MSAN MSAN-RECOVER MSAN-RECOVER-LTO
//@ revisions: asan asan-recover msan msan-recover msan-recover-lto
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; if this goes forward, I will split the upper->lowercase items out.

// CHECK-MSAN-RECOVER-NOT: unreachable
// CHECK-MSAN-RECOVER: }
//
// CHECK-MSAN-RECOVER-LTO-LABEL: define dso_local noundef i32 @penguin(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A long check mark, I think, CHECK- reduces overall readability. :)

@@ -51,7 +51,7 @@ pub fn square_packed_full(x: Simd<f32, 3>) -> FullSimd<f32, 3> {
pub fn square_packed(x: Simd<f32, 3>) -> Simd<f32, 3> {
// CHECK-NEXT: start
// CHECK-NEXT: load <3 x float>
// noopt-NEXT: load <3 x float>
// CHECK-NOOPT-NEXT: load <3 x float>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I wonder if FileCheck would accept [{REV}]-CHECK?

Any string similar to a variable name is ok.

@nikic
Copy link
Contributor

nikic commented Jul 21, 2024

Don't really have a strong opinion on how we write this. I don't think this part of the PR description is correct though:

Typically in the LLVM repo, filecheck prefixes will be uppercase (this is always true) and start with CHECK- (true a bit more than half the time).

While the first part is true, LLVM does not have a convention to prefix everything with CHECK-. It's common to use CHECK as a common fallback prefix for --check-prefixes, but the non-fallback prefixes generally do not start with CHECK-, especially if the check prefixes are used to distinguish subtargets. You'll see lots of --check-prefixes=CHECK,AVX2 and very little --check-prefixes=CHECK,CHECK-AVX2.


Always using CHECK- may make sense for Rust though, which has a lot more people not familiar with FileCheck working on it. If it helps people distinguish what is and isn't a FileCheck directive, I don't mind.

@jieyouxu
Copy link
Member

Always using CHECK- may make sense for Rust though, which has a lot more people not familiar with FileCheck working on it. If it helps people distinguish what is and isn't a FileCheck directive, I don't mind.

Ever since we migrated from // <compiletest_directive> to //@ <compiletest_directive>, at least it's easier to tell which are compiletest directives and which are FileCheck directives. I don't have a strong opinion on whether we want the CHECK- prefix either.

@tgross35
Copy link
Contributor Author

tgross35 commented Jul 21, 2024

Don't really have a strong opinion on how we write this. I don't think this part of the PR description is correct though:

Typically in the LLVM repo, filecheck prefixes will be uppercase (this is always true) and start with CHECK- (true a bit more than half the time).

While the first part is true, LLVM does not have a convention to prefix everything with CHECK-. It's common to use CHECK as a common fallback prefix for --check-prefixes, but the non-fallback prefixes generally do not start with CHECK-, especially if the check prefixes are used to distinguish subtargets. You'll see lots of --check-prefixes=CHECK,AVX2 and very little --check-prefixes=CHECK,CHECK-AVX2.

I was just going off a rough search - 7.6k out of 15.7k total files seem to use prefixes starting with CHECK. I guess it is slightly less than half, not slightly more.

Always using CHECK- may make sense for Rust though, which has a lot more people not familiar with FileCheck working on it. If it helps people distinguish what is and isn't a FileCheck directive, I don't mind.

I think it is more obvious that CHECK-X86: is just a variant of plain CHECK, which helps if you are not familiar with the format. And it gives something easier to search for (compared to just searching "X86:").


I feel most strongly about uppercasing since there is strong precedent in LLVM, it reads better with -NOT -NEXT etc suffixes, and I don't like the status quo of having a portion of tests using uppercase revisions so their filecheck directives look more like LLVM.

CHECK- I feel slightly less strongly about. I still think this is helpful for newcomers to the format, and appreciate that it looks much more consistent. Also nice for tooling; a consistent prefix makes it easier for us to flag accidentally unused directives, or find things that look like they are supposed to be a directive but aren't getting picked up, and eventually drop --allow-unused-prefixes (I am working on these). I don't think there is any strong argument against it as long as this is documented well.

(There are a couple comments here about certain CHECK-REVNAMEs becoming less readable, but I think that is easier fixed by changing a few revision names than trying to build a global format around them. I am also not opposed to the proposed REV-CHECK, even though it is less common in LLVM I think it achieves similar goals).

In any case, I can change as needed. Suppose we may as well wait for Scott and any others to chime in.

@workingjubilee
Copy link
Member

Despite the "slightly" qualifier, I do feel more strongly about the {REV}-CHECK ordering.

When I scan code, I am often looking for breaks in patterns, and it is easiest to do that if the initial prefix for different revs is visually distinct. Of course, a revision named czech (resulting in a CZECH-NEXT) would also be bad, even if it was CZECH-CHECK-NEXT, simply because it has enough of the same letters. But I don't expect many revisions named for the Czech Republic, despite it producing apparently a significant amount of silicon.

@bors
Copy link
Contributor

bors commented Jul 30, 2024

☔ The latest upstream changes (presumably #128378) made this pull request unmergeable. Please resolve the merge conflicts.

@jieyouxu
Copy link
Member

jieyouxu commented Aug 16, 2024

@tgross35 since there's some different preferences and I want to gauge consensus and preferences for the people who will be maintaining/reviewing/reading the tests, I'll draft a T-compiler MCP later tonight or maybe tomorrow, so people can register concerns if they have strong preferences and suggestions. By filing a T-compiler MCP, we can make sure at least T-compiler reviewers are aware of the normalized FileCheck syntax (I'll help with follow-up doc updates and such). I'll also tag wg-llvm members and Jubilee (and Scott) along since Jubilee has some preferences and Scott reads/writes/reviews quite a bit of FileCheck tests as well. I'll try to come up with a draft MCP and then share it with you in case you want to propose any improvements/changes.

(Also sorry for the review delay, I ignored the draft PRs for a while due to other non-draft PRs)

@jieyouxu jieyouxu added S-waiting-on-MCP Status: PR has a compiler MCP and is waiting for the compiler MCP to complete. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc PG-exploit-mitigations Project group: Exploit mitigations S-waiting-on-MCP Status: PR has a compiler MCP and is waiting for the compiler MCP to complete. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants