GCC now supports SARIF as an output format #531
Awesome, @davidmalcolm! Thanks for doing this and for letting us know about it! |
FWIW I've now posted an initial set of patches that let GCC consume SARIF, replaying a (very much just a prototype for now, though) |
The GCC RFE for tracking being able to replay SARIF is: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96032 |
@davidmalcolm, this is really great! If you'd like, we'd be very glad to take a detailed look at your SARIF and see if we have any suggestions to refine it. I'm very glad to hear that some of the assets we put out on the web were useful in crafting this content. I have looked at the Compiler Explorer example; is that definitive/comprehensive? Do you have other substantive examples available? [Looks like you are utilizing some very interesting SARIF domains, taxa, logical locations, etc.] |
Example of GCC SARIF output showing LTO report of a cross-TU type-mismatch in a use of a variadic API: |
Same example (of LTO), reformatted for ease of reading: Note how this example spans two different source files; both have their full content quoted in the "artifacts" section (my producer code adds an artifact with content for any files referenced by any results mentioned in the file). For reference, the regular (non-SARIF) GCC output for this is:
|
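The artifact behaviour described above (one artifact, with content, for every file referenced by any result) can be sketched roughly as follows. This is only an illustration in Python, not GCC's producer code: `collect_artifacts` and `read_file` are hypothetical names, and the sketch assumes every location carries a `physicalLocation` with an `artifactLocation.uri`.

```python
def collect_artifacts(results, read_file):
    """Build a SARIF-style "artifacts" array for every file the results mention.

    `read_file` is a hypothetical callback mapping a URI to the file's text.
    """
    seen = {}  # uri -> artifact; dicts keep insertion order, so output is stable
    for result in results:
        locations = list(result.get("locations", []))
        locations += result.get("relatedLocations", [])
        for loc in locations:
            uri = loc["physicalLocation"]["artifactLocation"]["uri"]
            if uri not in seen:
                seen[uri] = {
                    "location": {"uri": uri},
                    "contents": {"text": read_file(uri)},
                }
    return list(seen.values())
```

Each file appears once, in the order it is first referenced, even when several results point into it.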
Another example (reformatted so it's not all on one line, for ease of reading by humans): For reference, the non-SARIF GCC output for this one looks like:
|
Let me know if there are other examples you'd like to see. |
Note that I'm not expressing the macro expansion information in the SARIF output (is there a way to do that?) |
Also: I've tried to express metadata about events via threadFlowLocation "kinds" property values, but I have some kinds of events that don't fit well with the existing examples; see #530 for more info. |
Hello! Took a look and there's lots to talk about. This work is really great and we'd love to help keep building it out. Here are some starter thoughts:
Right now, a couple of people are looking at your log files in various viewers and I can provide them back to you with some updates (though you may be able to just glean what you need from the above). Looking forward to continuing the discussion! |
Sorry for the delay in getting back to you, and thanks for taking the time to look at it. Addressing one specific point:
...when I spoke of macro expansion, I was referring to languages with a preprocessor, such as C/C++, where the question of "where in the source-code-under-analysis are we?" can involve a nested series of macro expansions, potentially involving multiple files (e.g. use of a macro declared in one header, which refers to a macro in another header, etc). Consider e.g.:
GCC output: https://godbolt.org/z/87vf1cGKK where GCC's textual output can emit the chain of macro expansions:
There didn't seem to be a way to express this within SARIF. Is there one, or did I miss it? Thanks! |
clang's textual output can express similar macro expansion information, such as:
...though it emits the expansion in the opposite order to GCC. |
FWIW I've extended the GCC SARIF output support so that it now captures crashes of GCC: |
FWIW it turned out that my rather naive JSON code was traversing keys in objects in non-deterministic order when serializing, so that the ordering would vary from run to run. I've fixed that for GCC 13 here: Should the "version" be the very first thing, or should the "$schema" come first? |
FWIW I've fixed a bug in GCC's SARIF output where it was naively assuming that source code was UTF-8, leading to invalid UTF-8 in the .sarif files when quoting artifacts (either fully, or via snippets). I've fixed this for GCC 13 in https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614600.html which has some "fun" test cases e.g. for reporting invalidly encoded artifacts via a validly encoded .sarif log file. |
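To illustrate the encoding problem, here is a minimal Python sketch of one way a producer could avoid writing invalid UTF-8 into a log when quoting a file. It is an assumption for illustration, not a description of what the GCC patch does: it falls back from the `text` property of a SARIF `artifactContent` object to the base64-encoded `binary` property when the bytes do not decode. The function name `artifact_content` is hypothetical.

```python
import base64


def artifact_content(raw: bytes) -> dict:
    """Return a SARIF-style artifactContent object for the given file bytes."""
    try:
        # Valid UTF-8: safe to quote as text in the (UTF-8) log file.
        return {"text": raw.decode("utf-8")}
    except UnicodeDecodeError:
        # Not valid UTF-8: store the bytes base64-encoded instead,
        # so the log itself stays validly encoded.
        return {"binary": base64.b64encode(raw).decode("ascii")}
```

The same check would need to apply to snippets as well as to full artifact contents.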
Well, JSON caters to many host languages. Most of them still do not understand how to implement and maintain a stable insertion order, or insist on not doing it ... Our toy-like languages Python (dicts) and JavaScript (for string keys in objects) offer insertion ordering these days. But if a JSON library is chosen on the consumer side, that library's authors may have decided to garble the order of keys, as they will still comply with the JSON spec. Long story short, this may be an optimization that is not as important as enforcing correct character encoding. Nice work, thanks a lot for providing your resources to create the tool. PS: In CSAF v2.0 we therefore provided the schema "properties" in sorted order, as this should be trivial for insertion-order-preserving producers and consumers, and the order is trivial to re-establish by sorting in case one link in the processing chain messed with it. |
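Re-establishing a stable key order by sorting, as suggested above, might look like the following Python sketch. The helper names are hypothetical, and putting `"$schema"` before `"version"` is purely an assumption made for the example; the thread leaves that question open.

```python
import json


def sort_keys_deep(value):
    """Recursively rebuild dicts with their keys in sorted order."""
    if isinstance(value, dict):
        return {key: sort_keys_deep(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [sort_keys_deep(item) for item in value]
    return value


def dump_deterministic(log, leading=("$schema", "version")):
    """Serialize a log so the output does not depend on insertion order.

    `leading` names top-level keys to emit first (an assumed convention).
    """
    body = sort_keys_deep(log)
    ordered = {key: body[key] for key in leading if key in body}
    for key, value in body.items():
        if key not in ordered:
            ordered[key] = value
    return json.dumps(ordered, indent=2)
```

Two logs with the same content but different construction order then serialize to byte-identical text, which keeps diffs between runs meaningful.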
|
I've now added support to GCC trunk (for GCC 14) for capturing timing/profile information about the compile/analysis in SARIF form (via a custom property in the property bag on the invocation object). |
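As a rough illustration of attaching timing data through a property bag on an invocation object, here is a small Python sketch. The property name `example/phaseSeconds` and the function `timed_invocation` are made up for this example; the actual property name and value layout used by GCC are not shown in this thread.

```python
import time


def timed_invocation(phases):
    """Run (name, callable) phases and return a SARIF-style invocation object
    with per-phase wall-clock times in its property bag."""
    timings = {}
    for name, run_phase in phases:
        start = time.perf_counter()
        run_phase()
        timings[name] = time.perf_counter() - start
    return {
        "executionSuccessful": True,
        # "properties" is the SARIF property bag; the key below is hypothetical.
        "properties": {"example/phaseSeconds": timings},
    }
```

Consumers that do not know the custom property can ignore it, which is the point of using the property bag for tool-specific data.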
I've now added support to GCC trunk (for GCC 14) for capturing multithreaded code flows in GCC diagnostics, including in SARIF output (although nothing in GCC currently makes use of this). |
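For readers unfamiliar with how SARIF models multiple threads: a `codeFlow` holds an array of `threadFlows`, one per thread, each with its own list of locations. The Python sketch below groups a flat event list into that shape. It is an illustration only; `build_code_flow` and the event tuple layout are hypothetical and are not taken from GCC.

```python
def build_code_flow(events):
    """Group (thread_id, uri, line, text) events into a SARIF-style codeFlow.

    Events are assumed to be listed in global execution order.
    """
    flows = {}  # thread_id -> threadFlow, in order of first appearance
    for order, (thread_id, uri, line, text) in enumerate(events, start=1):
        flow = flows.setdefault(thread_id, {"id": thread_id, "locations": []})
        flow["locations"].append({
            # executionOrder records the interleaving across threads.
            "executionOrder": order,
            "location": {
                "physicalLocation": {
                    "artifactLocation": {"uri": uri},
                    "region": {"startLine": line},
                },
                "message": {"text": text},
            },
        })
    return {"threadFlows": list(flows.values())}
```

With a single thread this degenerates to the usual one-threadFlow code flow.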
Also changed in the upcoming GCC 14:
|
FWIW I mentioned SARIF in the GCC 14 release notes in a few places: |
I've added the |
I've created a page on the GCC wiki to track our SARIF support: https://gcc.gnu.org/wiki/SARIF I've also updated my test suites to validate the generated .sarif files against the schema, and fixed some bugs where we could generate invalid .sarif output; see the above link for details. |
I've pushed lots more improvements to GCC's SARIF output to trunk (for GCC 15):
via this patch kit |
I've pushed a patch to GCC trunk (for GCC 15) to add includes/includedBy information (tracking |
I've pushed a patch to GCC trunk (for GCC 15) to capture secondary locations without labels as relatedLocations (§3.27.22), which were otherwise only making it to the textual output. |
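A minimal Python sketch of the resulting shape, assuming a diagnostic with one primary location and some unlabeled secondary locations. The helper `make_result`, the rule id, and the file names are hypothetical; this is not GCC's code.

```python
def make_result(rule_id, text, primary, secondary=()):
    """Build a SARIF-style result; `primary` and each `secondary` entry
    are (uri, line) pairs."""
    def location(uri, line):
        return {
            "physicalLocation": {
                "artifactLocation": {"uri": uri},
                "region": {"startLine": line},
            }
        }

    result = {
        "ruleId": rule_id,
        "message": {"text": text},
        "locations": [location(*primary)],
    }
    if secondary:
        # Unlabeled secondary locations carry no message, only a position.
        result["relatedLocations"] = [
            dict(location(uri, line), id=index)
            for index, (uri, line) in enumerate(secondary)
        ]
    return result
```

For example, `make_result("-Wexample", "mismatch", ("a.c", 3), [("b.h", 7)])` yields one entry in `locations` and one in `relatedLocations`.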
I've pushed a patch to GCC trunk (for GCC 15) to fix an issue where the invocation property |
Not yet in GCC trunk, but FWIW I've posted patches for review for GCC 15 that expose its diagnostics subsystem as a new shared library, The patches also implement a |
@davidmalcolm this is amazing! Thanks a lot for doing this work. It is much appreciated. |
I've pushed a patch to GCC trunk (for GCC 15) which adds embedded URIs in plain text messages (as per §3.11.6). |
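As I read §3.11.6, an embedded link in a plain text message is written as `[link text](destination)`, with literal `[`, `]` and `\` in the link text escaped by a backslash. The Python sketch below builds such a message under that assumption; `embed_link` is a hypothetical helper and the URL is a placeholder, not a real documentation address.

```python
def embed_link(link_text, uri):
    """Format an embedded link for a SARIF plain text message.

    Assumes backslash-escaping of '\\', '[' and ']' in the link text.
    """
    escaped = (
        link_text.replace("\\", "\\\\")  # escape backslashes first
        .replace("[", "\\[")
        .replace("]", "\\]")
    )
    return f"[{escaped}]({uri})"


message = {
    "text": "see " + embed_link("the option docs", "https://example.com/docs")
}
```

Viewers that do not render links will still show readable text, since the syntax stays legible as-is.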
I've pushed a patch to GCC trunk for GCC 15 to fix the SARIF schema URL (§3.13.3) and a validation issue with GCC SARIF output seen when using the latest schema. |
I've pushed a patch to GCC trunk for GCC 15 which adds initial support for SARIF 2.2 output, based on the draft spec as of 2024-08-08. |
I've now extended this for GCC 15 so that the crash handler attempts to capture a backtrace of its own stack, and adds that to a property bag of the notification in the SARIF output. The precise format of the JSON values in the backtrace representation is deliberately undocumented and could change. |
This is now all in GCC trunk for GCC 15; see https://gcc.gnu.org/wiki/libdiagnostics for more information. |
...and FWIW:
|
That is |
I'm not sure how best to contact the SARIF community (the "Ask a Question" link takes me to this issue tracker), but here's a heads-up that I've implemented SARIF output support for GCC trunk, for GCC 13 onwards (with this 3000 line patch):
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596138.html
As noted above, my generated .sarif files seem to pass the online validator, are viewable via the online React-based viewer, and seem to work OK in the VS Code extension for SARIF. I hope I'm correctly implementing everything.
You can see it running live on Compiler Explorer; here's an example of it emitting SARIF reporting a path-sensitive double-free:
https://godbolt.org/z/nYWMM8Wx7
FWIW I'm now experimenting with GCC accepting SARIF as input (i.e. as a consumer), so that it can "replay" diagnostics serialized in SARIF form.