-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform reachability analysis on a per-harness basis #2439
Perform reachability analysis on a per-harness basis #2439
Conversation
Still missing: - cargo kani - other reachability modes
I still need to add the test I've been working on, but the code is ready for review. |
Done! |
Conflicts: kani-compiler/src/codegen_cprover_gotoc/compiler_interface.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an initial set of comments. I'm having difficulty wrapping my head around how project, target crate, harness, etc are structured or related.
The driver will now read the metadata for each crate, and collect all artifacts based on the symtab goto file that is recorded in the metadata of each harness.
Does each crate metadata file contain metadata for each harness in that crate?
Yes.That actually didn't change in this PR at all. It was a previous change. Each crate has one metadata. Each crate metadata has a vector with harnesses metadata. Each harness metadata contains the name of a goto file. Before this change, the name of the goto file was redundant since all harnesses metadata pointed to the same goto file. With this change, we extract the goto file for each harness from the metadata. |
Btw, the crate metadata is defined here: Lines 20 to 30 in 8d79ee5
It holds metadata for proof and test harnesses. In each harness metadata, we store the path to the model here: kani/kani_metadata/src/harness.rs Lines 23 to 24 in 8d79ee5
That said, I was hoping we could change that field to be a vector of artifacts instead. It would simplify the driver and also be engine agnostic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @celinval! Can we evaluate the impact of this change on the compilation time of a big project (e.g. s2n-quic)? The verification time numbers look great, but I wanted to make sure compilation time is not severely impacted.
Totally! That's exactly what I've been doing now. I noticed that the "Kani CI / perf" job finished rather quickly, so I'm hoping the overall time is still better. That said, the compilation time does become significant and it increases the need for optimizing the compiler. Unrelated to this change, I've also noticed that Kani driver is starting to consume a lot of memory when verifying the s2n-quic-core crate. |
I manually executed s2n-quic to get the overall execution time and compilation times using a release build. I compared the head of this PR (fd0075c) and main (7c4400d):
I think overall this change is still fairly beneficial for our users. I suggest that we prioritize compiler performance improvements to a follow up PR. I created an issue for adding more data to benchcomp and to investigate optimizations. |
Nice! Ship it! |
This fixes a regression from #2439. The compiler should store the location of the function body instead of the declaration. Storing the correct location fixes how concrete playback stores the generated unit test.
This was a regression introduced by #2439. We were still writing the result of the reachability algorithm to the same file for every harness. Thus, we could only see the MIR for the last harness that was processed. Use the file name that is specific for the harness instead and generate one MIR file per harness like we do with other files generated by kani-compiler.
Description of changes:
Kani compiler used to generate one
goto-program
for all harnesses in one crate. In some cases, this actually had a negative impact on the harness verification time. This was first reported in #1659, and it is now blocking the toolchain upgrade from #2406.The main changes were done in the compiler's module
compiler_interface
and the moduleproject
from the driver. The compiler will now gather all the harnesses beforehand and it will perform reachability + codegen steps for each harness. All files related to a harnessgoto-program
will follow the naming convention bellow:This applies to symtab / goto / type_map / restriction files.
The metadata file is still generated once per target crate, and its name is still the same (
<BASE_NAME>.kani-metadata.json
).On the driver side, the way we process the artifacts have changed. The driver will now read the metadata for each crate, and collect all artifacts based on the symtab goto file that is recorded in the metadata of each harness.
These changes do not apply for
--function
. We still keep all artifacts based on the crate's<BASE_NAME>
and we have a separate logic to handle that. Fixing this is captured by #2129.Resolved issues:
Resolves #1855
Related RFC:
Call-outs:
There are a few of TODOs in this code that I left to avoid making this PR even bigger. I'll try to cross them out in the next couple of weeks. The only one that I think is somewhat urgent is the one in the
compiler_interface.rs
about making sure the test description and mono item align. Although this logic works today, it is rather fragile.Unfortunately, this PR will impact compilation time. We could try to optimize this logic, but any optimization I can think of will make this PR much bigger. Also, the best optimization might just be improving our goto codegen performance in general.
Testing:
How is this change tested?
Is this a refactor change? Yes
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.