-
Notifications
You must be signed in to change notification settings - Fork 436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incomplete/incorrect callgraphs due to incorrect class names #1376
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #1376 +/- ##
==========================================
- Coverage 67.87% 67.85% -0.03%
==========================================
Files 253 253
Lines 27702 27706 +4
==========================================
- Hits 18804 18801 -3
- Misses 8898 8905 +7
|
@Johanmyst Thank you for tackling this tricky case and for the thoughtful refactoring effort! I believe this patch is on the right track. The approach of inferring all possible class names from callbases or functions seems efficient and aligns well with our requirements. Consequently, it's great that we're eliminating the need for type inference at various sites (like alloc, load, store, etc.) for cpp. I've left some minor comments on specific parts of the patch for your consideration. Once these are addressed, I think we'll be in a good position to merge, Professor @yuleisui. |
@Johanmyst, could you review Xiao's comments to see if any changes are needed? I will merge your pull request once those minor issues have been addressed. |
Thanks for your response, of course I can integrate your comments! Give me a second. |
… copy-elision; fixed non-specific return types
It seems like you've already merged this pull request, I'll open a new one for the final changes. |
After the pull request I submitted yesterday (#1369) was merged I noticed another issue with C++ virtual functions and the way SVF retrieves the class name (used to determine which function is called for overloaded/virtual functions in derived classes) leading to an incomplete/incorrect call graph being generated.
This fix possibly also addresses issues #1314 and #1301.
This issue is caused by an incorrect way of retrieving the class name when the Class Hierarchy Graph is being built, which relies on the LLVM type and the assumption that an LLVM type can be relied upon for retrieving a class' name. However, if two classes have identical types (but can still have e.g. different overloaded function definitions) LLVM can choose to merge those types. Depending on the order of definition, LLVM can merge into either class' name. When SVF then extracts the class' name from the merged type to determine which function is being called in the VTables, the wrong type can be chosen and thus the call graph finds no/the incorrect callee.
For example, take the following snippet:
When looking at the LLVM IR for this example, only two types are visible because both
Child1
andChild2
are identical w.r.t. their type definitions:Because the function definition of
Child2::action
precedesChild1::action
the classesChild1
andChild2
get merged into a singleChild2
type instance. If the definition order of these functions are swapped, the types get merged like this:If the derived classes contain differences in their types (e.g. because
Child1
contains a new private fieldint x
or similar) the types cannot be merged, and this issue doesn't occur. Note that this issue also does not occur when the objects are heap-allocated objects, as the current class-name-finding-function finds the class' constructor function and correctly extracts the class name from the mangled constructor name.However, when types are merged like this, SVF's way of finding the underlying class name while constructing the Call Hierarchy Graph (specifically for virtual functions) fails because it assumes types cannot be merged like this.
More specifically, the
CHGBuilder::buildCHG()
function callsCHGBuilder::buildInternalMaps()
, which callsCHGBuilder::buildCSToCHAVtblsAndVfnsMap()
. This function searches for virtual calls, and for each callsite to a virtual function it finds it will try to get the classes that that callsite could resolve to withCHGBuilder::getCSClasses()
. This last function callscppUtil::getClassNameOfThisPtr()
with a pointer to the underlying object to try and get the name of the class that that pointer could point to.Without any metadata specifying the correct class name however, this function internally calls
ObjTypeInference::inferThisPtrClsName()
to infer the name of the underlying class. This function contains the following snippet:When the underlying types get merged however, this snippet (as in the above example) incorrectly returns the class as
Child2
, rather than the correctChild1
. Because (as far as I could find) disabling this merging behaviour in LLVM isn't easily done (and it might be done for good reason), I modified the way the name of the underlying class is found to no longer use the LLVM type.Instead, rather than only preforming the forward walk for heap allocated objects (note the small fix there because previously the check for
isNewAlloc()
only checked if the function name exactly equalled "_Znwm", but I think it should check for any heap allocator (which, for class objects is necessarily anew
operator)), the forward walk is also done for stack and global variables. This ensures the constructor function is also found for stack objects or global variables, from which the correct class name can be extracted.Moreover, because class names can only be extracted from C++
dynamic_cast
call sites (and not thedynamic_cast
function itself), I also ensured these are handled correctly (previously thedynamic_cast
function object was added as a source instead of the call site).Note that I also removed the early-exit (i.e. continue) whenever a source site is found during the backwards walk. In my tests I did not see any issues with this. I removed this early-exit because the rest of the infrastructure parses the returned values as a set with potentially more class names, and I wanted to ensure an early-exit wouldn't cause incomplete results again. If this is entirely useless, feel free to put the early-exit back in!
Also note that while I ran a bunch of my own tests, these were all based on the 2.9 release since I currently don't have LLVM 16 on my system. I tried testing against these changes applied to the
master
branch, but ran into unrelated assertion failures in theCHGBuilder::analyzeVTables()
function. More specifically, recent pushes seem to have removed checks forBitCast
ORIntToPtr
cast instructions, and now simply assume any cast in the vtable must be aIntToPtr
cast. But, since I'm running without opaque pointers, the bitcasts in the vtable entries causes these assertions to fail.