added new heuristics to detect obfuscated code
mrphrazer committed Aug 10, 2021
1 parent 38735a2 commit 9f448fa
Showing 8 changed files with 225 additions and 54 deletions.
50 changes: 45 additions & 5 deletions README.md
@@ -1,18 +1,21 @@
# Obfuscation Detection (v1.0)
# Obfuscation Detection (v1.1)
Author: **Tim Blazytko**

_Automatically detect control-flow flattening and other state machines_
_Automatically detect obfuscated code and other state machines_

## Description:

Scripts and binaries to automatically detect control-flow flattening and other state machines in binaries.
Scripts and binaries to automatically detect obfuscated code and state machines in binaries.

Implementation is based on Binary Ninja. Check out the following blog post for more information:
Implementation is based on Binary Ninja. Check out the following blog posts for more information:

[Automated Detection of Control-flow Flattening](https://synthesis.to/2021/03/03/flattening_detection.html)
* [Automated Detection of Control-flow Flattening](https://synthesis.to/2021/03/03/flattening_detection.html)
* [Automated Detection of Obfuscated Code](https://synthesis.to/2021/08/10/obfuscation_detection.html)

## Usage

To detect control-flow flattening, run `detect_flattening.py`:

```
$ ./detect_flattening.py samples/finspy
Function 0x401602 has a flattening score of 0.9473684210526315.
@@ -27,6 +30,43 @@ Function 0x412f70 has a flattening score of 0.9927007299270073.
Function 0x4138e0 has a flattening score of 0.9629629629629629.
```
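
The flattening score reported above can be illustrated without Binary Ninja. The following is a minimal sketch, assuming a control-flow graph given as a plain dictionary of successor lists; the helper names `compute_dominators` and `flattening_score` and the toy graphs are hypothetical and not part of the repository. It follows the same idea as `calc_flattening_score`: for every block that is the target of a back edge, take the fraction of blocks it dominates, and keep the maximum.

```python
def compute_dominators(cfg, entry):
    """Return (dom, preds), where dom[n] is the set of nodes dominating n.

    cfg is a hypothetical stand-in for a function's CFG: every node is a key
    and maps to a list of successor nodes.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # classic iterative data-flow formulation: start from "everything
    # dominates everything" and shrink until a fixed point is reached
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def flattening_score(cfg, entry):
    dom, preds = compute_dominators(cfg, entry)
    score = 0.0
    for block in cfg:
        # all blocks dominated by the current block (including itself)
        dominated = {n for n in cfg if block in dom[n]}
        # back edge: a predecessor of the block is dominated by the block
        if not any(p in dominated for p in preds[block]):
            continue
        score = max(score, len(dominated) / len(cfg))
    return score


# toy flattened function: a dispatcher with eight case blocks jumping back to it
cases = [f"case{i}" for i in range(8)]
flattened = {"entry": ["dispatcher"], "dispatcher": cases + ["exit"], "exit": []}
flattened.update({case: ["dispatcher"] for case in cases})

# toy straight-line function without any loop
straight = {"a": ["b"], "b": ["c"], "c": []}

print(flattening_score(flattened, "entry"))  # 10 of 11 blocks, about 0.909
print(flattening_score(straight, "a"))       # no back edge, so 0.0
```

In the toy dispatcher graph, the dispatcher dominates every block except the entry, which is why the score approaches 1 as the number of case blocks grows.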

To apply various heuristics to detect obfuscated code, run `detect_obfuscation.py`:

```
$ ./detect_obfuscation.py samples/finspy
================================================================================
Control Flow Flattening
Function 0x401602 (sub_401602) has a flattening score of 0.9473684210526315.
Function 0x4017c0 (sub_4017c0) has a flattening score of 0.9981378026070763.
Function 0x405150 (sub_405150) has a flattening score of 0.9166666666666666.
Function 0x405270 (sub_405270) has a flattening score of 0.9166666666666666.
Function 0x405370 (sub_405370) has a flattening score of 0.9984544049459042.
Function 0x4097a0 (sub_4097a0) has a flattening score of 0.9992378048780488.
Function 0x412c70 (sub_412c70) has a flattening score of 0.9629629629629629.
Function 0x412df0 (sub_412df0) has a flattening score of 0.9629629629629629.
Function 0x412f70 (sub_412f70) has a flattening score of 0.9927007299270073.
Function 0x4138e0 (sub_4138e0) has a flattening score of 0.9629629629629629.
================================================================================
Cyclomatic Complexity
Function 0x4097a0 (sub_4097a0) has a cyclomatic complexity of 524.
Function 0x405370 (sub_405370) has a cyclomatic complexity of 258.
Function 0x4017c0 (sub_4017c0) has a cyclomatic complexity of 214.
Function 0x412f70 (sub_412f70) has a cyclomatic complexity of 54.
Function 0x4138e0 (sub_4138e0) has a cyclomatic complexity of 10.
Function 0x412df0 (sub_412df0) has a cyclomatic complexity of 10.
================================================================================
Large Basic Blocks
Basic blocks in function 0x405340 (sub_405340) contain on average 11 instructions.
Basic blocks in function 0x401240 (_start) contain on average 11 instructions.
Basic blocks in function 0x4013e3 (sub_4013e3) contain on average 10 instructions.
Basic blocks in function 0x413a80 (init) contain on average 9 instructions.
Basic blocks in function 0x401349 (sub_401349) contain on average 7 instructions.
Basic blocks in function 0x401030 (_init) contain on average 6 instructions.
================================================================================
Instruction Overlapping
```
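
The cyclomatic complexity values above follow the formula used in `calc_cyclomatic_complexity`: number of edges minus number of basic blocks plus two. Below is a minimal sketch on toy graphs, assuming a CFG represented as a dictionary of successor lists; the names `cyclomatic_complexity` and `top_percent` and the sample graphs are hypothetical and only illustrate the arithmetic and the top-10% cut-off.

```python
import math


def cyclomatic_complexity(cfg):
    # number of basic blocks
    num_blocks = len(cfg)
    # number of edges in the graph
    num_edges = sum(len(succs) for succs in cfg.values())
    return num_edges - num_blocks + 2


def top_percent(items, key, percent=10):
    # keep at least one item: the bound is rounded up
    bound = math.ceil(len(items) * percent / 100)
    return sorted(items, key=key, reverse=True)[:bound]


# hypothetical toy functions
toy_functions = {
    # 1 edge, 2 blocks
    "straight": {"a": ["b"], "b": []},
    # 4 edges, 4 blocks
    "diamond": {"entry": ["then", "else"], "then": ["join"], "else": ["join"], "join": []},
    # 6 edges, 5 blocks
    "switch": {"entry": ["a", "b", "c"], "a": ["exit"], "b": ["exit"], "c": ["exit"], "exit": []},
}

for name, cfg in toy_functions.items():
    print(name, cyclomatic_complexity(cfg))

# with three functions, the top 10% is rounded up to a single entry
most_complex = top_percent(
    list(toy_functions), key=lambda n: cyclomatic_complexity(toy_functions[n]))
print(most_complex)
```

Because the bound is rounded up, even a very small binary yields at least one reported function.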


## Note

The password for the zipped malware samples is "infected". To unpack, use the following command line:
48 changes: 2 additions & 46 deletions detect_flattening.py
@@ -1,53 +1,9 @@
#!/usr/bin/python
import sys
from obfuscation_detection.heuristics import find_flattened_functions
from binaryninja import BinaryViewType


def calc_flattening_score(function):
    score = 0.0
    # 1: walk over all basic blocks
    for block in function.basic_blocks:
        # 2: get all blocks that are dominated by the current block
        dominated = get_dominated_by(block)
        # 3: check for a back edge
        if not any([edge.source in dominated for edge in block.incoming_edges]):
            continue
        # 4: calculate relation of dominated blocks to the blocks in the graph
        score = max(score, len(dominated)/len(function.basic_blocks))
    return score


def get_dominated_by(dominator):
    # 1: initialize worklist
    result = set()
    # add to result
    worklist = [dominator]
    # 2: perform a depth-first search on the dominator tree
    while worklist:
        # get next block
        block = worklist.pop(0)
        result.add(block)
        # add children from dominator tree to worklist
        for child in block.dominator_tree_children:
            worklist.append(child)
    return result


def find_flattened_functions():
    # walk over all functions
    for function in bv.functions:
        # calculate flattening score
        score = calc_flattening_score(function)
        # skip if score is too low
        if score < 0.9:
            # print(f"Function {hex(function.start)} has a flattening score of {score}.")
            continue

        # print function and score
        print(
            f"Function {hex(function.start)} has a flattening score of {score}.")


# check file arguments
if len(sys.argv) < 2:
    print("[*] Syntax: {} <path to binary>".format(sys.argv[0]))
@@ -62,4 +18,4 @@ def find_flattened_functions():
bv.update_analysis_and_wait()

# find flattened functions
find_flattened_functions()
find_flattened_functions(bv)
22 changes: 22 additions & 0 deletions detect_obfuscation.py
@@ -0,0 +1,22 @@
#!/usr/bin/python
import sys

from binaryninja import BinaryViewType
from obfuscation_detection import detect_obfuscation


# check file arguments
if len(sys.argv) < 2:
    print("[*] Syntax: {} <path to binary>".format(sys.argv[0]))
    exit(0)

# parse arguments
file_name = sys.argv[1]

# init binary ninja
bv = BinaryViewType.get_view_of_file(file_name)
if not file_name.endswith(".bndb"):
    bv.update_analysis_and_wait()

# look for obfuscation heuristics
detect_obfuscation(bv)
15 changes: 15 additions & 0 deletions obfuscation_detection/__init__.py
@@ -0,0 +1,15 @@
from .heuristics import *


def detect_obfuscation(bv):
    # find flattened functions
    find_flattened_functions(bv)

    # find complex functions
    find_complex_functions(bv)

    # find large basic blocks
    find_large_basic_blocks(bv)

    # find overlapping instructions
    find_instruction_overlapping(bv)
93 changes: 93 additions & 0 deletions obfuscation_detection/heuristics.py
@@ -0,0 +1,93 @@
import math

from binaryninja import highlight
from obfuscation_detection.utils import *


def find_flattened_functions(bv):
    print("=" * 80)
    print("Control Flow Flattening")
    # walk over all functions
    for function in bv.functions:
        # calculate flattening score
        score = calc_flattening_score(function)
        # skip if score is too low
        if score < 0.9:
            # print(f"Function {hex(function.start)} has a flattening score of {score}.")
            continue

        # print function and score
        print(
            f"Function {hex(function.start)} ({function.name}) has a flattening score of {score}.")


def find_complex_functions(bv):
    print("=" * 80)
    print("Cyclomatic Complexity")
    # sort functions by cyclomatic complexity
    sorted_functions = sorted(
        bv.functions, key=lambda x: calc_cyclomatic_complexity(x))

    # bound to print only the top 10%
    bound = math.ceil(((len(bv.functions) * 10) / 100))
    # print top 10% (iterate in descending order)
    for f in list(reversed(sorted_functions))[:bound]:
        print(
            f"Function {hex(f.start)} ({f.name}) has a cyclomatic complexity of {calc_cyclomatic_complexity(f)}.")


def find_large_basic_blocks(bv):
    print("=" * 80)
    print("Large Basic Blocks")
    # sort functions by average basic block size
    sorted_functions = sorted(
        bv.functions, key=lambda x: calc_average_instructions_per_block(x))

    # bound to print only the top 10%
    bound = math.ceil(((len(bv.functions) * 10) / 100))
    # print top 10% (iterate in descending order)
    for f in list(reversed(sorted_functions))[:bound]:
        print(
            f"Basic blocks in function {hex(f.start)} ({f.name}) contain on average {math.ceil(calc_average_instructions_per_block(f))} instructions.")


def find_instruction_overlapping(bv):
    print("=" * 80)
    print("Instruction Overlapping")

    # map of byte addresses: 1 = instruction beginning, 0 = inside an instruction
    seen = {}

    # start addresses of functions that contain overlapping instructions
    functions_with_overlapping = set()

    # walk over all functions
    for function in bv.functions:
        # walk over all instructions
        for instruction in function.instructions:
            # parse address (last element of the instruction tuple)
            address = instruction[-1]

            # seen for the first time
            if address not in seen:
                # mark as instruction beginning
                seen[address] = 1
            # seen before and not marked as instruction beginning
            elif seen[address] == 0:
                functions_with_overlapping.add(function.start)
                function.set_user_instr_highlight(
                    address, highlight.HighlightColor(red=0xff, blue=0xff, green=0))

            # walk over instruction length and mark bytes as seen
            for _ in range(1, bv.get_instruction_length(address)):
                address += 1
                # if seen before and marked as instruction beginning
                if address in seen and seen[address] == 1:
                    functions_with_overlapping.add(function.start)
                    function.set_user_instr_highlight(
                        address, highlight.HighlightColor(red=0xff, blue=0xff, green=0))
                else:
                    seen[address] = 0

    # print all functions with overlapping instructions, sorted by address
    for address in sorted(functions_with_overlapping):
        print(
            f"Overlapping instructions in function {hex(address)} ({bv.get_function_at(address).name}).")
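
The byte-marking scheme in `find_instruction_overlapping` can be tried out without Binary Ninja. The following is a minimal sketch, assuming instructions are given as plain `(address, length)` tuples; the function name `find_overlaps` and the sample addresses are hypothetical. Each byte is marked as either an instruction beginning (1) or the inside of an instruction (0), and an overlap is reported whenever the two markings conflict.

```python
def find_overlaps(instructions):
    """Return the addresses at which two instructions overlap.

    instructions: iterable of hypothetical (address, length) tuples.
    """
    # byte address -> 1 (instruction beginning) or 0 (inside an instruction)
    seen = {}
    overlaps = set()

    for start, length in instructions:
        if start not in seen:
            # first time this byte is seen: mark as instruction beginning
            seen[start] = 1
        elif seen[start] == 0:
            # an instruction begins inside another instruction
            overlaps.add(start)

        # mark the remaining bytes of the instruction
        for offset in range(1, length):
            address = start + offset
            if seen.get(address) == 1:
                # this byte was already the beginning of another instruction
                overlaps.add(address)
            else:
                seen[address] = 0
    return overlaps


# a 5-byte instruction at 0x1000 and a jump target into its middle at 0x1002
overlapping = [(0x1000, 5), (0x1002, 2)]
# two instructions placed back to back
clean = [(0x1000, 2), (0x1002, 3)]

print(sorted(hex(a) for a in find_overlaps(overlapping)))
print(sorted(hex(a) for a in find_overlaps(clean)))
```

The check works in both directions: the overlap at `0x1002` is also found when the shorter instruction is processed first.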
45 changes: 45 additions & 0 deletions obfuscation_detection/utils.py
@@ -0,0 +1,45 @@
def calc_flattening_score(function):
    score = 0.0
    # 1: walk over all basic blocks
    for block in function.basic_blocks:
        # 2: get all blocks that are dominated by the current block
        dominated = get_dominated_by(block)
        # 3: check for a back edge
        if not any([edge.source in dominated for edge in block.incoming_edges]):
            continue
        # 4: calculate relation of dominated blocks to the blocks in the graph
        score = max(score, len(dominated)/len(function.basic_blocks))
    return score


def get_dominated_by(dominator):
    # 1: initialize result set and worklist
    result = set()
    # start the traversal at the dominator itself
    worklist = [dominator]
    # 2: traverse the dominator tree (pop(0) makes this a breadth-first walk)
    while worklist:
        # get next block
        block = worklist.pop(0)
        # add to result
        result.add(block)
        # add children from dominator tree to worklist
        for child in block.dominator_tree_children:
            worklist.append(child)
    return result


def calc_cyclomatic_complexity(function):
    # number of basic blocks
    num_blocks = len(function.basic_blocks)
    # number of edges in the graph
    num_edges = sum([len(b.outgoing_edges) for b in function.basic_blocks])
    return num_edges - num_blocks + 2


def calc_average_instructions_per_block(function):
    # number of basic blocks
    num_blocks = len(function.basic_blocks)
    # number of instructions
    num_instructions = sum(
        [b.instruction_count for b in function.basic_blocks])
    return num_instructions / num_blocks
6 changes: 3 additions & 3 deletions plugin.json
@@ -8,8 +8,8 @@
"api": [
"python3"
],
"description": "Automatically detect control-flow flattening and other state machines",
"longdescription": "Scripts and binaries to automatically detect control-flow flattening and other state machines in binaries.\n\nImplementation is based on Binary Ninja. Check out the following blog post for more information:\n\n[Automated Detection of Control-flow Flattening](https://synthesis.to/2021/03/03/flattening_detection.html)\n\n## Usage\n\n```\n$ ./detect_flattening.py samples/finspy \nFunction 0x401602 has a flattening score of 0.9473684210526315.\nFunction 0x4017c0 has a flattening score of 0.9981378026070763.\nFunction 0x405150 has a flattening score of 0.9166666666666666.\nFunction 0x405270 has a flattening score of 0.9166666666666666.\nFunction 0x405370 has a flattening score of 0.9984544049459042.\nFunction 0x4097a0 has a flattening score of 0.9992378048780488.\nFunction 0x412c70 has a flattening score of 0.9629629629629629.\nFunction 0x412df0 has a flattening score of 0.9629629629629629.\nFunction 0x412f70 has a flattening score of 0.9927007299270073.\nFunction 0x4138e0 has a flattening score of 0.9629629629629629.\n```\n\n\n## Contact\n\nFor more information, contact [@mr_phrazer](https://twitter.com/mr_phrazer).\n",
"description": "Automatically detect obfuscated code and other state machines",
"longdescription": "Scripts and binaries to automatically detect obfuscated code and state machines in binaries.\n\nImplementation is based on Binary Ninja. Check out the following blog posts for more information:\n\n* [Automated Detection of Control-flow Flattening](https://synthesis.to/2021/03/03/flattening_detection.html)\n* [Automated Detection of Obfuscated Code](https://synthesis.to/2021/08/10/obfuscation_detection.html)\n\n## Usage\n\nTo find control-flow flattening, run `detect_flattening.py`:\n\n```\n$ ./detect_flattening.py samples/finspy \nFunction 0x401602 has a flattening score of 0.9473684210526315.\nFunction 0x4017c0 has a flattening score of 0.9981378026070763.\nFunction 0x405150 has a flattening score of 0.9166666666666666.\nFunction 0x405270 has a flattening score of 0.9166666666666666.\nFunction 0x405370 has a flattening score of 0.9984544049459042.\nFunction 0x4097a0 has a flattening score of 0.9992378048780488.\nFunction 0x412c70 has a flattening score of 0.9629629629629629.\nFunction 0x412df0 has a flattening score of 0.9629629629629629.\nFunction 0x412f70 has a flattening score of 0.9927007299270073.\nFunction 0x4138e0 has a flattening score of 0.9629629629629629.\n```\n\nTo apply various heuristics to detect obfuscated code, run `detect_obfuscation.py`:\n\n```\n$ ./detect_obfuscation.py samples/finspy \n================================================================================\nControl Flow Flattening\nFunction 0x401602 (sub_401602) has a flattening score of 0.9473684210526315.\nFunction 0x4017c0 (sub_4017c0) has a flattening score of 0.9981378026070763.\nFunction 0x405150 (sub_405150) has a flattening score of 0.9166666666666666.\nFunction 0x405270 (sub_405270) has a flattening score of 0.9166666666666666.\nFunction 0x405370 (sub_405370) has a flattening score of 0.9984544049459042.\nFunction 0x4097a0 (sub_4097a0) has a flattening score of 0.9992378048780488.\nFunction 0x412c70 (sub_412c70) has a flattening score of 
0.9629629629629629.\nFunction 0x412df0 (sub_412df0) has a flattening score of 0.9629629629629629.\nFunction 0x412f70 (sub_412f70) has a flattening score of 0.9927007299270073.\nFunction 0x4138e0 (sub_4138e0) has a flattening score of 0.9629629629629629.\n================================================================================\nCyclomatic Complexity\nFunction 0x4097a0 (sub_4097a0) has a cyclomatic complexity of 524.\nFunction 0x405370 (sub_405370) has a cyclomatic complexity of 258.\nFunction 0x4017c0 (sub_4017c0) has a cyclomatic complexity of 214.\nFunction 0x412f70 (sub_412f70) has a cyclomatic complexity of 54.\nFunction 0x4138e0 (sub_4138e0) has a cyclomatic complexity of 10.\nFunction 0x412df0 (sub_412df0) has a cyclomatic complexity of 10.\n================================================================================\nLarge Basic Blocks\nBasic blocks in function 0x405340 (sub_405340) contain on average 11 instructions.\nBasic blocks in function 0x401240 (_start) contain on average 11 instructions.\nBasic blocks in function 0x4013e3 (sub_4013e3) contain on average 10 instructions.\nBasic blocks in function 0x413a80 (init) contain on average 9 instructions.\nBasic blocks in function 0x401349 (sub_401349) contain on average 7 instructions.\nBasic blocks in function 0x401030 (_init) contain on average 6 instructions.\n================================================================================\nInstruction Overlapping\n```\n\n\n## Note\n\nThe password for the zipped malware samples is "infected". To unpack, use the following command line:\n\n```\n$ unzip -P infected samples.zip\n```\n\n## Contact\n\nFor more information, contact [@mr_phrazer](https://twitter.com/mr_phrazer).",
"license": {
"name": "GPL-2.0",
"text": "Copyright 2021 Tim Blazytko\n\nThis program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.\n\nThis program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.\n\nYou should have received a copy of the GNU General Public License along with this program; if not, see <http://www.gnu.org/licenses/>."
@@ -24,6 +24,6 @@
"Windows": "",
"Linux": ""
},
"version": "1.0",
"version": "1.1",
"minimumbinaryninjaversion": 2487
}
Binary file modified samples.zip
Binary file not shown.
