General

March 3, 2021

Tim Blazytko

*This article is a repost. It was originally published at Tim Blazytko’s personal blog: https://synthesis.to/2021/03/03/flattening_detection.html.*

Commercial businesses and malware authors often use code obfuscation to protect specific code areas to impede reverse engineering. In my experience, knowing which code areas are obfuscated often pinpoints sensitive code parts that are worth a closer look. For example, the FinSpy samples that were discovered in September 2020 obfuscate their main modules with Obfuscator-LLVM, while the two-staged dropper isn’t obfuscated at all.

Therefore, I would like to use this post to introduce a heuristic that automatically identifies one of the most common obfuscation techniques, control-flow flattening. However, before we approach this, we will have a look together at how reverse engineers use control-flow graphs, the impact of control-flow flattening on reverse engineering and a tiny bit of graph theory. Afterward, we have all the fundamentals to build a heuristic that detects flattening’s underlying structure and implement it within a few lines of Python. In the end, we investigate how the heuristic performs on obfuscated and non-obfuscated code and discuss its limitations.

If you would like to play around with it, you’ll find the code and all binaries on GitHub.

In many cases, we can reconstruct a function’s control flow without any code analysis, just by simply looking at its *control-flow graph*. For instance, consider the following graph:

As we can see, basic block `a`

has two outgoing edges: one to `b`

and one to `c`

. This is a typical pattern for a conditional branch, where we jump to one basic block (let’s say to `b`

) in the `true`

case and to the other one (to `c`

) in the `false`

case. While `b`

and `c`

most likely perform different calculations, their control flow *joins* in the basic block `d`

. On the high-level, this is a typical `if-then-else`

pattern. Thus, we can recover the control flow without reading a single assembly line:

```
/* type */ func(/* parameters */) {
/* block A */
if (/* jump condition */) {
/* block B */
} else {
/* block C */
}
/* block D */
}
```

After we recovered the high-level structure, we can reconstruct the code itself, basic block per basic block. In this case, let’s just assume that we recover the following function:

```
int check(int x) {
/* block A */
int r;
if (x < 10) {
/* block B */
r = 20;
} else {
/* block C */
r = 30;
}
/* block D */
return r;
}
```

The function `check`

takes a parameter `x`

as input and checks if it is smaller than 10. In this case, the function returns 20, otherwise 30. This was easy, huh? Well, let’s have a look at control-flow flattening then.

Control-flow flattening is a code transformation that removes structure from a control-flow graph. As a result, the control flow cannot be easily recovered as before. For instance, consider the following graph:

If we apply the same method to recover the control flow, we now only see that the blocks `a`

, `b`

, `c`

and `d`

are connected with a block labeled `dispatcher`

. However, we do not know how these blocks relate to each other:

- Is block
`a`

visited before or after block`b`

? - Is there any conditional branch?
- Is there any loop?

In other words, we do not get any *meaningful* information by looking only at the graph, since all blocks are on the same level; they are *flattened*.

In its most basic form, control-flow flattening has a central basic block—the *dispatcher*—that directs the control flow to the individual blocks. This is realized by a *state variable* that tracks which block will be visited next. The *entry* initially sets the state variable to the first basic block—in this case `a`

; then, each block updates the state variable in correspondence to the underlying program logic. In other words, control-flow flattening can be considered as a *state machine* in which each basic block represents a single state.

On the code level, this can be realized by an endless loop that dispatches a state variable in a `switch`

statement. For example, we can represent our function `check`

as follows:

```
int flattened(int x) {
/* entry */
int r;
int state = 0;
/* dispatcher */
while (1){
switch(state) {
case 0: {
/* block A */
if (x < 10) {
state = 1;
} else {
state = 2;
}
break;
}
case 1: {
/* block B */
r = 20;
state = 3;
break;
}
case 2: {
/* block C */
r = 30;
state = 3;
break;
}
case 3: {
/* block D */
return r;
}
}
}
}
```

Remember, each block represents a state that is identified via a variable (here: `state`

). After initializing it with 0, we enter the dispatching loop until we resolve the case for 0. The first state—basic block `a`

—checks if `x`

is smaller than 10 and sets `state`

according to the original control flow: if the condition is true, `state`

becomes 2, which directs the control flow to block `b`

, otherwise to `c`

(by setting `state`

to 3). Afterward, both blocks `b`

and `c`

then set the state variable to 3, realizing the joined control flow in block `d`

.

While we cannot recover the control flow by simply looking at the graph, control-flow flattening can often be broken in a simple manner: by looking at the code and analyzing the state transitions. In *static* analysis scenarios, this works as follows: First, we identify the state variable. Second, we look how it is initialized and follow the dispatching logic until we find the corresponding block. Since this block updates the state variable in accordance to the control flow, we know which state will be executed next. If we repeat this process for each block and track the updates to the state variable, we can consecutively reconstruct the original control flow. In *dynamic* scenarios, this process is even easier, since it is often *sufficient* to know the code relevant for dispatching. Since the dispatcher is called after each state transition, we can observe the interaction between the states without any efforts. As a matter of fact, control-flow flattening can be automatically removed in practice; this excellent blog post by Quarkslab as well as Stadeo are only two examples that automate this for specific flattening implementations via symbolic execution.

Similarly, it is easy to automatically detect generic instances of control-flow flattening based on its underlying structure. Before we go into the details, we will first cover a bit of graph theory to better understand the subsequent sections.

Even though we use control-flow graphs all the time, we may have never thought about what they really represent: Intuitively, a control-flow graph represents all paths in a function that can be traversed during program execution. Such a graph has some nice mathematical properties that are valuable for any kind of code analysis, be it for compiler optimizations, for decompilation or for reverse engineering in general. In the following, we will work towards the goal of automatically detecting loops in control-flow graphs. For this, we first discuss some basic terminology; then, we introduce the concept of dominance relations. For a better understanding, let’s keep it simple and demonstrate the concepts by means of the following graph:

We see a graph with seven nodes and nine edges; one node (`a`

) has no incoming edges, another node (`f`

) no outgoing edges. We call `a`

an *entry* and `f`

an *exit* node. Furthermore, a chain of transitions between nodes is called a *path*. If we are looking at the chain of transitions `a -> b -> d -> f`

, it is a path between `a`

and `f`

. We further say that a *loop* is a subgraph in which each node can be reached from each other node in the subgraph. In our example, `c`

, `e`

and `g`

form a loop, since we can walk from `c`

to `c`

via `c -> e -> g -> c`

; analogously, we can walk from `e`

to `e`

and from `g`

to `g`

. In the following, we introduce the concept of dominance relations that helps us to identify loops automatically.

Dominance relations describe the hierarchical relations between nodes. In particular, they describe which basic blocks are executed before others. More formally, a *dominator* `x`

guarantees that a basic block `x`

is *always executed* before a basic block `y`

. In this case, we say that a node `x`

*dominates* a node `y`

if *all* paths between the entry node and `y`

go through `x`

. In other words, if there is a path from *entry* to `y`

that does *not* go through `x`

, then `x`

is no dominator of `y`

. By design, each node dominates itself.

Given that, we can determine the following dominance relations in our graph. For each node `x`

, we list all nodes that are dominated by `x`

:

```
node x: nodes dominated by x
a: a,b,c,d,e,f
b: b
c: c, e, g
d: d
e: e, g
f: f
g: g
```

Since `a`

is the graph’s entry, it dominates all the other nodes. `c`

dominates `c`

, `e`

and `g`

, since all paths from `a`

to `e`

go through `c`

and all paths from `a`

to `g`

go through `g`

. Consequently, `e`

also dominates `g`

. However, the nodes `b`

and `d`

, `f`

and `g`

only dominate themselves. `b`

dominates only itself since `d`

can also be reached through `c`

; the same holds for `d`

, since `f`

is also reachable through `e`

. Finally, `f`

and `g`

only dominate themselves since they are exit nodes.

Okay, what now? What does all of this mean? Intuitively, this shows us that `a`

will always be executed first and `f`

last. Since `c`

dominates `g`

, we also know that `c`

is always executed before `g`

. Going a step further, we can say that there is an *execution hierarchy*: `a`

is executed before `c`

, `c`

is executed before `e`

and `e`

is executed before `g`

. In this hierarchy, we can introduce an important concept: immediate dominators.

We say that the *closest* node `x`

that dominates a node `y`

—where `x`

and `y`

are distinct—is the *immediate dominator* of `y`

. In our graph, we know that `c`

dominates `c`

, `e`

and `g`

. Since `c`

is the closest dominator to `e`

, it is its immediate dominator. Similarly, since `e`

dominates `g`

and it is the closest dominator to `g`

, we say `e`

is the immediate dominator of `g`

. Applying this to the whole graph, we end up with the following relations:

```
node x: x is immediate dominator of nodes
a: b, c, d, f
b: -
c: e
d: -
e: g
f: -
g: -
```

Basically, we see that `a`

is the immediate dominator of `b`

, `c`

, `d`

and `f`

, while `c`

is the immediate dominator of `e`

and `e`

of `g`

. Now, we can introduce a compact representation of these dominance relations, the dominator tree.

A *dominator tree* is a compact representation of dominance relations; it represents them in a tree-like graph. To build the dominator tree, we simply consider all immediate dominators as edges. Therefore, we can construct the tree by using the following edges for `a`

:

`a -> b`

`a -> c`

`a -> d`

`a -> f`

Further, we add the edges for `c`

and `e`

:

`c -> e`

`e -> g`

Then, the final dominator tree looks as follows:

This tree reveals the dominance relations and hierarchy in a compact form. Especially, we see that `a`

dominates the whole graph, while `c`

dominates `e`

and `g`

. With this representation, we now can easily detect loops in the graph.

Based on the dominator tree in the control-flow graph, we can automatically identify *natural loops* by looking for *back edges*: If we find an edge from a node *back* to one of its dominators, we identified a loop. We say that the dominator *controls* the loop since it dominates all blocks in it. In our graph, we see an edge from `g`

to `c`

. Since `c`

dominates `g`

, we have a loop between `c`

, `e`

and `g`

, where `c`

controls it. More intuitively, we can say that the dominance relations guarantee us that `c`

is executed before `g`

; in other words, we can reach `g`

from `c`

. Since we also can reach `c`

from `g`

, we have a loop.

To sum up, we can identify loops by looking for back edges to dominators. Now, we make use of this concept and build a heuristic to automatically identify control-flow flattening.

Before we talked about control-flow analysis, we said that control-flow flattening removes structure from a control-flow graph by introducing a state machine; this is realized by dispatching a variable that keeps track of the control flow in an endless loop. The following graph is a representation of such a state machine on the binary level:

While this dispatcher directs the control flow to the next basic block via indirect control-flow transfers (using jump tables), other dispatchers are implemented as as a binary search tree. In most cases, the individual states merge in a single block branching to the dispatcher.

To automatically identify these constructs, we design a heuristic that tells us *how likely* a graph’s structure is similar to those of a flattened graph. To achieve that, we want to know

- if there is a
*back edge*to a dominator … - … that dominates
*most of the basic blocks*in the function

In other words, we are looking for a block that controls a loop and also dominates large parts of the function. We express this as a score between 0 and 1 by looking for a back edge to a dominator `x`

and divide the number of basic blocks that are dominated by `x`

by the total number of basic blocks in the graph. In short, we define the *flattening score* as follows:

```
# basic blocks dominated by x
------------------------------
# basic blocks in the function
```

To apply the heuristic to the graph above, we divide the number of basic blocks dominated by the dispatcher (9) by the total number of basic blocks in the function (10) and obtain a score of 0.9.

Now that we know how the heuristic works, we can implement this in a few lines of Python on top of Binary Ninja:

```
def calc_flattening_score(function):
score = 0.0
# 1: walk over all basic blocks
for block in function.basic_blocks:
# 2: get all blocks that are dominated by the current block
dominated = get_dominated_by(block)
# 3: check for a back edge
if not any([edge.source in dominated for edge in block.incoming_edges]):
continue
# 4: calculate relation of dominated blocks to the blocks in the graph
score = max(score, len(dominated)/len(function.basic_blocks))
return score
```

The implementation works as follows:

- Walk over all basic blocks in the current function.
- Get all basic blocks dominated by the current block using
`get_dominated_by(block)`

. - Check if the dominator has an incoming edge from any of the blocks that it dominates.
- Calculate the flattening score by dividing the number of dominated blocks by the number of basic blocks in the function.

To obtain all blocks that are dominated by a given block, we make use of Binary Ninja’s dominator API and perform a depth-first search on the dominator tree:

```
def get_dominated_by(dominator):
# 1: initialize worklist
result = set()
worklist = [dominator]
# 2: perform a depth-first search on the dominator tree
while worklist:
# get next block
block = worklist.pop(0)
# add to result
result.add(block)
# add children from dominator tree to worklist
for child in block.dominator_tree_children:
worklist.append(child)
return result
```

After initializing the worklist, we iteratively visit all children in the dominator tree. In the end, we return the set of all descendants in the dominator tree for a given block.

Finally, we apply this heuristic to all functions. For this, we walk over all functions, calculate their flattening scores and filter those whose score is higher or equal to 0.9.

```
def find_flattened_functions():
# walk over all functions
for function in bv.functions:
# calculate flattening score
score = calc_flattening_score(function)
# skip if score is too low
if score < 0.9:
continue
# print function and score
print(f"Function {hex(function.start)} has a flattening score of {score}.")
```

To sum up, we designed a simple but powerful heuristic for the detection of control-flow flattening that can be implemented within a few lines of Python. Now, we analyze how this heuristic performs on obfuscated and non-obfuscated code.

To get a feeling of how good the heuristic works, we evaluate it on malware samples as well as on common non-obfuscated programs. Intuitively, we expect to pinpoint functions that implement control-flow flattening or other complex state machines which share a similar graph structure. So let’s check out how many functions we spot (relative to the amount of functions in the binary) and, in particular, what kind of state machines we identify.

Starting with malware, we use three samples in total: two of them are obfuscated with control-flow flattening, the third sample is not obfuscated at all. One sample, FinSpy, uses the control-flow flattening from Obfuscator-LLVM, the other, Emotet, ships a custom flattening implementation. The third, unobfuscated sample is a version of PlugX.

FinSpy is a remote access trojan being used for espionage in various countries. For this sample, the heuristic identifies 10 out of 57 functions in the binary (17.54%). While most of the functions are flattened, the only non-flattened function is of particular interest: in an endless loop, it performs anti-debug checks and terminates the application if any of the checks fail. All other functions share a similar flattening structure; the following snippet is created with Binary Ninja’s decompiler and serves as a example for the others. In this snippet, `var_18`

is used as a state variable.

```
uint64_t sub_405270(int64_t* arg1) {
int32_t var_14 = 1
int32_t var_18 = 0x16a0bd81
while (true) {
int32_t var_20_1 = var_18 - 0xe07f7fa9
if (var_18 == 0xe07f7fa9) {
int32_t rcx_7 = 0x74c24cdf
if (sub_405150(arg1) != 0) {
rcx_7 = 0x16a0bd81
}
var_18 = rcx_7
} else {
int32_t var_24_1 = var_18 - 0x16a0bd81
if (var_18 == 0x16a0bd81) {
var_14 = (var_14 << 1) - 0x55aabee5 + sub_405150(arg1) + 0x55aabee5
var_18 = 0xe07f7fa9
} else {
int32_t var_28_1 = var_18 - 0x74c24cdf
if (var_18 == 0x74c24cdf) {
break
}
}
}
}
return zx.q(var_14)
}
```

For the Emotet sample, the heuristic identifies 57 out of 138 functions in the binary (41.30%). While a handful of non-flattened functions decode memory areas, the vast majority share a flattening structure similar to the following snippet, in which `eax`

is the state variable:

```
int32_t __fastcall sub_40b370(int32_t* arg1, int32_t* arg2) {
int32_t* var_10
int32_t* esi = var_10
void* var_18
void* edi = var_18
int32_t eax = 0x3b2a39f6
do {
bool cond:0_1 = eax == 0x2637511e
if (eax s<= 0x2637511e) {
if (cond:0_1) {
void* ecx_1 = esi + 4
if (ecx_1 u<= edi) {
void* eax_3 = *esi
*(arg2 + 8) = eax_3
if (eax_3 + ecx_1 u<= edi) {
*(arg2 + 4) = ecx_1
return 1
}
}
break
}
if (eax == 0x110d5577) {
int32_t* ecx = esi + 4
if (ecx u> edi) {
break
}
int32_t eax_1 = *esi
esi = ecx
*arg2 = eax_1
eax = 0x2637511e
continue
} else if (eax == 0x14df8275) {
eax = 0x110d5577
esi = *arg1
edi = *(arg1 + 4) + esi
continue
}
} else if (eax == 0x3b2a39f6) {
eax = 0x14df8275
continue
}
} while (eax != 0x188efd81)
return 0
}
```

So far, the heuristic successfully identified control-flow flattening. What happens if the malware sample is not obfuscated? In order to find this out, we try our heuristic on PlugX, a remote access trojan. In total, we identify 26 out of 504 functions in the binary (5.16%). While most of these functions do not look very promising, some of them are *very* interesting: they implement command dispatching routines (which backdoor functionality will be executed) as well as the communication with the command-and-control server, two of the most interesting state machines in the binary.

Up until now, we already noticed that the heuristic also detects non-flattened functions that implement state machines. How does this perform on common non-obfuscated programs? To answer this, let’s have a quick look at `ls`

(140 KiB), `gcc`

(1.2 MiB) and `gdb`

(8.5 MiB). The heuristic identifies:

- 8 out of 237 functions (3.38%) for
`ls`

- 67 out of 1,456 functions (4.60%) for
`gcc`

- 374 out of 15,603 functions (2.46%) for
`gdb`

While a considerable fraction of functions is not that interesting, others implement complex state machines that perform recursive directory traversals, parse file formats or encode/decode data streams. Often, these state machines are implemented as loops with large dispatching routines based on `switch`

statements.

To conclude the experiments, we see that the heuristic identifies control-flow flattening as well as other state machines; some of them are very interesting for reverse engineers, since they implement complex dispatching routines that are relevant for understanding file formats, communication protocols or other important program logic. Although the heuristic identifies many functions that do not seem to be relevant for various reversing scenarios, they only constitute a small percentage of the whole code base, as these *non-representative* experiments show.

To automatically *break* a given code obfuscation scheme, we must often first analyze its characteristics, generalize the patterns and automatically identify them. Only in the second stage, we are able to implement a strategy that removes the obfuscation, regardless whether we patch the binary, clean up the decompilation or work on a simplified intermediate language. If we want to automatically *identify* obfuscated code, the approach is similar: We have to understand the obfuscation’s underlying structure, generalize its pattern and design a heuristic which pinpoints interesting code locations. In the second stage, we can have a closer look at the identified code parts.

In our case, we designed a simple but effective heuristic for discovering control-flow flattening by looking for functions which have a basic block that dominates large parts of the function *and* controls a loop. Personally, I use this heuristic quite a lot in early analysis stages to catch a glimpse of a complex code base. In cases where no control-flow flattening is present, it often finds complex program logic which is interesting for reverse engineers. To better filter the results, we can add additional data points to pinpoint a function’s complexity, such as its number of basic blocks or its cyclomatic complexity: `#edges - #blocks + 2`

. We could also think about refining the flattening heuristic to include checks for the state variable at the costs of requiring a code analysis.

For more advanced implementations of control-flow flattening which use indirect or call-based [dispatching routines][flattening], the heuristic will not work at all, since the graph structure is inherently different. Similarly, the heuristic won’t detect other obfuscation schemes such as opaque predicates, range dividers, mixed Boolean-Arithmetic or virtual machines, which have fundamentally different characteristics. For these schemes, we have to develop custom heuristics tailored to their underlying structures. Perhaps, this is a topic for a future post.

We send out regular updates on new releases, industry insights and technical case studies

© 2023 emproof B.V. All rights reserved. Design by Kava.
Privacy PolicyTerms and Conditions