Introduction to Control-flow Graph Analysis

General

March 15, 2021

Tim Blazytko

This article is a repost. It was originally published at Tim Blazytko’s personal blog: https://synthesis.to/2021/03/15/control_flow_analysis.html.

Following my last blog post, I got a lot of questions about additional material on control-flow analysis. While most compiler books (such as the Dragon Book) cover related topics in-depth, I decided to publish my own presentation. The slide deck illustrates the theory of control-flow graph construction, dominance relations and loop analysis. In the second part of this post, I would like to show you how to play around with these concepts using the reverse engineering framework Miasm.

Presentation on control-flow analysis
Code for graph analysis in Miasm

Graph Analysis in Miasm

For playing around with control-flow graphs, dominance relations and loop detection, we take the following graph as an example:

As described in my previous blog post on automated detection of control-flow flattening, this graph has an entry node a and a loop between c, e and g. To automatically determine all entries, dominance relations and loops, we can use Miasm’s DiGraph class:

from miasm.core.graph import DiGraph

# define edges
edges = [
    ("a", "b"),
    ("a", "c"),
    ("b", "d"),
    ("c", "d"),
    ("c", "e"),
    ("d", "f"),
    ("e", "f"),
    ("e", "g"),
    ("g", "c"),
]

# init graph
g = DiGraph()

# add edges
for x, y in edges:
    g.add_edge(x, y)

# walk over graph entries
for entry in g.heads():
    # dominators
    dominators = g.compute_dominators(entry)

    # dominator tree
    dominator_tree = g.compute_dominator_tree(entry)

    # natural loops
    loops = g.compute_natural_loops(entry)

First, we initialize a DiGraph instance and add all edges to the graph. Then, we walk over all graph entry nodes and compute the dominance relations, the dominator tree and all loops in the graph. With some additional code for pretty printing (omitted here due to readability), we get the following output:

Graph entry: a


Dominators:
a: a
b: a, b
c: a, c
d: a, d
e: a, c, e
f: a, f
g: a, c, e, g


Dominator Tree:
a -> b
a -> c
a -> d
a -> f
c -> e
e -> g


Natural Loops
g -> c: {a, c, e, g}

To apply these techniques to real-world binaries, we just wrap around some disassembling logic:

# init symbol table
loc_db = LocationDB()

# open the binary for analysis
container = Container.from_stream(open(file_path, 'rb'), loc_db)

# cpu abstraction
machine = Machine(container.arch)

# init disassemble engine
mdis = machine.dis_engine(container.bin_stream, loc_db=loc_db)

# disassemble the function at address and get CFG
asm_cfg = mdis.dis_multiblock(start_address)

The code takes a binary as input (provided via file_path), initializes the disassembly engine and disassembles a function at start_address. Then, the resulting asm_cfg is a DiGraph instance holding the function’s control-flow graph. The entire script that re-implements the heuristic to detect control-flow flattening in Miasm can be found here.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
_ga	2 years	Google Analytics sets this cookie to calculate visitor, session and campaign data and track site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognise unique visitors.
_ga_*	2 years	Google Analytics sets this cookie to store and count page views.

Introduction to Control-flow Graph Analysis

Graph Analysis in Miasm

Sectors

Company

Resources