Exploring the arsenal of software reverse engineering tools in the battle for embedded system security

March 4, 2024

Emproof

The notion that reverse engineering is exclusive to expert hackers is quickly becoming outdated. Thanks to sophisticated open-source tools such as Ghidra [7], the skills needed to get started with reverse engineering have dropped significantly. This shift is driven by a growing community of cybersecurity enthusiasts and a wealth of resources that simplify the reverse engineering process. No longer a mysterious or specialised practice, reverse engineering has become a common skill, supported by a vast array of tools and knowledge that enable users to explore and understand the intricacies of different software, especially within embedded systems security. In this blog post, we dive deeper into this process and look at the tools and techniques of choice for reverse engineering embedded firmware.

How attackers infiltrate, attack and hack

Understanding how attackers exploit embedded systems is crucial for developing effective defences. Embedded devices, once deployed, operate outside the manufacturer’s immediate control, making them prime targets for sophisticated attacks. Attackers leverage a combination of direct hardware interfaces and software vulnerabilities to extract and reverse engineer firmware, seeking to uncover and exploit weaknesses. This section aims to shed light on the initial steps attackers take in this process, highlighting the importance of robust security measures to safeguard against such intrusions.

Phase 1: Firmware Extraction

The first phase is acquiring the binary, and there are several ways to do so. Starting from the device itself, extracting the binary is frequently straightforward: many embedded systems allow a direct readout of the flashed binary through interfaces like JTAG. While modern devices incorporate readout protections to safeguard valuable code and data from unauthorised access, these protections can be circumvented via techniques such as fault attacks [1] or by exploiting weaknesses in the implemented protection scheme [2]. The internet offers numerous examples and guides describing how to bypass the protection schemes of specific microcontrollers [3, 4, 5]. Sometimes, raw binary data (often referred to as a binary blob) or an entire image is extracted from the device. In this case, the binary of interest must first be located within it.

For this task, Binwalk [6] excels at identifying and extracting embedded file systems and executable code within binary blobs. It pinpoints and dissects components such as code or binaries embedded in the firmware image, which streamlines the transition to deeper analysis with other reverse engineering tools and lays a solid foundation for uncovering software structures and vulnerabilities.
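To illustrate the basic idea behind locating components in a blob, here is a minimal sketch of a signature scan in C. It searches a buffer for three well-known magic byte sequences (ELF, gzip with deflate, and little-endian SquashFS). The struct, table and function names are our own assumptions for illustration; this is not how Binwalk is implemented, and a real tool also validates headers and extracts the data it finds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A file-format signature ("magic bytes") with a human-readable label. */
struct signature {
    const char *label;
    const unsigned char *magic;
    size_t magic_len;
};

static const unsigned char MAGIC_ELF[]      = {0x7f, 'E', 'L', 'F'};
static const unsigned char MAGIC_GZIP[]     = {0x1f, 0x8b, 0x08};
static const unsigned char MAGIC_SQUASHFS[] = {'h', 's', 'q', 's'};

/* Illustrative table; a real scanner knows hundreds of formats. */
static const struct signature SIGNATURES[] = {
    {"ELF executable",           MAGIC_ELF,      sizeof MAGIC_ELF},
    {"gzip stream (deflate)",    MAGIC_GZIP,     sizeof MAGIC_GZIP},
    {"SquashFS (little-endian)", MAGIC_SQUASHFS, sizeof MAGIC_SQUASHFS},
};

/* Hypothetical helper: returns the offset of the first occurrence of `sig`
 * at or after `start`, or `len` if the signature does not occur. */
size_t find_signature(const unsigned char *blob, size_t len, size_t start,
                      const struct signature *sig)
{
    assert(blob != NULL && sig != NULL);
    for (size_t off = start; off + sig->magic_len <= len; off++) {
        if (memcmp(blob + off, sig->magic, sig->magic_len) == 0)
            return off;
    }
    return len;
}
```

Calling find_signature once per table entry over a firmware dump yields candidate offsets at which an analyst, or a tool, can try to carve out and unpack the embedded component.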

Depending on the scenario, it might not even be necessary to dump the binary from the device, as some software is distributed in compiled form anyway, for example as a compiled library or as downloadable firmware. This gives actors with malicious intent immediate access, and they can continue with phase two: software reverse engineering.

Phase 2: Software Reverse Engineering

Once the binary has been obtained, software reverse engineering can begin. Numerous reverse engineering frameworks facilitate binary analysis, with Ghidra [7], IDA Pro [8] and Binary Ninja [9] ranking among the most popular. While IDA Pro and Binary Ninja are paid frameworks, Ghidra is open-source and maintained by the National Security Agency (NSA) of the United States, meaning it is free and available to everyone. With Ghidra, the NSA has given the public access to one of the most powerful reverse engineering frameworks, one that novices can use as well. The release of such a powerful framework by an intelligence agency might seem surprising, but the NSA explained in an interview that it aimed to “level the playing field” by providing an open-source tool accessible to students and cybersecurity enthusiasts worldwide [10]. For the NSA, this initiative has paid off: the community has widely embraced Ghidra and contributes various extensions that enhance its capabilities.

Features

Although each software reverse engineering framework has its own distinguishing features, a core set of functionality is common to most of these tools. Below, we highlight the essential, widely used features that are pivotal when analysing a binary. They form the backbone of any reverse engineering effort, providing the capabilities needed to dissect and understand binary code.

Disassembly

The journey of reverse engineering begins with the disassembly of the binary. This critical first step converts machine code into a format that’s more intelligible to humans: a set of assembly instructions. For the adept reverse engineer, this stage is a treasure trove of information. It’s where the foundational understanding of the binary’s inner workings begins to take shape.
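To give a feel for what a disassembler does at its core, the following minimal sketch decodes three 16-bit Arm Thumb encodings (nop, bx lr, and movs with an 8-bit immediate) into a small structure and formats them as text. The type and function names are our own assumptions for illustration; a production disassembler covers the full instruction set, variable instruction lengths and much more context.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum thumb_op { OP_UNKNOWN, OP_NOP, OP_BX_LR, OP_MOVS_IMM };

/* Decoded form of one instruction; rd and imm8 are only used by MOVS. */
struct thumb_insn {
    enum thumb_op op;
    unsigned rd;
    unsigned imm8;
};

/* Map a raw 16-bit Thumb encoding to its decoded form. */
struct thumb_insn decode_thumb16(uint16_t raw)
{
    struct thumb_insn insn = {OP_UNKNOWN, 0, 0};
    if (raw == 0xbf00) {                      /* NOP */
        insn.op = OP_NOP;
    } else if (raw == 0x4770) {               /* BX LR (function return) */
        insn.op = OP_BX_LR;
    } else if ((raw & 0xf800) == 0x2000) {    /* MOVS Rd, #imm8 */
        insn.op = OP_MOVS_IMM;
        insn.rd = (raw >> 8) & 0x7;           /* bits 10..8: destination */
        insn.imm8 = raw & 0xff;               /* bits 7..0: immediate */
    }
    return insn;
}

/* Render a decoded instruction as assembly text. */
void format_thumb16(const struct thumb_insn *insn, char *out, size_t out_len)
{
    assert(insn != NULL && out != NULL && out_len > 0);
    switch (insn->op) {
    case OP_NOP:      snprintf(out, out_len, "nop"); break;
    case OP_BX_LR:    snprintf(out, out_len, "bx lr"); break;
    case OP_MOVS_IMM: snprintf(out, out_len, "movs r%u, #%u",
                               insn->rd, insn->imm8); break;
    default:          snprintf(out, out_len, "<unknown>"); break;
    }
}
```

For example, the halfword 0x2305 decodes to "movs r3, #5", and 0x4770 to "bx lr".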

Key activities during this phase include identifying and naming imported API functions, a process akin to putting names to faces in a crowd. Reverse engineers also focus on pinpointing and labelling frequently used sections of library code. Another crucial task is the annotation of cross-references. Furthermore, there’s the reconstruction of various data structures. Strings, pointer tables, and classes are pieced back together, offering a clearer picture of how the application was originally designed. Each of these elements, once obscure and hidden within the binary’s machine code, becomes a vital clue in understanding the application’s architecture and functionality.
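String recovery is one of the simplest of these activities to sketch. The hypothetical helper below walks a buffer and reports runs of printable ASCII characters of a minimum length, similar in spirit to the classic strings utility. The function name and interface are assumptions of ours; real frameworks additionally handle other encodings and link each string to the code that references it.

```c
#include <assert.h>
#include <stddef.h>

/* Finds the next run of at least `min_len` printable ASCII bytes at or
 * after `*pos`. On success it stores the run's offset and length, moves
 * `*pos` past the run and returns 1. Returns 0 once the buffer is
 * exhausted. */
int next_string(const unsigned char *buf, size_t len, size_t min_len,
                size_t *pos, size_t *offset, size_t *length)
{
    assert(buf && pos && offset && length && min_len > 0);
    size_t i = *pos;
    while (i < len) {
        size_t start = i;
        while (i < len && buf[i] >= 0x20 && buf[i] <= 0x7e)
            i++;
        if (i - start >= min_len) {
            *offset = start;
            *length = i - start;
            *pos = i;
            return 1;
        }
        if (i == start)   /* current byte is not printable: skip it */
            i++;
    }
    *pos = len;
    return 0;
}
```

Calling next_string in a loop until it returns 0 lists every candidate string in a dump; file names, error messages and key labels found this way are often the first clues to what a function does.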

The screenshot shows the disassembly of a function. At this point the reverse engineer is already able to identify various things – in this case the API call to a cryptographic function which extracts a private RSA key.

Decompiler

Contemporary software reverse engineering tools typically come with an invaluable asset: a decompiler. This tool acts as a translator, converting cryptic machine code back into a format resembling high-level, C-like code that is far more approachable for human analysis. The decompiler’s role in the reverse engineering process is pivotal, as it demystifies the original logic of the code, making it accessible not just to seasoned experts but also to beginners. The decompiler’s real strength lies in its ability to present complex machine-level instructions as organised, structured code. This is particularly advantageous when working with unprotected binaries, where the original structure and logic of the code are left relatively intact. In such cases, the decompiler can swiftly unravel a function’s core logic and lay it out in a familiar, high-level form, giving analysts a rapid and clear understanding of what a specific function is designed to do.

Consider this scenario: a company develops an ultrasonic sensor that uses a proprietary algorithm for its key functionality. This algorithm is integral to the sensor’s ability to measure distances accurately and efficiently. Here’s a code snippet that illustrates how the sensor calculates distance by factoring in the speed of sound:

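/* Provided by the sensor driver; presumably the echo round-trip time in microseconds. */
int get_travel_time(void);

/* Returns the measured distance, presumably in centimetres. */
int calculate_distance(void) {
  int travel_time = get_travel_time();
  /* time in seconds, times 340.29 m/s (speed of sound), halved for the round trip, scaled by 100 */
  int distance = 100*((travel_time/1000000.0)*340.29)/2;
  return distance;
}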
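To make the arithmetic concrete, the sketch below stubs the driver call with a hypothetical round-trip time of 10,000 microseconds. The stub value and the units (microseconds in, centimetres out) are our assumptions, inferred from the division by 1,000,000 and the factor of 100; the formula itself is unchanged.

```c
#include <assert.h>

/* Stub standing in for the sensor driver (hypothetical value): pretend
 * the echo needed 10,000 microseconds for the round trip. */
static int get_travel_time(void) { return 10000; }

int calculate_distance(void)
{
    int travel_time = get_travel_time();
    assert(travel_time >= 0);   /* a negative time would be a driver fault */
    /* 10000 us -> 0.01 s; 0.01 s * 340.29 m/s = 3.4029 m there and back;
     * halved -> 1.70145 m one way; times 100 -> 170.145, truncated. */
    int distance = 100*((travel_time/1000000.0)*340.29)/2;
    return distance;
}
```

With this stub, calculate_distance returns 170, i.e. roughly 1.7 metres to the target.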

When the compiled binary of this sensor is examined in Ghidra, the decompiler’s capabilities truly shine. Ghidra presents a side-by-side view: on the left, the original disassembly, and on the right, the decompiled output. It’s in this right panel where Ghidra demonstrates its prowess. The decompiler doesn’t just interpret the binary; it almost perfectly reconstructs the function, making the proprietary algorithm readily understandable. This level of accessibility is a double-edged sword: it allows for educational insights but also exposes the code to analysis by others. Competitors or anyone with sufficient technical knowledge can dissect and analyse this proprietary intellectual property with relative ease. Given the power of the decompiler, even non-experts with a basic understanding of programming can achieve this.

Moreover, if the binary includes symbols and debug information – often the case when the binary isn’t deliberately stripped – the process becomes even more straightforward. Names of functions and various symbols remain intact, providing a clearer roadmap for reverse engineering efforts. It’s important to note that compilers typically don’t strip this information by default; developers must actively choose to enable this feature to obscure these details in the compiled binary.

CFG

Control-flow graph (CFG) analysis is an essential technique in reverse engineering, providing a visual representation of a function’s structure. By studying the graph, one can reconstruct the high-level logic of a function, including conditional statements, loops, and more. The CFG is generated by analysing the disassembled code, presenting a comprehensive overview of distinct code blocks and potential execution paths within the function. This graphical representation aids human analysts in several ways. It helps them identify recurring patterns in the code, facilitating a deeper understanding of its logic. By visualising the different code blocks, analysts can concentrate on the pertinent sections, avoiding unnecessary distractions and saving time during the analysis. Analysts can also explore various execution paths to understand how the program flows under different conditions, which is particularly valuable for comprehending complex or conditional logic within the function. Overall, CFG analysis enhances the efficiency and effectiveness of reverse engineering efforts, providing a structured visual map that guides analysts through the intricacies of the disassembled code. Our co-founder Tim Blazytko shared insights into CFG analysis in a recent blog post: https://www.emproof.com/introduction-to-control-flow-graph-analysis/
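As a rough illustration of how a CFG is derived from disassembled code, the sketch below splits a function into basic blocks and computes the edges between them. It uses a toy instruction model of our own invention (plain instruction, jump, conditional jump, return) rather than a real instruction set, and all names are assumptions; real frameworks must additionally deal with calls, indirect jumps and jump tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction model (hypothetical, not tied to any real ISA). */
enum kind { PLAIN, JUMP, COND_JUMP, RET };

struct insn {
    enum kind kind;
    size_t target;   /* index of the jump target; unused for PLAIN and RET */
};

/* Marks the first instruction of every basic block (its "leader") and
 * returns the number of blocks. Leaders are: the first instruction, every
 * jump target, and every instruction that follows a jump or return. */
size_t find_leaders(const struct insn *code, size_t n, bool *leader)
{
    assert(code != NULL && leader != NULL);
    for (size_t i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind == PLAIN)
            continue;
        if (code[i].kind != RET && code[i].target < n)
            leader[code[i].target] = true;
        if (i + 1 < n)
            leader[i + 1] = true;
    }
    size_t blocks = 0;
    for (size_t i = 0; i < n; i++)
        blocks += leader[i];
    return blocks;
}

/* Writes the leader indices of the blocks that can execute after the block
 * ending at instruction `last`. Returns the number of successors (0 to 2). */
size_t successors(const struct insn *code, size_t n, size_t last,
                  size_t out[2])
{
    assert(code != NULL && out != NULL && last < n);
    size_t count = 0;
    enum kind k = code[last].kind;
    if ((k == JUMP || k == COND_JUMP) && code[last].target < n)
        out[count++] = code[last].target;   /* taken branch */
    if ((k == PLAIN || k == COND_JUMP) && last + 1 < n)
        out[count++] = last + 1;            /* fall-through */
    return count;
}
```

For a simple loop (initialise, conditionally exit to the end, run the body, jump back to the condition, return), find_leaders yields four blocks, and the block ending in the conditional jump has two successors: the exit block and the loop body. These blocks and edges are exactly the nodes and arrows an analyst sees in the graph view.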

The need for protection

Armed with features such as disassemblers, decompilers, and control flow graphs, these tools are so powerful that even the deepest secrets in code can be unlocked. The widespread accessibility of tools such as Ghidra has democratised the initiation of reverse engineering projects. These powerful tools dispel the myth that reverse engineering is a cryptic or exclusive process, making it accessible for a wider audience. However, this ease of access also presents a significant challenge: it exposes intellectual property to potential vulnerabilities. As a result, there’s an increasing need for more sophisticated and robust security measures to safeguard proprietary information in the digital age.

Emproof is reshaping the embedded software security landscape. Our mission is to deliver high levels of software security and IP integrity for embedded systems, using unique techniques that protect algorithms and data while securing the entire device. Our solution, Emproof Nyx, prevents reverse engineering, securing your valuable intellectual property and protecting against exploitation attacks.

References

[1] https://www.emproof.com/attacking-microcontroller-readout-protections-with-fault-attacks/

[2] https://www.emproof.com/bypassing-readout-protection-in-nordic-semiconductor-microcontrollers/

[3] https://hackaday.com/2023/02/05/need-to-dump-a-protected-stm32f0x-use-your-pico/

[4] https://blog.zapb.de/stm32f1-exceptional-failure/

[5] https://research.nccgroup.com/wp-content/uploads/2020/02/NCC-Group-Whitepaper-Microcontroller-Readback-Protection-1.pdf

[6] https://github.com/ReFirmLabs/binwalk

[7] https://github.com/NationalSecurityAgency/ghidra

[8] https://hex-rays.com/ida-pro/

[9] https://binary.ninja

[10] https://www.nsa.gov/Press-Room/News-Highlights/Article/Article/2958453/cybersecurity-speaker-series-ghidra-beyond-the-code/
