AI Models for Decompiling Assembly Code

submitted by
Style Pass
2024-11-16 11:00:05

The challenge of converting low-level assembly code back into human-readable source code is a cornerstone problem in reverse engineering. In this post, we summarise recent work done at RevEng.AI that addresses this challenge through the development of foundational AI models designed for decompilation. As we shall see, this approach can produce surprisingly accurate code that more closely resembles human-written source code than the output of existing rule-based decompilers. Whilst these models still have limitations (which we discuss), they offer an exciting new approach that has the potential to greatly aid reverse engineers and security analysts.

Source code written by humans is typically far more comprehensible than the raw assembly code found in binaries. The process of decompilation (translating this low-level assembly code back into human-readable source code) can therefore play a vital role in reverse engineering and software security analysis.

However, decompilation faces inherent challenges. The compilation process strips away valuable information such as variable names and transforms higher-level code structures like loops and conditionals in a fundamentally irreversible way, making perfect reconstruction impossible. While established tools like Ghidra and IDA Pro can reconstruct a pseudocode equivalent of the disassembly, the result is often difficult to understand and bears little resemblance to human-written source code.
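To make this gap concrete, here is a small hand-written illustration in C. The first function is the kind of code a person might write. The second is our own imitation of the style of pseudocode a rule-based decompiler tends to produce for it. It is not captured output from Ghidra or IDA Pro, and the names `sum_positive`, `FUN_00101139`, `param_1`, `iVar1` and `uVar2` are assumptions chosen only for illustration.

```c
#include <stddef.h>

/* Human-written source: descriptive names and a for-loop convey intent. */
int sum_positive(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] > 0) {
            total += values[i];
        }
    }
    return total;
}

/*
 * Hand-written imitation of decompiler-style pseudocode for the same logic.
 * The function name, parameter names and local names are placeholders
 * (hypothetical), and the for-loop has become a guarded do-while, which is
 * a shape compilers commonly emit for counted loops.
 */
int FUN_00101139(const int *param_1, unsigned long param_2)
{
    int iVar1;
    unsigned long uVar2;

    iVar1 = 0;
    uVar2 = 0;
    if (param_2 != 0) {
        do {
            if (0 < param_1[uVar2]) {
                iVar1 = iVar1 + param_1[uVar2];
            }
            uVar2 = uVar2 + 1;
        } while (uVar2 != param_2);
    }
    return iVar1;
}
```

Both functions return the same value for the same input, but only the first tells the reader what the code is for. Recovering names and structure like those in the first version is the part that compilation makes irreversible.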
