Classically, the procedure for reverse engineering binary code is to use a disassembler and to manually reconstruct the logic of the original program. Unfortunately, this is not always practical as obfuscation can make the binary extremely large by over-complicating the program logic or adding bogus code.
We present a novel approach based on extracting semantic information by analyzing the execution behavior of a program. As obfuscation consists of transforming the program while preserving its functionality, we argue that some characteristics of the execution are strictly correlated with the underlying logic of the code and remain invariant under obfuscation. We aim to highlight these patterns by introducing different techniques for processing memory and execution traces.
Our goal is to identify interesting portions of the traces by finding patterns that depend on the original semantics of the program. Using this approach, high-level information about the business logic is revealed and the amount of binary code to be analyzed is considerably reduced.
For testing and simulations we used obfuscated implementations of cryptographic algorithms, as our focus is on DRM systems and mobile banking applications. We argue, however, that the methods presented in this work are generic and apply to other domains where obfuscated code is used.