Code obfuscation techniques are increasingly being used in software for such reasons as protecting trade secret algorithms from competitors and deterring license tampering by those wishing to use the software for free. However, these techniques have also grown in popularity in less legitimate areas, such as protecting malware from detection and reverse engineering. This work examines two such techniques “packing and virtualization-obfuscation“ and presents new behavioral approaches to analysis that may be relevant to security analysts whose job it is to defend against malicious code. These approaches are robust against variations in obfuscation algorithms, such as changing encryption keys or virtual instruction byte code.
Packing refers to the process of encrypting or compressing an executable file. This process scrambles the bytes of the executable so that byte-signature matching algorithms commonly used by anti-virus programs are ineffective. Standard static analysis techniques are similarly ineffective since the actual byte code of the program is hidden until after the program is executed. Dynamic analysis approaches exist, but are vulnerable to dynamic defenses. We detail a static analysis technique that starts by identifying the code used to "unpack" the executable, then uses this unpacker to generate the unpacked code in a form suitable for static analysis. Results show we are able to correctly unpack several encrypted and compressed malware, while still handling several dynamic defenses.
Virtualization-obfuscation is a technique that translates the original program into virtual instructions, then builds a customized virtual machine for these instructions. As with packing, the byte-signature of the original program is destroyed. Furthermore, static analysis of the obfuscated program reveals only the structure of the virtual machine, and dynamic analysis produces a dynamic trace where original program instructions are intermixed, and often indistinguishable from, virtual machine instructions. We present a dynamic analysis approach whereby all instructions that affect the external behavior of the program are identified, thus building an approximation of the original program that is observationally equivalent. We achieve good results at both identifying instructions from the original program, as well as eliminating instructions known to be part of the virtual machine.