Business models behind products such as iTunes and the Skype VoIP clients depend entirely on the secrecy of technical details of their product. Once the technical details are uncovered, a medium such as the Internet is extremely powerful to (anonymously) spread the sensitive information and it is shown that stopping the spread of such highly sensitive information is difficult. Therefore, program obfuscation recently attracted a lot of attention as a low cost approach to protect the inner workings of an application. However, when a new obfuscating transformation is proposed, it is unclear how to measure the quality of such transformation as there is no general agreement on this matter in this young domain.
Collberg's taxonomy describes the quality of an obfuscating transformation in terms of cost, resilience and potency. The cost describes the execution penalty, the resilience measures how well a transformation withstands an attack while the potency measures how much more difficult the obfuscated code is to understand.
Our work contributes by describing attacks that test the resilience of an obfuscating transformation and by the construction of a framework based on software complexity metrics to evaluate the potency of obfuscating transformations. In this dissertation, we bring together existing control flow obfuscating transformations and existing software complexity metrics. In particular, we consider three transformations: control flowflattening (CFF), branch procedures and opaque predicates together with two metrics: cyclomatic number and knot count. After applying the obfuscating transformations on a program, the complexity of the program increases. To measure this, our framework has to be capable of quantifying the obfuscating transformation independent of at which point in the development process the obfuscating transformation is applied. Therefore, our introduced framework works on the machine code. The machine code is represented by means of a static control flow graph. However, in this dissertation we also propose, for the first time, to construct a representation that is based on dynamically generated information.
After we have shown that the proposed obfuscating transformations increase the complexity of the programs,we look for inverse transformations. A successful inverse transformation should bring the complexity of the program close to the original complexity, but not necessarily reconstruct the original program. The first de-obfuscating transformation is called cloning. By judiciously duplicating portions of the program, spurious execution paths no longer taint the original program execution paths. By studying the duplicated portions, it is possible to construct inverse transformations breaking CFF and the insertion of branch procedures.
The second technique proposed to revert control flow obfuscating transformations is based on static feasible path analysis. Control flow obfuscating transformations aim to insert paths which are infeasible and the goal of our analysis is to detect these infeasible paths. First, a constraint is constructed for a given path through the program. Then, the analysis will determine whether that constraint is feasible or infeasible. This way, we are able to revert the CFF transformation. We also used the transformation to detect inserted opaque predicates Â– which are boolean valued expressions whose values are known to the obfuscator but difficult to determine afterwards – and by use of abstract interpretation we even reduced the number of elements in the domain. However, we discovered that this technique to detect opaque predicates is limited.
The aforementioned inverse techniques assume the presence of a conservative representation of the program. However, it is not always the case that such a representation can easily be derived from a program. To overcome this problem we introduced a third technique based on static and dynamic analyses. The new technique is successful even without the presence of a full representation of the program. It suffices to statically identify program parts and to observe their behaviour during execution. By using this technique, we were able to break CFF and branch procedures without the construction of a control flow graph. Because of the lack of general applicability of the attack based on static feasible path analysis for opaque predicates, we propose a last de-obfuscation technique. This last technique targets opaque predicates of all kinds where the goal is to find the conditional branches controlled by an opaque predicate. Similar to the previous technique, this attack involves both static and dynamic components. First, by executing the program it is possible to identify a set of candidates. Then, this set of opaque predicates is narrowed by injecting both well-chosen and randomly generated numbers in the code and observing the particular conditional branch. This technique is able to identify 99% of the inserted opaque predicates.
The complexity metrics of the programs after de-obfuscation indicate that the complexity is only marginally larger than the complexity of the original programs. For opaque predicates, we go into much more details because we can distinguish two types of errors. The underestimation error denotes the number of not found opaque predicates and the overestimation error denotes the number of regular conditional branches wrongly indicated as governed by an opaque predicate.
We have demonstrated that existing obfuscating transformations increase the complexity of a program. At the same time, these obfuscating transformations are vulnerable to attacks, reducing the complexity close to the complexity of the original program. Therefore, we propose an obfuscating transformation based on self-modifying code that ismore resilient against attacks. We provide solutions to protect against an insider attack. The technique is suitable for application security because it breaks assumptions made by state-of-the-art tools. Based on self-modifying code, we introduce a technique called trace obfuscation. Trace obfuscation forces the attacker to perform program understanding on the trace instead of on a graph. One proposed trace obfuscation transformation combines dynamic software mutation, cloning, diversity and obfuscating predicates.
There are three major contributions in this dissertation. First of all, a framework to measure program complexity is proposed and is applied to existing control flow obfuscating transformations. Secondly, we introduced inverse transformations attacking the control flow obfuscating transformations. Lastly, we introduced a new obfuscating transformation called trace obfuscation.