Tuts 4 You

Obfuscation & Deobfuscation

48 files

  1. Unpacking Virtualization Obfuscators

    Nearly every malware sample is sheathed in an executable protection which must be removed before static analyses can proceed. Existing research has studied automatically unpacking certain protections, but has not yet caught up with many modern techniques. Contrary to prior assumptions, protected programs do not always have the property that they are reverted to a fully unprotected state at some point during the course of their execution. This work provides a novel technique for circumventing one of the most problematic features of modern software protections, so-called virtualization obfuscation. The technique enables analysis of heretofore impenetrable malware.

    84 downloads

    0 comments

    Submitted

  2. Deobfuscation of Packed and Virtualization-Obfuscated Protected Binaries

    Code obfuscation techniques are increasingly being used in software for such reasons as protecting trade secret algorithms from competitors and deterring license tampering by those wishing to use the software for free. However, these techniques have also grown in popularity in less legitimate areas, such as protecting malware from detection and reverse engineering. This work examines two such techniques, packing and virtualization-obfuscation, and presents new behavioral approaches to analysis that may be relevant to security analysts whose job it is to defend against malicious code. These approaches are robust against variations in obfuscation algorithms, such as changing encryption keys or virtual instruction byte code.

    Packing refers to the process of encrypting or compressing an executable file. This process scrambles the bytes of the executable so that byte-signature matching algorithms commonly used by anti-virus programs are ineffective. Standard static analysis techniques are similarly ineffective since the actual byte code of the program is hidden until after the program is executed. Dynamic analysis approaches exist, but are vulnerable to dynamic defenses. We detail a static analysis technique that starts by identifying the code used to "unpack" the executable, then uses this unpacker to generate the unpacked code in a form suitable for static analysis. Results show we are able to correctly unpack several encrypted and compressed malware, while still handling several dynamic defenses.
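
    As a rough illustration of why packing defeats byte-signature matching, the sketch below uses a single-byte XOR as a stand-in for a real packer. The signature bytes, the key, and the function names are all made up for this example and are not taken from the paper.

```python
# Toy illustration only: a one-byte XOR stands in for real encryption or
# compression. SIGNATURE, the key, and all names here are hypothetical.
SIGNATURE = b"\xde\xad\xbe\xef"  # byte pattern a scanner might search for


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)


def pack(payload: bytes, key: int) -> bytes:
    """Scramble the payload so its original byte patterns disappear."""
    return xor_bytes(payload, key)


def unpack(packed: bytes, key: int) -> bytes:
    """What the unpacker stub does at run time: restore the original bytes."""
    return xor_bytes(packed, key)


payload = b"\x90\x90" + SIGNATURE + b"\xc3"
packed = pack(payload, 0x5A)

signature_in_payload = SIGNATURE in payload   # scanner matches the raw code
signature_in_packed = SIGNATURE in packed     # scanner misses the packed code
restored = unpack(packed, 0x5A)               # the unpacker recovers the payload
```

    Running the unpacker on the packed bytes recovers the payload, which is the property a static unpacking approach relies on once the unpacker code has been identified.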

    Virtualization-obfuscation is a technique that translates the original program into virtual instructions, then builds a customized virtual machine for these instructions. As with packing, the byte-signature of the original program is destroyed. Furthermore, static analysis of the obfuscated program reveals only the structure of the virtual machine, and dynamic analysis produces a dynamic trace where original program instructions are intermixed with, and often indistinguishable from, virtual machine instructions. We present a dynamic analysis approach whereby all instructions that affect the external behavior of the program are identified, thus building an approximation of the original program that is observationally equivalent. We achieve good results at both identifying instructions from the original program, as well as eliminating instructions known to be part of the virtual machine.
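
    A minimal sketch of the idea, assuming a tiny stack-based virtual machine: the function `(a + b) * c` is replaced by byte code plus an interpreter. The opcode values and the name `run_vm` are hypothetical and chosen only for this example; real protectors generate far larger, randomized instruction sets.

```python
# Hypothetical opcode numbering; a real obfuscator would typically pick a
# different, randomized encoding for every protected binary.
PUSH_ARG, ADD, MUL, RET = 0x11, 0x22, 0x33, 0x44

# Original program: f(a, b, c) = (a + b) * c, translated to virtual instructions.
BYTECODE = bytes([PUSH_ARG, 0, PUSH_ARG, 1, ADD, PUSH_ARG, 2, MUL, RET])


def run_vm(bytecode: bytes, args: tuple) -> int:
    """Fetch-decode-execute loop: this is all a static analyst sees as code."""
    stack = []
    pc = 0
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH_ARG:
            stack.append(args[bytecode[pc]])  # operand byte selects the argument
            pc += 1
        elif op == ADD:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == MUL:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        elif op == RET:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#x} at offset {pc - 1}")


result = run_vm(BYTECODE, (2, 3, 4))
```

    In this toy, only the addition and the multiplication contribute to the returned value; the dispatch and stack bookkeeping are interpreter overhead, which is the kind of distinction a behavior-based analysis tries to draw.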

    38 downloads

    0 comments

    Submitted

  3. Reversing An Obfuscated Java Malware

    Some time in the recent past, I stumbled upon a news story on The Intercept about malware being used against an Argentine prosecutor who was found dead under uncanny circumstances (Fig. 1 & 2). Interested, I decided to have a look at the malware. Further analysis revealed that it was Java-based malware. Since Java reversing is somewhat less documented, I decided to shed some light on this field, and the result is this document.

    37 downloads

    0 comments

    Submitted

  4. Reverse Engineering Obfuscated Code

    In recent years, code obfuscation has attracted attention as a low cost approach to improving software security by making it difficult for attackers to understand the inner workings of proprietary software systems. This paper examines techniques for automatic deobfuscation of obfuscated programs, as a step towards reverse engineering such programs. Our results indicate that much of the effects of code obfuscation, designed to increase the difficulty of static analyses, can be defeated using simple combinations of straightforward static and dynamic analyses. Our results have applications to both software engineering and software security. In the context of software engineering, we show how dynamic analyses can be used to enhance reverse engineering, even for code that has been designed to be difficult to reverse engineer. For software security, our results serve as an attack model for code obfuscators, and can help with the development of obfuscation techniques that are more resilient to straightforward reverse engineering.

    35 downloads

    0 comments

    Submitted

  5. Semantics-based Code Obfuscation by Abstract Interpretation

    In recent years code obfuscation has attracted research interest as a promising technique for protecting secret properties of programs. The basic idea of code obfuscation is to transform programs in order to hide their sensitive information while preserving their functionality. One of the major drawbacks of code obfuscation is the lack of a rigorous theoretical framework that makes it difficult to formally analyze and certify the effectiveness of obfuscating techniques. We face this problem by providing a formal framework for code obfuscation based on abstract interpretation and program semantics. In particular, we show that what is hidden and what is preserved by an obfuscating transformation can be expressed as abstract interpretations of program semantics. Being able to specify what is masked and what is preserved by an obfuscation allows us to understand its potency, namely the amount of obscurity that the transformation adds to programs. In the proposed framework, obfuscation and attackers are modeled as approximations of program semantics and the lattice of abstract interpretations provides a formal tool for comparing obfuscations with respect to their potency. In particular, we prove that our framework provides an adequate setting to measure not only the potency of an obfuscation but also its resilience, i.e., the difficulty of undoing the obfuscation. We consider code obfuscation by opaque predicate insertion and we show how the degree of abstraction needed to disclose different opaque predicates allows us to compare their potency and resilience.
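
    To make opaque predicate insertion concrete, here is a small sketch with one classic always-true predicate. The functions are hypothetical examples, not code from the paper. The last lines show that an observer who tracks only the parity of values (with a case split on the parity of the input) already has enough precision to disclose this particular predicate.

```python
def opaquely_true(x: int) -> bool:
    """x * (x + 1) is a product of consecutive integers, so it is always even."""
    return (x * (x + 1)) % 2 == 0


def absolute_value(n: int) -> int:
    """Original, unobfuscated function."""
    return -n if n < 0 else n


def absolute_value_obfuscated(n: int) -> int:
    """Same behavior, with a bogus branch guarded by the opaque predicate."""
    if opaquely_true(n):
        return -n if n < 0 else n
    return n * 31 + 7  # dead code: never executed, but not obviously so


# A very coarse abstraction: keep only the parity of each value.
EVEN, ODD = "even", "odd"


def parity_succ(p: str) -> str:
    """Parity of x + 1 given the parity of x."""
    return ODD if p == EVEN else EVEN


def parity_mul(p: str, q: str) -> str:
    """Parity of a product: odd only when both factors are odd."""
    return ODD if (p == ODD and q == ODD) else EVEN


# For either parity of x, the abstract value of x * (x + 1) is EVEN.
predicate_disclosed = all(
    parity_mul(p, parity_succ(p)) == EVEN for p in (EVEN, ODD)
)
```

    A predicate that needs a more precise abstraction than parity to be resolved would, in the spirit of the framework described above, resist a wider class of attackers.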

    32 downloads

    0 comments

    Submitted

  6. Using Optimization Algorithms For Malware Deobfuscation

    Analysis of malware binaries is constantly becoming more difficult with introduction of many different types of code obfuscators. One common theme in all obfuscators is transformation of code into a complex representation. This process can be viewed as inverse of compiler optimization techniques and as such can be partially removed using optimization algorithms. This paper presents common obfuscation techniques and a process of adapting optimization algorithms for removing obfuscations. Additionally, a plug-in for the IDA Pro disassembler is presented that demonstrates usability of the proposed optimization process as well as a set of techniques to speed up the process of analyzing obfuscated code.
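
    The view of deobfuscation as compiler optimization can be sketched with constant folding and a few algebraic identities over a toy expression tree. The tuple-based representation and the `simplify` function are assumptions made for this example; they are unrelated to the IDA Pro plug-in the paper presents.

```python
import operator

# Toy expression trees: ("op", lhs, rhs), where leaves are int constants or
# variable names. This representation is hypothetical.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "xor": operator.xor,
}


def simplify(expr):
    """Bottom-up constant folding plus identity removal (x+0, x^0, x-0, x*1)."""
    if not isinstance(expr, tuple):
        return expr  # constant or variable: nothing to do
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return OPS[op](lhs, rhs)  # both operands known: fold to a constant
    if op in ("add", "xor") and rhs == 0:
        return lhs
    if op in ("add", "xor") and lhs == 0:
        return rhs
    if op == "sub" and rhs == 0:
        return lhs
    if op == "mul" and rhs == 1:
        return lhs
    if op == "mul" and lhs == 1:
        return rhs
    return (op, lhs, rhs)


# "x + 5" hidden behind junk arithmetic.
obfuscated = ("add", ("xor", "x", ("sub", 0x40, 0x40)), ("sub", ("mul", 3, 7), 16))
simplified = simplify(obfuscated)
```

    Real obfuscators also insert dead code and opaque control flow, so a practical tool would pair passes like this with dead-code elimination and control-flow cleanup.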

    32 downloads

    0 comments

    Submitted

  7. Deobfuscation of Virtualization-Obfuscated Software

    When new malware is discovered, it is important for researchers to analyze and understand it as quickly as possible. This task has been made more difficult in recent years as researchers have seen an increasing use of virtualization-obfuscated malware code. These programs are difficult to comprehend and reverse engineer, since they are resistant to both static and dynamic analysis techniques. Current approaches to dealing with such code first reverse-engineer the byte code interpreter, then use this to work out the logic of the byte code program. This outside-in approach produces good results when the structure of the interpreter is known, but cannot be applied to all cases. This paper proposes a different approach to the problem that focuses on identifying instructions that affect the observable behaviour of the obfuscated code. This inside-out approach requires fewer assumptions, and aims to complement existing techniques by broadening the domain of obfuscated programs eligible for automated analysis. Results from a prototype tool on real-world malicious code are encouraging.

    30 downloads

    0 comments

    Submitted

  8. Symbolic Execution of Obfuscated Code

    Symbolic and concolic execution find important applications in a number of security-related program analyses, including analysis of malicious code. However, malicious code is very often obfuscated, and current concolic analysis techniques have trouble dealing with some of these obfuscations, leading to imprecision and/or excessive resource usage. This paper discusses three such obfuscations: two of these are already found in obfuscation tools used by malware, while the third is a simple variation on an existing obfuscation technique. We show empirically that existing symbolic analyses are not robust against such obfuscations, and propose ways in which the problems can be mitigated using a combination of fine-grained bit-level taint analysis and architecture-aware constraint generation. Experimental results indicate that our approach is effective in allowing symbolic and concolic execution to handle such obfuscations.

    29 downloads

    0 comments

    Submitted

  9. Automatic Deobfuscation of Emulation-Obfuscated Software

    Malicious software is usually obfuscated to avoid detection and resist analysis. When new malware is encountered, such obfuscations have to be penetrated or removed (deobfuscated) in order to understand the internal logic of the code and devise countermeasures. This paper discusses an approach for deobfuscation of code that uses emulation-based obfuscation, a particularly challenging class of obfuscations that have been deployed in recent years. Our approach is highly general in that we do not make any assumptions about the nature of the obfuscations used; instead, we use semantics-preserving program transformations to simplify away obfuscation code. Experiments show that our approach is effective in extracting the internal logic from code obfuscated using a variety of emulation-based obfuscators, including tools such as Themida that previous approaches could not handle.

    27 downloads

    0 comments

    Submitted

  10. Binary Code Obfuscations in Prevalent Packer Tools

    Security analysts' understanding of the behavior and intent of malware samples depends on their ability to build high-level analysis products from the raw bytes of program binaries. Thus, the first steps in analyzing defensive malware are understanding what obfuscations are present in real-world malware binaries, how these obfuscations hinder analysis, and how they can be overcome. To this end, we present a thorough examination of the obfuscation techniques used by the packer tools that are most popular with malware authors [Bustamante 2008]. Though previous studies have discussed the current state of binary packing [Yason 2007], anti-debugging [Falliere 2007], and anti-unpacking [Ferrie 2008a] techniques, there have been no comprehensive studies of the obfuscation techniques that are applied to binary code. While some of the individual obfuscations that we discuss have been reported independently, this paper consolidates the discussion while adding substantial depth and breadth to it.

    We describe obfuscations that make binary code difficult to discover (e.g., control-transfer obfuscations, exception-based control transfers, incremental code unpacking, code overwriting); to accurately disassemble into instructions (e.g., ambiguous code and data, disassembler fuzz-testing, non-returning calls); to structure into functions and basic blocks (e.g., obfuscated calls and returns, call-stack tampering, overlapping functions and basic blocks); to understand (e.g., obfuscated constants, calling-convention violations, chunked control-flow, do-nothing code); and to manipulate (e.g., self-checksumming, anti-relocation, stolen-bytes techniques). We also discuss techniques that mitigate the impact of these obfuscations on analysis tools such as disassemblers, decompilers, instrumenters, and emulators. This work is done in the context of our project to build tools for the analysis [Jacobson et al. 2011; Rosenblum et al. 2008] and instrumentation [Bernat and Miller 2011; Hollingsworth et al. 1994] of binaries, and builds on recent work that extends these analyses to malware binaries that are highly defensive [Bernat et al. 2011; Roundy and Miller 2010].

    We begin by describing the methodology and tools we used to perform this study. We proceed to a taxonomy of the obfuscation techniques, along with current approaches to dealing with these techniques, and conclude by presenting statistics and observations on the various obfuscation techniques and tools.

    26 downloads

    0 comments

    Submitted

  11. Code Deobfuscation

    Measuring the security of code obfuscation is difficult. A novel obfuscation transformation is in some cases only measured in terms of code expansion and speed, which are in fact only side effects of the transformation. A first step to define a security value to an obfuscation transformation could be having a look at what a cracker is able to reveal from an obfuscated program. This abstract first of all gives a short overview of existing techniques to obfuscate. Then, we describe existing techniques that can be used to deobfuscate, which were sometimes originally meant for other purposes, and new techniques which we are working on to deobfuscate.

    26 downloads

    0 comments

    Submitted

  12. Automated Approach to the Identification and Removal of Code Obfuscation

    Malware authors and owners of proprietary software algorithms often use code obfuscation techniques to hinder users from gaining understanding about the integral parts of their applications. Simple instruction sequences are obscured, control flow is disorganized, and unnecessary instructions are introduced to confuse disassembly tools, and the reverse engineer.

    The Deobfuscator combines instruction emulation and pattern recognition to determine code control flow, interpret the intended results of obfuscated code, and transform instruction sequences to enhance the readability of code where all states are known.

    25 downloads

    0 comments

    Submitted

  13. General Method of Program Code Obfuscation

    Obfuscation of machine code programs is a form of protecting program code against unauthorized reading. The problem of obfuscation is quite new: the first papers directly concerned with obfuscation appeared only a few years ago, yet some advanced publications can already be found. We reviewed them, describing in detail the most important papers of Christian Collberg and Chenxi Wang.

    We proposed a formal model of a program based on analysis of changes in the program's usage of the computer's resources. The model proved useful for developing obfuscation methods that work at a low level of programming. We showed that obfuscating transformations have some interesting properties and proved that, for machine programs, it is possible to create a single-pass obfuscation algorithm. Using our own classification of obfuscating transformations, we described different methods of obfuscation from a low-level point of view. Drawing on research results about typical structural properties of today's computer programs, we created an efficient method of generating the redundant code required during obfuscation. On the basis of theoretical analysis and the experience of other scientists, we proposed a basic algorithm for obfuscating machine programs, which was implemented for RISC and CISC processors.

    To estimate the efficiency of the obfuscation process, we proposed three analytical methods of quality measurement and presented results of empirical research. We created three algorithms for measuring the complexity of machine programs. For the implementation, we showed results of quality measurements performed using analytical and empirical methods. The empirical measurements were done on three different groups of programmers. From the final results, one can conclude what form an obfuscation algorithm should take to give almost one hundred percent safe protection against unauthorized analysis. In the final conclusions, we estimated the values of the obfuscation process parameters that give such efficiency.

    23 downloads

    0 comments

    Submitted

  14. Java Source Code Obfuscation

    Array restructuring operations obscure arrays. Our work aims at obfuscating Java source code that contains arrays. Our main proposal is classes with restructured array members and obscured member methods for setting and getting array elements and for getting the length of arrays. The class method definitions are obscured through index transformation and constant hiding. Instantiated objects of these classes are then used when writing source code. A tool named JDATATRANS was developed for generating these classes; to the best of our knowledge, it is the first tool available for array restructuring on Java source code.
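
    The abstract concerns Java, but the idea can be sketched in a few lines of Python. The class below is a hypothetical illustration, not JDATATRANS output: it assumes an affine index transformation `(a * i + b) mod n` with `a` coprime to `n`, and hides the constant 7 behind an XOR expression.

```python
from math import gcd


class ObscuredArray:
    """Array wrapper whose elements live at permuted positions (toy example)."""

    def __init__(self, length: int) -> None:
        if length <= 0:
            raise ValueError("length must be positive")
        self._n = length
        # A multiplier coprime to n makes the index mapping a bijection.
        self._a = next(a for a in range(3, length + 4) if gcd(a, length) == 1)
        # Constant hiding: 0x1F ^ 0x18 evaluates to 7.
        self._b = (0x1F ^ 0x18) % length
        self._slots = [0] * length

    def _slot(self, index: int) -> int:
        """Index transformation from logical index to storage position."""
        if not 0 <= index < self._n:
            raise IndexError(index)
        return (self._a * index + self._b) % self._n

    def set(self, index: int, value: int) -> None:
        self._slots[self._slot(index)] = value

    def get(self, index: int) -> int:
        return self._slots[self._slot(index)]

    def length(self) -> int:
        return self._n


arr = ObscuredArray(10)
for i in range(arr.length()):
    arr.set(i, i * i)
logical_view = [arr.get(i) for i in range(arr.length())]
```

    Client code sees ordinary element access through `set`, `get`, and `length`, while the stored order no longer matches the logical order.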

    22 downloads

    0 comments

    Submitted

  15. Behavioral Analysis of Obfuscated Code

    Classically, the procedure for reverse engineering binary code is to use a disassembler and to manually reconstruct the logic of the original program. Unfortunately, this is not always practical as obfuscation can make the binary extremely large by over-complicating the program logic or adding bogus code.

    We present a novel approach, based on extracting semantic information by analyzing the behavior of the execution of a program. As obfuscation consists in manipulating the program while keeping its functionality, we argue that there are some characteristics of the execution that are strictly correlated with the underlying logic of the code and are invariant after applying obfuscation. We aim at highlighting these patterns, by introducing different techniques for processing memory and execution traces.

    Our goal is to identify interesting portions of the traces by finding patterns that depend on the original semantics of the program. Using this approach, the high-level information about the business logic is revealed and the amount of binary code to be analyzed is considerably reduced.

    For testing and simulations we used obfuscated code of cryptographic algorithms, as our focus is DRM systems and mobile banking applications. We argue however that the methods presented in this work are generic and apply to other domains where obfuscated code is used.

    21 downloads

    0 comments

    Submitted

  16. Loco An Interactive Code Deobfuscation Tool

    This paper presents LOCO, a graphical, interactive environment to experiment with code obfuscation and deobfuscation transformations, which can be applied automatically, semi-automatically and by hand. LOCO is an extension of the multi-platform visualization tool LANCET, combined with an obfuscation infrastructure in the underlying link-time program rewriter DIABLO. By use of LOCO, a developer can easily navigate through the control flow graph of a program and do fine-grained obfuscation, test new obfuscation transformations, test the robustness of existing transformations or improve existing transformations.

    21 downloads

    0 comments

    Submitted

  17. A Toolkit for Code Obfuscation

    According to Business Software Alliance statistics, four out of every ten software programs are pirated worldwide. The global piracy rate has increased 40% over the past years and nearly $11 billion is lost. This is a clear threat to software producers and thus to the global economy. Over the years, several software protection techniques have been developed; code obfuscation is one of them, and it is very promising. Code obfuscation is a form of software protection against unauthorized reverse engineering. In this paper we discuss software protection techniques in general and provide a broad overview of known obfuscation algorithms. We also address the issues related to implementation of obfuscation algorithms. Finally, we propose JHide, an obfuscation toolkit for protection of Java code. We conclude our paper by identifying the need to review the performance of the algorithms as the future scope of our work.

    20 downloads

    0 comments

    Submitted

  18. Automatic Binary Deobfuscation

    This paper gives an overview of our research in the automation of the process of software protection analysis. We will focus more particularly on the problem of obfuscation.

    Our current approach is based on a local semantic analysis, which aims to rewrite the binary code in a simpler (easier to understand) way. This approach has the advantage of not relying on a manual search for patterns of obfuscation. This way of manipulating the code is, in the end, quite similar to the optimising stage of most compilers. We will exhibit concrete results based on the development of a prototype and its application to a test target. Current limitations and future prospects will be discussed as well.

    As a continuation of our work from last year, we focus on the automation of the software protection analysis process. We will focus more particularly on the problem of obfuscation.

    This problem is crucial as most malicious binaries (like viruses or trojans) use this kind of protection to slow down their analysis and to make their detection harder. Automation is a key step in order to face the constant growth of the amount of malware, year after year.

    Our previous paper was mainly focused on the attack and suppression of protection mechanisms using the Metasm framework. It provides many useful primitives to deal with protected code: control flow graph manipulation, recompilation, filtering processor. Nevertheless, most of these approaches rely on tedious manual identification of the patterns used by the protection.

    We will now present the development of our new methods, relying on a semantic analysis of the binary code to extract a simpler representation. The objective is no longer to seek and destroy known patterns, but to proceed to a complete, on-the-fly, optimised code rewriting.

    We will exhibit concrete results obtained by applying these methods to a test target. Then, current limitations and future prospects will be discussed.

    20 downloads

    0 comments

    Submitted

  19. Context-Sensitive Analysis of Obfuscated x86 Executables

    A method for context-sensitive analysis of binaries that may have obfuscated procedure call and return operations is presented. Such binaries may use operators to directly manipulate the stack instead of using native call and ret instructions to achieve equivalent behavior. Since the definition of context-sensitivity and algorithms for context-sensitive analysis have thus far been based on the specific semantics associated with procedure call and return operations, classic interprocedural analyses cannot be used reliably for analyzing programs in which these operations cannot be discerned. A new notion of context-sensitivity is introduced that is based on the state of the stack at any instruction. While changes in ‘calling’-context are associated with transfer of control, and hence can be reasoned about in terms of paths in an interprocedural control flow graph (ICFG), the same is not true of changes in ‘stack’-context. An abstract interpretation based framework is developed to reason about stack-contexts and to derive analogues of call-strings based methods for the context-sensitive analysis using stack-context. The method presented is used to create a context-sensitive version of Venable et al.’s algorithm for detecting obfuscated calls. Experimental results show that the context-sensitive version of the algorithm generates more precise results and is also computationally more efficient than its context-insensitive counterpart.

    20 downloads

    0 comments

    Submitted

  20. Mimimorphism - A New Approach to Binary Code Obfuscation

    Binary obfuscation plays an essential role in evading malware static analysis and detection. The widely used code obfuscation techniques, such as polymorphism and metamorphism, focus on evading syntax-based detection. However, statistical tests and semantic analysis techniques have been developed to thwart their evasion attempts. More recent binary obfuscation techniques are divided in their purposes, attacking either the statistical or the semantic approach, but not both. In this paper, we introduce mimimorphism, a novel binary obfuscation technique with the potential of evading both statistical and semantic detection. Mimimorphic malware uses instruction-syntax-aware high-order mimic functions to transform its binary into mimicry executables that exhibit high similarity to benign programs in terms of statistical properties and semantic characteristics. We implement a prototype of the mimimorphic engine on the Intel x86 platform, and evaluate its capability of evading statistical anomaly detection and semantic analysis detection techniques. Our experimental results demonstrate that the mimicry executables are indistinguishable from benign programs in terms of byte frequency distribution and entropy, as well as control flow fingerprint.
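
    One of the statistical properties mentioned above, byte entropy, can be computed as in the sketch below. This is the standard Shannon entropy over the byte frequency distribution; the function name is chosen for this example and the paper's own detectors may differ in detail.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte frequency distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


constant_entropy = byte_entropy(b"\x00" * 64)      # a single repeated byte
uniform_entropy = byte_entropy(bytes(range(256)))  # every byte value once
```

    Encrypted or compressed code tends toward the upper end of the 0 to 8 range, which is why a statistical detector can flag it and why mimicking the distribution of benign programs matters for evasion.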

    20 downloads

    0 comments

    Submitted

  21. Multi-stage Binary Code Obfuscation Using Improved Virtual Machine

    A software obfuscator transforms a program into another executable one with the same functionality but an unreadable code implementation. This paper presents a multi-stage software obfuscation algorithm using improved virtual machine techniques. The key idea is to obfuscate a program iteratively, many times, using different interpretations. An improved virtual machine (VM) core is appended to the protected program for byte-code interpretation. Adversaries will need to crack all intermediate results in order to figure out the structure of the original code. Compared with existing obfuscators, our new obfuscator generates protected code that performs more efficiently and enjoys a proven higher level of security.

    20 downloads

    0 comments

    Submitted

  22. The Impossibility of Obfuscation with a Universal Simulator

    We show that indistinguishability obfuscation implies that all functions with sufficient "pseudo-entropy" cannot be obfuscated under a virtual black box definition with a universal simulator. Let F = {f_s} be a circuit family with super-polynomial pseudo-entropy, and suppose O is a candidate obfuscator with universal simulator S. We demonstrate the existence of an adversary A that, given the obfuscation O(f_s), learns a predicate the simulator S cannot learn from the code of A and black-box access to f_s. Furthermore, this is true in a strong sense: for any secret predicate P that is not learnable from black-box access to f_s, there exists an adversary that given O(f_s) efficiently recovers P(s), whereas given oracle access to f_s and given the code of the adversary, it is computationally hard to recover P(s).

    We obtain this result by exploiting a connection between obfuscation with a universal simulator and obfuscation with auxiliary inputs, and by showing new impossibility results for obfuscation with auxiliary inputs.

    20 downloads

    0 comments

    Submitted

  23. Applied Binary Code Obfuscation

    Obfuscated code is code that is hard (but not impossible) to read and understand. Corporate developers, programmers and malware coders sometimes intentionally obfuscate their software for security reasons, in an attempt to delay reverse engineering or to keep antivirus engines from identifying malicious behaviours. Nowadays, obfuscation is often applied to object-oriented cross-platform programming languages like Java, .NET (C#, VB), Perl, Ruby, Python and PHP. That is because their code can be easily decompiled and examined, making them vulnerable to reverse engineering. On the other hand, obfuscating binary code is not as easy as encrypting object or function names, as is done in the programming languages mentioned above. In this case, the code is altered by using a variety of transformations, for instance self-modifying code, stack operations or even splitting the factors of simple mathematical functions. Moreover, binary obfuscation is also used to defeat automated network traffic analyzers such as Intrusion Detection and Prevention Systems. In other words, binary code obfuscation is the technique of altering the original code structure while maintaining its original functionality. In the following pages of this paper we will explore the theory and practice of binary code obfuscation as well as a number of techniques that can be used.

    19 downloads

    0 comments

    Submitted

  24. The Effectiveness of Source Code Obfuscation

    Source code obfuscation is a protection mechanism widely used to limit the possibility of malicious reverse engineering or attack activities on a software system. Although several code obfuscation techniques and tools are available, little is known about the capability of obfuscation to reduce attackers' efficiency, and the contexts in which such efficiency may vary.

    This paper reports the outcome of two controlled experiments meant to measure the ability of subjects to understand and modify decompiled, obfuscated Java code, compared to decompiled, clear code.

    Results quantify to what extent code obfuscation is able to make attacks more difficult to perform, and reveal that obfuscation can mitigate the effect of factors that can alter the likelihood of a successful attack, such as the attackers' skill and experience, or the intrinsic characteristics of the system under attack.

    19 downloads

    0 comments

    Submitted

  25. Translingual Obfuscation

    Program obfuscation is an important software protection technique that prevents attackers from revealing the programming logic and design of the software. We introduce translingual obfuscation, a new software obfuscation scheme which makes programs obscure by “misusing” the unique features of certain programming languages. Translingual obfuscation translates part of a program from its original language to another language which has a different programming paradigm and execution model, thus increasing program complexity and impeding reverse engineering. In this paper, we investigate the feasibility and effectiveness of translingual obfuscation with Prolog, a logic programming language. We implement translingual obfuscation in a tool called BABEL, which can selectively translate C functions into Prolog predicates. By leveraging two important features of the Prolog language, i.e., unification and backtracking, BABEL obfuscates both the data layout and control flow of C programs, making them much more difficult to reverse engineer. Our experiments show that BABEL provides effective and stealthy software obfuscation, while the cost is only modest compared to one of the most popular commercial obfuscators on the market. With BABEL, we verified the feasibility of translingual obfuscation, which we consider to be a promising new direction for software obfuscation.

    19 downloads

    0 comments

    Submitted

