Program binaries are commonly held to be an execute-only program form: rigid, lacking in clear structure, complex to extend and difficult to modify. However, there are several benefits to be gained from modifying binaries rather than another program form: the effects of the compiler upon the program are clearly present; binary modification does not require access to source code, which may be unavailable; and users may manipulate programs while they execute, which is impossible with other forms of program modification.
In this dissertation, we develop and refine four desired properties of a binary modification toolkit: abstraction, safety, timeliness, and efficiency. By abstraction, we mean that a user should operate in terms of familiar structural representations, such as functions, loops, or basic blocks, instead of directly on instructions. By safety, we mean that modification should preserve the visible behavior of code that was not explicitly modified and the structural validity of the binary as a whole. By timeliness, we mean that a toolkit should allow modification of a binary at any time in its execution continuum, from a file on disk to actively executing code. By efficiency, we mean that modification should impose cost that is both low and proportional to the amount of modified code and the frequency with which it is executed.
We then describe three techniques that allow us to achieve these properties. First, we demonstrate that the CFG, an abstraction that represents the binary programâ€™s structure, can also be used to modify this structure and thus the binary as a whole. By leveraging the CFG, we allow users to operate in terms of familiar and natural constructs rather than requiring them to understand the idiosyncrasies of particular instruction sets. Second, we further refine techniques for code replacement, allowing us to modify a program binary at any time in its execution continuum while preserving proportional cost. Third, we present a technique based on a formal understanding of the characteristics of binary code that allows us to modify the structure of the binary without changing its user-visible behavior, even when the binary attempts to detect such modifications.