A binary rewriter is a piece of software that accepts a binary executable program as input, and produces an improved executable as output. This paper describes the first technique in literature to decompile the input binary into an existing compiler's high-level intermediate form (IR). The compiler's back-end is then used to generate the output binary from the IR. Doing so enables the use of the rich set of compiler analysis and transformation passes available in mature compilers. It also enables binary rewriters to perform complex high-level transformations, such as automatic parallelization, not possible in existing binary rewriters.
Certain characteristics of binary code pose a great challenge while translating a binary to a high-level compiler IR; these include the use of an explicitly addressed stack, lack of function prototypes and the lack of symbols. We present techniques to overcome these challenges. We have built a prototype binary rewriter called SecondWrite that uses LLVM, a widely-used compiler infrastructure, as our intermediate IR, and rewrites both x86 binaries. Our results show that SecondWrite accelerates un-optimized binaries by 27% on average for our benchmarks, and maintains the performance of already optimized binaries without any custom optimizations on our part. We also present two case studies for custom improvement "automatic parallelization and security“ to exemplify the beneï¬ts and applications of a binary rewriter using a high IR.