We present a new and efficient generic unpacking algorithm which effectively locates the original entry point (OEP) area of a packed program. The algorithm is based upon the dual observation that (a) even in a packed program, the OEP bytes are almost always only executed once, and (b) most packers unpack the original program to an area of memory which has not been previously executed. Given this, the technique relies upon creating a histogram of the addresses of executed instructions (EIP on x86). Whilst others have done this, the trick is to order the histogram by the last time an address is executed. Decryption, decompression and copying appear as large spikes at the start of the histogram, followed by a flat section, of height one, which is usually the OEP. We attach figures showing histograms for some popular packers, on both linear and log scales, which clearly illustrate the OEP after the massive unpacking "hump".
This technique is extremely efficient to implement, and can compute the OEP "on-the-fly" in an emulator, or off-line from a trace of EIP. For instance, for UPX 2.03w, we need less than 1K of memory to hold the necessary data structures, and computation is similarly cheap (and compatible with dynamic-translation emulators). Given the shape of the chart, and the fact that after the "hump" represents a good opportunity to dump the memory, we have given this technique the somewhat sordid name of hump-and-dump.