When we think of families, the concept that comes to mind is closeness: a genetic bond through which certain traits are preserved. That bond carries information leading to similar body builds, facial features, hereditary diseases, and a host of other commonalities. What defines a family is the likelihood of similarities arising from that genetic connection. A straightforward analogy can be drawn to the world of malware.
Malware has been classified into families for decades. Some malware families, like our early ancestors, have grown to thousands of members: for instance, there are over six thousand Zbot executables, as detected by Microsoft, and over one thousand Sinowal samples¹. These samples are assigned to the same family because they closely resemble one another.
In the case of malware, the bond comes not from genes but from the makeup of the executable or the range of actions the malware embeds. Much research in recent years has gone into the automatic classification of malware. Some of this research focuses on taking executables, automatically discovering their family ties, and deducing what functionality they have in common.
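To make "automatically discovering family ties" concrete, the sketch below shows one common, simple idea: represent each executable by its set of byte n-grams, measure pairwise Jaccard similarity, and group samples whose similarity exceeds a threshold. This is an illustrative assumption, not the method used in this paper or in any specific prior work; the function names, the n-gram size `n=4`, the `threshold=0.5` value, and the toy byte strings are all hypothetical.

```python
from __future__ import annotations


def byte_ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Return the set of distinct n-byte substrings of data (hypothetical feature choice)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set[bytes], b: set[bytes]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_similarity(samples: dict[str, bytes],
                        threshold: float = 0.5,
                        n: int = 4) -> list[set[str]]:
    """Single-linkage grouping via union-find: any pair with
    similarity >= threshold is merged into the same candidate family."""
    names = list(samples)
    grams = {name: byte_ngrams(samples[name], n) for name in names}
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the group's root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(grams[a], grams[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Toy "executables": a1 and a2 differ only in a version string; b1 is unrelated.
    samples = {
        "a1": b"GET /gate.php?id=%s&ver=1.0",
        "a2": b"GET /gate.php?id=%s&ver=1.1",
        "b1": b"\x90\x90\x90\x90hello world from another family",
    }
    groups = group_by_similarity(samples)
    print(sorted(sorted(g) for g in groups))  # [['a1', 'a2'], ['b1']]
```

Real classifiers typically use richer features (disassembled code, API calls, or runtime behavior) and more robust clustering; raw byte n-grams are chosen here only because they keep the sketch short and self-contained.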
In this paper we answer several questions about malware families and malware classification. We want to know what causes malware authors to spin new versions of their binaries. Do they release new variants to evade detection? Are new releases driven by added features? Or are they mostly bug fixes? We would also like to know what "incest," if any, exists among the major malware families: SubSeven, Conficker, TDSS, Peacomm, PoisonIvy, and Waledac. Do we see family A sharing functionality or code with family B? If so, this could reveal which malware authors communicate with one another. We will examine mass malware and targeted malware, as well as rootkits.