Tuts 4 You
RYDB3RG

VMPROTECT vs. LLVM


RYDB3RG
Posted (edited)

Hi,

I made a tool that interprets a VMP RSI stream. It records the handlers (VM instructions) and connects them via their data dependencies.
This is what a JCC looks like:

The edges in this graph represent data dependencies. Sequences of nodes with one input and one output are collapsed into blocks. 
Green nodes are constant nodes. They do not depend on external values (such as CPU registers), unlike red nodes. 
The hex number to the left of a node is its step number; the number to the right is its result. Only const (green) nodes can have a result.
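To make the green/red distinction concrete, here is a minimal sketch of how constness could propagate through such a graph. It assumes a hypothetical `Node` type with only a few operations; `Op`, `make`, and `fold` are illustrative names, not the tool's API. A node counts as constant only if all of its inputs are constant, and only then does it get a folded result.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical minimal node model. Leaves are either immediates (green)
// or external values such as CPU registers (red).
enum class Op { Imm, Reg, Add, Xor };

struct Node {
    Op op = Op::Imm;
    uint64_t imm = 0;                          // only meaningful for Op::Imm
    std::vector<std::shared_ptr<Node>> inputs; // data-dependency edges
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(Op op, std::vector<NodePtr> inputs = {}, uint64_t imm = 0) {
    auto n = std::make_shared<Node>();
    n->op = op;
    n->imm = imm;
    n->inputs = std::move(inputs);
    return n;
}

// Returns the node's value if it depends only on constants, or
// std::nullopt if any transitive input is external. No caching: shared
// subgraphs are re-evaluated, which is fine for a sketch.
std::optional<uint64_t> fold(const NodePtr &n) {
    switch (n->op) {
    case Op::Imm: return n->imm;
    case Op::Reg: return std::nullopt;         // external value: never constant
    case Op::Add:
    case Op::Xor: {
        assert(n->inputs.size() == 2);
        auto a = fold(n->inputs[0]);
        auto b = fold(n->inputs[1]);
        if (!a || !b) return std::nullopt;     // one red input makes the node red
        return n->op == Op::Add ? *a + *b : *a ^ *b;
    }
    }
    return std::nullopt;
}
```

For example, `add(0x10, 0x20)` folds to `0x30`, while `xor(that, rax)` stays unresolved because `rax` is an external register.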

The graph contains all nodes that directly or indirectly contribute to the lower right "loadcc" instruction.
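Collecting "all nodes that directly or indirectly contribute" to one instruction is a backward slice over the dependency edges. Here is a minimal sketch. It assumes a hypothetical `Node` that stores only its step number and input edges. An explicit worklist is used so that long handler chains do not cause deep recursion.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical node: only the data-dependency edges matter for slicing.
struct Node {
    int step;                                  // interpreter step number
    std::vector<std::shared_ptr<Node>> inputs; // values this node consumes
};
using NodePtr = std::shared_ptr<Node>;

// Returns every node that feeds `sink`, including `sink` itself.
std::unordered_set<const Node *> backward_slice(const NodePtr &sink) {
    std::unordered_set<const Node *> seen;
    std::vector<const Node *> work{sink.get()};
    while (!work.empty()) {
        const Node *n = work.back();
        work.pop_back();
        if (!seen.insert(n).second) continue;  // already visited (shared input)
        for (const auto &in : n->inputs) work.push_back(in.get());
    }
    return seen;
}
```

Nodes that never reach the sink, such as junk computations whose results are discarded, are simply never visited.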

CMP/JCC in VMP works by executing an obfuscated version of the original CMP, which results in either zero or one.
VMP then pushes two addresses to its stack (steps 121f and 1209) and computes an address that points to one of them,
depending on the zero/one result of the corresponding CMP (step 1265). It then simply loads from that
computed address and uses the loaded value for a JMP. The load that fetches either address is
represented by the "loadcc" node in the graph.
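The data flow of this "loadcc" trick can be modelled in a few lines. This is a hypothetical sketch of the pattern only. The real handler sequence is obfuscated, and the choice of which target corresponds to `cc == 0` is an assumption made here. Two candidate targets go into stack slots, the 0/1 result selects a slot, and the jump target is loaded from it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of the loadcc data flow. Which target goes with
// cc == 0 is an assumption; only the shape of the computation matters.
uint64_t loadcc(uint64_t target_if_zero, uint64_t target_if_one, uint64_t cc) {
    assert(cc == 0 || cc == 1);             // the obfuscated CMP yields 0 or 1
    uint8_t vm_stack[16];                   // two 8-byte stack slots
    // "Push" both candidate jump targets onto the VM stack.
    std::memcpy(vm_stack + 0, &target_if_zero, 8);
    std::memcpy(vm_stack + 8, &target_if_one, 8);
    // Compute an address that points at one of the two slots...
    const uint8_t *slot = vm_stack + cc * 8;
    // ...and load the jump target from it.
    uint64_t target;
    std::memcpy(&target, slot, 8);
    return target;
}
```

Semantically this is just `cc ? target_if_one : target_if_zero`. The obfuscation hides that select behind memory traffic.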

Even though all the puzzle pieces are here, it is still hard to figure out what the original CMP was.
Luckily we have LLVM, and it isn't hard to lower the graph to LLVM IR: Godbolt
The left pane shows the graph as LLVM IR, the middle pane shows the optimizer's output, and the right pane shows the optimized LLVM IR lowered to x64.
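As a rough illustration of what "lowering" means here, the following hypothetical emitter writes the loadcc pattern as textual LLVM IR. The VM stack becomes a local `alloca`. This gives LLVM's memory-promotion and peephole passes a chance to simplify it. The function name and layout are made up, and the IR uses the opaque-pointer (`ptr`) syntax of recent LLVM versions rather than the typed pointers of 2018-era LLVM.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical emitter for the loadcc data flow as LLVM IR text.
std::string emit_loadcc_ir() {
    std::ostringstream o;
    o << "define i64 @loadcc(i64 %t0, i64 %t1, i1 %cc) {\n"
      << "entry:\n"
      // The VM stack slots become an ordinary local array.
      << "  %stack = alloca [2 x i64]\n"
      << "  %slot0 = getelementptr [2 x i64], ptr %stack, i64 0, i64 0\n"
      << "  %slot1 = getelementptr [2 x i64], ptr %stack, i64 0, i64 1\n"
      << "  store i64 %t0, ptr %slot0\n"
      << "  store i64 %t1, ptr %slot1\n"
      // The 0/1 compare result becomes the slot index.
      << "  %idx = zext i1 %cc to i64\n"
      << "  %addr = getelementptr [2 x i64], ptr %stack, i64 0, i64 %idx\n"
      << "  %target = load i64, ptr %addr\n"
      << "  ret i64 %target\n"
      << "}\n";
    return o.str();
}
```

The printed module can be fed to `opt` (for example in Compiler Explorer's LLVM IR mode). Whether the load ends up as a plain `select` depends on the LLVM version and the passes that run.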

The attachment contains the original x64 input, the complete VMP program as LLVM IR (not just the loadcc part), the optimized x64 (-O3), and
an unoptimized version (-O0). The unoptimized version is interesting because it shows what VMP looks like after the junk
is removed but the handlers are still intact (RSI access is removed, and the RBP stack is pre-baked to make it easier for the optimizer passes).
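One way to take the VM stack out of the picture is to track it symbolically at translation time. This is a related technique, not necessarily how the tool "pre-bakes" the RBP stack. Each slot remembers which node last wrote it, so a later pop becomes a direct data dependency instead of a memory load. `SymbolicStack` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical translation-time model of VMP's RBP-based stack: slots hold
// the index of the node that produced the value, not the value itself.
class SymbolicStack {
public:
    explicit SymbolicStack(int64_t initial_sp) : sp_(initial_sp) {}

    // 64-bit push: move the stack pointer down, then record the producer.
    void push(int node_index) {
        assert(node_index >= 0);
        sp_ -= 8;
        slots_[sp_] = node_index;
    }

    // 64-bit pop: return the node that produced the value at the top.
    int pop() {
        auto it = slots_.find(sp_);
        if (it == slots_.end())
            throw std::runtime_error("pop from a slot with no known producer");
        sp_ += 8;
        return it->second;
    }

    int64_t sp() const { return sp_; }

private:
    int64_t sp_;
    std::map<int64_t, int> slots_;  // stack address -> producing node index
};
```

With this model, an add handler would pop two node indices, create an add node with them as inputs, and push the new node's index back.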

I thought it was pretty impressive how LLVM's optimizer plows through the crap and produces such a beautiful result.

That is all. Thanks for reading.

testproc.zip

BambooQJ

good job

kozera

How did you convert that assembly to llvm IR? It looks pretty good.

RYDB3RG
On 11/6/2018 at 9:15 PM, kozera said:

How did you convert that assembly to llvm IR? It looks pretty good.

 

Keep in mind that I don't convert VMP's x86 straight to LLVM IR (if you are looking for something like that, McSema might help).

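Instead, I translate the handlers into my own node types, which I then generate LLVM IR from. There are a bunch of nodes, but most are pretty straightforward. This is what Add looks like:

```cpp
// Binary node: the two operands (usually values from VMP's stack) are
// stored by BinaryNode as left_value_node and right_value_node.
struct AddNode : public BinaryNode
{
    AddNode(const NodePtr &left_value_node, const NodePtr &right_value_node) : BinaryNode(left_value_node, right_value_node)
    {
    }

    // Mnemonic used when printing the graph
    void get_name(std::ostream &o) const override
    {
        o << "add";
    }

    // Emits "<id> = add <type> <lhs id>, <rhs id>". Both operands must
    // already have been emitted so that their indices are valid.
    void gen_ir(GenIr &o) const override
    {
        o << id(index) << " = add " << get_ir_type(width) << " " << id(left_value_node->index) << ", " << id(right_value_node->index) << endl;
    }

    // The result has the same width as the left operand
    Width get_width() const override
    {
        return left_value_node->width;
    }
};
```
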
So it expects two input nodes (which usually come from VMP's stack). When generating IR, a node expects its inputs to already be generated and available via their indices, so Add can just consume them, emit an add instruction and thereby produce a new result, which will itself be consumed eventually (or not, if it's a dead store).
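The "inputs are generated first" rule amounts to a post-order traversal. Here is a self-contained, simplified sketch of that idea. It is not the author's code: `Node`, `ArgNode`, `emit`, and the `%vN` naming are hypothetical. It writes to a plain `std::ostream` instead of the `GenIr` type above, and it treats external inputs as function arguments that need no instruction.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Node {
    int index = -1;                            // -1: not emitted yet
    std::vector<std::shared_ptr<Node>> inputs;
    virtual ~Node() = default;
    virtual void gen_ir(std::ostream &o) const = 0;
};
using NodePtr = std::shared_ptr<Node>;

std::string id(int index) { return "%v" + std::to_string(index); }

// External value (e.g. a CPU register), assumed to be a function argument.
struct ArgNode : Node {
    void gen_ir(std::ostream &) const override {}  // no instruction needed
};

struct AddNode : Node {
    AddNode(NodePtr l, NodePtr r) { inputs = {std::move(l), std::move(r)}; }
    void gen_ir(std::ostream &o) const override {
        o << "  " << id(index) << " = add i64 " << id(inputs[0]->index)
          << ", " << id(inputs[1]->index) << "\n";
    }
};

// Emits `n` only after all of its inputs; shared inputs are emitted once.
void emit(const NodePtr &n, int &next_index, std::ostream &o) {
    if (n->index >= 0) return;
    for (const auto &in : n->inputs) emit(in, next_index, o);
    n->index = next_index++;
    n->gen_ir(o);
}
```

Because emission starts from a sink and walks backwards, a node that nothing consumes (a dead store) is simply never emitted.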

 
