RYDB3RG Posted October 6, 2018

Hi, I made a tool that interprets a vmp rsi-stream: it records the handlers (or vm instructions) and connects them via their data dependencies. This is what a JCC looks like:

The edges in this graph represent data dependencies. Sequences of nodes with one input and one output are collapsed into blocks. Green nodes are constant nodes: they do not depend on external values (such as CPU registers), unlike red nodes. The hex number to the left of a node is its step number; the number to its right is its result. Only const nodes (green) can have a result. The graph contains all nodes that directly or indirectly contribute to the "loadcc" instruction at the lower right.

CMP/JCC in VMP works by executing an obfuscated version of the original CMP, which also results in either zero or one. VMP then pushes two addresses to its stack (steps 121f and 1209) and computes an address that points to either one, depending on the zero/one result of the corresponding CMP (step 1265). It then simply loads from that computed address and uses the value for a JMP. The load that fetches either address is represented by the "loadcc" node in the graph.

Even though all the puzzle pieces are here, it is still hard to figure out what the original CMP was. Luckily we have LLVM, and luckily it isn't hard to lower the graph to LLVM IR: Godbolt

On the left is the graph as LLVM IR, in the middle is the output of the optimizer, and on the right is the optimized LLVM IR lowered to x64.

The attachment contains the original x64 input, the complete vmp program as LLVM IR (not just the loadcc part), the optimized x64 (-O3) and an unoptimized version (-O0). The unoptimized version is interesting because it shows what vmp looks like after removing the junk but leaving the handlers intact (RSI access is removed, and the RBP-stack is pre-baked to make it easier for the optimizer passes).

I thought it was pretty impressive how LLVM's optimizer plows through the crap and produces such a beautiful result. That is all. Thanks for reading.

testproc.zip
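To make the address-select trick concrete, here is a minimal C++ sketch of the mechanism described above. The function and parameter names are made up for illustration; this is not VMP's actual handler code, just the same idea expressed in plain source form:

#include <cstdint>
#include <cstdio>

// cmp_result : 0 or 1, produced by the obfuscated CMP
// not_taken  : one of the two addresses pushed to the VM stack
// taken      : the other pushed address
// Returns the address the VM will JMP to.
uint64_t loadcc_select(uint64_t cmp_result, uint64_t not_taken, uint64_t taken)
{
    // Both candidate targets sit side by side; the 0/1 result selects
    // which slot the "loadcc" load reads, and that value becomes the
    // JMP destination.
    uint64_t slots[2] = { not_taken, taken };
    return slots[cmp_result & 1];
}

int main()
{
    // e.g. an obfuscated CMP that evaluated to 1 selects the taken target
    std::printf("%llx\n", (unsigned long long)loadcc_select(1, 0x401020, 0x401080));
}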
kozera Posted November 6, 2018

How did you convert that assembly to llvm IR? It looks pretty good.
RYDB3RG (Author) Posted November 11, 2018

On 11/6/2018 at 9:15 PM, kozera said:
How did you convert that assembly to llvm IR? It looks pretty good.

Keep in mind that I don't convert vmp's x86 straight to llvm ir (if you are looking for something like that, McSema might help). Instead, I translate the handlers into my own node things, which I then create llvm ir from. There are a bunch of node types, but most are pretty straightforward. This is what Add looks like:

struct AddNode : public BinaryNode {
    AddNode(const NodePtr &left_value_node, const NodePtr &right_value_node)
        : BinaryNode(left_value_node, right_value_node) {
    }

    void get_name(std::ostream &o) const override {
        o << "add";
    }

    void gen_ir(GenIr &o) const override {
        o << id(index) << " = add " << get_ir_type(width) << " "
          << id(left_value_node->index) << ", "
          << id(right_value_node->index) << endl;
    }

    Width get_width() const override {
        return left_value_node->width;
    }
};

So it expects 2 input nodes (which usually come from vmp's stack). When generating IR, a node expects its inputs to already be generated and available via their indices, so Add can simply consume them, create an add instruction and thus produce a new result, which will itself be consumed eventually (or not, if it's a dead store).
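For a rough sense of how such nodes might compose, here is a stripped-down, self-contained sketch. The Node/ConstNode types and the index bookkeeping below are simplified stand-ins, not the tool's actual NodePtr/BinaryNode/GenIr classes. Two constant nodes feed an add node, and emitting them in step order produces IR-like text where each node consumes its inputs by index:

#include <cstdint>
#include <iostream>
#include <memory>
#include <vector>

struct Node {
    int index = 0;                                    // slot assigned in emission order
    virtual ~Node() = default;
    virtual void gen_ir(std::ostream &o) const = 0;   // print one IR line
};
using NodePtr = std::shared_ptr<Node>;

struct ConstNode : Node {
    uint64_t value;
    explicit ConstNode(uint64_t v) : value(v) {}
    void gen_ir(std::ostream &o) const override {
        o << "%" << index << " = add i64 0, " << value << "\n";   // materialize the constant
    }
};

struct AddNode : Node {
    NodePtr lhs, rhs;
    AddNode(NodePtr l, NodePtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    void gen_ir(std::ostream &o) const override {
        // Inputs were emitted earlier, so we can refer to them by index.
        o << "%" << index << " = add i64 %" << lhs->index << ", %" << rhs->index << "\n";
    }
};

int main() {
    std::vector<NodePtr> order = {
        std::make_shared<ConstNode>(2),
        std::make_shared<ConstNode>(3),
    };
    order.push_back(std::make_shared<AddNode>(order[0], order[1]));

    for (size_t i = 0; i < order.size(); ++i) order[i]->index = static_cast<int>(i);
    for (const auto &n : order) n->gen_ir(std::cout);
}

Running it prints %0 = add i64 0, 2, %1 = add i64 0, 3 and %2 = add i64 %0, %1, which mirrors the consume-by-index pattern the real gen_ir above emits.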