Pancake Posted May 22, 2016 (edited)
Hello. I'd like to create my own code virtualizer, something similar to VMP, but I'm having some issues. From studying the old (1.7) VMP virtualization I got the idea: you take the original bytes, encrypt and store them inside the VM section, and place a jump to a prepared piece of code which pushes onto the stack EFlags, the 8 registers and a unique value which contains the encrypted pcode pointer. That prepared context then jumps into the virtual machine. There the VM gets a byte from the pcode, decrypts it, finds the case in the switch table, executes it, and so on.
After thinking for a long time I ran into a problem emulating the stack. The original function's stack looks like this: esp -> return address, esp + 4 -> arg1, esp + 8 -> arg2 and so on. Now, when I enter my VM I will have the saved registers and the pcode pointer on the stack, on top of the arguments to the virtualized function. Emulating operations on registers is easy. But then I want to execute a push instruction, for example. If I write it at [saved esp - 4] I will overwrite my saved context structure. And what about sub esp,100?
So the problem is: how do I make the stack continuous (so I can push/pop on the original stack without having my VM context sitting on the stack in between)? Is it even doable in C? I am sure some parts have to be written in asm and can't be done in C.
Greetz
Edited May 22, 2016 by Pancake
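A minimal C sketch of the fetch/decrypt/dispatch loop described in this post. Everything concrete in it is an assumption made for illustration: the three opcodes, the single-accumulator "register file", and the fixed XOR key standing in for pcode encryption are hypothetical and not taken from VMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; a real VM would define many more handlers. */
enum { OP_HALT = 0, OP_LOAD_IMM = 1, OP_ADD_IMM = 2 };

/* Hypothetical pcode "encryption": every byte is XORed with a fixed key. */
#define PCODE_KEY 0x5Au

typedef struct {
    uint32_t acc;  /* one virtual register, enough for the illustration */
    size_t   vip;  /* virtual instruction pointer (index into the pcode) */
} MiniVm;

/* Fetch one pcode byte and decrypt it. */
static uint8_t fetch(const uint8_t *pcode, MiniVm *vm) {
    return (uint8_t)(pcode[vm->vip++] ^ PCODE_KEY);
}

/* Runs until OP_HALT, an unknown opcode, or the end of the buffer.
   Returns the final accumulator value. */
uint32_t run_pcode(const uint8_t *pcode, size_t len) {
    MiniVm vm = {0, 0};
    while (vm.vip < len) {
        uint8_t op = fetch(pcode, &vm);
        switch (op) {               /* the "switch table" of handlers */
        case OP_LOAD_IMM:
            assert(vm.vip < len);   /* operand byte must exist */
            vm.acc = fetch(pcode, &vm);
            break;
        case OP_ADD_IMM:
            assert(vm.vip < len);
            vm.acc += fetch(pcode, &vm);
            break;
        case OP_HALT:
        default:
            return vm.acc;
        }
    }
    return vm.acc;
}
```

A pcode buffer for this toy VM is just the plain opcode and operand bytes XORed with PCODE_KEY. A real virtualizer would also carry the saved EFlags and registers in the VM state rather than a single accumulator.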
SmilingWolf Posted May 22, 2016 (edited)
Just write every single handler of your VM to be aware of where it should read/write. The point of a VM is executing things in its own way (using custom-written handlers), not using the original code at all. You say the original code is encrypted and stored; I'm telling you the original code is destroyed and virtually lost forever. The original context can be stored wherever you want, as long as the handlers behave consistently and accordingly.
I've seen that TM CISC uses a custom stack frame for its VM (it actually replaces the stack pointer ESP to point to a VirtualAlloc'ed memory region). RLPack uses a VirtualAlloc'ed memory region as a stack frame without replacing ESP, just having the handlers read/write according to a variable that acts as a v(irtual)ESP. A few others do different things as well...
You sure you did ALL of your homework? I don't want to sound dickish (for once I'm serious about this), but it looks like you haven't seen enough VMs yet (yet I've seen you around such things for a fair while now, so I wonder if I got you wrong at some point?)
Check out RLPack, lARP x64, cyclops's CHAKRAVYUHA (here -> http://cyclops.ueuo.com/crackme.html), and whatever else you heard implements a VM simple enough to be used as a study case.
Edited May 22, 2016 by SmilingWolf
Pancake Posted May 22, 2016 (edited)
Yes, I obviously will destroy the original code and translate it into something the VM interprets. I studied VMP virtualization (but never actually tried to create one). Thanks for the links.
Edited May 22, 2016 by Pancake
deepzero Posted May 22, 2016
You have to create a virtual stack. Remember that push x is short for sub esp, 4 + mov [esp], x (at least on x86). So a typical approach (and also the approach VMP uses) is:
1. Save the registers (including esp) to some storage space.
2. Create a virtual stack. You do sub esp, y, thereby allocating y bytes of space on the stack. The "real" stack (= the saved_esp register saved in step 1) is now "before" the CPU esp register. Now you have y bytes of empty stack for your virtual stack.
3. So what do you do if you want to virtualize "push x"? You emulate the instruction sequence I put above: subtract 4 from the saved_esp register saved in step 1, then write x to dword ptr [saved_esp].
4. You can do that y / 4 times; then the virtual stack collides with the real stack and you have to grow the virtual stack space. Or increase y to begin with.
5. If you want to exit the VM, you just pop the stored register values, including saved_esp, back into the CPU registers and continue execution.
This is the most straightforward approach imo and also the approach VMP 2.x takes. VMP uses y = 0xC0 and grows the stack as necessary. Of course there are other ways to implement it as well.
Tip: virtualize the following code snippet with VMP and then study how VMP virtualizes it:
while(1) { push 0x12345678 }
Eventually you will get a classic stack overflow, of course. Let us know if you need more help.
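Steps 2 to 4 can be modelled in plain C, as a rough sketch rather than anything resembling VMP's actual code. The assumptions: a byte array stands in for stack memory, saved_esp is an offset into it, and the names VStack, vstack_init, vpush and vpop are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256u

typedef struct {
    uint8_t  mem[MEM_SIZE]; /* stand-in for the thread's stack memory */
    uint32_t saved_esp;     /* guest ESP captured on VM entry (offset into mem) */
    uint32_t vm_floor;      /* lowest offset the guest stack may reach:
                               y bytes below the entry value of saved_esp */
} VStack;

/* Step 2: reserve y bytes of free space below the guest's stack pointer. */
void vstack_init(VStack *s, uint32_t guest_esp, uint32_t y) {
    assert(guest_esp <= MEM_SIZE && y <= guest_esp);
    memset(s->mem, 0, sizeof s->mem);
    s->saved_esp = guest_esp;
    s->vm_floor  = guest_esp - y;
}

/* Step 3: push x == sub saved_esp, 4 ; mov [saved_esp], x.
   Returns 0 on success, -1 when the gap is used up (step 4). */
int vpush(VStack *s, uint32_t x) {
    if (s->saved_esp - s->vm_floor < 4)
        return -1;                        /* caller must grow the gap first */
    s->saved_esp -= 4;
    memcpy(&s->mem[s->saved_esp], &x, 4);
    return 0;
}

/* pop == mov x, [saved_esp] ; add saved_esp, 4 */
uint32_t vpop(VStack *s) {
    uint32_t x;
    assert(s->saved_esp + 4 <= MEM_SIZE);
    memcpy(&x, &s->mem[s->saved_esp], 4);
    s->saved_esp += 4;
    return x;
}
```

With y = 8 exactly two pushes succeed and the third reports a collision, which is the "y / 4 times" limit from step 4. In a real VM the handlers would operate on the process stack directly, which is why that part tends to end up in asm.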
Pancake Posted May 22, 2016 (edited)
That push loop really helped me realise a lot of things. So the best way is to hold the saved context on top of the stack and keep free space for pushes; if the stack reaches the context, subtract from esp and copy the context up, so you have more space for the stack emulation. By the way, it seems like it's impossible to do in C (because C does not like manual fiddling with ebp/esp). These things definitely have to be done in asm, right?
Edit: It works. I'm doing it 100% in asm instead of C. The VM obfuscation will be done in C, but the VM with its handlers is an asm thing. The first function has been successfully executed inside the VM!
Edited May 23, 2016 by Pancake
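The "copy the context up when the stack reaches it" idea can be simulated in C as a hedged sketch. The assumptions here are mine: a byte array models stack memory, the context is 40 bytes (EFlags, 8 registers and the pcode pointer at 4 bytes each), the gap grows in 64-byte steps, and GrowVm, grow_gap and grow_push are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 512u
#define CTX_SIZE 40u   /* EFlags + 8 registers + pcode pointer, 4 bytes each */
#define GROW_STEP 64u  /* arbitrary growth step chosen for this sketch */

typedef struct {
    uint8_t  mem[MEM_SIZE]; /* stand-in for stack memory */
    uint32_t ctx_base;      /* where the saved context lives (the VM's real ESP) */
    uint32_t guest_esp;     /* emulated ESP of the virtualized code */
} GrowVm;

/* Move the context `bytes` lower in memory ("up" the stack), which opens
   that much extra room between it and the guest stack. 0 on success. */
static int grow_gap(GrowVm *vm, uint32_t bytes) {
    if (vm->ctx_base < bytes)
        return -1;                         /* real stack exhausted */
    memmove(&vm->mem[vm->ctx_base - bytes], &vm->mem[vm->ctx_base], CTX_SIZE);
    vm->ctx_base -= bytes;
    return 0;
}

/* Emulated push that relocates the context before it would be overwritten. */
int grow_push(GrowVm *vm, uint32_t x) {
    assert(vm->guest_esp >= vm->ctx_base + CTX_SIZE);
    if (vm->guest_esp - (vm->ctx_base + CTX_SIZE) < 4) {
        if (grow_gap(vm, GROW_STEP) != 0)
            return -1;                     /* the classic stack overflow */
    }
    vm->guest_esp -= 4;
    memcpy(&vm->mem[vm->guest_esp], &x, 4);
    return 0;
}
```

Pushing in an endless loop eventually makes grow_gap fail, which corresponds to the stack overflow deepzero predicted for the while(1) push test.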
Kurapica Posted May 23, 2016
An old project I have on a simple VM, coded in C++: VM.rar
Pancake Posted May 23, 2016 (edited)
I am doing it in asm. Everything is working just great. I just have to implement multiple handlers. I'm splitting every "complex" instruction into simple ones: push becomes sub esp, 4 plus a "put_stack" handler; mov eax,[ecx+ebx*4+8] becomes mov eax, ecx, then add eax, ebx 4 times, then add eax, 8, and so on. It's not as scary as it looked yesterday, especially now that I understand how to properly manage stack resizing and context storing, and have obviously dumped C to do everything in asm.
In the end, I want my tool to be able to take a compiled binary (exe, dll) and virtualize the function at a selected address.
Edited May 23, 2016 by Pancake
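To make the decomposition concrete, here is a small C sketch that runs the micro-op sequence described above for mov eax,[ecx+ebx*4+8]. The micro-op names, the MicroOp layout and the final U_LOAD step (which the post implies with "and so on") are assumptions for illustration; the real handlers in this thread are written in nasm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { R_EAX, R_ECX, R_EBX, R_COUNT };  /* tiny virtual register file */

typedef enum { U_MOV_RR, U_ADD_RR, U_ADD_RI, U_LOAD } MicroKind;

typedef struct {
    MicroKind kind;
    int       dst;  /* destination register */
    int       src;  /* source register for *_RR, ignored otherwise */
    uint32_t  imm;  /* immediate for U_ADD_RI, ignored otherwise */
} MicroOp;

/* mov eax,[ecx+ebx*4+8] split the way the post describes. */
const MicroOp MOV_EAX_SIB[] = {
    { U_MOV_RR, R_EAX, R_ECX, 0 },  /* mov eax, ecx          */
    { U_ADD_RR, R_EAX, R_EBX, 0 },  /* add eax, ebx (1 of 4) */
    { U_ADD_RR, R_EAX, R_EBX, 0 },
    { U_ADD_RR, R_EAX, R_EBX, 0 },
    { U_ADD_RR, R_EAX, R_EBX, 0 },
    { U_ADD_RI, R_EAX, 0,     8 },  /* add eax, 8            */
    { U_LOAD,   R_EAX, 0,     0 },  /* mov eax, [eax]        */
};

/* Executes n micro-ops against regs; `mem` models guest memory. */
void run_micro(const MicroOp *ops, size_t n, uint32_t regs[R_COUNT],
               const uint8_t *mem, size_t mem_len) {
    for (size_t i = 0; i < n; i++) {
        const MicroOp *op = &ops[i];
        switch (op->kind) {
        case U_MOV_RR: regs[op->dst] = regs[op->src];  break;
        case U_ADD_RR: regs[op->dst] += regs[op->src]; break;
        case U_ADD_RI: regs[op->dst] += op->imm;       break;
        case U_LOAD: {
            uint32_t addr = regs[op->dst];
            assert(addr <= mem_len && mem_len - addr >= 4);
            memcpy(&regs[op->dst], mem + addr, 4);
            break;
        }
        }
    }
}
```

One caveat worth keeping in mind: on real x86 an add changes EFlags while the original mov does not, so handlers used purely for address arithmetic probably should not touch the guest's saved EFlags.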
atom0s Posted May 24, 2016
Here is an open source example if you want to look at another project: https://github.com/rwfpl/rewolf-x86-virtualizer
Pancake Posted May 25, 2016 (edited)
I did some research before and bumped into those links you provided, thanks nonetheless. I already took care of destroying imports and relocations, and of changing references to the IAT into direct calls into the protector's code. Right now the only things that seem hard are:
1) obfuscating the VM (I wrote it in nasm, it handles 30 basic instructions)
2) "linking" the VM, the unpacking stub and the encrypted file together
3) handling call instructions in the VM
But so far so good, I will yell here if I get stuck.
Cheers
Edited May 25, 2016 by Pancake
Pancake Posted May 31, 2016 (edited)
So far so good, but I am unable to manipulate relocations. Things I have tried:
1) If I set the relocation table size to 0, it obviously does not relocate anything, but it always maps at 400000.
2) My original exe has a total relocation table size of 2730, with the first part at RVA 0x1000 with size 0x6C. If I set the relocation table size to 0x6C and zero out all other relocations, the file does not start. But I look in LordPE and I see table size 0x6C, and the hex data also looks okay...
What am I doing wrong with manipulating relocations??
Edit: Okay, actually there are 49 of them, which is 98 bytes + 8 bytes for the size and RVA, so 0x6A in size. But the original file had 0x6C, with a trailing 00 00, because more relocations followed.
But the problem remains... WHY does the file ALWAYS land on 0x400000 after packing (even with a relocation table)? The original was never at 0x400000 but at a random address.
Edited May 31, 2016 by Pancake
kao Posted May 31, 2016
1) That's kinda expected, isn't it?
2) If the file got mapped at a different address than 0x400000 and you removed most of the relocations, most of the references to data and imported APIs are f*cked. Of course it won't start. You should only remove the relocations that point into the code that got virtualized. Or, if you're replacing the virtualized code with random garbage, leave the relocations intact.
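The advice in 2) amounts to a filter over the relocation RVAs. A minimal sketch, assuming 32-bit HIGHLOW fixups (each patches 4 bytes) and a single virtualized range; the function name and the flat array of RVAs are hypothetical conveniences, not a real PE API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies into `out` every fixup RVA whose 4 patched bytes do not overlap the
   virtualized range [virt_start, virt_end). Returns how many were kept.
   `out` must have room for n entries. */
size_t keep_relocs_outside(const uint32_t *rvas, size_t n,
                           uint32_t virt_start, uint32_t virt_end,
                           uint32_t *out) {
    size_t kept = 0;
    assert(virt_start <= virt_end);
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = rvas[i];
        uint32_t hi = rvas[i] + 4;  /* one past the last patched byte */
        int overlaps = lo < virt_end && hi > virt_start;
        if (!overlaps)
            out[kept++] = rvas[i];  /* still needed by the loader */
    }
    return kept;
}
```

The kept RVAs would then have to be re-encoded into per-page relocation blocks before being written back to the file.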
Pancake Posted May 31, 2016 (edited)
I want my packer to eat the relocations and wipe them out of the header; the stub will do the magic during unpacking. Even when I leave the relocations from the first section, it still loads at the preferred imagebase, while the original loads at a random one. And yes, I do NOT want the file to RUN. I only want to manipulate the relocs and land on the entrypoint safely. I am fully aware of the consequences of wiping out relocs. I already wiped out imports, made it Themida-like (nop E8 call) and took care of resolving them at unpacking, and it works, but it seems I am doing something wrong with relocations. This is the relocation table I generate:
004001A0 00B00300 DD 0003B000 ; Relocation Table address = 0x3B000
004001A4 6A000000 DD 0000006A ; Relocation Table size = 6A (106.)
00 10 00 00 6A 00 00 00 24 30 B1 30 EA 30 DE 31
ED 31 60 32 6A 32 76 32 82 32 92 32 97 32 A1 32
B7 32 CB 32 CF 32 D9 32 E7 32 F1 32 FB 32 07 33
13 33 1D 33 31 33 47 33 69 33 9F 33 C4 33 DB 33
45 34 6F 34 88 34 A6 34 B8 34 54 35 55 37 82 37
C6 38 90 3A 9B 3A B7 3A BF 3A D4 3A FF 3A E9 3C
F3 3C 11 3E 1B 3E C3 3E CD 3E 00 00 00 00 00 00
So we have RVA 0x1000 and 0x6A as the size of the relocs. I don't see what's incorrect here or why the loader always maps it at the preferred imagebase. Maybe I should also physically shrink the reloc section?
Edited May 31, 2016 by Pancake
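For sanity-checking a dump like this one, a base relocation block can be decoded with a few lines of C. This is a sketch with hypothetical names (RelocBlock, parse_reloc_block and friends). It relies on the documented block layout: a 4-byte page RVA, a 4-byte SizeOfBlock that includes the 8-byte header, then 16-bit entries with the type in the top 4 bits and the page offset in the low 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t page_rva;     /* VirtualAddress field of the block header */
    uint32_t block_size;   /* SizeOfBlock field, header included */
    size_t   entry_count;  /* (block_size - 8) / 2 */
} RelocBlock;

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint16_t rd16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parses the header of one block. Returns 0 on success, -1 if malformed. */
int parse_reloc_block(const uint8_t *data, size_t len, RelocBlock *out) {
    if (len < 8) return -1;
    out->page_rva   = rd32(data);
    out->block_size = rd32(data + 4);
    if (out->block_size < 8 || out->block_size > len) return -1;
    out->entry_count = (out->block_size - 8) / 2;
    return 0;
}

/* Type of entry i: 3 is IMAGE_REL_BASED_HIGHLOW, 0 is ABSOLUTE (padding). */
unsigned reloc_entry_type(const uint8_t *data, size_t i) {
    return (unsigned)(rd16(data + 8 + 2 * i) >> 12);
}

/* RVA patched by entry i: page RVA plus the low 12 bits of the entry. */
uint32_t reloc_entry_rva(const uint8_t *data, size_t i) {
    return rd32(data) + (rd16(data + 8 + 2 * i) & 0x0FFFu);
}
```

Fed the dump above, the header decodes to page RVA 0x1000 with 49 entries, and the first entry 24 30 is a HIGHLOW fixup at RVA 0x1024. The PE format documentation also states that each block must start on a 32-bit boundary, which is why the original block was padded to 0x6C with a 00 00 entry; whether the 0x6A size matters for this single-block table is not something this sketch can settle.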
kao Posted May 31, 2016
What does the PE header look like (specifically FileHeader.Characteristics and OptionalHeader.DllCharacteristics), and what are the section attributes for the section the relocations are in? Or just give me a test file to check...
Pancake Posted May 31, 2016 (edited)
Sent you a PM. I work on the original file header, so everything that I don't change should be fine. A moment ago I added a new section with the proper relocation format and changed the VA and size in the optional header to point to the new section, but it still maps at 400000 and not at a random address.
Edited May 31, 2016 by Pancake
Pancake Posted May 31, 2016
@kao deserves one million hugs. The problem was that I forgot about the TLS section. After removing relocations I was also accidentally removing the ones associated with TLS, and that caused the disaster. Now we can move forward.
Pancake Posted May 31, 2016
So let's face our ultimate enemy: the TLS section. StartAddressOfRawData and EndAddressOfRawData are easy to handle: just copy the data they point to into my own section and add a relocation. Same for AddressOfIndex. But I have to ask about AddressOfCallBacks. The functions it points to will obviously be destroyed by the packer, so I have to implement TLS inside the unpacking code. The enemy looks like this:
typedef VOID (NTAPI *PIMAGE_TLS_CALLBACK) (
    PVOID DllHandle,
    DWORD Reason,
    PVOID Reserved
);
Is it enough to just call each one once during unpacking? From what I've read, they have to be called on every thread creation/destruction. What's the proper way to solve the issue? Currently I will call them just once during unpacking, and I'm aware that this will not cover the calls needed on thread creation/deletion. Should I set the TLS image directory entry in the PE header back to the one from the original file, so it will point to the original functions?
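For reference, AddressOfCallBacks points to a null-terminated array of callback pointers, and the Windows loader invokes them for process attach, thread attach, thread detach and process detach, so a single call during unpacking only covers the first of those. Below is a portable C sketch of the "call each once" step. The TlsCallback typedef is a stand-in for PIMAGE_TLS_CALLBACK (the real one uses NTAPI, PVOID and DWORD), and run_tls_callbacks and the demo callback are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reason codes, same numeric values as the DLL_* constants in the
   Windows headers. */
enum {
    REASON_PROCESS_DETACH = 0,
    REASON_PROCESS_ATTACH = 1,
    REASON_THREAD_ATTACH  = 2,
    REASON_THREAD_DETACH  = 3
};

/* Portable stand-in for PIMAGE_TLS_CALLBACK. */
typedef void (*TlsCallback)(void *dll_handle, uint32_t reason, void *reserved);

/* Walks a null-terminated callback array and calls every entry with the
   given reason. Returns the number of callbacks invoked. */
size_t run_tls_callbacks(const TlsCallback *callbacks, void *image_base,
                         uint32_t reason) {
    size_t called = 0;
    if (callbacks == NULL)
        return 0;  /* image has no TLS callbacks */
    for (; callbacks[called] != NULL; called++)
        callbacks[called](image_base, reason, NULL);
    return called;
}

/* Demo callback that only records what it was called with. */
unsigned demo_calls = 0;
uint32_t demo_last_reason = 0;
void demo_tls_callback(void *dll_handle, uint32_t reason, void *reserved) {
    (void)dll_handle;
    (void)reserved;
    demo_calls++;
    demo_last_reason = reason;
}
```

An unpacking stub would call something like this once with the process-attach reason after restoring the code. Getting the thread attach/detach notifications as well means either letting the loader see a valid TLS directory or hooking thread creation, which is beyond this sketch.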
kao Posted May 31, 2016
Search the board, someone was solving the exact same problem a month ago. Sorry, don't remember the details.
Pancake Posted May 31, 2016
I remember I was following mudlord's attempts. Anyway, now I think I know everything that's needed.