
inner workings of a disassembler


snoopy


Hey guys, I am thinking about writing a disassembler to play more with assembly.
What I was wondering, taking IDA as an example:

.text:00CD28B0 55                push    ebp
.text:00CD28B1 8B EC             mov     ebp, esp

If I use CreateFileMapping to map a file into memory, is it possible to find a reference to the function start? The above portion is called from:

.text:0062CA6B E8 40 5E 6A 00    call    sub_CD28B0

But what if I only have the function start? How would I go about finding the caller, without breakpointing the code of course?
It's something that has kept me busy for a while now, and I am wondering how it works.
 

 


There's no perfect way to get a list of all locations; you'll have to work with heuristics to get good results. Writing good disassemblers and distinguishing code from data is a science in itself.


 


Generally, you would use the information you have on the file (sections, EP, relocation info, image base, ...) and disassemble the whole file while checking for references to your function (or build a table of all references right away).
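To make "the information you have on the file" concrete, here is a minimal Python sketch that reads the image base and entry point from a PE header and looks up the section containing the entry point. It is only an illustration under stated assumptions: it handles 32-bit PE32 files only, the function name `find_entry_section` and the returned dictionary keys are made up for this example, and it does no validation beyond the two signatures.

```python
import struct

def find_entry_section(image: bytes):
    """Return (image_base, entry_rva, section) for a 32-bit PE file.

    `section` is a dict describing the section that contains the entry
    point, or None if no section covers it. Hypothetical helper for
    illustration; PE32+ (64-bit) files are rejected.
    """
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    coff = e_lfanew + 4                       # COFF file header (20 bytes)
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)

    opt = coff + 20                           # optional header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic != 0x10B:
        raise ValueError("only PE32 is handled in this sketch")
    (entry_rva,) = struct.unpack_from("<I", image, opt + 16)
    (image_base,) = struct.unpack_from("<I", image, opt + 28)

    table = opt + opt_size                    # section table, 40 bytes per entry
    for i in range(num_sections):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", image, table + 40 * i)
        span = max(vsize, raw_size)
        if vaddr <= entry_rva < vaddr + span:
            return image_base, entry_rva, {
                "name": name.rstrip(b"\0").decode("ascii", "replace"),
                "rva": vaddr,
                "va": image_base + vaddr,     # address at the default image base
                "size": span,
                "raw_offset": raw_ptr,        # where the bytes sit in the file
                "raw_size": raw_size,
            }
    return image_base, entry_rva, None
```

If the file is mapped as raw file data rather than as a loaded image, the section's bytes start at `raw_offset` in the mapping, while the addresses shown by a disassembler correspond to `va`.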


 


 


That's a lot of work. This is what you could do for quick results:


 


1) Find which section the EP is in and assume it to be the code section.
2) Disassemble the entire code section, assuming it is located at the default image base from the header.
3) Check the disassembly for links to your function.
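The three steps above can be approximated even without a real disassembler. The sketch below is a deliberately crude stand-in: instead of decoding instructions, it treats every `E8` byte as a possible `call rel32` and checks whether the computed destination equals the function address. That is an assumption of this example, not what IDA does, and it can report false positives where `E8` is really part of another instruction or of data. The name `find_call_sites` is made up for illustration.

```python
import struct

def find_call_sites(code: bytes, code_va: int, target_va: int):
    """Return addresses of byte sequences that decode as `call rel32`
    (opcode E8) whose destination is target_va.

    code     -- raw bytes of the code section
    code_va  -- address of code[0] at the default image base
    """
    hits = []
    for off in range(len(code) - 4):
        if code[off] != 0xE8:
            continue
        # rel32 is a signed displacement relative to the NEXT instruction,
        # i.e. the address of the call plus its 5-byte length.
        (rel,) = struct.unpack_from("<i", code, off + 1)
        dest = (code_va + off + 5 + rel) & 0xFFFFFFFF
        if dest == target_va:
            hits.append(code_va + off)
    return hits

# The call from the first post: E8 40 5E 6A 00 at 0062CA6B.
code = bytes.fromhex("90 90 E8 40 5E 6A 00 C3")
callers = find_call_sites(code, 0x0062CA69, 0x00CD28B0)
print([hex(a) for a in callers])  # ['0x62ca6b']
```

The arithmetic matches the listing in the question: 0062CA6B + 5 + 006A5E40 = 00CD28B0. Indirect calls, jumps, and calls through import or pointer tables are not covered by this scan.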


 


 


Additionally, if your file was linked with relocation information, you might want to check that, too (though you should have covered most references with the disassembly).
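Relocation entries point at absolute addresses stored in the image, so they can reveal references that a relative `call` never shows, such as a `push offset func` operand or a function pointer in a table. The sketch below walks the base relocation blocks and reports every 32-bit slot whose stored value equals the function address. It assumes, purely for simplicity, that `image` is laid out so that byte offset equals RVA (as in a loaded image rather than a raw file mapping), and that the caller already knows the RVA and size of the relocation directory; `relocated_refs` is a hypothetical name.

```python
import struct

IMAGE_REL_BASED_HIGHLOW = 3  # fixup type for a full 32-bit absolute address

def relocated_refs(image: bytes, reloc_rva: int, reloc_size: int, target_va: int):
    """Return RVAs of relocated 32-bit slots that hold target_va.

    Assumes offset == RVA inside `image` (an assumption of this sketch).
    target_va must be expressed relative to the default image base.
    """
    hits = []
    pos, end = reloc_rva, reloc_rva + reloc_size
    while pos + 8 <= end:
        # Each block: page RVA, total block size, then 16-bit entries.
        page_rva, block_size = struct.unpack_from("<II", image, pos)
        if block_size < 8:
            break  # malformed or terminating block
        count = (block_size - 8) // 2
        entries = image[pos + 8: pos + 8 + 2 * count]
        for (entry,) in struct.iter_unpack("<H", entries):
            # High 4 bits: fixup type. Low 12 bits: offset within the page.
            if entry >> 12 != IMAGE_REL_BASED_HIGHLOW:
                continue
            slot = page_rva + (entry & 0x0FFF)
            (value,) = struct.unpack_from("<I", image, slot)
            if value == target_va:
                hits.append(slot)
        pos += block_size
    return hits
```

Each hit is the location of the stored address, not the start of the instruction that uses it, so it still has to be matched against the disassembly to find the referencing instruction.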


 


 


Remember that the file can always employ obfuscation or runtime calculations to hide those links, and that you might get wrong or incomplete data due to embedded data bytes and whatnot.


 


 


 


Long story short: use IDA or Olly.


Edited by deepzero

As a start, you could just scan the whole file for 0x90 (NOP) or 0xCC (INT3) bytes to find the padding between functions. Additionally, if debug symbols are available, you could scan the memory, get the function names for the offsets, and combine that with your 0x90/0xCC search.


 


You will get a list of functions, but some may be missing or incorrect (since some instructions contain 0xCC bytes). Then you will need to combine this information with what deepzero described. In the end, it's only a matter of searching the collected data to build up what you need.
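A rough sketch of the padding scan described here, in Python. It guesses that a function starts at the first byte following a run of 0x90/0xCC bytes. The minimum run length (`min_run`) is an assumption of this example, not something stated in the post, and the caveat above still applies: a 0x90 or 0xCC inside an instruction or in data produces wrong guesses. `guess_function_starts` is a made-up name.

```python
def guess_function_starts(code: bytes, code_va: int, min_run: int = 2):
    """Guess function start addresses as the first non-padding byte
    after at least `min_run` consecutive 0x90 (NOP) / 0xCC (INT3) bytes.

    Heuristic only: functions that are not preceded by padding are
    missed, and padding-like bytes inside code or data give false hits.
    """
    starts = []
    run = 0
    for off, byte in enumerate(code):
        if byte in (0x90, 0xCC):
            run += 1
            continue
        if run >= min_run:
            starts.append(code_va + off)
        run = 0
    return starts

# Two tiny functions separated by three INT3 bytes; the lone NOP near
# the end is too short a run to count as padding.
code = bytes.fromhex("55 8B EC C3 CC CC CC 55 8B EC C3 90 C3")
print([hex(a) for a in guess_function_starts(code, 0x1000)])  # ['0x1007']
```

Raising `min_run` reduces false positives from stray bytes but misses functions separated by short padding, so the result is best cross-checked against call targets or symbol information.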


 


That's how I do it in my debugger to get the functions, and it works quite well.


With debug symbols: [screenshot 04052013164206.png]

Without: [screenshot 04052013164251.png]


 


Regards,


Zer0Flag


Edited by Zer0Flag

I found a solution for the question.

Based on the info deepzero posted I managed to get it all working.

I use BeaEngine to disassemble the entire code section:

 

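StartDisassembling:
invoke Disasm, offset MyDisasm                ; decode one instruction into MyDisasm

cmp D[MyDisasm.Instruction.AddrValue], ecx    ; ecx is assumed to hold the address of the function being searched for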
 

The solution lies in the structure definition: MyDisasm.Instruction.AddrValue

When BeaEngine disassembles a CALL, this field is filled with the address of the function being called.

 

It seems like a good start.

This will keep me busy and get me started for now.
