Inner workings of a disassembler

May 4, 2013

Hey guys, I am thinking about writing a disassembler to play more with assembly.
What I was wondering about, taking IDA output as an example:

.text:00CD28B0 55 push ebp
.text:00CD28B1 8B EC mov ebp, esp

If I use CreateFileMapping to map a file into memory, is it possible to find a reference to the function start? The portion above is called from:

.text:0062CA6B E8 40 5E 6A 00 call sub_CD28B0

But what if I only have the function start? How would I go about finding the caller, without setting breakpoints in the code, of course?
This has kept me busy for a while now, and I am wondering how it works.

May 4, 2013

There's no perfect way to get a list of all locations that reference a function; you'll have to work with heuristics to get good results. Writing good disassemblers and distinguishing code from data is a science in itself.

Generally, you would use the information you have on the file (sections, EP, relocation info, imagebase, ...) and disassemble the whole file, while checking for destination links to your function (or building a table of all such links right away).
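To make the header part concrete, here is a rough Python sketch that reads the imagebase and entry point from a PE file and picks the section containing the EP. The function name `entry_point_section` and the returned dictionary keys are made up for illustration; the field offsets follow the documented PE/COFF layout, and the sketch does no validation beyond the two signatures.

```python
import struct

def entry_point_section(image: bytes):
    """Return info on the section that contains the entry point, or None."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")

    coff = e_lfanew + 4                      # COFF file header (20 bytes)
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)

    opt = coff + 20                          # optional header
    (magic,) = struct.unpack_from("<H", image, opt)
    (entry_rva,) = struct.unpack_from("<I", image, opt + 16)
    if magic == 0x10B:                       # PE32
        (image_base,) = struct.unpack_from("<I", image, opt + 28)
    elif magic == 0x20B:                     # PE32+
        (image_base,) = struct.unpack_from("<Q", image, opt + 24)
    else:
        raise ValueError("unknown optional header magic")

    table = opt + opt_size                   # section table follows the optional header
    for i in range(num_sections):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", image, table + 40 * i)
        span = max(vsize, raw_size)
        if vaddr <= entry_rva < vaddr + span:
            return {
                "name": name.rstrip(b"\0").decode("ascii", "replace"),
                "image_base": image_base,
                "entry_va": image_base + entry_rva,
                "va": image_base + vaddr,    # address of the section at the default imagebase
                "raw_offset": raw_ptr,       # where the section bytes start in the file
                "raw_size": raw_size,
            }
    return None
```

The bytes to disassemble would then be `image[raw_offset:raw_offset + raw_size]`, treated as if they were loaded at `va`.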

That's a lot of work. This is what you could do for quick results:

1) find which section the EP is in and assume it to be the code section

2) disassemble the entire code section assuming it is located at the default-imagebase from the header

3) check the disassembly for links to your function.
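As a rough illustration of step 3, the Python sketch below looks for near calls (opcode E8 followed by a 32-bit relative displacement) whose destination is a given function. Note that this is a plain byte scan and not a real disassembly, so it can report false positives when E8 is part of another instruction or of data, and it misses indirect calls entirely. The name `find_call_sites` and its parameters are made up for the example.

```python
import struct

def find_call_sites(code: bytes, base_va: int, target_va: int):
    """Return addresses of E8 rel32 calls in `code` that land on target_va.

    code     : raw bytes of the code section
    base_va  : address the first byte of `code` is assumed to be loaded at
    target_va: address of the function we want callers for
    """
    hits = []
    for off in range(len(code) - 4):
        if code[off] != 0xE8:
            continue
        (rel,) = struct.unpack_from("<i", code, off + 1)   # signed displacement
        # destination = address of the next instruction + displacement
        dest = (base_va + off + 5 + rel) & 0xFFFFFFFF
        if dest == target_va:
            hits.append(base_va + off)
    return hits

# The call from the first post: E8 40 5E 6A 00 at 0062CA6B
call_bytes = bytes.fromhex("E8405E6A00")
print([hex(a) for a in find_call_sites(call_bytes, 0x0062CA6B, 0x00CD28B0)])
```

For the bytes from the first post this works out as 0x0062CA6B + 5 + 0x006A5E40 = 0x00CD28B0, which is why IDA shows the instruction as call sub_CD28B0.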

Additionally, if your file was linked with relocation information, you might want to check that, too (though you should have covered most with the disasm).

Remember that the file can always employ obfuscation/runtime calculations to hide those links and that you might get wrong or incomplete data due to embedded data bytes, and whatnot.

Long story short: Use IDA or OllyDbg.

Edited May 4, 2013 by deepzero

May 4, 2013

As a start, you could just scan the whole file for 0x90 or 0xCC bytes to find the padding between functions. Additionally, if debug symbols are available, you could scan the memory, get the function names for the offsets, and combine that with your 0x90 and 0xCC search.

You will get a list of functions, but some may be missing or incorrect (since some instructions contain 0xCC bytes). You then need to combine this information with what deepzero described. In the end, it is just a matter of searching the collected data to build up what you need.

That's how I do it in my debugger to get the functions, and it works quite well.
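The padding scan could look roughly like the Python sketch below. It is not the code from my debugger, just an illustration: the names `padding_runs` and `function_start_candidates` and the `min_len` threshold are assumptions, and requiring a run of at least two identical bytes is only a crude way to reduce hits on a single 0x90 or 0xCC that belongs to an instruction.

```python
def padding_runs(code: bytes, min_len: int = 2):
    """Return (offset, length) for each run of 0x90 or 0xCC of at least min_len bytes."""
    runs = []
    i = 0
    while i < len(code):
        b = code[i]
        if b in (0x90, 0xCC):
            j = i
            while j < len(code) and code[j] == b:
                j += 1
            if j - i >= min_len:
                runs.append((i, j - i))
            i = j
        else:
            i += 1
    return runs

def function_start_candidates(code: bytes, min_len: int = 2):
    """The byte right after each padding run is a candidate function start."""
    return [off + length
            for off, length in padding_runs(code, min_len)
            if off + length < len(code)]

# ret, three int3 bytes, a small function, two nops, then another push ebp
sample = bytes.fromhex("C3CCCCCC558BECC3909055")
print(function_start_candidates(sample))
```

The candidates are offsets into the buffer, so they still have to be turned into addresses and checked against the other information (call targets, symbols) before they can be trusted.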

[Screenshot: with debug symbols]

[Screenshot: without debug symbols]

Regards,

Zer0Flag

Edited May 4, 2013 by Zer0Flag

May 5, 2013

Author

I found a solution to my question.

Based on the info deepzero posted, I managed to get it all working.

I use BeaEngine to disassemble the entire code section:

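StartDisassembling:
; decode the instruction described by the MyDisasm structure
invoke Disasm, offset MyDisasm

; compare the decoded branch target with ecx (assumed here to hold the address of the function)
cmp D[MyDisasm.Instruction.AddrValue], ecx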

The solution lies in a field of the structure definition: MyDisasm.Instruction.AddrValue

When BeaEngine disassembles a CALL, this field is filled with the address of the function being called.

It seems like a good start.

This will keep me busy and get me started for now.
