Understanding assembly Hello World


fabiothebest

Posted

I'm studying x86 architecture and assembly in order to have the basis for studying reversing and exploit development. I'm following a course on opensecuritytraining.info.

I see a Hello World example:

push ebp
mov ebp, esp
push offset aHelloWorld; "Hello world\n"
call ds:__imp__printf
add esp, 4
mov eax, 1234h
pop ebp
retn

This code was generated by Windows Visual C++ 2005 with buffer overflow protection turned off and disassembled with IDA Pro 4.9 Free Version.

I'm trying to understand what each line does.

the first line is push ebp.

I know ebp stands for base pointer. What is its function?

I see that in the second line the value in esp is moved into ebp, and searching online I see that these first 2 instructions are very common at the beginning of an assembly function.

But are ebp and esp empty at the beginning? I'm new to assembly. Is ebp used for stack frames, i.e. whenever we have a function in our code, and is it optional for a simple program?

Then push offset aHelloWorld; "Hello world\n"

The part after ; is a comment, so it doesn't get executed, right? The first part pushes the address of the string Hello World onto the stack, right? But where is the string declared? I'm not sure I understand.

Then call ds:__imp__printf

It seems to be a call to a function. printf is a builtin function, right? And does ds stand for data segment register? Is it used because we are trying to access a memory operand that isn't on the stack?

then add esp, 4

do we add 4 bytes to esp? Why?

then mov eax, 1234h. What is 1234h here?

then pop ebp. It was pushed at the beginning. Is it necessary to pop it at the end?

then retn (I knew about ret for returning from a function after a call). I read that the n in retn refers to the number of pushed arguments by the caller. It isn't very clear to me. Can you help me understand?

Posted

In general, if you're having problems with a specific course, it's better to ask the course authors. I haven't seen the course and can't comment on its quality.

Judging by your questions, you should rather find some ebook about assembly language basics. Any one should cover those concepts.

Quick answers, as I'm on mobile...

#1 - push ebp, mov ebp, esp - google "function prologue"

#2 - if you click on aHelloWorld, IDA will show where it's declared. It will be in the data segment.

#3 - call. There are no builtin functions in assembler. Clicking on the name in IDA will show you where it points.

#4 - add esp, 4 - google "Windows calling conventions"

#5 - mov eax, 1234h. eax is used to return a value from function. In C it would look like "return 0x1234;"

#6 - pop ebp - google "function epilogue"

 

 

Posted
In reply to kao's post above:

Thank you very much. You confirmed some of my thoughts and pointed me in the right direction for some other things. I need to learn more about computer architecture and assembly.

Posted

The __imp__ prefix marks an import (a Microsoft toolchain convention), so this is effectively call [printf]: an indirect call through a pointer stored in the executable's import table, which is populated when the file is loaded into memory.

add esp, 4 is there because printf is a C function, i.e. cdecl, not stdcall (google function calling conventions). For cdecl functions it's up to the caller to balance the stack. You passed 1 param to the call (the pointer to the hello world text), and 1 param in 32-bit = 4 bytes; that's where the 4 comes from. If the function were __stdcall, the callee would clean up and the add esp, 4 would not be present.

Posted (edited)
BLOCK: A - ###FUNCTION PROLOGUE### (Most functions begin like this; you don't really need to get too technical about it,
but if you insist, just look for "ASM Function Prologue")
--------------------------------
push ebp
mov ebp, esp
--------------------------------

BLOCK: B - ###PUSH OFFSET FOR THE STRING REFERENCE "Hello World\n" ON STACK###
----------------------------------------------------------
push offset aHelloWorld; "Hello world\n"
----------------------------------------------------------

BLOCK: C - ###CALL PRINTF THROUGH ITS IMPORT POINTER (READ VIA DS)###
---------------------------------------
call ds:__imp__printf
---------------------------------------

BLOCK: D - ###ALLOCATE 4 BYTES FOR LOCAL VARIABLE###
-----------------------------------------
add esp, 4
-----------------------------------------

BLOCK: E - ###MOVES 1234h TO EAX TO BE USED AS THE RETURN VALUE###
-----------------------------------------------
mov eax, 1234h
-----------------------------------------------

BLOCK: F - ###FUNCTION EPILOGUE###
-----------------------
pop ebp
-----------------------

BLOCK: G - ###RETURN TO THE CALLER###
--------------------------
retn
--------------------------

 

BLOCK A:
https://en.wikipedia.org/wiki/Function_prologue#Prologue

BLOCK B:
http://stackoverflow.com/a/17634965/3874785

BLOCK C:
?

BLOCK D:
https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames#Standard_Entry_Sequence

BLOCK E:
http://stackoverflow.com/questions/6171172/return-value-of-a-c-function-to-asm#6171209

BLOCK F:
https://en.wikipedia.org/wiki/Function_prologue#Epilogue

 

That's what I understand so far. I hope it helps, though kao explained it very well too.

Edited by 0xNOP
Posted

BLOCK: D - ###ALLOCATE 4 BYTES FOR LOCAL VARIABLE###
-----------------------------------------
add esp, 4
-----------------------------------------

is wrong...

The 4 bytes were used by the push before the call to printf. printf is a C (cdecl) function, not stdcall, so the add esp, 4 after the call is there to rebalance the stack.

Posted

Thank you very much everyone. Your comments are very useful. I hope I can learn much and contribute as well.

Posted (edited)
In reply to evlncrn8's correction above:


Ahh! Gotcha :D 

Now this got me thinking... if you don't specify (or force) the __cdecl or __stdcall calling convention on a function, which one is the default? Is it cdecl automatically?

See example...

#include <iostream>

// Force the C (cdecl) calling convention: the caller cleans up the stack.
void __cdecl HelloWorldCDECL()
{
    std::cout << "Hello World" << std::endl;
}
 
// Force the stdcall calling convention: the callee cleans up the stack.
void __stdcall HelloWorldSTDCALL()
{
    std::cout << "Hello World" << std::endl;
}

 

Edited by 0xNOP
Posted

By default, at least in MSVC, it's __cdecl, unless you explicitly state __stdcall or WINAPI.
