Buffer Overflow Basics
Buffer overflows are a fertile source of bugs and malicious attacks. They occur when a program attempts to write data past the end of a buffer. A buffer is a contiguous chunk of allocated memory, such as an array in C or a block of memory reached through a pointer. A limitation of C and C++ is that they perform no automatic bounds checking, so a program can write past the end of a buffer, as in the following example.
Note: All examples were compiled on an x86 Linux system.
int main()
{
    int buffer[10];
    buffer[20] = 10;    /* out of bounds: valid indices are 0 to 9 */
}
This program compiles and may even run without reporting an error, but it writes beyond the memory allocated for the buffer, which leads to unexpected results.
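Because the language does not check the index, the check has to be written by hand. The following is a minimal sketch of such a check; the helper name checked_store and its return convention are assumptions made for illustration, not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store value at buf[idx] only if idx lies inside
   the array of len elements. Returns 0 on success, -1 if out of bounds. */
int checked_store(int *buf, size_t len, size_t idx, int value)
{
    assert(buf != NULL);
    if (idx >= len)
        return -1;          /* refuse the write instead of corrupting memory */
    buf[idx] = value;
    return 0;
}
```

With the array from the example above, checked_store(buffer, 10, 20, 10) would return -1 and leave memory untouched.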
Example:
#include <string.h>

void function(char *str)
{
    char buffer[16];
    strcpy(buffer, str);    /* no bounds check: copies all 27 bytes */
}

int main()
{
    char *str = "I am greater than 16 bytes";
    function(str);
}
This program is guaranteed to cause unexpected behavior, because a string (str) of 27 bytes, including the null terminator, is copied to a location (buffer) that has room for only 16 bytes. The extra bytes run past the buffer and overwrite the space allocated for the frame pointer (FP), the return address and so on. This corrupts the process stack. The function used to copy the string is strcpy, which performs no bounds checking. Using strncpy with a limit of 16 bytes would have prevented this corruption of the stack, although strncpy does not null-terminate the destination when it truncates the string.
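A bounded copy with strncpy could look roughly like the sketch below. The helper name bounded_copy and its return value are assumptions made for illustration; the explicit terminator is needed because strncpy leaves the destination unterminated when the source does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst without writing past dst_size bytes.
   Returns 0 if the whole string fit and 1 if it had to be truncated. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);   /* copies at most dst_size - 1 bytes */
    dst[dst_size - 1] = '\0';          /* terminate even when truncated */
    return strlen(src) >= dst_size;
}
```

Called as bounded_copy(buffer, sizeof buffer, str) inside function(), this keeps the first 15 characters and leaves the rest of the stack frame alone.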
Example:
#include <stdio.h>

int main()
{
    char buff[15] = {0};
    printf("Enter your name:");
    scanf("%s", buff);    /* no field width: input length is not limited */
}
In this example, the program reads a string from standard input but does not check the string's length. If the string has more than 14 characters, it causes a buffer overflow, because scanf() writes the remaining characters past the end of buff.
Note: One character is always reserved for a null terminator.
The result is most likely a segmentation fault that crashes the program. Under certain conditions, the user will receive a shell prompt after the crash. Even if the shell has restricted privileges, the user can examine the values of environment variables, list the files in the current directory or probe the network with the ping command.
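One way to cap the %s conversion is to give it a field width that leaves room for the null terminator. The sketch below uses sscanf on a string so that it can be tried without typing input; scanf accepts the same format. The names parse_name and NAME_SIZE are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_SIZE 15   /* same size as buff in the scanf example */

/* Hypothetical helper: read one whitespace-delimited word from input.
   The field width 14 leaves one byte for the terminating null.
   Returns 1 if a word was read and 0 otherwise. */
int parse_name(const char *input, char name[NAME_SIZE])
{
    assert(input != NULL && name != NULL);
    return sscanf(input, "%14s", name) == 1;
}
```

However long the input word is, at most 14 characters are stored, followed by the null terminator.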
Writing Buffer Overflow exploits:
1. Example of an exploitable program - Let's assume that we exploit a function like this:
void lame(void)
{
    char small[30];
    gets(small);              /* no bounds check on the input */
    printf("%s\n", small);
}

int main()
{
    lame();
    return 0;
}
Compile and disassemble it:
# cc -ggdb program.c -o program
/tmp/cca017401.o: In function `lame':
/root/program.c:1: the `gets' function is dangerous and should not be used.
# gdb program

/* short explanation: gdb, the GNU debugger, is used here to read the
   binary file and disassemble it (translate bytes to assembler code) */

(gdb) disas main
Dump of assembler code for function main:
0x80484c8: pushl %ebp
0x80484c9: movl %esp,%ebp
0x80484cb: call 0x80484a0
0x80484d0: leave
0x80484d1: ret

(gdb) disas lame
Dump of assembler code for function lame:

/* saving the frame pointer onto the stack right before the ret address */
0x80484a0: pushl %ebp
0x80484a1: movl %esp,%ebp

/* enlarge the stack by 0x20 or 32. our buffer is 30 characters, but the
   memory is allocated 4byte-wise (because the processor uses 32bit words)
   this is the equivalent to: char small[30]; */
0x80484a3: subl $0x20,%esp

/* load a pointer to small[30] (the space on the stack, which is located
   at virtual address 0xffffffe0(%ebp)) on the stack, and call the gets
   function: gets(small); */
0x80484a6: leal 0xffffffe0(%ebp),%eax
0x80484a9: pushl %eax
0x80484aa: call 0x80483ec
0x80484af: addl $0x4,%esp

/* load the address of small and the address of "%s\n" string on stack
   and call the print function: printf("%s\n", small); */
0x80484b2: leal 0xffffffe0(%ebp),%eax
0x80484b5: pushl %eax
0x80484b6: pushl $0x804852c
0x80484bb: call 0x80483dc
0x80484c0: addl $0x8,%esp

/* get the return address, 0x80484d0, from stack and return to that
   address. you don't see that explicitly here because it is done by the
   CPU as 'ret' */
0x80484c3: leave
0x80484c4: ret
End of assembler dump.
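Two pieces of arithmetic in the listing are easy to check by hand: the displacement 0xffffffe0 is the 32-bit two's complement form of -32, and 30 bytes rounded up to a multiple of 4 gives the 0x20 (32) bytes subtracted from %esp. The sketch below works both out; the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a 32-bit displacement, as printed by the disassembler,
   as a signed offset. */
long long signed_displacement(uint32_t raw)
{
    if (raw >= 0x80000000u)
        return (long long)raw - 4294967296LL;   /* subtract 2^32 */
    return (long long)raw;
}

/* Round size up to the next multiple of align (align must be non-zero). */
unsigned round_up(unsigned size, unsigned align)
{
    assert(align != 0);
    return (size + align - 1) / align * align;
}
```

So 0xffffffe0(%ebp) addresses the byte 32 below the frame pointer, which is where small begins.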
1.a. Overflowing the program
# ./program
xxxxxxxxx <- user input
xxxxxxxxxxxxx
# ./program
xxxxxxxxx <- user input
xxxxxxxxxxxxx
Segmentation fault (core dumped)
# gdb program core
(gdb) info registers
eax: 0x24 36
ecx: 0x804852f 134513967
edx: 0x1 1
ebx: 0x11a3c8 1156040
esp: 0xbffffdb8 -1073742408
ebp: 0x787878 7895160
EBP is 0x787878, this means that we have written more data on the stack than the input buffer could handle. 0x78 is the hex representation of 'x'. The process had a buffer of 32 bytes maximum size. We have written more data into memory than allocated for user input and therefore overwritten EBP and the return address with 'xxxx', and the process tried to resume execution at address 0x787878, which caused it to get a segmentation fault.
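The effect can be reproduced safely with a model of the stack frame instead of a real one. The sketch below is only an illustration under assumed names (frame_model, simulate_unchecked_write) and an assumed layout of 32 buffer bytes followed by the saved EBP and the return address; it is not the layout of any particular compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of lame()'s stack frame (an assumption for illustration). */
struct frame_model {
    char     small[32];   /* 30-byte buffer, padded to 32 */
    uint32_t saved_ebp;   /* saved frame pointer */
    uint32_t ret;         /* return address */
};

/* Copy n input bytes to the start of the frame, the way an unchecked
   gets() would, but never past the end of the model itself. */
void simulate_unchecked_write(struct frame_model *f, const char *input, size_t n)
{
    assert(f != NULL && input != NULL);
    if (n > sizeof *f)
        n = sizeof *f;
    memcpy(f, input, n);
}
```

Writing 30 bytes of 'x' leaves saved_ebp and ret intact, while writing 40 bytes turns both into 0x78787878.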
1.b. Changing the return address
Let's try to exploit the program so that it returns into lame() a second time instead of returning normally. We have to change the return address from 0x80484d0 to 0x80484cb, that is all. In memory, we have: 32 bytes buffer space | 4 bytes saved EBP | 4 bytes RET. Here is a simple program that writes the 4-byte return address repeatedly into a character buffer:
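#include <stdio.h>

int main()
{
    int i;
    char buf[45];    /* 44 bytes of addresses plus a null byte for puts() */

    /* store 11 copies of the address (32-bit x86, where long is 4 bytes) */
    for (i = 0; i <= 40; i += 4)
        *(long *) &buf[i] = 0x80484cb;
    buf[44] = '\0';
    puts(buf);
    return 0;
}

# ./program
test <- user input
test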
Here the program went through the function two times. If an overflow is present, the return address of a function can be changed to alter the program's flow of execution.
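Storing the address through a long pointer works because x86 is little-endian, so the lowest byte of 0x80484cb comes first in memory. The same byte sequence can be produced without relying on the size of long or on the host's byte order, as in this sketch; the name fill_with_address is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill out[0..len) with repeated little-endian copies of addr.
   len must be a multiple of 4. */
void fill_with_address(unsigned char *out, size_t len, uint32_t addr)
{
    size_t i;
    assert(out != NULL && len % 4 == 0);
    for (i = 0; i < len; i += 4) {
        out[i]     = (unsigned char)(addr & 0xff);          /* lowest byte first */
        out[i + 1] = (unsigned char)((addr >> 8) & 0xff);
        out[i + 2] = (unsigned char)((addr >> 16) & 0xff);
        out[i + 3] = (unsigned char)((addr >> 24) & 0xff);
    }
}
```

For 0x80484cb each group of four bytes is cb 84 04 08. None of them is zero or a newline, which matters because gets() stops reading at a newline.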
Prevention:
1. Always check the bounds of an array before writing to it. If this is not possible (e.g. when the input is coming from a CGI script), then use functions that limit the number of input characters. For instance, instead of using scanf(), use the fgets() function, which reads characters up to a specified limit.
Example:
#include <stdio.h>

int main()
{
    char buff[15] = {0};
    fgets(buff, sizeof(buff), stdin);    /* reads at most 14 characters */
}

2. Additionally, the standard string functions have versions that take an explicit size limit. Thus, instead of strcpy(), strcat() and sprintf(), use strncpy(), strncat() and snprintf() respectively.
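As an illustration of the size-limited variants, the sketch below formats a string with snprintf and uses its return value to detect truncation. The helper name format_greeting and the greeting text are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "Hello, <name>!" in dst without overflowing it.
   Returns 0 if the full text fit and 1 if snprintf had to truncate it. */
int format_greeting(char *dst, size_t dst_size, const char *name)
{
    int needed;
    assert(dst != NULL && name != NULL && dst_size > 0);
    needed = snprintf(dst, dst_size, "Hello, %s!", name);
    /* snprintf returns the length the complete output would have had */
    return needed < 0 || (size_t)needed >= dst_size;
}
```

Unlike sprintf, snprintf never writes more than dst_size bytes and always null-terminates the result when dst_size is greater than zero.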
3. Stack execute invalidation:
With a non-executable stack, any code that attempts to execute other code residing on the stack causes a segmentation violation. This solution is not easy to implement, however. Although it is possible in Linux, a few compilers use trampoline functions to implement taking the address of a nested function, and these rely on the system stack being executable. A trampoline is a small piece of code created at run time when the address of a nested function is taken. It normally resides on the stack, in the stack frame of the containing function, and thus requires the stack to be executable.
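On Linux, the permissions of a process's stack mapping appear in /proc/<pid>/maps on the line ending in [stack]. The sketch below parses one such line and reports whether the stack is mapped executable. The function name is an assumption made for illustration, and the code only inspects the text of the line it is given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: inspect one line in the format of /proc/<pid>/maps.
   Returns 1 if the line describes an executable stack mapping,
   0 if it describes a non-executable stack, and -1 for any other line. */
int stack_line_is_executable(const char *line)
{
    char perms[5];
    assert(line != NULL);
    if (strstr(line, "[stack]") == NULL)
        return -1;
    /* fields: address range, permissions (e.g. "rw-p"), offset, device, inode, path */
    if (sscanf(line, "%*s %4s", perms) != 1)
        return -1;
    return strchr(perms, 'x') != NULL;
}
```

A line with permissions rw-p indicates a non-executable stack, while rwxp indicates that code placed on the stack could still run.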
4. Dynamic run-time checks:
This method relies primarily on safety code being preloaded before an application is executed. This preloaded component can either provide safer versions of the standard unsafe functions, or it can ensure that return addresses are not overwritten. The libsafe library provides secure calls to these functions, even if the function is not available. It makes use of the fact that stack frames are linked together by frame pointers. When a buffer is passed as an argument to any of the unsafe functions, libsafe follows the frame pointers to the correct stack frame. It then checks the distance to the nearest return address, and when the function executes, it makes sure that address is not overwritten.
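The distance check can be sketched in a few lines. This is not libsafe's actual code. It is a simplified illustration under assumed names, in which the caller supplies the frame pointer of the frame that contains the buffer, and any copy that would reach that address is refused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes available between buf and the frame pointer above it, or 0 if buf
   does not lie below frame. On x86 the saved frame pointer sits at the
   address held in the frame pointer and the return address just above it. */
size_t bytes_below_frame(const void *buf, const void *frame)
{
    uintptr_t b = (uintptr_t)buf, f = (uintptr_t)frame;
    return b < f ? (size_t)(f - b) : 0;
}

/* Hypothetical checked strcpy: refuse any copy that would reach the frame. */
int checked_strcpy(char *dst, const char *src, const void *frame)
{
    size_t limit;
    assert(dst != NULL && src != NULL);
    limit = bytes_below_frame(dst, frame);
    if (strlen(src) + 1 > limit)
        return -1;            /* would overwrite saved EBP or return address */
    strcpy(dst, src);
    return 0;
}
```

With GCC, the frame argument for the current function could be obtained from the builtin __builtin_frame_address(0). A preloaded library would instead walk the chain of saved frame pointers to find the frame that holds the buffer.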
Bibliography:
- Sandeep Grover, Buffer Overflow Attacks and Their Countermeasures, Linux Journal
- Avoiding Buffer Overflows, Computerworld
Suhas A Desai is an undergraduate Computer Engineering student at Walchand CE Sangli, MS, India. He has written "Biometrics Security with Smart Card in Linux", which was published in ISA EXPO 2004, IEEE Real-Time and Embedded Technology and Applications Symposium, CA, USA, InTech Journal, TX, USA, and e-SMART 2005, France. His research areas include Linux security, networking and Linux kernel internals.