Information Security – Return To LIBC
Buffer overflows were among the very first vulnerabilities to be widely exploited, and when the technique was described in detail, back in 1996, information security wasn’t a popular field and it wasn’t clear how to defend against it. Solutions like reordering the variables showcase this well. Security researchers didn’t understand hackers, didn’t know how to think like them, and because of that were at a huge disadvantage.
Mitigations for Buffer Overflow Attacks
1. The first mitigation worth mentioning is called a canary, named after the small yellow birds that miners carried underground as an early warning of toxic gas. In our case, the canary is a value stored between the return address and the buffers. Before the function returns, it checks the integrity of that value to make sure it hasn’t changed.
2. If it hasn’t, all is right with the world. If it has, someone’s been overflowing buffers, and we should abort the program before they take over. This works because a typical buffer overflow, such as one caused by an unchecked strcpy, writes sequentially: it can’t skip bytes. So if an attacker overflowed the buffer far enough to overwrite the return address, she must have gone through the canary as well and corrupted it in the process, which gives her away.
3. This mitigation doesn’t stop buffer overflows, but it makes them much more difficult to exploit, because you can’t reach the return address without going through the canary, whose value is random and hard to guess. However, it’s not perfect. An information disclosure vulnerability, for example, might let an attacker peek into memory, learn the canary value the program uses, and craft a buffer overflow that writes that same value back over the canary, preserving its integrity and making everything seem alright.
4. Another useful mitigation is called data execution prevention, or DEP. What I like about it is that it treats more than just the symptom. It takes a step back to reassess the situation and asks a very reasonable question: why is the buffer even executable?
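The canary check from points 1 to 3 can be modeled without touching real memory. The sketch below is a toy: it lays out one hypothetical frame as a flat byte array, buffer first, then an 8-byte canary, then an 8-byte return address. Real frame layouts vary by compiler and architecture, and the names toy_frame, unchecked_copy and canary_intact are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy frame layout (an assumption; real layouts are compiler-specific):
 * [ buffer: 8 bytes | canary: 8 bytes | return address: 8 bytes ] */
enum {
    BUF_SIZE = 8,
    CANARY_SIZE = 8,
    RET_SIZE = 8,
    FRAME_SIZE = BUF_SIZE + CANARY_SIZE + RET_SIZE
};

typedef struct {
    unsigned char bytes[FRAME_SIZE];
} toy_frame;

/* Function prologue: clear the buffer, store the canary and return address. */
static void frame_init(toy_frame *f, uint64_t canary, uint64_t ret) {
    memset(f->bytes, 0, BUF_SIZE);
    memcpy(f->bytes + BUF_SIZE, &canary, CANARY_SIZE);
    memcpy(f->bytes + BUF_SIZE + CANARY_SIZE, &ret, RET_SIZE);
}

/* An unchecked copy, like strcpy into a fixed buffer: it writes sequentially
 * from the start of the buffer and cannot skip over the canary. */
static void unchecked_copy(toy_frame *f, const unsigned char *src, size_t len) {
    assert(len <= FRAME_SIZE); /* keeps the toy model itself in bounds */
    memcpy(f->bytes, src, len);
}

/* Function epilogue check: is the canary still the value we stored? */
static bool canary_intact(const toy_frame *f, uint64_t expected) {
    uint64_t current;
    memcpy(&current, f->bytes + BUF_SIZE, CANARY_SIZE);
    return current == expected;
}

/* Where the function would return to. */
static uint64_t frame_return_address(const toy_frame *f) {
    uint64_t ret;
    memcpy(&ret, f->bytes + BUF_SIZE + CANARY_SIZE, RET_SIZE);
    return ret;
}
```

Copying 8 bytes leaves canary_intact true; copying all 24 bytes to reach the return address makes it false. If the attacker has leaked the canary and places that exact value at offset BUF_SIZE in her input, the check passes even though the return address changed, which is the bypass described in point 3.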
Why buffers are executable
1. Every memory region has certain permissions that are enforced at the hardware level. It can be readable, in which case we may read it, and it can be writable, in which case we may write to it. What security experts did was introduce another permission: whether it’s executable, in which case we may run the contents of this memory as code.
2. This is often called the NX bit, for no-execute. With it, we can map the stack, the memory region where buffers and return addresses are stored, as non-executable. Now, even if an attacker successfully overflows a buffer, overwrites the return address, and diverts execution into that buffer, the processor refuses to execute it as code and raises a fault, and the operating system terminates the program.
3. More generally, this is known as the W^X (write XOR execute) principle: memory should be either writable or executable, but never both, because otherwise an attacker might be able to write arbitrary code there and divert execution to run it. The good news is that both canaries and DEP are widely used and on by default on modern systems. Mainstream compilers typically add canaries (also known as stack guards or stack protectors) and mark the stack as non-executable.
4. DEP doesn’t stop buffer overflows, but rather prevents code execution on the stack.
5. We can still overwrite functions’ return addresses, but having lost the ability to run code we wrote into the buffer, we need to figure out where to jump and what to run instead. Do you know what’s still mapped as executable? Code.
6. Every process has code somewhere in memory that defines its execution flow. In fact, this is where return addresses point in the first place. If function f calls function g, which calls function h, in which we overflow the stack, we can change h’s return address so that instead of returning to g, it skips g and returns straight into f.
7. But so what? Every program’s code is unique, and by definition does what the program is supposed to be doing. So, mixing and matching it to do something malicious is not that easy, and definitely not generic across programs.
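8. Luckily, or not so luckily, most programs use common external libraries, first among them the C standard library, or libc. This library defines such basic functions as memcmp, memcpy and printf, the building blocks much of the rest of the code is made from. And when it’s linked dynamically, as it usually is, the whole library is mapped into the process’s memory, including functions the program itself never calls.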
9. Do you see where this is going? If we can overwrite return addresses to point elsewhere in the code, and nearly every process has these building blocks in memory, then we can overwrite the return address (in fact, multiple subsequent return addresses) to jump all over the place, in such a way and in such an order as to patch together a sort of Frankenstein’s code, which can be arbitrarily complex.
10. Another way to think about it is, having lost the ability to write, we can still compose a letter by taking a newspaper, cutting out words from it, and rearranging them to convey a message of our own, like in a ransom note.
11. In this case, we’re not cutting anything out, but rather overwriting a sequence of return addresses to point to such places that, once the function returns, it starts a chain reaction that effectively does our bidding. And the really frustrating part is that DEP can’t do anything about it, because every piece of the chain is legitimate executable code.
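The W^X principle from points 1 to 4 can be seen with real POSIX calls. This sketch assumes a Linux or BSD-like system providing mmap and mprotect. It loads some bytes into a fresh page the way a W^X-respecting loader or JIT compiler might: the page is writable while the bytes are copied in, then flipped to read+execute, so it is never writable and executable at the same time. The helper name load_wx is hypothetical, and the sketch never jumps into the page.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc even under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes into a fresh page while obeying W^X.
 * The page is writable (not executable) while filled, then flipped to
 * executable (no longer writable). It is never both at once.
 * Returns the page, or NULL on failure; *page_size receives its size. */
static void *load_wx(const unsigned char *data, size_t len, size_t *page_size) {
    assert(page_size != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || len > (size_t)ps)
        return NULL;
    size_t size = (size_t)ps;

    /* Phase 1: read + write. Copying the bytes in is allowed. */
    void *page = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, data, len);

    /* Phase 2: read + execute. Any later write to the page now faults. */
    if (mprotect(page, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, size);
        return NULL;
    }
    *page_size = size;
    return page;
}
```

Some hardened systems (OpenBSD by default, for example) refuse a mapping that is writable and executable at once, which is why the two-phase flip is the usual pattern for code that genuinely needs to generate instructions at run time.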
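The chain reaction from points 5 to 11 can be simulated safely with a toy machine. In the sketch below, the only code that exists is a fixed table of three tiny routines (standing in for libc’s building blocks), and the attacker controls only the overwritten “stack”, a list of indices into that table. Each simulated return pops the next entry and runs that routine. The routines and run_return_chain are made up for illustration; real chains use machine addresses, not indices.

```c
#include <assert.h>
#include <stddef.h>

/* Toy machine state: a single accumulator. */
typedef struct {
    long acc;
} toy_state;

/* "Existing code": routines already present in the program. The attacker
 * cannot add to this table, only choose which entries run and in what order. */
static void op_zero(toy_state *s)   { s->acc = 0; }
static void op_inc(toy_state *s)    { s->acc += 1; }
static void op_double(toy_state *s) { s->acc *= 2; }

typedef void (*routine)(toy_state *);
static const routine code_table[] = { op_zero, op_inc, op_double };
enum { CODE_COUNT = sizeof code_table / sizeof code_table[0] };

/* Each entry of the overwritten "stack" is an index into code_table.
 * Every simulated return pops the next entry and runs that routine;
 * an out-of-range entry ends the chain, standing in for a crash. */
static long run_return_chain(const size_t *stack, size_t depth) {
    assert(stack != NULL || depth == 0);
    toy_state s = { 0 };
    for (size_t sp = 0; sp < depth; sp++) {
        if (stack[sp] >= CODE_COUNT)
            break;
        code_table[stack[sp]](&s);
    }
    return s.acc;
}
```

For example, the chain {0, 1, 2, 2, 1} runs zero, increment, double, double, increment and leaves 5 in the accumulator: a result none of the routines computes on its own, assembled entirely from code that was already there, like the ransom note from point 10.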
1. We overwrite the function’s return address with the address of the standard C function system. This function takes one argument, a command string, and executes that command. In the classic 32-bit x86 version of the attack, that argument is passed on the stack, so we can further overwrite memory so that when we return into system and it looks for its argument, it finds a pointer to a string we provided, namely “sh” (or “cmd” on Windows), a command that launches a shell. On x86-64 the first argument is passed in a register instead, so the attacker first chains in a short piece of existing code that loads it.
2. And as we know, once we have a shell, it’s game over.
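To see why system is such a convenient target, here is an ordinary, legitimate use of it. It takes a single command string, hands it to the command interpreter (on POSIX systems, the equivalent of sh -c), and returns the termination status. The wrapper run_command_status is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run one command string through the command
 * interpreter (on POSIX, equivalent to `sh -c command`) and return its
 * exit code, or -1 if it could not be run or did not exit normally. */
static int run_command_status(const char *command) {
    /* system(NULL) has a different meaning (is a shell available?), so we
     * require a real command string here. */
    assert(command != NULL);
    int status = system(command);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

run_command_status("exit 3") returns 3. In the attack, that same single string argument is all that needs to be supplied, which is why pointing it at “sh” is enough.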
Note: To mitigate this attack, known as “return to libc”, operating systems incorporated Address Space Layout Randomization, or ASLR. This is a fancy way of saying that every time a process is launched, we should load its code, and most importantly any external libraries it uses, at different, randomized addresses. This way, when a return-to-libc attack tries to jump to a libc function, it will have a hard time landing where it intended. Of course, this isn’t enough.
Note: Nothing is ever enough. An unrelated information disclosure vulnerability might leak some address, which we might be able to use to infer where exactly libc is loaded, and calibrate our attack accordingly. This works because the library moves as a single block, so the distance between any two of its functions stays fixed. Still, ASLR hinders the attack considerably and is currently enabled by default in all major operating systems.
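The reasoning in the two notes can be checked with a little arithmetic. Under ASLR the library’s load base is random, but everything inside it moves together, so each absolute address is the base plus a fixed offset. The sketch below is a toy model: the offsets OFFSET_FUNC_A and OFFSET_FUNC_B are made-up numbers, rand() stands in for the kernel’s randomness, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { TOY_PAGE = 0x1000 };

/* Made-up offsets of two functions inside one hypothetical library build.
 * Real offsets differ from build to build. */
static const uintptr_t OFFSET_FUNC_A = 0x2a10; /* the symbol whose address leaks */
static const uintptr_t OFFSET_FUNC_B = 0x7c40; /* the symbol the attacker wants */

/* Toy ASLR: choose a random, page-aligned load base for the library.
 * A real loader gets its randomness from the kernel; rand() stands in here. */
static uintptr_t toy_random_base(void) {
    uintptr_t pages = 0x10000 + (uintptr_t)(rand() % 0x10000);
    return pages * TOY_PAGE;
}

/* The whole library moves as one block: address = base + fixed offset.
 * So one leaked address gives away the base, and the base gives away
 * every other symbol in the library. */
static uintptr_t toy_infer(uintptr_t leaked, uintptr_t leaked_offset,
                           uintptr_t wanted_offset) {
    assert(leaked >= leaked_offset);
    uintptr_t base = leaked - leaked_offset;
    return base + wanted_offset;
}
```

Whatever base toy_random_base picks, feeding toy_infer the leaked address of function A yields the exact address of function B. All the randomness lives in the base, so a single leak removes it.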