SECURE PROGRAMMING
A.A. 2018/2019
TERMINOLOGY
SECURITY FLAWS
Software engineering has long been concerned with the elimination of software defects.
A software defect is the encoding of a human error into the software, including omissions.
Software defects can originate at any point in the
software development life cycle. For example, a defect in a deployed product can originate from a misstated or
misrepresented requirement.
Security Flaw
✓ A software defect that poses a potential security risk.
SECURITY FLAWS
Not all software defects pose a security risk. Those that do are security flaws. If we accept that a security flaw is a software defect, then we must also accept that by
eliminating all software defects, we can eliminate all security flaws.
This premise underlies the relationship between software engineering and secure programming.
An increase in quality, as might be measured by defects per thousand lines of code, would likely also result in an increase in security.
Many tools, techniques, and processes that are designed to eliminate software defects also can be used to eliminate security flaws.
DIFFICULT/EXPENSIVE TO REMOVE
However, many security flaws go undetected because traditional software development processes seldom assume the existence of attackers.
For example, testing will normally validate that an
application behaves correctly for a reasonable range of user inputs.
Unfortunately, attackers are seldom reasonable and will spend an inordinate amount of time devising inputs that will break a system.
VULNERABILITY
Not all security flaws lead to vulnerabilities. However, a security flaw can cause a program to be vulnerable to attack when the program’s input data (for example,
command-line parameters) crosses a security boundary en route to the program.
This may occur when a program containing a security flaw is installed with execution privileges greater than those of the person running the program or is used by a network service where the program’s input data arrives via the network connection.
Vulnerability
✓ A set of conditions that allows an attacker to violate an explicit or implicit security policy.
POLICY
Taken verbatim from RFC 2828, the Internet Security Glossary (https://www.ietf.org/standards/rfcs/)
Security Policy
✓ A set of rules and practices that specify or regulate how a system or organization provides security services to protect sensitive and critical system resources.
Security policies that are documented, well known, and visibly enforced can help establish expected user
behaviour.
FLAWS ARE NOT VULNERABILITIES
A security flaw can also exist without all the
preconditions required to create a vulnerability.
For example, a program can contain a defect that allows a user to run arbitrary code inheriting the permissions of that program.
✓ This is not a vulnerability if the program has no special permissions and can be accessed only by local users,
because there is no possibility that a security policy will be violated.
VULNERABILITIES
Vulnerabilities can exist without a security flaw.
Because security is a quality attribute that must be traded off with other quality attributes such as
performance and usability, software designers may
intentionally choose to leave their product vulnerable to some form of exploitation.
Making an intentional decision not to eliminate a
vulnerability does not mean the software is secure, only that the software designer has accepted the risk on
behalf of the software consumer.
EXPLOITS
Exploit
✓ A technique that takes advantage of a security vulnerability to violate an explicit or implicit security policy.
Vulnerabilities in software are subject to exploitation.
Exploits can take many forms, including worms, viruses, and trojans.
Understanding how programs can be exploited is a valuable tool that can be used to develop secure software.
✓ However, disseminating exploit code against known vulnerabilities can be damaging to everyone.
ZERO-DAY EXPLOIT
A zero-day (also known as 0-day) vulnerability is a
computer-software vulnerability that is unknown to those who would be interested in mitigating the vulnerability (including the vendor of the target software).
An exploit directed at a zero-day is called a zero-day exploit, or zero-day attack.
Developed by highly skilled groups (NSA, central governments)
✓ EternalBlue https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-0144
On the market
✓ https://0day.today
MITIGATIONS
Mitigation
✓ Methods, techniques, processes, tools, or runtime libraries that can prevent or limit exploits against vulnerabilities.
A mitigation (or countermeasure) is a solution for a software flaw or a workaround that can be applied to prevent exploitation of a vulnerability.
✓ At the source code level, mitigations can be as simple as replacing an unbounded string copy operation with a bounded one.
✓ At a system or network level, a mitigation might involve turning off a port or filtering traffic to prevent an attacker from accessing a vulnerability.
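As a rough illustration of the source-level mitigation mentioned above, the sketch below replaces an unbounded copy with a bounded one. The function name copy_bounded is hypothetical and chosen only for this example; snprintf() is the standard C function.

```c
#include <stdbool.h>
#include <stdio.h>

/* Copy src into dst (capacity dst_size) without ever writing past dst.
 * Returns true if the whole string fit, false if it was truncated.
 * copy_bounded is a hypothetical helper, not part of any library. */
bool copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return false;
    }
    /* snprintf writes at most dst_size bytes, including the final '\0'. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Reporting truncation to the caller matters: a silently shortened string can itself become a defect if the program goes on to use it as if it were complete.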
MITIGATIONS
The preferred way to eliminate security flaws is to find and correct the actual defect.
However, in some cases it can be more cost-effective to eliminate the security flaw by preventing malicious inputs from reaching the defect.
Vulnerabilities can also be addressed operationally by isolating the vulnerability.
Of course, operationally addressing vulnerabilities
significantly increases the cost of mitigation because the cost is pushed out from the developer to system
administrators and end users.
TO SUM UP
A LINK TO CIA
A resource (either physical or logical) may have one or more vulnerabilities that can be exploited by a threat agent in a threat action.
The result can potentially compromise the
confidentiality, integrity or availability of resources.
TAXONOMY
AN EXAMPLE OF CLASSIFICATION
paper
A BIT OF C HISTORY
WHY C/C++
In this course, the decision to use C and C++ was based on the popularity of these languages, the enormous
legacy code base, and the amount of new code being developed in these languages.
WHY C
The C programming language is intended to be a lightweight language with a small footprint.
This characteristic of C leads to vulnerabilities when programmers fail to implement required logic because they assume it is handled by C (but it is not).
This problem is magnified when programmers are
familiar with superficially similar languages such as Java, Pascal, or Ada, leading them to believe that C protects the programmer better than it actually does.
These false assumptions have led to programmers failing to prevent writing beyond the boundaries of an array, failing to catch integer overflows and truncations, and calling functions with the wrong number of arguments.
WHY C
The C Standard [ISO/IEC 2011] defines several kinds of behaviors:
Locale-specific behavior: behavior that depends on local conventions of nationality, culture, and language that each implementation
documents.
✓ An example of locale-specific behavior is whether the islower() function returns true for characters other than the 26 lowercase Latin letters.
Unspecified behavior: use of an unspecified value, or other behavior where the C Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance.
✓ An example of unspecified behaviour is the order in which the arguments to a function are evaluated.
Implementation-defined behaviour: unspecified behaviour where each implementation documents how the choice is made.
✓ An example of implementation-defined behaviour is the propagation of the high-order bit when a signed integer is shifted right.
Undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International
Standard imposes no requirements.
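Signed integer overflow is one common case of undefined behavior in C. The sketch below shows one way to avoid it, by testing the operands before adding; the name add_checked is an assumption made for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Store a + b in *sum and return true, or return false if the addition
 * would overflow. The check runs before the addition, so the undefined
 * operation is never executed. add_checked is a hypothetical name. */
bool add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *sum = a + b;
    return true;
}
```

Checking the result after the addition would be too late: once the overflow has happened, the Standard places no requirement on what the program does.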
LET’S START WITH C
C is a high-level general-purpose, procedural
programming language. Dennis Ritchie first devised C in the 1970s at AT&T Bell Laboratories in Murray Hill, New Jersey, for the purpose of implementing the Unix
operating system and utilities with the greatest possible degree of independence from specific hardware
platforms. The key characteristics of the C language are the qualities that made it suitable for that purpose:
✓ Source code portability
✓ The ability to operate “close to the machine”
✓ Efficiency
As a result, the developers of Unix were able to write most of the operating system in C, leaving only a
C
C’s ancestors are the typeless programming languages BCPL (the Basic Combined Programming Language), developed by Martin Richards; and B, a descendant of BCPL, developed by Ken Thompson.
A new feature of C was its variety of data types:
characters, numeric types, arrays, structures, and so on.
Brian Kernighan and Dennis Ritchie published an official description of the C programming language in 1978.
C has few hardware-dependent elements. For example, the C language proper has no statements for file access, input/output, or dynamic memory management.
Instead, the extensive standard C library provides the functions for all of these purposes.
VIRTUES OF C
Fast (it's a compiled language and so is close to the machine hardware)
Portable (you can compile your program to run on just about any hardware platform out there)
The language is small (unlike C++ for example)
Mature (a long history and lots of resources and experience available)
There are many tools for making programming easier (e.g. IDEs like Xcode)
You have direct access to memory
CHALLENGES OF USING C
The language is small (but there are many APIs)
It's easy to get into trouble, e.g. with direct memory access & pointers
You must manage memory yourself
Sometimes code is more verbose than in high-level scripting languages like Python, R, etc
STANDARDS
K & R C (Brian Kernighan and Dennis Ritchie)
✓ 1972 First created by Dennis Ritchie
✓ 1978 The C Programming Language described
ANSI C
✓ 1989 ANSI X3.159-1989 aka C89 - First standardized version
ISO C
✓ 1990 ISO/IEC 9899:1990 aka C90 - Equivalent to C89
✓ 1995 Amendment 1 aka C95
✓ 1999 ISO/IEC 9899:1999 aka C99
✓ 2011 ISO/IEC 9899:2011 aka C11
DENNIS
HISTORY OF C++
In the early 1970s, Dennis Ritchie introduced “C” at Bell Labs.
✓ http://cm.bell-labs.co/who/dmr/chist.html
As a Bell Labs employee, Bjarne Stroustrup was exposed to and appreciated the strengths of C, but also appreciated the power and convenience of higher-level languages like Simula, which had language support for object-oriented programming (OOP).
✓ His language was originally called C with Classes; in 1983 it became C++.
In 1985, the first edition of The C++ Programming Language was released.
HISTORY
Adding support for OOP turned out to be the right feature at the right time for the '90s. At a time when GUI
programming was all the rage, OOP was the right paradigm, and C++ was the right implementation.
At over 700 pages, the C++ standard demonstrated
something about C++ that some critics had said about it for a while: C++ is a complicated beast.
The first decade of the 21st century saw desktop PCs that were powerful enough that it didn’t seem worthwhile to deal with all this complexity when there were
alternatives that offered OOP with less complexity.
✓ Java
STROUSTRUP
CHARACTERISTICS
The most important feature of C++ is that it is both low- and high-level.
Programming in C++ requires a discipline and attention to detail that may not be required of kinder, gentler languages that are not as focused on performance.
✓ No garbage collector!
STANDARDS
1998 ISO/IEC 14882:1998 C++98
2003 ISO/IEC 14882:2003 C++03
2011 ISO/IEC 14882:2011 C++11
2014 ISO/IEC 14882:2014 C++14
2017 ISO/IEC 14882:2017 C++17
2020 ???? C++20
LET’S START!
CERT C CODING STANDARD
https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard
The SEI CERT C Coding Standard is a software coding standard for the C programming language, developed by the CERT Coordination Center to improve the safety, reliability, and security of software systems.
✓ CERT division at Carnegie Mellon
✓ A non-profit United States federally funded research and development center
ARRAYS AND STRINGS
ARRAYS ARE COMPLICATED
When passed as a parameter, an array name is a pointer
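sizeof(int *) == 8 on typical 64-bit platforms (the value is implementation-defined), whatever the length of the original array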
The CERT C Secure Coding Standard includes “ARR01-C. Do not apply the sizeof operator to a pointer when taking the size of an array,” which warns
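The difference between an array and the pointer it decays to can be sketched as follows. All names here (ARRAY_LEN, size_seen_by_callee, sum_values) are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Number of elements; valid only where `a` is a real array, not a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Inside the callee the parameter is a pointer, so sizeof reports the
 * pointer size, not the size of the caller's array. */
size_t size_seen_by_callee(const int *values)
{
    return sizeof(values);
}

/* The safer pattern: the caller passes the element count explicitly. */
long sum_values(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

A caller that owns int data[10] can compute ARRAY_LEN(data) and pass it to sum_values(); the callee has no reliable way to recover that length on its own.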
STRINGS
Strings are a fundamental concept in software
engineering, but they are not a built-in type in C or C++.
The standard C library supports strings of type char and wide strings of type wchar_t.
IMPROPERLY BOUNDED COPIES
The gets() function has been deprecated in C99 and eliminated from C11.
Reading from a source to a fixed-length array: this program has undefined behaviour if more than eight characters (including the null terminator) are entered at the prompt. The main problem with the gets() function is that it provides no way to specify a limit on the number of characters to read.
READING FROM STDIN
Reading data from unbounded sources (such as stdin) creates an interesting problem for a programmer. Because it is not possible to know beforehand how many characters a user will supply, it is not possible to preallocate an array of sufficient length.
A common solution is to statically allocate an array that is thought to be much larger than needed. In this example, the programmer expects the user to enter only one character and consequently assumes that the eight-character array length will not be exceeded.
✓ With friendly users, this approach works well. But with malicious users, a fixed-length character array can be easily exceeded, resulting in undefined behaviour.
This approach is prohibited by The CERT C Secure Coding Standard, “STR35-C. Do not copy data from an unbounded source to a fixed-length array.”
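One possible alternative to a fixed-length array is a buffer that grows with the input. The sketch below is only an illustration under that assumption; read_line_dynamic is a hypothetical name, and the starting capacity of 16 is arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from `in` into a heap buffer that grows as
 * needed. Returns NULL on allocation failure or if nothing could be read.
 * The caller must free() the result. */
char *read_line_dynamic(FILE *in)
{
    size_t capacity = 16, length = 0;
    char *line = malloc(capacity);
    if (line == NULL) {
        return NULL;
    }
    int ch;
    while ((ch = fgetc(in)) != EOF && ch != '\n') {
        if (length + 1 == capacity) {          /* keep room for '\0' */
            char *bigger = realloc(line, capacity * 2);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
            capacity *= 2;
        }
        line[length++] = (char)ch;
    }
    if (ch == EOF && length == 0) {            /* end of input or error */
        free(line);
        return NULL;
    }
    line[length] = '\0';
    return line;
}
```

A production version would probably also cap the maximum line length, since an attacker who controls the input could otherwise make the program allocate memory without limit.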
FROM PROGRAM PARAMETERS
Vulnerabilities can occur when inadequate space is allocated to copy a program input such as a command-line argument.
Although argv[0] contains the program name by convention, an attacker can control the contents of argv[0] to cause a vulnerability in the following program by providing a string with more than 128 bytes.
Problems with C++ as well
HOW TO FIX IT
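One way to fix the problem, sketched here under the assumption that a heap copy is acceptable, is to size the destination from the actual length of the argument instead of from a guess. The helper name copy_argument is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of `arg` sized from its actual length, or NULL if
 * arg is NULL or memory cannot be allocated. Caller frees the result. */
char *copy_argument(const char *arg)
{
    if (arg == NULL) {
        return NULL;
    }
    size_t size = strlen(arg) + 1;      /* +1 for the null terminator */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, arg, size);
    }
    return copy;
}
```

A program would call copy_argument(argv[0]) and check the result for NULL before using it.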
NULL TERMINATING STRINGS
The result is that the strcpy() to c may write well beyond the bounds of the array because the string stored in a[] is not correctly null-terminated.
The CERT C Secure Coding Standard includes “STR32-C. Null-terminate byte strings as required.”
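Before handing a character array to a function such as strcpy(), a program can verify that a terminator really exists inside the array. The following is a minimal sketch; is_null_terminated is a hypothetical helper, while memchr() is the standard library function.

```c
#include <stdbool.h>
#include <string.h>

/* True if buf contains a '\0' somewhere in its first `size` bytes, i.e. it
 * can be treated as a string without reading past the array. */
bool is_null_terminated(const char *buf, size_t size)
{
    return buf != NULL && memchr(buf, '\0', size) != NULL;
}
```

The check only helps when the caller knows the true size of the array, which is one more reason to pass sizes alongside pointers.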
ERRORS
Most of the functions defined in the standard string-handling library <string.h> are susceptible to errors. Visual Studio has deprecated most of them.
However, errors are still possible without them, since strings are arrays of chars…
STRING VULNERABILITIES
AND EXPLOITS
TAINTED VALUES
Previous sections described common errors in manipulating strings in C or C++.
✓ These errors become dangerous when code operates on untrusted data from external sources such as command-line arguments, environment variables, console input, text files, and network connections.
It is safer to view all external data as untrusted.
In software security analysis, a value is said to be
tainted if it comes from an untrusted source (outside of the program’s control) and has not been sanitized to
ensure that it conforms to any constraints on its value
that consumers of the value require, for example, that all strings are null-terminated.
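Sanitizing a tainted value means checking it against the constraints its consumers rely on. The sketch below assumes a hypothetical consumer that wants a short alphanumeric name; the limit MAX_NAME_LEN and the function is_valid_name are invented for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NAME_LEN 32   /* hypothetical limit chosen for this sketch */

/* Accept a tainted buffer only if it holds a non-empty, null-terminated
 * name of at most MAX_NAME_LEN alphanumeric characters. */
bool is_valid_name(const char *input, size_t input_size)
{
    if (input == NULL) {
        return false;
    }
    for (size_t i = 0; i < input_size; i++) {
        if (input[i] == '\0') {
            return i > 0;                     /* terminated and non-empty */
        }
        if (i >= MAX_NAME_LEN || !isalnum((unsigned char)input[i])) {
            return false;
        }
    }
    return false;                             /* no terminator in bounds */
}
```

The value should be treated as untainted only after such a check succeeds; which constraints to enforce depends entirely on what the consuming code assumes.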
PASSWORD EXAMPLE
SECURITY FLAWS
The security flaw in the IsPasswordOK program that
allows an attacker to gain unauthorized access is caused by the call to gets().
The condition that allows an out-of-bounds write to occur is referred to in software security as a buffer overflow.
ONE MORE FLAW
The IsPasswordOK program has another problem: it does not check the return status of gets().
This is a violation of “FIO04-C. Detect and handle input and output errors.”
When gets() fails, the contents of the Password buffer are indeterminate, and the subsequent strcmp() call has undefined behaviour.
In a real program, the buffer might even contain the good password previously entered by another user.
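Both flaws (the unbounded read and the unchecked return status) can be avoided with fgets(). The sketch below is not the original IsPasswordOK program; the function name, the 64-byte buffer, and passing the expected password as a parameter are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read a password from `in` and compare it with `expected`.
 * Any read error, end of input, or over-long line counts as a failure. */
bool is_password_ok(FILE *in, const char *expected)
{
    char password[64];                   /* room for text, '\n' and '\0' */

    if (fgets(password, sizeof password, in) == NULL) {
        return false;                    /* buffer contents unusable */
    }
    size_t length = strcspn(password, "\n");
    if (password[length] != '\n' && !feof(in)) {
        return false;                    /* line filled the buffer: reject */
    }
    password[length] = '\0';
    return strcmp(password, expected) == 0;
}
```

Because the buffer is only inspected after fgets() reports success, stale contents from an earlier call can never be mistaken for fresh input.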
BUFFER OVERFLOWS
Buffer overflows occur when data is written outside of the boundaries of the memory allocated to a particular data structure. C and C++ are susceptible to buffer overflows because these languages:
✓ Define strings as null-terminated arrays of characters.
✓ Do not perform implicit bounds checking.
✓ Provide standard library calls for strings that do not enforce bounds checking.
Depending on the location of the memory and the size of the overflow, a buffer overflow may go undetected but
can corrupt data, cause erratic behaviour, or terminate the program abnormally.
BUFFER OVERFLOWS
Not all buffer overflows lead to software vulnerabilities.
However, a buffer overflow can lead to a vulnerability if an attacker can manipulate user-controlled inputs to exploit the security flaw.
There are, for example, well-known techniques for
overwriting frames in the stack to execute arbitrary code.
Buffer overflows can also be exploited in heap or static memory areas by overwriting data structures in adjacent memory.
To help, static (at program time) and dynamic analysis
MEMORY
PROCESS MEMORY ORGANIZATION
A process is a program instance that is loaded into memory and managed by the operating system.
MEMORY
The code or text segment includes instructions and read-only data. It can be marked read-only so that modifying memory in the code section results in faults.
The data segment contains initialized data, uninitialized data, static variables, and global variables.
The heap is used for dynamically allocating process memory.
The stack is a last-in, first-out (LIFO) data structure used to support process execution.
The exact organization of process memory depends on the operating system, compiler, linker, and loader—in other
words, on the implementation of the programming language
STACK
int fun(int p1, int p2, int p3) {
    int res = 0;
    res = p1 + p2 + p3;
    return res;
}

int main() {
    int a = 4, b = 5, c = 7;
    a = fun(a, b, c);
}
[Figure: stack contents during the call to fun(): b, c, p1, p2, p3, return address, res]
EXAMPLE
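#include <stdio.h>

void f1();

void f2() {
    int c;
    f1();               /* calls back into f1: the recursion never ends */
    puts("bye f2");
}

void f1() {
    int b = 0;
    f2();
    puts("bye f1");
}

int main() {
    int a = 0;
    f1();
}

[Figure: the stack fills with alternating frames: main frame, f1 frame, f2 frame, f1 frame, f2 frame, ...]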
MacBook-Francesco:ProgrammI francescosantini$ ./test
Segmentation fault: 11
STACK
To return control to the proper location, the sequence of return addresses must be stored. A stack is well suited for maintaining this information because it is a dynamic data structure that can support any level of nesting within memory constraints.
The address of the current frame is stored in the frame or base pointer register. On x86-32, the extended base pointer (ebp) register is used for this purpose.
The frame pointer is used as a fixed point of reference within the stack.
When a subroutine is called, the frame pointer for the calling routine is also pushed onto the stack so that it can be restored when the subroutine returns.
DISASSEMBLY IN INTEL
INSTRUCTION POINTER
The instruction pointer (eip) points to the next instruction to be executed. When executing sequential instructions, it is automatically incremented by the size of each
instruction, so that the CPU will then execute the next instruction in the sequence.
Normally, the eip cannot be modified directly; instead, it must be modified indirectly by instructions such as jump, call, and return.
Extended stack pointer (esp) is the current pointer to the stack. The stack pointer points to the top of the stack.
✓ For many popular architectures, including x86, SPARC, and MIPS processors, the stack grows toward lower memory.
DISASSEMBLY OF FOO (PROLOGUE)
STACK FRAME FOR FOO
DISASSEMBLING FOO (EPILOGUE)
RETURN VALUES
If there is a return value, it is stored in eax by the called function before returning.
The caller function knows it can be found in eax and can use it.
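int MyFunction2(int a, int b) {
    return a + b;
}

x = MyFunction2(2, 3);

_MyFunction2:
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]     ; first argument (a)
    mov edx, [ebp + 12]    ; second argument (b)
    add eax, edx           ; the result is left in eax
    pop ebp
    ret

    push 3                 ; arguments are pushed right to left
    push 2
    call _MyFunction2
    ; x is now in eax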
STACK SMASHING
WHAT IS IT?
Stack smashing is when an attacker purposely overflows a buffer on the stack to get access to forbidden regions of computer memory.
Stack smashing occurs when a buffer overflow
overwrites data in the memory allocated to the execution stack.
It can have serious consequences for the reliability and security of a program.
Buffer overflows in the stack segment may allow an attacker to modify the values of automatic variables or execute arbitrary code.
WHAT CAN HAPPEN?
Overwriting automatic variables can result in a loss of data integrity or, in some cases, a security breach (for example, if a variable containing a user ID or password is overwritten).
More often, a buffer overflow in the stack segment can lead to an attacker executing arbitrary code by
overwriting a pointer to an address to which control is (eventually) transferred.
A common example is overwriting the return address, which is located on the stack.
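The effect of overwriting an adjacent automatic variable can be modelled safely with a struct that stands in for a stack frame. This is only a model: real frame layouts depend on the compiler, and the names frame_model and unchecked_fill are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A stand-in for a stack frame: a buffer followed by an automatic variable. */
struct frame_model {
    char password[12];
    int  authorized;
};

/* Deliberately unchecked copy into the start of the model, as gets() or
 * strcpy() would do. The caller keeps length <= sizeof(struct frame_model),
 * so the sketch itself never writes outside the struct. */
void unchecked_fill(struct frame_model *frame, const char *input, size_t length)
{
    memcpy(frame, input, length);
}
```

Filling the model with more than 12 bytes changes authorized even though no statement ever assigns to it, which is the loss of data integrity described above.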
EXAMPLE
The IsPasswordOK program is vulnerable to a stack-smashing attack.
The IsPasswordOK program has a security flaw because the Password array is improperly bounded and can hold only an 11-character password plus a trailing null byte.
This flaw can easily be demonstrated by entering a 20-character password of “1234567890123456W▸*!” that causes the program to jump in an unexpected way.
BACK TO THE EXAMPLE
EXAMPLE
It crashes!
WHAT HAPPENS
Each of these characters has a corresponding hexadecimal value: W = 0x57, ▸ = 0x10, * = 0x2A, and ! = 0x21.
In memory, this sequence of four characters corresponds to a 4-byte address that overwrites the return address on the stack, so instead of returning to the instruction
immediately following the call in main():
✓ The IsPasswordOK() function returns control to the “Access granted” branch, bypassing the password validation logic and allowing unauthorized access to the system.
GUESS THE RIGHT ADDRESS
void * __builtin_return_address (unsigned int level)
A value of 0 yields the return address of the current function, a value of 1 yields the return address of the caller of the current function, and so forth.
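A minimal sketch of how this GCC/Clang builtin might be used to inspect a return address follows. The wrapper name caller_return_address is hypothetical, and the code assumes a compiler that supports the builtin and the noinline attribute.

```c
/* GCC and Clang builtin; noinline keeps a real call frame to inspect.
 * Returns the address the current call will return to. */
__attribute__((noinline))
void *caller_return_address(void)
{
    return __builtin_return_address(0);
}
```

Printing the result with printf("%p", ...) from inside a vulnerable function shows which address an overflow would have to replace; the exact value varies between builds and runs.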
ARC INJECTION
The arc injection technique (sometimes called return-into-libc) involves transferring control to code that already exists in process memory.
These exploits are called arc injection because they
insert a new arc (control-flow transfer) into the program’s control-flow graph as opposed to injecting new code.
More sophisticated attacks are possible using this
technique, including installing the address of an existing function (such as system() or exec(), which can be used to execute commands and other programs already on the local system) on the stack along with the appropriate arguments.
ARC INJECTION
An attacker may prefer arc injection over code injection for several reasons.
Because arc injection uses code already in memory on the target system, the attacker merely needs to provide the addresses of the functions and arguments for a
successful attack.
✓ The footprint for this type of attack can be significantly smaller and may be used to exploit vulnerabilities that cannot be exploited by the code injection technique.
Because the exploit consists entirely of existing code, it cannot be prevented by memory-based protection schemes such as making memory segments (such as the stack) nonexecutable.
CODE INJECTION
(SHELL CODE)
INJECTION AND SHELLCODE
When the return address is overwritten because of a software flaw, it seldom points to valid instructions. Consequently,
transferring control to this address typically causes a trap and results in a corrupted stack.
But it is possible for an attacker to create a specially crafted string that contains a pointer to some malicious code, which the attacker also provides.
✓ When the function invocation whose return address has been overwritten returns, control is transferred to this code. The malicious code runs with the permissions that the vulnerable program has when the subroutine returns.
✓ This is why programs running with root or other elevated privileges are normally targeted. The malicious code can perform any function that can otherwise be programmed but often simply opens a remote shell on the compromised machine.
For this reason, the injected malicious code is referred to as shellcode.
HOW IT HAS TO BE
The pièce de résistance of any good exploit is the
malicious argument. A malicious argument must have several characteristics:
✓ It must be accepted by the vulnerable program as legitimate input.
✓ The argument, along with other controllable inputs, must result in execution of the vulnerable code path.
✓ The argument must not cause the program to terminate abnormally before control is passed to the shellcode.
BACK TO THE EXAMPLE
INJECTION
% ./BufferOverflow < exploit.bin (exploit.bin is the “payload”)
HOW IT WORKS
The lea instruction used in this example stands for “load effective address.” The lea instruction computes the
effective address of the second operand (the source operand) and stores it in the first operand (destination operand).
The source operand is a memory address (offset part) specified with one of the processor’s addressing modes;
the destination operand is a general purpose register.
HOW IT WORKS
The exploit code works as follows:
✓ 1. The first mov instruction is used to assign 0xB to the %eax register. 0xB is the number of the execve() system call in Linux.
• int execve(const char *filename, char *const argv[], char *const envp[]);
✓ 2. The three arguments for the execve() function call are set up in the subsequent three instructions (the two lea instructions and the mov instruction). The data for these arguments is located on the stack, just before the exploit code.
✓ 3. The int $0x80 instruction (the system-call interrupt on 32-bit Linux) is used to invoke execve(), which results in the execution of the Linux calendar program.
RESULT
RETURN-ORIENTED
PROGRAMMING
OVERCOMING DEFENCES
Code that already exists in the process image.
✓ The standard C library, libc, is loaded in nearly every Unix program; it contains routines useful for an attacker.
✓ But in principle any available code, either from the program’s text segment or from a library it links to, could be used.
By contrast, the building blocks for our attack are short code sequences, each just two or three instructions long.
Some are present in libc as a result of the code-generation choices of the compiler.
✓ These code sequences would be very difficult to eliminate without extensive modifications to the compiler and assembler.
HOW IT WORKS
The return-oriented programming exploit technique is similar to arc injection, but instead of returning to
functions, the exploit code returns to sequences of instructions followed by a return (ret) instruction.
Any such useful sequence of instructions is called a gadget.
Each gadget specifies certain values to be placed on the stack that make use of one or more sequences of
instructions in the code segment.
✓ Gadgets perform well-defined operations, such as a load, an add, or a jump.
It allows an attacker to execute code in the presence of security defences such as executable space protection
HOW IT WORKS
Return-oriented programming is an advanced version of a stack smashing attack.
In a standard buffer overrun attack, the attacker would simply write attack code (the "payload") onto the stack and then overwrite the return address with the location of these newly written instructions.
✓ Since the late '90s, operating systems and compilers have offered protections: data areas cannot be executed.
• DEP Data Execution prevention (there is a hardware bit)
• NX (no execute), on Intel XD (execute disable)
https://cseweb.ucsd.edu/~hovav/dist/geometry.pdf
EXAMPLE
The left side shows the x86-32 assembly language instruction necessary to copy the constant value
$0xdeadbeef into the ebx register, and the right side shows the equivalent gadget.
With the stack pointer referring to the gadget, the return instruction is executed by the CPU.
The resulting gadget pops the constant from the stack
pop %ebx;
ret;
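The way a chain of gadgets consumes values from the stack can be modelled with a toy interpreter. This is a simulation, not an exploit: the gadget identifiers, the toy_cpu struct, and run_chain are all hypothetical names, and each gadget id stands in for the address of a real instruction sequence ending in ret.

```c
#include <stddef.h>
#include <stdint.h>

enum gadget { G_POP_EBX, G_ADD_EAX_EBX, G_HALT };   /* hypothetical gadget set */

struct toy_cpu { uint32_t eax, ebx; };

/* Interpret `chain` as the attacker-controlled stack: each gadget id plays
 * the role of a return address, and `pop` consumes the next stack word. */
void run_chain(struct toy_cpu *cpu, const uint32_t *chain, size_t words)
{
    size_t sp = 0;                               /* toy stack pointer */
    while (sp < words) {
        uint32_t gadget = chain[sp++];           /* "ret" into the gadget */
        if (gadget == G_POP_EBX && sp < words) {
            cpu->ebx = chain[sp++];              /* pop %ebx */
        } else if (gadget == G_ADD_EAX_EBX) {
            cpu->eax += cpu->ebx;                /* add %ebx, %eax */
        } else {
            return;                              /* G_HALT or malformed chain */
        }
    }
}
```

Running the chain { G_POP_EBX, 0xdeadbeef, G_HALT } leaves 0xdeadbeef in ebx, mirroring the pop %ebx; ret gadget above: the constant travels as data on the stack, and no new instructions are injected.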
EXAMPLE 2
An unconditional branch can be used to branch to an earlier gadget on the stack, resulting in an infinite loop.
pop %esp;
ret;
EXAMPLE OF ATTACK
The goal of the attack is to invoke system call sys_write and output “xxxHACKxxx” to screen
ssize_t sys_write(unsigned int fd, const char * buf, size_t count)
EXAMPLE OF ATTACK
int main(int argc, char *argv[]) {
    char buf[4];
    gets(buf);
    return 0;
}
http://www.cs.virginia.edu/~ww6r/CS4630/lectures/return_oriented_programming.pdf
GADGETS
HOW TO DO IT
SOME THEORETICAL ISSUES
Can you always find the gadgets you need?
✓ Some small executable files may not have all the gadgets you need
✓ If the executable file is larger than 3MB there is a good chance that you can find a set of gadgets for any exploit
Do you need “ret”?
✓ No, other jumps also work
ROP can also work without libc, using only the program's own code (in case mitigations for libc have been applied)
SOME THEORETICAL ISSUES
Return-oriented programming provides a fully functional
"language" (Turing complete) that an attacker can use to make a compromised machine perform any operation desired.
SUMMARY
Return-oriented Programming (ROP) addresses the limitations of code-injection and return-to-libc
✓ Code-injection: needs an executable stack
✓ Return-to-libc:
• Highly depends on libc's implementation
• Can be defended with mapped memory randomization
Gadgets: a small sequence of code ending with “ret” within a program's code section
✓ No need to inject code, so no need for an executable stack
✓ Does not use libc's full function implementations; may even use only the application's code
ROP attacks chain several gadgets together to execute arbitrary code
Enough ROP gadgets can be found in most executable files
COMPLICATED
While return-oriented programming might seem very complex, this complexity can be abstracted behind a programming language, compiler, and support tools, making it a viable technique for writing exploits.
HOW TO EXPLOIT/PREVENT IT
An automated tool has been developed to help automate the process of locating gadgets and constructing an
attack against a binary.
This tool, known as ROPgadget, searches through a binary looking for potentially useful gadgets, and
attempts to assemble them into an attack payload that spawns a shell to accept arbitrary commands from the attacker.
https://github.com/JonathanSalwan/ROPgadget
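The basic idea behind a gadget search can be sketched as a scan for byte patterns that end in the x86 ret opcode (0xC3). The function below looks only for the two-byte encoding of pop %ebx; ret (0x5B 0xC3). It is a simplified illustration with a hypothetical name, and is not a description of how ROPgadget itself is implemented.

```c
#include <stddef.h>

/* Return the offset of the first "pop %ebx; ret" byte pair (0x5B 0xC3 in
 * x86-32 machine code) in `code`, or -1 if the pair does not occur. */
long find_pop_ebx_ret(const unsigned char *code, size_t size)
{
    for (size_t i = 0; i + 1 < size; i++) {
        if (code[i] == 0x5B && code[i + 1] == 0xC3) {
            return (long)i;
        }
    }
    return -1;
}
```

Because x86 instructions have variable length, such byte pairs can also appear in the middle of longer instructions, which is one reason larger binaries tend to contain more usable gadgets.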
HOW TO BUILD THEM
pwntools is a CTF (Capture the Flag) framework and exploit development library. Written in Python, it is
designed for rapid prototyping and development, and intended to make exploit writing as simple as possible.
✓ http://docs.pwntools.com/en/stable/rop/rop.html
MITIGATION
STACK-SMASHING PROTECTOR (PROPOLICE)
In version 4.1, GCC introduced the Stack-Smashing Protector (SSP) feature, which implements canaries derived from StackGuard.
Also known as ProPolice, SSP is a GCC extension for protecting applications written in C from the most
common forms of stack buffer overflow exploits and is implemented as an intermediate language translator of GCC.
Specifically, SSP reorders local variables to place buffers after pointers and copies pointers in function arguments to an area preceding local variable buffers to avoid the corruption of pointers that could be used to further
corrupt arbitrary memory locations.
CANARIES
Canaries consist of a value that is difficult to insert or
spoof and are written to an address before the section of the stack being protected.
A sequential write would consequently need to overwrite this value on the way to the protected region.
The canary is initialized immediately after the return address is saved and checked immediately before the return address is accessed.
A hard-to-spoof or random canary is a 32-bit secret random number that changes each time the program is executed.
CANARIES
SSP works by introducing a canary to detect changes to the arguments, return address, and previous frame pointer in the stack. SSP inserts code fragments into appropriate locations as follows: a random number is generated for the guard value
during application initialization, preventing discovery by an unprivileged user.
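The canary check can be modelled in plain C with a struct standing in for the protected frame. This is a simplified model, not what the compiler actually emits: the struct layout, the names guarded_frame and copy_and_check, and the fixed guard value supplied by the caller are assumptions (SSP uses a random guard chosen at program start).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Model of a protected frame: the canary sits between the buffer and the
 * saved return address, so a sequential overflow reaches it first. */
struct guarded_frame {
    char     buffer[8];
    uint32_t canary;
    uint32_t return_address;
};

/* Copy `length` bytes into the frame the way an unchecked copy would, then
 * report whether the canary still holds the expected guard value. The
 * caller keeps length <= sizeof(struct guarded_frame). */
bool copy_and_check(struct guarded_frame *frame, uint32_t guard,
                    const char *input, size_t length)
{
    frame->canary = guard;               /* set on function entry */
    memcpy(frame, input, length);        /* the vulnerable copy */
    return frame->canary == guard;       /* checked before returning */
}
```

When the check fails, a real implementation aborts the program instead of returning, so the corrupted return address is never used.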
DISABLING IT
The -fstack-protector and -fno-stack-protector options enable and disable stack-smashing protection for
functions with vulnerable objects (such as arrays).
ASLR
Address space layout randomization (ASLR) is a
security feature of many operating systems; its purpose is to prevent arbitrary code execution.
The feature randomizes the addresses of memory pages used by the program. ASLR cannot prevent the
return address on the stack from being overwritten by a stack-based overflow.
However, by randomizing the address of stack pages, it may prevent attackers from correctly predicting the
address of the shellcode, system function, or return-oriented programming gadget that they want to invoke.
ASLR AND OS
ASLR was first introduced to Linux in the PaX project in 2000.
While the PaX patch has not been submitted to the mainstream Linux kernel, many of its features are incorporated into mainstream Linux distributions.
For example, ASLR has been part of Ubuntu since 2008 and Debian since 2007. Both platforms allow for fine-grained
tuning of ASLR via the following command:
- sysctl -w kernel.randomize_va_space=2
Setting the value to 0 turns ASLR off.
ASLR has been available on Windows since Vista.
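On Linux the same setting can be read from a program. The sketch below assumes that the sysctl is exposed as the file /proc/sys/kernel/randomize_va_space and that it holds 0, 1 or 2. The function name read_aslr_mode() is hypothetical, and the path is passed as a parameter so that the caller decides where to look.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 0 (ASLR off), 1 or 2 as stored in the given file, or -1 if the
   file cannot be read or does not hold one of those values. */
int read_aslr_mode(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;                    /* not Linux, or /proc not available */

    int mode = -1;
    if (fscanf(fp, "%d", &mode) != 1 || mode < 0 || mode > 2)
        mode = -1;

    fclose(fp);
    return mode;
}
```

A typical call would be read_aslr_mode("/proc/sys/kernel/randomize_va_space"). A result of 2 corresponds to the sysctl command shown above.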
ASLR
ASLR randomly arranges the address space positions of key data areas of a process, including the base of the
executable and the positions of the stack, heap and libraries.
NON-EXECUTABLE STACK
A nonexecutable stack is a runtime solution to buffer overflows that is designed to prevent executable code from running in the stack segment.
- Many operating systems can be configured to use nonexecutable stacks.
Nonexecutable stacks are often represented as a panacea in securing against buffer overflow
vulnerabilities, but…
- They do not prevent buffer overflows from occurring in the heap or data segments.
- They do not prevent an attacker from using a buffer overflow to modify a return address, variable, object pointer, or function pointer.
DRAWBACKS
Depending on how they are implemented, nonexecutable stacks can affect performance.
Nonexecutable stacks can also break programs that execute code in the stack segment, including Linux signal delivery and GCC trampolines.
W^X
Several operating systems, including OpenBSD, Windows, Linux, and OS X, enforce reduced privileges in the kernel so that no part of the process address space is both writable and executable.
This policy is called W xor X (W⊕X), or more concisely W^X, and is supported by the use of a No eXecute (NX) bit on
several CPUs.
The NX bit enables memory pages to be marked as data, disabling the execution of code on these pages. This bit is named NX on AMD CPUs and XD (for eXecute Disable) on Intel CPUs.
W^X requires that no code is intended to be executed that is not part of the program itself. This prevents the execution of shellcode on the stack, heap, or data segment.
IMPLEMENTATION
Deployment: Linux (via PaX patches); OpenBSD;
Windows (since XP SP2); OS X (since 10.5); ...
Hardware support: Intel “XD” bit, AMD “NX” bit (and many RISC processors)
HEAP ISSUES
SOME PROBLEMS WITH MALLOC
Initializing large blocks of memory can degrade performance and is not always necessary.
The decision by the C standards committee to not
require malloc() to initialize this memory reserves this decision for the programmer.
“MEM09-C. Do not assume memory allocation functions initialize memory.”
SOME SECURITY PROBLEMS
Where sensitive information is used, it is important to clear or overwrite the sensitive information before calling free().
- MEM03-C of The CERT C Secure Coding Standard: “Clear sensitive information stored in reusable resources”.
Unfortunately, compiler optimizations may silently remove a call to memset() if the memory is not accessed following the write.
To avoid this possibility, you can use the memset_s() function (if available). Unlike memset(), the memset_s() function
assumes that the memory being set may be accessed in the future.
CERT C Secure Coding Standard, “MSC06-C. Be aware of compiler optimization when dealing with sensitive data”.
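Where memset_s() is not available, a common fallback is to write through a volatile-qualified pointer, which the compiler is not allowed to optimize away. The following is a minimal sketch of that idea. The function name secure_clear() is hypothetical, and the volatile technique is a swapped-in alternative to memset_s(), not the function named above.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrites len bytes at buf with zeros. Each store goes through a
   volatile lvalue, so the compiler must perform it even if the buffer
   is never read again. */
void secure_clear(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}
```

Typical use is to call secure_clear(key, sizeof key) immediately before free(key) or before a buffer holding a password goes out of scope.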
FAILING TO CHECK RETURN VALUES
Memory is a limited resource and can be exhausted, for example by memory leaks, by the overall memory demand of the program, or by other processes.
Once all virtual memory is allocated, requests for more memory will fail.
“MEM32-C. Detect and handle memory allocation errors.”
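In code, the rule amounts to testing the result of every allocation before using it. A minimal sketch, with copy_string() as a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of src, or NULL if the allocation fails.
   The caller owns the result and must free() it. */
char *copy_string(const char *src)
{
    assert(src != NULL);

    size_t len = strlen(src) + 1;     /* include the terminating '\0' */
    char *dst = malloc(len);
    if (dst == NULL)
        return NULL;                  /* report the failure instead of writing through NULL */

    memcpy(dst, src, len);
    return dst;
}
```

The caller has to repeat the check: a NULL result from copy_string() must be handled, not dereferenced.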
MEMORY LEAKS AND SECURITY
Memory leaks occur when dynamically allocated memory is not freed after it is no longer needed.
Memory leaks can be problematic in long-running processes or in resource-exhaustion attacks.
- If an attacker can identify an external action that causes memory to be allocated but not freed, memory can eventually be exhausted.
- Once memory is exhausted, additional allocations fail, and the application is unable to process valid user requests without necessarily crashing.
EXAMPLE
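One way such a leak arises is an early return on an error path that skips the call to free(). The sketch below shows a hypothetical request handler, handle_request(), in which every exit path after malloc() releases the buffer. The names and the length limit are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 if the request was processed, -1 if it was rejected or the
   allocation failed. */
int handle_request(const char *input, size_t max_len)
{
    assert(input != NULL);

    size_t len = strlen(input);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, input, len + 1);

    if (len > max_len) {
        free(copy);       /* a leaky version would return here without freeing */
        return -1;
    }

    /* ... process the request using copy ... */

    free(copy);
    return 0;
}
```

If the free() on the rejection path were missing, an attacker could leak one buffer per oversized request, which is exactly the external action described above.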
REFERENCING FREED MEMORY
It is possible to access freed memory unless all pointers to that memory have been set to NULL or otherwise
overwritten.
Reading from freed memory is undefined behaviour but almost always succeeds without a memory fault because freed memory is recycled by the memory manager.
- However, there is no guarantee that the contents of the memory have not been altered.
SO
When you free, also set the pointer to freed memory to NULL.
Writing to a memory location that has been freed is also unlikely to result in a memory fault but could result in a number of serious problems.
If the memory has been reallocated, a programmer may overwrite memory, believing that a memory chunk is
dedicated to a particular variable when in reality it is being shared.
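The advice to reset the pointer can be packaged in a small helper. This is a sketch only. The name release_buffer() is hypothetical, and the helper is typed for char * buffers to keep it simple.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees the buffer that *pp points to and resets *pp to NULL, so that a
   later accidental use is a null dereference rather than a silent access
   to recycled memory. */
void release_buffer(char **pp)
{
    assert(pp != NULL);

    free(*pp);        /* free(NULL) is a no-op, so repeated calls are harmless */
    *pp = NULL;
}
```

This protects only the pointer that is passed in. Any other copies of the same address still dangle and must be tracked separately.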
OTHER ERRORS
Dereferencing null or invalid pointers. If the operand doesn’t point to an object or function, the behaviour of the unary * operator is undefined. Cases:
- a null pointer, an address inappropriately aligned for the type of object pointed to, or the address of an object after the end of its lifetime
Double free: freeing the same memory chunk more than once.
Also a memory leak
HEAP OVERFLOWS
DLMALLOC
The GNU C library and most versions of Linux (for
example, Red Hat, Debian) are based on Doug Lea’s malloc (dlmalloc) as the default native version of malloc.
In the following we describe the internals of dlmalloc
version 2.7.2 and give examples of how these flaws can be exploited.
- dlmalloc has since reached version 2.8.6.
The security flaws responsible for the following
vulnerabilities are common to all versions of dlmalloc (and other memory managers as well).
- We assume a 32-bit Intel architecture.
DLMALLOC
In dlmalloc, memory chunks are either allocated to a process or are free.
The first 4 bytes of both allocated and free chunks
contain either the size of the previous adjacent chunk, if it is free, or the last 4 bytes of user data of the previous chunk, if it is allocated.
CHUNKS
[Figure: layout of an allocated chunk and of a free chunk, with field sizes in bytes]
FREE CHUNKS
In dlmalloc, free chunks are arranged in circular double-linked lists, or bins.
Each double-linked list has a head that contains forward and backward pointers to the first and last chunks in the list. Both the forward pointer in the last chunk of the list and the backward pointer in the first chunk of the list point to the head element. When the list is empty, the head’s pointers reference the head itself.
A BIN
Both allocated and free chunks make use of a PREV_INUSE bit (represented by P in the figure) to indicate
whether or not the previous chunk is allocated.
UNLINK
The unlink() macro is used to remove a chunk from its
double-linked list. It is used when memory is
consolidated and when a chunk is taken off the free list because it has been allocated to a user.
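The bin structure and the unlink operation can be sketched in a few lines of C. This is an illustration under stated assumptions, not dlmalloc's source: struct chunk is a simplified chunk header, the bin_* helpers are hypothetical, and the macro is named UNLINK_CHUNK to avoid a clash with the POSIX unlink() function. The four assignments in the macro follow the steps of unlink() as described in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified chunk header: two size fields followed by the list pointers. */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;   /* forward pointer  */
    struct chunk *bk;   /* backward pointer */
};

/* Removes P from its list; BK and FD are scratch variables. */
#define UNLINK_CHUNK(P, BK, FD) do { \
    (FD) = (P)->fd;                  \
    (BK) = (P)->bk;                  \
    (FD)->bk = (BK);                 \
    (BK)->fd = (FD);                 \
} while (0)

/* An empty bin: the head's pointers reference the head itself. */
void bin_init(struct chunk *head)
{
    head->fd = head;
    head->bk = head;
}

void bin_push_front(struct chunk *head, struct chunk *c)
{
    c->fd = head->fd;
    c->bk = head;
    head->fd->bk = c;
    head->fd = c;
}

void bin_remove(struct chunk *c)
{
    struct chunk *bk, *fd;
    assert(c != NULL);
    UNLINK_CHUNK(c, bk, fd);
}

int bin_is_empty(const struct chunk *head)
{
    return head->fd == head && head->bk == head;
}
```

Note that the macro trusts the fd and bk fields of the chunk being removed completely. That trust is what the attacks in the following slides abuse.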
ATTACKING THE HEAP
Through dlmalloc
http://grantcurell.com/2015/08/16/protostar-exploit-challenges-heap3-solution-exploiting-dlmalloc/
BUFFER OVERFLOW ON THE HEAP
Dynamically allocated memory is vulnerable to buffer overflows.
Exploiting a buffer overflow in the heap is generally
considered to be more difficult than smashing the stack.
For this reason, buffer overflows in the heap are not
always appropriately addressed, and developers adopt solutions that protect against stack-smashing attacks but not buffer overflows in the heap.
Buffer overflows, for example, can be used to corrupt
data structures used by the memory manager to execute arbitrary code.
- Both the unlink and frontlink techniques described in the following can be used for this purpose as well.
UNLINK TECHNIQUE
The unlink technique was successfully used against versions of Netscape browsers and traceroute using dlmalloc.
The unlink technique is used to exploit a buffer overflow to manipulate the boundary tags on chunks of memory to trick the unlink() macro into writing 4 bytes of data to an arbitrary location.
VULNERABLE CODE
This vulnerable program allocates three chunks of memory (lines 5–7). The program accepts a single string argument that is copied into first (line 8). This unbounded strcpy() operation is susceptible to a buffer overflow. The boundary tag can be overwritten by a string argument exceeding the length of first because the boundary tag for second is located directly after the first buffer.
THE HEAP
The content of the heap at the time free() is called for the first time.
[Figure: heap layout at the first call to free(); annotation: "Div by 8"]
CONSOLIDATION
If the second chunk is unallocated, the free() operation will attempt to consolidate it with the first chunk.
To determine whether the second chunk is unallocated, free() checks the PREV_INUSE bit of the third chunk.
The location of the third chunk is determined by adding the size of the second chunk to its starting address.
HEAP IS OVERWRITTEN
The attacker can overwrite the boundary tag associated with the second
chunk of memory, because this boundary tag is located immediately after the end of the first chunk.
EXAMPLE
IN PRACTICE
The unlink() macro writes 4 bytes of data supplied by an attacker to a 4-byte address also supplied by the
attacker.
unlink() is called
EFFECT
The first line of unlink, FD = P->fd, assigns the value in P->fd (provided as part of the malicious argument) to FD.
The second line of the unlink macro, BK = P->bk, assigns the value of P->bk, also provided by the malicious argument, to BK.
The third line of the unlink() macro, FD->bk = BK, overwrites the address specified by FD + 12 (the
offset of the bk field in the structure) with the value of BK.
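The effect of these assignments can be reproduced safely by letting the "attacker" choose fd and bk directly. This is a simulation under stated assumptions: struct chunk is a simplified header, forged_unlink() is a hypothetical helper, and both pointers refer to real objects so that the program stays well defined. In an actual exploit fd would be the target address minus the offset of bk (12 with the 32-bit layout assumed here), and bk would be the value to be written.

```c
#include <assert.h>
#include <stddef.h>

struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;
    struct chunk *bk;
};

/* The same four steps as the unlink() macro, written as a function. */
static void unlink_chunk(struct chunk *p)
{
    struct chunk *fd = p->fd;
    struct chunk *bk = p->bk;
    fd->bk = bk;      /* write "what" at where + offsetof(struct chunk, bk) */
    bk->fd = fd;      /* side effect: write "where" at what + offsetof(struct chunk, fd) */
}

/* Simulates a chunk whose boundary tag has been overwritten: both list
   pointers now hold attacker-chosen values. */
void forged_unlink(struct chunk *p, struct chunk *where, struct chunk *what)
{
    assert(p != NULL && where != NULL && what != NULL);

    p->fd = where;
    p->bk = what;
    unlink_chunk(p);
}
```

The fourth assignment is a side effect the attacker has to live with: it also writes into the memory that bk points to, so shellcode placed there must tolerate or jump over the overwritten word.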
RESULT
The value of BK is written at address FD + 12.
An attacker could, for example, provide the address of the return pointer on the stack and use the unlink()
macro to overwrite the address with the address of malicious code.
- dlmalloc itself modifies the stack on the attacker's behalf.
Exploitation of a buffer overflow in the heap is not
particularly difficult. The most difficult part of this exploit is determining the size of the first chunk so that the
boundary tag for the second chunk can be precisely overwritten.
DOUBLE FREE VULNERABILITIES
Doug Lea’s malloc is also susceptible to double-free vulnerabilities. This type of vulnerability arises from freeing the same chunk of memory twice without its being reallocated between the two free operations.
For a double-free exploit to be successful, two conditions must be met. The chunk to be freed must be isolated in memory (that is, the adjacent chunks must be allocated so that no consolidation takes place), and the bin into which the chunk is to be placed must be empty.
WE START WITH
FRONTLINK
When a chunk of memory is freed, it must be linked into the appropriate double-linked list. In some versions of dlmalloc, this is performed by the frontlink code segment.
AFTER A FREE
ATTACK
The attacker supplies the address of a memory chunk
and arranges for the first 4 bytes of this memory chunk to contain executable code (that is, a jump instruction to
shellcode).
This is accomplished by writing these instructions into the last 4 bytes of the previous chunk in memory.
MITIGATION
Randomization works on the principle that it is harder to hit a moving target than a still target. Addresses of
memory allocated by malloc() are fairly predictable.
Randomizing the addresses of blocks of memory returned by the memory manager can make it more difficult to exploit a heap-based vulnerability.
Tools for static and dynamic analysis: Valgrind
- Memcheck is a memory error detector. It can detect the following problems that are common in C and C++ programs.
- Accessing memory you shouldn't, e.g. overrunning and underrunning heap blocks, overrunning the top of the stack, and accessing memory after it has been freed.
POINTER SUBTERFUGE
POINTERS CAN BE MODIFIED
Pointer subterfuge is a general term for exploits that modify a pointer’s value. C and C++ differentiate
between pointers to objects and pointers to functions.
Function pointers can be overwritten to transfer control to attacker-supplied shellcode. When the program executes a call via the function pointer, the attacker’s code is
executed instead of the intended code.
POINTERS TO FUNCTIONS
funcPtr can be made to point to shellcode.
The overflow takes place in the data segment.
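The overwrite can be demonstrated without invoking undefined behaviour by keeping the buffer and the function pointer in one struct and copying the oversized input over the struct as a whole. This is a simulation under stated assumptions: struct handler, fill_record(), build_payload() and attacker_function() are hypothetical names, and attacker_function() stands in for shellcode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_SIZE 8

typedef int (*handler_fn)(void);

/* A buffer followed directly by a function pointer. */
struct handler {
    char name[NAME_SIZE];
    handler_fn fn;
};

int good_function(void)     { return 0; }
int attacker_function(void) { return 1; }   /* stands in for shellcode */

/* Victim side: copies len bytes over the record starting at name, with no
   regard for NAME_SIZE (clamped to the record so the simulation is safe). */
void fill_record(struct handler *h, const unsigned char *input, size_t len)
{
    assert(h != NULL && input != NULL);
    if (len > sizeof *h)
        len = sizeof *h;
    memcpy(h, input, len);
}

/* Attacker side: filler up to the pointer, then the bytes of the new target. */
size_t build_payload(unsigned char *out, size_t cap, handler_fn target)
{
    size_t off = offsetof(struct handler, fn);
    assert(out != NULL && cap >= off + sizeof target);
    memset(out, 'A', off);
    memcpy(out + off, &target, sizeof target);
    return off + sizeof target;
}
```

Before fill_record() runs, a call through h.fn reaches good_function(). Afterwards the same call reaches attacker_function(), although no code in the victim ever assigned to fn.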
POINTERS TO OBJECTS
This program contains an unbounded memory copy on line 5. After overflowing the buffer, an attacker can overwrite ptr and val. When *ptr = val is
consequently evaluated on line 6, an arbitrary memory write is performed.
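The same pattern can be simulated for object pointers. In the sketch below, which is not the program discussed above, a struct holds a buffer followed by ptr and val. The names struct record, copy_then_store() and build_payload() are hypothetical. The input is copied over the struct as a whole, so the program stays well defined while showing how the later store is redirected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8

struct record {
    char buf[BUF_SIZE];
    int *ptr;
    int val;
};

/* Attacker side: filler for buf, then chosen bytes for ptr and val. */
size_t build_payload(unsigned char *out, size_t cap, int *where, int what)
{
    assert(out != NULL && cap >= sizeof(struct record));
    memset(out, 'A', sizeof(struct record));
    memcpy(out + offsetof(struct record, ptr), &where, sizeof where);
    memcpy(out + offsetof(struct record, val), &what, sizeof what);
    return sizeof(struct record);
}

/* Victim side: an unbounded copy followed by the store *ptr = val. */
void copy_then_store(struct record *r, const unsigned char *input, size_t len)
{
    assert(r != NULL && input != NULL);
    if (len > sizeof *r)
        len = sizeof *r;              /* keep the simulation inside the struct */
    memcpy(r, input, len);
    *r->ptr = r->val;                 /* both operands now come from the input */
}
```

Because both the destination and the value are taken from the overflowing input, the final assignment writes an attacker-chosen value to an attacker-chosen address.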
ONE MORE EXAMPLE WITH FUNCT
For an attacker to succeed in executing arbitrary code on x86-32, an exploit must modify the value of the
instruction pointer to reference the shellcode.
The instruction pointer register (eip) contains the offset in the current code segment for the next instruction to be executed.
The eip register cannot be accessed directly by software.
It is advanced from one instruction boundary to the next when executing code sequentially or modified indirectly by control transfer instructions (such as jmp, jcc, call, and ret), interrupts, and exceptions.
ONE MORE EXAMPLE WITH FUNCT
The call instruction, for example, saves return
information on the stack and transfers control to the called function specified by the destination (target) operand.
The target operand specifies the address of the first
instruction in the called function. This operand can be an immediate value, a general-purpose register, or a
memory location.
EXAMPLE
DISASSEMBLING IT
A ModR/M byte of 0x15 (as in the opcode bytes ff 15) indicates an absolute, indirect call.
CONCLUSION
These invocations of good_function() provide examples of call instructions that can and cannot be attacked.
- The static invocation uses an immediate value as relative displacement, and this displacement cannot be overwritten because it is in the code segment.
- The invocation through the function pointer uses an indirect reference, and the address in the referenced location (typically in the data or stack segment) can be overwritten.
TOOLS AND LINKS
TOOLS
Tools for static analysis
- https://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
- https://samate.nist.gov/index.php/Source_Code_Security_Analyzers.html
Tools for dynamic analysis
- https://en.wikipedia.org/wiki/Dynamic_program_analysis
Disassembling
- https://rada.re/r/
ROPgadget (gadgets finder and auto-roper)
- http://shell-storm.org/project/ROPgadget/
SHELLCODES
Shellcode database for study cases
- http://shell-storm.org/shellcode/
- https://www.exploit-db.com
- https://0day.today
LIBRARIES
Pwntools (Python)
- https://docs.pwntools.com/en/stable/
HOWTO
https://thecyberrecce.net/tag/pwn-tools/
https://ocw.cs.pub.ro/courses/cns/labs/lab-01
https://github.com/FabioBaroni/awesome-exploit-development