SECURE PROGRAMMING

(1)

SECURE PROGRAMMING

A.A. 2018/2019

(2)

TERMINOLOGY

(3)

SECURITY FLAWS

Software engineering has long been concerned with the elimination of software defects.

A software defect is the encoding of a human error into the software, including omissions.

Software defects can originate at any point in the software development life cycle. For example, a defect in a deployed product can originate from a misstated or misrepresented requirement.

Security Flaw

✓ A software defect that poses a potential security risk.

(4)

SECURITY FLAWS

Not all software defects pose a security risk. Those that do are security flaws. If we accept that a security flaw is a software defect, then we must also accept that by eliminating all software defects, we can eliminate all security flaws.

This premise underlies the relationship between software engineering and secure programming.

An increase in quality, as might be measured by defects per thousand lines of code, would likely also result in an increase in security.

Many tools, techniques, and processes that are designed to eliminate software defects also can be used to eliminate security flaws.

(5)

DIFFICULT/EXPENSIVE TO REMOVE

However, many security flaws go undetected because traditional software development processes seldom assume the existence of attackers.

For example, testing will normally validate that an application behaves correctly for a reasonable range of user inputs.

Unfortunately, attackers are seldom reasonable and will spend an inordinate amount of time devising inputs that will break a system.

(6)

VULNERABILITY

Not all security flaws lead to vulnerabilities. However, a security flaw can cause a program to be vulnerable to attack when the program’s input data (for example, command-line parameters) crosses a security boundary en route to the program.

This may occur when a program containing a security flaw is installed with execution privileges greater than those of the person running the program or is used by a network service where the program’s input data arrives via the network connection.

Vulnerability

✓ A set of conditions that allows an attacker to violate an explicit or implicit security policy.

(7)

POLICY

Taken verbatim from RFC 2828, the Internet Security Glossary (https://www.ietf.org/standards/rfcs/)

Security Policy

✓ A set of rules and practices that specify or regulate how a system or organization provides security services to protect sensitive and critical system resources.

Security policies that are documented, well known, and visibly enforced can help establish expected user behaviour.

(8)

FLAWS ARE NOT VULNERABILITIES

A security flaw can also exist without all the preconditions required to create a vulnerability.

For example, a program can contain a defect that allows a user to run arbitrary code inheriting the permissions of that program.

✓ This is not a vulnerability if the program has no special permissions and can be accessed only by local users, because there is no possibility that a security policy will be violated.

(9)

VULNERABILITIES

Vulnerabilities can exist without a security flaw.

Because security is a quality attribute that must be traded off with other quality attributes such as performance and usability, software designers may intentionally choose to leave their product vulnerable to some form of exploitation.

Making an intentional decision not to eliminate a vulnerability does not mean the software is secure, only that the software designer has accepted the risk on behalf of the software consumer.

(10)

EXPLOITS

Exploit

✓ A technique that takes advantage of a security vulnerability to violate an explicit or implicit security policy.

Vulnerabilities in software are subject to exploitation.

Exploits can take many forms, including worms, viruses, and trojans.

Understanding how programs can be exploited is a valuable tool that can be used to develop secure software.

✓ However, disseminating exploit code against known vulnerabilities can be damaging to everyone.

(11)

ZERO-DAY EXPLOIT

A zero-day (also known as 0-day) vulnerability is a computer-software vulnerability that is unknown to those who would be interested in mitigating the vulnerability (including the vendor of the target software).

An exploit directed at a zero-day is called a zero-day exploit, or zero-day attack.

Developed by highly skilled groups (NSA, central governments)

✓ EternalBlue https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-0144

On the market

✓ https://0day.today

(12)

MITIGATIONS

Mitigation

✓ Methods, techniques, processes, tools, or runtime libraries that can prevent or limit exploits against vulnerabilities.

A mitigation (or countermeasure) is a solution for a software flaw or a workaround that can be applied to prevent exploitation of a vulnerability.

✓ At the source code level, mitigations can be as simple as replacing an unbounded string copy operation with a bounded one.

✓ At a system or network level, a mitigation might involve turning off a port or filtering traffic to prevent an attacker from accessing a vulnerability.
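As a minimal sketch of the source-level mitigation just described, the helper below replaces an unbounded strcpy() with a bounded copy built on snprintf(). The name bounded_copy and its return convention are assumptions made for illustration, not part of any standard library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes),
 * always null-terminating. Returns 0 on success, -1 if src did not fit
 * or the arguments were invalid. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;
    }
    /* snprintf() writes at most dst_size bytes, terminator included, and
     * returns the length the complete string would have had. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1; /* output error, or src was truncated to fit */
    }
    return 0;
}
```

A caller would write bounded_copy(name, sizeof name, input) where it previously wrote strcpy(name, input), and treat -1 as a rejected input rather than silently accepting a truncated string.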

(13)

MITIGATIONS

The preferred way to eliminate security flaws is to find and correct the actual defect.

However, in some cases it can be more cost-effective to eliminate the security flaw by preventing malicious inputs from reaching the defect.

Vulnerabilities can also be addressed operationally by isolating the vulnerability.

Of course, operationally addressing vulnerabilities significantly increases the cost of mitigation because the cost is pushed out from the developer to system administrators and end users.

(14)

TO SUM UP

(15)

A LINK TO CIA

A resource (either physical or logical) may have one or more vulnerabilities that can be exploited by a threat agent in a threat action.

The result can potentially compromise the confidentiality, integrity or availability of resources.

(16)

TAXONOMY

(17)

AN EXAMPLE OF CLASSIFICATION

paper

(18)

A BIT OF C HISTORY

(19)

WHY C/C++

In this course, the decision to use C and C++ was based on the popularity of these languages, the enormous legacy code base, and the amount of new code being developed in these languages.

(20)

WHY C

The C programming language is intended to be a lightweight language with a small footprint.

This characteristic of C leads to vulnerabilities when programmers fail to implement required logic because they assume it is handled by C (but it is not).

This problem is magnified when programmers are familiar with superficially similar languages such as Java, Pascal, or Ada, leading them to believe that C protects the programmer better than it actually does.

These false assumptions have led to programmers failing to prevent writing beyond the boundaries of an array, failing to catch integer overflows and truncations, and calling functions with the wrong number of arguments.
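C does not detect signed integer overflow for the programmer, so the check has to be written by hand. The sketch below shows one common way to test before adding; the name checked_add_int is a hypothetical helper chosen for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: add a and b, storing the sum in *result.
 * Returns false (leaving *result untouched) if the sum would overflow.
 * The test runs before the addition, because signed overflow itself is
 * undefined behaviour in C. */
bool checked_add_int(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *result = a + b;
    return true;
}
```

The same pattern (test first, then operate) applies to array indexing: compare the index against the array length before the access, since C performs no such check on its own.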

(21)

WHY C

The C Standard [ISO/IEC 2011] defines several kinds of behaviors:

Locale-specific behavior: behavior that depends on local conventions of nationality, culture, and language that each implementation documents.

✓ An example of locale-specific behavior is whether the islower() function returns true for characters other than the 26 lowercase Latin letters.

Unspecified behavior: use of an unspecified value, or other behavior where the C Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance.

✓ An example of unspecified behaviour is the order in which the arguments to a function are evaluated.

Implementation-defined behaviour: unspecified behaviour where each implementation documents how the choice is made.

✓ An example of implementation-defined behaviour is the propagation of the high-order bit when a signed integer is shifted right.

Undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.

(22)

LET’S START WITH C

C is a high-level general-purpose, procedural programming language. Dennis Ritchie first devised C in the 1970s at AT&T Bell Laboratories in Murray Hill, New Jersey, for the purpose of implementing the Unix operating system and utilities with the greatest possible degree of independence from specific hardware platforms. The key characteristics of the C language are the qualities that made it suitable for that purpose:

✓ Source code portability

✓ The ability to operate “close to the machine”

✓ Efficiency

As a result, the developers of Unix were able to write most of the operating system in C, leaving only a

(23)

C

C’s ancestors are the typeless programming languages BCPL (the Basic Combined Programming Language), developed by Martin Richards; and B, a descendant of BCPL, developed by Ken Thompson.

A new feature of C was its variety of data types: characters, numeric types, arrays, structures, and so on.

Brian Kernighan and Dennis Ritchie published an official description of the C programming language in 1978.

Few hardware-dependent elements. For example, the C language proper has no file access or dynamic memory management statements. No input/output.

Instead, the extensive standard C library provides the functions for all of these purposes.

(24)

VIRTUES OF C

Fast (it's a compiled language and so is close to the machine hardware)

Portable (you can compile your program to run on just about any hardware platform out there)

The language is small (unlike C++ for example)

Mature (a long history and lots of resources and experience available)

There are many tools for making programming easier (e.g. IDEs like Xcode)

You have direct access to memory

(25)

CHALLENGES OF USING C

The language is small (but there are many APIs)

It's easy to get into trouble, e.g. with direct memory access & pointers

You must manage memory yourself

Sometimes code is more verbose than in high-level scripting languages like Python, R, etc

(26)

STANDARDS

K & R C (Brian Kernighan and Dennis Ritchie)

✓ 1972 First created by Dennis Ritchie

✓ 1978 The C Programming Language described

ANSI C

✓ 1989 ANSI X3.159-1989 aka C89 - First standardized version

ISO C

✓ 1990 ISO/IEC 9899:1990 aka C90 - Equivalent to C89

✓ 1995 Amendment 1 aka C95

✓ 1999 ISO/IEC 9899:1999 aka C99

✓ 2011 ISO/IEC 9899:2011 aka C11

(27)

DENNIS

(28)

HISTORY OF C++

In the early 1970s, Dennis Ritchie introduced “C” at Bell Labs.

✓ http://cm.bell-labs.co/who/dmr/chist.html

As a Bell Labs employee, Bjarne Stroustrup was exposed to and appreciated the strengths of C, but also appreciated the power and convenience of higher-level languages like Simula, which had language support for object-oriented programming (OOP).

✓ Originally called C With Classes, in 1983 it becomes C++

In 1985, the first edition of The C++ Programming Language was released

(29)

HISTORY

Adding support for OOP turned out to be the right feature at the right time for the '90s. At a time when GUI programming was all the rage, OOP was the right paradigm, and C++ was the right implementation.

At over 700 pages, the C++ standard demonstrated something about C++ that some critics had said about it for a while: C++ is a complicated beast.

The first decade of the 21st century saw desktop PCs that were powerful enough that it didn’t seem worthwhile to deal with all this complexity when there were alternatives that offered OOP with less complexity.

✓ Java

(30)

STROUSTRUP

(31)

CHARACTERISTICS

The most important feature of C++ is that it is both low- and high-level.

Programming in C++ requires a discipline and attention to detail that may not be required of kinder, gentler languages that are not as focused on performance.

✓ No garbage collector!

(32)

STANDARDS

1998 ISO/IEC 14882:1998 C++98

2003 ISO/IEC 14882:2003 C++03

2011 ISO/IEC 14882:2011 C++11

2014 ISO/IEC 14882:2014 C++14

2017 ISO/IEC 14882:2017 C++17

2020 ???? C++20

(33)

LET’S START!

(34)

CERT C CODING STANDARD

https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard

The SEI CERT C Coding Standard is a software coding standard for the C programming language, developed by the CERT Coordination Center to improve the safety, reliability, and security of software systems.

✓ CERT division at Carnegie Mellon

✓ A non-profit United States federally funded research and development center

(35)

ARRAYS AND STRINGS

(36)

ARRAYS ARE COMPLICATED

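When passed as a parameter, an array name is a pointer

sizeof(int *) is the size of a pointer (8 bytes on typical 64-bit platforms, 4 bytes on typical 32-bit ones), whatever the size of the array it points into

The CERT C Secure Coding Standard includes “ARR01-C. Do not apply the sizeof operator to a pointer when taking the size of an array,” which warns against this problem.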
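A short sketch of the usual way around this pitfall: compute the element count where the array type is still visible, and pass it explicitly. The names sum_ints and ARRAY_COUNT are illustrative assumptions.

```c
#include <stddef.h>

/* Element count of a true array. Only valid where the array type is
 * visible, never on a pointer or on an array parameter. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Inside this function "values" is really a const int *, so
 * sizeof(values) would give the pointer size. The caller therefore
 * passes the element count explicitly. */
long sum_ints(const int values[], size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

A caller holding int data[] = {1, 2, 3, 4, 5}; would call sum_ints(data, ARRAY_COUNT(data)).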

(37)

STRINGS

Strings are a fundamental concept in software engineering, but they are not a built-in type in C or C++.

The standard C library supports strings of type char and wide strings of type wchar_t.

(38)

IMPROPERLY BOUNDED COPIES

Reading from a source to a fixed-length array.

The gets() function was deprecated in C99 and eliminated from C11.

This program has undefined behaviour if more than eight characters (including the null terminator) are entered at the prompt. The main problem with the gets() function is that it provides no way to specify a limit on the number of characters to read.
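One possible replacement for gets() is a small wrapper around fgets(), which takes the buffer size as an argument. The sketch below is an assumption about how such a wrapper could look (read_line is a hypothetical name); it also rejects lines that do not fit instead of silently splitting them.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from "in" into buf (capacity size).
 * Returns 0 on success, -1 on EOF/error or if the line did not fit.
 * On an oversized line the rest of that line is discarded. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (buf == NULL || size == 0 || size > INT_MAX) {
        return -1;
    }
    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';
        return -1;
    }
    char *newline = strchr(buf, '\n');
    if (newline != NULL) {
        *newline = '\0';
        return 0;
    }
    /* No newline stored: either input ended, or the line was too long. */
    int c;
    int extra = 0;
    while ((c = getc(in)) != '\n' && c != EOF) {
        extra = 1;
    }
    return extra ? -1 : 0;
}
```

With char buf[8]; a call such as read_line(stdin, buf, sizeof buf) can never write more than eight bytes, whatever the user types.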

(39)

READING FROM STDIN

Reading data from unbounded sources (such as stdin) creates an interesting problem for a programmer. Because it is not possible to know beforehand how many characters a user will supply, it is not possible to preallocate an array of sufficient length.

A common solution is to statically allocate an array that is thought to be much larger than needed. In this example, the programmer expects the user to enter only one character and consequently assumes that the eight-character array length will not be exceeded.

✓ With friendly users, this approach works well. But with malicious users, a fixed-length character array can be easily exceeded, resulting in undefined behaviour.

This approach is prohibited by The CERT C Secure Coding Standard, “STR35-C. Do not copy data from an unbounded source to a fixed-length array.”
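An alternative to guessing a fixed size is to grow the buffer while reading. The following is a sketch under stated assumptions: read_line_dynamic and its max_length cap are hypothetical, and the cap is there so that a malicious user cannot force unbounded allocation.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of up to max_length characters.
 * Returns a malloc'd, null-terminated string without the newline (the
 * caller frees it), or NULL on EOF with no data, on allocation failure,
 * or if the line is longer than max_length. */
char *read_line_dynamic(FILE *in, size_t max_length)
{
    size_t capacity = 16;
    size_t length = 0;
    char *line = malloc(capacity);
    if (line == NULL) {
        return NULL;
    }

    int c;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (length == max_length) {        /* refuse oversized input */
            free(line);
            return NULL;
        }
        if (length + 1 == capacity) {      /* keep one byte for '\0' */
            if (capacity > SIZE_MAX / 2) {
                free(line);
                return NULL;
            }
            char *bigger = realloc(line, capacity * 2);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
            capacity *= 2;
        }
        line[length++] = (char)c;
    }
    if (c == EOF && length == 0) {
        free(line);
        return NULL;
    }
    line[length] = '\0';
    return line;
}
```

POSIX systems offer getline() for the same purpose; the hand-written version is shown only to make the growth and the bounds checks visible.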

(40)

FROM PROGRAM PARAMETERS

Vulnerabilities can occur when inadequate space is allocated to copy a program input such as a command-line argument.

Although argv[0] contains the program name by convention, an attacker can control the contents of argv[0] to cause a vulnerability in the following program by providing a string with more than 128 bytes.

Problems with C++ as well

(41)

HOW TO FIX IT
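One way to fix this kind of problem, sketched here as an assumption rather than as the exact code of the slide, is to size the destination from the argument itself instead of from a guessed constant. The name copy_argument is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of a program argument whose
 * size is taken from the argument itself. Returns NULL if arg is NULL
 * (argv[0] may be NULL when argc is 0) or if allocation fails.
 * The caller frees the result. */
char *copy_argument(const char *arg)
{
    if (arg == NULL) {
        return NULL;
    }
    size_t size = strlen(arg) + 1;   /* +1 for the null terminator */
    char *copy = malloc(size);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, arg, size);
    return copy;
}
```

In main() this would be used as char *prog_name = copy_argument(argv[0]); followed by a NULL check and, eventually, free(prog_name).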

(42)

NULL TERMINATING STRINGS

The result is that the strcpy() to c may write well beyond the bounds of the array because the string stored in a[] is not correctly null-terminated.

The CERT C Secure Coding Standard includes “STR32-C. Null-terminate byte strings as required.”
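The missing terminator typically comes from strncpy(), which does not append a null byte when the source is at least as long as the limit. A minimal sketch of the usual remedy, with the hypothetical helper name copy_terminated:

```c
#include <string.h>

/* Copy at most dst_size - 1 bytes and terminate explicitly, because
 * strncpy() adds no '\0' when src is at least dst_size - 1 bytes long. */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

After copy_terminated(a, sizeof a, input), a later strcpy() from a into an array of the same size stays within bounds, because a is guaranteed to hold a terminated string shorter than its capacity.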

(43)

ERRORS

Most of the functions defined in the standard string-handling library <string.h> are susceptible to errors. Visual Studio has deprecated most of them.

However, errors are still possible without them, since strings are arrays of chars…

(44)

STRING VULNERABILITIES

AND EXPLOITS

(45)

TAINTED VALUES

Previous sections described common errors in manipulating strings in C or C++.

✓ These errors become dangerous when code operates on untrusted data from external sources such as command-line arguments, environment variables, console input, text files, and network connections.

It is safer to view all external data as untrusted.

In software security analysis, a value is said to be tainted if it comes from an untrusted source (outside of the program’s control) and has not been sanitized to ensure that it conforms to any constraints on its value that consumers of the value require, for example, that all strings are null-terminated.

(46)

PASSWORD EXAMPLE

(47)

SECURITY FLAWS

The security flaw in the IsPasswordOK program that allows an attacker to gain unauthorized access is caused by the call to gets().

The condition that allows an out-of-bounds write to occur is referred to in software security as a buffer overflow.

(48)

ONE MORE FLAW

The IsPasswordOK program has another problem: it does not check the return status of gets().

This is a violation of “FIO04-C. Detect and handle input and output errors.”

When gets() fails, the contents of the Password buffer are indeterminate, and the subsequent strcmp() call has undefined behaviour.

In a real program, the buffer might even contain the good password previously entered by another user.
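Both flaws can be addressed together. The sketch below is an assumption about how a hardened check might look, not the course's reference solution: is_password_ok, its FILE * parameter, and the expected argument are hypothetical, and the 12-byte buffer mirrors the size mentioned for the Password array.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a hardened password check. Accepts passwords of at most
 * 10 characters (buffer: 10 characters + newline + terminator). */
bool is_password_ok(FILE *in, const char *expected)
{
    char password[12];

    /* fgets() bounds the read. NULL means EOF or error, in which case
     * the buffer contents must not be used. */
    if (fgets(password, sizeof password, in) == NULL) {
        return false;
    }
    size_t len = strcspn(password, "\n");
    if (password[len] != '\n' && len == sizeof password - 1) {
        return false;   /* line too long for the buffer: reject */
    }
    password[len] = '\0';
    return strcmp(password, expected) == 0;
}
```

Called as is_password_ok(stdin, stored_password), the function can no longer write past the array, and a failed read is treated as a failed login instead of leaving stale data to be compared.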

(49)

BUFFER OVERFLOWS

Buffer overflows occur when data is written outside of the boundaries of the memory allocated to a particular data structure. C and C++ are susceptible to buffer overflows because these languages:

✓ Define strings as null-terminated arrays of characters.

✓ Do not perform implicit bounds checking.

✓ Provide standard library calls for strings that do not enforce bounds checking.

Depending on the location of the memory and the size of the overflow, a buffer overflow may go undetected but can corrupt data, cause erratic behaviour, or terminate the program abnormally.

(50)

BUFFER OVERFLOWS

Not all buffer overflows lead to software vulnerabilities.

However, a buffer overflow can lead to a vulnerability if an attacker can manipulate user-controlled inputs to exploit the security flaw.

There are, for example, well-known techniques for overwriting frames in the stack to execute arbitrary code.

Buffer overflows can also be exploited in heap or static memory areas by overwriting data structures in adjacent memory.

To help, static (at program time) and dynamic analysis

(51)

MEMORY

(52)

PROCESS MEMORY ORGANIZATION

A process is a program instance that is loaded into memory and managed by the operating system.

(53)

MEMORY

The code or text segment includes instructions and read-only data. It can be marked read-only so that modifying memory in the code section results in faults.

The data segment contains initialized data, uninitialized data, static variables, and global variables.

The heap is used for dynamically allocating process memory.

The stack is a last-in, first-out (LIFO) data structure used to support process execution.

The exact organization of process memory depends on the operating system, compiler, linker, and loader; in other words, on the implementation of the programming language.
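To see these segments on a concrete system, one can sample the address of an object from each of them. The sketch below uses hypothetical names (layout_sample, sample_layout); the values it reports are entirely implementation-dependent, which is precisely the point made above.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment (initialized data)   */
int uninitialized_global;      /* data segment (uninitialized data) */

/* One address from each region, stored as integers for printing. */
struct layout_sample {
    uintptr_t code, data, bss, heap, stack;
};

/* Fills *out with sample addresses. Returns 0 on success, -1 if the
 * heap allocation fails. */
int sample_layout(struct layout_sample *out)
{
    int local = 0;                 /* automatic variable: on the stack */
    void *block = malloc(16);      /* dynamic allocation: on the heap  */
    if (block == NULL) {
        return -1;
    }
    out->code  = (uintptr_t)sample_layout;       /* code (text) segment */
    out->data  = (uintptr_t)&initialized_global;
    out->bss   = (uintptr_t)&uninitialized_global;
    out->heap  = (uintptr_t)block;
    out->stack = (uintptr_t)&local;
    free(block);
    return 0;
}
```

Printing the five fields (for example with the PRIxPTR format macro from <inttypes.h>) on two different platforms, or in two runs on a system that randomizes addresses, shows how much the layout can vary.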

(54)

STACK

int fun(int p1, int p2, int p3) {
    int res = 0;
    res = p1 + p2 + p3;
    return res;
}

int main() {
    int a = 4, b = 5, c = 7;
    a = fun(a, b, c);
}

[Figure: stack contents during the call to fun(): b, c, p1, p2, p3, return address, res]

(55)

EXAMPLE

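#include <stdio.h>

void f1();

/* f1() and f2() call each other without end: every call pushes a new
   frame, so the stack eventually overflows. */
void f2() {
    int c;
    f1();
    puts("bye f2");
}

void f1() {
    int b = 0;
    f2();
    puts("bye f1");
}

int main() {
    int a = 0;
    f1();
}

[Figure: stack contents: main frame, f1 frame, f2 frame, f1 frame, f2 frame, ...]

MacBook-Francesco:ProgrammI francescosantini$ ./test
Segmentation fault: 11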

(56)

STACK

To return control to the proper location, the sequence of return addresses must be stored. A stack is well suited for maintaining this information because it is a dynamic data structure that can support any level of nesting within memory constraints.

The address of the current frame is stored in the frame or base pointer register. On x86-32, the extended base pointer (ebp) register is used for this purpose.

The frame pointer is used as a fixed point of reference within the stack.

When a subroutine is called, the frame pointer for the calling routine is also pushed onto the stack so that it can be restored when the subroutine exits.

(57)

DISASSEMBLY IN INTEL

(58)

INSTRUCTION POINTER

The instruction pointer (eip) points to the next instruction to be executed. When executing sequential instructions, it is automatically incremented by the size of each instruction, so that the CPU will then execute the next instruction in the sequence.

Normally, the eip cannot be modified directly; instead, it must be modified indirectly by instructions such as jump, call, and return.

Extended stack pointer (esp) is the current pointer to the stack. The stack pointer points to the top of the stack.

✓ For many popular architectures, including x86, SPARC, and MIPS processors, the stack grows toward lower memory.

(59)

DISASSEMBLY OF FOO (PROLOGUE)

(60)

STACK FRAME FOR FOO

(61)

DISASSEMBLING FOO (EPILOGUE)

(62)

RETURN VALUES

If there is a return value, it is stored in eax by the called function before returning.

The caller function knows it can be found in eax and can use it.

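int MyFunction2(int a, int b) {
    return a + b;
}

x = MyFunction2(2, 3);

Called function:

:_MyFunction2
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]     ; first argument (a)
    mov edx, [ebp + 12]    ; second argument (b)
    add eax, edx           ; the result is left in eax
    pop ebp
    ret

Calling code:

    push 3
    push 2
    call _MyFunction2
    ; use x in eax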

(63)

STACK SMASHING

(64)

WHAT IS IT?

Stack smashing is when an attacker purposely overflows a buffer on the stack to get access to forbidden regions of computer memory.

Stack smashing occurs when a buffer overflow overwrites data in the memory allocated to the execution stack.

It can have serious consequences for the reliability and security of a program.

Buffer overflows in the stack segment may allow an attacker to modify the values of automatic variables or execute arbitrary code.

(65)

WHAT CAN HAPPEN?

Overwriting automatic variables can result in a loss of data integrity or, in some cases, a security breach (for example, if a variable containing a user ID or password is overwritten).

More often, a buffer overflow in the stack segment can lead to an attacker executing arbitrary code by overwriting a pointer to an address to which control is (eventually) transferred.

A common example is overwriting the return address, which is located on the stack.

(66)

EXAMPLE

The IsPasswordOK program is vulnerable to a stack-smashing attack.

The IsPasswordOK program has a security flaw because the Password array is improperly bounded and can hold only an 11-character password plus a trailing null byte.

This flaw can easily be demonstrated by entering a 20-character password of “1234567890123456W▸*!” that causes the program to jump in an unexpected way.

(67)

BACK TO THE EXAMPLE

(68)

EXAMPLE

It crashes!

(69)

WHAT HAPPENS

Each of these characters has a corresponding hexadecimal value: W = 0x57, ▸ = 0x10, * = 0x2A, and ! = 0x21.

In memory, this sequence of four characters corresponds to a 4-byte address that overwrites the return address on the stack, so instead of returning to the instruction immediately following the call in main():

✓ The IsPasswordOK() function returns control to the “Access granted” branch, bypassing the password validation logic and allowing unauthorized access to the system

(70)

GUESS THE RIGHT ADDRESS

void * __builtin_return_address (unsigned int level)

A value of 0 yields the return address of the current function, a value of 1 yields the return address of the caller of the current function, and so forth.
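A minimal sketch of how this builtin can be used to inspect a return address. It is a GCC/Clang extension rather than ISO C, and the wrapper name current_return_address is an assumption made for this example.

```c
#include <stddef.h>

/* GCC/Clang extension, not ISO C. The noinline attribute keeps this a
 * real call, so the function has a return address of its own.
 * Level 0 asks for the address this function will return to. */
__attribute__((noinline)) void *current_return_address(void)
{
    return __builtin_return_address(0);
}
```

Printing the result with printf("%p\n", current_return_address()); gives an address inside the calling function. Only level 0 is used here because the GCC documentation warns that nonzero levels can have unpredictable effects, including crashing the program.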

(71)

ARC INJECTION

The arc injection technique (sometimes called return-into-libc) involves transferring control to code that already exists in process memory.

These exploits are called arc injection because they insert a new arc (control-flow transfer) into the program’s control-flow graph as opposed to injecting new code.

More sophisticated attacks are possible using this technique, including installing the address of an existing function (such as system() or exec(), which can be used to execute commands and other programs already on the local system) on the stack along with the appropriate arguments.

(72)

ARC INJECTION

An attacker may prefer arc injection over code injection for several reasons.

Because arc injection uses code already in memory on the target system, the attacker merely needs to provide the addresses of the functions and arguments for a successful attack.

✓ The footprint for this type of attack can be significantly smaller and may be used to exploit vulnerabilities that cannot be exploited by the code injection technique.

Because the exploit consists entirely of existing code, it cannot be prevented by memory-based protection schemes such as making memory segments (such as

(73)

ONE MORE EXAMPLE

(74)

CODE INJECTION

(SHELL CODE)

(75)

INJECTION AND SHELLCODE

When the return address is overwritten because of a software flaw, it seldom points to valid instructions. Consequently, transferring control to this address typically causes a trap and results in a corrupted stack.

But it is possible for an attacker to create a specially crafted string that contains a pointer to some malicious code, which the attacker also provides.

✓ When the function invocation whose return address has been overwritten returns, control is transferred to this code. The malicious code runs with the permissions that the vulnerable program has when the subroutine returns.

✓ This is why programs running with root or other elevated privileges are normally targeted. The malicious code can perform any function that can otherwise be programmed but often simply opens a remote shell on the compromised machine.

For this reason, the injected malicious code is referred to as shellcode.

(76)

HOW IT HAS TO BE

The pièce de résistance of any good exploit is the malicious argument. A malicious argument must have several characteristics:

✓ It must be accepted by the vulnerable program as legitimate input.

✓ The argument, along with other controllable inputs, must result in execution of the vulnerable code path.

✓ The argument must not cause the program to terminate abnormally before control is passed to the shellcode.

(77)

BACK TO THE EXAMPLE

(78)

INJECTION

% ./BufferOverflow < exploit.bin (exploit.bin is the “payload”)

(79)

HOW IT WORKS

The lea instruction used in this example stands for “load effective address.” The lea instruction computes the effective address of the second operand (the source operand) and stores it in the first operand (destination operand).

The source operand is a memory address (offset part) specified with one of the processor’s addressing modes; the destination operand is a general purpose register.

(80)

HOW IT WORKS

The exploit code works as follows:

✓ 1. The first mov instruction is used to assign 0xB to the %eax register. 0xB is the number of the execve() system call in Linux.

• int execve(const char *filename, char *const argv[], char *const envp[]);

✓ 2. The three arguments for the execve() function call are set up in the subsequent three instructions (the two lea instructions and the mov instruction). The data for these arguments is located on the stack, just before the exploit code.

✓ 3. The int $0x80 instruction is used to invoke execve(), which results in the execution of the Linux calendar program.

(81)

RESULT

(82)

RETURN-ORIENTED

PROGRAMMING

(83)

OVERCOMING DEFENCES

Code that already exists in the process image.

✓ The standard C library, libc, is loaded in nearly every Unix program and contains routines useful for an attacker.

✓ But in principle any available code, either from the program’s text segment or from a library it links to, could be used.

By contrast, the building blocks for our attack are short code sequences, each just two or three instructions long.

Some are present in libc as a result of the code-generation choices of the compiler.

✓ These code sequences would be very difficult to eliminate without extensive modifications to the compiler and assembler.

(84)

HOW IT WORKS

The return-oriented programming exploit technique is similar to arc injection, but instead of returning to functions, the exploit code returns to sequences of instructions followed by a return (ret) instruction.

Any such useful sequence of instructions is called a gadget.

Each gadget specifies certain values to be placed on the stack that make use of one or more sequences of instructions in the code segment.

✓ Gadgets perform well-defined operations, such as a load, an add, or a jump.

It allows an attacker to execute code in the presence of security defences such as executable space protection

(85)

HOW IT WORKS

Return-oriented programming is an advanced version of a stack smashing attack.

In a standard buffer overrun attack, the attacker would simply write attack code (the "payload") onto the stack and then overwrite the return address with the location of these newly written instructions.

✓ Since the late 90s, OS/compilers have protections: data zones cannot be executed.

• DEP Data Execution prevention (there is a hardware bit)

• NX (no execute), on Intel XD (execute disable)

https://cseweb.ucsd.edu/~hovav/dist/geometry.pdf

(86)

EXAMPLE

The left side shows the x86-32 assembly language instruction necessary to copy the constant value $0xdeadbeef into the ebx register, and the right side shows the equivalent gadget.

With the stack pointer referring to the gadget, the return instruction is executed by the CPU.

The resulting gadget pops the constant from the stack

pop %ebx;

ret;

(87)

EXAMPLE 2

An unconditional branch can be used to branch to an earlier gadget on the stack, resulting in an infinite loop.

pop %esp;

ret;

(88)

EXAMPLE OF ATTACK

The goal of the attack is to invoke system call sys_write and output “xxxHACKxxx” to screen

ssize_t sys_write(unsigned int fd, const char * buf, size_t count)

(89)

EXAMPLE OF ATTACK

#include <stdio.h>

int main(int argc, char *argv[]) {
    char buf[4];
    gets(buf);   /* deliberately vulnerable: unbounded read into 4 bytes */
    return 0;
}

http://www.cs.virginia.edu/~ww6r/CS4630/lectures/return_oriented_programming.pdf

(90)

GADGETS


(91)

HOW TO DO IT

(92)

SOME THEORETICAL ISSUES

Can you always find the gadgets you need?

✓ Some small executable files may not have all the gadgets you need

✓ If the executable file is larger than 3MB there is a good chance that you can find a set of gadgets for any exploit

Do you need “ret”?

✓ No, other jumps also work

ROP can also work without libc, using only the program's own code (in case mitigations on libc have been applied)

(93)

SOME THEORETICAL ISSUES

Return-oriented programming provides a fully functional "language" (Turing complete) that an attacker can use to make a compromised machine perform any operation desired.

(94)

SUMMARY

Return-oriented Programming (ROP) addresses the limitations of code-injection and return-to-libc

✓ Code-injection: need executable stack

✓ Return-to-libc:

• Highly depends on libc's implementation

• Can be defended with mapped memory randomization

Gadgets: a small sequence of code ending with “ret” within a program's code section

✓ No need to inject code, so no need of executable stack

✓ Do not use libc's full function implementation, may even only use just application's code

ROP attacks chain several gadgets together to execute arbitrary code

Enough ROP gadgets can be found in most executable files

(95)

COMPLICATED

While return-oriented programming might seem very complex, this complexity can be abstracted behind a programming language, compiler, and support tools, making it a viable technique for writing exploits.

(96)

HOW TO EXPLOIT/PREVENT IT

An automated tool has been developed to help automate the process of locating gadgets and constructing an attack against a binary.

This tool, known as ROPgadget, searches through a binary looking for potentially useful gadgets, and attempts to assemble them into an attack payload that spawns a shell to accept arbitrary commands from the attacker.

https://github.com/JonathanSalwan/ROPgadget

(97)

HOW TO BUILD THEM

pwntools is a CTF (Capture the Flag) framework and exploit development library. Written in Python, it is designed for rapid prototyping and development, and intended to make exploit writing as simple as possible.

✓ http://docs.pwntools.com/en/stable/rop/rop.html

(98)

MITIGATION

(99)

STACK-SMASHING PROTECTOR (PROPOLICE)

In version 4.1, GCC introduced the Stack-Smashing Protector (SSP) feature, which implements canaries derived from StackGuard.

Also known as ProPolice, SSP is a GCC extension for protecting applications written in C from the most common forms of stack buffer overflow exploits and is implemented as an intermediate language translator of GCC.

Specifically, SSP reorders local variables to place buffers after pointers and copies pointers in function arguments to an area preceding local variable buffers to avoid the corruption of pointers that could be used to further corrupt arbitrary memory locations.

(100)

CANARIES

Canaries consist of a value that is difficult to insert or

spoof and are written to an address before the section of the stack being protected.

A sequential write would consequently need to overwrite this value on the way to the protected region.

The canary is initialized immediately after the return address is saved and checked immediately before the return address is accessed.

A hard-to-spoof or random canary is a 32-bit secret random number that changes each time the program is executed.

(101)

CANARIES

SSP works by introducing a canary to detect changes to the arguments, return address, and previous frame pointer in the stack. SSP inserts code fragments into appropriate locations as follows: a random number is generated for the guard value

during application initialization, preventing discovery by an unprivileged user.
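The canary idea can be simulated in plain C without touching a real stack frame. The sketch below is only a model, not the code that SSP emits: the struct frame layout, the field names, and the function copy_and_check are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated stack frame: the canary sits between the buffer and the
 * saved return address, so a sequential overflow must cross it. */
struct frame {
    char     buf[16];
    uint32_t canary;
    uint32_t saved_ret;
};

/* Writes len bytes of input sequentially from the start of buf and
 * returns 1 if the canary is intact afterwards, 0 if it was altered. */
int copy_and_check(const void *input, size_t len, uint32_t secret)
{
    struct frame f;
    f.canary = secret;            /* "prologue": install the canary */
    f.saved_ret = 0x08048000u;    /* placeholder return address */
    if (len > sizeof f)
        len = sizeof f;           /* keep the simulation in bounds */
    memcpy(&f, input, len);       /* the sequential write */
    return f.canary == secret;    /* "epilogue": verify before return */
}
```

An 8-byte input leaves the canary intact, while a 24-byte input overwrites it and is detected. A real implementation aborts the program at this point instead of returning.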

(102)

DISABLING IT

The -fstack-protector and -fno-stack-protector options enable and disable stack-smashing protection for

functions with vulnerable objects (such as arrays).

(103)

ASLR

Address space layout randomization (ASLR) is a

security feature of many operating systems; its purpose is to prevent arbitrary code execution.

The feature randomizes the address of memory pages used by the program. ASLR cannot prevent the return address on the stack from being overwritten by a stack-based overflow.

However, by randomizing the address of stack pages, it may prevent attackers from correctly predicting the address of the shellcode, system function, or return-oriented programming gadget that they want to invoke.

(104)

ASLR AND OS

ASLR was first introduced to Linux in the PaX project in 2000.

While the PaX patch has not been submitted to the mainstream Linux kernel, many of its features are incorporated into mainstream Linux distributions.

For example, ASLR has been part of Ubuntu since 2008 and Debian since 2007. Both platforms allow for fine-grained tuning of ASLR via the following command:

• sysctl -w kernel.randomize_va_space=2

Setting the value to 0 turns ASLR off.

ASLR has been available on Windows since Vista.

(105)

ASLR

ASLR randomly arranges the address space positions of key data areas of a process, including the base of the

executable and the positions of the stack, heap and libraries.
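One way to observe this is to print one address from each region and compare the output of several runs. The sketch below assumes a hosted platform where function pointers can be converted to void * for printing (true on POSIX systems, not guaranteed by ISO C). The name print_layout is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Prints one address from the stack, the heap, a library function and
 * the program's own code. Returns 0 on success, -1 if malloc() fails. */
int print_layout(FILE *out)
{
    int on_stack = 0;
    void *on_heap = malloc(16);
    if (on_heap == NULL)
        return -1;
    fprintf(out, "stack:   %p\n", (void *)&on_stack);
    fprintf(out, "heap:    %p\n", on_heap);
    /* Function-to-void* casts are a common extension, assumed here. */
    fprintf(out, "library: %p\n", (void *)&malloc);
    fprintf(out, "code:    %p\n", (void *)&print_layout);
    free(on_heap);
    return 0;
}
```

With ASLR enabled, the stack, heap and library addresses are expected to differ between runs. The code address changes only if the program was built as a position-independent executable.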

(106)

NON-EXECUTABLE STACK

A nonexecutable stack is a runtime solution to buffer overflows that is designed to prevent executable code from running in the stack segment.

• Many operating systems can be configured to use nonexecutable stacks.

Nonexecutable stacks are often represented as a panacea in securing against buffer overflow vulnerabilities, but…

• They do not prevent buffer overflows from occurring in the heap or data segments.

• They do not prevent an attacker from using a buffer overflow to modify a return address, variable, object pointer, or function pointer.

(107)

DRAWBACKS

Depending on how they are implemented, nonexecutable stacks can affect performance.

Nonexecutable stacks can also break programs that execute code in the stack segment, including Linux signal delivery and GCC trampolines.

(108)

W^X

Several operating systems, including OpenBSD, Windows, Linux, and OS X, enforce reduced privileges in the kernel so that no part of the process address space is both writable and executable.

This policy is called W xor X (W ⊕ X), or more concisely W^X, and is supported by the use of a No eXecute (NX) bit on several CPUs.

The NX bit enables memory pages to be marked as data, disabling the execution of code on these pages. This bit is named NX on AMD CPUs, XD (for eXecute Disable) on Intel CPUs.

W^X requires that no code is intended to be executed that is not part of the program itself. This prevents the execution of shellcode injected into writable memory such as the stack or heap.

(109)

IMPLEMENTATION

Deployment: Linux (via PaX patches); OpenBSD;

Windows (since XP SP2); OS X (since 10.5); ...

Hardware support: Intel “XD” bit, AMD “NX” bit (and many RISC processors)
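On Linux, the effect of the policy can be checked for a running process by reading its memory map. The following is a Linux-only sketch that assumes /proc/self/maps is available and that lines fit in the buffer. The function name count_wx_mappings is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Counts the mappings of the current process that are both writable
 * and executable. Returns -1 if /proc/self/maps cannot be opened. */
int count_wx_mappings(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[1024];
    int count = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        char perms[5] = "";
        /* Line format: "start-end perms offset dev inode [path]" */
        if (sscanf(line, "%*s %4s", perms) == 1 &&
            strchr(perms, 'w') != NULL && strchr(perms, 'x') != NULL)
            count++;
    }
    fclose(maps);
    return count;
}
```

On a system that enforces W^X, an ordinary program is expected to report zero such mappings.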

(110)

HEAP ISSUES

(111)

SOME PROBLEMS WITH MALLOC

Initializing large blocks of memory can degrade performance and is not always necessary.

The decision by the C standards committee to not

require malloc() to initialize this memory reserves this decision for the programmer.

“MEM09-C. Do not assume memory allocation functions initialize memory.”
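A minimal sketch of the two usual ways to follow MEM09-C: request zeroed memory with calloc(), or initialize every element explicitly after malloc(). The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns an array of n ints set to zero, or NULL on failure. */
int *new_zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Returns an array of n ints, each set to value, or NULL on failure.
 * malloc() leaves the memory indeterminate, so every element is
 * written before the array is handed to the caller. */
int *new_filled_ints(size_t n, int value)
{
    if (n > SIZE_MAX / sizeof(int))
        return NULL;                 /* size computation would overflow */
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = value;
    return a;
}
```

The second form avoids clearing memory that is about to be overwritten anyway, which is the performance argument made above.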

(112)

SOME SECURITY PROBLEMS

Where sensitive information is used, it is important to clear or overwrite the sensitive information before calling free().

• MEM03-C of The CERT C Secure Coding Standard: “Clear sensitive information stored in reusable resources”.

Unfortunately, compiler optimizations may silently remove a call to memset() if the memory is not accessed following the write.

To avoid this possibility, you can use the memset_s() function (if available). Unlike memset(), the memset_s() function

assumes that the memory being set may be accessed in the future.

CERT C Secure Coding Standard, “MSC06-C. Be aware of compiler optimization when dealing with sensitive data”.
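Where memset_s() is not available, a commonly used fallback is to clear the buffer through a volatile pointer. The sketch below assumes the compiler does not remove volatile stores. The standard does not make this a hard guarantee, so platform facilities such as memset_s(), explicit_bzero() or SecureZeroMemory() are preferable where they exist. The names secure_clear and free_secret are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Overwrites len bytes at p with zeros. Writing through a volatile
 * pointer discourages the compiler from eliding the stores. */
void secure_clear(void *p, size_t len)
{
    volatile unsigned char *bytes = p;
    while (len--)
        *bytes++ = 0;
}

/* Clears a heap buffer that held sensitive data, then frees it. */
void free_secret(void *p, size_t len)
{
    if (p == NULL)
        return;
    secure_clear(p, len);
    free(p);
}
```

A caller would use free_secret(key, key_len) in place of a bare free(key) for buffers that held passwords or keys.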

(113)

FAILING TO CHECK RETURN VALUES

Memory is a limited resource and can be exhausted (memory leaks, overall memory, other processes).

Once all virtual memory is allocated, requests for more memory will fail.

“MEM32-C. Detect and handle memory allocation errors.”
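A minimal sketch of the pattern the rule asks for: test the result of malloc() before using it and report the failure to the caller. The function name copy_string is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if s is NULL or the
 * allocation fails. The caller owns the result and must free() it. */
char *copy_string(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t n = strlen(s) + 1;      /* include the terminating NUL */
    char *copy = malloc(n);
    if (copy == NULL)              /* allocation failed: do not touch it */
        return NULL;
    memcpy(copy, s, n);
    return copy;
}
```

The caller in turn has to check for NULL, so the error is handled at a level that knows how to recover.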

(114)

MEMORY LEAKS AND SECURITY

Memory leaks occur when dynamically allocated memory is not freed after it is no longer needed.

Memory leaks can be problematic in long-running processes or in resource-exhaustion attacks.

• If an attacker can identify an external action that causes memory to be allocated but not freed, memory can eventually be exhausted.

• Once memory is exhausted, additional allocations fail, and the application is unable to process valid user requests without necessarily crashing.

(115)

EXAMPLE

(116)

REFERENCING FREED MEMORY

It is possible to access freed memory unless all pointers to that memory have been set to NULL or otherwise

overwritten.

Reading from freed memory is undefined behaviour but almost always succeeds without a memory fault because freed memory is recycled by the memory manager.

• However, there is no guarantee that the contents of the memory have not been altered.
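A classic instance of this error is freeing a linked list with a loop such as for (p = head; p != NULL; p = p->next) free(p);, which reads p->next after p has been freed. The sketch below shows one way to avoid it. The struct node type and the function names are assumptions made for illustration.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Pushes value onto the front of the list. Returns the new head, or
 * NULL if allocation fails (the existing list is left untouched). */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* Frees every node and returns how many were freed. The next pointer
 * is saved BEFORE the node is freed, so freed memory is never read. */
size_t free_list(struct node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```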

(117)

SO

When you free, also set the pointer to freed memory to NULL.

Writing to a memory location that has been freed is also unlikely to result in a memory fault but could result in a number of serious problems.

If the memory has been reallocated, a programmer may overwrite memory, believing that a memory chunk is dedicated to a particular variable when in reality it is being shared.
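The advice to reset the pointer can be packaged in a small macro. This is a sketch, and the name FREE_AND_NULL is hypothetical.

```c
#include <stdlib.h>

/* Frees the memory p points to and resets p to NULL, so that a later
 * dereference fails fast and a later free(p) is a harmless no-op.
 * p must be a modifiable pointer variable (an lvalue). */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)
```

Note that this only resets the one pointer passed to the macro. Any other copies of the same address still dangle and must be handled separately.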

(118)

OTHER ERRORS

Dereferencing null or invalid pointers: if the operand does not point to an object or function, the behaviour of the unary * operator is undefined. Cases:

• a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime

Double free: freeing the same memory chunk more than once.

Memory leaks also belong to this class of errors.

(119)

HEAP OVERFLOWS

(120)

DLMALLOC

The GNU C library and most versions of Linux (for example, Red Hat, Debian) are based on Doug Lea’s malloc (dlmalloc) as the default native version of malloc.

In the following we describe the internals of dlmalloc version 2.7.2, and give examples of how its flaws can be exploited.

• dlmalloc has since reached version 2.8.6

The security flaws responsible for the following vulnerabilities are common to all versions of dlmalloc (and other memory managers as well).

• We assume a 32-bit Intel architecture.

(121)

DLMALLOC

In dlmalloc, memory chunks are either allocated to a process or are free.

The first 4 bytes of both allocated and free chunks

contain either the size of the previous adjacent chunk, if it is free, or the last 4 bytes of user data of the previous chunk, if it is allocated.

(122)

CHUNKS

[Figure: structure of an allocated chunk and of a free chunk, with field sizes in bytes]

(123)

FREE CHUNKS

In dlmalloc, free chunks are arranged in circular double-linked lists, or bins.

Each double-linked list has a head that contains forward and backward pointers to the first and last chunks in the list. Both the forward pointer in the last chunk of the list and the backward pointer in the first chunk of the list point to the head element. When the list is empty, the head’s pointers reference the head itself.
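The list structure described here can be sketched with a simplified chunk header. The struct below is modelled on the prev_size, size, fd and bk fields of dlmalloc 2.7.x, but the bin helpers (bin_init, bin_is_empty, bin_push_front) are hypothetical and much simpler than the real allocator.

```c
#include <stddef.h>

/* Simplified chunk header. On a 32-bit target each field is 4 bytes,
 * which puts fd at offset 8 and bk at offset 12. */
struct chunk {
    size_t prev_size;   /* size of the previous chunk, if it is free */
    size_t size;        /* size of this chunk plus flag bits */
    struct chunk *fd;   /* forward pointer (free chunks only) */
    struct chunk *bk;   /* backward pointer (free chunks only) */
};

/* An empty bin: both pointers of the head reference the head itself. */
void bin_init(struct chunk *head)
{
    head->fd = head;
    head->bk = head;
}

int bin_is_empty(const struct chunk *head)
{
    return head->fd == head;
}

/* Links c in as the first chunk of the circular list. */
void bin_push_front(struct chunk *head, struct chunk *c)
{
    c->fd = head->fd;
    c->bk = head;
    head->fd->bk = c;
    head->fd = c;
}
```

After two insertions the forward pointer of the last chunk and the backward pointer of the first chunk both reference the head, as stated above.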

(124)

A BIN

Both allocated and free chunks make use of a PREV_INUSE bit (represented by P in the figure) to indicate whether or not the previous chunk is allocated.

(125)

UNLINK

The unlink() macro is used to remove a chunk from its double-linked list. It is used when memory is consolidated and when a chunk is taken off the free list because it has been allocated to a user.
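A sketch of the macro, modelled on the four assignments of unlink() in dlmalloc 2.7.x. It is renamed UNLINK_CHUNK here (a hypothetical name) to avoid clashing with the POSIX unlink() function, and it uses a simplified chunk header.

```c
#include <stddef.h>

struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;   /* forward pointer */
    struct chunk *bk;   /* backward pointer */
};

/* Removes chunk P from its double-linked list. FD and BK are
 * temporaries supplied by the caller. No check is made that the
 * neighbours really point back at P. */
#define UNLINK_CHUNK(P, BK, FD) \
    do {                        \
        (FD) = (P)->fd;         \
        (BK) = (P)->bk;         \
        (FD)->bk = (BK);        \
        (BK)->fd = (FD);        \
    } while (0)
```

On a well-formed list head <-> a <-> b, unlinking a leaves head <-> b. The missing consistency check is what makes the macro dangerous once an attacker controls fd and bk.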

(126)

ATTACKING THE HEAP

Through DLmalloc

http://grantcurell.com/2015/08/16/protostar-exploit-challenges-heap3-solution-exploiting-dlmalloc/

(127)

BUFFER OVERFLOW ON THE HEAP

Dynamically allocated memory is vulnerable to buffer overflows.

Exploiting a buffer overflow in the heap is generally

considered to be more difficult than smashing the stack.

For this reason, buffer overflows in the heap are not

always appropriately addressed, and developers adopt solutions that protect against stack-smashing attacks but not buffer overflows in the heap.

Buffer overflows, for example, can be used to corrupt

data structures used by the memory manager to execute arbitrary code.

• Both the unlink and frontlink techniques described in the following can be used for this purpose as well.

(128)

UNLINK TECHNIQUE

The unlink technique was successfully used against versions of Netscape browsers and traceroute using DLMalloc.

The unlink technique is used to exploit a buffer overflow to manipulate the boundary tags on chunks of memory to trick the unlink() macro into writing 4 bytes of data to an arbitrary location.

(129)

VULNERABLE CODE

This vulnerable program allocates three chunks of memory (lines 5–7). The program accepts a single string argument that is copied into first (line 8). This unbounded strcpy() operation is susceptible to a buffer overflow. The boundary tag can be overwritten by a string argument exceeding the length of first because the boundary tag for second is located directly after the first buffer.
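The pattern can be sketched as follows. This is not the original listing: the buffer sizes (666 and 12 bytes) and the function name copy_bounded are assumptions, and the sketch adds the length check whose absence causes the overflow.

```c
#include <stdlib.h>
#include <string.h>

#define FIRST_SIZE 666   /* assumed size of the first buffer */

/* Allocates three chunks and copies arg into the first one.
 * Returns 0 on success, -1 on allocation failure or if arg does not
 * fit. The vulnerable program performs the strcpy() unconditionally,
 * so a long argument runs past first into the boundary tag of second. */
int copy_bounded(const char *arg)
{
    char *first  = malloc(FIRST_SIZE);
    char *second = malloc(12);
    char *third  = malloc(12);
    int rc = -1;

    if (first != NULL && second != NULL && third != NULL &&
        strlen(arg) < FIRST_SIZE) {
        strcpy(first, arg);   /* safe only because of the check above */
        rc = 0;
    }
    free(first);
    free(second);
    free(third);
    return rc;
}
```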

(130)

THE HEAP

[Figure: the content of the heap at the time free() is called for the first time; chunk sizes are divisible by 8]

(131)

CONSOLIDATION

If the second chunk is unallocated, the free() operation will attempt to consolidate it with the first chunk.

To determine whether the second chunk is unallocated, free() checks the PREV_INUSE bit of the third chunk.

The location of the third chunk is determined by adding the size of the second chunk to its starting address.
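This check can be modelled on a simulated 32-bit heap, where the heap is a byte array and "addresses" are offsets into it. The layout assumed below (size field at offset 4, flag bits in the two low-order bits, PREV_INUSE = 0x1) follows dlmalloc 2.7.x, but the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SIZE_OFFSET 4u      /* offset of the size field in a chunk */
#define PREV_INUSE  0x1u    /* low-order bit of the size field */
#define FLAG_BITS   0x3u    /* PREV_INUSE plus the mmap flag */

static uint32_t load32(const unsigned char *heap, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, heap + addr, sizeof v);
    return v;
}

static void store32(unsigned char *heap, uint32_t addr, uint32_t v)
{
    memcpy(heap + addr, &v, sizeof v);
}

/* Returns 1 if the chunk at offset second is free. The answer is read
 * from the PREV_INUSE bit of the chunk that follows it, which is
 * located by adding the size of second to its starting address. */
int second_is_free(const unsigned char *heap, uint32_t second)
{
    uint32_t size  = load32(heap, second + SIZE_OFFSET) & ~FLAG_BITS;
    uint32_t third = second + size;
    return (load32(heap, third + SIZE_OFFSET) & PREV_INUSE) == 0;
}
```

Because third is computed from the size field of the second chunk, whoever controls that field also controls where free() looks for the PREV_INUSE bit.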

(132)

HEAP IS OVERWRITTEN

The attacker can overwrite the boundary tag associated with the second chunk of memory, because this boundary tag is located immediately after the first chunk.

(133)

EXAMPLE

(134)

IN PRACTICE

The unlink() macro writes 4 bytes of data supplied by an attacker to a 4-byte address also supplied by the

attacker.

unlink() is called

(135)

EFFECT

The first line of unlink, FD = P->fd, assigns the value in P->fd (provided as part of the malicious argument) to FD.

The second line of the unlink macro, BK = P->bk, assigns the value of P->bk, also provided by the malicious argument to BK.

The third line of the unlink() macro, FD->bk = BK, overwrites the address specified by FD + 12 (the offset of the bk field in the structure) with the value of BK.
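The effect of these assignments can be reproduced safely on a simulated 32-bit memory image, where addresses are offsets into a byte array. This is a model for study, not exploit code: the function unlink_sim and its bounds checks are assumptions added for illustration.

```c
#include <stdint.h>
#include <string.h>

#define FD_OFFSET 8u    /* offset of fd in a 32-bit chunk */
#define BK_OFFSET 12u   /* offset of bk in a 32-bit chunk */

static uint32_t load32(const unsigned char *mem, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, mem + addr, sizeof v);
    return v;
}

static void store32(unsigned char *mem, uint32_t addr, uint32_t v)
{
    memcpy(mem + addr, &v, sizeof v);
}

static int in_bounds(uint32_t len, uint32_t addr)
{
    return addr <= len && len - addr >= 4;
}

/* Performs the four assignments of unlink() on chunk p inside mem.
 * Returns 0 on success, -1 if an access would leave the image. */
int unlink_sim(unsigned char *mem, uint32_t len, uint32_t p)
{
    if (!in_bounds(len, p + FD_OFFSET) || !in_bounds(len, p + BK_OFFSET))
        return -1;
    uint32_t fd = load32(mem, p + FD_OFFSET);    /* FD = P->fd */
    uint32_t bk = load32(mem, p + BK_OFFSET);    /* BK = P->bk */
    if (!in_bounds(len, fd + BK_OFFSET) || !in_bounds(len, bk + FD_OFFSET))
        return -1;
    store32(mem, fd + BK_OFFSET, bk);            /* FD->bk = BK */
    store32(mem, bk + FD_OFFSET, fd);            /* BK->fd = FD */
    return 0;
}
```

If a forged chunk stores target - 12 in fd and a chosen value in bk, the third assignment writes that value at target. The fourth assignment also writes fd at bk + 8, so when bk points to shellcode, bytes 8 to 11 of the shellcode are clobbered and must be jumped over.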

(136)

RESULT

The content of bk is written at the address FD + 12.

An attacker could, for example, provide the address of the return pointer on the stack and use the unlink()

macro to overwrite the address with the address of malicious code.

• dlmalloc modifies the stack on the attacker's behalf

Exploitation of a buffer overflow in the heap is not particularly difficult. The most difficult part of this exploit is determining the size of the first chunk so that the boundary tag for the second chunk can be precisely overwritten.

(137)

DOUBLE FREE VULNERABILITIES

Doug Lea’s malloc is also susceptible to double-free vulnerabilities. This type of vulnerability arises from freeing the same chunk of memory twice without its being reallocated between the two free operations.

For a double-free exploit to be successful, two conditions must be met. The chunk to be freed must be isolated in memory (that is, the adjacent chunks must be allocated so that no consolidation takes place), and the bin into which the chunk is to be placed must be empty.

(138)

WE START WITH

(139)

FRONTLINK

When a chunk of memory is freed, it must be linked into the appropriate double-linked list. In some versions of dlmalloc, this is performed by the frontlink code segment.

(140)

AFTER A FREE

(141)

ATTACK

The attacker supplies the address of a memory chunk

and arranges for the first 4 bytes of this memory chunk to contain executable code (that is, a jump instruction to

shellcode).

This is accomplished by writing these instructions into the last 4 bytes of the previous chunk in memory.

(142)

MITIGATION

Randomization works on the principle that it is harder to hit a moving target than a still target. Addresses of

memory allocated by malloc() are fairly predictable.

Randomizing the addresses of blocks of memory returned by the memory manager can make it more difficult to exploit a heap-based vulnerability.

Tools for dynamic analysis: Valgrind

• Memcheck is a memory error detector. It can detect the following problems that are common in C and C++ programs.

• Accessing memory you shouldn't, e.g. overrunning and underrunning heap blocks, overrunning the top of the stack, and accessing memory after it has been freed.

(143)

POINTER SUBTERFUGE

(144)

POINTERS CAN BE MODIFIED

Pointer subterfuge is a general term for exploits that modify a pointer’s value. C and C++ differentiate

between pointers to objects and pointers to functions.

Function pointers can be overwritten to transfer control to attacker-supplied shellcode. When the program executes a call via the function pointer, the attacker’s code is

executed instead of the intended code.
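The mechanism can be modelled without invoking undefined behaviour by placing the buffer and the function pointer in one struct and treating the copy as a write into the struct as a whole. All names below (struct handler, copy_then_call, build_payload) are hypothetical, and the "attacker" function merely stands in for shellcode.

```c
#include <stddef.h>
#include <string.h>

static int intended(void)        { return 1; }
static int attacker_chosen(void) { return 2; }

/* A buffer followed in memory by a function pointer. */
struct handler {
    char buff[16];
    int (*func_ptr)(void);
};

/* Simulates an unbounded copy into buff followed by an indirect call.
 * Input longer than buff spills into func_ptr. The input must either
 * fit in buff or carry a valid function address in the right place. */
int copy_then_call(const unsigned char *input, size_t len)
{
    struct handler h;
    h.func_ptr = intended;
    if (len > sizeof h)
        len = sizeof h;          /* keep the simulation in bounds */
    memcpy(&h, input, len);      /* buff is the first member of h */
    return h.func_ptr();
}

/* Fills out (at least sizeof(struct handler) bytes) with padding plus
 * the address of attacker_chosen; returns the payload length. */
size_t build_payload(unsigned char *out)
{
    int (*fp)(void) = attacker_chosen;
    size_t off = offsetof(struct handler, func_ptr);
    memset(out, 'A', off);
    memcpy(out + off, &fp, sizeof fp);
    return off + sizeof fp;
}
```

A short input leaves the call going to intended(), while the crafted payload redirects the same call site to attacker_chosen().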

(145)

POINTERS TO FUNCTIONS

funcPtr can be made to point to shellcode.

The overflow occurs in the data segment.

(146)

POINTERS TO OBJECTS

This program contains an unbounded memory copy on line 5. After overflowing the buffer, an attacker can overwrite ptr and val. When *ptr = val is

consequently evaluated on line 6, an arbitrary memory write is performed.
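A sketch of the same pattern in a form that can be run safely: the buffer, the pointer and the value live in one struct, so the oversized copy stays inside a single object. The struct locals layout and the function copy_then_store are assumptions made for illustration, not the original listing.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the local variables of the vulnerable function. */
struct locals {
    char  buff[16];
    long *ptr;
    long  val;
};

/* Simulates an unbounded copy into buff followed by *ptr = val.
 * Input longer than buff overwrites ptr and val, which turns the
 * assignment into a write of a chosen value to a chosen address. */
void copy_then_store(const unsigned char *input, size_t len,
                     long *default_target)
{
    struct locals l;
    l.ptr = default_target;
    l.val = 0;
    if (len > sizeof l)
        len = sizeof l;          /* keep the simulation in bounds */
    memcpy(&l, input, len);      /* buff is the first member of l */
    *l.ptr = l.val;
}
```

With a short input the function stores 0 in the default target. With a payload that carries the address of another variable and a value of the attacker's choice, that other variable is modified instead.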

(147)

ONE MORE EXAMPLE WITH FUNCT

For an attacker to succeed in executing arbitrary code on x86-32, an exploit must modify the value of the

instruction pointer to reference the shellcode.

The instruction pointer register (eip) contains the offset in the current code segment for the next instruction to be executed.

The eip register cannot be accessed directly by software.

It is advanced from one instruction boundary to the next when executing code sequentially or modified indirectly by control transfer instructions (such as jmp, jcc, call, and ret), interrupts, and exceptions.

(148)

ONE MORE EXAMPLE WITH FUNCT

The call instruction, for example, saves return

information on the stack and transfers control to the called function specified by the destination (target) operand.

The target operand specifies the address of the first

instruction in the called function. This operand can be an immediate value, a general-purpose register, or a

memory location.

(149)

EXAMPLE

(150)

DISASSEMBLING IT

ModR/M = 0x15 indicates an absolute, indirect call

(151)

CONCLUSION

These invocations of good_function() provide examples of call instructions that can and cannot be attacked.

• The static invocation uses an immediate value as relative displacement, and this displacement cannot be overwritten because it is in the code segment.

The invocation through the function pointer uses an indirect reference, and the address in the referenced location (typically in the data or stack segment) can be overwritten.

(152)

TOOLS AND LINKS

(153)

TOOLS

Tools for static analysis

• https://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis

• https://samate.nist.gov/index.php/Source_Code_Security_Analyzers.html

Tools for dynamic analysis

• https://en.wikipedia.org/wiki/Dynamic_program_analysis

Disassembling

• https://rada.re/r/

ROPgadget (gadgets finder and auto-roper)

• http://shell-storm.org/project/ROPgadget/

(154)

SHELLCODES

Shellcode database for study cases

• http://shell-storm.org/shellcode/

• https://www.exploit-db.com

• https://0day.today

(155)

LIBRARIES

Pwntools (Python)

• https://docs.pwntools.com/en/stable/

(156)

HOWTO

https://thecyberrecce.net/tag/pwn-tools/

https://ocw.cs.pub.ro/courses/cns/labs/lab-01

https://github.com/FabioBaroni/awesome-exploit-development