Garbage Collection Garbage Collection
Introduction and Overview Introduction and Overview
Excerpted from presentation by Christian Schulte
Christian Schulte
Programming Systems Lab Programming Systems Lab Universität des Saarlandes, Germany Universität des Saarlandes, Germany
schulte@ps.uni-sb.de schulte@ps.uni-sb.de
Garbage Collection…
Garbage Collection…
…is concerned with the automatic
reclamation of dynamically allocated
memory after its last use by a program
Garbage collection…
Garbage collection…
Dynamically allocated memory Dynamically allocated memory
Last use by a program Last use by a program
Examples for automatic reclamation Examples for automatic reclamation
Kinds of Memory Allocation Kinds of Memory Allocation
static int i;
void foo(void) { int j;
int* p = (int*) malloc(…);
}
Static Allocation Static Allocation
By compiler (in text area) By compiler (in text area)
Available through entire runtime Available through entire runtime
Fixed size Fixed size
static int i;
void foo(void) { int j;
int* p = (int*) malloc(…);
}
Automatic Allocation Automatic Allocation
Upon procedure call (on stack) Upon procedure call (on stack)
Available during execution of call Available during execution of call
Fixed size Fixed size
static int i;
void foo(void) { int j;
int* p = (int*) malloc(…);
}
Dynamic Allocation Dynamic Allocation
Dynamically allocated at runtime (on heap) Dynamically allocated at runtime (on heap)
Available until explicitly deallocated Available until explicitly deallocated
Dynamically varying size Dynamically varying size
static int i;
void foo(void) { int j;
int* p = (int*) malloc(…);
}
Dynamically Allocated Memory Dynamically Allocated Memory
Also: heap-allocated memory Also: heap-allocated memory
Allocation: malloc, new, … Allocation: malloc, new, …
– before first usage
Deallocation: free, delete, dispose, … Deallocation: free, delete, dispose, …
– after last usage
Needed for Needed for
– C++, Java: objects
– SML: datatypes, procedures – anything that outlives procedure call
Getting it Wrong Getting it Wrong
Forget to free (memory leak) Forget to free (memory leak)
– program eventually runs out of memory – long running programs: OSs. servers, …
Free to early (dangling pointer) Free to early (dangling pointer)
– lucky: illegal access detected by OS
– horror: memory reused, in simultaneous use
• programs can behave arbitrarily
• crashes might happen much later
Estimates of effort Estimates of effort
– Up to 40%! [Rovner, 1985]
Nodes and Pointers Nodes and Pointers
Node Node n n
– Memory block, cell
Pointer Pointer p p
– Link to node
– Node access: *p
Children Children children( children( n n ) )
– set of pointers to nodes referred by n
n
p
Mutator Mutator
Abstraction of program Abstraction of program
– introduces new nodes with pointer – redirects pointers, creating garbage
Nodes referred to by several pointers Nodes referred to by several pointers
Makes manual deallocation hard Makes manual deallocation hard
– local decision impossible
– respect other pointers to node
Cycles instance of sharing Cycles instance of sharing
Shared Nodes
Shared Nodes
Last Use by a Program Last Use by a Program
Question: When is node Question: When is node M M not any longer not any longer used by program?
used by program?
– Let P be any program not using M – New program sketch:
Execute P; Use M;
– Hence:
M used P terminates – We are doomed: halting problem!
So “last use” undecidable! So “last use” undecidable!
Safe Approximation Safe Approximation
Decidable and also simple Decidable and also simple
What means safe? What means safe?
– only unused nodes freed
What means approximation? What means approximation?
– some unused nodes might not be freed
Idea Idea
– nodes that can be accessed by mutator
Reachable Nodes Reachable Nodes
Reachable from root set Reachable from root set
– processor registers – static variables
– automatic variables (stack)
Reachable from reachable nodes Reachable from reachable nodes
root
Summary: Reachable Nodes Summary: Reachable Nodes
A node A node n n is reachable, iff is reachable, iff
– n is element of the root set, or
– n is element of children(m) and m is reachable
Reachable node also called “live” Reachable node also called “live”
Mark and Sweep Mark and Sweep
Compute set of reachable nodes Compute set of reachable nodes
Free nodes known to be not Free nodes known to be not reachable
reachable
Reachability: Safe Approximation Reachability: Safe Approximation
Safe Safe
– access to not reachable node impossible
– depends on language semantics – but C/C++? later…
Approximation Approximation
– reachable node might never be accessed
– programmer must know about this!
– have you been aware of this?
Example Garbage Collectors Example Garbage Collectors
Mark-Sweep Mark-Sweep
Others Others
– Mark-Compact
– Reference Counting – Copying
– see Chapter 1&2 of [Lins&Jones,96]
The Mark-Sweep Collector The Mark-Sweep Collector
Compute reachable nodes: Mark Compute reachable nodes: Mark
– tracing garbage collector
Free not reachable nodes: Sweep Free not reachable nodes: Sweep
Run when out of memory: Allocation Run when out of memory: Allocation
First used with LISP [McCarthy, First used with LISP [McCarthy, 1960]
1960]
Allocation Allocation
node* new() {
if (free_pool is empty) mark_sweep();
…
Allocation Allocation
node* new() {
if (free_pool is empty) mark_sweep();
return allocate();
}
The Garbage Collector The Garbage Collector
void mark_sweep() { for (r in roots) mark(r);
…
The Garbage Collector The Garbage Collector
void mark_sweep() { for (r in roots) mark(r);
…
all live nodes marked
Recursive Marking Recursive Marking
void mark(node* n) {
if (!is_marked(n)) { set_mark(n);
… }
}
Recursive Marking Recursive Marking
void mark(node* n) {
if (!is_marked(n)) { set_mark(n);
… }
} nodes reachable from n marked
Recursive Marking Recursive Marking
void mark(node* n) {
if (!is_marked(n)) { set_mark(n);
for (m in children(n)) mark(m);
}
} i-th recursion: nodes
on path with length i marked
The Garbage Collector The Garbage Collector
void mark_sweep() { for (r in roots) mark(r);
sweep();
…
The Garbage Collector The Garbage Collector
void mark_sweep() { for (r in roots) mark(r);
sweep();
…
all nodes on heap live
The Garbage Collector The Garbage Collector
void mark_sweep() { for (r in roots) mark(r);
sweep();
…
all nodes on heap live
and not marked
Eager Sweep Eager Sweep
void sweep() {
node* n = heap_bottom;
while (n < heap_top) { …
} }
Eager Sweep Eager Sweep
void sweep() {
node* n = heap_bottom;
while (n < heap_top) {
if (is_marked(n)) clear_mark(n);
else free(n);
n += sizeof(*n);
} }
The Garbage Collector The Garbage Collector
void mark_sweep() { for (r in roots) mark(r);
sweep();
if (free_pool is empty)
abort(“Memory exhausted”);
}
Assumptions Assumptions
Nodes can be marked Nodes can be marked
Size of nodes known Size of nodes known
Heap contiguous Heap contiguous
Memory for recursion available Memory for recursion available
Child fields known! Child fields known!
Assumptions: Realistic Assumptions: Realistic
Nodes can be marked Nodes can be marked
Size of nodes known Size of nodes known
Heap contiguous Heap contiguous
Memory for recursion available Memory for recursion available
Child fields known Child fields known
Assumptions: Conservative Assumptions: Conservative
Nodes can be marked Nodes can be marked
Size of nodes known Size of nodes known
Heap contiguous Heap contiguous
Memory for recursion available Memory for recursion available
Child fields known Child fields known
Mark-Sweep Properties Mark-Sweep Properties
Covers cycles and sharing Covers cycles and sharing
Time depends on Time depends on
– live nodes (mark)
– live and garbage nodes (sweep)
Computation must be stopped Computation must be stopped
– non-interruptible stop/start collector – long pause
Nodes remain unchanged (as not moved) Nodes remain unchanged (as not moved)
Heap remains fragmented Heap remains fragmented
Software Engineering Issues Software Engineering Issues
Design goal in SE: Design goal in SE:
• decompose systems
• in orthogonal components
Clashes with letting each component Clashes with letting each component do its memory management
do its memory management
• liveness is global property
• leads to “local leaks”
• lacking power of modern gc methods
Typical Cost Typical Cost
Early systems (LISP) Early systems (LISP)
up to 40% [Steele,75] [Gabriel,85]
up to 40% [Steele,75] [Gabriel,85]
• “garbage collection is expensive” myth
Well engineered system of today Well engineered system of today
10% of entire runtime [Wilson, 94]
10% of entire runtime [Wilson, 94]
Areas of Usage Areas of Usage
Programming languages and systems Programming languages and systems
– Java, C#, Smalltalk, …
– SML, Lisp, Scheme, Prolog, … – Perl, Python, PHP, JavaScript – Modula 3, Microsoft .NET
Extensions Extensions
– C, C++ (Conservative)
Other systems Other systems
– Adobe Photoshop – Unix filesystem
– Many others in [Wilson, 1996]
Understanding Garbage Understanding Garbage
Collection: Benefits Collection: Benefits
Programming garbage collection Programming garbage collection
– programming systems – operating systems
Understand systems with garbage Understand systems with garbage collection (e.g. Java)
collection (e.g. Java)
– memory requirements of programs – performance aspects of programs – interfacing with garbage collection
(finalization)
References References
Garbage Collection. Richard Jones Garbage Collection. Richard Jones and Rafael Lins, John Wiley & Sons, and Rafael Lins, John Wiley & Sons,
1996.
1996.
Uniprocessor garbage collection Uniprocessor garbage collection techniques. Paul R. Wilson, ACM techniques. Paul R. Wilson, ACM
Computing Surveys. To appear.
Computing Surveys. To appear.
• Extended version of IWMM 92, St. Malo.