Principles of Programming Languages

(1)


http://www.di.unipi.it/~andrea/Didattica/PLP-16/

Prof. Andrea Corradini

Department of Computer Science, Pisa

•  Dataflow Analysis for Global Optimization

Lesson 15

(2)

Summary

•  From local to global optimization: Liveness analysis

•  Other examples of global optimizations:

–  Available expressions

–  Very busy expressions

–  Reaching definitions

•  A general framework for Dataflow Analysis

(3)

(Local) Liveness/Next-Use Information

We have seen how to compute next-use/liveness information in a basic block:

•  Next-use is computed by a backward scan of a basic block, performing the following actions on each statement

L: x := y op z

–  Add liveness/next-use info on x, y, and z to statement L

•  This info can be stored in the symbol table

–  Before going up to the previous statement (scan up):

•  Set x info to “not live” and “no next use”

•  Set y and z info to “live” and the “next uses” of y and z to L
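The backward scan described above can be sketched in Python. This is only an illustration under stated assumptions: the tuple layout `(label, target, operands)`, the function name `next_use_scan`, and the `(live, next_use)` pairs are hypothetical choices, not part of the lecture. Constants such as `1` are simply left out of the operand list.

```python
def next_use_scan(block, live_at_end):
    """Backward scan over a basic block (hypothetical representation).

    block: list of (label, target, operands) for `target := operands...`.
    live_at_end: variables assumed live at the end of the block.
    Returns {label: {var: (live, next_use)}} as attached to each statement.
    """
    # Symbol-table state just below the current statement: var -> (live, next_use)
    table = {v: (True, None) for v in live_at_end}
    attached = {}
    for label, target, operands in reversed(block):
        # Attach the current info on the target and the operands to this statement
        attached[label] = {v: table.get(v, (False, None)) for v in (target, *operands)}
        # The target is overwritten here: not live, no next use
        table[target] = (False, None)
        # The operands are read here: live, and their next use is this statement
        for v in operands:
            table[v] = (True, label)
    return attached


# i: b := b + 1   j: a := b + c   k: t := a + b, all variables live at the end
block = [("i", "b", ("b",)), ("j", "a", ("b", "c")), ("k", "t", ("a", "b"))]
info = next_use_scan(block, live_at_end={"a", "b", "c", "t"})
print(info["i"], info["j"], info["k"])
```

Updating the target before the operands matters for statements such as `b := b + 1`, where the same variable is both defined and used.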

(4)

Local Next-Use/Liveness Analysis: an example

i: b := b + 1   [ live(b) = true, nextuse(b) = j ]

j: a := b + c   [ live(a) = true, live(b) = true, live(c) = true,
                  nextuse(a) = k, nextuse(b) = k, nextuse(c) = none ]

k: t := a + b   [ live(a) = true, live(b) = true, live(t) = true,
                  nextuse(a) = none, nextuse(b) = none, nextuse(t) = none ]

•  In absence of global information, all variables are assumed to be live at the end of the block

(5)

Determination of Global Live Ranges for Register Allocation with Graph Coloring

a := read();

b := read();

c := read();

a := a+b+c;

a < 20 a < 10

d := c+8;

write(c);

f := 12;

d := f+a;

write(f);

e := 10;

d := e+a;

write(e);

write(d);

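[Figure: the live ranges of a, b, c, d, e and f drawn next to the code, with the live range of b labeled, and the interference graph over a, b, c, d, e, f]

Interference graph: connected vars have overlapping ranges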

(6)

The Dataflow Framework

•  There exists a general algorithm to extract information from programs.

–  This algorithm solves what we will call dataflow analysis.

•  Many static analyses are dataflow analyses:

–  Liveness, available expressions, very busy expressions, reaching definitions.

•  We start with liveness, then present some other kinds of analysis, and will eventually derive a common pattern for them.

(7)

A Simple Example

What does the control flow graph of this program look like?

•  Determine the leaders

–  First statement

–  Each target of a goto

–  Each statement following a goto

•  One block from each leader to the line preceding the next leader

var x,y,z;
x = input;
while (x > 1) {
  y = x / 2;
  if (y > 3)
    x = x - y;
  z = x - 4;
  if (z > 0)
    x = x / 2;
  z = z - 1;
}
output x;
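The leader rules can be sketched on a low-level instruction list. The instruction encoding below (`("goto", target)`, `("if_goto", condition, target)`, targets given as list indices) and the function names are assumptions made for this illustration; the example program is a shortened, hand-lowered loop, not the exact program of the slide.

```python
def find_leaders(code):
    """Return the sorted indices of the leaders of `code` (hypothetical encoding)."""
    leaders = {0} if code else set()          # the first statement
    for i, instr in enumerate(code):
        if instr[0] in ("goto", "if_goto"):
            leaders.add(instr[-1])            # each target of a goto
            if i + 1 < len(code):
                leaders.add(i + 1)            # each statement following a goto
    return sorted(leaders)


def basic_blocks(code):
    """One block from each leader up to the line preceding the next leader."""
    bounds = find_leaders(code) + [len(code)]
    return [code[a:b] for a, b in zip(bounds, bounds[1:])]


code = [
    ("assign", "x", "input"),      # 0
    ("if_goto", "x <= 1", 5),      # 1: loop test, exits to 5
    ("assign", "y", "x / 2"),      # 2
    ("assign", "x", "x - y"),      # 3
    ("goto", 1),                   # 4: back edge
    ("output", "x"),               # 5
]
print(find_leaders(code))
print([len(b) for b in basic_blocks(code)])
```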

(8)

A simple example with Control Flow Graph

[Figure: control flow graph of the program with one node per statement: var x, y, z; x = input; x > 1; y = x / 2; y > 3; x = x - y; z = x - 4; z > 0; x = x / 2; z = z - 1; output x]

We can safely consider one block for each statement, omitting the boxes

(9)

A simple example with Control Flow Graph

[Figure: the same control flow graph, with one node per statement]

•  How many registers do I need to compile this program?

•  We can answer with (Global) Liveness Analysis

(10)

Liveness

•  If we assume an infinite set of registers, then a variable v should be in a register at a program point p whenever:

1.  There is a path P from p to another program point p_u, where v is used.

2.  The path P does not include any definition of v.

•  Conditions 1 and 2 determine when a variable v is alive at a program point p.

[Figure: program points v = ●, p: ● = ● and p_u: ● = v, with the path P from p to p_u]

(11)

Liveness Analysis

•  Given a control flow graph G, find, for each edge E in G, the set of variables alive at E.

[Figure: the control flow graph of the example program, with each edge labeled { ? }]

Can you guess some of the live sets?

(12)

Determining Live Sets

•  To determine the sets of live variables on all edges of the control flow graph we consider

–  The origin of information: where liveness holds because of immediate observation

–  The propagation of information: how liveness info is passed along a sequence of statements (“locally”)

–  Joining information: how liveness info is passed across the boundaries of basic blocks (“globally”)

•  The same pattern will be used for other kinds of analysis, and will be the essence of the Dataflow Analysis Framework

(13)

The Origin of Information

If a variable is used at a program point p, then it must be alive immediately before p.

p: ● = v

If the variable is not used at p, when is it going to be alive immediately before p?

p: ● = ●

(14)

The Propagation of Information

•  A variable is alive immediately before a program point p if, and only if:

1.  It is alive immediately after p, and

2.  It is not redefined at p,

•  or

1.  It is used at p.

[Figure: statement p: u = ● between the live sets { ..., u, ..., v, ... } after p and before p; u, which is redefined at p, is dropped and v is kept. An arrow shows the backward direction of propagation.]

This is the direction through which information propagates.

What if a program point has multiple successors?


(16)

Joining Information

[Figure: program point p: ● = ● with several successors]

How do we find the variables that are alive at this program point?

But what if a program point has multiple successors?

If a variable v is alive immediately before any successor of p, then it must be alive immediately after p.

(17)

IN and OUT Sets for Liveness

•  To solve the liveness analysis problem:

•  We associate with each program point p two sets, IN and OUT.

–  IN is the set of variables alive immediately before p.

–  OUT is the set of variables alive immediately after p.

•  We state equations that relate these sets

•  We compute the sets as solutions of a system of equations

Statement      IN         OUT
y = input      {}         {y}
x = input      {y}        {x, y}
z = x + y      {x, y}     {z}
output z       {z}        {}

(18)

Dataflow Equations for Liveness Analysis

•  IN(p) = set of variables alive immediately before p

•  OUT(p) = set of variables alive immediately after p

•  vars(E) = the variables that appear in E

•  succ(p) = the set of control flow nodes that are successors of p
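With these definitions, the liveness equations for a statement p of the form v = E can be written as follows. This is a reconstruction that agrees with the propagation and joining rules stated earlier and with the per-statement equations of the worked example; statements that define no variable simply omit the {v} term.

```latex
\begin{align*}
\mathit{IN}(p)  &= \bigl(\mathit{OUT}(p) \setminus \{v\}\bigr) \cup \mathit{vars}(E) \\
\mathit{OUT}(p) &= \bigcup_{p_s \in \mathit{succ}(p)} \mathit{IN}(p_s)
\end{align*}
```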


(19)

Solving Liveness

1.  We initialize each IN and OUT set to empty

2.  We evaluate all the equations

3.  If any IN or OUT set has changed during this evaluation, then we repeat (2), otherwise we are done

•  Is termination guaranteed?

1) y = input
2) x = input
3) z = x + y
4) output z

IN₁ = OUT₁ \ {y}                OUT₁ = IN₂
IN₂ = OUT₂ \ {x}                OUT₂ = IN₃
IN₃ = (OUT₃ \ {z}) ∪ {x, y}     OUT₃ = IN₄
IN₄ = OUT₄ ∪ {z}                OUT₄ = {}
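The three steps can be sketched as a small Python loop over the four statements above. The representation (`stmts` mapping a statement number to its defined and used variables, `succ` giving successors) and the name `solve_liveness` are assumptions made for this sketch.

```python
def solve_liveness(stmts, succ):
    """stmts: {p: (defined_vars, used_vars)}; succ: {p: [successors of p]}."""
    IN = {p: set() for p in stmts}       # step 1: everything starts empty
    OUT = {p: set() for p in stmts}
    changed = True
    while changed:                       # step 3: repeat while something changed
        changed = False
        for p, (defs, uses) in stmts.items():   # step 2: evaluate the equations
            new_out = set().union(*(IN[s] for s in succ[p]))
            new_in = (new_out - defs) | uses
            if new_in != IN[p] or new_out != OUT[p]:
                IN[p], OUT[p] = new_in, new_out
                changed = True
    return IN, OUT


stmts = {
    1: ({"y"}, set()),          # y = input
    2: ({"x"}, set()),          # x = input
    3: ({"z"}, {"x", "y"}),     # z = x + y
    4: (set(), {"z"}),          # output z
}
succ = {1: [2], 2: [3], 3: [4], 4: []}
IN, OUT = solve_liveness(stmts, succ)
print(IN, OUT)
```

In this sketch the sets only ever grow from one round to the next and are bounded by the finite set of program variables, which is one way to see why the loop stops.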


(20)

Example of Liveness Problem

•  Let's solve liveness analysis for the program below:

[Figure: the control flow graph of the example program, with each edge labeled { ? }]

(21)

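Example of Liveness Problem

•  Let's solve liveness analysis for the program below:

[Figure: the control flow graph of the example program (var x, y, z; x = input; x > 1; y = x / 2; y > 3; x = x - y; z = x - 4; z > 0; x = x / 2; z = z - 1; output x) with the computed live set on each edge. Sets shown in the figure: { x }, { x, y }, { x }, { x, y }, { x }, { x, z }, { x, z }, { x, z }, { x, z }, { x }, { x }, { }, { }, { }]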

(22)

Another Global Analysis: Available Expressions

•  Consider the program below. How could we optimize it?

•  Which information does this optimization require?

•  What does the control flow graph of this program look like?

var x, y, z, a, b;
z = a + b
y = a * b
while (y > a + b) {
  a = a + 1
  x = a + b
}

(23)

Available Expressions

[Figure: control flow graph with nodes var x,y,z,a,b; z = a + b; y = a * b; y > a + b; a = a + 1; x = a + b, with two program points marked]

We know that the expression a + b is available at these two program points

•  How could we improve this code?

(24)

Available Expressions

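[Figure: the original control flow graph (z = a + b; y > a + b; a = a + 1; x = a + b) next to the optimized one (var x,y,z,a,b; z = a + b; y = a * b; w = z; y > w; a = a + 1; x = a + b; w = x), in which the test y > a + b has become y > w]

Is this a good optimization?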


(26)

Available Expressions

•  In order to apply the previous optimization, we had to know which expressions were available at the places where we replaced expressions by variables.

•  An expression is available at a program point if its current value has already been computed earlier in the execution.

•  Which expressions are available in our example?

•  We can approximate sets of available expressions with dataflow analysis

[Figure: the control flow graph z = a + b; y > a + b; a = a + 1; x = a + b, annotated with the sets of available expressions {a + b}, {a + b, a * b}, {a + b, y > a + b} and {a + b}]

Why is the expression a + 1 not available here?

(27)

The Origin of Information

p: ● = a + b

If an expression is used at a point p, then it is available immediately after p, as long as p does not redefine any of the variables that the expression uses.

(28)

The Propagation of Information

•  An expression E is available immediately after a program point p if, and only if:

1.  It is available immediately before p, and

2.  No variable of E is redefined at p,

•  or

1.  It is used at p, and

2.  No variable of E is redefined at p

[Figure: statement p: u = ● between the sets { ..., u + x, ..., w + x, ... } and { ..., w + x, ... }; u + x is dropped (✗) and w + x passes through (✓). An arrow shows the forward direction of propagation.]

This is the direction through which information propagates.

What if a program point has multiple predecessors?

(29)

Joining Information

[Figure: program point p: ● = ● with several predecessors]

How do we find the expressions that are available at this program point?

If an expression E is available immediately after every predecessor of p, then it must be available immediately before p.

(30)

IN and OUT Sets for Availability

•  To solve the available expression analysis, we associate with each program point p two sets, IN and OUT.

–  IN is the set of expressions available immediately before p.

–  OUT is the set of expressions available immediately after p.

Statement      IN                  OUT
y = 1 + 3      {}                  {1 + 3}
x = 2 * y      {1 + 3}             {1 + 3, 2 * y}
y = x + 1      {1 + 3, 2 * y}      {1 + 3, x + 1}
output y       {1 + 3, x + 1}      {1 + 3, x + 1}

(31)

Dataflow Equations for Availability

•  IN(p) = the set of expressions available immediately before p

•  OUT(p) = the set of expressions available immediately after p

•  pred(p) = the set of control flow nodes that are predecessors of p

•  Expr(v) = the set of expressions that use variable v.
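With these definitions, the availability equations for a statement p of the form v = E can be written as follows. This is a reconstruction that agrees with the origin, propagation and joining rules stated earlier; for a statement that defines no variable, the Expr(v) term is empty.

```latex
\begin{align*}
\mathit{IN}(p)  &= \bigcap_{p_s \in \mathit{pred}(p)} \mathit{OUT}(p_s) \\
\mathit{OUT}(p) &= \bigl(\mathit{IN}(p) \cup \{E\}\bigr) \setminus \mathit{Expr}(v)
\end{align*}
```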

(32)

Example of Available Expressions

[Figure: the control flow graph z = a + b; y > a + b; a = a + 1; x = a + b, shown once with unknown sets {?} and once annotated with the solution {a + b}, {a + b, a * b}, {a + b, y > a + b}, {a + b}]

We can get the solution we have already seen by applying the equations, starting from the empty sets.
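A minimal Python sketch of this computation, starting from empty sets, is shown below. The node numbering, the `expr_vars` table mapping each expression (kept as a string) to its variables, and the name `solve_available` are assumptions made for the illustration; the boundary value for the entry node is taken to be the empty set.

```python
def solve_available(nodes, pred, expr_vars):
    """nodes: {p: (defined_var_or_None, expressions_computed_at_p)}."""
    IN = {p: set() for p in nodes}
    OUT = {p: set() for p in nodes}
    changed = True
    while changed:
        changed = False
        for p, (defined, exprs) in nodes.items():
            outs = [OUT[q] for q in pred[p]]
            # Must analysis: keep only what every predecessor provides
            new_in = set.intersection(*outs) if outs else set()
            # Expr(v): every expression that mentions the redefined variable
            killed = {e for e, vs in expr_vars.items() if defined in vs}
            new_out = (new_in | exprs) - killed
            if new_in != IN[p] or new_out != OUT[p]:
                IN[p], OUT[p] = new_in, new_out
                changed = True
    return IN, OUT


expr_vars = {
    "a + b": {"a", "b"},
    "a * b": {"a", "b"},
    "a + 1": {"a"},
    "y > a + b": {"y", "a", "b"},
}
nodes = {
    1: ("z", {"a + b"}),                  # z = a + b
    2: ("y", {"a * b"}),                  # y = a * b
    3: (None, {"y > a + b", "a + b"}),    # y > a + b
    4: ("a", {"a + 1"}),                  # a = a + 1
    5: ("x", {"a + b"}),                  # x = a + b
}
pred = {1: [], 2: [1], 3: [2, 5], 4: [3], 5: [4]}
IN, OUT = solve_available(nodes, pred, expr_vars)
print(OUT)
```

On this example the loop reaches the sets shown in the figure. Many presentations instead initialize a must analysis with the full set of expressions, which yields the largest solution in general.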

(33)

Very Busy Expressions

var x, a, b
x = input
a = x - 1
b = x - 2
while (x > 0) {
  output a * b - x
  x = x - 1
}
output a * b

•  Consider the program above. How could we optimize it?

•  Which information does this optimization require?

•  Consider the expression a * b

–  Does it change inside the loop?

•  What does the control flow graph of this program look like?

(34)

Very Busy Expressions

[Figure: control flow graph with nodes var x, a, b; x = input; a = x - 1; b = x - 2; x > 0; output a * b - x; x = x - 1; output a * b, with one program point marked]

We know that at this point, the expression a * b will be computed no matter which program path is taken.

(35)

Very Busy Expressions

[Figure: the original control flow graph (b = x - 2; x > 0; output a * b - x; x = x - 1; output a * b) next to the optimized one (b = x - 2; t = a * b; x > 0; output t - x; x = x - 1; output t)]

What is the advantage of this optimization?

What is the disadvantage of it? Is there any?

(36)

Very Busy Expressions

•  In order to apply the previous optimization, we had to know that a * b was a very busy expression before the loop.

•  An expression is very busy at a program point if it will be computed before the program terminates along any path that goes from that point to the end of the program.

[Figure: control flow graph with nodes var x,a,b; x = input; a = x - 1; b = x - 2; t = a * b; x > 0; output t - x; x = x - 1; output a * b]

(37)

Very Busy Expressions

•  We can approximate the set of very busy expressions by a dataflow analysis.

•  How does information originate?

•  How does information propagate?

[Figure: control flow graph with nodes var x,a,b; x = input; a = x - 1; b = x - 2; t = a * b; x > 0; output t - x; x = x - 1; output a * b, annotated with the sets of very busy expressions {a * b}; {x - 1, x - 2, x > 0}; {x - 2, x > 0}; {x > 0, a * b}; {a * b, x - 1}; {x - 1, a * b}; {t - x, x - 1, a * b}]

Why is the expression a * b not very busy here?

(38)

The Origin of Information

p: ● = a + b

If an expression is used at a point p, then it is very busy immediately before p.

(39)

The Propagation of Information

•  An expression E is very busy immediately before a program point p if, and only if:

1.  It is very busy immediately after p, and

2.  No variable of E is redefined at p,

•  or

1.  It is used at p.

[Figure: statement p: u = ● between two sets of the form { ..., u + x, ..., w + x, ... }; u + x, which uses the redefined variable u, is dropped and w + x passes through. An arrow shows the backward direction of propagation.]

This is the direction through which information propagates.

What if a program point has multiple successors?

(40)

Joining Information

[Figure: program point p: ● = ● with several successors]

How do we find the expressions that are very busy at this program point?

If an expression E is very busy immediately before every successor of p, then it must be very busy immediately after p.

(41)

IN and OUT Sets for Very Busy Expressions

•  To solve the very busy expression analysis, we associate with each program point p two sets, IN and OUT.

–  IN is the set of very busy expressions immediately before p.

–  OUT is the set of very busy expressions immediately after p.

Statement      IN          OUT
y = 1 + 3      {1 + 3}     {2 * y}
x = 2 * y      {2 * y}     {x + 1}
y = x + 1      {x + 1}     {y}
output y       {y}         {}

(42)

Dataflow Equations for Very Busy Expressions

•  IN(p) = the set of very busy expressions immediately before p

•  OUT(p) = the set of very busy expressions immediately after p

•  succ(p) = the set of control flow nodes that are successors of p
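With these definitions, and with Expr(v) denoting the set of expressions that use variable v as in the availability analysis, the equations for a statement p of the form v = E can be written as follows. This is a reconstruction that agrees with the origin, propagation and joining rules stated earlier.

```latex
\begin{align*}
\mathit{IN}(p)  &= \bigl(\mathit{OUT}(p) \setminus \mathit{Expr}(v)\bigr) \cup \{E\} \\
\mathit{OUT}(p) &= \bigcap_{p_s \in \mathit{succ}(p)} \mathit{IN}(p_s)
\end{align*}
```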

(43)

Example of Very Busy Expressions

[Figure: control flow graph with nodes var x,a,b; x = input; a = x - 1; b = x - 2; t = a * b; x > 0; output t - x; x = x - 1; output a * b, shown once without annotations and once annotated with the sets {a * b}; {x - 1, x - 2, x > 0}; {x - 2, x > 0}; {x > 0, a * b}; {a * b, x - 1}; {x - 1, a * b}; {t - x, x - 1, a * b}]

Very busy expressions in our example

(44)

Safe Code Hoisting

[Figure: the annotated control flow graph of the very busy expressions example, with two program points marked]

•  It is performance safe to move a * b to this program point, in the sense that we will not be forcing the program to do any extra work, in any circumstance.

If we did not have this a * b here, the transformation would not be performance safe

(45)

Reaching Definitions

var x,y,z;
x = input;
while (x > 1) {
  y = x / 2;
  if (y > 3)
    x = x - y;
  z = x - 4;
  if (y > 0)
    x = x / 2;
  z = z - 1;
}
output x;

•  Consider the program above. How could we optimize it?

•  The assignment z = z - 1 is dead.

–  What is a dead assignment?

–  How can we find them?

•  Which information does this optimization require?

•  What does the control flow graph of this program look like?

(46)

Reaching Definitions

[Figure: control flow graph of the program, with labeled statements]

var x, y, z
d₀: x = input
d₁: x > 1
d₂: y = x / 2
d₃: y > 3
d₄: x = x - y
d₅: z = x - 4
d₆: y > 0
d₇: x = x / 2
d₈: z = z - 1
d₉: output x

•  We know that the assignment z = z - 1 is dead because this definition of z does not reach any use of this variable.

(47)

Reaching Definitions

•  We say that a definition of a variable v, at a program point p, reaches a program point p', if there is a path from p to p', and this path does not cross any redefinition of v.

[Figure: the same control flow graph, with statements labeled d₀ to d₉]

(48)

Reaching Definitions

•  How does reaching def information originate?

•  How does this information propagate?

•  How can we join information?

•  Which are the dataflow equations?

[Figure: the same control flow graph, with statements labeled d₀ to d₉]

(49)

The Origin of Information

p: v = ●

If a program point p defines a variable v, then this definition of v reaches the point immediately after p.

(50)

The Propagation of Information

•  A definition of a variable v reaches the program point immediately after p if, and only if:

1.  The definition reaches the point immediately before p, and

2.  Variable v is not redefined at p,

•  or

1.  Variable v is defined at p.

[Figure: statement p: u = ● between two sets of definitions of the form { ..., u = x, ..., w = x, ... }; the earlier definition of u is dropped and w = x passes through. An arrow shows the forward direction of propagation.]

This is the direction through which information propagates.


(52)

Joining Information

[Figure: program point p: ● = ● with several predecessors]

How do we find the definitions that reach this program point?

If a definition of a variable v reaches the point immediately after at least one predecessor of p, then it reaches the point immediately before p.

(53)

IN and OUT Sets for Reaching Definitions

•  To solve the reaching definitions analysis, we associate with each program point p two sets, IN and OUT.

–  IN is the set of definitions that reach the point immediately before p.

–  OUT is the set of definitions that reach the point immediately after p.

Statement            IN            OUT
p₀: y = input        {}            {p₀}
p₁: x = input        {p₀}          {p₀, p₁}
p₂: y = x + y        {p₀, p₁}      {p₁, p₂}
p₃: output y         {p₁, p₂}      {p₁, p₂}

(54)

Dataflow Equations for Reaching Definitions

•  IN(p) = the set of reaching definitions immediately before p

•  OUT(p) = the set of reaching definitions immediately after p

•  pred(p) = the set of control flow nodes that are predecessors of p

•  defs(v) = the set of definitions of v in the program.
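With these definitions, the equations for a statement p of the form v = E can be written as follows. This is a reconstruction that agrees with the origin, propagation and joining rules stated earlier; for a statement that defines no variable, OUT(p) = IN(p).

```latex
\begin{align*}
\mathit{IN}(p)  &= \bigcup_{p_s \in \mathit{pred}(p)} \mathit{OUT}(p_s) \\
\mathit{OUT}(p) &= \bigl(\mathit{IN}(p) \setminus \mathit{defs}(v)\bigr) \cup \{p\}
\end{align*}
```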

(55)

Example of Reaching Definitions Revisited

•  Can you compute the set of reaching definitions for our original example?

[Figure: the control flow graph of the example, with statements labeled d₀ to d₉]

(56)

Example of Reaching Definitions Revisited

•  Can you compute the set of reaching definitions for our original example?

[Figure: the control flow graph of the example, with statements labeled d₀ to d₉, annotated with sets of reaching definitions written as definition indices: {0}, {0,2,4,7,8}, {0,2,4,7,8}, {0,2,4,7,8}, {2,4,8}, {0,2,4,7,8}, {0,2,4,5,7}, {2,5,7}, {0,2,4,7,8}]
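One way to check these sets is to run the analysis mechanically. The sketch below is an illustration under assumptions: statements are identified by the index of their label d₀ to d₉, `defs_of` records the variable each one defines (or `None` for tests and output), and the name `solve_reaching` is hypothetical.

```python
def solve_reaching(defs_of, pred):
    """defs_of: {label: variable defined there, or None}; pred: {label: [predecessors]}."""
    # defs(v): all labels that define variable v
    defs = {}
    for d, v in defs_of.items():
        if v is not None:
            defs.setdefault(v, set()).add(d)
    IN = {d: set() for d in defs_of}
    OUT = {d: set() for d in defs_of}
    changed = True
    while changed:
        changed = False
        for d, v in defs_of.items():
            # May analysis: a definition arrives if it arrives from any predecessor
            new_in = set().union(*(OUT[q] for q in pred[d]))
            # A definition of v kills all other definitions of v and adds itself
            new_out = set(new_in) if v is None else (new_in - defs[v]) | {d}
            if new_in != IN[d] or new_out != OUT[d]:
                IN[d], OUT[d] = new_in, new_out
                changed = True
    return IN, OUT


defs_of = {0: "x", 1: None, 2: "y", 3: None, 4: "x",
           5: "z", 6: None, 7: "x", 8: "z", 9: None}
pred = {0: [], 1: [0, 8], 2: [1], 3: [2], 4: [3],
        5: [3, 4], 6: [5], 7: [6], 8: [6, 7], 9: [1]}
IN, OUT = solve_reaching(defs_of, pred)
print(IN[1], OUT[4], OUT[5], OUT[7])
```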

(57)

Finding Commonalities

Liveness                  Reaching Defs
Very Busy Expressions     Available Expressions

What do these two analyses have in common?

(58)

Finding Commonalities

What about these two analyses?

Can you categorize the rows and columns?

(59)

The Dataflow Framework

          Backward                  Forward
May       Liveness                  Reaching Defs
Must      Very Busy Expressions     Available Expressions

(60)

The Dataflow Framework

•  A may analysis keeps track of facts that may happen during the execution of the program.

–  A definition may reach a certain point.

•  A must analysis tracks facts that will, for sure, happen during the execution of the program.

–  This expression will be used after a certain program point.

•  A backward analysis propagates information in the opposite direction in which the program flows.

•  A forward analysis propagates information in the same direction in which the program flows.

(61)

Transfer Functions

•  A data-flow analysis does some interpretation of the program, in order to obtain information.

•  But we do not interpret the concrete semantics of the program, or else our analysis might not terminate.

•  Instead, we do some abstract interpretation.

•  The abstract semantics of a statement is given by a transfer function.

•  Transfer functions differ depending on whether the analysis is forward or backward:

Forward analysis:    OUT[s] = f_s(IN[s])
Backward analysis:   IN[s] = f_s(OUT[s])

(62)

Transfer Functions

[Table: the dataflow equations of the four analyses, arranged in columns Backward / Forward and rows May / Must]

Can you recognize the transfer functions of each analysis?

(63)

Transfer Functions

[Table: the dataflow equations of the four analyses, arranged in columns Backward / Forward and rows May / Must, with Very Busy Expressions and Available Expressions in the Must row]

The transfer functions provide us with a new "interpretation" of the program. We can implement a machine that traverses the program, always fetching a given instruction and applying the transfer function to that instruction. This process goes on until the results produced by these transfer functions stop changing.

(64)

Transfer Functions

•  The transfer functions do not have to be always the same, for every statement.

•  In the concrete semantics of an assembly language, each statement does something different.

–  The same can be true for the abstract semantics of the programming language.

Statement      Transfer Function
exit           IN[p] = OUT[p]
output v       IN[p] = OUT[p] ∪ {v}
bgz v L        IN[p] = OUT[p] ∪ {v}
v = x + y      IN[p] = (OUT[p] \ {v}) ∪ {x, y}
v = u          IN[p] = (OUT[p] \ {v}) ∪ {u}

Which analysis do these transfer functions implement?
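The table translates almost line by line into a per-statement dispatch. The tuple encoding of statements and the name `transfer` below are assumptions for this sketch; each branch mirrors one row of the table, which computes IN[p] from OUT[p] and therefore describes a backward analysis over sets of variables.

```python
def transfer(stmt, out):
    """Return IN[p] given OUT[p] for one statement (hypothetical tuple encoding)."""
    kind = stmt[0]
    if kind == "exit":                     # exit:      IN = OUT
        return set(out)
    if kind in ("output", "bgz"):          # output v / bgz v L:  IN = OUT ∪ {v}
        return out | {stmt[1]}
    if kind == "add":                      # v = x + y: IN = (OUT \ {v}) ∪ {x, y}
        _, v, x, y = stmt
        return (out - {v}) | {x, y}
    if kind == "copy":                     # v = u:     IN = (OUT \ {v}) ∪ {u}
        _, v, u = stmt
        return (out - {v}) | {u}
    raise ValueError(f"unknown statement kind: {kind}")


print(transfer(("add", "z", "x", "y"), {"z"}))
print(transfer(("copy", "v", "u"), {"v", "w"}))
```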

(65)

The Merging Function

•  The merging function (meet or join) specifies what happens once information collides.

[Figure: two program points p: ● = ●, each connected to several neighboring program points]

(66)

Merging Functions

[Table: the merging functions of the four analyses, arranged in columns Backward / Forward and rows May / Must]

The combination of transfer functions, merging functions and, to a certain extent, the way that we initialize the IN and OUT sets guarantees that the abstract interpretation terminates.
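The ingredients of the framework (direction, merging function, transfer function, initialization) can be put together in one generic solver. The sketch below is an assumption-laden illustration, not the course's implementation: the name `solve`, its parameters, and the choice of empty initial sets are all hypothetical. It is instantiated here for liveness on the four-statement example used earlier.

```python
def solve(nodes, succ, transfer, merge, forward):
    """Generic round-robin solver.

    Returns (merged, result): (IN, OUT) for a forward analysis,
    (OUT, IN) for a backward one.
    """
    pred = {n: [m for m in nodes if n in succ.get(m, [])] for n in nodes}
    # Where the facts entering the transfer function of a node come from
    sources = pred if forward else {n: list(succ.get(n, [])) for n in nodes}
    merged = {n: set() for n in nodes}
    result = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            incoming = [result[m] for m in sources[n]]
            new_merged = merge(incoming) if incoming else set()
            new_result = transfer(n, new_merged)
            if new_merged != merged[n] or new_result != result[n]:
                merged[n], result[n] = new_merged, new_result
                changed = True
    return merged, result


def may(sets):            # merging function of a may analysis: union
    return set().union(*sets)


def must(sets):           # merging function of a must analysis: intersection
    return set.intersection(*sets)


# Liveness: backward, may. GEN = variables used, KILL = variables defined.
GEN = {1: set(), 2: set(), 3: {"x", "y"}, 4: {"z"}}
KILL = {1: {"y"}, 2: {"x"}, 3: {"z"}, 4: set()}
succ = {1: [2], 2: [3], 3: [4]}
OUT, IN = solve([1, 2, 3, 4], succ,
                lambda n, facts: (facts - KILL[n]) | GEN[n],
                may, forward=False)
print(IN, OUT)
```

Swapping `may` for `must` and `forward=False` for `forward=True` selects the other cells of the Backward/Forward, May/Must table, given the matching transfer function.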
