
(1)

Probabilistic interpretations and accurate algorithms

for stochastic fluid models

Federico Poloni1

Joint work with Giang T. Nguyen2

1U Pisa, Computer Science Dept.

2U Adelaide, School of Math. Sciences

U Adelaide – 23rd January 2014

(2)

Goal of this research

Markovian Models of queues/buffers — computing stationary measures

Many algorithms have multiple interpretations in different “languages”, e.g. Newton’s method [Bean, O’Reilly, Taylor ’05]:

• Linear algebra: invert matrices, compute eigenvalues
• Probability: M_ij = P[something]
• Differential equations (sometimes): discretize d/dt f(t) = . . .

However, the fastest algorithm available, doubling, is 100% abstract linear algebra.

We try to gain more probabilistic insight into what it does + turn this insight into better accuracy.

(3)

[Figure: a 3-state discrete-time Markov chain, with the transition probabilities listed in P]

P = [ 0.4  0    0.6 ]
    [ 0.2  0    0.8 ]
    [ 0    0.5  0.5 ]

P_ij = P[transition i → j]

If π_t = [π_1  π_2  π_3] = probabilities of being in the states at time t, the time evolution is π_{t+1} = π_t P.
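Not part of the slides: a minimal numpy sketch of this time evolution, iterating π_{t+1} = π_t P until it settles on the stationary distribution.

```python
import numpy as np

# Transition matrix of the 3-state chain above.
P = np.array([[0.4, 0.0, 0.6],
              [0.2, 0.0, 0.8],
              [0.0, 0.5, 0.5]])

pi = np.array([1.0, 0.0, 0.0])    # start in state 1
for _ in range(200):              # time evolution pi_{t+1} = pi_t P
    pi = pi @ P

print(pi)                         # converges to the stationary distribution (0.1, 0.3, 0.6)
print(np.allclose(pi @ P, pi))    # True: pi is (numerically) invariant
```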

(4)

Probabilistic interpretations: censoring

[Figure: the same 3-state chain, with state 1 singled out]

P = [ a   b^T ]
    [ c   D   ]

Censoring: ignore the time spent in state 1, consider only the states S = {2, 3}.

Transitions 2 ↔ 3 may happen directly or through state 1.

Censored Markov chain

P̂ = D + c b^T + c a b^T + c a^2 b^T + · · · = D + c (1 − a)^{-1} b^T

(D: transitions S → S; c: S → 1; a: 1 → 1; b^T: 1 → S)

(5)

Probabilistic interpretations: censoring II

[Figure: a 4-state chain, with the states split into two groups]

P = [ A   B ]
    [ C   D ]

We can also censor multiple states at the same time.

Censored Markov chain

P̂ = D + CB + CAB + CA^2 B + · · · = D + C (I − A)^{-1} B

This is a Schur complementation on I − P.
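Not on the slides: a small numpy check of the censoring formula, with an arbitrary 4 × 4 stochastic matrix standing in for the example above (the single-state censoring of the previous slide is the 1 × 1 instance of the same formula).

```python
import numpy as np

# Any stochastic matrix, partitioned as [[A, B], [C, D]];
# the two states of block A are the ones being censored.
P = np.array([[0.2, 0.3, 0.4, 0.1],
              [0.5, 0.1, 0.1, 0.3],
              [0.3, 0.2, 0.1, 0.4],
              [0.1, 0.4, 0.2, 0.3]])
A, B = P[:2, :2], P[:2, 2:]
C, D = P[2:, :2], P[2:, 2:]

# Censored chain on the remaining states: D + C (I - A)^{-1} B.
P_hat = D + C @ np.linalg.solve(np.eye(2) - A, B)

print(P_hat.sum(axis=1))   # each row sums to 1: the censored chain is again stochastic
```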

(6)

Continuous-time Markov chains

Continuous time; the time before a transition i → j is exponentially distributed, with rate Q_ij.

[Figure: a 3-state continuous-time Markov chain with the rates appearing in Q]

Q = [ −7    5     2   ]
    [  0.1 −0.4   0.3 ]
    [  1.1  0    −1.1 ]

The evolution follows d/dt π(t) = π(t) Q, or equivalently π(t) = π_0 exp(tQ).
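Not on the slides: evaluating π(t) = π_0 exp(tQ) with scipy, a minimal sketch for the generator above.

```python
import numpy as np
from scipy.linalg import expm

# Generator of the 3-state CTMC above (rows sum to zero).
Q = np.array([[-7.0,  5.0,  2.0],
              [ 0.1, -0.4,  0.3],
              [ 1.1,  0.0, -1.1]])

pi0 = np.array([1.0, 0.0, 0.0])   # start in state 1
t = 3.0
pi_t = pi0 @ expm(t * Q)          # pi(t) = pi_0 exp(tQ)

print(pi_t, pi_t.sum())           # a probability vector: nonnegative entries, sum 1
```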

(7)

Fluid queues

Queue, or buffer: an “infinite-size bucket” in which fluid (or data) flows in or out at a rate c_i, depending on the state of a continuous-time Markov chain.

[Figure: a sample path of the fluid level as a function of time]

We want the “long-time behavior” (stationary probabilities) of the fluid level, i.e., the density vector f(x) of P[level = x].

(8)

Stationary density and ODEs

Theorem [Karandikar, Kulkarni ’95, Da Silva Soares thesis]

The stationary density vector satisfies

d/dx f(x) C = f(x) Q,    C = diag(c_1, . . . , c_n)

Different ways to see it. . . Differential equations:

The solutions of this linear ODE are linear combinations of the “elementary solutions”

f^(i)(x) = u_i exp(x λ_i),

with (u_i, λ_i) (left) eigenvector–eigenvalue pairs of Q C^{-1}.

Throw in boundary conditions. Stable ones? Keep only Re λ < 0.
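Not on the slides: a rough numpy illustration of the spectral viewpoint, assuming Q C^{-1} is diagonalizable; it extracts the left eigenpairs of Q C^{-1} and keeps the stable ones (the rates in C are made up, with one negative rate as in a typical fluid queue).

```python
import numpy as np
from scipy.linalg import eig

# Generator from the previous slides; rates c_i chosen arbitrarily (nonzero).
Q = np.array([[-7.0,  5.0,  2.0],
              [ 0.1, -0.4,  0.3],
              [ 1.1,  0.0, -1.1]])
C = np.diag([1.0, 2.0, -1.0])

# Left eigenpairs of Q C^{-1}:  u_i (Q C^{-1}) = lambda_i u_i.
lam, VL = eig(Q @ np.linalg.inv(C), left=True, right=False)

stable = lam.real < 0          # keep only the stable elementary solutions
U = VL[:, stable].conj().T     # rows u_i spanning the stable subspace
print(lam)
print(U)
```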

(9)

Invariant probabilities and linear algebra

Theorem [Karandikar, Kulkarni ’95, Da Silva Soares thesis]

The invariant density satisfies

d/dx f(x) C = f(x) Q,    C = diag(c_1, . . . , c_n)

Different ways to see it. . . Numerical linear algebra:

Find the stable invariant subspace of Q C^{-1}, i.e., U = span(u_1, u_2, . . . , u_h),

with u_1, . . . , u_h the eigenvectors whose eigenvalues lie in the left complex half-plane.

(10)

Invariant probabilities and probability

Theorem [Karandikar, Kulkarni ’95, Da Silva Soares thesis]

The invariant density satisfies

d/dx f(x) C = f(x) Q,    C = diag(c_1, . . . , c_n)

Order the states so that C has its positive elements on top; a basis for U is given by the rows of

[ I   −Ψ ]

for the “first return probabilities” Ψ:

Ψ_ij = P[the fluid level returns to 0 after some time (for the first time), going from state i to state j]

(11)

Structured doubling algorithm

There’s a linear algebra algorithm to solve this:

Structured doubling algorithm

E_{k+1} = E_k (I − G_k H_k)^{-1} E_k
F_{k+1} = F_k (I − H_k G_k)^{-1} F_k
G_{k+1} = G_k + E_k (I − G_k H_k)^{-1} G_k F_k
H_{k+1} = H_k + F_k (I − H_k G_k)^{-1} H_k E_k

E_0, F_0, G_0, H_0 = more unilluminating formulas
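Not on the slides: the recursion transcribed directly into numpy, assuming the four starting blocks are already available (here filled with an arbitrary 1 × 1 substochastic example just so the loop has something to run on).

```python
import numpy as np

def sda_step(E, F, G, H):
    """One structured-doubling step, exactly the recursion above."""
    W = np.linalg.inv(np.eye(E.shape[0]) - G @ H)
    V = np.linalg.inv(np.eye(F.shape[0]) - H @ G)
    return (E @ W @ E,
            F @ V @ F,
            G + E @ W @ G @ F,
            H + F @ V @ H @ E)

# Arbitrary small example: [[E, G], [H, F]] is a substochastic matrix.
E, G = np.array([[0.55]]), np.array([[0.40]])
H, F = np.array([[0.30]]), np.array([[0.65]])

for k in range(30):
    E, F, G, H = sda_step(E, F, G, H)

print(G)   # for a properly initialized SDA, G_k converges to Psi
```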

(12)

What’s going on

What’s going on: SDA is related to scaling and squaring.

To look for stable modes, build exp(tH) for a large t, and look at which subspace “goes to 0” and which “to ∞”.

Choose an initial step-length γ and start from the first-order accurate

S = exp(γH) ≈ (I + γ/2 H)(I − γ/2 H)^{-1}

Then keep squaring:

exp(2^k γ H) = ( · · · ((S^2)^2)^2 · · · )^2

Keep the iterates in the form

S^{2^k} = [ I   −G_k ]^{-1} [ E_k   0 ]
          [ 0    F_k ]      [ −H_k  I ]

Why?

• A method to prevent instabilities from large entries
• Natural in a different problem in control theory
• It works!
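Not from the slides: a quick numpy check that the Cayley-type factor followed by repeated squaring does approximate the matrix exponential (H is just an arbitrary small test matrix here).

```python
import numpy as np
from scipy.linalg import expm

H = np.array([[-1.0,  0.4],
              [ 0.7, -0.9]])       # arbitrary test matrix
gamma, k = 1e-3, 10
I = np.eye(2)

S = (I + gamma / 2 * H) @ np.linalg.inv(I - gamma / 2 * H)   # ≈ exp(gamma * H)
for _ in range(k):                                           # k squarings
    S = S @ S

print(np.max(np.abs(S - expm(2**k * gamma * H))))            # small approximation error
```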

(13)

Probabilistic interpretation for SDA — the grand scheme

We construct a discrete-time process with the same behavior:

1. Rescaling
2. Discretization
3. Doubling

Rescaling: a (state-dependent) change of time scale to get ±1 slopes. Well understood probabilistically; in linear algebra, a diagonal similarity.

[Figure: the rescaled fluid level as a function of time, a piecewise-linear path with slopes ±1]

Discrete time and ±1 rates =⇒ discrete space “level”

(14)

Discretization

Probabilists often use P = I + γQ, γ > 0, as a discretization of the continuous-time Markov chain Q (uniformization).

Differential equations: explicit Euler’s method! Discretize d/dt f(t) = f(t)Q to f_{t+1} = f_t (I + γQ).

It turns out that something slightly different happens in SDA:

Theorem (similar to [P., Reis, preprint], [P., thesis])

[ E_0  G_0 ]
[ H_0  F_0 ]  = (I + γQ)(I − γQ)^{-1}

Differential equations: the midpoint method, with stepsize 2γ.
Probability: an on/off switch; observe the queue only while it is on.

We encountered (I + γH)(I − γH)^{-1} before, but with H = QC^{-1} instead.
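Not on the slides: a numerical comparison of the two discretizations for the small generator used earlier; the Cayley factor is compared against exp(2γQ), consistent with the step-size reading above.

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-7.0,  5.0,  2.0],
              [ 0.1, -0.4,  0.3],
              [ 1.1,  0.0, -1.1]])
gamma = 1e-2
I = np.eye(3)

euler  = I + gamma * Q                                    # uniformization / explicit Euler
cayley = (I + gamma * Q) @ np.linalg.inv(I - gamma * Q)   # the SDA starting block

print(np.max(np.abs(euler - expm(gamma * Q))))        # O(gamma^2) error
print(np.max(np.abs(cayley - expm(2 * gamma * Q))))   # O(gamma^3): one midpoint step of size 2*gamma
```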

(15)

Doubling step

So,

[ E_0  G_0 ]
[ H_0  F_0 ]

is a discrete-time Markov chain.

Observation

After one doubling step,

[ E_1  G_1 ]
[ H_1  F_1 ]

is still the transition matrix of a DTMC. What do its states represent?

“States” of the queuing model = (ℓ, s) = (level, state of the DTMC); some states are associated to a +1 rate, and we call them ⊕, resp. a −1 rate, ⊖.

(16)

Levels and states

[Figure: the lattice of states, each level ℓ − 1, ℓ, ℓ + 1 split into a ⊕ and a ⊖ copy, with transitions between adjacent levels labelled E_0, G_0, H_0, F_0]

(17)

More states

In a state with rate ⊕, E_0 or G_0 is applied; in a state with rate ⊖, F_0 or H_0.

[Figure: the same lattice drawn over levels 1, 2, 3, 4, . . . , with ⊕ and ⊖ copies of each level and transitions labelled E_0, G_0, H_0, F_0]

E_{k+1} = E_k (I − G_k H_k)^{-1} E_k
F_{k+1} = F_k (I − H_k G_k)^{-1} F_k
G_{k+1} = G_k + E_k (I − G_k H_k)^{-1} G_k F_k
H_{k+1} = H_k + F_k (I − H_k G_k)^{-1} H_k E_k

(18)

The solution

Censor in this way:

[Figure: the same lattice, with the set of states to be censored highlighted]

E_{k+1} = E_k (I − G_k H_k)^{-1} E_k
F_{k+1} = F_k (I − H_k G_k)^{-1} F_k
G_{k+1} = G_k + E_k (I − G_k H_k)^{-1} G_k F_k
H_{k+1} = H_k + F_k (I − H_k G_k)^{-1} H_k E_k

(19)

Structured doubling algorithm: probabilistic interpretation

[Figure: the lattice after two doubling steps, with transitions labelled E_2, F_2, G_2, H_2]

Result

E_k = P[0⊕ reaches level 2^k before level −1]
G_k = P[0⊕ reaches level −1 before level 2^k]
F_k = P[0⊖ reaches level −2^k before level 1]
H_k = P[0⊖ reaches level 1 before level −2^k]

lim_{k→∞} G_k = P[0⊕ reaches level −1 before “escaping to infinity”] = Ψ

(20)

Tilt your head diagonally

[Figure: the same lattice read along diagonals, so that levels ℓ, ℓ + 1, ℓ + 2, ℓ + 3, ℓ + 4 form a QBD with transitions E_2, F_2, G_2, H_2]

SDA ⇐⇒ cyclic reduction on the QBD with blocks

[ E_k  0 ]    [ 0    H_k ]    [ 0   0   ]
[ 0    0 ] ,  [ G_k  0   ] ,  [ 0   F_k ]

The relation appeared (only algebraically) in [Bini, Meini, P., 2010].

(21)

Work on a torus

Let’s “wrap the chain on itself” after two steps:

[Figure: the wrapped chain, two levels with their ⊕ and ⊖ copies on a torus, transitions labelled E_0, G_0, H_0, F_0]

Transition probabilities in this queue are the same as in the big one.

[ E_1  G_1 ]
[ H_1  F_1 ]  =  Schur complement of the first two blocks in

        [ 0    G_0  E_0  0   ]
 I  −   [ H_0  0    0    F_0 ]
        [ E_0  0    0    G_0 ]
        [ 0    F_0  H_0  0   ]
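Not on the slides: a scalar sanity check of this identity, censoring the first two states of the wrapped chain and comparing with one doubling step (the blocks are 1 × 1 here just to keep it short).

```python
import numpy as np

# Arbitrary 1x1 blocks with [[E0, G0], [H0, F0]] substochastic.
E0, G0, H0, F0 = 0.55, 0.40, 0.30, 0.65

# One doubling step of SDA.
E1 = E0 / (1 - G0 * H0) * E0
F1 = F0 / (1 - H0 * G0) * F0
G1 = G0 + E0 / (1 - G0 * H0) * G0 * F0
H1 = H0 + F0 / (1 - H0 * G0) * H0 * E0

# The wrapped chain, and the censoring of its first two states.
M = np.array([[0,  G0, E0, 0 ],
              [H0, 0,  0,  F0],
              [E0, 0,  0,  G0],
              [0,  F0, H0, 0 ]])
A, B = M[:2, :2], M[:2, 2:]
C, D = M[2:, :2], M[2:, 2:]
censored = D + C @ np.linalg.solve(np.eye(2) - A, B)

print(np.allclose(censored, [[E1, G1], [H1, F1]]))   # True
```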

(22)

Part II

Componentwise accurate algorithms

(23)

Componentwise accurate linear algebra

Traditional algorithms are normwise accurate: ṽ = v + ε ‖v‖.

Suppose v = [1  10^{-8}] and ε = 10^{-8}:

ṽ = [ 1 + ε ,  10^{-8} + ε ]
      (ok)       (junk)

Here we want componentwise accurate algorithms:

ṽ = [ 1 + ε ,  10^{-8} + 10^{-8} ε ]

|v − ṽ| ≤ ε v    (with ≤, |·| taken on the components)

Recent componentwise error analysis for doubling [Xue et al., ’12]

Algorithms almost ready, but a detail is missing.

(24)

Subtraction-free computations

Error amplification in floating point operations (think “loss of significant digits”) is bounded by 1 for ⊕ (of nonnegative numbers), ⊗, ⊘,

but can be arbitrarily high for ⊖, e.g., 1.000000000 − 0.9999999999.

Solution

Avoid all the minuses!

Most come from Z-matrices, i.e., matrices with sign pattern

[ + − − − ]
[ − + − − ]
[ − − + − ]
[ − − − + ]
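Not on the slides: the cancellation phenomenon in a couple of lines of Python.

```python
a, b = 1.000000000, 0.9999999999
print(a - b)         # not exactly 1e-10: most of the 16 significant digits are lost
print(a * b, a + b)  # multiplication and addition of positives lose essentially nothing
```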

(25)

Triplet representations

Gaussian elimination & inversion of Z-matrices: cancellation occurs only in the diagonal entries.

Algorithm (GTH trick [Grassmann et al. ’85])

Let Z be a Z-matrix. If we know its off-diagonal entries and v > 0, w ≥ 0 such that Zv = w, then we can run subtraction-free Gaussian elimination. (offdiag(Z), v, w) is called a triplet representation.

Gaussian elimination knowing a triplet representation is always componentwise perfectly stable!

Theorem [Alfa, Xue, Ye ’02]

The GTH algorithm to solve a linear system Zx = b, given (offdiag(Z), v, w) and b exact to machine precision u, returns x̃ such that

|x − x̃| ≤ (4/3) n^3 u x + lower order terms
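Not the routine analyzed in the theorem, but the prototype of all GTH-style eliminations: the classical subtraction-free computation of the stationary vector of a DTMC, as a rough numpy sketch.

```python
import numpy as np

def gth_stationary(P):
    """Stationary distribution of an irreducible stochastic matrix, computed
    GTH-style: every pivot is a sum of nonnegative off-diagonal entries, so
    no subtraction (and hence no cancellation) ever occurs."""
    A = np.array(P, dtype=float)
    n = A.shape[0]
    scale = np.ones(n)
    # Censor states n-1, n-2, ..., 1 one at a time.
    for m in range(n - 1, 0, -1):
        s = A[m, :m].sum()          # = 1 - A[m, m] in the current censored chain
        scale[m] = s
        A[:m, :m] += np.outer(A[:m, m], A[m, :m]) / s   # censored chain on {0, ..., m-1}
    # Back-substitution: pi_j * s_j = sum_{k<j} pi_k * A[k, j].
    pi = np.zeros(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = (pi[:j] @ A[:j, j]) / scale[j]
    return pi / pi.sum()

# Check on the 3-state chain from the beginning of the talk.
P = np.array([[0.4, 0.0, 0.6],
              [0.2, 0.0, 0.8],
              [0.0, 0.5, 0.5]])
pi = gth_stationary(P)
print(pi, np.max(np.abs(pi @ P - pi)))   # residual at machine-precision level
```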

(26)

No condition number?

No condition number! How is this even possible? Example:

[  1   −1     ]^{-1}            [ 1 + ε   1 ]
[ −1    1 + ε ]        = ε^{-1} [ 1       1 ]

There is no way to get around the (unstable) subtraction (1 + ε) − 1.

A triplet representation (the blue entries):

[  1   −1     ] [ 1 ]   [ 0 ]
[ −1    1 + ε ] [ 1 ] = [ ε ]

It already contains ε, no need to compute it.

The catch: a triplet representation is ill-conditioned to compute from the matrix entries.

But what if we had it for free?

(27)

Using triplet representations

Structured doubling algorithm

E_{k+1} = E_k (I − G_k H_k)^{-1} E_k
F_{k+1} = F_k (I − H_k G_k)^{-1} F_k
G_{k+1} = G_k + E_k (I − G_k H_k)^{-1} G_k F_k
H_{k+1} = H_k + F_k (I − H_k G_k)^{-1} H_k E_k

[ E_0  G_0 ]
[ H_0  F_0 ]  = (I + γQ)(I − γQ)^{-1}

Missing ingredient from [Xue et al., ’12]: deriving triplet representations using the stochasticity of

[ E_k  G_k ]
[ H_k  F_k ]

Theorem

(I − G_k H_k) 1 = (E_k + G_k F_k) 1
(I − H_k G_k) 1 = (F_k + H_k E_k) 1
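Not on the slides: a quick numpy verification of these row-sum identities for a random row-stochastic block matrix (with rectangular off-diagonal blocks, to stress that only matrix–vector products with the all-ones vector are involved).

```python
import numpy as np

rng = np.random.default_rng(0)
n_plus, n_minus = 3, 2                 # sizes of the two phase sets
M = rng.random((n_plus + n_minus, n_plus + n_minus))
M /= M.sum(axis=1, keepdims=True)      # make [[E, G], [H, F]] row-stochastic
E, G = M[:n_plus, :n_plus], M[:n_plus, n_plus:]
H, F = M[n_plus:, :n_plus], M[n_plus:, n_plus:]

one_p, one_m = np.ones(n_plus), np.ones(n_minus)
print(np.allclose((np.eye(n_plus)  - G @ H) @ one_p, E @ one_p + G @ (F @ one_m)))  # True
print(np.allclose((np.eye(n_minus) - H @ G) @ one_m, F @ one_m + H @ (E @ one_p)))  # True
```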

(28)

After Ψ: matrix exponentials

After computing Ψ, the invariant measure is given by f(x) = v exp(−Kx),

with the Z-matrix K and the row vector v ≥ 0 computed explicitly from Ψ.

Now only a matrix exponential is needed — lots of literature on it.

We use a subtraction-free algorithm [Xue et al., ’08; Xue et al., preprint; Shao et al., preprint]

Idea:

1. shift to reduce to a positive matrix: exp(A + zI) = e^z exp(A)
2. truncated Taylor series + scaling and squaring:

exp(2^k A) = ( · · · ( I + A + A^2/2! + · · · )^2 · · · )^2

(Thanks N Higham, MW Shao for useful discussions)
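Not the actual routine of [Xue et al.] or [Shao et al.]: just a bare-bones sketch of the shift + truncated Taylor + squaring idea, for a matrix with nonnegative off-diagonal entries such as A = −Kx (no error control, fixed number of terms and squarings).

```python
import numpy as np

def expm_subfree(A, k=20, taylor_terms=12):
    """Sketch: exp(A) for A with nonnegative off-diagonal entries,
    using only sums and products of nonnegative quantities."""
    n = A.shape[0]
    z = max(0.0, -A.diagonal().min())     # shift so that B = A + z*I is nonnegative
    B = (A + z * np.eye(n)) / 2**k        # scale down before the Taylor series
    S, term = np.eye(n), np.eye(n)
    for j in range(1, taylor_terms + 1):  # truncated Taylor series of exp(B)
        term = term @ B / j
        S = S + term
    for _ in range(k):                    # undo the scaling by repeated squaring
        S = S @ S
    return np.exp(-z) * S                 # undo the shift: exp(A) = e^{-z} exp(A + zI)

K = np.array([[ 1.0, -0.2],
              [-0.5,  1.5]])              # a small made-up Z-matrix
print(expm_subfree(-K * 0.7))             # ≈ exp(-0.7 K), entrywise nonnegative
```

Comparing the output with scipy.linalg.expm on such small examples is an easy sanity check.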

(29)

Numerical results

Figure: error on the single components; 15 × 15 model with two “hard-to-reach” states.

[Plot: x-axis = state i (1 to 15); y-axis = componentwise error, log scale from 10^{-27} to 10^{1}; series: value of [f(1)]_i, absolute error on [f(1)]_i for plain SDA, absolute error on [f(1)]_i for sub-free SDA]

(30)

Numerical experiments

Figure: pdf f(x) at several points.

[Plot: x-axis = x, log scale from 10^{-3} to 10^{2}; y-axis = componentwise error, from 10^{-16} to 10^{-4}; series: plain SDA, sub-free SDA, cw_cond(expm)]

(31)

Numerical experiments

Figure: pdf f(x) at several points.

[Plot: x-axis = x, log scale from 10^{-3} to 10^{2}; y-axis = norm error, from 10^{-16} to 10^{-4}; series: plain SDA, sub-free SDA, cw_cond(expm)]

(32)

Numerical experiments

Figure: 10 × 10 model with states “each slightly harder to reach”.

[Plot: x-axis = x, log scale from 10^{-3} to 10^{2}; y-axis = componentwise error, from 10^{-16} to 10^{-4}; series: plain SDA, sub-free SDA, cw_cond(expm)]

(33)

Numerical experiments

Figure: 10 × 10 model with states “each slightly harder to reach”.

[Plot: x-axis = x, log scale from 10^{-3} to 10^{2}; y-axis = norm error, from 10^{-16} to 10^{-4}; series: plain SDA, sub-free SDA, cw_cond(expm)]

(34)

Numerical experiments

Figure: very simple test queue [Bean, O’Reilly, Taylor ’05, Example 3].

[Plot: x-axis = x, log scale from 10^{-3} to 10^{4}; y-axis = componentwise error, from 10^{-16} to 10^{-4}; series: plain SDA, sub-free SDA, cw_cond(expm)]

(35)

Numerical experiments

Figure: very simple test queue [Bean, O’Reilly, Taylor ’05, Example 3].

[Plot: x-axis = x, log scale from 10^{-3} to 10^{4}; y-axis = norm error, from 10^{-16} to 10^{-4}; series: plain SDA, sub-free SDA, cw_cond(expm)]

(36)

Conclusions

Algorithms: now with triplets!

Improved understanding of doubling on the probabilistic, differential-equation and linear algebra levels

Step 1 on the way to new algorithms

Probabilists prefer to use something that they “see”

Next targets: second-order models (Brownian motion), finite-horizon

Thanks for your attention!

