Euclidean Distance Geometry

(1)

Euclidean Distance Geometry

Leo Liberti

CNRS LIX Ecole Polytechnique, France

IEOR, Columbia University, November 2015

[L., Lavor: Introduction to Euclidean Distance Geometry, in preparation]

(2)

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(3)

Applications

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(4)

Clock Synchronization

5 7

3

4 Atomic clock (S) 16:27

Chuck Alice

Bob

↓

16:20 16:22 16:24 16:26 16:28 16:30

A C S B

16:21 16:23 16:25 16:27 16:29 16:31

[Singer, 2011]

(5)

Sensor network localization

0

29 0 / 2.50781 36

1 / 1.97906

51 2 / 1.13649

52 3 / 2.92955

80 4 / 2.5548 91

5 / 2.4284

175 / 2.49225 177 / 1.4276

87 176 / 2.73147 211 / 0.989917 214 / 2.31786 49

210 / 2.83291 84

213 / 3.02018

66212 / 1.56417

268 / 1.76662 267 / 2.31042

271 / 1.75417 69 269 / 2.0622 73

270 / 3.10479

85 272 / 1.15897

346 / 1.50056

1 18

6 / 1.90126 37

7 / 2.61352 41

8 / 1.6604

45 9 / 1.92913

57 10 / 2.66407

11 / 0.65886158 83 12 / 1.32064

110 / 1.29792 111 / 0.840347

113 / 2.14071 114 / 2.03576 47

112 / 2.5425

215 / 2.12848

217 / 3.0822 218 / 1.98508 216 / 2.70003

229 / 2.97748 231 / 1.63275

232 / 2.32966 230 / 2.92252

249 / 1.49322 250 / 1.4467

251 / 3.11231

289 / 2.5056

81 290 / 2.54013

291 / 1.97931 2

5

13 / 1.62227

14 14 / 2.01015

17 15 / 2.68422

20

16 / 3.05872

23 17 / 2.48332

18 / 1.0575642

64 19 / 2.21356

67

20 / 2.42912

88 21 / 2.34146 93 22 / 1.0662

28 / 2.06104 29 / 1.80061

31 / 1.46243 33 / 1.38178 35 / 1.01186 33

30 / 3.15392 65

32 / 2.19726 70

34 / 3.17108

82 / 2.05285 83 / 1.9929

84 / 2.90088 85 / 0.965501

87 / 1.40579 88 / 2.51699 86 / 3.19003

98 89 / 2.50718

106 / 1.36517 107 / 2.2605

109 / 2.88336

55 108 / 2.81041

119 / 2.25758 123 / 0.666733 126 / 2.79445

122 / 1.26894 124 / 2.45029

71

125 / 2.61515

127 / 2.5523 56

120 / 2.46168 63

121 / 3.15779

143 / 2.62768 145 / 3.01294

146 / 1.62871

89 147 / 3.16128

144 / 2.60777 233 / 1.59139

234 / 3.07459 235 / 1.62072

316 / 1.57213 318 / 2.54084

92 317 / 3.16387

326 / 2.30061 325 / 2.97621

327 / 2.62621

350 / 3.18927

3 23 / 0.844888

26 / 2.43216 24 / 1.43953 25 / 2.52302

27 / 1.2847

329 / 2.97101 331 / 1.47226

330 / 2.81533 79

339 / 3.17026 195 / 3.02256

196 / 2.72116 194 / 3.16431 53 193 / 0.88823

319 / 1.78484 322 / 3.15941 320 / 1.19371

321 / 2.59083 332 / 2.88171 77

333 / 2.5427

6

36 / 2.6581637 / 2.84506 39 / 3.00913 39 38 / 0.977096

40 / 0.147583 90

41 / 2.54617 223 / 1.11811 225 / 1.74291 72

224 / 2.873

257 / 2.65041 7

10 42 / 1.67711

16 43 / 1.28092 30

44 / 2.60399 44

45 / 1.93798 46 / 0.90946

78 47 / 1.95204 86

48 / 3.1702

97 49 / 2.91638

50 / 2.79803 59 / 2.64882 61 / 3.11897

63 / 2.57151 64 / 0.63344

65 / 1.47059 15

58 / 2.53596 21

60 / 2.72874 59 62 / 2.44092

98 / 2.45974 102 / 2.89582 101 / 3.00823

100 / 2.72757 103 / 1.23805

104 / 2.65248

105 / 1.56149

24 99 / 2.76855 182 / 2.80251

183 / 2.17394 178 / 0.684621

184 / 2.04844 186 / 2.36739

179 / 1.75353 181 / 1.27559 62

180 / 2.82612 185 / 2.02673

244 / 2.90515 245 / 2.54461246 / 1.4908

248 / 2.19914

242 / 2.10075 243 / 1.81286 247 / 2.70375

334 / 2.85895

335 / 2.74921 345 / 0.969838

96 344 / 3.01517

8 52 / 1.59789

53 / 3.1109 13

51 / 2.96353

54 / 2.6062 94 55 / 2.44249 81 / 2.44372

31

80 / 3.09041

352 / 3.17363

95 353 / 1.69606 9 56 / 2.23711

57 / 2.87413

95 / 2.43097 97 / 1.79475 90 / 0.575857 91 / 1.84882

60 92 / 3.11441 61 93 / 0.745853

74 94 / 1.08674

96 / 1.91462

132 / 2.48921 134 / 1.67584 128 / 2.41661

129 / 2.54924 130 / 0.857403

131 / 1.03428

133 / 1.33929 295 / 2.78515 296 / 3.03355

297 / 2.79379 292 / 2.23004 293 / 2.59059 76 294 / 2.25052

11 22 66 / 2.54335

25 67 / 1.59039

26 68 / 2.55644

35 69 / 1.22183 43

70 / 1.26323

46 71 / 0.65784172 / 1.45548 73 / 1.45696 74 / 2.43597

99 75 / 0.710271

135 / 1.03939 137 / 2.64809

138 / 2.36917139 / 2.85603 141 / 1.99344 142 / 0.632129

28 136 / 3.03397 140 / 1.63894

150 / 2.05658 151 / 2.68427

152 / 1.33314153 / 1.85365 155 / 1.0403 156 / 1.22973

157 / 2.29037 154 / 2.67744

158 / 3.05556

159 / 2.34031 160 / 1.82595

162 / 2.70381 163 / 2.43938

161 / 1.90326 203 / 0.991465

204 / 1.81584 206 / 2.66247 207 / 2.45539 208 / 2.2551

209 / 1.2466 48

205 / 3.06568

236 / 1.90515238 / 2.5944 239 / 2.72018 240 / 3.16484

241 / 0.720351 237 / 2.87679

252 / 0.86256 254 / 0.843728 255 / 2.43018

256 / 1.26966

253 / 2.83326 262 / 0.880856 263 / 3.06774

264 / 1.88647

261 / 1.9719 348 / 2.26962

349 / 2.1111 351 / 3.02766

12

78 / 2.35652 76 / 0.500095 77 / 3.13454

79 / 2.02458

173 / 2.85379 172 / 2.85207

174 / 2.12157 281 / 1.77315

280 / 2.13908 187 / 0.741716

188 / 1.28288

304 / 2.67349 298 / 3.19922

301 / 3.10155

303 / 1.2464

68 299 / 2.68553 300 / 2.93307

302 / 2.7056482 306 / 3.166

308 / 2.46248 305 / 0.378469

307 / 1.95331 341 / 2.70349340 / 1.87485

354 / 2.04535

149 / 1.71292

32 148 / 2.63438

19 117 / 2.4561

118 / 1.55824

115 / 1.40904 50 116 / 0.648045

259 / 2.59804

260 / 2.60371

258 / 2.04883 265 / 2.53475 266 / 1.20584

285 / 3.04711 284 / 1.28966 286 / 0.444549

287 / 2.49879 283 / 0.739388 282 / 2.4463

288 / 2.40494

312 / 2.0258 313 / 1.03139

314 / 2.65429 315 / 1.70638

34 189 / 2.92806

38

190 / 2.07339

40 191 / 2.80145 192 / 2.35256

324 / 2.63064 323 / 2.58729

27 171 / 2.23482 167 / 2.38609

164 / 1.92141 165 / 2.6582

166 / 2.28031 168 / 2.41683169 / 0.955716

170 / 0.967808 199 / 2.18351

197 / 0.88776

198 / 1.51159 200 / 0.583974

201 / 2.87535 202 / 1.31584 220 / 2.98526

219 / 1.26904 221 / 0.308754

222 / 1.8407

226 / 1.33417227 / 3.09905 228 / 1.31378 328 / 1.65874

338 / 1.28689 337 / 1.83675

75 336 / 3.11488 347 / 3.11676 310 / 2.37803

309 / 1.9722 311 / 1.13749

276 / 3.05065 277 / 2.40579

273 / 2.81885 275 / 3.02541 274 / 2.49444

54 278 / 3.02172

279 / 1.17834 343 / 1.31891

342 / 2.50522

−→

[Yemini, 1978]

(6)

Protein conformation from NMR data

1

2

3 4

5 9

6

8 10

7

11

−→

[Crippen & Havel 1988]

(7)

Clock synchronization: solutions

|x

^A

− x

^B

| = 7

|x

^A

− x

^C

| = 3

|x

^A

− x

^S

| = 5

|x

B

− x

C

| = 4

x

_S

= 16:27.

⁵

7

3

4

PSfrag

x

_A

x

_B

x

_C

x

S

x

_S

= 16:27

x

_A

= 16:22

x _B = 16:15

x _C = 16:19

x _B = 16:15

x _C = 16:25

x _B = 16:29

x _C = 16:19

x _B = 16:29

x _C = 16:25

x

_A

= 16:32

x _B = 16:25

x _C = 16:29

x _B = 16:25

x _C = 16:35

x _B = 16:39

x _C = 16:29

x _B = 16:39

x _C = 16:35

(8)

Definition

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(9)

Distance Geometry Problem (DGP)

Given: Determine whether ∃:

– a simple graph G = (V, E) – an edge function d : E → R _≥0 – an integer K ∈ N

a realization x : V → R ^K s.t.

∀{u, v} ∈ E kx ^u − x ^v k 2 = d _uv

1 2

3

4

1 √ 3 1 1

1 1 1

2

3

4 Let n = |V |

(10)

More applications

• Autonomous underwater vehicles [Bahr et al. 2009]

• Statics of rigid structures [Maxwell 1864]

• Matrix completion [Laurent 2009]

• Statistics [Boer 2013]

• Psychology [Kruskal 1964]

[Liberti et al., SIREV 2014]

(11)

Complexity primer

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(12)

Definitions

• Decision problem: mathematical YES/NO-type question depending on a parameter vector π

• Instance: same as above with π replaced by given values v

• Certificate: proof that a given answer is true

• P: all decision problems solvable in at most p( |π|) steps where p is a polynomial

• NP: all decision problems with |YES certificate| ≤ p(|π|)

where p is a polynomial

(13)

Reductions

• P, Q: decision problems

• If ∃ algorithm A which:

1. reformulates instances ¯ P of P into instances ¯ Q of Q 2. has answer( ¯ P ) = YES iff answer(A( ¯ P )) = YES

3. is polytime in the instance size | ¯ P |

then A is a reduction of P to Q

(14)

NP-hardness

• Q is NP-hard if every problem in NP reduces to Q

• Q is NP-complete if it is NP-hard and is in NP

Why does it work?

any P in NP polytime reduction

−−−−−−−−−−−−−−−−−−→ Q: how hard?

• Suppose Q easier than P

• Solve P by reducing to Q in polytime and then solve Q

• Then P as easy as Q, against assumption

• ⇒ Q at least as hard as P

So if Q is NP-hard it is as hard as any problem in NP

⇒ Q is as hard as the hardest problem in NP

(15)

NP-hardness proofs

Given a new problem Q, take any known NP-hard problem P and reduce it to Q

Why does it work?

P : NP-hard polytime reduction

−−−−−−−−−−−−−−−−−−→ Q: how hard?

• As before: Suppose . . . (etc.) ⇒ Q at least as hard as P

• Since P is NP-hard, it is hardest in NP, and so is Q

⇒ Q is NP-hard

(16)

Complexity of the DGP

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(17)

DGP ∈ NP?

• NP: YES/NO problems with polytime-checkable proofs for YES

• DGP is a YES/NO problem

• DGP ₁ ∈ NP, since d ^uv = |x ^u − x ^v | ⇒ (d ∈ Q → x ∈ Q)

• Solutions might involve irrational numbers when K > 1

• Some empirical evidence that DGP 6∈ NP [Beeker et al. 2013]

(18)

The DGP is NP-hard

Partition

Given a = (a ₁ , . . . , a _n ) ∈ N ⁿ , ∃ I ⊆ {1, . . . , n} s.t. ^P

i ∈I a _i = ^P

i 6∈I a _i ?

• Reduce (NP-hard) Partition to DGP 1

• a −→ cycle C with V (C) = {1, . . . , n}, E(C) = {{1, 2}, . . . , {n, 1}}

• For i < n let d _i,i+1 = a _i , and d _n,n+1 = d _n1 = a _n

• E.g. for a = (1, 4, 1, 3, 3), get cycle graph:

1 2 3

4 5 1

4

1 3 3

[Saxe, 1979]

(19)

Partition is YES ⇒ DGP 1 is YES

• Given: I ⊂ {1, . . . , n} s.t. ^P

i ∈I a _i = ^P

i 6∈I a _i

• Construct: realization x of C in R 1. x ₁ = 0 ^{// start}

2. induction step: suppose x _i known if i ∈ I

let x _i+1 = x _i + d _i,i+1 // go right

else

x _i+1 = x _i − d _i,i+1 ^{// go left}

• Correctness proof: by the same induction

but careful when i = n: have to show x _n+1 = x ₁

(20)

I = {1, 2, 3}

0 1 2 3 4 5 6

x

1

(21)

0 1 2 3 4 5 6 x

1

x

2

1 ∈ I

(22)

0 1 2 3 4 5 6

x

1

x

2

x

3

1 ∈ I 2 ∈ I

(23)

0 1 2 3 4 5 6

x

1

x

2

x

3

x

4

1 ∈ I 2 ∈ I

3 ∈ I

(24)

0 1 2 3 4 5 6

x

1

x

2

x

5

x

3

x

4

1 ∈ I 2 ∈ I

3 ∈ I

4 6∈ I

(25)

0 1 2 3 4 5 6

x

6

= x

1

? x

2

x

5

x

3

x

4

1 ∈ I 2 ∈ I

3 ∈ I 4 6∈ I

5 6∈ I

(26)

Partition is YES ⇒ DGP 1 is YES

(1) = ^X

i ∈I

(x _i+1 − x i ) = ^X

i ∈I

d _i,i+1 =

= ^X

i ∈I

a _i = ^X

i 6∈I

a _i =

= ^X

i 6∈I

d _i,i+1 = ^X

i 6∈I

(x _i − x i+1 ) = (2)

(1) = (2) ⇒ ^X

i ∈I

(x _i+1 − x i ) = ^X

i 6∈I

(x _i − x _i+1 ) ⇒ ^X

i ≤n

(x _i+1 − x i ) = 0

⇒ (x n+1 − x ⁿ ) + (x _n − x n −1 ) + · · · + (x 3 − x 2 ) + (x ₂ − x 1 ) = 0

⇒ x n+1 = x ₁

(27)

Partition is NO ⇒ DGP 1 is NO

• By contradiction: suppose DGP 1 is YES, x realization of C

• F = {{u, v} ∈ E(C) | x ^u ≤ x ^v }, E(C)rF = {{u, v} ∈ E(C) | x ^u > x _v }

• Trace x ₁ , . . . , x _n : follow edges in F ( →) and in E(C) r F (←)

0 1 2 3

-1 -2

-3

x

1

x

3

x

2

x

4

x

5

X

{u,v}∈F

(x

_v

− x

^u

) = X

{u,v}6∈F

(x

_u

− x

^v

) X

{u,v}∈F

|x

^u

− x

^v

| = X

{u,v}6∈F

|x

^u

− x

^v

| X

{u,v}∈F

d

uv

= X

{u,v}6∈F

d

uv

• Let J = {i < n | {i, i + 1} ∈ F } ∪ {n | {n, 1} ∈ F }

⇒ ^X

i ∈J

a _i = ^X

i 6∈J

a _i

• So J solves Partition instance, contradiction

• ⇒ DGP is NP-hard, DGP 1 is NP-complete

(28)

Number of solutions

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(29)

With congruences

• (G, K): DGP instance

• ˜ X ⊆ R ^Kn : set of solutions

• Congruence: composition of translations, rotations, reflections

• C = set of congruences in R ^K

• x ∼ y means ∃ρ ∈ C (y = ρx):

distances in x are preserved in y through ρ

• ⇒ if | ˜ X | > 0, | ˜ X | = 2 ^ℵ ⁰

(30)

Modulo congruences

• Congruence is an equivalence relation ∼ on ˜ X (reflexive, symmetric, transitive)

∼

• Partitions ˜ X into equivalence classes

• X = ˜ X/ ∼: sets of representatives of equivalence classes

• Focus on |X| rather than | ˜ X |

(31)

Cardinality of X

• infeasible ⇔ |X| = 0

• rigid graph ⇔ |X| < ℵ 0

• globally rigid graph ⇔ |X| = 1

• flexible graph ⇔ |X| = 2 ^ℵ ⁰

• |X| = ℵ 0 : impossible by Milnor’s theorem

(32)

Milnor’s theorem implies |X| 6= ℵ 0

• System S of polynomial equations of degree 2

∀i ≤ m p _i (x ₁ , . . . , x _nK ) = 0

• Let X be the set of x ∈ R ^nK satisfying S

• Number of connected components of X is O(3 ^nK ) [Milnor 1964]

• If |X| is countable then G cannot be flexible

⇒ incongruent elements of X are separate connected components

⇒ by Milnor’s theorem, there’s finitely many of them

(33)

Examples

V

¹

= {1, 2, 3}

E

¹

= {{u, v} | u < v}

d

¹

= 1

x

1

x

2

x

3

ρ congruence in R

²

⇒ ρx valid realization

|X| = 1

V

²

= V

¹

∪ {4}

E

²

= E

¹

∪ {{1, 4}, {2, 4}}

d

²

= 1 ∧ d

¹⁴

= √ 2

x

1

x

2

x

3

x

4

ρ reflects x

4

wrt x

1

, x

2

⇒ ρx valid realization

|X| = 2 ( , )

V

³

= V

²

E

³

= {{u, u + 1}|u ≤ 3} ∪ {1, 4}

d

¹

= 1

x

1

x

2

x

3

x

4

ρ rotates x

2

x

3

, x

1

x

4

by θ

⇒ ρx valid realization

|X| is uncountable

( , , , , . . .)

(34)

Mathematical optimization formulations

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(35)

System of quadratic constraints

∀{u, v} ∈ E kx ^u − x ^v k ² = d ² _uv

• Around 10 vertices

• Computationally useless

(36)

Quadratic objective

x min ∈R ^nK

X

{u,v}∈E

( kx ^u − x ^v k ² − d ² _uv ) ²

• Globally optimal value zero iff x is a realization of G

• sBB: 10-100 vertices, exact solutions

• heuristics: 100-1000 vertices, poor quality

[Lavor et al., 2006]

(37)

Convexity and concavity

max

x ∈R ^nK

X

{u,v}∈E

kx ^u − x ^v k ²

∀{u, v} ∈ E kx ^u − x ^v k ² ≤ d ² _uv

• Convex constraints, concave objective

• Computationally no better than “quadratic objective”

(38)

Pointwise reformulation

x max ∈R ^nK

X

{u,v}∈E,k≤K

θ _uvk (x _uk − x _vk )

∀{u, v} ∈ E kx ^u − x ^v k ² ≤ d ² _uv

• Convex subproblem in stochastic iterative heuristics

“guess θ and solve”

• 100-1000 vertices, good quality

[L. IOS14/MAGO14(slides)]

(39)

SDP formulation

X min 0

X

{u,v}∈E

(X _uu + X _vv − 2X ^uv )

∀{u, v} ∈ E X _uu + X _vv − 2X ^uv ≥ d ² _uv

• Similar to those of Ye, Wolkowicz — works better for proteins

• 100 vertices, good quality

[D’Ambrosio et al., in progress]

(40)

Realizing complete graphs

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(41)

Cliques

2-clique

1 2

3-clique

1 2

3 4-clique

1 2

3

4 (K + 1)-clique = K-clique ⊕ a vertex

Given a realization of the K-clique, find the position of the vertex

(42)

Trilateration

1

2

3 1 1

2 −→

1 2 3

0 1 2

Example: realize triangle on a line

• From kx 3 − x 1 k = 2 and kx 3 − x 2 k = 1 get

x ² ₃ − 2x 1 x ₃ + x ² ₁ = 4 (1) x ² ₃ − 2x 2 x ₃ + x ² ₂ = 1. (2)

• (2) − (1) yields

2x ₃ (x ₁ − x 2 ) = x ² ₁ − x ² 2 − 3

⇒ 2x ₃ = 4,

• Hence x 3 = 2

(43)

Realizing a (K + 1)-clique in R ^K ⁻¹

• Apply trilateration inductively on K

assume x ₁ , . . . , x _K ∈ R ^K ⁻¹ known, compute y = x _K+1

• K quadratic eqns (∀j ≤ K ky − x j k ² = d ² _j,K+1 ) in K − 1 vars







kyk

²

− 2x

¹

· y + kx

¹

k

²

= d

²_1,K+1

[1]

... ...

kyk

²

− 2x

^K

· y + kx

^K

k

²

= d

²_K,K+1

[K]

• Form system ∀j ≤ K − 1 ([j] − [K])







2(x

1

− x

^K

) · y = kx

¹

k

²

− kx

^K

k

²

− d

²_1,K+1

+ d

²_K,K+1

[1] − [K]

... ...

2(x

_K₋₁

− x

K

) · y = kx

K−1

k

²

− kx

K

k

²

− d

²_K_−1,K+1

+ d

²_K,K+1

[K − 1] − [K]

• This is a (K − 1) × (K − 1) linear system Ay = b Solve to find y

[Dong, Wu 2002]

(44)

“Solve”?

1. What if A is singular?

2. Or: A nonsingular but instance is NO

(45)

Singularity: rkA = K − 2

One row x _j − x _K of A depends on the others

K = 2 triangle in R

¹

x

1

− x

²

= 0

x

1

= x

2

x

3

? x

3

?

K = 3 4-clique in R

²

x

1

, x

2

, x

3

on a line

x

1

x

2

x

3

x

4

? x

4

?

K = 4 5-clique in R

³

x

1

, . . . , x

4

in a plane ^x

¹

^x

²

^x

³

^x

⁴

x

5

? x

5

?

Trend continues: rk A = K − 2 ⇒ |X| = 2 (see later)

(46)

Singularity: rkA = K − 3

Two rows x _j − x _k depend on the others

K = 3 4-clique in R

²

x

1

= x

2

= x

3

x

1

= x

2

= x

3

x

4

?

K = 4 5-clique in R

³

x

1

, . . . , x

4

on a line

x

1

x

2

x

3

x

4

x

5

?

Trend continues: [Hendrickson, 1992]

Thm. 5.8. If a graph G is connected, flexible and has more than K vertices, X

contains almost always a submanifold diffeomorphic to a circle

(47)

Hendrickson’s theorem also applies to non-cliques

(48)

Nonsingular matrix A with NO instance

• Infeasible quadratic system ∀j ≤ K kx K+1 − x j k ² = d ² _j,K+1

• Take differences, get nonsingular A and value for x K+1

• . . . but it’s wrong!

Shit happens!

Every time you solve the linear system Ay = b

check feasibility with quadratic system

(49)

Algorithm for realizing complete graphs in R ^K

• Assume:

(i) G = (V, E) complete (ii) |V | = n ≥ K + 2

(iii) we know x ₁ , . . . , x _K+1

• Increase K: we know how to realize x _K+2 in R ^K

• Use this inductively for each i ∈ {K + 2, . . . , n}

(50)

Algorithm for realizing complete graphs in R ^K

// realize next vertex iteratively for i ∈ {K + 2, . . . , n} do

// use (K + 1) immediate adjacent predecessors to compute x

i

if rkA = K then

x

i

= A

⁻¹

b // A, b defined as above else

x

_i

= ∞ // A singular, mark ∞ and exit break

end if

// check that x

_i

is feasible w.r.t. other distances for {j ∈ N(i) | j < i} do

if kx

ⁱ

− x

^j

k 6= d

^ij

then

// if not, mark infeasible and exit loop x

_i

= ∅

break end if end for

if x

i

= ∅ then break

end if

end for

return x

(51)

Complexity of Alg. 1

• Outer loop: O(n)

• Rank and inverse of A: O(K ³ )

• Inner loop: O(n)

• Get O(n ² K ³ )

• But in most applications K is fixed

• Get O(n ² )

But how do we find the realization of the first K + 1 vertices?

(52)

Realizing (K + 1)-cliques in R ^K

• Realizing (K + 1)-cliques in R ^K ⁻¹ yields “flat simplices”

(e.g. triangles on lines)

• Use “natural” embedding dimension R ^K

• Same reasoning as above:

get system Ay = b where y = x _K+1 and A _j = 2(x _j − x K )

• But now A is (K − 1) × K

• Same as previous case with A singular

(53)

Almost square

How can you solve the following system Ay = b:



 

a ₁₁ a ₁₂ . . . a _1K ... ... . . . ...

a _K _−1,1 a _K _−1,2 . . . a _K _−1,K



 



 

y ₁ ...

y _K



  =



 

b ₁ ...

b _K ₋₁



 

where A has one more column than rows and rank K − 1?

(54)

Basics and nonbasics

• Since rk A = K − 1, ∃ K − 1 linearly independent columns

• B: set of their indices

• N : index of remaining column

• B: (K − 1) × (K − 1) square matrix of columns in B

• ⇒ B is nonsingular

• Can partition columns as A = (B |N)

Column j corresponds to variable y _j

• Variables y _B are called basic variables

• Variable y _N is called nonbasic variable

(55)

The dictionary

(B |N)y = b

⇒ By _B + N y _N = b

⇒ y _B = B ⁻¹ b − B ⁻¹ N y _N

Basics expressed in function of nonbasic

(56)

One quadratic equation

• From value of y _N , can use dictionary to get y _B

• Use one quadratic equation

1. Pick any h ∈ {1, . . . , K − 1}, equation is kx _h − yk ² ₂ = d ² _hK 2. y = (y _B |y _N ) ^⊤

3. Replace y _B with B ⁻¹ b − B ⁻¹ N y _N in equation

4. Solve resulting quadratic equation in one variable y _N 5. Get 0,1 or 2 values for y _N

6. ⇒ Get 0,1 or 2 positions for x K+1

(57)

What if B ⁻¹ N is zero?

• y _B = B ⁻¹ b − B ⁻¹ N y _N reduces to y _B = B ⁻¹ b

• Use one quadratic equation

1. Pick any h ∈ {1, . . . , K − 1}, equation is kx _h − yk ² ₂ = d ² _hK 2. y = (y _B |y _N ) ^⊤

3. Replace y _B with B ⁻¹ b in equation

4. Solve resulting quadratic equation in one variable y _N 5. Get 0,1 or 2 values for y _N

6. ⇒ Get 0,1 or 2 positions for x K+1

(58)

The difference

• B ⁻¹ N 6= 0: y _N −−−−−−−→ y ^dictionary _B

• Different values y _N ⁺ 6= y _N ⁻ −→ y ⁺ , y ⁻ with different components

• B ⁻¹ N = 0: y _B quadratic eqn.

−−−−−−−−−−→ y _N

• Even if y _N ⁺ 6= y _N ⁻ , K − 1 components of y ⁺ , y ⁻ are equal

aff(x ₁ , . . . , x _K ₋₁ ) = {y ∈ R ^K | y _N = 0 }

(59)

The case of no solutions

• No realizations exist for this (K + 1)-clique in R ^K

• DGP instance is NO

(60)

The case of one solution

• Assume for simplicity: N = K, h = 1, B ⁻¹ N 6= 0 Then kx h − yk ² = d ² _h,K+1 becomes:

λy

_K²

− 2µy

^K

+ ν = 0, where

λ = 1 + X

ℓ,j<K

β

_ℓj²

a

²_jK

µ = x

_1K

+ X

ℓ,j<K

β

_ℓj

a

_jK

(β

_ℓj

b

_ℓ

− x

^1ℓ

)

ν = X

ℓ,j<K

β

_ℓj

b

_ℓ

(β

_ℓj

b

_ℓ

− 2x

1ℓ

) + kx

¹

k

²

− d

²1,K+1

• (Exactly one solution for y K ) ⇔ µ ² = λν, not a tautology

• The set of all (K + 1)-clique DGP instances in R ^K s.t. µ ² = λν has Lebesgue measure 0

• Ignore them, they happen with probability ^∗ 0!

∗

Assuming continuous distributions over the reals. For floating point number, who knows? . . .

but we’ll ignore these instances anyhow

(61)

Discriminant > 0, = 0

4 4 4

1 3 1

2

3

1

2

3

2

(62)

The case of two solutions

• K spheres S ^K ₁ ⁻¹ , . . . , S ^K _K ⁻¹ in R ^K

centered at x

1

, . . . , x

_K

with radii d

_1,K+1

, . . . , d

_K,K+1

• x K+1 must be at the intersection of S ^K ₁ ⁻¹ , . . . , S ^K _K ⁻¹

• If ^T _j S ^K _j ⁻¹ 6= ∅, then | ^T _j S ^K _j ⁻¹ | = 2 in general

• will not mention “probability 0” or “in general” anymore

[Coope 2000]

(63)

Mirror images

• Let x ⁺ = {x 1 , . . . , x _K , x ⁺ _K+1 }, x ⁻ = {x 1 , . . . , x _K , x ⁻ _K+1 } assume dim aff(x ₁ , . . . , x _K ) = K − 1 (†)

• Theorem

x ⁺ , x ⁻ are reflections w.r.t. hyperplane defined by x ₁ , . . . , x _K

• Proof

1. x ⁺ , x ⁻ congruent by construction

2. ∀i ≤ K x i ∈ x ⁺ ∩ x ⁻ → x ⁺ , x ⁻ not translations

3. |x ⁺ ∩ x ⁻ | = K < |x ⁺ | = |x ⁻ | → x ⁺ , x ⁻ not rotations by ( †)

4. ⇒ must be reflections

(64)

Algorithm for realizing (K + 1)-cliques in R ^K

// realize 1 at the origin x ₁ = (0, . . . , 0)

// realize next vertex iteratively for ℓ ∈ {2, . . . , K + 1} do

// at most two positions in R ^ℓ ⁻¹ for vertex ℓ S = ^T

i<ℓ

S ^ℓ _i ⁻² if S = ∅ then

// warn: infeasible return ∅

end if

// arbitrarily choose one of the two points choose any x _ℓ ∈ S

end for

// return feasible realization

return x

(65)

Complexity of Alg. 2

• Outer loop: O(K)

• Gaussian elimination on A: O(K ³ )

• Some messing about to obtain x ⁺ _K+1 , x ⁻ _K+1 : +O(K ² )

• Get O(K ⁴ )

• But in most applications K is fixed

• Get O(1)

(66)

Back to complete graphs

• Alg. 2: realize 1, . . . , K + 1 in R ^K : O(1)

• Alg. 1: Realize K + 2, . . . , n: O(n ² )

• ⇒ O(n ² )

• What about |X|?

– Alg. 1 is deterministic: one solution from x ₁ , . . . , x _K+1 – Alg. 2 is stochastic: pick one of two values K times

⇒ |X| = 2 ^K

(67)

Let’s look at sparser graphs

(68)

K-laterative graphs

• In Alg. 1 we only need each v > K + 1 to have K + 1 adjacent predecessors in order to find a unique solution for x _v

• Determination of x v from K+1 adjacent predecessors: K-lateration

• K-laterative graph:

(i) has a vertex order ensuring this property

(ii) the initial K + 1 vertices induce a (K + 1)-clique the order is called K-lateration order

• Alg. 1 realizes all K-laterative graphs

The DGP restricted to K-laterative graphs in R ^K is tractable

[Eren et al. 2004]

(69)

The story so far

• Lots of nice applications

• DGP is NP-hard

• May have 0, 1, finitely many or 2 ^ℵ ⁰ solutions modulo congruences

• Continuous optimization techniques don’t scale well

• Using K + 1 adjacent predecessors, realize K-laterative graphs in R ^K in polytime

• Do we need K + 1 adjacent predecessors, or can we do with

less?

(70)

The Branch-and-Prune algorithm

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP

10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

(71)

Fewer adjacent predecessors

• Alg. 2 only needs K adjacent predecessor

• Extend to n vertices: (K − 1)-laterative graphs

• Can we realize (K − 1)-laterative graphs in R ^K ?

• A small case: graph consisting of two K + 1 cliques

1 1 1

5 5

5 2 2

2 3 3

3 4 4

4

(72)

Take a closer look. . .

5 5 2 2

3 3

4 4

• Realization of a K + 1 clique in R ^K knowing x ₁ , . . . , x _K

• We know how to do that!

• Consistent with 2 solutions for x 5 , reflected across plane through

x ₂ , x ₃ , x ₄

(73)

Discretization and pruning edges

• (K − 1)-laterative graph G = (V, E):

∀v > K ∃U ^v ⊂ V (|U ^v | = K ∧ ∀u ∈ U ^v (u < v) ∧ {u, v} ∈ E)

• Discretization edges:

E _D = {{u, v} ∈ E | u, v ≤ K}

| {z }

initial clique

∪ {{u, v} ∈ E | v > K ∧ u ∈ U v }

| {z }

vertex order

• Pruning edges E _P = E r E _D

10

1

2

3 4

5 6

7 8

9

11

12 13

14 15

16

17

18 19

(74)

Role of discretization edges

Missing discretization edge

⇒ non-rigid structure

⇒ X uncountable

Else: X finite

(75)

Role of pruning edges

No pruning edges: 8 incongruent realizations in R ²

1

2

3

4 5 1

2

3

4 5 1

2 3 4

5

1

2 3 4

5

1 2 3

4 5

1 2 3

4

5

1 2 3

4 5

1 2 3

4

5

(76)

Role of pruning edges

Pruning edge {1, 4}: only 4 realizations remain valid

1

2

3

4 5 1

2

3

4 5 1

2 3 4

5

1

2 3 4

5

1 2 3

4 5

1 2 3

4

5

1 2 3

4 5

1 2 3

4

5

(77)

Motivation

Protein backbones

• Total order < on V

• Covalent bond distances: {u − 1, u} ∈ E

• Covalent bond angles: {u − 2, u} ∈ E

• NMR experiments: {u − 3, u} ∈ E

(and other edges {u, v} with v − u > 3)

Generalize “3” to K

[Lavor et al., COAP 2012]

(78)

K DMDGP graphs

K = 2 K = 3

1

2

3

4

5 Generalization of protein backbone order:

v > K is adjacent to K immediate predecessors v − 1, . . . , v − K

K DMDGP: Discretizable Molecular Distance Geometry Problem

(79)

The Branch-and-Prune (BP) algorithm

BP(v, ¯ x, X):

1. Given v > K, realization ¯ x = (x ₁ , . . . , x _v ₋₁ ) 2. Compute S = ^T

u ∈U ^v

S ^K _u ⁻¹

3. For each x _v ∈ S s.t. ∀{u, v} ∈ E P (u < v → kx ^u − x ^v k = d ^uv ) (a) let x = (¯ x, x _v )

(b) if v = n add x to X, else call BP(v + 1, x, X)

• Recursive: starts with BP(K + 1, (x

1

, . . . , x

K

), ∅)

• All realizations in X are incongruent

^∗

• Can be easily modified to find only p solutions for given p

• Applies to all (K − 1)-laterative graphs in R

^K

• Specialize to KDMDGP graph by setting U

^v

= {v − 1, . . . , v − K}

∗

with probability 1, and aside from one reflection at v = K + 1

[L. et al. ITOR 2008]

(80)

Complexity of BP

• Most operations are O(K ^h ) for some fixed h ⇒ O(1)

• Distance check at Step 3: O(n)

• Recursion on at most 2 branches at each call: binary tree

• Only recurse when v > K, v < n: 2 ⁿ ^−K nodes

• Overall O(n2 ⁿ ^−K ) = O(2 ⁿ )

Worst-case exponential behaviour

(81)

Hardness of ^K DMDGP

• The ^K DMDGP is NP-hard for each K

– every DGP instance is also DMDGP if K = 1

– reduction from Partition can be extended to any K

• (K − 1)-lateration graphs are NP-hard by inclusion

• No polytime algorithm unless P=NP

Trilaterative graphs in R ^K are complexitywise borderline at K

(82)

Correctness

Thm.

When BP terminates, X contains every incongruent realization of G

Proof.

• Let ¯ y be any realization of G

• Since G has an initial K-clique, can rotate/translate/reflect ¯ y to y[K] = x[K] for all x ∈ X

• BP exhaustively constructs every extension of x[K] which is feasible with all distances, so y ∈ X

for a realization y, y[h] = (y

1

, . . . , y

h

) is the initial segment of y

(83)

Two examples

(84)

Empirical observations

• Fast: up to 10k vertices in a few seconds on 2010 hardware

• Precise: errors in range O(10 ⁻⁹ )-O(10 ⁻¹² )

• Number of solutions always a power of 2:

obvious if E _P = ∅, but otherwise mysterious

• Linear-time behaviour on proteins:

this really shouldn’t happen

(85)

Symmetry in the ^K DMDGP

1. Applications 2. Definition

3. Complexity primer

4. Complexity of the DGP 5. Number of solutions

6. Mathematical optimization formulations 7. Realizing complete graphs

8. The Branch-and-Prune algorithm 9. Symmetry in the ^K DMDGP 10. Tractability of protein instances 11. Finding vertex orders

12. Approximate realizations

[L. et al. DAM 2014]

(86)

Partial reflections

• For each v > K, let

g _v (x) = (x ₁ , . . . , x _v ₋₁ , R ^v _x (x _v ), . . . , R _x ^v (x _n )) be the partial reflection of x w.r.t. v

x

v−3

x

v−2

x

v−1

• Note: the g _v ’s are idempotent operators

• G D = (V, E _D ): subgraph of G given by discretization edges

• ∀v > K reflection R _x ^v gives a binary choice in general ^∗

• X D ⊂ R ^nK contains 2 ⁿ ^−K incongruent realizations of G _D

∗

subsequent results hold “with probability 1”

(87)

Discretization group

• G _D ₌ _hg _v | v > Ki: the discretization group of G w.r.t. K subgroup of a Cartesian product of reflection groups

• An element g ∈ G _D has the form ^N

v>K g _v ^a ^v , where a _v ∈ {0, 1}

• Action of G _D _{on X} _D _: _{g(x) =} _g ^a ^K+1

K+1 ◦ · · · ◦ g _n ^a ⁿ (x)

(88)

Commutativity of partial reflections

Lemma A G _D _{is Abelian}

Proof Assume K < u < v. Then

g _u g _v (x) = g _u (x ₁ , . . . , x _v ₋₁ , R _x ^v (x _v ), . . . , R ^v _x (x _n ))

= (x ₁ . . . , x _u ₋₁ , R ^u _g

v (x) (x _u ), . . . , R ^u _g

v (x) R _x ^v (x _v ), . . . , R ^u _g

v (x) R _x ^v (x _n ))

= (x ₁ . . . , x _u ₋₁ , R û _x (x _u ), . . . , R ^v _g _u _(x) R û _x (x _v ) , . . . , R ^v _g _u _(x) R _x û (x _n ))

= g _v (x ₁ , . . . , x _u ₋₁ , R _x ^u (x _u ), . . . , R ^u _x (x _n ))

= g _v g _u (x)

where equality of these terms holds by a Technical Lemma (next slide)

[L. et al. 2013]

(89)

Commutativity of partial reflections

Technical Lemma

(Proof sketch for K = 2) Let y ⊥Aff(x v −1 , . . . , x _v _−K ) and ρ ^y = R ^v _x

ρ ^z

ρ ^y t

O

z y ρ ^z y

ρ ^y t

ρ ^z t ρ ^ρ ^z ^y

ρ ^z ρ ^y t = ρ ^ρ ^z ^y ρ ^z t

(90)

One realization generates all others

Lemma B The action of G _D _{on X} _D is transitive

K = 2 ^x

y g

g g

5

4

3

∃g ∈ G _D _(y = g(x)): namely, y = g ₅ (g ₄ (g ₃ (x)))

Proof By induction on v: assume result holds to v − 1 with g

^′

, then either it holds for v and g = g

^′

, else flip and let g = g

v

g

^′

[L. et al. 2013]

(91)

Structure and invariance

• G _D is Abelian and generated by n − K idempotent elements

⇒ G _D = ∼ C ₂ ⁿ ^−K

• G _D _{≤ Aut(X} _D ) by construction

(92)

Solution sets

• X: set of incongruent realizations of G

• G D defined on same vertices but fewer edges

⇒ fewer distance constraints on realizations

⇒ more realizations

• All realizations of G are also realizations of G D

⇒ X ⊆ X _D

(93)

Losing invariance on pruning edges

Lemma C Let W ^uv = {u + K + 1, . . . , v} be the range of {u, v}

∀x ∈ X, u, w, v ∈ V (w ∈ W ^uv ↔ kx ^u − x ^v k 6= kg ^w (x) _u − g ^w (x) _v k) Proof sketch for K = 2

u

w v

Corollary If {u, v} ∈ E P and w ∈ W ^uv , g _w (x) 6∈ X

[L. et al. 2013]

(94)

Pruning group

Define:

Γ = {g ^w ∈ G _D | w > K ∧ ∀{u, v} ∈ E P (w 6∈ W ^uv ) } G _P ₌ _hΓi

Lemma D X is invariant w.r.t. G _P Proof

Follows by corollary, invariance of X _D w.r.t. G _D _{and X} _{⊆ X} _D

(95)

Transitivity of the pruning group

Lemma E The action of G _P on X is transitive

• Given x, y ∈ X, aim to show ∃g ∈ G

_P

_{(y = g(x))}

• Lemma B ⇒ ∃g ∈ G

_D

with y = g(x) ∈ X

^D

• Suppose g 6∈ G

_P

and aim for a contradiction

• ⇒ ∃{u, v} ∈ E

P

and w ∈ W

^uv

s.t. g

_w

is a component of g

• Lemma C ⇒ kg

^w

(x)

u

− g

^w

(x)

v

k 6= d

^uv

• If w is the only such vertex, ⇒ y = g(x) 6∈ X, contradiction

• Suppose ∃ another z ∈ W

^uv

s.t. g

z

is a component of g

• Set of cases s.t. kx

^u

− x

^v

k = kg

^z

g

w

(x)

_u

− g

^z

g

w

(x)

_v

k given kg

^w

(x)

_u

− g

^w

(x)

_v

k 6=

kx

^u

− x

^v

k 6= kg

^z

(x)

u

− g

^z

(x)

v

k has Lebesgue measure 0 in all DGP inputs

• By induction, holds for any number of components g

^z

of g with z ∈ W

^uv

• ⇒ y = g(x) 6= x against hypothesis, done

[L. et al. 2013]

(96)

The main result

Theorem |X| = 2 ^|Γ|

• Lemma A ⇒ G

_D

= ∼ C

₂ⁿ^−K

⇒ | G

_D

_{| = 2}

ⁿ^−K

• G

_P

_≤ G

_D

_{⇒ ∃ℓ ∈ N (} G

_P

= ∼ C

₂^ℓ

) , with ℓ = |Γ|

• Lemma E ⇒ ∀x ∈ X G

_P

_{x = X}

• Idempotency ⇒ ∀g ∈ G

_P

_g

⁻¹

_{= g}

⇒ ∀g, h ∈ G

_P

_{, x} _{∈ X (} _{gx =} _hx _{→ h}

⁻¹

_{gx = x} → hgx = x → hg = e → h = g

⁻¹

= g)

⇒ the mapping G

_P

_x _→ G

_P

given by gx → g is injective

• ∀g, h ∈ G

_P

_{, x} ∈ X (g 6= h → gx 6= hx)

⇒ the mapping gx → g is surjective

• ⇒ the mapping gx → g is a bijection

• ⇒ | G

_P

_x _{| = |} G

_P

_|

• ⇒ ∀x ∈ X |X| = | G

_P

_x _{| = |} G

_P

_{| = 2}

^|Γ|

[L. et al. 2013]

(97)

Symmetry-aware BP

• Don’t need to explore all branches of BP tree

• Build Γ as a pre-processing step

• Run BP, terminating as soon as |X| = 1

• For each g ∈ G _P , compute gx

[Mucherino et al. JBCB 2012]

(98)

Euclidean Distance Geometry