Hairpins participating in folding of human telomeric sequence quadruplexes studied by standard and T-REMD simulations

Petr Stadlbauer¹, Petra Kührová², Pavel Banáš², Jaroslav Koča³,⁴, Giovanni Bussi⁵, Lukáš Trantírek³, Michal Otyepka² and Jiří Šponer¹,³,*

¹Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 612 65 Brno, Czech Republic, ²Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacký University, tř. 17 listopadu 12, 771 46 Olomouc, Czech Republic, ³CEITEC – Central European Institute of Technology, Masaryk University, Campus Bohunice, Kamenice 5, 625 00 Brno, Czech Republic, ⁴National Center for Biomolecular Research, Faculty of Science, Masaryk University, Campus Bohunice, Kamenice 5, 625 00 Brno, Czech Republic and ⁵Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, 34136 Trieste, Italy

Received August 26, 2015; Revised September 20, 2015; Accepted September 22, 2015

ABSTRACT

DNA G-hairpins are potential key structures participating in folding of human telomeric guanine quadruplexes (GQ). We examined their properties by standard MD simulations starting from the folded state and long T-REMD starting from the unfolded state, accumulating ∼130 μs of atomistic simulations. Antiparallel G-hairpins should spontaneously form in all stages of the folding to support lateral and diagonal loops, with sub-μs scale rearrangements between them. We found no clear predisposition for direct folding into specific GQ topologies with specific syn/anti patterns. Our key prediction stemming from the T-REMD is that an ideal unfolded ensemble of the full GQ sequence populates all 4096 syn/anti combinations of its four G-stretches. The simulations can propose idealized folding pathways but we explain that such few-state pathways may be misleading. In the context of the available experimental data, the simulations strongly suggest that the GQ folding could be best understood by the kinetic partitioning mechanism with a set of deep competing minima on the folding landscape, with only a small fraction of molecules directly folding to the native fold. The landscape should further include non-specific collapse processes where the molecules move via diffusion and consecutive random rare transitions, which could, e.g. structure the propeller loops.

INTRODUCTION

Telomeres are terminal regions of linear eukaryotic chromosomes. Their main function is to maintain genomic stability and facilitate chromosome replication. Telomeric DNA in vertebrates comprises double-stranded DNA consisting of the tandem repeat sequence d(GGGTTA).d(CCCTAA), terminating in a single-stranded 150–250 nt long 3′-terminal G-rich overhang (1,2). Telomeres are shortened during every DNA replication (3,4) and their loss may eventually trigger apoptosis (5). A specialized enzyme, telomerase, helps to maintain telomere length (6). However, excessive telomerase activity may lead to cell immortality (7); cells in more than 80% of cancers have indeed been shown to overexpress telomerase (8). The telomeric G-rich single-stranded overhang is capable of forming non-canonical, evolutionarily conserved structures called G-quadruplexes (GQ) (9), recently visualized in vivo (10,11), which may downregulate telomerase activity (12). Therefore, stabilization of telomeric GQs by small ligands may help in cancer treatment (8,13). GQ-forming sequences are also widespread in other parts of the genome and there is increasing evidence that formation of GQ structures may be a powerful way to regulate gene expression (14–19).

GQs are formed by stacking of guanine quartets. The quartet is a planar assembly of four cis Watson–Crick/Hoogsteen-paired (20) (shortly Hoogsteen-paired or cWH) guanines. Four carbonyl oxygens of each quartet aim towards the GQ centre and form a central channel with a negative electrostatic potential. Monovalent cations are required to reside inside the channel to counterbalance the negative potential, and thus stabilize the GQ architecture. Mutual orientation of G-strands in quadruplexes is either parallel or antiparallel, and connecting loops can be lateral (edgewise), diagonal or propeller (double-chain reversal), giving rise to parallel, antiparallel and hybrid GQ topologies. Antiparallel orientation of two strands requires that any two cWH base-paired guanines have different χ angle orientation, one being in syn conformation and the other in anti. Parallel strands impose the same χ angle orientation for base-paired guanines. This leads to interdependence between the syn/anti patterns in the GQ stems and the overall GQ topologies (21,22). Human telomeric sequence is well known for its structural polymorphism. Free energy computations suggest that polymorphism is a feature inherent to DNA GQs having odd numbers of guanines in the G-tracts, unless their topology is specifically dictated by their loops (23,24). The dominant GQ topology in a given experiment depends on exact nucleotide sequence including the flanking nucleotides, nature of stabilizing cation, DNA strand concentration, presence of cosolvents and perhaps on other factors (25–37). Six different topologies of human telomeric GQ have been elucidated by X-ray and NMR studies (30–36,38,39).

*To whom correspondence should be addressed. Tel: +420 541 517 133; Fax: +420 541 211 293; Email: sponer@ncbr.muni.cz

© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

at Sissa on November 16, 2015

http://nar.oxfordjournals.org/

There are currently intense experimental efforts to understand the elusive process of GQ folding. Although earlier studies suggested that at least some GQ sequences may fold on a time scale of dozens to hundreds of milliseconds (40,41), most recent ensemble experiments indicate that the process of folding takes hours or even days (42,43). This is consistent with the latest single molecule experiments which discovered conformational states with similar lifetimes (44,45). The basic limitation of experimental investigations is that most of them do not allow confident determination of the structures that are populated in the course of the folding process, and thus plausible structural suggestions are sometimes based on intuition. The order parameters that are currently used to describe the folding processes may be insufficient for the full characterization of folding. Some of them, e.g. those derived from CD spectra, may occasionally be ambiguous (46). For example, although CD is commonly used to detect major changes of the GQ folded ensemble dominated by the basket-type GQ upon the Na+ → K+ replacement (47), Raman spectroscopy (48) and NMR (49) revealed that sometimes the most populated fold may remain unaltered despite a profound change of the CD signal. Different and sometimes even kinetically unrelated ensembles may in principle overlap in the measured signals (50). Recently, the first time-resolved NMR study of GQ folding intermediates for the human telomeric sequence emerged. The time resolution allowed identification of two competing long-lived structures corresponding to the hybrid-2 (off-pathway intermediate) and hybrid-1 (thermodynamic minimum) folds (43).

In fact, when considering all the currently available experimental data and their interpretation, the picture of GQ folding is far from being unambiguous. This may partially reflect limitations of the specific techniques. However, consistent interpretation of the experimental data may also be hampered by the fact that different experiments capture the folding process under different physical conditions, resulting, e.g. in genuine differences between ensemble and single molecule experiments. The unfolded (denatured) state ensemble, which significantly influences the folding process (45,51), depends on the specific physical process of unfolding. Thermally denatured, chemically denatured and low-entropy force-denatured ensembles are indeed not identical (51,52) and may give rise to diverse folding pathways.

Because of the genuine limitations of the experimental techniques, there have been increasing efforts to complement the experiments by modern computational techniques (53–62). The computational methods can provide in-depth insights into those parts of the folding process which are not within the resolution limits of the experiments (43). Among the available techniques, molecular dynamics (MD) simulation approaches are very promising, since they allow an optimal compromise between accuracy and computational efficiency. Standard MD studies attempting to mimic real thermal dynamics of DNA molecules are, however, limited by their μs time scale (63). This allows studying the properties of different structures (i.e. their associated local free energy basins) that can participate in the folding, but does not allow direct characterization of the full folding process. MD studies were performed for many GQ native structures (54,64–67), potential four-stranded intermediates with strand slippage (58,60) and G-triplexes (42,57,61). Simulation methods that enhance sampling can achieve larger conformational changes and even, to a certain extent, sample the space between folded and denatured species. However, any acceleration of barrier crossing is associated with some drawbacks and may distort the picture of the folding process (63). The induced folding or unfolding might follow a pathway that depends on the exact protocol adopted to enhance sampling, which may complicate comparison with experiments made using specific control parameters and setups that are not equivalent to those in the computations (63,68). Nevertheless, even with enhanced sampling methods, it is unlikely we can achieve a complete folding analysis for systems that fold on such a long time scale as the GQs. Besides that, all theoretical studies are certainly affected by the approximations inherent to the simple molecular mechanical (force field) model (63,69).

The DNA GQ folding has been sometimes likened to funnel-like folding of small fast-folding proteins or RNAs, perhaps in light of some earlier experiments indicating fast folding. However, we suggest that there is a substantial difference between the principles of small protein and GQ folding. Short proteins have folding times in hundreds of μs to ms (51,70,71). Their sequences have been optimized to eliminate kinetic traps from their free energy surfaces, reducing their ruggedness to be around kT. On the contrary, the fact that the very short GQ-forming DNA sequences fold on very long time scales indicates that the GQ conformational surface is extremely rugged with deep competing basins of attraction (i.e. the substates) impeding fast spontaneous folding into the native basin of attraction (43,50). GQs have very limited sequence freedom for tuning the folding speed. Further, folding of short proteins is directed by formation of local secondary structure native contacts (71). This contrasts with GQ DNA, where the most critical part of the folding may involve formation of the native consecutive ion-stabilized quartets. The quartets are based on non-local interactions and, in a given fold, must have a specific combination of syn and anti nucleotides (21,22). A decisive role of local contacts (which would include formation of the loop topologies) for the GQ folding is also not consistent with the fact that many GQs fold to diverse topologies upon subtle changes of the environment or flanking sequences. This indicates presence of several or even many competing basins of attraction. Then only a small fraction of the molecules directly reaches the thermodynamically most stable ensemble. The remaining molecules would be first trapped in competing substates, i.e. misfolded, resulting in an extremely multi-pathway folding process. The native and most significant competing basins of attraction may be interchanged upon changing the experimental conditions, which may explain why the folding topology dominant in the final thermodynamic equilibrium is so sensitive to subtle changes of the folding conditions. Most molecules likely initially locate different (competing) folds and may need to unfold to make another folding attempt. This is consistent with our earlier simulation studies identifying numerous stable potential intermediates. It has been visualized experimentally for the folding of the human telomeric hybrid-1 type GQ for which the hybrid-2 structure has been shown as a major competing substate (43). The authors conclude that transitions between the two folds appear to be realized via the unfolded ensemble and note that the folding process can be affected by other substates which are below the resolution of the experiment. The number of substates that can be revealed by a given technique depends on its structural and time resolution.

Earlier, we studied potential four- and three-stranded architectures that may participate in GQ folding pathways (60,61). In this paper we complete the investigations by an extended set of simulations of potential double-stranded G-hairpin structures. G-hairpins are relevant not only to the earliest stages of the folding pathways (the hairpins were suggested to fold on the μs to ms time scale) (42,72), but also to various later phases of folding, as the G-hairpins may be important parts of the ensembles during inter-conversions among different triplex and GQ arrangements (40,41,43,56,72–74). Only limited experimental and computational effort has been devoted to G-hairpins so far (56,72–74).

We report three sets of investigations. First, a set of standard simulations is initiated assuming structures of the hairpins with cWH GG pairing with syn-anti orientations of the guanines corresponding to either different known GQ topologies or suggested GQ folding pathways. The aim is to characterize the conformational space associated with the native-like basins of the G-hairpins. The second set of standard simulations is initiated from unfolded single strand, in an attempt to initiate formation of a G-hairpin from a straight single strand. Finally, we use an extended temperature replica-exchange MD (T-REMD) (75) enhanced-sampling simulation to obtain a full-scale analysis of the conformational space of the G-hairpins that is independent of the starting structures.

MATERIALS AND METHODS

Antiparallel hairpins with lateral (edgewise) loops

We have examined three DNA sequences, differing in their 3′-terminal base: d[AGGGTTAGGG], d[AGGGTTAGGGT] and d[AGGGTTAGGGA]. The choice was based on the sequences of three known GQ NMR structures, basket-type 143D d[A(GGGTTA)₃GGG] (31), 3 + 1 hybrid-1-type 2GKU d[TT(GGGTTA)₃GGGA] (38) and 3 + 1 hybrid-2-type 2JPZ d[TTA(GGGTTA)₃GGGTT] (32). For the sake of simplicity, we do not list the flanking bases while the TTA loop is marked as ‘––’ in the G-hairpin abbreviations used in the following text.

Figure 1. Basic schemes of the simulated structures. (A) Antiparallel hairpins with four different syn/anti (s/a) patterns. Each pattern has one variant with narrow groove and one with wide groove. (B) Parallel stranded duplexes. (C) Triplexes and a chair-like quadruplex with different loop types and order, all with s/a patterns capable of folding to the basket-type quadruplex 143D without any syn ↔ anti transition. Deoxyguanosine residues are shown as rectangles, syn-oriented are in orange, anti-oriented in yellow. Black lines depict backbone and arrows mark the 5′-end-to-3′-end direction. Red lines mean hydrogen bonding. ‘n’, ‘w’ and ‘m’ stand for narrow, wide and medium groove, respectively. Loop and flanking residues are not shown. The base pairing of GG pairs is cWH. For the sake of brevity, s/a pattern abbreviations in the Figure are compacted, e.g. asa-sas instead of 5′-asa––sas-3′ used in the text.

In antiparallel 5′-GGG––GGG-3′ hairpins, the first G of the first stretch pairs with the last G of the second G-stretch, etc. Since the guanines are cWH paired, the antiparallel arrangement implies that if one G in a base pair adopts anti orientation of its χ angle, the other G in the pair must adopt syn orientation (or vice versa) (21,22). Thus, if we assign a specific order of syn (s) and anti (a) residues (henceforth ‘s/a pattern’) to one G-stretch, the other adopts the inverted s/a pattern. Since there are three Gs per every G-stretch, there are 2³ = 8 possible s/a patterns for an antiparallel hairpin. We studied four of them: (i) 5′-asa––sas-3′, (ii) 5′-saa––ssa-3′, (iii) 5′-ssa––saa-3′ and (iv) 5′-sas––asa-3′ (Figure 1A). These four combinations maximize the number of syn-anti and anti-anti combinations of two consecutive Gs (the dinucleotide steps, 5′-GpG-3′), which are energetically favourable when participating in complete GQ stems (23,24). The other four possible s/a patterns have not been simulated for the following reasons. The 5′-aaa––sss-3′ and 5′-sss––aaa-3′ patterns possess two unfavourable syn-syn steps (23,24). In addition, due to lack of any alternation of syn and anti in one G-stretch, such structures would be prone to vertical strand slippage (60,61). The 5′-ass––aas-3′ and 5′-aas––ass-3′ contain three unfavourable syn-syn and anti-syn steps (23,24).
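The pattern counting above can be reproduced with a short script. This is only an illustrative sketch: the helper names (`partner_stretch`, `count_unfavourable`) are hypothetical, and the only rules encoded are those stated in the text, namely that the second G-stretch carries the reversed and inverted pattern of the first, and that syn-syn and anti-syn 5′-GpG-3′ steps count as unfavourable.

```python
from itertools import product

# Inversion rule for cWH pairing in an antiparallel hairpin: syn pairs with anti.
FLIP = {"s": "a", "a": "s"}

# Dinucleotide steps named as unfavourable in the text (syn-syn, anti-syn).
UNFAVOURABLE = {"ss", "as"}


def partner_stretch(first):
    """s/a pattern of the second G-stretch, written 5'->3'.

    The first G of stretch 1 pairs with the last G of stretch 2,
    so the pattern is reversed and each orientation is inverted.
    """
    return "".join(FLIP[x] for x in reversed(first))


def steps(stretch):
    """Consecutive 5'-GpG-3' steps within one G-stretch."""
    return [stretch[i:i + 2] for i in range(len(stretch) - 1)]


def count_unfavourable(first):
    """Number of syn-syn plus anti-syn steps in both stretches of the hairpin."""
    second = partner_stretch(first)
    return sum(step in UNFAVOURABLE for step in steps(first) + steps(second))


# All 2**3 = 8 patterns of the first stretch define the 8 antiparallel hairpins.
patterns = ["".join(p) for p in product("sa", repeat=3)]

for first in patterns:
    label = f"5'-{first}--{partner_stretch(first)}-3'"
    print(label, "unfavourable steps:", count_unfavourable(first))
```

The step count alone does not single out the four simulated patterns; the text gives the lack of syn/anti alternation, and hence the proneness to strand slippage, as an additional reason for excluding the aaa/sss pair.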

Depending on the overall GQ topology, an antiparallel lateral loop hairpin can adopt either a narrow groove or a wide groove shape (Supplementary Figure S1A–H) (21,22). We prepared both variants of the hairpins for all four studied s/a patterns, which means 4 × 2 = 8 hairpin variants (Figure 1A). When the three variants of the 3′-terminal base (see above) are considered in addition, we had altogether 24 starting structures. Eight of them could be directly prepared from the experimental structures (Supplementary Figure S1A–H), using the first frames of the PDB files. The remaining structures were prepared manually by creating the missing 3′-terminal nucleotide and/or adjusting the s/a pattern. The built-up structures were subjected to extended equilibration to avoid possible instability introduced by the modelling procedure (see the Equilibration Protocol in the Supporting Information).

Parallel hairpins with propeller (double-chain reversal) loops

A propeller loop occurs when the two strands are parallel, which implies that each base pair contains nucleotides with the same χ angle (21,22). The first G of the first G-stretch pairs with the first G of the second G-stretch, etc. We have taken three starting structures: (i) the first loop from 2GKU with the adjacent G-stretches with one T at both termini (d[TGGGTTAGGGT] with 5′-saa––saa-3′ pattern; Figure 1B and Supplementary Figure S1i) (38), (ii) the last loop from 2JPZ with the adjacent G-stretches with one terminal residue at both termini (d[AGGGTTAGGGT]; 5′-saa––saa-3′; Figure 1B and Supplementary Figure S1j) (32), and (iii) the first loop from the parallel stranded X-ray structure 1KF1 with the adjacent G-stretches with one terminal residue at both termini (d[AGGGTTAGGGT]; 5′-aaa––aaa-3′; Figure 1B and Supplementary Figure S1k) (34).

We have further examined a parallel hairpin with a single-nucleotide propeller loop, namely, the last loop from the 3 + 1 hybrid structure of the Bcl-2 promoter GQ 2F8U with the adjacent G-stretches and one 5′-terminal T (d[TGGGCGGG]; 5′-saa-saa-3′; Figure 1B and Supplementary Figure S1l) (76). All propeller loops are associated with medium grooves (Supplementary Figure S1i–l) (21,22).

Note that the term ‘parallel hairpin’ is also used in the literature for a hairpin with topology resembling the antiparallel hairpin with a lateral loop, but with a 5′-5′ or 3′-3′ chemical bond in the loop region. When referring to parallel hairpins in this article, we strictly mean a structure with parallel G-stretches linked by a propeller loop.

Duplexes without loops

To separate the effects of the loops, we extracted antiparallel wide and narrow groove duplexes and a parallel medium groove duplex d[GGG]₂ from the 2GKU (38) structure by deleting the loops and all flanking residues. The antiparallel duplexes had the 5′-saa-3′/5′-ssa-3′ pattern, while the parallel duplex had the 5′-saa-3′/5′-saa-3′ pattern. The removal of loops introduces symmetry for antiparallel arrangements, so 5′-saa-3′/5′-ssa-3′ is equivalent to 5′-ssa-3′/5′-saa-3′.

Unfolded single stranded chains

A single stranded d[GGGTTAGGG] B-DNA-like all-anti helix with the 5′-aaa––aaa-3′ pattern was modelled using the NAB module of AMBER (77). Then, three other s/a patterns were prepared manually by flipping Gs into syn orientation, targeting the distribution of syn and anti Gs of consecutive G-stretches in the 2GKU (38) and 2JPZ (32) GQs, namely 5′-saa––saa-3′ (to facilitate folding of the parallel hairpins with propeller loop), 5′-ssa––saa-3′ and 5′-saa––ssa-3′ (to facilitate folding of antiparallel hairpins). We also built a d[AGGGTTAGGG] single stranded helix with 5′-aaa––aaa-3′ pattern and from that we prepared a 5′-asa––sas-3′ variant, corresponding to the lateral loops in the basket GQ 143D (31); the 5′-terminal A prevented syn-orientation of the first G (23,24).

Triplex and quadruplex systems

In order to test a specific hypothesis about folding of the basket structure which emerged from our interim results and our previous triplex study (61), we performed additional simulations of three hypothetical triplex intermediates T1, T2 and T3 with cWH paired Gs and of a hypothetical chair-type GQ marked as Q1 (Figure 1C). We built up the d[A(GGGTTA)₂GGG] triplex T1 with the 5′-sas––asa––sas-3′ pattern and lateral (wide) → lateral (narrow) loop (groove) order, d[A(GGGTTA)₂GGGT] T2 with 5′-asa––sas––asa-3′ pattern and lateral (narrow) → lateral (wide) loop (groove) order, and d[A(GGGTTA)₂GGGT] T3 with 5′-asa––sas––asa-3′ pattern and lateral (narrow) → diagonal loop (groove) order. The GQ Q1 had the d[A(GGGTTA)₃GGG] sequence, the 5′-asa––sas––asa––sas-3′ pattern identical to the 143D GQ and lateral (narrow) → lateral (wide) → lateral (narrow) loop order. The T1 fold was built based on the 2GKU structure by extracting the triplex with lateral (wide) → lateral (narrow) loop order and manual adjustment of the s/a pattern. The T2, T3 and Q1 folds were built from scratch; geometries of lateral loops were taken from 2GKU. Cations were manually placed in their channels and all systems were subjected to prolonged equilibration (see Supporting Information). We performed three independent simulations of T1 and T2 and nine of T3 (Supplementary Table S1). The Q1 GQ was simulated under standard conditions for 500 ns. The final structure was then utilized for two independent unfolding no-salt simulations. In no-salt simulations, no ions are present in the simulation box, including the GQ channel. Selected structures observed in one of the no-salt simulations were taken as starting structures for new standard simulations with ions (refolding simulations) (Supplementary Table S1). The goal of this procedure was to observe possible structural rearrangements between Q1 and the triplex structures. No-salt simulations represent an efficient tool to investigate likely rearrangements of GQ in early stages of unfolding and the subsequent refolding simulations aim to suggest movements that are relevant to late stages of folding, as described elsewhere (60,78).

Standard MD simulation

All molecules were solvated by a truncated octahedral box of waters with minimal distance between the solute and the box border of 10 Å. We have used either TIP3P (79) or SPC/E (80) water models. These two water models differ somewhat in their kinetic properties (81), but this should not have any substantial systematic effect on our results, as the results themselves confirm. Thus, when a given system is simulated with both water models, all simulations are considered as equivalent. Solute molecules were first neutralized by counterions and then ∼0.15 M excess salt was added, either NaCl or KCl. TIP3P-adapted and SPC/E-adapted Joung & Cheatham ion parameters were used (82). In some simulations, Amber-adapted Aqvist Na+ (83) with Smith & Dang Cl− (84) ions in TIP3P water or Dang K+ (85) with Smith & Dang Cl− ions in SPC/E water were also used, in order to eliminate the possibility that some detected ion binding sites are due to specific ion parameters (86). The no-salt simulations of the Q1 molecule, one in a TIP3P and one in an SPC/E water box, were performed without the presence of any salt. The residual charge was neutralized by adding a homogeneous compensating background.

Although the type of cation may affect final thermodynamic equilibrium of diverse folds in experiments, we can consider properties of K+ and Na+ ions as sufficiently similar in MD simulations, as discussed in ref. (78). Additional discussion about how to correctly compare simulations with experimental techniques can be found in ref. (87). The fact that Na+ and K+ can sometimes stabilize different folds under the condition of thermodynamic equilibrium does not mean that the other fold is entirely disrupted. Both folds are likely present with both cations, though the cations may modulate their relative free energies in such a way that one of them becomes undetectable. However, it does not significantly affect the kinetic stabilities and dynamics of both topologies on the μs time scale of the simulations (60). In other words, cation differences do not affect computations that do not directly investigate the free energy difference or transitions between the two topologies and with respect to the unfolded state. Thus, we do not expect any dramatic effect of the ion type or specific ion parameters on our results. When addressing specific properties where we suspected that the type of ion could affect our conclusions, we made simulations with both Na+ and K+. Solvation, addition of ions and manual adjustment of s/a patterns were done in the xLEaP module of AMBER (77).

The parmbsc0 (88) + parmχOL4 (89) + parmεζOL1 (90) version (bsc0χOL4εζOL1) of the Cornell et al. force field (91,92) was used in almost all simulations (for parameters, see the AmberTools15 May 2015 update or http://fch.upol.cz/en/ff_ol/index.php). The bsc0 modifies the α/γ balance to basically stabilize DNA simulations. The χOL4 improves the balance of syn/anti nucleotide conformers and εζOL1 modifies the backbone ε and ζ dihedrals. The χOL4 and εζOL1 adjustments improve simulations of GQs as well as B-DNA (89,90). In a few simulations of the parallel stranded G-hairpins other force field combinations were also employed to test force field sensitivity of the results: bsc0χOL4, bsc0εζOL1, bsc0 alone or parm99 (93). The bsc0 alone was used for six standard simulations of unfolded single stranded chains, since they were done at the beginning of the project. We did not repeat these oldest simulations with the new bsc0χOL4εζOL1 version since their basic outcome was then confirmed by the extended T-REMD simulation using bsc0χOL4εζOL1. See (63) for an overview of currently available nucleic acids force fields.

Electrostatic interactions were calculated using the particle mesh Ewald method (PME) (94,95); the non-bonded cutoff was set to 9 Å. Temperature was held at 300 K and pressure at 1 atm using the Berendsen weak-coupling thermostat and barostat (96). The SHAKE algorithm was applied to all bonds involving hydrogens (97), and the integration step was set to 2 fs. The molecules were equilibrated as described in the Supporting Information. The production phase of the simulations was performed with the GPU version of the pmemd module of AMBER 12 (77). Trajectories were processed in the ptraj module of AMBER and visualized in VMD.

T-REMD simulation

In order to enhance sampling of the d[GGGTTAGGG] single strand, we employed T-REMD (75). As starting structures, we utilized unfolded ssDNAs differing in orientations of the glycosidic torsions of all six Gs. Namely, we constructed all 64 possible s/a variants (considering six Gs: 2⁶ = 64) and these were taken as input structures for the T-REMD calculation involving 64 replicas. We checked that syn ↔ anti transitions were sufficiently sampled at all guanines and all replicas, so the final population of s/a variants was not biased by their distribution in the starting structures (Supplementary Figure S2). Such a choice of different starting structures should result in better convergence compared to the use of one uniform starting structure for all replicas, though we in no case claim that our results are quantitatively converged. The TTA residues were in anti orientation in all starting structures.
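The enumeration of starting structures is a plain combinatorial product. A minimal sketch follows; the function names are hypothetical and only the syn/anti labels are generated, not coordinates.

```python
from itertools import product


def sa_variants(n_guanines):
    """All syn/anti assignments for n guanines, as strings over 's' and 'a'."""
    return ["".join(p) for p in product("sa", repeat=n_guanines)]


def as_hairpin_label(variant):
    """Format a six-letter variant in the 5'-xxx--xxx-3' notation of the text."""
    return f"5'-{variant[:3]}--{variant[3:]}-3'"


# Six guanines in d[GGGTTAGGG]: one starting structure per T-REMD replica.
replica_starts = sa_variants(6)
print(len(replica_starts))                      # number of replicas
print(as_hairpin_label(replica_starts[-1]))     # the all-anti variant

# The same counting for the twelve guanines of four G-stretches
# gives the 4096 combinations quoted in the abstract.
n_full_sequence = 2 ** 12
```

Seeding each replica with a different variant means every s/a combination is present at time zero, which is the rationale the text gives for expecting better convergence than with a single uniform starting structure.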

The starting topology and coordinates of d[GGGTTAGGG] ssDNA were prepared using the tLEaP module of AMBER 12 (77). All structures were solvated using a rectangular box with a 10 Å thick layer of SPC/E water molecules surrounding the solute (80). The T-REMD simulation was performed in ∼0.15 M NaCl salt excess (84,98). All replicas shared the same number of water molecules, numbers of ions and size of the simulation box. All structures were minimized and equilibrated using the equilibration protocol described in the Supporting Information. The T-REMD simulation was carried out using the AMBER suite of programs with bsc0χOL4εζOL1 (88–90). The temperatures of all 64 replicas spanned the range of 278–444.9 K, achieving exchange rates between 22% and 28%. The T-REMD simulation was performed at constant volume using the NVT ensemble in each replica, using PME with 1 Å grid spacing and 10 Å real-space cutoff. Since using the weak coupling approach (96) in combination with T-REMD can lead to artefacts (99), we used Langevin dynamics with a friction coefficient of 2 ps⁻¹ as a thermostat in all replicas. Exchanges were attempted every 10 ps. Each replica was simulated to 1 μs, thus accumulating 64 μs in total. The trajectories were sorted either to follow continuous trajectories or to follow a given temperature in the ladder.
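For readers unfamiliar with T-REMD, the following sketch shows how a temperature ladder and the standard Metropolis exchange test fit together. It is not the authors' setup: the text gives only the temperature range, the number of replicas and the exchange interval, so the geometric spacing used here is an assumption (a common default), and the function names and energies are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures with a constant ratio between neighbours (assumed spacing)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance for swapping the configurations of two replicas.

    e_i, e_j are potential energies (kcal/mol) of the configurations
    currently held at temperatures t_i and t_j (K).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


# Values taken from the text: 64 replicas between 278 K and 444.9 K,
# exchanges attempted every 10 ps, 1 microsecond per replica.
ladder = geometric_ladder(278.0, 444.9, 64)
attempts_per_replica = int(1_000_000 / 10)   # 1 us = 1,000,000 ps
total_time_us = len(ladder) * 1.0

print(round(ladder[1] - ladder[0], 2), "K between the two coldest replicas")
print(attempts_per_replica, "exchange attempts per replica")
```

A swap is always accepted when the colder replica currently holds the higher-energy configuration; otherwise it is accepted with a probability that decays with the energy and temperature gaps, which is why the spacing of the ladder controls the exchange rate.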

RESULTS AND DISCUSSION

Antiparallel G-hairpins can readily participate in folding of GQs

We carried out a series of standard 200-ns-long simulations of 24 different lateral-loop antiparallel G-hairpins using the bsc0χOL4εζOL1 DNA AMBER force field, in order to investigate their basic structural stability. We carried out 6 independent simulations for each hairpin, resulting in 144 simulations with an aggregate time of 28.8 μs. Although we cannot claim that this provides converged statistics, it should be sufficient to eliminate the most random results. Each of the six simulations was carried out with a different combination of water model and ion parameters. However, we consider all six simulations as equivalent, as we noticed no systematic effect of the solvent parameters on the simulation outcome. The simulation time scale was chosen to observe a representative spectrum of structural transitions and should be sufficient to decide if a given hairpin is intrinsically stable enough (having a sufficient lifetime) to directly participate in GQ folding pathways. This part of our work essentially monitors the unfolding rate of the folded G-hairpins. Note that the lifetime of the G-hairpins in the simulations may be underestimated by the force field, and thus we estimate rather the lower bound of their lifetimes (61). Nevertheless, our results should correctly reflect the relative order of their structural stabilities.
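The size of the simulation set follows from simple combinatorics, which can be checked with a short sketch. The sequence, groove and s/a pattern labels are those used in Tables 1–3; the solvent labels are copied from the table rows as extracted, and the variable names are illustrative only.

```python
from itertools import product

sequences = ["d[AGGGTTAGGG]", "d[AGGGTTAGGGT]", "d[AGGGTTAGGGA]"]
grooves = ["narrow", "wide"]
sa_patterns = ["asa--sas", "saa--ssa", "ssa--saa", "sas--asa"]
# Water model / salt / ion parameter labels as they appear in the table rows
solvents = [
    "TIP3P NaCl J&C", "TIP3P KCl J&C", "TIP3P NaCl Aq.",
    "SPC/E NaCl J&C", "SPC/E KCl Aq.", "SPC/E KCl Dang",
]

hairpins = list(product(sequences, grooves, sa_patterns))
runs = list(product(hairpins, solvents))

n_hairpins = len(hairpins)                  # 3 * 2 * 4
n_runs = len(runs)                          # one run per hairpin and solvent setting
aggregate_time_us = n_runs * 200 / 1000.0   # 200 ns per run, converted to microseconds
```

The sketch reproduces the 24 hairpins, 144 runs and 28.8 μs quoted in the text.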

The simulations suggest that antiparallel G-hairpins are stable enough to participate in the folding pathways of GQs. At the same time, the simulations reveal exceptionally rich dynamics including entirely stable simulations, various local perturbations, transitions between narrow and wide groove hairpins, spontaneous formation of diagonal loop G-hairpins, strand slippage movements, complete unfolding events and even rare re-folding events. The dynamics is sensitive to the flanking nucleotides.

The basic results are summarized in Tables 1–3. We divide the simulations into three groups: (i) Simulations with at least two initial cWH GG base pairs present at the simulation end. This means either completely stable structures or simulations with a transient reversible visit of geometries different from the starting one. This group of simulations is referred to as stable simulations further in the text. (ii) Simulations with perturbed (compared to the start) structures at the end, nevertheless still keeping an overall G-hairpin-like structure; these include diagonal G-hairpins with trans Watson–Crick/Watson–Crick (tWW) (20) base pairs, hairpins that underwent a narrow → wide groove transition (or vice versa), locally misfolded hairpins with strand slippage, etc. (iii) The last group includes disrupted (i.e. unstable, unfolded) structures, usually unwound ss-helices or coils. The first group of structures would be competent to promote folding to the desired topology within the context of the full sequence, while the second group would need only minor rearrangements to do so. The third group is unsuitable for any immediate folding attempt. In total, 73, 38 and 33 simulations were classified as stable, perturbed and unstable, respectively.
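The three-group classification can be written as a simple decision rule. The sketch below is an illustration only: the inputs (the number of initial cWH GG pairs surviving at the end and a flag for an overall hairpin-like shape) are assumed to come from a separate structural analysis, and `classify_outcome` is a hypothetical name.

```python
def classify_outcome(native_cwh_pairs_at_end, hairpin_like_at_end):
    """Assign a simulation to group (i), (ii) or (iii) as defined in the text."""
    if native_cwh_pairs_at_end >= 2:
        return "stable"      # group (i): at least two initial cWH GG pairs remain
    if hairpin_like_at_end:
        return "perturbed"   # group (ii): still an overall G-hairpin-like structure
    return "unstable"        # group (iii): unwound ss-helix or coil


# Reported totals over all simulations
reported = {"stable": 73, "perturbed": 38, "unstable": 33}
total = sum(reported.values())
fractions = {group: count / total for group, count in reported.items()}
```

The reported counts sum to 144, so roughly half of the simulations fall into the stable group.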

d[AGGGTTAGGG] antiparallel G-hairpin

Table 1 summarizes the simulations of the d[AGGGTTAGGG] hairpin. Its narrow groove variant displayed remarkable stability for the naturally occurring 5′-asa––sas-3′, 5′-saa––ssa-3′ and 5′-ssa––saa-3′ patterns, known from existing GQ atomistic structures. 17 simulations were stable, 1 simulation perturbed and none was unstable. Nevertheless, the structures were not rigid and exhibited rich local dynamics. The ‘non-native’ 5′-sas––asa-3′ pattern was less stable, with 2 simulations entirely disrupted and 1 simulation perturbed.

The wide groove d[AGGGTTAGGG] hairpin was visibly less stable for all four studied s/a patterns. The 5′-saa––ssa-3′ pattern had a tendency to form tWW GG base pairs. Still, 2 of its 6 simulations resulted in a complete structure loss and 1 in a modest perturbation. The remaining three s/a patterns entirely unfolded in 3–5 of the 6 individual simulations (Table 1).

d[AGGGTTAGGGT] antiparallel G-hairpin

Among the narrow groove hairpins, only the 5′-asa––sas-3′ pattern remained completely stable in all 6 simulations (Table 2). The other narrow hairpin s/a patterns showed a tendency to change the cWH GG pairing into the tWW diagonal G-hairpin pairing, in 12 out of 18 simulations. In 3 simulations of the ‘non-native’ 5′-sas––asa-3′ pattern, the rearrangements proceeded further into the wide groove G-hairpin (Figure 2), i.e. we evidenced a spontaneous rearrangement of the structure from narrow to wide G-hairpin through a diagonal hairpin. An important factor affecting the simulations was the formation of the terminal canonical cWW AT base pair between the first and last nucleotide of the whole sequence. This event always preceded the transition from cWH into tWW GG pairing; the stable 5′-asa––sas-3′ narrow groove pattern never formed the terminal AT base pair. The role of the AT base pair is a nice example of how a specific interaction with non-G bases can modulate structural dynamics of G-hairpins at the atomistic level, in this particular case the balance between narrow groove, diagonal and wide groove antiparallel hairpins.

For wide groove d[AGGGTTAGGGT] hairpins, strand slippage was observed in 9 out of 24 simulations. The structures slipped towards the 5′-end can be stabilized by formation of a base pair between the first G of the second G-stretch (G8) and the first T of the loop (T5) (Figure 3) or a base-phosphate interaction between the amino group of G8 and the T6 phosphate group. Out of the four s/a patterns, the 5′-saa––ssa-3′ variant was the most stable, with no unfolding and only one case of strand slippage. The patterns 5′-asa––sas-3′ (the most stable one in the narrow variant, see above) and 5′-sas––asa-3′ were the least stable (Table 2).

d[AGGGTTAGGGA] antiparallel G-hairpin

The narrow groove 5′-asa––sas-3′ and 5′-ssa––saa-3′ patterns were the most stable, each of them in 5 simulations (Table 3). They formed stacking of the terminal As on each other and on the terminal GG pair. The other two s/a patterns were perturbed or stable to approximately the same


Table 1. Outcome of all 48 simulations (200 ns each) of the antiparallel d[AGGGTTAGGG] hairpin with the bsc0χOL4εζOL1 force field

Groove width s/a pattern Water and ionic parametersᵃ

5′-asa––sas-3′ 5′-saa––ssa-3′ 5′-ssa––saa-3′ 5′-sas––asa-3′

Narrow 3 O 1,2,3 tW* 3O*;1,2,3 tW* P: 1 O TIP3P NaCl J&C

3 O* 2 tW 3 tW* TIP3P KCl J&C

3 O* 2 tW TIP3P NaCl Aq.

3 O* U: S5 SPC/E NaCl J&C

3 O SPC/E KCl Aq.

P: 1,2,3 tW 2,3 tW* U: 1,2,3 O SPC/E KCl Dang

Wide U: S5 U: S5 U: 1,2,3 O*; S5 U: 1,2,3 O TIP3P NaCl J&C

P: S5 1,2,3 tW U: 1,2,3 O P: 2,3 O* TIP3P KCl J&C

3 O U: 1,2,3 tW See text U: 2,3 tW*; 1,2,3 O TIP3P NaCl Aq.

U: S5 P: 1,2,3 tW U: 1,2,3 O SPC/E NaCl J&C

U: 1,2,3 O U: 1,2,3 O SPC/E KCl Aq.

U: 2,3 O; 1 tW P: 1 O; 2,3 tW P: G; 1 O SPC/E KCl Dang

Final structures in the 12 × 4 matrix are classified in the following way: no text or regular font text means that the final structure resembles the starting structure with at least two native cWH GG pairs at the end of the simulation (loss of one base pair is tolerated). ‘P:’ stands for any other hairpin-like structure at the end (perturbed structures, see the text). ‘U:’ means entire disruption of the hairpin, e.g. unwinding to ss-helix or coil formation. Characteristic structural rearrangements in the simulations are described by the following symbols: ‘O’ means opening (loss) of a particular GG pair (1: closing the loop, 2: middle, 3: terminal pair), ‘tW’ means transition of the GG pair from cWH to tWW geometry, ‘S5’ and ‘S3’ mean strand slippage towards the 5′- or 3′-terminus with base pairing shifted by one base (Figure 3 depicts an example of a structure with S5 slippage), ‘G’ means conversion from initial narrow groove to wide groove or vice versa (see Figure 2). ‘*’ denotes that the particular rearrangement was reversible and the initial cWH pairing was restored. Rearrangements shown for the unstable (U:) trajectories are always temporary, i.e. we mark those seen before the entire loss of the structure. If more structural changes occurred, they are divided by ‘;’ and ordered chronologically.

ᵃJ&C: TIP3P-adapted or SPC/E-adapted Joung & Cheatham parameters; Aq: Amber-adapted Åqvist Na⁺ with Dang & Smith Cl⁻ parameters; Dang: Dang K⁺ with Dang & Smith Cl⁻ parameters.
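The cell notation defined in the legend above is regular enough to be parsed mechanically. The following sketch is one possible reading of that notation, not a tool used in this work; `parse_cell` and `EVENT_RE` are hypothetical names, and free-text cells such as "See text" are deliberately rejected.

```python
import re

# One event: optional list of GG pair indices (1-3, each optionally starred),
# then an event code, then an optional '*' marking a reversible change.
EVENT_RE = re.compile(
    r"^(?P<pairs>[1-3](?:\*?\s*,\s*[1-3])*\*?)?\s*"
    r"(?P<kind>tW|O|S5|S3|G)(?P<rev>\*)?$"
)


def parse_cell(cell):
    """Return (outcome, events) for one table cell written in the legend notation."""
    text = cell.strip()
    outcome = "stable"  # no prefix means the starting structure was retained
    if text.startswith("P:"):
        outcome, text = "perturbed", text[2:]
    elif text.startswith("U:"):
        outcome, text = "unstable", text[2:]

    events = []
    for chunk in (part.strip() for part in text.split(";")):
        if not chunk:
            continue
        match = EVENT_RE.match(chunk)
        if match is None:
            raise ValueError(f"unrecognized event: {chunk!r}")
        pairs = [int(d) for d in re.findall(r"[1-3]", match.group("pairs") or "")]
        events.append({
            "kind": match.group("kind"),          # O, tW, S5, S3 or G
            "pairs": pairs,                       # affected GG pairs, if any
            "reversible": match.group("rev") is not None,
        })
    return outcome, events
```

For example, `parse_cell("U: 2,3 tW*; 1,2,3 O")` yields an unstable outcome with a reversible tWW transition of pairs 2 and 3 followed by opening of all three pairs, in chronological order.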

Table 2. Outcome of all 48 simulations (200 ns each) of the antiparallel d[AGGGTTAGGGT] hairpin

Groove width s/a pattern Water and ionic parametersᵃ

5′-asa––sas-3′ 5′-saa––ssa-3′ 5′-ssa––saa-3′ 5′-sas––asa-3′

Narrow P: 2,3 tW; 1 O P: 2,3 tW; G* P: 1,2,3 tW TIP3P NaCl J&C

P: 1,2,3 tW P: 1,2,3 tW P: 1,2,3 tW; G* TIP3P KCl J&C

P: 2,3 tW P: 1,2,3 tW P: G; 1 O TIP3P NaCl Aq.

SPC/E NaCl J&C

P: 2,3 tW SPC/E KCl Aq.

P: 1,2,3 tW P: G; 1 O SPC/E KCl Dang

Wide U: 1,2 tW P: 1,2,3 tW U: S5*; 1,2,3 O TIP3P NaCl J&C

U: S5 P: S5 U: 1,2,3 O TIP3P KCl J&C

3 O P: 1,2,3 tW U: S5 P: S5 TIP3P NaCl Aq.

3 O U: S5 SPC/E NaCl J&C

U: S5 P: S5 P: 1 O SPC/E KCl Aq.

U: 1,2,3 O P: S5 P: 1,2 O SPC/E KCl Dang

Length of the simulations is 200 ns. See the legend of Table 1 for further details.

ᵃJ&C: TIP3P-adapted or SPC/E-adapted Joung & Cheatham parameters; Aq: Amber-adapted Åqvist Na⁺ with Dang & Smith Cl⁻ parameters; Dang: Dang K⁺ with Dang & Smith Cl⁻ parameters.

extent. Formation of the diagonal loop with tWW GG pairing was observed in 8 simulations, almost evenly distributed between the s/a patterns. One large-scale refolding event after initial structure loss was observed (see below). The wide groove hairpins exhibited roughly equal stability as in the case of the d[AGGGTTAGGGT] hairpin. The 5′-saa––ssa-3′ pattern never unfolded, with formation of tWW GG pairs in half of the simulations. The other three s/a patterns unfolded in two simulations each.

Set of interactions stabilizing the 5′-xxa––sxx-3′ lateral-loop patterns

The simulations suggest that, regardless of the flanking residues, the non-native 5′-sas––asa-3′ lateral loop pattern suffers from opening of the GG base pair adjacent to the loop (the closing pair) while the native patterns are stable. In the case of the narrow hairpins, this can be explained as follows. If the first G after the loop is syn-oriented, it usually forms an intra-nucleotide hydrogen bond between its amino group and its phosphate group (Figure 4). This can stabilize all 5′-xxa––sxx-3′ patterns, but none of the 5′-xxs––axx-3′ patterns. This hydrogen bond is usually present in full GQs with narrow grooves (e.g. in 143D, 2MBJ, 2HY9, 2GKU and 2JSM) and can be considered as a native GQ interaction. Such an intra-nucleotide base-phosphate interaction sometimes appears, albeit with much lower population, also within the second or terminal syn G in the second G-stretch.

Further stabilization of the 5′-xxa––sxx-3′ patterns may come from two additional hydrogen bonds of the first T, namely between T(O4) and the anti-oriented G(N2)


Table 3. Outcome of 48 simulations (200 ns each) of the antiparallel d[AGGGTTAGGGA] hairpin

Groove width s/a pattern Water and ionic parametersᵃ

5′-asa––sas-3′ 5′-saa––ssa-3′ 5′-ssa––saa-3′ 5′-sas––asa-3′

Narrow TIP3P NaCl J&C

3 O U: 2,3 O U: 3 O; 1,2,3 tW U: 1,2,3 tW TIP3P KCl J&C

P: 1*,2,3tW; 3O* U: 1,2,3 tW; G 1,2,3 tW* U: 1,2,3 tW TIP3P NaCl Aq.

3 O SPC/E NaCl J&C

3 O P: 2,3 O SPC/E KCl Aq.

3 O P: 1,2,3 tW 2,3 tW* U: See text SPC/E KCl Dang

Wide 3 O 1,2,3 tW* P: S3 U: 1 O TIP3P NaCl J&C

3 O P: 1,2,3 tW S3*; 1*,2*,3O TIP3P KCl J&C

U: 2,3 O P: 1,2,3 tW P: S3 P: S5 TIP3P NaCl Aq.

3 O P: 1,2 O SPC/E NaCl J&C

3 O P: 3 O; S3* P: 1 O SPC/E KCl Aq.

U: 3 O U: S3 U: 1 O SPC/E KCl Dang

Length of the simulations is 200 ns. See the legend of Table 1 for further details.

ᵃJ&C: TIP3P-adapted or SPC/E-adapted Joung & Cheatham parameters; Aq: Amber-adapted Åqvist Na⁺ with Dang & Smith Cl⁻ parameters; Dang: Dang K⁺ with Dang & Smith Cl⁻ parameters.

Figure 2. Three types of antiparallel hairpins: narrow groove lateral loop hairpin with cWH pairs (top), diagonal hairpin with tWW pairs (middle) and wide groove lateral loop hairpin with cWH pairs (bottom). The structural schemes are visualized as in Figure 1.

preceding the loop, and between the T(O2) and the loop A(N6) (Supplementary Figure S3). These two H-bonds are probably relatively weak because they are not fully linear. They are rather rarely seen in the experimental structures, e.g. in 143D and 2JSM. This may be due to

Figure 3. Antiparallel d[AGGGTTAGGG] hairpin slipped towards its 5′-terminus by one residue (this direction of slippage is marked as S5 in the text). (Top left) Atomistic structure: syn-oriented Gs are orange, anti-oriented Gs yellow, the first thymine in the loop is green and the backbone of other loop residues is tan. (Top right) Secondary structure scheme. Red and black colours correspond to anti and syn conformation, respectively. Grey residues are not depicted in the atomistic structure. The two GG base pairs are non-native tWH, the GT base pair is tWW. (Bottom) Comparison of GQ cWH and non-GQ tWH base pairs.

the fact that the A is, in fully folded GQs, often paired with bases from other loops. Nevertheless, the shape of the lateral loop in our stable G-hairpin simulations is very similar to the shape of the loops associated with narrow grooves of almost all experimental structures of human telomeric GQs, with the exception of 2JPZ. Perhaps, the weak T(O4)–G(N2) and T(O2)–A(N6) hydrogen bonds seen in our G-hairpin simulations are disrupted upon formation of more stable native interactions in later GQ folding stages. The structure of the loop is also supported by stacking of the middle T with the A (Supplementary Figure S3), which is also seen in complete GQs. Thus, some ‘native’ G-hairpins may be weakly favoured during the folding and some of their native


Figure 4. (Left) Intra-nucleotide base-phosphate hydrogen bonding between amino and phosphate groups of syn-oriented G following the lateral TTA loop in the narrow groove hairpins. (Right) Two dominant cation binding sites of narrow groove hairpins associated with the syn-anti GpG dinucleotide step: near O6 carbonyl oxygens and in the narrow groove. The ions are also visible in the left part as transparent spheres. Syn-oriented Gs are in orange, anti-oriented Gs in yellow, backbone of other residues is tan and the cations are cyan.

interactions may locally influence the folding free energy landscape in the early stages of folding. Such specific molecular interactions may contribute to the GQ topology rules; see (22) for a recent review.

Cation binding in antiparallel G-hairpins

The simulations reveal notable cation binding sites around the G-hairpins. Two sites are associated with the -sa- GpG steps: a carbonyl binding site in both narrow and wide hairpins, equivalent to the channel binding site between two quartets in GQs, and a narrow groove binding site, where the cation is located between phosphates in the narrow groove (Figure 4). We have already observed the latter binding site in G-triplex intermediates (61) and in long simulations of GQs (unpublished data). Occupation of both sites by Na⁺ is roughly equal when the -sa- GpG step is adjacent to the loop. The narrow groove binding site is weaker when the -sa- GpG step involves the terminal GG pair, perhaps due to greater fluctuations of the terminal nucleotides affecting spatial orientation of the phosphate groups. Potassium cations were attracted to the carbonyl binding site more than to the narrow groove site, with Dang parameters leading to the weakest groove binding. This behaviour could be attributed to the smaller radius of Na⁺ compared with K⁺. Nonetheless, the cation binding does not seem to be critical for stability of narrow hairpins. We observed stable narrow hairpins in the no-salt simulation of the hypothetical Q1 chair GQ reported below.

No significant groove binding was observed for wide hairpins and diagonal hairpins with tWW base pairs, if formed in the simulations. All antiparallel hairpin geometries (narrow, wide, diagonal and even misfolded with slipped strands) could temporarily bind cations in the loop region, depending on the actual spatial arrangement of bases in the loop. In summary, the simulations did not find any major difference between NaCl and KCl simulations which could explain the experimentally known effect of the ions on the equilibrium GQ structures. However, this result can be affected by the short time scale of our simulations and the inability of the force field to capture the difference. In addition, it is possible that the effect is not deducible from properties of isolated G-hairpins. Perhaps, the subtle differences between Na⁺ and K⁺ when bound to the narrow groove and carbonyl binding sites may contribute to the sensitivity of the thermodynamically preferred GQ topologies to the type of the ion. However, quantifying this issue would require different types of computations and would be exceptionally difficult.

Why is the first G in a narrow groove following a lateral loop usually syn?

As noted above, antiparallel hairpins with the naturally occurring s/a patterns were found to be more stable than the ‘non-native’ variant. This might indicate that they are better poised to serve as intermediates in GQ folding pathways (40,41,56,72,73). Possibly, their higher intrinsic stability may even be transferred to the full GQ structures. The observations of cation binding support previous calculations predicting the binding site on the carbonyl side of -sa- GpG steps to be preferred over the -aa-, -ss- and -as- steps (56). It is known that the first nucleotide after a narrow lateral loop in GQs is usually syn (22). We speculate that it could be explained by the potential of the -sa- GpG steps to bind cations in the narrow groove, the above-explained capability of syn-oriented Gs to form an internal hydrogen bond between the amino and phosphate groups and perhaps formation of the two above-described weaker H-bonds within the loop.

Diagonal antiparallel G-hairpins are viable intermediates

Diagonal hairpins are antiparallel hairpins with tWW GG base pairs. Although all our simulations started with lateral G-hairpins with cWH GG base pairs, we observed spontaneous sampling of the diagonal G-hairpins in many simulations (Tables 1–3). Detailed population of different hairpins is given in the Supporting Information Appendix Tables SA1–SA24. The diagonal G-hairpins compete mainly with the wide groove G-hairpins, though we have also seen simulations where diagonal hairpins served as intermediates in the interconversion between narrow and wide groove hairpins (Figure 2); both directions were sampled in the whole simulation set. Thus, there has been no need to start additional simulations using the diagonal tWW G-hairpin arrangement. We conclude that antiparallel G-hairpins can readily support formation of narrow lateral, wide lateral and diagonal loops, allowing easy rearrangements between them. All these G-hairpins can easily participate in GQ folding and rearrangements. A role of diagonal hairpins in folding of the hybrid GQ has been suggested (43).

Parallel G-hairpins with propeller loops are entirely unstable in simulations

Out of the 19 individual 200 ns simulations of parallel G-hairpins with TTA loops, 18 unfolded; 9 even within the first 10 ns. The behaviour was independent of the used force


Figure 5. (Left) Native parallel hairpin with cWH GG base pairs. (Right) Typical intermediate in the counter-rotation unfolding mechanism. The two G-stretches (purple and yellow) are nearly perpendicular and the structure of the V-shaped propeller loop is already lost. Backbone of other residues is tan.

field version (Supplementary Table S2). The most common unfolding pathway started with counter-rotation of the strands into structures with roughly perpendicular G-stretches (Figure 5). Then, it proceeded into various coil or ss-helix geometries. Once significantly unfolded, the parallel hairpin never reformed. All five simulations of the studied single-nucleotide propeller loop G-hairpin also unfolded.

When combining the present results with earlier studies of other systems (54,60,64,78,100), we see the following simulation behaviour of parallel G-hairpins. They are entirely stable in simulations of full GQs stabilized by the channel ions, even in absence of the bulk ions (78). However, they are very unstable in all other situations, namely, in simulations of GQs lacking the channel ions (54,60), in G-triplexes even in excess salt conditions (61) and in isolation (the present work). This indicates that the propeller loop structures are held by the GQ core and do not contribute to the stability of the GQ structure. This simulation picture indirectly contrasts with the abundance of the propeller loops in known experimental structures of GQs. The single nucleotide propeller loop is actually supposed to be one of the most stable loops (101–105).

In the whole simulation set we observed one transformation of a parallel G-hairpin with TTA propeller loop and one with the single nucleotide propeller loop into antiparallel arrangements. These transitions were not accompanied by any syn ↔ anti flips and thus did not result in perfect G-hairpins. The 5′-saa––saa-3′ hairpin formed a slipped structure with two cWH GG base pairs. Such an antiparallel hairpin could straightforwardly serve as an intermediate in formation of GQs with two quartets. The 5′-saa-saa-3′ hairpin adopted an antiparallel structure stabilized by a tSS-like (trans sugar edge/sugar edge) GG pair closing the loop, a tWH GG base pair in the middle and a terminal tWW GG pair (Supplementary Figure S4). These two simulations directly visualize that the parallel structures are less stable than the antiparallel arrangements.

The reasons for the instability of the parallel G-hairpins in simulations are unclear. One option is that it is a genuine property of these hairpins. However, then their common presence in the experimental structures would be non-trivial, and their formation could be one of the bottlenecks of the folding attempts of the individual molecules. Alternatively, some approximations of the force field could under-stabilize the propeller loops in simulations. This could be caused by an inaccurate description of the backbone substates needed for the chain-reversal topology (100) or of the interactions between the closely spaced phosphate groups. Although the DNA backbone dihedral potentials have been extensively optimized in the past, the simple pair-additive DNA force field model remains imperfect, as demonstrated by comparison with quantum chemical benchmarks (24,106). Our ability to tune the force field by simple refinements of the essentially unphysical dihedral potentials is quite limited (63).

Absence of loops introduces instability

Simulations of the cWH paired d[GGG]₂ duplexes without any loop and flanking residues are unstable, regardless of being parallel or antiparallel (Supplementary Table S3). Perhaps, instability of the cWH d[GGG]₂ duplexes is promoted by introduction of another DNA terminus by the loop excision, which increases susceptibility of such a short duplex to unfold due to simulation end effects (107). Nevertheless, instability of d[GGG]₂ duplexes does not pose any problem for constructing the intramolecular GQ folding pathways. Further details are described in Supporting Information. A plausible model of formation of intermolecular GQs lacking loops does not require cWH paired duplexes as intermediates (58).

Single stranded helices tend to form misfolded antiparallel hairpins in standard simulations

Four standard simulations of d[GGGTTAGGG] and two of d[AGGGTTAGGG] starting from the unfolded (straight) single stranded helix with different s/a patterns were done (Supplementary Table S4). They exhibited a tendency to bend towards the antiparallel-hairpin-like arrangements. The simulations resulted in a variety of misfolded structures with strand slippage. None of them was a properly folded hairpin with three GG pairs and a TTA loop, even though some of the helices started with manually pre-ordered self-complementary s/a GQ pattern. Nevertheless, the simulation with the 5′-saa––ssa-3′ pattern attempted a move towards the right G-hairpin. It formed the lateral TTA loop with the correct adjacent cWH GG base pair. However, further progression of the folding was blocked by stacking and base-phosphate interactions of the other Gs (Supplementary Figure S5). This is an atomistic example of how non-native interactions enhance ruggedness of the folding landscape. The initial pre-folding event occurred at 900 ns. We prolonged the simulation to 1500 ns, but no further development was seen. Further details about the simulations are given in Supporting Information.

Rare refolding events observed in simulations of the antiparallel G-hairpins

As explained above, a few dozen simulations of antiparallel G-hairpins resulted in a complete unfolding of the initial structure. We observed two cases of spontaneous refolding, when essentially the starting conformation was re-established after entire unfolding. Such simulations provide unique atomistic insights into the potential G-hairpin folding pathways and are described in detail in Supporting


Information. However, we still caution that both re-folding events might have been affected by the structural memory effect and the molecules might have been bouncing back from the transition state ensemble without reaching a true unfolded state ensemble (108).

T-REMD simulation of d[GGGTTAGGG] reveals enormous richness of its conformational space with a clear trend to form antiparallel hairpin-like structures

Since standard simulations were not efficient to study folding of the G-hairpins, we attempted an enhanced sampling method. We selected the temperature-accelerated T-REMD approach, a robust method often considered as a benchmark for small systems, such as DNA hairpin loops with canonical B-DNA stem (109,110), UNCG and GNRA RNA hairpin loops (111,112) and RNA tetranucleotides (113,114). Although the T-REMD method does not allow derivation of kinetics of the folding, the reference replica at a given temperature should give the correct thermodynamics (relative population of different species), provided the simulation is converged.

Our T-REMD run simulated 64 independent replicas, each for 1 μs. It was initiated from a straight ssDNA helix, with each replica initially adopting one of the 64 possible s/a patterns of the G-stretches, to sample distinct regions of the conformational space from the beginning of the simulation. We carefully monitored all 64 replicas at all temperatures; however, we decided not to report all populated structures in detail. This would mean dealing with hundreds of different structures, which is not necessary for the purpose of the present work.

Although the simulation was certainly still far from being rigorously converged, we have achieved broad sampling. The fully paired antiparallel G-hairpins were spontaneously reachable, but their population was only about 0.01% (see below). This may reflect the complexity of the conformational space (i.e. the T-REMD was still too short), some underestimation of the stability of the ideally folded G-hairpins by the force field (61) or a real absence of such structures as dominantly populated species in the equilibrium ensemble. It will be interesting to see if future experiments directly visualize formation of well-paired G-hairpins or rather reveal a complex mixture of diverse structures similar to that predicted by the T-REMD run. Until today, the most direct evidence of formation of stable G-hairpins came from DNA origami experiments by Sugiyama’s group (73) and NMR experiments by Plavec’s group (115). The first method, however, did not have atomistic resolution, while the NMR study predicted dimerization of diagonal hairpins for the Oxytricha nova sequence d[G4T4G4]. The longer G-tracts and the dimerization may bring additional thermodynamic stabilization to the G-hairpins.

Nevertheless, the most important result was that our T-REMD easily reached structures resembling antiparallel G-hairpins, except that they were not fully paired (see below). These structures actually formed ∼17% of the population of the reference 278 K replica. Therefore, a broad spectrum of structures that can support lateral and diagonal loops is easily accessible. On the other hand, we have seen absolutely no tendency to form any structures that could lead to parallel G-hairpins with propeller loops; not a single such fluctuation was detected. All these results were consistent with the preceding parts of the paper.

Formation of rather compact bent antiparallel conformations was a hallmark of all replicas, with a slightly higher propensity of high-temperature replicas to adopt straight conformations. In order to monitor the global bending, we measured the end-to-end distance, i.e. the distance between centres of mass of the terminal guanines. Figure 6 summarizes the population of the bent and straight conformations in the reference 278 K replica as well as the populations summed over all temperatures. The minor population with the end-to-end distance of 28–30 Å, corresponding to the straight single strand conformation (region III in Figure 6), was well separated from the other geometries. The other structures were various types of bent conformations. They included conformations with stacked terminal Gs, either similar to the GQ-like hairpin helix with stacked rather than base-paired terminal guanines or a ring-like conformation with two bends and stacked terminal bases (region I in Figure 6), different types of base-pairing of the terminal guanines (region II in Figure 6) and conformations where the terminal Gs did not directly interact (also region II in Figure 6), such as structures with inter-strand stacking in a hairpin-like conformation with shifted base pairing within the stem. Further analysis (see Supporting Information) revealed a high propensity of bending in the TTA region at low temperatures, while at high temperatures the preference was less pronounced (Figure 7). Formation of the tWW or cWH GG pair (both wide and narrow groove geometry) adjacent to the loop was found to be the most frequent out of all possible GG base pairs in the reference replica (Supplementary Table S5), with 17.3% population. All such structures can be considered as the first step towards forming the antiparallel G-hairpins, although further zipping to two and three base pairs was seen with only ∼0.16% and ∼0.01% population, respectively.
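To give a feeling for what such population ratios would mean energetically, they can be converted into apparent free-energy differences via ΔG = −RT ln(p_a/p_b). The sketch below is purely illustrative: it assumes the reference-replica populations are equilibrium values, which is not guaranteed because the T-REMD run is not rigorously converged, and `apparent_delta_g` is a hypothetical helper name.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def apparent_delta_g(pop_a, pop_b, temperature):
    """Apparent free energy of state a relative to state b, in kcal/mol.

    Positive values mean that state a is less populated than state b.
    """
    return -R_KCAL * temperature * math.log(pop_a / pop_b)


T_REF = 278.0            # reference replica temperature (K)
one_pair = 17.3          # % population, GG pair adjacent to the loop formed
two_pairs = 0.16         # % population, zipped to two base pairs
three_pairs = 0.01       # % population, fully paired hairpin

dg_two_vs_one = apparent_delta_g(two_pairs, one_pair, T_REF)
dg_three_vs_one = apparent_delta_g(three_pairs, one_pair, T_REF)
```

Under these assumptions, both zipping steps come out as uphill by a few kcal/mol relative to the single-pair ensemble. The values should be read as order-of-magnitude estimates only.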

Our T-REMD achieved a sufficient number of syn ↔ anti transitions (Supplementary Figure S2), in contrast to the standard simulations. REMD accelerates the sampling by temperature, so at higher temperatures it is more likely to overcome the potential energy (enthalpic) barriers and achieve syn ↔ anti transitions. The total population of s:a orientations was almost uniformly ∼40:60 for G2, G3, G7, G8 and G9. The 5′-terminal guanine G1 revealed a swapped preference favouring the syn orientation in a ratio of ∼80:20, stabilized by a 5′-OH···N3 intramolecular hydrogen bond (23,24). Using these populations, we can calculate idealized populations of all 64 s/a patterns and compare them with the actual T-REMD results. Idealized populations would be achieved when the guanines sample the syn ↔ anti transitions independently. We indeed found that with increasing temperature of the replicas, the observed s/a patterns converged to the ideal populations. So, at high temperatures the s/a orientations of Gs were mutually independent and mostly driven by entropy, guaranteeing proper sampling of the pool of all s/a combinations. On the other hand, we found several s/a patterns that were significantly more populated at low temperatures compared to the idealized populations, e.g. 5′-sss––asa-3′, 5′-sas––saa-3′, 5′-saa––saa-3′ and 5′-sas––asa-3′ (Supplementary Table S6).
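The idealized populations mentioned above follow from treating each guanine as an independent two-state (syn/anti) variable. A minimal sketch, using the approximate per-residue syn fractions quoted in the text (∼0.8 for G1, ∼0.4 for the others); the function name and the `--` separator in the pattern labels are illustrative choices.

```python
from itertools import product

# Approximate syn fractions taken from the text
p_syn = {"G1": 0.80, "G2": 0.40, "G3": 0.40, "G7": 0.40, "G8": 0.40, "G9": 0.40}


def idealized_populations(syn_fraction):
    """Population of every s/a pattern assuming independent syn/anti sampling."""
    residues = list(syn_fraction)
    populations = {}
    for combo in product("sa", repeat=len(residues)):
        prob = 1.0
        for residue, state in zip(residues, combo):
            p = syn_fraction[residue]
            prob *= p if state == "s" else 1.0 - p
        # Label as first G-stretch, separator, second G-stretch
        label = "".join(combo[:3]) + "--" + "".join(combo[3:])
        populations[label] = prob
    return populations


ideal = idealized_populations(p_syn)
most_likely = max(ideal, key=ideal.get)
```

Comparing such a table with the observed pattern populations at each temperature is one way to see which patterns are enriched beyond the independent-sampling expectation. The same construction applied to twelve guanines gives the 4096 combinations of the full GQ sequence.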


Figure 6. T-REMD simulation of d[GGGTTAGGG]. The population of end-to-end distance represented by a distance between centres of mass of terminal guanine bases in the reference 278 K replica and in all 64 replicas. Region I, with end-to-end distance below 5 Å, corresponds to structures with stacked terminal guanines, region II includes the native conformation with terminal GG base pair as well as the other bent conformations with strand slippage, and region III corresponds to the straight ssDNA. (A) circular structure containing G3-G9 and T4-G8 base pairs and stacked terminal guanines; (B) hairpin containing G2-G8 and G3-G7 base pairs and stacked terminal guanines; (C) hairpin with G2-G9, G3-G8 and T4-G7 base pairs; (D) hairpin with G1-G9, G2-G8 and G3-G7 base pairs; (E) hairpin with G1-G8, G2-G7 and G3-A6 base pairs; (F) unfolded ss-helix structure.
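The end-to-end distance and its assignment to regions I–III can be sketched as follows. The 5 Å boundary for region I is taken from the caption; the 28 Å lower boundary used here for region III is an assumption based on the 28–30 Å population described in the text. Function names are hypothetical, and coordinates are plain (x, y, z) tuples in Å.

```python
import math


def centre_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * xyz[k] for xyz, m in zip(coords, masses)) / total
        for k in range(3)
    )


def end_to_end_distance(first_base, last_base):
    """Distance between centres of mass of two bases.

    Each base is a (coords, masses) pair.
    """
    return math.dist(centre_of_mass(*first_base), centre_of_mass(*last_base))


def end_to_end_region(distance, stacked_cutoff=5.0, straight_cutoff=28.0):
    """Map an end-to-end distance (in angstroms) to region I, II or III."""
    if distance < stacked_cutoff:
        return "I"     # stacked terminal guanines
    if distance >= straight_cutoff:
        return "III"   # straight single strand (assumed boundary)
    return "II"        # paired terminal Gs and other bent conformations
```

A histogram of `end_to_end_region` labels over the frames of one replica would give region populations of the kind plotted in the figure.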

Figure 7. The bending propensity at different nucleotides along the d[GGGTTAGGG] sequence observed in the T-REMD simulation. The bent state was defined by an N−1(C1′)···N(C1′)···N+1(C1′) angle below 120° (see Supporting Information for further details). The lines correspond to replicas following temperature and are coloured according to increasing temperature from violet to maroon. The y-axis represents the population of the bend located at a particular nucleotide (x-axis) averaged over the entire replica. The representative structures taken from the T-REMD simulation correspond to the most populated bent conformations.
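The bend criterion in the Figure 7 caption can be sketched as a simple geometric test. The example below assumes Cartesian C1′ coordinates for three consecutive nucleotides and a cutoff of 120 degrees; the function names `c1_angle_deg` and `is_bent` are hypothetical, and any further details given in the Supporting Information are not reproduced here.

```python
import math

BEND_CUTOFF_DEG = 120.0  # threshold quoted in the caption (unit assumed to be degrees)


def c1_angle_deg(prev_c1, this_c1, next_c1):
    """Angle (degrees) at nucleotide N formed by the C1' atoms of N-1, N and N+1."""
    # Vectors pointing from the central C1' atom to its two neighbours.
    v1 = [a - b for a, b in zip(prev_c1, this_c1)]
    v2 = [a - b for a, b in zip(next_c1, this_c1)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(a * a for a in v2))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def is_bent(prev_c1, this_c1, next_c1, cutoff=BEND_CUTOFF_DEG):
    """True when the C1'-C1'-C1' angle falls below the cutoff."""
    return c1_angle_deg(prev_c1, this_c1, next_c1) < cutoff


# Toy coordinates (arbitrary units): a straight chain and a right-angle kink.
straight = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
kinked = ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(is_bent(*straight), is_bent(*kinked))
```

Averaging `is_bent` over all frames of a replica for each nucleotide would give a per-residue bend population of the kind plotted on the y-axis.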

