
4.2 Benchmarking and application of the protocol


Academic year: 2021


Figure 4.7: Relaxed scan (RS) C13=C14 photoisomerization paths along S1 for L83QASRAT, WTASRAT and W76S/Y179FASRAT, respectively, computed in Ref. 49 with the original ARM. (A)-(C) CASPT2//CASSCF/AMBER/6-31G(d) energy profiles along the S1 (squares) isomerization paths. S0 (diamonds) and S2 (triangles) profiles along the S1 path are also given. The S1 path is computed in terms of a relaxed scan along the C12-C13=C14-C15 dihedral angle. Adapted with permission from Marín et al.[49] Copyright 2019 American Chemical Society.

otherwise, the rhodopsin is discarded.

In our example (see Figure 4.6 and Figure 4.7), the RS of L83QASRAT is barrierless (see Figure 4.7A) and the variant is discarded, whereas WTASRAT (see Figure 4.7B) and W76S/Y179FASRAT (see Figure 4.7C) present an EfS1 of ca. 2.8 and 6.4 kcal mol−1, respectively, and are used to build the list of potentially fluorescent candidates. Finally, the computed EfS1 values are compared and correlated with the ESL. Please note that, using the currently presented protocol, L83Q would have already been discarded at a previous step. In order to perform Phase III on L83Q, one would need to manually choose a starting structure, since there is no PLA a-ARM QM/MM model. Marín et al. decided to use an extrapolated model when performing the original calculation.

4.1.4 Protocol Automation

Above I have described the three phases (sections 4.1.1-4.1.3) that make it possible to categorize rhodopsin variants as dim-fluorescent or enhanced-fluorescent systems. As anticipated above, I focused on the fast-arising fluorescence of the variant DA state (mechanism A of Figure 4.1) and not that of its photocycle intermediates (mechanism B of Figure 4.1). Each phase is implemented in ARM as an independent, stand-alone module that operates automatically (i.e., via predefined command-line arguments) and, in principle, provides useful, but specific, information on the fluorescent features of a target set of variants specified in the calculation input. More specifically, Phase I is driven by the a_arm_emission module (see Section A.2.2.2), Phase II by the a_arm_fc module (see Section A.2.2.3) and Phase III by the a_arm_relaxed_scan module (see Section A.2.2.4).

In order to achieve a high level of automation and, potentially, a high-throughput screening of rhodopsin variants with enhanced fluorescence, I designed and implemented a general driver that links Phases I-III in the ARM framework (see Appendix A). This is the a_arm_fluorescence_searcher driver illustrated in Figure 4.8 which automates the pipeline connecting an input list of target rhodopsin variants along with their S0 a-ARM

Digitally signed by: PEDRAZA GONZALEZ LAURA MILENA

Reason: Ph.D. thesis, doctoral programme (dottorato di ricerca) in Chemical and Pharmaceutical Sciences, Cycle XXXIII (Matricola


[Figure 4.8 flowchart. Input: list of rhodopsin variants, each with its final equilibrated a-ARM QM/MM structure. Phase I (Location of the First Excited State Minimum): first excited state (S1) geometry optimization; decision "S1 minimum?". Phase II (Computation of Semiclassical Franck-Condon (FC) Trajectories): Franck-Condon trajectory; decision "Conical intersection within ≈ 200 fs?". Phase III (Calculation of the Excited State Reaction Path): relaxed scan along the isomerization coordinate; decision "Torsional barrier?". Each decision (yes/no) leads either to the next phase or to one of two outcomes: "Significant fluorescence: SELECT CANDIDATE" or "Dim fluorescence: DISCARD CANDIDATE". Output: list of potential candidates.]

Figure 4.8: General workflow of the three-phase a-ARM rhodopsin fluorescence screening protocol. This diagram displays the methodology for the automatic search of fluorescent rhodopsins. The protocol is composed of three phases: i) location of the first excited state minimum, ii) Franck-Condon trajectory calculation, and iii) relaxed scan along the isomerization path; each of these phases serves as a criterion to select or discard possible fluorescent candidates.

QM/MM models, to a new list containing the potentially fluorescent candidates, along with their corresponding computed trends in maximum absorption (λamax) and emission (λfmax) wavelengths and energy barriers (EfS1).

In other words, the driver provides a “one-click” architecture to perform all the operations required by the protocol (i.e., quantum chemistry calculations, FC trajectory and RS calculations, classification of rhodopsins) without any user decision or intervention beyond the provided input, including the production of formatted tables and graphical representations. This makes it possible to study large arrays of rhodopsins quickly and in parallel.

For example, the current work presents fluorescence analyses conducted in parallel on a set of 27 rhodopsins. Considering the time required to build and process all 27 models manually, the presented research would not have been feasible within a reasonable time frame without the achieved automation. In fact, to the best of our knowledge, this is the first reported effort to provide a unified platform for the automatic search of fluorescent proteins.

The default parameters for each phase (e.g., number of states to be computed, number of constraints and step size for the relaxed scan) are predefined in a single input file, as shown in Figure 4.9. All of these parameters can be customized at the input level; however, the user is recommended to use the default values that were determined via the benchmark calculations. In addition, it is mandatory that all the files of the S0 a-ARM QM/MM model (see Section A.2.1.3) share the root name provided in the &LIST_OF_TARGET_RHODOPSINS section. For instance, for RHODOPSIN_1, one must place in the root folder the following bundle of files: WT_ASR_AT.Final.xyz, WT_ASR_AT.key, WT_ASR_AT.Espf.Data, WT_ASR_AT.JobIph, WT_ASR_AT.pdb and WT_ASR_AT.cavity, which contain the information necessary for the different calculations.
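As an illustration of the requirement above, the following Python sketch reads an input file laid out like the one in Figure 4.9 and lists which files of the model bundle are missing for each target rhodopsin. It is not the code of the a_arm_fluorescence_searcher driver: the function names (parse_input, target_rhodopsins, missing_files) and the simple line-based parser are assumptions made for this example; only the section names, keys and file suffixes are taken from the text.

```python
from pathlib import Path

# File suffixes of the S0 a-ARM QM/MM model bundle, as listed in the text.
REQUIRED_SUFFIXES = (".Final.xyz", ".key", ".Espf.Data", ".JobIph", ".pdb", ".cavity")

# Shortened example in the layout of Figure 4.9.
EXAMPLE_INPUT = """
&GENERAL_INFO
project_name : ASR_variants

&LIST_OF_TARGET_RHODOPSINS
NUMBER_OF_RHODOPSINS : 3
RHODOPSIN_1 : WT_ASR_AT
RHODOPSIN_2 : L83Q_ASR_AT
RHODOPSIN_3 : W76S-Y179F_ASR_AT
"""


def parse_input(text):
    """Parse '&SECTION' blocks of 'KEY : value' lines into nested dicts (hypothetical parser)."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("&"):
            current = sections.setdefault(line[1:], {})
        elif ":" in line and current is not None:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return sections


def target_rhodopsins(sections):
    """Return the root names RHODOPSIN_1 .. RHODOPSIN_N in order."""
    block = sections["LIST_OF_TARGET_RHODOPSINS"]
    count = int(block["NUMBER_OF_RHODOPSINS"])
    return [block[f"RHODOPSIN_{i}"] for i in range(1, count + 1)]


def missing_files(root_folder, root_name):
    """List the bundle files that are absent from root_folder for one variant."""
    root = Path(root_folder)
    return [root_name + suffix for suffix in REQUIRED_SUFFIXES
            if not (root / (root_name + suffix)).is_file()]


if __name__ == "__main__":
    for name in target_rhodopsins(parse_input(EXAMPLE_INPUT)):
        print(name, "missing:", missing_files(".", name))
```

Checking the whole bundle before any calculation is launched means that an incomplete model is reported immediately, rather than after part of the screening has already run.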

The driver starts by verifying that all files specified above, for each rhodopsin variant, are in the root folder. Then, one working sub-folder is generated for each rhodopsin. After this preparation step, the protocol starts with the parallel execution of Phase I (i.e., via the a_arm_emission module), as shown in Figure 4.8, for all variants (i.e., each on a different processor). Phase I finishes with the output file of the S1 geometry optimization (i.e., PLA structure). The convergence of this calculation is evaluated by using the following criterion:


&GENERAL_INFO
project_name : ASR_variants

&LIST_OF_TARGET_RHODOPSINS
NUMBER_OF_RHODOPSINS : 3
RHODOPSIN_1 : WT_ASR_AT
RHODOPSIN_2 : L83Q_ASR_AT
RHODOPSIN_3 : W76S-Y179F_ASR_AT

&EMISSION_MODULE
N_ROOTS_S1 : 2
M_ROOTS_S1 : 3

&FC_MODULE
N_ROOTS_FC : 2
M_ROOTS_FC : 3
N_STEPS : 400
GRAD : NONE

&RS_MODULE
N_ROOTS_RS : 2
M_ROOTS_RS : 3
STEP_SIZE_RS : 5

Figure 4.9: Input file required for the a-ARM rhodopsin fluorescence screening protocol. Example of input file, in yaml format, for the a_arm_fluorescence_searcher driver. Each section starts with a & command and defines the parameters to be passed to the driver (general name of the project and list of rhodopsin variants to be analyzed) and to the individual modules that the driver controls.

if the geometry optimization reaches convergence within 100 optimization steps, the rhodopsin is considered a potentially fluorescent candidate and continues to Phase II (section 4.1.2); otherwise, the rhodopsin is discarded. The output is a list of potentially fluorescent candidates, along with their λfmax, calculated from the PLA structure. Phase II starts (i.e., via the a_arm_fc module) immediately after the presence of the PLA is verified. When the FC trajectory calculation reaches a threshold time of 200 fs, the following criterion is used: if the CI has not been reached, the rhodopsin is considered a potentially fluorescent candidate and the FC calculation continues until it completes 500 fs, after which the variant passes to Phase III (section 4.1.3); otherwise, the rhodopsin is discarded. The output is a list of potentially fluorescent candidates, along with the corrected λfmax, this time calculated as the average ∆EfS1−S0 along the FC trajectory.[49] Finally, Phase III is computed (i.e., via the a_arm_relaxed_scan module) for the candidates selected in Phase II. The main output is a list of potentially fluorescent variants, along with the calculated EfS1. In addition, the output files of Phases II and III include both a graphical representation and the raw data of the computed FC trajectory and photoisomerization path, respectively. This includes information not only on the S0, S1 and S2 energy profiles, but also on complementary properties, such as Mulliken charges calculated for the reactive fragment, oscillator strength, bond length alternation (BLA) and hydrogen-out-of-plane (HOOP).
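The three selection criteria described above can be summarized as a short decision cascade. The Python sketch below is only an illustration of that logic, not the ARM implementation: the VariantResult container, its field names and the screen function are hypothetical, while the two thresholds (100 optimization steps and 200 fs) and the requirement of a barrier along the relaxed scan are those stated in the text.

```python
from dataclasses import dataclass
from typing import Optional

MAX_OPT_STEPS = 100       # Phase I: S1 optimization must converge within 100 steps
CI_THRESHOLD_FS = 200.0   # Phase II: threshold time of the FC trajectory


@dataclass
class VariantResult:
    """Hypothetical summary of the three calculations for one variant."""
    name: str
    s1_opt_steps: Optional[int]           # steps needed to converge; None if not converged
    ci_time_fs: Optional[float]           # time at which the CI was reached; None if never
    s1_barrier_kcal_mol: Optional[float]  # EfS1 from the relaxed scan; None if barrierless


def screen(variant):
    """Return the outcome of the three-phase screening for one variant."""
    # Phase I: a converged S1 optimization indicates the presence of a PLA.
    if variant.s1_opt_steps is None or variant.s1_opt_steps > MAX_OPT_STEPS:
        return "discarded in Phase I"
    # Phase II: reaching the CI before the threshold time means fast decay.
    if variant.ci_time_fs is not None and variant.ci_time_fs <= CI_THRESHOLD_FS:
        return "discarded in Phase II"
    # Phase III: a barrierless relaxed scan means the variant is discarded.
    if variant.s1_barrier_kcal_mol is None or variant.s1_barrier_kcal_mol <= 0.0:
        return "discarded in Phase III"
    return "potentially fluorescent candidate"


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise each branch; they are not results.
    examples = [
        VariantResult("variant_A", None, None, None),
        VariantResult("variant_B", 40, 150.0, None),
        VariantResult("variant_C", 40, None, None),
        VariantResult("variant_D", 40, None, 3.0),
    ]
    for v in examples:
        print(v.name, "->", screen(v))
```

Because each phase only receives the variants that passed the previous one, the more expensive calculations (FC trajectories and relaxed scans) are run on a progressively smaller list.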

I stress that, unlike Phase I explained above, neither Phase II nor Phase III starts at the same time for all the variants, since once the driver is launched each rhodopsin is processed as a different thread. This architecture of the driver avoids dead times and makes it possible to easily restart the calculations without the need to start from scratch in case of technical problems (e.g., the cluster is turned off). As further described in Section A.2.2, each of the three phases is composed of different stages/routines that work as a thread (i.e., the input of one stage is the output of the previous stage). Therefore, the stages and phases communicate with one another through a communicator file with the extension *.finished. In this regard, when one routine of the module finishes, a signal is produced by generating the communicator file, which contains information on the module name (e.g., a_arm_emission) and the current stage of the module. The information contained in such a file is managed by the a_arm_crontab module to schedule the execution of the next routine, via the python crontab utility. This procedure is the same for each of the modules and drivers of the ARM package.
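The communicator-file mechanism can be pictured with the following minimal Python sketch. It is not the a_arm_crontab module: the file naming scheme (module.stage.finished), the "key: value" file content, the stage names and the three helper functions are all assumptions introduced for illustration; the text only states that the file has the extension *.finished and records the module name and the current stage.

```python
import tempfile
from pathlib import Path


def write_communicator(workdir, module, stage):
    """Signal that `stage` of `module` has finished (hypothetical file layout)."""
    path = Path(workdir) / f"{module}.{stage}.finished"
    path.write_text(f"module: {module}\nstage: {stage}\n")
    return path


def read_communicators(workdir):
    """Read every *.finished file in workdir into a list of dicts."""
    records = []
    for path in sorted(Path(workdir).glob("*.finished")):
        fields = dict(line.split(": ", 1)
                      for line in path.read_text().splitlines() if ": " in line)
        records.append(fields)
    return records


def next_stage(stages, workdir, module):
    """Return the first stage of `module` without a communicator file, or None."""
    done = {r["stage"] for r in read_communicators(workdir) if r.get("module") == module}
    for stage in stages:
        if stage not in done:
            return stage
    return None


if __name__ == "__main__":
    stages = ["stage_1", "stage_2", "stage_3"]  # placeholder stage names
    with tempfile.TemporaryDirectory() as workdir:
        print(next_stage(stages, workdir, "a_arm_emission"))
        write_communicator(workdir, "a_arm_emission", "stage_1")
        print(next_stage(stages, workdir, "a_arm_emission"))
```

Since the state of each variant lives in files rather than in the memory of a running process, a scheduler that is restarted after an interruption can call next_stage again and resume from the last completed routine, which is the restart behaviour described above.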

4.2 Benchmarking and application of the protocol

In this section I report on the performance of the proposed a-ARM rhodopsin fluorescence screening protocol, illustrated in Figure 4.8, as a computational tool for the parallel and, therefore, relatively fast screening of large arrays of light-emitting rhodopsins (mechanism A of Figure 4.1). To this aim, I employ three different sets of rhodopsins, each intended for a specific scope, as follows:

In Section 4.2.1 I introduce and discuss the first set, from now on called benchmark set. As observed in Table 4.2, it is composed of 43 rhodopsin variants, with available experimental data on λamax, ranging from 470 nm to 628 nm. It includes vertebrate (V), invertebrate (I), microbial (M) and heliorhodopsin (H) variants that feature either all-trans, 11-cis or 9-cis r PSB chromophore isomers. The objective of this set is to extend the previously reported benchmark of a-ARM models,[62, 75, 79] presented in Figure 3.4, for testing the reliability of the ground-state (S0) equilibrium QM/MM models generated with the a-ARM rhodopsin model building protocol,[61, 62, 75] based on their ability to reproduce experimental trends in λamax. In addition, I discuss the presence of a few customized, rather than default, models. This is important to establish the quality and limitations of the automatically generated input S0 structures for fluorescence screening. I stress that, as indicated in Table 4.2, this building protocol takes advantage of the availability of several X-ray crystallographic structures. The models of the set members for which an experimental structure is not available (e.g., several mutants) were built via a comparative (homology) modeling protocol.

The second and third sets are actually subsets of the benchmark set. In Section 4.2.2


Table 4.2: Benchmark, application and search sets including wild-type and mutant rhodopsins.

Name  type(a)  variant  PDB-ID  RET-C(b)  code  absorption(c): (nm)  (kcal mol−1)  (eV)

benchmark set

Rh V WT 1U19[22] 11-cis WTRh11C 498[22] 57.4 2.49

JSiR1 I WT 6I9K[127] 9-cis WTJSR19C 505[127] 56.6 2.45

JSR1 I WT 6I9K[127] 11-cis WTJSR111C 535[127] 53.4 2.32

ChR2 M WT 6EID[89] all-trans WTChR2AT 470[89] 60.8 2.64

BPR M WT 4JQ6[83] all-trans WTBPRAT 490[83] 58.3 2.53

HeR-48C12 H WT 6UH3[128] all-trans WTHeR-48C12AT 541[128] 52.8 2.29

TaHeR H WT 6IS6[23] all-trans WTTaHeRAT 541[23] 52.8 2.29

KR2 M WT 6REW[21] all-trans WTKR2AT 528[21] 54.1 2.35

D116N all-trans D116NKR2AT 565[29] 50.6 2.19

GVirus M WT 6JO0[16] all-trans WTGVirusAT 509[16] 56.2 2.43

OLPVRII M WT 6SQG[17] all-trans WTOLPVRIIAT 514[17] 55.6 2.41

RxR M WT 6KFQ[129] all-trans WTRxRAT 540[129] 52.9 2.29

bR M WT 6G7H[81] all-trans WTbRAT 568[81] 50.4 2.19

PoXeR M WTe all-trans WTPoXeRAT 564[29] 50.7 2.20

D216N all-trans D216NPoXeRAT 571[29] 50.1 2.17

application set

Arch3 M WT 6GUX all-trans WTArch3AT 556[120] 51.4 2.23

D95E/T99Cd all-trans D95E/T99CArch3AT 626[120] 45.6 1.98

D95E/T99C/V59Ad all-trans D95E/T99C/V59AArch3AT 622[120] 46.0 1.99

D95E/T99C/P60Ld all-trans D95E/T99C/P60LArch3AT 624[120] 45.8 1.99

D95E/T99C/P196Sd all-trans D95E/T99C/P196SArch3AT 628[120] 45.5 1.97

Arch5d all-trans Arch5Arch3AT 622[120] 46.0 1.99

Arch7d all-trans Arch7Arch3AT 616[120] 46.4 2.01

QuasAr1d all-trans QuasAr1Arch3AT 580[116] 49.3 2.14

QuasAr2d all-trans QuasAr2Arch3AT 590[116] 48.5 2.10

Archon2d all-trans Archon2Arch3AT 581[118] 49.2 2.13

search set

ASR M WT 1XIO[80] all-trans WTASRAT 550[80] 52.0 2.25

V112N all-trans V112NASRAT 532[130] 53.7 2.33

W76F all-trans W76FASRAT 529[130] 54.0 2.34

L83Q all-trans L83QASRAT 517[49] 55.3 2.40

P206C all-trans P206CASRAT 542[29] 52.8 2.29

P206H all-trans P206HASRAT 525[29] 54.5 2.36

P206K all-trans P206KASRAT 519[29] 55.1 2.39

P206Q all-trans P206QASRAT 527[29] 54.4 2.36

P206Y all-trans P206YASRAT 529[29] 54.1 2.35

S214D all-trans S214DASRAT 550[29] 52.0 2.25

S214D/D217E all-trans S214D/D217EASRAT 547[29] 52.3 2.27

S86D all-trans S86DASRAT 549[29] 52.1 2.26

W76S/Y179F all-trans W76S/Y179FASRAT 488[49] 58.6 2.54

Y73Q all-trans Y73QASRAT 552[29] 51.8 2.25

D217N all-trans D217NASRAT 554[29] 51.5 2.23

D217E all-trans D217EASRAT 555[29] 51.6 2.24

E36Q all-trans E36QASRAT 554[29] 51.5 2.23

D75E all-trans D75EASRAT 526[29] 54.4 2.36

(a) Vertebrate (V), invertebrate (I), microbial (M) and heliorhodopsin (H); (b) retinal configuration; (c) experimental maximum absorption wavelength, λamax, expressed in nm and eV and as first vertical excitation energy, ∆EaS1−S0, in kcal mol−1. Structures obtained via comparative modeling using as template (d) 6GUX and (e) 4TL3.[86]

I introduce and discuss the second set, hereinafter referred to as application set. As observed in Table 4.2, it incorporates the 10 microbial (Archaea) rhodopsin variants (wild-type Archaerhodopsin-3 and 9 mutants) of the benchmark set with available experimental data not only on λamax, but also on photophysical properties related to their fluorescent behavior. Such properties are λfmax, ESL and φf (see Table 4.1). As mentioned above, Arch3-based rhodopsin variants have been experimentally demonstrated to be fluorescent and, in specific cases, employed as fluorescent probes in optogenetics experiments. Therefore, this set is used for testing the ability of the proposed screening protocol to correctly predict trends in rhodopsin fluorescence and, therefore, to select the most likely fluorescent candidates. As anticipated above, this is done by using computational criteria based on the values of the transition oscillator strength (fOsc) at an S1 stable structure and the barrier height along the S1 isomerization path, to assess a qualitative consistency with the available observed photophysical data (e.g., most relevantly, φf).

Finally, in Section 4.2.3 I introduce and discuss the third set, called search set. As observed in Table 4.2, it includes 14 rhodopsin variants whose fluorescent behavior has not yet been characterized and 3 that have been previously studied in Ref. 49, for a total of 17 variants. However, all the variants have available experimental data on λamax (see Table 4.2). More specifically, the set includes WT ASRAT (WTASRAT) and 16 mutants. The latter are intended to extend the set of three ASR variants studied in Ref. 49, by selecting mutations of residues that either directly interact with the chromophore or form part of the chromophore cavity.4 This set is employed for predicting the excited-state behavior of existing (i.e., successfully expressed in the lab) rhodopsin mutants and for ranking these mutants according to their chances of being fluorescent, which will then have to be experimentally confirmed.

Notice that, while the scope of this work is the selection of fluorescent rhodopsin candidates, the produced S0 and S1 equilibrium structures can also be employed for subsequent mechanistic studies that, however, are outside the scope of the present work. Further studies devoted to the elucidation of the fluorescence mechanism of rhodopsins from the application set are currently ongoing in our laboratory (doctoral thesis of Ph.D. candidate Leonardo Barneschi).

4.2.1 Benchmark Set Results

As a first step, I computed the trend in maximum absorption wavelength (λamax)5 for the benchmark set and contrasted it with experimental data (see Table 4.3 and Figure 4.10). This was done to assess the quality of the automatically built ARM QM/MM models representing the input S0 equilibrium structures for the proposed a-ARM rhodopsin fluorescence screening protocol (see Figure 4.8). To this aim, I employed an error of 4.0 kcal mol−1 as the accuracy threshold, as previously determined for the a-ARM rhodopsin model building protocol in paper [I] (see section 3.1.1).[61, 62, 75, 79]

Figure 4.10 displays, for each member of the benchmark set, the computed average λamax (green up-triangles) from the N =10 independently generated a-ARM replicas (see Section A.2.1.3 and Refs. 61, 62 and 75) expressed in terms of Vertical Excitation energy (∆EaS1−S0).

Each value is given along with its error bar (i.e. standard deviation) (Figure 4.10A) and

4Further details on the generation of the ARM QM/MM models for the ASR mutants, are provided in Section 3.1.4 (see Table 3.1).

5The ARM QM/MM models were generated using the a-ARM version of the protocol (section 3.1.1) implemented in ARM as the a_arm_protocol (section A.2.1).


Table 4.3: Ground-state vertical excitation energy (∆EaS1−S0, in kcal mol−1, with eV in parentheses), maximum absorption wavelength (λamax, in nm) and oscillator strength (fOsc), calculated using the a-ARMdefault and the a-ARMcustomized approaches. Differences between calculated and experimental data (∆∆EExpS1-S0, ∆λa,Expmax) are also presented.

Columns: Model; Experimental ∆Ea,ExpS1-S0 and λa,Expmax; Calculated(a) ∆EaS1-S0, λamax and fOsc; Error ∆∆Ea,ExpS1-S0 and ∆λa,Expmax.

benchmark set
WTHeR-48C12AT  52.8 (2.29)  541  52.8 ± 0.7 (2.29)  541  1.12  0.0 (0.00)  0
WTTaHeR(c)AT  52.8 (2.29)  541  55.2 ± 0.4 (2.40)  518  1.22  2.4 (0.10)  -23
WTJSR19C  56.6 (2.46)  505  55.8 ± 0.7 (2.42)  512  0.96  -0.8 (-0.03)  7
WTJSR111C  53.4 (2.32)  535  52.8 ± 0.7 (2.29)  541  0.83  -0.6 (-0.03)  6
WTRh11C  57.4 (2.49)  498  57.5 ± 0.5 (2.49)  497  0.88  0.1 (0.00)  -1
WTBPR(c)AT  58.3 (2.53)  490  58.0 ± 1.0 (2.52)  493  0.73  -0.3 (-0.01)  3
WTChR2(c)AT  60.8 (2.64)  470  62.2 ± 0.6 (2.70)  459  1.13  1.4 (0.06)  -11
D216NPoXeRAT  50.1 (2.17)  571  50.6 ± 0.4 (2.20)  565  1.44  0.6 (0.02)  -6
WTbRAT  50.3 (2.18)  568  53.9 ± 0.3 (2.33)  531  1.22  3.6 (0.15)  -37
D116NKR2AT  50.6 (2.19)  565  52.5 ± 0.3 (2.28)  544  1.39  1.9 (0.08)  -21
WTPoXeRAT  50.7 (2.20)  564  50.4 ± 0.3 (2.19)  567  1.47  -0.3 (-0.01)  3
WTRxRAT  52.9 (2.30)  540  56.7 ± 0.5 (2.45)  504  1.09  3.8 (0.15)  -36
WTOLPVRIIAT  55.6 (2.41)  514  54.8 ± 0.1 (2.38)  521  1.16  -0.8 (-0.03)  7
WTKR2(c)AT  54.1 (2.35)  528  56.0 ± 0.0 (2.43)  511  1.16  1.9 (0.08)  -17
WTGVirusAT  56.2 (2.44)  509  55.6 ± 1.0 (2.41)  515  1.25  -0.6 (-0.03)  5

application set
WTArch3AT  51.4 (2.23)  556  54.3 ± 0.7 (2.35)  527  1.25  2.9 (0.12)  -29
Arch5Arch3AT  46.0 (1.99)  622  48.0 ± 0.4 (2.08)  596  1.34  2.0 (0.09)  -26
Arch7Arch3AT  46.4 (2.01)  616  48.8 ± 0.1 (2.12)  586  1.29  2.4 (0.10)  -30
Archon2Arch3AT  49.2 (2.13)  581  53.0 ± 0.8 (2.30)  540  1.31  3.8 (0.16)  -41
QuasAr1Arch3AT  49.3 (2.14)  580  53.0 ± 0.4 (2.30)  540  1.28  3.7 (0.16)  -40
QuasAr2Arch3AT  48.5 (2.10)  590  51.8 ± 0.6 (2.25)  552  1.38  3.3 (0.14)  -38
D95E/T99CArch3AT  45.7 (1.98)  626  49.7 ± 0.8 (2.16)  575  1.41  4.0 (0.18)  -51
D95E/T99C/P60LArch3AT  45.5 (1.97)  628  49.9 ± 0.6 (2.17)  573  1.35  4.4 (0.19)  -55
D95E/T99C/P196SArch3AT  46.0 (1.99)  622  47.3 ± 0.2 (2.05)  604  1.41  1.4 (0.06)  -18
D95E/T99C/V59AArch3AT  45.8 (1.99)  624  49.7 ± 0.0 (2.15)  576  1.35  3.8 (0.17)  -48

search set(c)
WTASRAT  52.0 (2.25)  550  52.3 ± 0.2 (2.27)  547  1.29  0.3 (0.01)  -3
Y73QASRAT [R1]  51.8 (2.25)  552  52.5 ± 0.7 (2.28)  544  1.27  0.7 (0.03)  -8
S214DASRAT [R1]  52.0 (2.25)  550  53.2 ± 0.4 (2.31)  538  1.26  1.2 (0.05)  -12
S86DASRAT [R1]  52.1 (2.26)  549  52.4 ± 0.6 (2.27)  545  1.28  0.4 (0.02)  -4
P206CASRAT [R1]  52.8 (2.29)  542  52.9 ± 0.7 (2.29)  541  1.32  0.1 (0.01)  -1
P206HASRAT [R1]  54.5 (2.37)  525  56.0 ± 0.2 (2.43)  510  1.11  1.5 (0.07)  -15
P206KASR(c)AT [R3]  55.1 (2.40)  519  55.3 ± 0.2 (2.40)  517  1.19  0.1 (0.01)  -2
P206QASRAT [R1]  54.3 (2.36)  527  54.4 ± 0.3 (2.36)  526  1.07  0.1 (0.00)  -1
P206YASRAT [R1]  54.1 (2.35)  529  53.4 ± 1.5 (2.32)  536  1.19  -0.7 (-0.03)  +7
S214D/D217EASRAT [R1/R1]  52.3 (2.27)  547  54.0 ± 0.4 (2.35)  529  1.23  1.7 (0.07)  -18
D217NASRAT [R1]  51.5 (2.24)  554  52.6 ± 0.5 (2.29)  544  1.31  1.1 (0.05)  -10
D217EASRAT [R1]  51.6 (2.24)  555  52.6 ± 0.8 (2.29)  544  1.28  1.0 (0.04)  -11
E36QASRAT [R1]  51.6 (2.24)  555  52.9 ± 0.4 (2.30)  540  1.29  1.3 (0.06)  -14
D75EASRAT [R1]  54.4 (2.37)  526  53.2 ± 0.3 (2.31)  538  1.28  -1.2 (-0.05)  +12
V112NASR(c)AT [R3]  53.7 (2.33)  532  54.1 ± 1.6 (2.35)  528  1.23  0.4 (0.02)  -4
W76FASRAT [R1]  54.0 (2.34)  529  55.3 ± 0.7 (2.40)  517  1.14  1.3 (0.06)  -12
L83QASR(c)AT [R2]  55.3 (2.40)  517  55.7 ± 0.4 (2.42)  513  1.02  0.4 (0.02)  -4
W76S/Y179FASRAT [R1/R1]  58.6 (2.54)  488  56.8 ± 2.0 (2.46)  503  1.06  -1.8 (-0.08)  15

ADmax(b): 4.4 (0.19)
MAE ± MAD of ∆∆EExpS1-S0(b): 1.5 ± 1.1 (0.07 ± 0.05)

(a) Average value of 10 replicas, followed by the corresponding standard deviation (given after the ± sign).

(b) The 36 rhodopsins are considered.

(c) The selected rotamer is specified in square brackets (see Section 3.1.4).

The (c) symbol attached to a model name stands for customized models constructed with the a-ARMcustomized approach.
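The unit conversions and error statistics used in Tables 4.2 and 4.3 can be reproduced with a few lines of Python. The sketch below is an illustration only: the function names are hypothetical, the conversion constants are the standard values of hc (in eV nm) and of 1 eV in kcal mol−1, and the definition of MAD as the mean absolute deviation of the absolute errors around the MAE is an assumption, since the text does not spell it out.

```python
HC_EV_NM = 1239.84193      # Planck constant times speed of light, in eV nm
EV_TO_KCAL_MOL = 23.0605   # 1 eV expressed in kcal/mol


def nm_to_ev(wavelength_nm):
    """Vertical excitation energy in eV for a given wavelength in nm."""
    return HC_EV_NM / wavelength_nm


def nm_to_kcal_mol(wavelength_nm):
    """Vertical excitation energy in kcal/mol for a given wavelength in nm."""
    return nm_to_ev(wavelength_nm) * EV_TO_KCAL_MOL


def error_summary(calculated, experimental):
    """Return (ADmax, MAE, MAD) for paired calculated/experimental energies.

    MAD is taken here as the mean absolute deviation of the absolute
    errors around the MAE (an assumption made for this sketch).
    """
    abs_errors = [abs(c - e) for c, e in zip(calculated, experimental)]
    mae = sum(abs_errors) / len(abs_errors)
    mad = sum(abs(a - mae) for a in abs_errors) / len(abs_errors)
    return max(abs_errors), mae, mad


if __name__ == "__main__":
    # Experimental absorption maximum of bovine rhodopsin from Table 4.2: 498 nm.
    print(round(nm_to_kcal_mol(498), 1), round(nm_to_ev(498), 2))
    # First three benchmark rows of Table 4.3 (calculated vs experimental, kcal/mol).
    print(error_summary([52.8, 55.2, 55.8], [52.8, 52.8, 56.6]))
```

Running the conversion for 498 nm returns 57.4 kcal mol−1 and 2.49 eV, which matches the corresponding entry for WTRh11C in Table 4.2.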

[Figure 4.10 plot. Panel A: ∆EaS1−S0 (kcal mol−1 and eV) for each rhodopsin variant of the benchmark set, application set and search set; series: a-ARMdefault (N = 10), a-ARMcustomized (N = 10) and Experimental. Panel B: ∆∆Ea,ExpS1−S0 (kcal mol−1 and eV) for each rhodopsin variant.]

Figure 4.10: Extended benchmark of the a-ARM protocol, in terms of reproduction of experimental trends in λamax (benchmark set, application set and search set). The computed data were obtained using the a_arm_protocol driver, where the average value from the N = 10 replicas is plotted (green up-triangles) (A) along with the corresponding error bars (B). Models were constructed with the a-ARMdefault approach, with the exception of those circled in red, which were constructed with the a-ARMcustomized approach. (x axis: M indicates microbial, H heliorhodopsin, V vertebrate and I invertebrate rhodopsins.)

difference with respect to corresponding experimental data (∆Expcalc∆EaS1−S0) (Figure 4.10B).

When all models were generated automatically with the a-ARMdefault approach, the results show that 39 out of the 43 models (91%) have an absolute ∆Expcalc∆EaS1−S0 value below 4.0 kcal mol−1. The four outliers were customized with the a-ARMcustomized approach, so as to improve the quality of the models.6 As illustrated in Table 4.4, the customization of the WTs was achieved by exploring different choices for the protonation state of certain ionizable residues based on either chemical reasoning or experimental information. Moreover, the customization that requires a choice among the side-chain conformations for 3 of the ASRAT mutants is explained in section 3.1.4.

As will be further discussed in section 5.1.3 and paper [V], KR2 is a good case study for customization. The default model of this rhodopsin has two negatively charged aspartic acid residues, Asp-116 and Asp-251, forming the counterion complex around the r PSB. However, as previously discussed in paper [I], two negative charges outbalance the single positive charge of the r PSB, producing a λamax that is ca. 15 kcal mol−1 blue-shifted with respect to the experimental value. Instead, the protonation/neutralization of the second counterion (Asp-251), through a customized model, allows a more charge-balanced model

6Both the a-ARMdefault and a-ARMcustomized approaches are described in section 3.1.1 and detailed in papers [I] and [III].


Table 4.4: Setup of the protonation states for a-ARMdefault and a-ARMcustomized models. The residues with different protonation states are highlighted. Asp and Glu are deprotonated, while Ash and Glh are protonated.

Rhodopsin    a-ARMdefault        a-ARMcustomized

WTBPRAT      GLH-90, GLH-124     GLH-90, GLU-124

WTTaHeRAT    HIE-23, HID-82      HIS-23, HIE-82

WTKR2AT      ASP-251, GLH-160    ASH-251, GLH-160

Charge of histidine: +1 when both the δ-nitrogen and the ε-nitrogen of the imidazole ring are protonated (HIS), while it is neutral when only the δ-nitrogen (HID) or only the ε-nitrogen (HIE) is protonated.

that reproduces the experimental λamax with an error of ca. 1.9 kcal mol−1.

While the construction of customized models (i.e., through the modification of the protonation state of specific residues) made it possible to obtain ∆Expcalc∆EaS1−S0 values falling within the error bar, it has to be recognized that this is not possible when a fully automated preparation of the input is required. Most importantly, in the absence of experimental λamax values (e.g., for rhodopsin variants not yet expressed and/or spectroscopically studied in the lab) it will not be possible to detect which model has to be customized. This also applies to the subroutine for mutant generation presented in section 3.1.4, where the choice of the mutated side-chain rotamer relies on experimental λamax. Additional limitations and pitfalls related to the ground-state a-ARM QM/MM model building are explicitly described in section 3.1.1 (see page 51). These issues impose a first limit on the quality of an automated fluorescent rhodopsin screening and, therefore, an error bar.

In conclusion, the general trend in absorption energy can be qualitatively reproduced by using the a-ARM protocol for the 43 rhodopsin variants reported in Table 4.2. The corresponding ARM QM/MM models can be used for the ensuing excited-state calculations, as detailed in sections 4.2.2 and 4.2.3. However, if a fully automated procedure were to be used for the input generation, only 91% of the models would be of a quality matching the selected threshold. Methods for improving the prediction of the residue protonation states during the construction of ARM QM/MM models are presently being investigated in our laboratory (see, for instance, a preliminary study in Ref. 93).

4.2.2 Application Set Results

In this Section I examine the results obtained by applying the protocol of Figure 4.8 to the application set: a set of experimentally investigated fluorescent rhodopsins for which default S0 ARM QM/MM models have been automatically generated7. For each phase I discuss

7As specified in Table 4.2, the initial structures for the Arch3-based mutants were generated via comparative (homology) modeling, using as template the X-ray structure of WT Arch3 [PDBID 6GUX].


the resulting screening procedure and the corresponding selection outcome.

4.2.2.1 Phase I: application set

As introduced in Section 4.1.1, in Phase I the protocol looks for the existence of an S1 excited-state planar minimum (PLA) located near the FC region of the corresponding PES (see Figure 1.3) computed at the CASSCF level of theory and, therefore, corresponding to a rhodopsin where the r PSB structure is nearly planar. The existence of a PLA classifies the given candidate as potentially fluorescent and allows it to move on to Phase II. However, it does not provide information on the actual stability of the PLA, which is important for achieving a sufficiently long ESL and, in turn, for promoting light emission. Phase II and Phase III address this point.

Before discussing the outcome of Phase I, I first evaluate how the quality of the initial ground-state S0 ARM QM/MM models is reflected in the quality of the output S1 models. In this regard, I consider two different aspects. The first one (i) is related to the effect of the choice of the initial S0 geometry on the location of the PLA structure. The second aspect (ii) is, instead, related to the possible dependency of the computed trend in λfmax on the quality of the computed trend in λamax.

Regarding aspect (i), Phase I receives as input for each variant the representative S0 a-ARM QM/MM structure, i.e., the replica whose λamax is closest to the average value. Since Ref. 61 demonstrated that at least 10 replicas of the ground-state ARM QM/MM model are necessary for the correct description of the λamax property, the question arises whether the evaluation of a single replica of the excited-state S1 ARM QM/MM model is enough for the location of the PLA structure and the subsequent description of the λfmax property.
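The choice of the representative replica can be expressed compactly. The following Python sketch shows one way of selecting, among N replicas, the one whose value is closest to the average; the function name and the example values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, stdev


def representative_replica(values):
    """Return (index, average, standard deviation) for a list of replica values.

    The index identifies the replica whose value is closest to the average.
    The sample standard deviation is used here (an assumption of this sketch).
    """
    average = mean(values)
    index = min(range(len(values)), key=lambda i: abs(values[i] - average))
    return index, average, stdev(values)


if __name__ == "__main__":
    # Made-up excitation energies (kcal/mol) for four replicas, for illustration only.
    replicas = [54.0, 54.5, 53.8, 54.9]
    index, average, spread = representative_replica(replicas)
    print(f"representative replica: {index}, average: {average:.1f}, std: {spread:.2f}")
```

When the spread among replicas is small, as reported for the application set, the representative replica differs little from the others, which motivates the single-replica test discussed in the following paragraphs.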

Figure 4.11A presents a close view of the trend in absorption energy (turquoise triangles) computed for the application set (i.e., average of N = 10 replicas), taken from Figure 4.10. Notice that, in this figure, the individual ∆EaS1−S0 computed for the chosen representative replica of the ARM QM/MM model (i.e., the one closest to the average) is also reported (dark-blue circles). As shown in the figure and detailed in Table 4.3, the standard deviation in ∆EaS1−S0 for members of the application set is relatively small, ranging between 0.0 and 0.8 kcal mol−1. Accordingly, I do not expect the 10 S0 ARM replicas of the same variant to feature significant structural differences and, thus, I expect all of the replicas to lead to the same PLA structure (or to no PLA).

In order to corroborate the above hypothesis, I evaluated Phase I for the 10 replicas of a reduced sub-set of the application set. Notice that, for this test, the definition of a sub-set was needed considering the large number of required calculations and, consequently, the required computational wall-time.8 Thus, the members of the sub-set were chosen as those six variants heavily featured in a number of optogenetics studies (i.e., Arch3, Arch5,

8As observed in Figure A.4, the execution of the a_arm_emission module for a single replica implies 7 QM/MM calculations. The evaluation of the 10 seeds for each of the 10 members of the application set, would require around 700 calculations.

[Figure 4.11 plot. Panel A: ∆EaS1−S0 (kcal mol−1 and eV) for each rhodopsin variant of the application set; series: a-ARM (N = 10), a-ARM (N = 1) and Exp. Panel B: ∆∆Ea,ExpS1−S0 (kcal mol−1 and eV) for each rhodopsin variant. Panel C: ∆EfS1−S0 (kcal mol−1 and eV) for each rhodopsin variant; series: PLA (N = 1), PLA (N = 10), FC and Exp. Panel D: ∆∆Ef,ExpS1−S0 (kcal mol−1 and eV) for each rhodopsin variant.]

Figure 4.11: Trends in vertical absorption (∆EaS1−S0, left) and emission (∆EfS1−S0, right) energies for the rhodopsins of the application set. (A) Computed vertical absorption values (∆EaS1−S0) exclusively for the application set. Experimental (yellow triangles) and N = 10 replicas values (green up-triangles) were taken from Figure 4.10. The values relative to the replica used for the PLA calculations are included as dark-blue circles. (B) Difference between calculated and experimental data (∆Expcalc∆EaS1−S0). Green bars as in Figure 4.10, while dark-blue bars are relative to the replica used for the PLA. (C) Computed ∆EfS1−S0 (via the a_arm_emission module) using the representative replica (red squares), along with experimental data (indigo circles). Data are also presented (green triangles) for a subset of the application set, where values were computed as the average of ten replicas, with their corresponding error bars. Finally, values corrected with kinetic energy (via the a_arm_fc module) are shown as orange squares. (D) Difference between calculated and experimental data (∆Expcalc∆EfS1−S0). Bars are coloured according to the computed data reported in C. Emission data for Arch3 are not presented, since its reported fluorescence comes from a photointermediate rather than from the DA state, as is the case for its variants.

Arch7, Archon2, QuasAr1 and QuasAr2). With this choice, the total number of QM/MM calculations was reduced from ca. 700, for the whole application set, to ca. 420. The resulting trend in λfmax, with the corresponding standard deviation, is reported in Figure 4.11C (green triangles) for the variants with a located PLA.
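The calculation counts quoted above follow from simple bookkeeping, sketched below under the assumption stated in the text that one run of the a_arm_emission module on a single replica implies 7 QM/MM calculations. The function name `total_calculations` is illustrative only.

```python
# One a_arm_emission run on a single replica implies 7 QM/MM calculations
# (value quoted in the text; see Figure A.4 of the original work).
CALCS_PER_REPLICA = 7


def total_calculations(n_variants, n_replicas, per_replica=CALCS_PER_REPLICA):
    """Total number of QM/MM calculations needed for Phase I."""
    return n_variants * n_replicas * per_replica


full_set = total_calculations(n_variants=10, n_replicas=10)  # whole application set
sub_set = total_calculations(n_variants=6, n_replicas=10)    # six-variant sub-set

print(full_set, sub_set)
```

With these inputs the sketch gives 700 calculations for the whole application set and 420 for the six-variant sub-set.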

Most of the variants of the sub-set were found to present a PLA structure for each of their 10 initial S0 structures (Arch5, Arch7, Archon2, QuasAr1 and QuasAr2), with a standard deviation (see green error bar in panel C) comparable to the one obtained for absorption (see turquoise error bars in panel A). However, for WT Arch3, only 8 out of the 10 initial S0 replicas provide a PLA structure. Therefore, the results of Phase I for Arch3 are not conclusive and it has to be evaluated in Phase II. Consequently, I have used the a_arm_fc module (see Section A.2.2.3) to compute the FC trajectory of Arch3.

The trajectory energy profile plotted in Figure 4.12(a) shows a decay channel (located in the vicinity of a CI showing a ca. 90 degrees twisted C13=C14 bond), which is reached in less than 200 fs. This finding indicates that the WTArch3AT candidate must be readily discarded as


Table 4.5: First excited-state vertical excitation energies (∆ES1-S0, kcal mol−1, with eV in italic and in parentheses), maximum emission wavelengths (λfmax, nm), and oscillator strengths (fOsc), calculated using the a_arm_emission module. Differences between calculated and experimental data (∆∆Ef,ExpS1-S0, ∆λf,Expmax) are also presented.

| Rhodopsin variant | Experimental ∆Ef,ExpS1-S0 | Experimental λf,Expmax | Calculated(a) ∆EfS1-S0 | Calculated(a) λfmax | Calculated(a) fOsc | Error ∆∆Ef,ExpS1-S0 | Error ∆λf,Expmax |
|---|---|---|---|---|---|---|---|
| Arch5Arch3AT | 39.1 (1.70) | 731 | 34.1 (1.48) | 838 | 1.44 | -5.0 | 107 |
| Arch7Arch3AT | 39.3 (1.70) | 727 | 33.9 (1.47) | 843 | 1.44 | -5.4 | 116 |
| Archon2Arch3AT | 38.8 (1.69) | 735 | 35.8 (1.55) | 799 | 1.25 | -4.9 | 64 |
| QuasAr2Arch3AT | 40.0 (1.73) | 715 | 36.2 (1.57) | 789 | 1.38 | -3.8 | 74 |
| QuasAr1Arch3AT | 40.0 (1.73) | 715 | 35.6 (1.55) | 802 | 1.31 | -4.3 | 87 |
| D95E/T99CArch3AT | 39.1 (1.70) | 731 | 35.1 (1.52) | 814 | 1.50 | -4.0 | 83 |
| D95E/T99C/V59AArch3AT | 39.2 (1.70) | 728 | 36.4 (1.61) | 786 | 1.56 | -2.8 | 44 |
| D95E/T99C/P196SArch3AT | 39.1 (1.70) | 731 | 36.3 (1.58) | 787 | 1.60 | -2.8 | 56 |
| D95E/T99C/P60LArch3AT | 39.1 (1.70) | 731 | 35.0 (1.52) | 817 | 1.50 | -4.1 | 86 |

ADmax(b): 5.4 (0.23)

MAE ± MAD of ∆Expcalc∆EaS1−S0(b): 3.9 ± 0.8 (0.17 ± 0.03)

(a) 1 replica close to the average.
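The three representations used in Table 4.5 (kcal mol−1, eV and nm) are related by standard unit conversions. The sketch below is not part of the a_arm_emission module; the function names are hypothetical and the constants are rounded literature values (hc ≈ 1239.84 eV nm, 1 eV ≈ 23.0605 kcal mol−1).

```python
HC_EV_NM = 1239.84198          # h*c expressed in eV*nm
KCAL_PER_MOL_PER_EV = 23.0605  # 1 eV expressed in kcal/mol


def kcal_to_ev(energy_kcal_mol):
    """Convert a vertical excitation energy from kcal/mol to eV."""
    return energy_kcal_mol / KCAL_PER_MOL_PER_EV


def kcal_to_nm(energy_kcal_mol):
    """Convert a vertical excitation energy (kcal/mol) to a wavelength (nm)."""
    return HC_EV_NM / kcal_to_ev(energy_kcal_mol)


# Arch5 row of Table 4.5: experimental 39.1 and calculated 34.1 kcal/mol
for label, energy in (("experimental", 39.1), ("calculated", 34.1)):
    print(f"{label}: {energy} kcal/mol = {kcal_to_ev(energy):.2f} eV "
          f"= {kcal_to_nm(energy):.0f} nm")
```

For the Arch5 row, these conversions give 1.70 eV and 731 nm for the experimental energy and 1.48 eV and 838 nm for the calculated one, matching the tabulated entries.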

its PLA structures are unstable, being prone to undergo double-bond photoisomerization.

This appears to be, at least partially, consistent with experimental data since, as reported in Table 4.1, WTArch3AT features the lowest observed φf value, which is at least two orders of magnitude smaller than that of the other members of the set and, as discussed above, its fluorescence comes from an intermediate and not from the DA state.

To compare the average ∆EfS1−S0 values with those produced by the representative replica, I plotted the latter in Figure 4.11C (red squares) for each member of the application set (see Table 4.5). As observed, the values computed with the representative replica are in line with the average of the 10 replicas. These results suggest that the use of a single replica provides a good estimate of the located PLA and, in turn, of the emission energy, reducing in this case the number of required calculations from ca. 420 to 60.

From the above results, I conclude that the choice of a single replica represents a good compromise between computational resources consumption and quality of the results.

Therefore, as shown in Figures 4.11C-D, Phase I was applied to the rest of the application set (D95E/T99C, D95E/T99C/V59A, D95E/T99C/P196S and D95E/T99C/P60L) using the representative replica.

I now analyze aspect (ii) above, focusing on the trends (i.e., for both absorption and emission) and evaluate whether they are consistent with the experimental observations.

As anticipated above, both the computed and observed trends, as well as the differences between computed and experimental λfmax for each rhodopsin, are plotted in Figure 4.11. I first analyze the trend in λamax. Since the 9 mutants were generated from the same template X-ray structure (i.e., Arch3, PDB ID 6GUX) via comparative modeling, I would expect all of them to present an error bar consistent with that of WT Arch3. However, a close inspection of the trend (Figure 4.11A-B) reveals that the shape of the turquoise curve describing the calculated results does not accurately resemble the shape of the yellow curve describing the experimental trend. Indeed, only the first 6 variants reproduce the
