
Università di Pisa


Academic year: 2021



Ph.D. Programme in Chemical Sciences

XXII Cycle (2007-2009)

Ph.D. Thesis

Prediction of the Physico-Chemical Properties of Low and High Molecular Weight Compounds

Structure-Based QSAR/QSPR with Recursive Neural Networks

Ph.D. Student: Carlo Giuseppe Bertinetto


Table of Contents

ABSTRACT
1. Introduction and Aim of the Work
1.1. Introduction
1.2. Historical Background of Quantitative Structure-Activity Relationships
1.3. Aim of the work
1.4. Outline of the thesis
1.5. References
2. Theoretical Basis of Quantitative Structure-Activity/Property Relationships: Recursive Neural Network and Multi-Linear Regression Methods
2.1. General principles of Quantitative Structure-Activity (Property) Relationships
2.2. The Recursive Neural Network Methodology
2.2.1. Basic definitions
2.2.2. Encoding and mapping functions
2.2.3. Training algorithm and Recursive Cascade Correlation (RCC)
2.2.4. Molecular representation
2.3. CROMRsel technique for the stepwise and optimal selection of descriptors in Multi-Linear Regression models
2.4. References
3. Evaluation of Hierarchical Structured Representations of Cyclic Moieties for Recursive Neural Network QSPR
3.1. Introduction
3.2. Data sets and molecular representation
3.3. Experiments and Discussion
3.4. Conclusions
3.5. References
4. Prediction of the Glass Transition Temperature of Acrylic and Methacrylic Copolymers by Recursive Neural Networks
4.1. Introduction
4.2.1. Prediction of homopolymer Tg
4.2.2. Prediction of copolymer Tg
4.2.3. Use of RNN-QSPR for the prediction of polymer Tg
4.3. Data set and molecular representation
4.4. Experiments and discussion
4.5. Conclusions
4.6. References
5. Prediction of Toxicity by Recursive Neural Networks and Multi-Linear Regression
5.1. Introduction
5.2. Background
5.2.1. Overview of methods for the development of QSTR
5.2.2. Prediction of growth inhibition concentration to Tetrahymena pyriformis
5.2.3. Prediction of mean lethal concentration to Pimephales promelas
5.2.4. Modern challenges in toxicological prediction
5.3. RNN-QSTR of the Growth Impairment Concentration of substituted phenols
5.3.1. Data set and molecular representation
5.3.2. Experiments and discussion
5.4. QSTR of the Mean Lethal Concentration of substituted benzenes by RNN and MLR
5.4.1. Data set and molecular representation
5.4.2. Experiments and discussion
5.5. Conclusions
5.6. References
6. General Conclusions
7. Appendix I: Rules for Determining the Priority Scale
8. Appendix II: References for Experimental Data
9. Appendix III: Molecular Fragments and Vertex Labels Used for the Experiments
10. Appendix IV: Definition of the Statistical Parameters Used for the Evaluation of Results


ABSTRACT

In the present Ph.D. Thesis, an innovative approach to deriving Quantitative Structure-Property/Activity Relationships (QSPR/QSARs) was investigated and discussed by applying it to various predictive problems. This approach is based on the direct and adaptive treatment of molecular structure by means of a Recursive Neural Network (RNN). Chemical compounds are represented as labelled hierarchical structures built from molecular fragments, so no numerical descriptors are needed.
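The abstract does not give the network equations. As a minimal sketch only, assuming a generic recursive network with untrained random weights (not the Recursive Cascade Correlation model, vertex labels or training procedure used in this thesis), the Python below illustrates the idea of processing structure directly: each vertex of a labelled tree receives a state vector computed from its own label and the states of its children, bottom-up, and the root state is mapped to a scalar property value. `Node`, `RecursiveEncoder` and the label vectors are hypothetical names and values chosen for illustration.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex of a labelled tree (hypothetical fragment-based representation)."""
    label: list                                    # numerical vertex label, e.g. a one-hot fragment code
    children: list = field(default_factory=list)  # ordered child vertices


class RecursiveEncoder:
    """Generic recursive network (illustrative and untrained):
    state(v) = tanh(W @ label(v) + sum_k W_k @ state(child_k) + b)
    output   = w_out . state(root)
    """

    def __init__(self, label_dim, state_dim, max_children, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.state_dim = state_dim
        self.max_children = max_children
        self.W = matrix(state_dim, label_dim)  # weights applied to the vertex label
        # One weight matrix per child position, so child order matters.
        self.W_child = [matrix(state_dim, state_dim) for _ in range(max_children)]
        self.b = [0.0] * state_dim
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(state_dim)]

    @staticmethod
    def _matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def state(self, node):
        """State of `node`, computed after recursively computing its children's states."""
        if len(node.children) > self.max_children:
            raise ValueError("vertex exceeds the maximum out-degree")
        total = [bi + li for bi, li in zip(self.b, self._matvec(self.W, node.label))]
        # Absent child positions contribute nothing (equivalent to a zero state).
        for k, child in enumerate(node.children):
            contrib = self._matvec(self.W_child[k], self.state(child))
            total = [t + c for t, c in zip(total, contrib)]
        return [math.tanh(t) for t in total]

    def predict(self, root):
        """Map the root state to a scalar property value."""
        return sum(w * s for w, s in zip(self.w_out, self.state(root)))


# Hypothetical three-vertex tree with 2-dimensional one-hot labels.
leaf_a = Node([1.0, 0.0])
leaf_b = Node([0.0, 1.0])
root = Node([1.0, 0.0], [leaf_a, leaf_b])
enc = RecursiveEncoder(label_dim=2, state_dim=4, max_children=2, seed=42)
print(enc.predict(root))
```

Because the child weight matrices in this sketch are position-specific, the order in which children are listed affects the output, so a representation of this kind needs a consistent ordering rule. Training (fitting `W`, `W_child`, `b` and `w_out` to measured property values) is omitted here.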

In the first part, the RNN-QSPR method was applied to predicting the melting point (Tm) of a set of 126 pyridinium bromides and the glass transition temperature (Tg) of a set of 337 (meth)acrylic homopolymers. Particular emphasis was placed on the representation of cyclic moieties, which can be achieved in different ways by exploiting the flexibility of the structured approach. Several representations were devised, each with different advantages and sampling requirements. Performance did not vary significantly between more specific and more general representations. For the Tm of pyridinium bromides, the best model gave, on the test set of 37 molecules, a mean absolute residual (MAR) of 25 K, a standard error of prediction (S) of 29.6 K and a squared correlation coefficient (R²) of 0.62. The best model for the Tg of poly(meth)acrylates gave MAR, S and R² values of 15.8 K, 20.4 K and 0.85, respectively, on the test set of 54 molecules.
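The statistics quoted above are formally defined in Appendix IV, which is not part of this excerpt. The following is a minimal Python sketch under stated assumptions: MAR is the mean absolute residual; S is taken here as the square root of the residual sum of squares divided by n - ddof (the exact denominator used in the thesis is an assumption, so it is exposed as the `ddof` parameter); and R² is computed as the squared Pearson correlation between observed and predicted values, following the abstract's wording "squared correlation coefficient". The function name `regression_stats` and the example values are hypothetical.

```python
import math


def regression_stats(observed, predicted, ddof=0):
    """Return (MAR, S, R2) for paired observed/predicted values.

    Assumptions: S = sqrt(sum(residual**2) / (n - ddof));
    R2 = squared Pearson correlation between observed and predicted.
    """
    n = len(observed)
    if n != len(predicted) or n - ddof <= 0:
        raise ValueError("need equal-length inputs with n > ddof")
    residuals = [o - p for o, p in zip(observed, predicted)]
    mar = sum(abs(r) for r in residuals) / n
    s = math.sqrt(sum(r * r for r in residuals) / (n - ddof))

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("R2 is undefined for constant observed or predicted values")
    r2 = cov * cov / (var_o * var_p)
    return mar, s, r2


# Illustrative values only (not data from the thesis), e.g. temperatures in K.
obs = [300.0, 320.0, 345.0, 360.0]
pred = [310.0, 315.0, 340.0, 370.0]
print(regression_stats(obs, pred))
```

Note that the squared Pearson correlation is insensitive to a constant offset between predictions and observations, which is why it is usually reported together with an error measure such as MAR or S.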

In the second part, the representation used for homopolymers was extended to copolymers. A data set containing the Tg of 275 random (meth)acrylic copolymers was investigated, either alone or combined with the homopolymer data. The prediction on copolymers was excellent, with MAR, S and R² of 4.9 K, 6.1 K and 0.98, respectively, for the 57 compounds in the test set. The method also performed well on the full data set comprising homopolymers and copolymers together.

In the last part, the RNN approach was employed to model and predict the toxicity of two sets of aromatic molecules. The first data set comprised the median growth impairment concentration (IGC50) of 221 phenols towards Tetrahymena pyriformis. The results were good on the training set, but the performance on the test set (41 molecules) was not on par with that of other methods in the literature. However, it must be stressed that those methods employ a priori information synthesized into appropriate numerical descriptors, whereas our method does not use any such background knowledge. The second data set concerned the median lethal concentration (LC50) of 69 substituted benzenes towards Pimephales promelas. This data set was also investigated by means of a descriptor-based MLR technique. Both approaches performed well, yielding MAR ≈ 0.22, S ≈ 0.25 and R² ≈ 0.80 on the test set of 18 molecules. The results obtained by RNN and MLR were very similar, despite the radically different nature of the two methods.
