
In Silico Modeling of Aryl Hydrocarbon Receptor Binding Affinities of a Series of Mixed Halogenated Aromatic Compounds


Academic year: 2021


UNIVERSITY OF PISA

EARTH SCIENCES DEPARTMENT

IN SILICO MODELING OF ARYL HYDROCARBON RECEPTOR BINDING AFFINITIES OF A SERIES OF MIXED HALOGENATED AROMATIC COMPOUNDS

Thesis Work by Annalisa Ruffa

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Environmental Science

Prof. Maria Grazia Tozzi (Supervisor)

Prof. Dr. Melek Türker Saçan (Co-Supervisor)

Prof. Andrea Raffaelli (Outside expert)


ACKNOWLEDGEMENTS

I dedicate this work to all the people who have been close to me and whom I want to thank most sincerely.

The first person I want to thank is undoubtedly Prof. Dr. Melek Türker Saçan, whom I had the honour of getting to know personally during my internship. She was my point of support during my time in Istanbul. She guided and supported me with great patience and availability throughout my Master's thesis work, including in the most difficult moments.

I would also like to thank Prof. Paola Gramatica, who allowed me to use the QSAR software to carry out my research, and to give special thanks to Bogazici University, Institute of Environmental Sciences, which gave me the opportunity to collaborate with them.

A warm thank you as well to two professors whom I admire a lot: Prof. Maria Grazia Tozzi and Prof. Andrea Raffaelli. Having attended their lessons, I will always consider them the best researchers and professors that I have known during my whole academic career.

A heartfelt thank you to my parents and sister. Without your support and your love I don't know where I would have ended up...

I also want to thank my classmates and the wonderful new people that I got to know in Pisa during these last years, and all the splendid people who have been supportive towards me in Florence. You are the best, guys.

Finally, I thank Didem for her warm welcome and Pol for his availability. Again: without you I would not have made it this year.


ABSTRACT

Halogenated Aromatic Compounds (HAC) are considered an emerging group of persistent chemical pollutants that are dangerous and potentially harmful to human health. Their biological activities, such as the binding affinity to the Aryl Hydrocarbon Receptor (AhR), are of fundamental importance for detecting the toxicity of these compounds to living organisms.

In this study, Quantitative Structure-Activity Relationship/Quantitative Structure-Toxicity Relationship (QSAR/QSTR) methods were used to develop models of the log RBA values of a data set of 108 congeners of halogenated aromatic compounds (PCBs, PCDDs, PCDFs, PBDDs, PBDEs, some substituted PCB groups and congeners from bromo-chloro-substituted dibenzodioxin groups) by employing Multiple Linear Regression (MLR).

The descriptors were calculated with the DRAGON 06 and SPARTAN 04 software, whereas the models were developed with the QSARINS software (evaluation version b1.1 2012). All the best models were validated for their performance using all the criteria suggested by the Organization for Economic Co-operation and Development principles (OECD, 2007), which involve the internal and external validation of the models, the analysis of the applicability domain (AD) and, when possible, a mechanistic interpretation of the models. External validation was provided by splitting the data set into training and test sets, either by manually choosing the compounds after ordering them by increasing toxicity value or by applying the hierarchical clustering technique.

Finally, the predictivity of the proposed QSTR models was tested using an external set comprising all the remaining HACs (618 compounds) with no experimental toxicity data.

RIASSUNTO ANALITICO (ANALYTICAL SUMMARY)

Halogenated Aromatic Compounds (HAC) are considered an emerging group of persistent chemical pollutants that are dangerous and potentially harmful to human health. Their biological activities, such as the binding affinity to the Aryl Hydrocarbon Receptor (AhR), are of fundamental importance for identifying the toxicity of these compounds to living organisms.

In this study, Quantitative Structure-Activity Relationship/Quantitative Structure-Toxicity Relationship (QSAR/QSTR) methods were used to build models from the log RBA values of a data set of 108 congeners of halogenated aromatic compounds (PCBs, PCDDs, PCDFs, PBDDs, PBDEs, some substituted PCBs and congeners from bromo-chloro-substituted dibenzodioxin groups) by means of Multiple Linear Regression (MLR).

The descriptors were derived from the DRAGON 06 and SPARTAN 04 software, whereas the models were developed with the QSARINS software (evaluation version b1.1 2012). All the best models were validated for their performance using all the criteria suggested by the principles of the Organization for Economic Co-operation and Development (OECD, 2007), which involve internal and external validation of the models, analysis of the applicability domain (AD) and, when possible, a mechanistic interpretation of the models. External validation was carried out by splitting the data set into training and test sets, either by manually choosing the compounds after ordering them by increasing toxicity value or by applying the hierarchical clustering technique.

Finally, the predictivity of the proposed QSTR models was tested using a set of external compounds comprising all the remaining HACs (618 compounds) without experimental toxicity data.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
RIASSUNTO
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
2 LITERATURE SURVEY ON HALOGENATED AROMATIC COMPOUNDS
2.1 Polychlorinated Biphenyls (PCBs)
2.1.1 Usage and Sources
2.1.2 Physico-chemical Properties of PCBs and distribution
2.1.3 Exposure and toxicity of PCBs
2.2 Brominated diphenyl ethers (PBDEs)
2.2.1 Usage and Sources
2.2.2 Physico-chemical Properties of PBDEs and distribution
2.2.3 Exposure and toxicity of PBDEs
2.3 Polychlorinated dibenzo-p-dioxins (PCDDs) and Polychlorinated dibenzofurans (PCDFs)
2.3.1 Sources
2.3.2 Physico-chemical Properties of PCDD/Fs and distribution
2.3.3 Exposure and toxicity of PCDD/Fs
2.4 Polybrominated dibenzo-p-dioxins (PBDDs)
2.4.1 Sources
2.4.2 Physico-chemical Properties of PBDDs and distribution
2.5 Substituted Chlorobiphenyls other than Chloro Group
2.5.1 Toxicity of Substituted Chlorobiphenyls
3 THEORETICAL BACKGROUND
4 PURPOSE OF THE STUDY
5 MATERIALS AND METHODS
5.1 Collection of the RBA data
5.2 Calculation and Selection of Molecular Descriptor
5.2.1 Preparation, analysis, and setup of the input data set
5.2.2 Drawing structure and geometry optimization
5.2.3 Calculation of descriptors
5.2.4 Selection of descriptors
5.3 Development of models
5.3.1 Multiple Linear Regression
5.4 Selection of the best model and its validation
5.4.1 Internal validation
5.4.1.1 Correlation coefficient (R2)
5.4.1.2 Correlation coefficient adjusted (R2 adj)
5.4.1.3 Cross-validation and squared correlation coefficient with Leave-One-Out (Q2 LOO) and Leave-More-Out (Q2 LMO) procedure
5.4.1.4 Y-scrambling
5.4.1.5 Root Mean Squared Error (RMSE)
5.4.2 External validation
5.4.2.1 Predictive squared correlation coefficient Q2 F1, Q2 F2, Q2 F3
5.4.2.3 r2m
5.4.2.4 Concordance Correlation Coefficient (CCC)
5.4.3 Applicability Domain (AD)
5.5 External prediction (Predictivity test of the model)
6 RESULTS AND DISCUSSION
6.1 Studied data
6.2 Modelling AhR binding affinities (log RBA) for a series of mixed HAC
6.2.1 Applicability Domain of Model 5
6.2.2 Structural Applicability Domain of a Large Set of HAC
6.3 Generation of other QSAR model for log RBA
6.3.1 Applicability Domain of Model 10
6.4 Comparison of the two models
6.5 Consensus model
6.6 Comparison with the Reported Models
7 CONCLUSION
REFERENCE
APPENDIX A.1
APPENDIX A.2
APPENDIX A.3
APPENDIX A.4

LIST OF FIGURES

Figure 2.1 Structures and numbering systems for PCBs. The numbers denote the various chlorine atoms numbering by carbon position
Figure 2.2 Structural formula of PBDEs; the numbers denote the various bromine atoms numbering by carbon position
Figure 2.3 Structural formulas of PCDDs and PCDFs; the numbers denote the various chlorine atoms numbering by carbon position
Figure 2.4 Structural formula of PBDDs; the numbers denote the various bromine atoms numbering by carbon position
Figure 2.5 Structural formula of the common structure of 4'-substituted tetrachlorobiphenyls. The X denotes the various substituents that can occupy the 4' lateral position
Figure 5.1 Example of a PCDF's molecular structure built with SPARTAN software
Figure 5.2 “Data Setup” dialog in QSARINS
Figure 5.3 “Calculate models” dialog in QSARINS
Figure 6.1 The distribution of log RBA values
Figure 6.2 The change in R2 and Q2 with the increase in number of descriptors
Figure 6.3 MCDM graph of 6-descriptor log RBA models
Figure 6.4 Relative frequency of descriptors appearing in the generated models
Figure 6.5 Plot of calculated/predicted vs. observed values of log RBA for the training/test set compounds by model 5, with training set in yellow and test set in blue
Figure 6.6 The relationship between the descriptors and log RBA for model 5 (+ indicates positive correlation, - indicates negative correlation)
Figure 6.7 Williams plot for model 5, with training set in yellow and test set in blue
Figure 6.8 Scatter plot of predicted log RBA values of 135 PCDF and 75 PCDD congeners from model 5
Figure 6.9 Scatter plot of predicted log RBA values of (a) 209 PCB congeners and (b) 209 PBDE and 75 PBDD congeners from model 5
Figure 6.10 Plot of calculated/predicted vs. observed values of log RBA for the training/test set compounds by model 10, with training set in yellow and test set in blue
Figure 6.11 The relationship between the descriptors and log RBA for model 10 (+ indicates positive correlation, - indicates negative correlation)
Figure 6.12 Williams plot for model 10, with training set in yellow and test set in blue
Figure 6.13 Scatter plot of predicted log RBA values of 135 PCDF and 75 PCDD congeners from model 10
Figure 6.14 Scatter plot of predicted log RBA values of 209 PCB congeners from model 10
Figure 6.15 Scatter plot of predicted log RBA values of 209 PBDE and 75 PBDD congeners from model 10
Figure 6.16 The predictive performance of model 5 for each group of chemicals
Figure 6.17 The predictive performance of model 10 for each group of chemicals
Figure 6.18 The predictive performance of consensus model for a) PCDF and
Figure 6.19 Comparison of the predictive performance of consensus

LIST OF TABLES

Table 5.1 Golbraikh and Tropsha's criteria
Table 6.1 Test set composition and its observed log RBA values for the best division
Table 6.2 The 2-6 variable models developed for log RBA of the training set and their descriptors and validation criteria
Table 6.3 External validation parameters proposed by Golbraikh and Tropsha (2002) and Ojha et al. (2011)
Table 6.4 List of DRAGON descriptors appearing in model 5
Table 6.5 The selected 3-6 variable models generated with hold descriptors for log RBA of the training set and their descriptors and validation criteria
Table 6.6 External validation parameters proposed by Golbraikh and Tropsha (2002) and Ojha et al. (2011)
Table 6.7 List of DRAGON descriptors appearing in model 10
Table 6.8 The performance of internal and external validation parameters of model 5, model 10 and consensus model
Table 6.9 Comparison of statistical performance of different QSAR

1. INTRODUCTION

Halogenated Aromatic Compounds (HAC) are considered an emerging group of persistent chemical pollutants resulting from industrial development. Although their production was stopped after scientific studies confirmed their adverse effects on the environment and on living beings, their residues can still be found in food, water, soil and air. Furthermore, continuous monitoring of human samples has shown that they still raise ecological and human health (occupational, environmental) concerns (Khan et al., 2008).

Because of their ubiquity and toxicity, it is necessary to know more about the sources, formation, environmental distribution and biological effects of dioxins and related compounds. An effort in this direction was made with the Stockholm Convention on POPs (UNEP, 2001), which emphasizes their environmental significance.

Many of these Halogenated Aromatic Compounds, such as polychlorinated biphenyls (PCBs) and polybrominated diphenyl ethers (PBDEs), can be released into the environment as main products of industrial production cycles, whereas others, such as polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs) and polybrominated dibenzo-p-dioxins (PBDDs), have been identified as by-products of commercial processes and combustion, or as the result of chemical modification of artificial molecules under certain environmental conditions in the atmosphere (Rappe et al., 1979; Hutzinger and Roof, 1980).

Their detection in the different compartments of the environment is particularly important because some of these halogenated aromatic compounds (biphenyls, dioxins, furans and diphenyl ethers) are of great concern for their carcinogenicity: 2,3,7,8-TCDD was classified in 1997 as a Group 1 carcinogen by the International Agency for Research on Cancer (IARC, 1997).

Based on observations dating back to the mid-1960s, it has become evident that these compounds share the capability to be transported over thousands of kilometers (e.g., Sladen et al., 1966; Peterle, 1969) and to fractionate, react and bioaccumulate in the organic tissues of the global food chains thanks to their high stability and lipophilic character. The biological activities of the Halogenated Aromatic Hydrocarbons, including their toxicity, are now known to be highly dependent on their chemical structure, especially on the number of substituents and the substitution patterns.

In particular, the toxicity is related to the ability of these halogenated chemicals to bind a cytosolic protein called the Aryl Hydrocarbon Receptor (AhR). AhR plays a very important role in the detoxification of endo- and xenobiotics. It belongs to the basic helix-loop-helix protein family and is part of an important superfamily of regulatory receptors involved in the signal transduction of critical cellular processes such as the regulation of cell growth, differentiation and metabolism. AhR is mainly involved in the induction of hepatic cytochrome P4501A1 and is associated with the Aryl Hydrocarbon Hydroxylase (AHH) and 7-ethoxyresorufin O-deethylase (EROD) activities (Safe, 1990; Okey, 1990).

The role of the Ah receptor protein in the mechanism of action of toxic halogenated aromatic hydrocarbons has been the subject of several studies (Poland and Knutson, 1982; Eisen et al., 1983; Denomme et al., 1986), and this mechanism satisfies most of the specific criteria that support a receptor-mediated cellular process.

Several other studies have shown that toxic responses in rats, such as thymic atrophy, weight loss, immunotoxicity and acute lethality, as well as the induction of cytochrome P-4501A1, are correlated with the relative affinity of PCBs, PCDFs and PCDDs for the AhR (Safe, 1990; Olivero-Verbel et al., 2004; Mandal, 2005; Ohura et al., 2010). Therefore, estimating their binding to the Ah receptor is considered an important step in predicting the toxic effects of these chemicals. However, the structure of this protein complex is not yet known in detail.

The large number of Halogenated Aromatic Hydrocarbon congeners in the environment and the difficulty of extracting them from environmental matrices further complicate the determination of the binding affinity of each compound to the AhR. In addition, relying on in vivo studies alone is not sufficient to obtain satisfactory results in the short term. For these reasons, the new Registration, Evaluation, Authorization and Restriction of Chemicals (REACH) regulation was enacted in June 2007 by the European Union. It streamlines and improves the former legislative framework on chemicals and encourages the use of alternative in vitro and in silico methods in order to reduce the data gap on the environmental and toxicological profiles of pollutants while limiting animal testing, high costs and long analysis periods.

Quantitative structure-activity/toxicity relationship (QSAR/QSTR) modelling is one of the scientifically credible methods suggested for predicting the ecological effects and fate of chemicals when little or no experimental data are available. The QSAR method is based on the principle that the chemical and physical properties of chemicals depend on their molecular structure.

The complicated but important aim of these methods is to establish a quantitative relationship between molecular structure and a molecular property for each chemical. Given the importance of the AhR in determining the toxicity of several chemicals, it is relevant that previous studies have shown that reliable QSAR models can be applied to predict toxicity, to provide basic data for risk assessment and to explain toxicity mechanisms.

The group of compounds with known experimental data on the property of interest is called the training set. Quantitative structure-activity relationships (QSAR) can be established by relating variations in molecular structure to the variation of the property across the compounds in the training set.
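As an illustration of how such a relationship can be fitted on a training set, the following is a minimal sketch of a multiple linear regression with its R2 and leave-one-out Q2, written in plain Python. The descriptor values and activities are synthetic stand-ins generated at random, the function names (`fit_mlr`, `q2_loo`, and so on) are hypothetical, and Q2 is assumed here to be 1 - PRESS/TSS with TSS taken about the training-set mean. It is not the QSARINS procedure used in this work.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    A = [[1.0] + list(row) for row in X]  # intercept column first
    p = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    Aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    return solve(AtA, Aty)

def predict(coef, row):
    """Predicted activity for one compound (one row of descriptors)."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))

def r2(X, y, coef):
    """Coefficient of determination on the data used for fitting."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef_i = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef_i, X[i])) ** 2
    return 1.0 - press / sum((yi - mean_y) ** 2 for yi in y)

# Synthetic stand-in data: 30 "compounds", 2 "descriptors" (not thesis data).
random.seed(0)
X_train = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(30)]
y_train = [0.3 + 1.5 * x1 - 0.8 * x2 + random.gauss(0, 0.1) for x1, x2 in X_train]

coef = fit_mlr(X_train, y_train)
r2_train = r2(X_train, y_train, coef)
q2 = q2_loo(X_train, y_train)
print("coefficients:", [round(c, 3) for c in coef])
print("R2 =", round(r2_train, 4), " Q2_LOO =", round(q2, 4))
```

For a least-squares fit, Q2 LOO can never exceed R2, so a large gap between the two is a common warning sign of overfitting.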

As a common and successful research approach, QSAR/QSPR studies are extensively applied to many research areas such as toxicology/aquatic toxicology, environmental chemistry, drug-design and so on.


2. LITERATURE SURVEY ON HALOGENATED AROMATIC COMPOUNDS

2.1 Polychlorinated Biphenyls (PCBs)

Polychlorinated biphenyls (PCBs) are anthropogenic chemicals classified as persistent organic pollutants (POPs) by the Stockholm Convention (adopted in 2001). This group of chemicals includes 209 different compounds (known as congeners), each having a specific number of chlorine atoms located at specific positions of the biphenyl skeleton.

PCBs are formed by chlorination of biphenyl, and their commercial production started around the 1930s. They consist of a biphenyl (two benzene rings with a carbon-carbon bond between carbon 1 on one ring and carbon 1' on the second ring) bearing a varying number of chlorine atoms. The common structure of PCB congeners is depicted in Figure 2.1.

Figure 2.1 Structures and numbering systems for PCBs. The numbers denote the various chlorine atoms numbering by carbon position.
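The figure of 209 congeners can be checked by enumerating every substitution pattern on the ten free positions of biphenyl and merging patterns that are equivalent by symmetry. The sketch below is only an illustration: it assumes free rotation about the 1-1' bond (so atropisomers are not counted separately), and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Substitutable carbon positions on the two rings of biphenyl.
POSITIONS = ["2", "3", "4", "5", "6", "2'", "3'", "4'", "5'", "6'"]
FLIP = {"2": "6", "6": "2", "3": "5", "5": "3", "4": "4"}

def flip_ring(pattern, primed):
    """Rotate one ring by 180 degrees about the 1-1' bond (2<->6, 3<->5)."""
    suffix = "'" if primed else ""
    out = set()
    for p in pattern:
        if p.endswith("'") == primed:
            out.add(FLIP[p[0]] + suffix)
        else:
            out.add(p)
    return frozenset(out)

def swap_rings(pattern):
    """Exchange the primed and unprimed rings."""
    return frozenset(p[:-1] if p.endswith("'") else p + "'" for p in pattern)

def canonical(pattern):
    """Smallest sorted tuple among all symmetry-equivalent patterns."""
    start = frozenset(pattern)
    seen = {start}
    frontier = [start]
    while frontier:
        cur = frontier.pop()
        for nxt in (flip_ring(cur, False), flip_ring(cur, True), swap_rings(cur)):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return min(tuple(sorted(s)) for s in seen)

# One representative per congener, for 1 to 10 halogen atoms.
congeners = {
    canonical(c)
    for k in range(1, len(POSITIONS) + 1)
    for c in combinations(POSITIONS, k)
}
by_homolog = Counter(len(c) for c in congeners)

print("total congeners:", len(congeners))
for k in sorted(by_homolog):
    print(k, "halogen atoms:", by_homolog[k], "isomers")
```

Because the count depends only on the symmetry of the carbon skeleton, the same enumeration applies to any halogen on these ten positions.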

2.1.1 Usage and Sources

A study based on high-resolution gas chromatographic (GC) analyses of commercial PCBs revealed that 132 individual congeners were present in some mixtures (such as the Aroclor and Clophen mixtures), and that the number of congeners and their concentrations in the commercial products depended on the degree of chlorination of the mixture (Schulz et al., 1989). Their important physico-chemical properties, such as excellent heat transfer and electrical properties, led to the use of these substances in a variety of industrial, commercial and domestic applications: as dielectric fluids in transformers and capacitors, pesticide extenders, adhesives, de-dusting agents, cutting oils, flame retardants, heat transfer fluids, hydraulic lubricants, sealants and paints, as well as in carbonless copy paper. The total amount produced worldwide was estimated in 1992 at 1.5 million tons (Ivanov and Sandell, 1992; Rantanen, 1992).

Although the production and use of PCBs were banned in most industrial countries in the 1970s and 1980s, they are not readily biodegradable and can still be found in soil, air, seawater and river sediments (Meijer et al., 2003; Shin et al., 2011; Richard et al., 1997; David et al., 2004). PCBs may still be released to the environment from landfills, from small PCB-containing capacitors in household appliances and PCB-containing sealants in buildings (Persson et al., 2005), and also from other PCB wastes, the incineration of PCB-containing wastes and the improper disposal of the compounds in open areas.

2.1.2 Physico-chemical Properties of PCBs and distribution

The excellent heat transfer and electrical properties, which were so important for the extensive use of PCBs, depend on the degree of chlorination of the biphenyl. Because of their stability and resistance to biodegradation and metabolism, PCBs have been detected in virtually all environmental matrices, such as indoor and outdoor air, surface water and groundwater, sediments, soil and food. For example, in 2009 the mean concentration of total PCBs detected in air samples at open burning sites (Guiyu, China) was 414.8 ng m-3, significantly higher than in the residential area (4.7 ng m-3) and in villages (1.1 ng m-3) (Xing et al., 2009). An inventory of atmospheric deposition to background surface soil estimates a global soil burden of total PCBs of 21,000 t (Jurado et al., 2004). In many countries (e.g. UK, Australia, USA), the threshold concentration for contaminated soil varies between 10 and 50 mg kg-1, but in some cases it can be as low as 0.5 mg kg-1 (CCME, 1999; EPA, 2009; UKEPA, 2004; USEPA, 2012).

Some of their applications resulted in a direct or indirect release of PCBs into the environment, and it is well known that lower chlorinated PCBs can volatilise and are thus more susceptible to atmospheric removal processes (Mackay et al., 1992). In fact, PCBs are found even in samples taken far from known sources, suggesting that, like other HACs with similar vapour pressures, they may undergo atmospheric transport. In a study in the USA, 92% of the PCBs detected were in the vapour phase (WHO, 1993).

Due to their lipophilic character, PCBs have also been identified in biological matrices such as fish, wildlife and human tissues (Tanabe et al., 1987; Fishbein, 1972; Ballschmiter et al., 1981) and in human milk extract (Safe et al., 1985b). Approximately 66% of the neonates in Shijiazhuang (China) were delivered in hospital between 2002 and 2007, and a study by Sun and coworkers (2011) showed an increase of PCBs in breast milk over the same period; for example, congener 105 alone increased from 886 pg g-1 fat in 2002 to 1948 pg g-1 fat in 2007.

Because of their structure, planar PCB congeners are expected to interfere with hormone function and to contribute to disease processes in steroid-responsive tissues, including cancer of the breast and prostate (Rattenborg et al., 2002; Howsam et al., 2004; Li et al., 2005; Ritchie et al., 2005).

2.1.3 Exposure and toxicity of PCBs

PCB exposure may interfere with major physiological functions of the body, such as reproductive functions (Brouwer et al., 1999; Loch-Caruso, 2002), neurological functions (Sanchez-Alonso et al., 2003), endocrine functions (Brouwer et al., 1998), cardiovascular functions (Kopf and Walker, 2009) and immunological functions (Hertz-Picciotto et al., 2008).

In vivo structure-activity relationships developed for PCB congeners show that the most active compounds, 3,3',4,4'-tetra-, 3,3',4,4',5-penta- and 3,3',4,4',5,5'-hexachlorobiphenyl (CAS: 32598-13-3, 57465-28-8 and 32774-16-6, respectively), elicit toxic and biologic effects in rats that are comparable to those observed after treatment with 2,3,7,8-TCDD (Safe et al., 1985a). However, other evidence indicates that further PCB congeners can cause similar toxic effects. For example, 2,3',4,4',5-penta-, 2,3,3',4,4'-penta-, 2,3,3',4,4',5-hexa- and 2,3,3',4,4',5'-hexachlorobiphenyl (CAS: 31508-00-6, 32598-14-4, 35065-28-2 and 38380-08-4, respectively) cause thymic atrophy in rats (Robertson et al., 1984).


2.2 Brominated diphenyl ethers (PBDEs)

PBDE is the common name for 209 different brominated diphenyl ether congeners that can be theoretically obtained in complex mixtures via bromination of diphenyl ether or as individual, pure compounds via specific routes for their synthesis (Norström et al., 1976; Golounin et al., 1994; Marsh et al., 1999). Based on the number of bromine substituents, there are 10 homologous groups of PBDE congeners (monobrominated through decabrominated), with each homologous group containing one or more isomers.

Due to their persistence and bioaccumulation potential in the environment, PBDEs are also defined as an emerging class of organic pollutants (Renner, 2000; De Wit, 2002; Watanabe and Sakai, 2003). The common structure of PBDE congeners is depicted in Figure 2.2.

Figure 2.2 Structural formula of PBDEs; the numbers denote the various bromine atoms numbering by carbon position.

2.2.1 Usage and Sources

For several years PBDEs were used primarily as flame retardants. Brominated Flame Retardants (BFRs) are a diverse group of chemicals that are used to increase fire safety and are incorporated into a wide range of products, such as TVs, computers, household appliances, textiles and upholstery. Some of them (e.g., pentabromodiphenyl ether - PentaBDE) were categorized as persistent organic pollutants (POPs) and were therefore banned from use in new electronic equipment in the EU (Directive 2002/95/EC of the European Parliament and of the Council of 27 January 2003 on the restriction of the use of certain hazardous substances in electrical and electronic equipment, “RoHS” directive) and in Switzerland (Chemical Risk Reduction Ordinance, ORRChem).

In the EU, the content of decabromodiphenyl ether (DecaBDE) in electronic equipment should also not exceed 0.1%, as set by the RoHS directive. However, DecaBDE and other BFRs, which might be transformed into toxic compounds such as polybrominated dibenzofurans and dioxins, are certainly present in electronic equipment currently in use. It is estimated that, until recent years, 67,400 tons of PBDEs were produced annually throughout the world, including penta-BDE, octa-BDE and deca-BDE (Birnbaum and Staskal, 2004; Richardson, 2004; Kim et al., 2007).

2.2.2 Physico-chemical Properties of PBDEs and distribution

PBDEs are not fixed in the polymer product through chemical binding and can thus leak into the environment. PBDEs are chemically similar to the polychlorinated biphenyls (PCBs) and, like them, have a biomagnification potential in the food chain (Darnerud et al., 2001; Birnbaum and Staskal, 2004; She et al., 2004; Schecter et al., 2005; Law et al., 2006). The number and position of bromine atoms on the diphenyl ether skeleton influence the congener's ability to enter cells (rate of absorption); for example, tetra- and penta-BDEs are better absorbed than decaBDE (BDE-209) (Hakk and Letcher, 2003; Costa and Giordano, 2007).

Nowadays, the general population is exposed to PBDEs through products such as upholstery, building materials, insulation, electronic equipment, combustion processes and through dietary intake (Bocio et al., 2003; Schettgen et al., 2012). PBDEs have been detected also in house dust, leaves, food and human tissues due to high levels of production and the persistence of PBDEs in the environment (Bakker et al., 2008).

Numerous studies have established the almost ubiquitous presence of PBDEs in the environment: traces of them have frequently been detected in a wide range of environmental samples, including fish (Hale et al., 2001; Borghesi et al., 2008), birds (Polder et al., 2008; Vetter et al., 2008), adipose tissue (Haglund et al., 1997; Covaci et al., 2008) and human plasma (Klasson-Wehler et al., 1997; Sjödin et al., 1999). Recent studies have also reported much higher levels of PBDEs in human milk (Schecter et al., 2003) and in the serum of the general population (Thomsen et al., 2002; Sjödin et al., 2004). The increasing accumulation of these compounds is confirmed by the highest serum levels of PBDEs, found in infants and toddlers: 418 ng g-1 lipid weight compared with 106 ng g-1 in mothers, as a result of exposure through maternal milk and house dust (Fischer et al., 2006). Levels of PBDEs ranging from 4 to 98.5 ng g-1 lipid have also been found in fetal liver (Schecter et al., 2007).

2.2.3 Exposure and toxicity of PBDEs

The principal danger of PBDEs is attributed to their contribution to the total dioxin-like activity of environmental samples. Although their activity is much lower than that of dioxins and furans (Chen et al., 2001), it has been stated that PBDEs may be developmental neurotoxicants and may cause neurochemical and hormonal deficiencies (Eriksson et al., 2001; Viberg et al., 2002; Zhou et al., 2002; Branchi et al., 2003). In addition, they can cause negative reproductive effects, e.g. delayed onset of puberty, decreased sperm count and reduced gonad weight in male rats (Birnbaum and Staskal, 2004).

The close structural similarity of PBDEs to thyroid hormones (T3 and T4) makes them endocrine disruptors (McDonald, 2002). In rats, chronic exposure to deca-BDE might cause hepatic and pancreatic adenomas, whereas a combined incidence of hepatocellular adenomas and carcinomas was seen in mice (NTP, 1986).

Regarding studies of the toxic effects on humans, in Taiwan elevated PBDE levels in breast milk were correlated with lower birth weight and length, lower head and chest circumference, decreased body mass index and developmental delays in cognition during growth (Chao et al., 2007). According to a study of Atlantic salmon, exposure to environmental concentrations of PBDEs causes disorders in xenobiotic biotransformation, regulation of proliferation, endocrine metabolism and glucose homeostasis regulation (Søfteland et al., 2011).

Although commercial mixtures of PBDEs have physical and chemical characteristics that make them persistent, bioaccumulative and structurally similar to some dioxin-like compounds, some experimental evidence has shown that PBDEs do not appear to activate the signal transduction (activation of the Ah receptor-AhR nuclear translocator protein-XRE complex), although they can still bind the Ah receptor (Chen and Bunce, 2003; Peters et al., 2006; Hamers et al., 2006).

However, the most relevant data on the toxicological effects of PBDEs lead to the conclusion that a greater experimental effort should be made in the future. This is extremely important for the identification and assessment of the risks that these new environmental pollutants pose to humans.

2.3 Polychlorinated dibenzo-p-dioxins (PCDDs) and Polychlorinated dibenzofurans (PCDF )

Polychlorinated dibenzo-p-dioxins (PCDDs) and polychlorinated dibenzofurans (PCDFs) are considered the most famous ubiquitous and dangerous contaminants that persist in the environment. Dioxins were first unintentionally produced as by-products from 1848 onwards as Leblanc process plants started operating in Germany (Weber et al., 2008).

After a series of environmental disasters, and to combat the threat posed by specific adverse acute and chronic health effects (including cancer) due to exposure to dioxins and furans, the Stockholm Convention on Persistent Organic Pollutants was adopted in 2001, and the production and emission of these compounds have subsequently been heavily regulated. Despite the recent phasing out of chlorine in many processes and other measures taken to reduce contaminant emissions, levels of PCDD/Fs remain of concern in all environmental compartments because of their persistence and ubiquity.

For both groups of compounds, the basic common chemical structure is formed by two chlorinated aromatic rings linked by a central ring that contains one (furans) or two (dioxins) oxygen atoms. The number of chlorine atoms can vary between 1 and 8, leading to 75 possible PCDD congeners and 135 possible PCDF congeners. The common structure of PCDD/F congeners is depicted in Figure 2.3.
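The congener counts quoted above can be checked by enumeration. The following is only an illustrative sketch: it assumes idealized planar skeletons, the conventional position numbering (1-4 and 6-9), and the symmetry operations written out in the code; the function and variable names are hypothetical and not part of the original work.

```python
from itertools import combinations

# Substitutable carbon positions shared by dibenzo-p-dioxin and dibenzofuran.
POSITIONS = (1, 2, 3, 4, 6, 7, 8, 9)

# Position permutations induced by the molecular symmetry (assumed idealized).
DIOXIN_SYMMETRY = [
    {1: 1, 2: 2, 3: 3, 4: 4, 6: 6, 7: 7, 8: 8, 9: 9},  # identity
    {1: 9, 2: 8, 3: 7, 4: 6, 6: 4, 7: 3, 8: 2, 9: 1},  # exchange the two benzene rings
    {1: 4, 2: 3, 3: 2, 4: 1, 6: 9, 7: 8, 8: 7, 9: 6},  # flip about the long axis
    {1: 6, 2: 7, 3: 8, 4: 9, 6: 1, 7: 2, 8: 3, 9: 4},  # both operations combined
]
# Dibenzofuran keeps only the identity and the ring exchange.
FURAN_SYMMETRY = DIOXIN_SYMMETRY[:2]


def count_congeners(symmetry):
    """Count distinct substitution patterns with 1 to 8 halogen atoms."""
    seen = set()
    for n in range(1, len(POSITIONS) + 1):
        for pattern in combinations(POSITIONS, n):
            # All symmetry-equivalent images of this pattern; keep one canonical form.
            images = [tuple(sorted(op[p] for p in pattern)) for op in symmetry]
            seen.add(min(images))
    return len(seen)


print(count_congeners(DIOXIN_SYMMETRY), count_congeners(FURAN_SYMMETRY))
```

Because the count depends only on the substitution positions and not on the halogen, the same enumeration also applies to a single-halogen brominated series.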


Figure 2.3 Structural formulas of PCDDs and PCDFs; the numbers denote the carbon positions at which chlorine atoms can be substituted.

2.3.1 Sources

PCDFs and PCDDs are primarily formed as unintentional by-products of a range of natural and anthropogenic processes including pulp bleaching, and heating and/or combustion processes involving organic matter or chlorine containing compounds (Mc Lean et al., 2009).

Regarding combustion, the “Trace Chemistries of Fire Hypothesis” (Crummett, 1982) suggested that polychlorinated dibenzo-p-dioxins and dibenzofurans can arise from diverse combustion processes, including “natural” forest fires and volcanic eruptions. However, analyses of dated aquatic sediment profiles have shown that the production of industrial halogenated organic chemicals and their subsequent incineration contribute more to the accumulation of PCDDs and PCDFs in sediments than natural combustion processes do (Czuczwa and Hites, 1986).

The presence of other chlorinated organic compounds, such as chlorophenols, chlorobenzenes, polychlorinated diphenyl ethers or polychlorinated biphenyls, is another factor that promotes dioxin formation. In fact, laboratory pyrolysis experiments have shown that PCBs give significant yields of PCDFs (Hutzinger and Blumich, 1985).


2.3.2 Physico-chemical Properties of PCDD/Fs and distribution

The physico-chemical characteristics vary with the degree of chlorination: an increasing number of chlorine atoms decreases the solubility in water (which is very low even for the low-chlorinated congeners) and increases the solubility in fats.

PCDD/Fs, particularly the higher chlorinated ones, are poorly soluble in water, have a low vapour pressure, a high melting point and a low volatility, and they can adsorb strongly on particles and surfaces (high KOC). Thus, PCDD/Fs can hardly be detected in water and are immobile in soils. In particular, the 2,3,7,8-chlorine-substituted PCDD/Fs, which have long half-lives, are extremely stable in the environment and can bioaccumulate in the fatty tissues (high KOW) of animals and humans.

In a study conducted by W. Christmann in 1989, sample volumes of 700.000 m3 of air were collected over a period of 10-14 days in selected areas throughout West Germany and Austria and, although the detection limit ranged from 0.01 to 0.1 pg/m3, in some cases PCDD/F levels of up to about 100 pg/m3 were measured. Two years earlier, the atmospheric levels of PCDD/Fs were determined around Kobe (Japan) and the average concentration was 8.8 pg/m3 (Nakano et al., 1987).

Polychlorinated dibenzo-p-dioxins have been detected in places very far from their source, even in remote areas such as the Arctic, after transport by winds and tides (Bjorn and Krister 2000; Niu et al., 2003) and, due to their low biodegradability, PCDD/Fs can be found in samples long after the exposure event. For example, in Taiwan, 15 years after a mass poisoning from rice-bran oil contaminated by PCDD/Fs, the serum samples of the exposed victims contained TEQ values of the most dangerous PCDD/Fs 46 times higher than those in the general population (Hsu et al., 2005).

PCDD/Fs were unfortunately detected even in the breast milk of women living in the vicinity of a hazardous waste incinerator in Catalonia, Spain: the current total concentrations of 2,3,7,8-chlorinated PCDD/Fs in breast milk ranged from 18 to 126 pg g-1 fat (1.1-2.3 pg WHO2005-TEQ PCDD/F). The same study found that the levels of PCDD/Fs, PCBs, and PBDEs in the milk of women living in urban zones were higher than those corresponding to industrial zones (41%, 26%, and 8%, respectively) (Schumacher et al., 2013).


2.3.3 Exposure and toxicity of PCDD/Fs

Because of their properties, PCDD/Fs have raised concern about their adverse effects such as carcinogenicity, teratogenicity, and mutagenicity on organisms and humans.

It is known that the common mechanism of toxicity for both kinds of compounds is a response mediated by the Ah receptor (Poland and Knutson, 1982), and the degree of toxicity depends on the number and positions of the chlorine atoms. The relative affinity of individual PCDFs and PCDDs for the AhR has been correlated with many toxic responses such as thymic atrophy, body weight loss, immunotoxicity, and acute lethality (Safe, 1990). Maximum binding affinity to the receptor requires Cl substitution at all four lateral positions (Poland and Knutson, 1982; Safe et al., 1985a). In fact, of the 210 congeners that make up the PCDD and PCDF group, the 17 congeners with chlorine substituted at the 2, 3, 7, and 8 positions are those with the highest toxicity and are potentially carcinogenic; they include the most active one, 2,3,7,8-tetrachlorodibenzo-p-dioxin (2,3,7,8-TCDD), with an international toxic equivalency factor (TEF) (IARC, 1997) equal to 1.

2.4 Polybrominated dibenzo-p-dioxins (PBDDs)

Polybrominated dibenzo-p-dioxins (PBDDs) are not as well known as their chlorinated cousins (PCDDs) because of the complex and costly analytical procedures needed to study them, and also because they are frequently found at lower levels than PCDDs. After the classification of PBDDs as new persistent organic pollutants, in 2010 they were regulated for the first time through a measure implementing the Stockholm Convention.

Polybrominated dibenzo-p-dioxins are analogues of PCDDs in which all chlorine atoms are replaced by bromine atoms. Depending on the bromine substitution pattern, there are 75 possible PBDD congeners, exactly as for the analogous chlorinated compounds.


Organic Pollutants and Endocrine Disruptors” in 2012, Arnold Schecter introduced with this book more detailed explanations on polybrominated dioxins (PBDDs) and dibenzofurans (PBDFs). The common structure of PBDD/Fs congeners is depicted in Figure 2.4.

Figure 2.4 Structural formula of PBDDs; the numbers denote the carbon positions at which bromine atoms can be substituted.

2.4.1 Sources

PBDDs (and the dibenzofurans, PBDFs) were detected as trace contaminants in brominated flame retardants, and the main source of these substances in the environment is the combustion of bromine-containing compounds, e.g., in municipal and industrial incinerators and in internal-combustion engines (WHO, 1998; Weber and Kuch, 2003). As for incineration, cheap and primitive methods such as roasting, pyrolysis and uncontrolled combustion are often used to dismantle electronic waste, which contains considerable amounts of brominated flame retardants (BFRs), and the result is significant emissions of PBDD/Fs (Li et al., 2007; Ma et al., 2009).

2.4.2 Physico-chemical Properties of PBDDs and distribution

Although the PBDD/Fs are more lipophilic and less water-soluble than PCDD/Fs, the brominated compounds appear to be less environmentally persistent and more sensitive to UV degradation, possibly because bromine is a better leaving group than chlorine. The biochemical properties of the dioxins and furans are also altered by the bromine atom, since the larger size of the bromine atom alters the susceptibility to enzymatic attack, and the carbon–bromine bond is weaker than the carbon–chlorine bond. However, even if their degradation could be faster than that of PCDDs, PBDDs, given their high structural similarity, show biological properties, geochemical behaviour and a distribution pattern similar to those of their chlorinated counterparts.

High levels of PBDD/Fs have been found in many industrial materials destined for disposal; for example, an average level of 280,000 ng g-1 of PBDDs was found in waste television cabinets manufactured in Japan between 1984 and 1998 (Sakai et al., 2001).

Other contamination sources can be workshop floor dust and leaves from large-scale e-waste recycling facilities, and air and soil from chemical-industrial complexes as well as agricultural areas. For example, high concentrations of 8.12-61 pg Mm-3 were reported in the air from a large electronic waste recycling facility in Guiyu, southern China (Li et al., 2007). Concentrations of PBDD/Fs in soil from an electronic waste incineration site were 4 orders of magnitude higher than the concentrations found in the soil near a chemical-industrial complex (Ma et al., 2009).

2.4.3 Exposure and toxicity of PBDDs

Analysis of POPs in human samples has revealed that PBDD/Fs contribute 15% of the total dioxin TEQ (Jogsten et al., 2010). Overall, although the database on the health effects of PBDDs is limited, it cannot be excluded that these brominated congeners have toxicity and biological and geochemical behaviour similar to those of their chlorinated counterparts (PCDDs). Once again, the congeners with bromine atoms in the 2,3,7,8-positions have turned out to be much more toxic than the others.

After a careful study on the differences between the effects of PCDDs and PBDDs on rats, it has been shown that TBDDs are generally well absorbed following either oral or pulmonary exposure, but dermal absorption is about three times lower than that of TCDD (Diliberto et al., 1993).


Regarding the toxic effects tested, this group of compounds might induce hepatic aryl hydrocarbon hydroxylase (AHH) and ethoxyresorufin-O-deethylase (EROD) in rats and cause thymic atrophy in rats and guinea pigs. Tetrabrominated dibenzo-p-dioxin (TBDD) and dibenzofuran (TBDF) are reproductive toxins in mice and produce skin lesions in the rabbit-ear acnegenic test (Mennear and Lee, 1994).

2.5 Substituted Chlorobiphenyls other than Chloro Group

Although 2,3,4,5-tetrachlorobiphenyl (CAS number: 33284-53-6) is a poor ligand for the cytosolic Ah receptor (Bandiera et al., 1982) that does not induce aryl hydrocarbon hydroxylase (AHH) or ethoxyresorufin O-deethylase (EROD) and is relatively non-toxic, chlorination at position 4' can significantly change the properties of this molecule.

The common structure of 4'-substituted-2,3,4,5-tetrachlorobiphenyl congeners is depicted in Figure 2.5.

Figure 2.5 Common structural formula of 4'-substituted tetrachlorobiphenyls. X denotes the various substituents that can occupy the 4' lateral position.

Some of these compounds, such as hydroxylated polychlorinated biphenyls (OH-PCBs), are formed by oxidative metabolism of PCBs catalysed by cytochrome P450 monooxygenases (Morse et al., 1995), and substitution at the 4' lateral position with substituents other than chlorine was considered important for the activity of this group of halogenated biphenyls. In fact, from the results of a study in 1988, it was proposed that binding of the 4'-X-C12H5Cl4 derivatives to the rat cytosolic Ah receptor is favored by increasing electronegativity, lipophilicity and hydrogen-bonding characteristics of the 4' substituent (Parkinson et al., 1988).

Competitive binding experiments indicated that the relative competitive potency of the 4'-substituted 2,3,4,5-tetrachlorobiphenyls follows the order CF3 > CH(CH3)2 > I > Br > CH2CH3 > Cl for the most effective ligands (Safe et al., 1985a).

2.5.1 Toxicity of Substituted Chlorobiphenyls

In vivo, 2,3,4,4',5-pentachlorobiphenyl (PCBP) induces microsomal AHH and EROD in male Wistar rats (Parkinson et al., 1980) and C57BL/6J mice (Parkinson et al., 1982), as well as in rat hepatoma H-4-II-E cells in culture (Sawyer and Safe, 1982), and also induces cytochromes P-450c and P-450d in male Long-Evans rats (Parkinson et al., 1983). In addition, PCBP causes thymic atrophy in rats and C57BL/6J mice and binds to the rat hepatic cytosolic Ah receptor (Bandiera et al., 1982).

The analogous compound, 4'-iodo-2,3,4,5-tetrachlorobiphenyl, was identified as the most potent inducer of AHH and EROD, followed by the 4'-Br- and 4'-Cl-2,3,4,5-tetrachlorobiphenyls; however, the 4'-trifluoromethyl-, 4'-nitro-, 4'-cyano-, 4'-tert-butyl-, 4'-isopropyl-, 4'-fluoro- and 4'-phenyl-2,3,4,5-tetrachlorobiphenyls also significantly induce both of the cytochrome P-448-dependent monooxygenases (Safe et al., 1985a).


3. THEORETICAL BACKGROUND

Many different algorithms and computer software are available for QSAR model development. Most are based on linear or multiple linear regression (MLR) with variable selection, partial least squares (PLS), as well as non-linear (genetic algorithms, artificial neural networks) methods.

Until now, only a limited number of QSAR models have been published for predicting the toxicity of halogenated aromatic compounds (HACs) from their AhR binding affinity (RBA). The previous studies reporting QSAR models developed for the AhR binding affinities of HACs are summarized in the following paragraphs.

Some models were proposed to describe the binding affinity of PBDEs with AhR. For example, Gu et al. (2010) used the density functional theory (DFT) analysis to evaluate electronic properties including the polarisabilities, polarisability anisotropies and quadrupole moment for 18 congeners of polybrominated diphenyl ethers.

Gu et al. (2007) established, through step-wise multiple regression analysis, a QSAR model using data for 34 PCDFs (DF included), which were obtained for the first time from a bioassay on the rat hepatic cytosol AhR. The observed values were used as the dependent variable and correlated with the DFT-calculated descriptors.

Benfenati et al. (2006) developed QSAR models for PCDDs, PBDDs, PCBs, PCDFs and other compounds such as naphthalenes, indolocarbazoles, and indolocarbazole derivatives, using PLS for the statistical analysis.

Gu et al. (2012) developed a simple but potent QSAR model for the AhR binding affinities (RBAs) of 18 PBDEs by direct PLS analysis considering WHIM and DFT-calculated descriptors.

One year later, Li et al. (2013) generated a structure-based 3D-QSAR model after investigating the interaction modes between ligands and receptor for RBAs to AhR of 18 PCDEs. They also identified the structural characteristics influencing the activity.


Diao et al. (2010) developed three simple QSAR models for the toxicity of polychlorinated dibenzo-p-dioxins and dibenzofurans (PCDD/Fs) by partial least squares regression. These three optimal QSAR models were developed for 25 PCDDs and 35 PCDFs together. Li et al. (2011) employed an integrated molecular docking and 3D-QSAR approach to investigate the binding interactions between 14 PCBs, 13 PCDDs, 3 PC/BDDs, 8 PBDDs, 27 PCDFs and the AhR. Safe et al. (1985a) determined the quantitative structure-activity relationships for a series of PCB congeners with substituents other than the chloro group (4'-substituted (X)-2,3,4,5-tetrachlorobiphenyls, where X = H, OH, CH3, F, OCH3, COCH3, CN, Cl, CH2CH3, Br, I, CH(CH3)2, CF3, NO2, NHCOCH3, C6H5, C(CH3)3, (CH2)3CH3) using MLR analyses, comparing the EC50 values for the competitive binding avidities to the rat cytosolic Ah receptor protein with AHH/EROD induction.

Ashek et al. (2006) created comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models to explain the observed structure–activity relationship of binding between the Ah receptor and a series of dioxins (14 PCDDs, 3 PC/BDDs, 8 PBDDs) and dioxin-like compounds (14 PCBs, 16 4'-substituted-2,3,4,5-tetrachlorobiphenyls, 34 PCDFs). Chen et al. (2011) investigated the toxicity data (pEC50) of 95 dioxins and dioxin-like compounds (including 4'-substituted-2,3,4,5-tetrachlorobiphenyls) by quantitative structure–activity relationship (QSAR) with the molecular fragments variable connectivity index (mfVCI). Slavov et al. (2013) developed a consensus partial least squares (PLS)-similarity based k-nearest neighbours (KNN) model utilizing 3D-SDAR (three dimensional spectral data-activity relationship) fingerprint descriptors for prediction of the log (1/EC50) values of a data set of 94 aryl hydrocarbon receptor


4. PURPOSE OF THE STUDY

The main purpose of the present study is to explore QSTR models for the Ah receptor binding affinity (RBA) values available in the literature for Halogenated Aromatic Compounds (HACs), including PCBs and their 4'-substituted congeners other than the chloro group, PBDEs, PCDD/Fs, and PBDDs (plus 3 bromo/chloro-substituted dibenzo-p-dioxins), by employing the Multiple Linear Regression (MLR) method. Further aims are to predict the RBA values of selected HACs outside the sample set using the best developed model, and to compare the best model with the models reported in the literature.

The main steps to reach the purpose of the study follow this scheme:

(1) to split the data set into training and test sets for the generation of the model;

(2) to calculate the theoretical molecular descriptors representing the studied molecular structures with the DRAGON 06 and SPARTAN 04 software;

(3) to select descriptors from the descriptor pool using the all-subset and Genetic Algorithm tools of the QSARINS software (evaluation version b1.1 2012);

(4) to validate the model externally using the test set;

(5) to validate the model internally using the leave-one/many-out cross-validation and Y-scrambling procedures;

(6) to define applicability domain using the leverage approach by highlighting both the response-outliers and the structural influential chemicals (Williams graph).
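As an illustration of step (6), the leverage values plotted on the horizontal axis of a Williams graph can be sketched as follows. This is a minimal sketch, not the QSARINS implementation: it assumes the common definition of leverage as the diagonal of the hat matrix of a model with intercept, and the commonly used warning value h* = 3(p + 1)/n, where p is the number of model descriptors and n the number of training compounds. The function names are hypothetical.

```python
import numpy as np


def _with_intercept(X):
    """Prepend a column of ones so that the intercept is part of the model matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def leverages(X_train, X_query=None):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i for training or query compounds."""
    A = _with_intercept(X_train)
    xtx_inv = np.linalg.inv(A.T @ A)
    Q = A if X_query is None else _with_intercept(X_query)
    # Row-wise quadratic form q_i^T (X^T X)^-1 q_i.
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def warning_leverage(n_train, n_descriptors):
    """Assumed cut-off h* = 3(p + 1)/n for structurally influential chemicals."""
    return 3.0 * (n_descriptors + 1) / n_train
```

Under these assumptions, compounds with a leverage above h* would be flagged as structurally influential, while response outliers would be identified separately from their standardized residuals.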

A satisfactory final result should be the creation of a QSTR model that has to be both descriptive (pinpointing the key descriptors) and predictive (able to predict the toxicity (RBA) of compounds which are not included in the QSTR determinations).


5. MATERIALS AND METHODS

The main steps involved in the development and analysis of a QSAR model can be summarized as:

1) Collection of the Aryl hydrocarbon Receptor (AhR) binding affinity (log RBA) data from the literature;

2) Calculation and selection of molecular descriptors;

3) Development and validation of the model;

4) Prediction of the unknown logRBA values of congeners using the proposed model.

5.1 Collection of the RBA data

The first step for constructing a QSTR model is the compilation of experimental data about the chemical property of interest. After a literature search regarding all possible halogenated congeners with experimental log RBA values, 108 compounds will be taken into consideration to compose the data set:

The experimental log RBA values of 14 PCDDs and 35 PCDFs will be mainly obtained from the investigations performed by Safe (1990);

The experimentally determined log RBA values for 18 PBDEs will be taken from Chen et al. (2001) and the values for 8 PBDD and 3 bromo/chloro dibenzo-p-dioxins from Waller and McKinney (1995);

In addition, log RBA values of 14 PCBs and 16 tetrachlorobiphenyl derivatives with para-substituents will be considered and taken from Safe et al. (1985a);

All these RBA data collected from literature and considered in the present study were analyzed in rat hepatocytes.


5.2 Calculation and Selection of Molecular Descriptor

A molecular descriptor provides a means of representing molecular structures in a numerical form. The number may be a theoretical attribute (e.g. relating to size or shape) or a measurable property of chemical compounds.

A molecular descriptor has been defined as: “the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment to measure a molecular attribute” (Todeschini and Consonni, 2000). The application of descriptors in in silico methods is useful to describe different features of chemicals, to compare different chemical structures or different conformations of the same chemical, and to relate structure to activity (i.e. develop QSARs) or structure to toxicity (i.e. develop QSTRs).

Suitable molecular descriptors, besides the trivial invariance properties, should satisfy some basic requirements suggested by Randic (1996), listed below:

1) Should have structural interpretation

2) Should have good correlation with at least one property

3) Should preferably discriminate among isomers

4) Should be possible to apply to local structure

5) Should be possible to generalize to “higher” descriptors

6) Descriptors should be preferably independent

7) Should be simple

8) Should not be based on properties

9) Should not be trivially related to other descriptors

10) Should be possible to construct efficiently

11) Should use familiar structural concepts

12) Should have the correct size dependence

13) Should change gradually with gradual change in structures

The number of descriptors calculable by the software can depend on the structure of the molecules and many different kinds of descriptors can be calculated. For example, the constitutional descriptors are basically related to the number of atoms and bonds in each molecule. Topological descriptors include valence and non-valence molecular connectivity indices calculated from the hydrogen-suppressed formula of the molecule, encoding information about the size, composition and the degree of branching of a molecule. The topological descriptors describe the atomic connectivity in the molecule. The geometrical descriptors describe the size of the molecule and require 3D-co-ordinates of the atoms in the given molecule. The electrostatic descriptors reflect characteristics of the charge distribution of the molecule. The quantum chemical descriptors include information about binding and formation energies, partial atom charge, dipole moment, and molecular orbital energy levels (Lü et al., 2007).

5.2.1 Preparation, analysis, and setup of the input data set

The first step in QSAR modelling is the preparation of a suitable data set, which is then used as the input for QSAR model generation by means of statistical analysis. This data set consists of the experimental responses (dependent variable) and the corresponding set of molecular structure descriptors for each chemical (independent variables).

In this study the dependent variable will be the experimental binding affinity of each halogenated aromatic compound to the Ah receptor, collected from the literature. It is expressed as the negative logarithm of the molar concentration of the chemical necessary to displace 50% of radiolabeled 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) from the Ah receptor and reported as pIC50. Molecular descriptor values will be used as the independent variables. They will be used in turn to predict the dependent variable (-log RBA).

5.2.2 Drawing structure and geometry optimization

To calculate the molecular descriptors, the structures of all compounds in the data set must first be drawn using the Spartan 04 software package. An example of a built molecular structure is shown in Figure 5.1. Subsequently, the geometry of the structures must be optimized employing the semi-empirical PM3 method, including the effect of an aqueous solvent.

Figure 5.1 Example of a PCDF's molecular structure built with SPARTAN software.

The choice of considering an aqueous solvent around the compounds is fundamental to obtaining a proper QSTR model because the observed data on the binding affinity to the AhR (describing the toxicity of the molecules) were measured in biological tissues (rat tissues); therefore, the descriptors should be calculated in a similar matrix.


5.2.3 Calculation of descriptors

The molecular descriptors computed with the SPARTAN software are the molecular weight, the molecular volume, the dipole moment, and quantum chemical parameters such as the energy of the highest occupied molecular orbital (EHOMO), the energy of the lowest unoccupied molecular orbital (ELUMO), the gas-phase energy (E) and the aqueous-phase energy (Eaq). Using these quantum chemical parameters, additional variables such as the ELUMO-EHOMO gap, chemical potential (μ), hardness (η), softness (σ) and electrophilicity index (ω) will be calculated according to the equations proposed by LoPachin et al. (2007). Additionally, the optimized SPARTAN structures will be saved as mol2 files and loaded into the Dragon 06 software to calculate several different sets of molecular descriptors (topological, geometrical, WHIM, 3D-MoRSE, molecular profile descriptors).
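The derived quantum chemical variables can be sketched in a few lines. The equations are not reproduced in this section, so the sketch assumes the widely used frontier-orbital definitions μ = (ELUMO + EHOMO)/2, η = (ELUMO − EHOMO)/2, σ = 1/η and ω = μ²/(2η); the function name and the orbital energies in the usage line are hypothetical.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Derive reactivity indices from frontier orbital energies (same unit, e.g. eV)."""
    gap = e_lumo - e_homo              # ELUMO-EHOMO gap
    mu = (e_lumo + e_homo) / 2.0       # chemical potential (assumed definition)
    eta = gap / 2.0                    # hardness (assumed definition)
    sigma = 1.0 / eta                  # softness (assumed definition)
    omega = mu ** 2 / (2.0 * eta)      # electrophilicity index (assumed definition)
    return {"gap": gap, "mu": mu, "eta": eta, "sigma": sigma, "omega": omega}


# Hypothetical orbital energies in eV, for illustration only.
print(reactivity_descriptors(e_homo=-9.0, e_lumo=-1.0))
```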

5.2.4 Selection of descriptors

Usually, the number of calculated descriptors is very high (hundreds, sometimes even thousands) so that different features of the chemical structure can be represented in different ways, but many of them may be intercorrelated and redundant, giving very similar structural information. To mitigate this redundancy, a pre-reduction of descriptors is performed, again with the QSARINS software, based on an objective selection: (1) a test for identical values (constant variables); (2) pair-wise correlations (according to a user-defined cutoff value; suggestion: 95%), in which the correlation among all pairs of descriptors is calculated and, if a pair is found to be highly correlated, the descriptor with the higher correlation with the other descriptors is deleted. Descriptors can also be deleted if the percentage of compounds sharing the same value is too high (suggestion: 80%).
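The pre-reduction logic can be illustrated outside QSARINS with a short NumPy sketch. It is an approximation of the procedure described above, not the software's own code: the function name, the use of the mean absolute correlation as the measure of "correlation with the other descriptors", and the greedy one-pair-at-a-time removal are assumptions.

```python
import numpy as np


def prereduce(X, names, corr_cut=0.95, same_value_cut=0.80):
    """Return the names of the descriptors that survive the pre-reduction."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    keep = []
    for j in range(X.shape[1]):
        # Drop constant descriptors and those where too many compounds share one value.
        _, counts = np.unique(X[:, j], return_counts=True)
        if counts.max() / n <= same_value_cut:
            keep.append(j)
    while len(keep) > 1:
        corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= corr_cut:
            break  # no remaining pair exceeds the cutoff
        # Of the most correlated pair, delete the member more correlated with the rest.
        drop = i if corr[i].mean() >= corr[j].mean() else j
        keep.pop(drop)
    return [names[j] for j in keep]
```

With the suggested settings, a descriptor that is constant, or that takes the same value for more than 80% of the compounds, is removed before the pair-wise correlation step is applied.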

5.3 Development of models

The descriptor values from SPARTAN and DRAGON will be saved as a text file to be loaded into the QSARINS software. Once the data are imported, the compounds and the descriptors for the development of the model can be selected from the Data setup dialog (Fig. 5.2); in particular, the tools available in the Data setup dialog allow the selection of: (1) the molecular descriptors to include in the subsequent variable selection procedure, (2) the response/endpoint to be modeled, which in this case is expressed as pIC50 (the negative logarithm of the molar concentration of the chemical necessary to displace 50% of radiolabeled 2,3,7,8-tetrachlorodibenzo-p-dioxin from the Ah receptor), (3) the status of the molecules under study (training, prediction, unknown) (Gramatica et al., 2013).

Figure 5.2 “Data Setup” dialog in QSARINS.

5.3.1 Multiple Linear Regression

Once the data selection has been performed, the next step is the development of the QSAR model. For the development and validation of a model in this study, the QSARINS software, which is based on Ordinary Least Squares (OLS), will be used (Gramatica et al., 2013). Multiple Linear Regression (MLR) is a linear equation that expresses the relationship between the behaviour of chemicals as defined by the model endpoint (in this case the AhR binding affinity) and different descriptors of chemical structure. It is represented by the following equation:

Y = b0 + b1 X1 + b2 X2 + ... + bn Xn

where Y is the property (AhR binding affinity, -log RBA) or dependent variable, X1-Xn are the specific molecular descriptors, b1-bn are the coefficients of the descriptors and b0 is the intercept of the equation.
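For readers who want to reproduce the fitting step outside QSARINS, the regression equation can be fitted by ordinary least squares with NumPy. This is a generic sketch; the function name is hypothetical, and any data passed to it would have to come from the descriptor table described earlier.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit Y = b0 + b1 X1 + ... + bn Xn by OLS; return the coefficients and R2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])       # column of ones for the intercept b0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # coef = [b0, b1, ..., bn]
    y_hat = A @ coef
    r2 = 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return coef, r2
```

The first element of the returned vector is the intercept b0 and the remaining elements are b1 to bn, in the column order of X.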

The relationship between the descriptors of chemical structure and the toxicity endpoint in a QSAR model is called the “algorithm” of the model. Multiple Linear Regression (MLR), in particular by Ordinary Least Squares (OLS), is the most popular regression method and it can produce a transparent and easily reproducible algorithm.

Taking into account the large number of descriptors calculated, the use of all the countless combinations of the available descriptors for models calculation by means of the MLR would be impossible. Thus, to reduce the time for model calculation, in “calculate models” dialog (Fig. 5.3) the operator can select only a small number of descriptors per model to explore all the low dimension combinations, and then apply a genetic algorithm (GA) procedure for the development of models based on a bigger number of descriptors.


As in natural selection, the GA procedure selects the best descriptors (or small groups of descriptors) and discards the others. The GA can be tuned by varying the population size (or the number of descriptors considered in the calculation) and the mutation rate; by repeating this procedure many times with different settings, the result at the end of the process will be a population of models that perform better than the initial models.
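The idea can be illustrated with a deliberately small genetic algorithm for descriptor subset selection. This is a simplified sketch and does not reproduce the GA implemented in QSARINS: the fitness function (R2 of an OLS fit), the crossover and mutation operators, the elitist selection of the best half of the population and all default settings are assumptions chosen for brevity.

```python
import numpy as np


def r2_of_subset(X, y, subset):
    """R2 of an OLS model built on the descriptor columns listed in `subset`."""
    A = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def ga_select(X, y, n_desc, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Search for a subset of n_desc descriptor columns with a high R2."""
    rng = np.random.default_rng(seed)
    n_total = X.shape[1]

    def fitness(subset):
        return r2_of_subset(X, y, subset)

    # Initial population: random descriptor subsets of the requested size.
    population = [
        tuple(sorted(int(i) for i in rng.choice(n_total, n_desc, replace=False)))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: the best half survives unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), 2, replace=False)
            # Crossover: draw the child's descriptors from the union of two parents.
            pool = sorted(set(parents[a]) | set(parents[b]))
            child = {int(i) for i in rng.choice(pool, n_desc, replace=False)}
            # Mutation: occasionally swap one descriptor for a random one.
            if rng.random() < mutation_rate:
                child.remove(int(rng.choice(sorted(child))))
                outside = [d for d in range(n_total) if d not in child]
                child.add(int(rng.choice(outside)))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=fitness)
```

In practice a cross-validated statistic would be a more appropriate fitness than R2, since R2 alone always favours larger models.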

Statistically, in the development of a model there are real restrictions on the number of variables that can be used relative to the number of observations. Therefore, the ratio of observations to variables should be as high as possible, and at least 5:1 (Topliss and Costello, 1972).

Moreover, during model development it is possible to set up the software to hold fixed some descriptors from a potentially valuable model and to calculate new models by adding one new descriptor.

An increase of R2 and Q2 for the model obtained after the addition is a positive sign but, for models with too many descriptors, it can give an over-optimistic idea of the model's predictive ability and lead to overfitting. For this reason, and to avoid over-parameterization of the model, an increase of the R2 value of less than 0.02 was chosen as the breakpoint criterion.

Additionally, the QUIK rule can be applied to test the intercorrelation of descriptors. This method tests whether the total correlation among the block of descriptors (KXX) is higher than the correlation among them and the responses (KXY); that is to say, before the model is completely evaluated (regression parameters and coefficients), a model is discarded if KXY-KXX < DeltaK, where DeltaK is a user-defined threshold value (in QSARINS it is suggested to set a QUIK rule value ≥ 0.05, i.e. DeltaK ≥ 0.05). Through this filter, a model makes sense only if the correlation among the descriptors is smaller than the correlation among them and the response; in other words, the model has low multicollinearity and good correlation with the modeled response. In this study the QUIK rule threshold was set to 0.05, as indicated in the “Calculate models” dialog box in Figure 5.3.
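The QUIK rule can be sketched as follows. The definition of the K correlation index is not given in this section, so the sketch assumes the multivariate correlation index computed from the eigenvalues λ of a correlation matrix of p variables, K = Σ|λj/Σλ − 1/p| / [2(p − 1)/p]; the function names are hypothetical and at least two descriptors are required.

```python
import numpy as np


def k_index(M):
    """Multivariate K correlation index of the columns of M (0 = none, 1 = total)."""
    corr = np.corrcoef(np.asarray(M, dtype=float), rowvar=False)
    eig = np.clip(np.linalg.eigvalsh(corr), 0.0, None)  # guard against tiny negatives
    p = corr.shape[0]
    return np.abs(eig / eig.sum() - 1.0 / p).sum() / (2.0 * (p - 1) / p)


def passes_quik(X, y, delta_k=0.05):
    """Keep a model only if KXY - KXX is at least delta_k (X needs >= 2 columns)."""
    k_xx = k_index(X)                          # correlation within the descriptor block
    k_xy = k_index(np.column_stack([X, y]))    # descriptors plus response
    return bool(k_xy - k_xx >= delta_k)
```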


5.4 Selection of the best model and its validation

The next step is to select the “best” model on the basis of the OECD principles. All the QSTR models that are going to be developed will take into account the Organisation for Economic Co-operation and Development (OECD, 2007) principles for their validation. Validation of a newly created model is necessary to assess its reliability and relevance, and it is important to verify that the model was not obtained by chance.

The OECD principles provide:

1. a defined endpoint;

2. an unambiguous algorithm;

3. a defined applicability domain (AD);

4. appropriate measures of goodness-of-fit, robustness and predictivity;

5. when possible, a mechanistic interpretation of the models.

The fourth OECD principle, which concerns model validation, implies both internal and external validation to check different aspects of model performance: fitting and stability or robustness (internal validation), and the capability to predict new chemicals (external validation). This analysis is complex, because model performance varies depending on the criteria used to evaluate it. First, a model must at least be able to reproduce the data used to calculate it, that is, it must show a good fit. Then, the model must be shown to predict portions of the training set well, using techniques collectively known as cross-validation (CV) or internal validation. If the model passes this verification, it can be defined as robust and stable (Gramatica et al., 2013).
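As an illustration of the two checks just described (fitting, then cross-validation), the following is a minimal sketch for an ordinary least squares model. It assumes the common definitions R² = 1 - RSS/TSS and Q²(leave-one-out) = 1 - PRESS/TSS, with TSS taken around the training-set mean; the function names and the toy data are hypothetical and do not reproduce the QSARINS implementation.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) for y ~ X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predictions of a fitted model for the rows of X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2(X, y):
    """Goodness-of-fit: fraction of training-set variance reproduced."""
    rss = np.sum((y - predict(fit_ols(X, y), X)) ** 2)
    return 1.0 - rss / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out Q2: each compound is predicted by a model
    fitted on all the other training compounds."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Toy data (not from the thesis): one descriptor, ten compounds.
X = np.arange(10, dtype=float).reshape(-1, 1)
y_exact = 2.0 * X[:, 0] + 1.0
y_noisy = y_exact + np.array([0.5, -0.5] * 5)
print(r2(X, y_noisy), q2_loo(X, y_noisy))
```

For a least-squares model Q² from leave-one-out is never larger than R²; a large gap between the two suggests that the fit depends heavily on individual compounds, i.e. that the model is not robust.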

To guarantee that it provides reliable predictions for new chemicals, the model should also perform well in predicting an external data set: this procedure is called external validation. Therefore, before modelling, the data set has to be divided into a training set (80% of the compounds in the data set) and a test set (20% of the data set). The first set is used to derive the model through the application of a statistical method; the second contains chemicals that are not used in the calculation of the model but are set aside and used only to check the model’s ability to predict them.


An important parameter for obtaining an optimal model is the size of the training set: the more compounds it comprises, the better the final model will be able to find a proper structure-activity relationship (Roy et al., 2008). The created model is used to predict -log RBA values for the training and test set compounds; if these predictions are markedly poor, this can indicate a poor predictive ability of the model. In this case, the procedure of training and test set selection and external validation (discussed later) should be repeated several times, in order to identify the QSTR model whose training set affords the best predictive power for the test set.

The division of the data set can be done randomly (random division set by the software) or manually, by first ordering the compounds by increasing toxicity value and then choosing some compounds for the test set. If the data set has 100 compounds and the test set must be 20 percent of the data set, one out of every five compounds is assigned to the test set (always keeping the most and the least active compounds in the training set). This splitting guarantees that the training set covers the entire range of the experimental responses. The selection is repeated by changing the starting position from which compounds are assigned to the test set.
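The manual, response-ordered division can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_by_ordered_response` and its `step` and `start` parameters are hypothetical, and the responses used in the example are synthetic values, not thesis data.

```python
def split_by_ordered_response(responses, step=5, start=1):
    """Split compound indices into training and test sets.

    Compounds are ordered by increasing response; every `step`-th compound,
    beginning at ordered position `start`, goes to the test set. The first
    and last ordered positions (least and most active compounds) are never
    selected, so they always remain in the training set.
    """
    if not 1 <= start <= step:
        raise ValueError("start must lie between 1 and step")
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    # Positions 0 and len(order) - 1 are excluded by construction.
    test_positions = set(range(start, len(order) - 1, step))
    train = [idx for pos, idx in enumerate(order) if pos not in test_positions]
    test = [idx for pos, idx in enumerate(order) if pos in test_positions]
    return train, test

# Synthetic example: 100 compounds with distinct response values.
responses = [100.0 - i for i in range(100)]
train, test = split_by_ordered_response(responses, step=5, start=1)
print(len(train), len(test))
```

Calling the function again with `start=2`, `start=3` and so on corresponds to repeating the selection from a different starting position, which gives alternative training and test sets drawn from the same ordered list.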

Another way to obtain a possibly optimal division of the data set is to apply hierarchical clustering to the pIC50 values. In this study, clusters were obtained using the “between group linkage” method and the “squared Euclidean distance”, as implemented in the SPSS 21 software.

5.4.1 Internal validation

After calculation and storage of the models, predictions for the endpoints of the chemicals in the training set are used to assess the goodness-of-fit and the robustness of the model, which is a measure of how well the model accounts for the variance of the response in the training set. The internal validation parameters for each model are the squared correlation coefficient (R²) and the adjusted (for degrees of freedom) squared correlation coefficient (R²adj). Additionally, internal validation will be tested using parameters such as leave-one-out cross-validation (Q²LOO), leave-many-out cross-validation (Q²LMO), Y-scrambling, and the root mean squared errors on (a)
