Academic year: 2021


4. DISCUSSION

4.1. PCRs Yield And Coverage

The almost complete superimposition of the PCR yield (Figure r3) and global coverage (Figure r7) plots indicates that attempts to achieve an equimolar amount of amplicons were not entirely successful. Although the volume of each pooled amplicon was carefully balanced, low-yield or failed PCRs were under-represented in the sequencing output. In addition, an error occurred during the evaluation of PCR yields: when a score was assigned based on the brightness of each band, the length of the amplified sequence was not taken into account. It is worth recalling that the brightness of a band is linearly correlated with the number of nucleotides present in that band. Consequently, a 4 kbp band with the same brightness as a 2 kbp band contains half as many molecules. If this is not taken into account at the scoring step, the two PCR products are pooled in the same amount and, as a result, the 4 kbp product is under-represented in the final pooled DNA mix.
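The length correction described above can be illustrated with a minimal sketch. It is not the scoring procedure used in this study: the function names, the brightness values and the reference volume are hypothetical, and the only assumption taken from the text is that band brightness is proportional to the total number of nucleotides in the band.

```python
def relative_molecules(brightness, length_bp):
    """Relative number of molecules in a band.

    Assumes brightness is proportional to the total number of nucleotides,
    so dividing by the fragment length gives a value proportional to the
    number of molecules.
    """
    return brightness / length_bp


def equimolar_volumes(bands, reference_volume=1.0):
    """Return the volume of each PCR product to pool for equal molar amounts.

    bands: dict mapping an amplicon name to (brightness, length_bp).
    The most concentrated product (in molecules) gets reference_volume;
    every other product is scaled up in proportion.
    """
    molar = {name: relative_molecules(b, l) for name, (b, l) in bands.items()}
    highest = max(molar.values())
    return {name: reference_volume * highest / m for name, m in molar.items()}


# Hypothetical example: two bands of identical brightness (arbitrary units).
bands = {"short_2kbp": (1.0, 2000), "long_4kbp": (1.0, 4000)}
volumes = equimolar_volumes(bands)
for name, volume in volumes.items():
    print(f"{name}: pool {volume:.2f} volume units")
```

Under these assumptions the 4 kbp product has to be pooled at twice the volume of the 2 kbp product, which is exactly the factor that was lost when length was ignored at the scoring step.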

Although this error certainly contributes to the decrease in coverage of the longer amplicons (i.e. the genic regions) compared with the shorter ones (the Hominid regions), the decrease it causes should be proportional to the length of each amplicon (the longer the amplicon, the fewer molecules were pooled). As is clear from the two plots (Figures r3 and r7), this error cannot be solely responsible for the divergence in coverage between the two classes of amplicons. Figure r4 shows that PCR yield correlates inversely with the length of the amplified fragment: the longer the fragment, the lower the yield. This trend is not affected (if anything, it is reinforced) by the incorrect scoring step: once the biased scoring is taken into account, a low-yield PCR would get an even lower score. Therefore the PCR yield trend must be attributed to the DNA quality of the Daghestani samples, since the primer pairs had already been tested and all worked equally well. The scoring method (based only on the “b” and “e” rows) was optimized to check the Daghestani amplifications and was almost insensitive to the control population (whose DNA is presumably of better quality, having been recently extracted from cell cultures). It is indeed very likely that, in the time between DNA extraction (2005) and this study (2009), the Daghestani DNA molecules underwent slight degradation, which statistically affects long amplicons more than short ones.
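Why degradation penalizes long amplicons more than short ones can be sketched with a simple random-break model. This model is an assumption introduced here for illustration, not something measured in this study: strand breaks are taken to occur independently at a constant rate per base pair, so the fraction of templates still intact over a stretch of L bp is exp(-rate × L). The break rate used below is hypothetical.

```python
import math


def intact_fraction(length_bp, breaks_per_bp):
    """Fraction of templates with no break across length_bp.

    Assumes breaks are independent and uniformly distributed
    (Poisson model), which is an illustrative simplification.
    """
    return math.exp(-breaks_per_bp * length_bp)


# Hypothetical break rate: one break every 10 kbp on average.
BREAKS_PER_BP = 1e-4

for length in (500, 2000, 4000):
    fraction = intact_fraction(length, BREAKS_PER_BP)
    print(f"{length} bp amplicon: {fraction:.1%} of templates intact")
```

In this model doubling the amplicon length squares the intact fraction, so even slight degradation reduces the amplifiable template for the long genic amplicons more than for the short ones.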

4.2. Quality Controls

The quality-control results can be interpreted as follows.

A discordance rate of 0.75% (below 1% being reasonably acceptable), shown in Table r2, means that 99.25% of the sequenced bases were processed correctly. This figure obviously excludes bases that were not sequenced at all, or whose coverage or base calling was not good enough to pass the genotype filters. The genotype filters (coverage threshold and allele ratio) were tuned through a feedback process aimed at minimizing the discordance rate. As a result, the amount of missing data increased, since the positions falling below the filter thresholds were added to the positions that were not sequenced at all. Discarding such a large amount of data is acceptable, because reducing the amount of data to maximize its quality is more effective than keeping the complete data set and greatly lowering confidence in it.
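The trade-off between discordance and missing data can be sketched as follows. This is not the pipeline used in this study: the threshold values, the genotype labels and the way the allele ratio is applied are all hypothetical, chosen only to show how positions that fail either filter end up as missing data and are excluded from the discordance rate.

```python
def call_genotype(ref_reads, alt_reads, min_coverage=8,
                  hom_max_minor=0.1, het_min_minor=0.3):
    """Call a genotype from read counts, or return None (missing data).

    All thresholds are hypothetical. A position is dropped when its
    coverage is too low or when the minor-allele ratio falls between
    the homozygous and heterozygous bands (ambiguous).
    """
    coverage = ref_reads + alt_reads
    if coverage < min_coverage:
        return None  # fails the coverage filter
    alt_fraction = alt_reads / coverage
    minor = min(alt_fraction, 1 - alt_fraction)
    if minor <= hom_max_minor:
        return "alt/alt" if alt_fraction > 0.5 else "ref/ref"
    if minor >= het_min_minor:
        return "ref/alt"
    return None  # fails the allele-ratio filter


def discordance_rate(calls, truth):
    """Fraction of called (non-missing) genotypes that disagree with truth."""
    compared = [(c, t) for c, t in zip(calls, truth) if c is not None]
    if not compared:
        return 0.0
    return sum(c != t for c, t in compared) / len(compared)


# Hypothetical read counts (reference reads, alternative reads) per position.
counts = [(20, 0), (10, 10), (3, 1), (16, 4), (0, 15)]
calls = [call_genotype(r, a) for r, a in counts]
missing = sum(c is None for c in calls) / len(calls)
print(calls, f"missing: {missing:.0%}")
```

Tightening either threshold in this sketch moves more positions into the missing category while leaving only the better-supported calls to be compared, which mirrors the choice made above to favour quality over completeness.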

The false negative (FN) rate measures the percentage of data (in this case non-reference SNPs) that should have been included in the final output but was not. It essentially indicates the amount of missing information. Table r2 shows that the overall missing data (positions that were not sequenced or did not pass the genotype filters) amount to 27%. This high rate of missing data is consistent with the low PCR yields and, therefore, the low coverage discussed above. With more than one fourth of the overall sequence missing, an FN rate of 16% is acceptable and expected.

Concerning the false positive (FP) rate (Table r4), a value of 35% of the SNPs found being potential false results is higher than expected. Further filtering did not give acceptable results: it improved the FP rate but enormously increased the FN rate. It is likely that adjusting the filtering parameters based on the MAQ quality score will ultimately bring a balance between the FP and FN rates. As half of the samples have yet to be sequenced and the dataset is therefore incomplete, further filtering will be attempted once all the sequences are available. It is worth emphasizing that no check on inserted or deleted (indel) regions has been performed. Indels are known to reduce alignment quality, and misaligned reads provide false evidence for a SNP at those positions. Consequently, it is very likely that the SNP positions surrounding an indel are misread and contribute to the high false positive rate.

4.3. Demographic Events And Selective Sweeps

The small number of samples (at most 12 individuals per population, in the Laks) makes statistical processing of the currently available data meaningless. The very high p values found for the summary statistics produced so far further confirm the empirical finding that at least 30 chromosomes are needed to get reliable results (Pickrell, Coop et al. 2009). At this stage it is therefore premature to comment on the demographic events or selective sweeps that could have affected the studied populations. Once all samples have been sequenced, the presence of selective sweeps in a given genic region could be detected and narrowed down, and the mutation responsible for such a signal characterised.

4.4. SNPs Analysis

Despite the small number of samples obtained so far, the SNP positions obtained from the re-sequenced data can still be analyzed, bearing in mind the high FP rate (35%) encountered. To reduce the risk that a non-synonymous or, more generally, “interesting” SNP is merely a false positive, the coverage and the other relevant parameters were carefully checked to guarantee high confidence in that SNP call. This does not remove the risk of a given position being a false result, but it at least ensures that the position was reliably sequenced.

Of the 39 SNPs found in exons (out of 1248 found in coding regions), 20 were non-synonymous (NS), resulting in a change in the amino acid sequence of the corresponding protein. Of these 20 NS SNPs and, therefore, amino acid substitutions, 4 are reported as “damaging”, while 12 are tolerated and, at this stage, must be considered neutral (4 NS SNPs were automatically excluded from the analysis by SIFT). Once the data are complete, a more refined analysis will make it possible to determine whether these loci are under selection and, therefore, whether even a tolerated NS SNP should be regarded as a sign of adaptation. The same is true for the synonymous SNPs and for those present in UTRs or untranscribed regions.

The 4 “damaging” mutations identified in silico in the HIF1α, NOS3, ACE and EPOr genes have a high probability of changing the function of these proteins. Their in vivo effects are unknown, but they seem to be good candidates to account for the genetic adaptation of the Daghestani populations to high altitude. As shown in Figure r11, three of the four damaging NS SNPs are not present in the populations living at high altitude, whereas one of them (#HighAltitude917) is present at a higher frequency in the Kubachians. Owing to the small amount of data it is not possible to make a reliable prediction, but further data may confirm this trend. It could hence be argued that the mutation #HighAltitude917 enhances the positive effects of EPOr (by either increasing or reducing its activity), whereas the #HighAltitude591, #HighAltitude1135 and #HighAltitude1366 mutations are deleterious for adaptation to a hypoxic environment and were removed from the gene pool of the Daghestani populations.
