• Non ci sono risultati.

A bird’s-eye view of Italian genomic variation through whole-genome sequencing

N/A
N/A
Protected

Academic year: 2021

Condividi "A bird’s-eye view of Italian genomic variation through whole-genome sequencing"

Copied!
9
0
0

Testo completo

(1)

AperTO - Archivio Istituzionale Open Access dell'Università di Torino

Original Citation:

A bird’s-eye view of Italian genomic variation through whole-genome sequencing

Published version:

DOI:10.1038/s41431-019-0551-x

Terms of use:

Open Access

(Article begins on next page)

Anyone can freely access the full text of works made available as "Open Access". Works made available under a Creative Commons license can be used according to the terms and conditions of said license. Use of all other works requires consent of the right holder (author or publisher) if not exempted from copyright protection by the applicable law.

Availability:

This is a pre print version of the following article:

(2)

FVG CAR VBI SAU ILL RSI ERT CLZ SMC (a) CAR FVG VBI TSI 0 10 20 0.01 0.03 0.05 0.49 MAF Prop ortion of sites (%) (b) 0 2500000 5000000 7500000 10000000 12500000 1000GP hase3 EUR -1000G Phase3 UK10K+1 000GPh ase3 Sites MAF > 5% Sites 1% < MAF <= 5% Sites MAF <= 1% (c)

Figure (1) a) Geographic localization of the three study cohorts. b) Minor allele frequency spectrum

of the final INGI data set. For comparison, the Minor allele frequency spectrum of the TSI cohort from 1000GPhase 3 data has been added. c) Novel sites identified in the whole INGI dataset, compared to available resources. The majority of the private INGI sites are in the range of the rare variants (singletons sites are included).

(3)

NW-ITA VBI CAR FVG 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.010.010.010.010.010.010.010.010.010.010.010.01 0.020.020.020.020.020.020.020.020.020.020.020.02 0.050.050.050.050.050.050.050.050.050.050.050.05 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.010.010.010.010.010.010.010.010.010.010.010.01 0.020.020.020.020.020.020.020.020.020.020.020.02 0.050.050.050.050.050.050.050.050.050.050.050.05 0.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.0050.005 0.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.010.01 0.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.020.02 0.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.050.05 0 500,000 1,000,000 1,500,000 0 500,000 1,000,000 1,500,000 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 MAF bins Number of variants Mean IM PUTE r^2 REFERENCE IGRP1.0 1000GPhase3 Info score 0.2 0.4 1

Figure (2) Mean values of r2 stratified by minor allele frequency (colored lines) and values of info

scores stratified by minor allele frequency (bar plot) for Italian cohorts. An outbred cohort from North Italy (NW ITA) was included for comparison.

(4)

(a)

(5)

−0.05 0.00 0.05 0.10 0.15 0.20 −0.20 −0.15 −0.10 −0.05 0.00 0.05 PC1 9.01 % PC2 7.82 % FIN GBR IBS CAR TSI VBI CLZ SMC ERT SAU ILG RSI (a) −0.15 −0.10 −0.05 0.00 −0.05 0.00 0.05 0.10 0.15 0.20 PC3 7.53 % PC4 6.98 % (b)

ASW ACBLWK GWD MSL YRI ESN PEL JPT CDX KHV CHB CHS PUR CLM MXL PJL GIH BEB ITU STU ILG RSI SAUERTCLZ SMC FIN VBI CAR TSI IBS CEU GBR ASW ACB LWK GWD MSL YRI ESN PEL JPT CDX KHV CHB CHS PUR CLM MXL PJL GIH BEB ITU STU ILG RSI SAU ERT CLZ SMC FIN VBI CAR TSI IBS CEU GBR 00.05 0.1 0.15 Value 0 40 80 Count FST (c) Drift parameter 0.000 0.001 0.002 0.003 0.004 0.005 0.006 TSI SMC GBR RSI SAU CAR ILG VBI FIN ERT IBS CLZ 10 s.e. (d) 0.00 0.05 0.10

FIN GBR IBS CAR TSI VBI CLZ SMC ERT SAU ILG RSI

Total Homozygosity (Mb)

(e)

0.00

0.05

0.10

FIN GBR IBS CAR TSI VBI CLZ SMC ERT SAU ILG RSI

Inbreeding Coefficient

(f)

Figure (4) Population structure: PCA of Italian samples. a) first two PC and b) the 3rd PC versus

the 4th PC. Variance explained by the axis is reported. Each population from FVG has its own axis of variation. The first axis separates ILG from all other Italian populations. The second separates SAU from RSI; c) Pairwise matrix of FST; d) TREEMIX graph analyses with 3 migration edges, link between North Euopean population and some isolates are shown; e) Distribution of total amount of homozygosity in the all cohort minimum ROH length was set at 1 Mb; f) Beanplots of Inbreeding coefficient of 1000G European populations and Italian populations. All FVG population have higher inbreeding coefficient respect to other Italian and European population with the exception of FIN. The plot shows as in the INGI populations the distribution of the inbreeding values is more sparse respect to the reference Italian population TSI from 1000G.

(6)

TSI CAR VBI CLZ SMC ERT SAU ILG RSI Genes under Selection |iHS|>=2

Populations Number of genes 0 100 200 300 400

Shared with TSI Different from TSI

Figure (5) Proportion of genes with signatures of selection in INGI subpopulations respect to TSI. Blue

colour represents the fraction of shared genes with TSI, orange colour represent the fraction of private genes respect to TSI.

(7)

0_5 5_15 15_20 >20 CAR_TSI DVxy AC 3−5

CADD scores bin

0.6 0.8 1.0 1.2 1.4 1.6 0.6 0.8 1.0 1.2 1.4 1.6 ● ● ● ● 0_5 5_15 15_20 >20 VBI_TSI DVxy AC 3−5

CADD scores bin

0.6 0.8 1.0 1.2 1.4 1.6 0.6 0.8 1.0 1.2 1.4 1.6 ● ● ● ● 0_5 5_15 15_20 >20 CLZ_TSI DVxy AC 3−5

CADD scores bin

0.6 0.8 1.0 1.2 1.4 1.6 0.6 0.8 1.0 1.2 1.4 1.6 ● ● ● ● 0_5 5_15 15_20 >20 SMC_TSI DVxy AC 3−5

CADD scores bin

0.6 0.8 1.0 1.2 1.4 1.6 0.6 0.8 1.0 1.2 1.4 1.6 ● ● ● ● 0_5 5_15 15_20 >20 ERT_TSI DVxy AC 3−5

CADD scores bin

0.6 0.8 1.0 1.2 1.4 1.6 0.6 0.8 1.0 1.2 1.4 1.6 ● ● ● ● 0_5 5_15 15_20 >20 SAU_TSI DVxy AC 3−5

CADD scores bin

0.6 0.8 1.0 1.2 1.4 1.6 0.6 0.8 1.0 1.2 1.4 1.6 ● ● ● ● 0_5 5_15 15_20 >20 ILG_TSI DVxy AC 3−5

CADD scores bin

0.6 0.8 1.0 1.2 1.4 1.6 0.6 0.8 1.0 1.2 1.4 1.6 ● ● ● ● 0_5 5_15 15_20 >20 RSI_TSI DVxy AC 3−5

CADD scores bin

0.6 0.8 1.0 1.2 1.4 1.6 0.6 0.8 1.0 1.2 1.4 1.6 ● ● ● ●

Figure (6) DVxy statistic for each INGI cohort using as reference the TSI population using variants

between 3-5 allele count (AC) and binned for CADD score as show in the x axis. Confidence intervals were created using the distribution of all 22 chromosomes and represent 1 standard deviation. A value =1 means no enrichment, a value <1 means depletion in the INGI coohort respect to TSI and a value >1 means enrichment respect to the reference.

(8)

CAR VBI CLZ SMC ER T SA U ILG RSI RSI ILG SAU ERT SMC CLZ VBI CAR 23 17 16 19 23 31 22 0 24 22 20 23 29 24 0 28 17 21 22 21 0 20 17 19 19 0 19 15 16 0 20 12 0 16 0 0

Figure (7) Heatmap of the ratio of DV variants respect to the Italian reference ( 3-5 AC ,CADD≥20,

fre-quency fold enrichment≥3) that are different between pairs of INGI sub-populations with the DV variants that are shared between pairs.

(9)

Figure (8) Venn diagram shows how the 133 genes harbouring HKO (with at least 1 homozygous individual) are distributed among INGI cohorts.

Riferimenti

Documenti correlati

PARK2 mutations also explain ~15% of the sporadic cases with onset before 45 [1, 2] and act as susceptibility alleles for late-onset forms of Parkinson disease (2% of cases) [3].

In questo articolo racconteremo la nostra esperienza di soccorso post ter- remoto del 24 agosto 2016 ad Amatrice e nelle zone limitrofe, in particolare l’eperienza

Figure S2: Diurnal profiles of foliar leaf gas exchange and chlorophyll fluorescence parameters in Quercus cerris plants (i) regularly irrigated to maximum soil water holding

For the purpose of exploring the process of adaptation to the market requirements with regard to blueberries by a group of farms associated with a fruit and vegetable

During each iteration, the algorithm associates each column (resp. row) to the nearest column (resp. row) cluster which does not introduce any cannot-link violation. row) is involved

In this work, the lycopene level, the antioxidant activity, and the bioaccessibility of lycopene and polyphenols in “Pizza Napoletana marinara” and other similar pizzas not subjected

In this field several researches established the relationship between AS and panic: subjects with High AS (HAS), compared to Low AS (LAS), had greater panic-like responses when the

Dialects differ from one another and from Italian with respect to the possessive forms available in their lexicon (clitic, weak, strong pos- sessives; cf. Cardinaletti’s 1998