CRITICAL COMPARISON OF SAMPLE PREPARATION STRATEGIES FOR SHOTGUN PROTEOMIC ANALYSIS OF FORMALIN-FIXED, PARAFFIN-EMBEDDED SAMPLES
Marcello Abbondio, Alessandro Tanca*, Salvatore Pisanu, Sergio Uzzau, Daniela Pagnozzi, Maria Filippa Addis
Porto Conte Ricerche Srl, Tramariglio, Alghero (SS), Italy; *tanca@portocontericerche.it
1. INTRODUCTION
The growing field of formalin-fixed, paraffin-embedded (FFPE) tissue proteomics holds promise for improving translational research. Archival tissue banks worldwide hold a large number and variety of tissue samples, together with a wealth of retrospective information on diagnosis, prognosis, and response to therapy, which makes them an important resource for protein biomarker discovery and validation. Direct tissue trypsinization (DT) and protein extraction followed by in-solution digestion (ISD) or filter-aided sample preparation (FASP) are the most common workflows for shotgun LC-MS/MS analysis of FFPE samples. However, there is currently no consensus on the optimal protocol, and no study critically comparing the performance of the three methods on FFPE specimens has been reported so far. Liver tissue was chosen as a model because of its high proteome complexity in terms of expressed proteins and metabolic pathways.
4. CONCLUSIONS
These results highlight that diverse sample preparation strategies provide qualitatively and quantitatively different proteomic information, and present typical biases that should be taken into account when planning a shotgun proteomic investigation dealing with FFPE samples. In view of the considerable portion of unique identifications provided by each method (particularly by DT and FASP), when a sufficient amount of tissue is available, a complementary, parallel use of different sample preparation strategies is suggested to increase proteome coverage, width and depth.
3.2. QUALITATIVE AND QUANTITATIVE COMPARISON
[Figure, panels A-F. A, B: hierarchical clustering of the nine samples (DT 1-3, FASP 1-3, ISD 1-3). C, D: Venn diagrams for DT, FASP and ISD; 1274 proteins (50.0%) and 3595 peptides (26.6%) are common to all three methods. E, F: dot plots of Log10 NSAF for each pair of methods; Pearson r = 0.952, 0.897, 0.941 at the protein level and r = 0.775, 0.576, 0.622 at the peptide level.]
Top: Unsupervised hierarchical cluster analysis based on protein (A) and peptide (B) label-free quantitative data, respectively.
Middle: Venn diagrams illustrating the distribution of all identified proteins (C) and peptides (D). Percentages of common proteins and peptides are indicated in yellow.
Bottom: Dot plots describing correlation of protein (E) and peptide (F) abundance between DT and FASP, DT and ISD, FASP and ISD. Pearson correlation coefficients are also reported.
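The correlation values in panels E and F can be reproduced in outline as follows. This is a minimal sketch, assuming that the Pearson coefficient is computed on log10-transformed NSAF values and that only entries quantified in both samples are compared; neither assumption is stated on the poster, and the function name and example values are hypothetical.

```python
import math


def pearson_log10(x, y):
    """Pearson r between two abundance vectors after log10 transformation.

    Entries that are zero (not quantified) in either vector are skipped,
    because log10(0) is undefined. This filtering rule is an assumption.
    """
    pairs = [(math.log10(a), math.log10(b))
             for a, b in zip(x, y) if a > 0 and b > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two entries quantified in both samples")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical abundance values for the same proteins in two samples.
    sample_1 = [120.0, 35.0, 8.0, 0.0, 410.0]
    sample_2 = [98.0, 41.0, 5.0, 12.0, 377.0]
    print(round(pearson_log10(sample_1, sample_2), 3))
```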
3. RESULTS AND DISCUSSION
DT
• lower reproducibility
• good preservation of high-MW proteins
• much lower keratin contamination
• higher abundance of non-tryptic peptides
FASP AND ISD
• depletion of high-MW proteins
• enrichment in hydrophobic and membrane proteins
FASP
• higher identification yields
ISD
• higher reproducibility
3.1. REPRODUCIBILITY
[Figure, panels A (proteins) and B (peptides). Top: Venn diagrams of replicates 1-3 for each method. Bottom: dot plots of Log10 NSAF for each pair of replicates.]
Proteins common to all three replicates: DT 1126 (58.0%), FASP 1353 (65.2%), ISD 1124 (69.5%).
Pearson r between replicate pairs, protein level: DT 0.910, 0.957, 0.931; FASP 0.995, 0.974, 0.978; ISD 0.994, 0.987, 0.988.
Peptides common to all three replicates: DT 2895 (32.6%), FASP 4623 (44.1%), ISD 2798 (56.5%).
Pearson r between replicate pairs, peptide level: DT 0.582, 0.561, 0.488; FASP 0.920, 0.705, 0.728; ISD 0.899, 0.833, 0.822.
Qualitative and quantitative reproducibility of DT, FASP and ISD.
A) Top: distribution of identified proteins among replicates. Percentages of common proteins are indicated in yellow.
Bottom: correlation of protein abundance between all replicate combinations for every method. Pearson correlation coefficients are also reported.
B) Same as Panel A but at peptide level.
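The "percentage of common proteins" reported for each Venn diagram can be illustrated with a short sketch. It assumes that the percentage is the number of identifications shared by all replicates divided by the number found in at least one replicate; the function name and the accession-like labels are hypothetical.

```python
def shared_percentage(*replicates):
    """Percentage of all identified items that occur in every replicate.

    Each argument is an iterable of identifiers (protein accessions or
    peptide sequences) from one replicate.
    """
    sets = [set(r) for r in replicates]
    if not sets:
        raise ValueError("at least one replicate is required")
    union = set().union(*sets)
    if not union:
        return 0.0
    common = set.intersection(*sets)
    return 100.0 * len(common) / len(union)


if __name__ == "__main__":
    # Hypothetical identifications from three replicates of one method.
    rep_1 = ["P1", "P2", "P3", "P4"]
    rep_2 = ["P2", "P3", "P4", "P5"]
    rep_3 = ["P3", "P4", "P6"]
    print(shared_percentage(rep_1, rep_2, rep_3))
```

The same function applies to the comparison among methods if the three arguments are the pooled identifications of DT, FASP and ISD.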
3.5. NON-TRYPTIC AND FORMALDEHYDE-MODIFIED PEPTIDES
[Figure, panels A and B. Left: Venn diagrams for each method comparing ‘trypsin’ and ‘no enzyme’ searches (A; gain shown: DT +7.3%, FASP +3.8%, ISD +5.3%) and ‘no mod’ and ‘mod’ searches (B; gain shown: DT +3.6%, FASP +1.5%, ISD +2.1%). Right: Venn diagrams of non-tryptic (A) and formaldehyde-modified (B) peptides among DT, FASP and ISD.]
A) Left: distribution of peptides identified with ‘trypsin’ and ‘no enzyme’ searches in DT, FASP and ISD samples. Right: distribution of non-tryptic peptides among all methods.
B) Left: distribution of peptides identified with standard search (‘no mod’) and search comprising formaldehyde-induced modifications (‘mod’) in DT, FASP and ISD samples. Right: distribution of formaldehyde-modified peptides among all methods.
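To make the distinction between tryptic and non-tryptic peptides concrete, the following sketch classifies a peptide by its termini within the parent protein. It is an illustration only, not the search-engine logic used in the study: it assumes that trypsin cleaves C-terminal to K or R, ignores the proline rule and N-terminal methionine removal, and uses a hypothetical function name and example sequence.

```python
def is_fully_tryptic(protein, peptide):
    """Return True if both peptide termini are consistent with trypsin cleavage.

    The N-terminus is accepted if the peptide starts the protein or is
    preceded by K/R; the C-terminus is accepted if the peptide ends the
    protein or itself ends with K/R. Only the first occurrence of the
    peptide in the protein is examined.
    """
    if not peptide:
        raise ValueError("peptide must not be empty")
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    c_term_ok = end == len(protein) or peptide[-1] in "KR"
    return n_term_ok and c_term_ok


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVK"  # hypothetical sequence
    for pep in ("TAYIAK", "AYIAK", "QISFVK"):
        print(pep, is_fully_tryptic(protein, pep))
```

A peptide that fails this check at one or both termini would only be identifiable with a semi-specific or ‘no enzyme’ search.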
3.4. QUANTITATIVE PROTEIN DISTRIBUTION: PHYSICOCHEMICAL FEATURES
[Figure, panels A-D: bar charts of % NSAF for DT, FASP and ISD. A: MW (kDa), bins 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, 150-200, >200. B: pI, bins <5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, >11. C: TMD>0, TMD>1, TMD>2. D: GRAVY >0.5, as % of proteins and as % NSAF.]
Quantitative protein distribution according to MW (A), pI (B), number of transmembrane domains (TMD, C) and hydrophobicity (GRAVY score, D). Mean and SD value of NSAF percentage for three independent experimental replicates are shown. NSAF values were expressed as percentage of all proteins.
Asterisks indicate statistical significance according to Student’s t-test (p value < 0.05): statistically significant difference versus DT, versus FASP, versus ISD and versus all other methods.
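The GRAVY score used in panel D is the grand average of hydropathy, i.e. the mean hydropathy value over all residues of a sequence. The sketch below assumes the Kyte-Doolittle scale, which is the usual basis for GRAVY; the poster does not state which scale or tool was used, and the function name and example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean hydropathy of a protein sequence (one-letter codes).

    Raises KeyError for non-standard residue codes.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def is_hydrophobic(sequence, threshold=0.5):
    """Apply the GRAVY > 0.5 cut-off shown in panel D."""
    return gravy(sequence) > threshold


if __name__ == "__main__":
    for seq in ("ILVFAGLLIV", "MKDEERKSTQ"):  # hypothetical sequences
        print(seq, round(gravy(seq), 2), is_hydrophobic(seq))
```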
3.3. QUANTITATIVE PROTEIN DISTRIBUTION: SUBCELLULAR LOCALIZATION
[Figure: bar chart of NSAF for DT, FASP and ISD across subcellular localization categories: Extracellular matrix, Secreted, Membrane, Multi-pass membrane protein, Single-pass membrane protein, Peripheral membrane protein, Lipid-anchor, Cell membrane, Cytoplasm, Cytosol, Cytoskeleton, Nucleus, Nucleus membrane, Nucleus matrix, Nucleolus, Endoplasmic reticulum membrane, Endoplasmic reticulum lumen, Golgi apparatus, Mitochondrion outer membrane, Mitochondrion matrix, Mitochondrion inner membrane, Lysosome.]
Mean and SD values of NSAF percentage for three independent experimental replicates are shown. NSAF values were expressed as percentage of the annotated proteins.
Asterisks indicate statistical significance according to Student’s t-test (p value < 0.05): statistically significant difference versus DT, versus FASP, versus ISD and versus all other methods.
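The significance marks in these plots rest on Student’s t-test between methods, with three replicates each. The following sketch shows the pooled-variance t statistic and a decision at p < 0.05. It assumes a two-sided test and hard-codes the critical value for 4 degrees of freedom (3 + 3 replicates - 2), which is 2.776; the function names and the example percentages are hypothetical.

```python
import math

# Two-sided critical value of Student's t for alpha = 0.05 and df = 4.
T_CRIT_DF4 = 2.776


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    if se == 0:
        raise ValueError("t statistic is undefined when both groups have zero variance")
    return (mean_a - mean_b) / se, df


def differs_at_5_percent(a, b):
    """True if |t| exceeds the two-sided 5% critical value (3 vs 3 replicates only)."""
    t, df = student_t(a, b)
    if df != 4:
        raise ValueError("critical value is hard-coded for 3 vs 3 replicates")
    return abs(t) > T_CRIT_DF4


if __name__ == "__main__":
    # Hypothetical % NSAF values of one category in three replicates per method.
    dt = [10.0, 11.0, 12.0]
    fasp = [14.0, 15.0, 16.0]
    print(student_t(dt, fasp), differs_at_5_percent(dt, fasp))
```

For other replicate numbers, the critical value would have to be taken from a t table or a statistics library instead of the constant used here.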
2. METHODS
HUMAN LIVER TISSUE
• 3 independent replicates per method
• 5 slices (5 μm thick) per replicate
DEPARAFFINIZATION & REHYDRATION
DIRECT TISSUE TRYPSINIZATION (DT)
• Ammonium bicarbonate 50 mM
PROTEIN EXTRACTION
• SDS 2 %, DTT 200 mM, Tris–HCl (pH 8.8) 20 mM; 99 °C for 60 min
• FASP: Microcon YM-30
• IN-SOLUTION DIGESTION (ISD): Detergent Removal Spin Columns
TRYPSIN DIGESTION → PEPTIDE MIXTURE → LC-MS/MS
• UltiMate 3000 RSLCnano LC system, 485 min gradient
• LTQ Orbitrap Velos, HCD
PROTEIN IDENTIFICATION
• Search engine: Sequest-HT
• Peptide validation: Percolator
• FDR < 1 % based on peptide q-value
DATA ANALYSIS
• Label-free quantification via spectral counting
• Multivariate statistics using Perseus
COMPARISON
• Reproducibility
• Qualitative-quantitative
• Subcellular localization, pI, MW, GRAVY, TMD
• Formaldehyde-modified and non-tryptic peptides
$\mathrm{NSAF} = \dfrac{\mathrm{SpC}/L}{\sum (\mathrm{SpC}/L)}$
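The NSAF definition above can be sketched in a few lines. Here SpC is the spectral count of a protein and L its length in amino acids, following the usual definition of the normalized spectral abundance factor; the sum runs over all proteins identified in the sample. The function name, dictionary layout and example values are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Return {protein: NSAF} for one sample.

    spectral_counts: {protein: number of spectra assigned to the protein}
    lengths:         {protein: protein length in amino acids}
    """
    saf = {}
    for protein, spc in spectral_counts.items():
        length = lengths[protein]
        if length <= 0:
            raise ValueError(f"invalid length for {protein}")
        saf[protein] = spc / length  # spectral abundance factor, SpC/L
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra in sample")
    # Normalize so that the NSAF values of a sample sum to 1.
    return {protein: value / total for protein, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical spectral counts and lengths for three proteins.
    counts = {"P1": 10, "P2": 20, "P3": 30}
    lengths = {"P1": 100, "P2": 400, "P3": 300}
    for protein, value in nsaf(counts, lengths).items():
        print(protein, round(value, 3))
```

Dividing by length corrects for the fact that longer proteins tend to produce more spectra, and the normalization makes values comparable between runs with different total spectral counts.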
5. REFERENCES
• Tanca A, Abbondio M, Pisanu S, Pagnozzi D, Uzzau S, Addis MF: Critical comparison of sample preparation strategies for shotgun proteomic analysis of formalin-fixed, paraffin-embedded samples: insights from liver tissue. Clin Proteomics 2014, 11(1):28.
• Tanca A, Pagnozzi D, Addis MF: Setting proteins free: Progresses and achievements in proteomics of formalin-fixed, paraffin-embedded tissues. Proteomics Clin Appl 2012, 6:7–21.
• Zybailov B, Mosley AL, Sardiu ME, Coleman MK, Florens L, Washburn MP: Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J Proteome Res 2006, 5:2339–2347.
• Ostasiewicz P, Zielinska DF, Mann M, Wisniewski JR: Proteome, phosphoproteome, and N-glycoproteome are quantitatively preserved in formalin-fixed paraffin-embedded tissue and analyzable by high-resolution mass spectrometry. J Proteome Res 2010, 9:3688–3700.
• Alkhas A, Hood BL, Oliver K, Teng PN, Oliver J, Mitchell D, Hamilton CA, Maxwell GL, Conrads TP: Standardization of a sample preparation and analytical workflow for proteomics of archival endometrial cancer tissue. J Proteome Res 2011, 10:5264–5271.
• Gamez-Pozo A, Ferrer NI, Ciruelos E, Lopez-Vacas R, Martinez FG, Espinosa E, Vara JA: Shotgun proteomics of archival triple-negative breast cancer samples. Proteomics Clin Appl 2013, 7:283–291.