All of the code used in our analysis can be viewed in our GitHub repository.
This collection of scripts was used to analyze the proteomics data from the manuscript titled “Redefining the breast cancer exosome proteome by tandem mass tag quantitative proteomics and multivariate cluster analysis” by David J Clark et al. Starting from the peptide ratios calculated with QuantiMORE (formerly IsoQuant), the scripts calculate protein ratios and perform SVM multivariate analysis to cluster proteins into groups that are likely or unlikely to be of exosomal origin.
Shotgun proteomics data were acquired using triplex tandem mass tags (TMT), specifically TMT-129, TMT-130, and TMT-131, in two replicates. The three tags represent different points in a traditional exosome isolation strategy: TMT-129 is a 10,000 x g pellet, TMT-130 is a 100,000 x g supernatant, and TMT-131 is the exosome fraction from an OptiPrep density gradient. The raw mass spectra were searched using Comet. Peptide IDs were validated using PeptideProphet, and all peptides above 0.8 PeptideProphet probability (1% FDR) were kept for quantitation. ProteinProphet was used to infer protein evidence from all identified peptides. Peptide quantitation was performed on the filtered PeptideProphet results using QuantiMORE, and these results are the input for these scripts.
Peptide Quantitation Results from QuantiMORE
“noC” indicates that we used Excel's find-and-replace to replace all commas with semicolons in order to avoid CSV parsing issues.
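If you regenerate the QuantiMORE export yourself, the same comma-to-semicolon substitution can be done in R rather than Excel. This is an R analogue, not the procedure we used; `strip_commas()` is a hypothetical helper that only touches character columns, so numeric ratio columns are left unchanged.

```r
# Hypothetical helper (not part of our scripts): replace every comma with a
# semicolon in the character columns of a data frame, mirroring the Excel
# find-and-replace step. Numeric columns are not modified.
strip_commas <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], gsub,
                       pattern = ",", replacement = ";", fixed = TRUE)
  df
}

# Toy example; the second accession in the first row is a placeholder
quant <- data.frame(
  protein = c("A0AVT1,Q00000", "A0FGR8-2"),
  ratio   = c(0.66, 1.05),
  stringsAsFactors = FALSE
)
strip_commas(quant)
```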
Results from ProteinProphet
These are the results from ProteinProphet, exported as an Excel spreadsheet and then saved as a CSV file.
FASTA Protein Database
This is needed to get the protein descriptions.
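Protein descriptions can be pulled from the FASTA headers. This sketch assumes UniProt-style headers (`>sp|ACCESSION|ENTRY_NAME Description OS=... GN=...`); `parse_uniprot_headers()` is a hypothetical helper rather than a function from our scripts, and a database with a different header format would need different patterns.

```r
# Hypothetical helper: extract accession, entry name, description and gene
# symbol from UniProt-style FASTA header lines. Sequence lines are ignored.
parse_uniprot_headers <- function(lines) {
  hdr <- lines[startsWith(lines, ">")]
  has_gn <- grepl("GN=", hdr, fixed = TRUE)
  data.frame(
    accession   = sub("^>[a-z]{2}\\|([^|]+)\\|.*$", "\\1", hdr, perl = TRUE),
    entry_name  = sub("^>[a-z]{2}\\|[^|]+\\|(\\S+).*$", "\\1", hdr, perl = TRUE),
    # Description is the text between the ID block and " OS="
    description = sub("^>\\S+\\s+(.*?)\\s+OS=.*$", "\\1", hdr, perl = TRUE),
    # Gene symbol from GN=, or NA when the header has no GN= field
    gene        = ifelse(has_gn,
                         sub("^.*GN=(\\S+).*$", "\\1", hdr, perl = TRUE),
                         NA_character_),
    stringsAsFactors = FALSE
  )
}

# Illustrative headers; "MXXXX" is a placeholder sequence and the second
# entry is hypothetical
fasta <- c(
  ">sp|A5D8V6|VP37C_HUMAN Vacuolar protein sorting-associated protein 37C OS=Homo sapiens GN=VPS37C",
  "MXXXX",
  ">tr|X00000|X00000_HUMAN Hypothetical protein OS=Homo sapiens",
  "MXXXX"
)
parse_uniprot_headers(fasta)
```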
The Exosome and Non-Exosome Markers
These are the markers we hand-picked as a training set for our SVM analysis.
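Before SVM training, each protein needs a marker label. As we understand pRoloc's convention, proteins that are not training markers are labelled "unknown" in a `markers` feature column. This sketch assigns classes by gene symbol; the gene/class pairs come from the marker table in this README, while `label_markers()` is a hypothetical helper.

```r
# A few of the hand-picked training markers and their classes
markers <- data.frame(
  gene  = c("CD9", "CD81", "SDC1", "SEC63", "TMX3", "HK2"),
  class = c("Exosome", "Exosome", "Exosome",
            "Non-Exosome", "Non-Exosome", "Non-Exosome"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up each gene's marker class and label
# everything that is not a marker as "unknown"
label_markers <- function(genes, markers) {
  cls <- markers$class[match(genes, markers$gene)]
  cls[is.na(cls)] <- "unknown"
  cls
}

label_markers(c("CD9", "SEC63", "VPS37C"), markers)
```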
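Protein ratios are calculated as the PSM-weighted mean of the peptide ratios, where \(PSMs_{i}\) is the number of peptide-spectrum matches for the \(i\)-th quantified peptide and \(PeptideRatio_{i}\) is that peptide's TMT ratio:
\[ \text{Protein Ratio} = \frac{\sum_{i=1}^{n} PSMs_{i} \times PeptideRatio_{i}}{\sum_{i=1}^{n} PSMs_{i}} \]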
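A minimal sketch of this PSM-weighted mean: the hypothetical `weighted_protein_ratio()` computes it for each protein in a toy peptide table. Column names and values are made up for illustration, and how ProtQuant.R groups peptides across replicates is not reproduced here.

```r
# Toy peptide-level input: one row per quantified peptide
peptides <- data.frame(
  protein = c("P1", "P1", "P1", "P2", "P2"),
  psms    = c(1, 2, 1, 3, 1),
  ratio   = c(0.60, 0.90, 0.30, 2.00, 1.00)
)

# sum(PSMs_i * PeptideRatio_i) / sum(PSMs_i);
# equivalent to stats::weighted.mean(ratio, psms)
weighted_protein_ratio <- function(psms, ratio) {
  sum(psms * ratio) / sum(psms)
}

# Apply the formula separately to each protein's peptides
protein_ratios <- sapply(
  split(peptides, peptides$protein),
  function(d) weighted_protein_ratio(d$psms, d$ratio)
)
protein_ratios
```

Here P1 gets (0.60 + 2 x 0.90 + 0.30) / 4 = 0.675, so the peptide with two PSMs counts twice as much as the others.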
ProtQuant_Functions.R - Contains the functions needed to read in the ProteinProphet files and perform the protein quantitation in ProtQuant.R.
ClusterAnalysis.R - Performs the SVM cluster analysis on the protein ratios obtained from ProtQuant.R. For our final analysis, the SVM parameters were optimized over 100 iterations (times=100) with 5-fold cross-validation (xval=5), following the pRoloc tutorial on Bioconductor.
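The optimisation and classification steps can be sketched with pRoloc's `svmOptimisation()` and `svmClassification()`. This assumes the protein ratios are held in an MSnSet whose feature data has a `markers` column (Exosome, Non-Exosome, or unknown); `run_svm()` is a hypothetical wrapper, not the code in ClusterAnalysis.R, and pRoloc must be installed to call it.

```r
# Hypothetical wrapper around pRoloc's SVM workflow. `msnset` is assumed to
# be an MSnSet of protein ratios with a "markers" feature column.
run_svm <- function(msnset, times = 100, xval = 5) {
  # Grid search over the SVM cost and sigma parameters: `times` random
  # train/test partitions, each tuned with `xval`-fold cross-validation
  params <- pRoloc::svmOptimisation(msnset, fcol = "markers",
                                    times = times, xval = xval,
                                    verbose = FALSE)
  # Classify non-marker proteins using the best-performing parameters
  pRoloc::svmClassification(msnset, params, fcol = "markers")
}
```

Calling `set.seed()` before `run_svm()` makes the random partitions, and therefore the selected parameters, reproducible.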
Make_Plots.R - Depends on Plot_Functions.R. Creates figures to represent our data using ggplot2. These were further annotated in Adobe Illustrator to yield the figures seen in the published manuscript. Note that the results from this script are written to a “Figures” folder in your working directory.
Plot_Functions.R - Contains the R code to create most of the presented figures using ggplot2.
Make_Tables.R - Creates intelligible tables from the dataframes used in this analysis. These tables are the unformatted versions of the tables seen in the published manuscript. Note that the results from this script are written to a “Tables” folder in your working directory.
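The output tables report each ratio as "mean +/- SD" alongside a log2 column. The hypothetical `format_ratio()` sketches that formatting; whether Make_Tables.R takes the log2 of the unrounded mean is not stated here, and this sketch simply uses the ratio it is given.

```r
# Hypothetical helper: format ratio and SD as "x.xxx +/- y.yyy" and add
# the log2-transformed ratio, both to `digits` decimal places
format_ratio <- function(ratio, sd, digits = 3) {
  fmt <- paste0("%.", digits, "f +/- %.", digits, "f")
  data.frame(
    ratio_sd = sprintf(fmt, ratio, sd),
    log2     = round(log2(ratio), digits),
    stringsAsFactors = FALSE
  )
}

format_ratio(c(2, 0.5), c(0.25, 0.1))
```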
To reproduce the analysis from our manuscript, run the following code:
```r
source("Scripts/ProteinQuant.R")
source("Scripts/ClusterAnalysis.R")
source("Scripts/Make_Plots.R")
source("Scripts/Make_Tables.R")
```
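If a script fails to load, it can help to confirm that each path resolves from your working directory first. The hypothetical `run_pipeline()` checks that every script exists, creates the “Figures” and “Tables” output folders in case the scripts do not, and then sources the scripts in the same order as the commands above.

```r
# Hypothetical helper: run the pipeline scripts in order from the
# repository root, stopping before anything runs if a script is missing
run_pipeline <- function(scripts = c("Scripts/ProteinQuant.R",
                                     "Scripts/ClusterAnalysis.R",
                                     "Scripts/Make_Plots.R",
                                     "Scripts/Make_Tables.R"),
                         out_dirs = c("Figures", "Tables")) {
  missing <- scripts[!file.exists(scripts)]
  if (length(missing) > 0) {
    stop("Missing script(s): ", paste(missing, collapse = ", "))
  }
  # Create output folders; no-op if they already exist
  for (d in out_dirs) dir.create(d, showWarnings = FALSE)
  # source() evaluates in the global environment by default, so objects
  # created by one script are visible to the next
  for (s in scripts) source(s)
  invisible(scripts)
}
```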
Below are the final tables output by the analysis scripts. Each contains the file name, description, and first six rows.
| Protein Group | Uniprot Accession | Gene Symbol | Protein Description | ProteinProphet Probability | SVM Classification | SVM Probability | TMT 130/129 +/- SD | Log2 TMT 130/129 | TMT 131/129 +/- SD | Log2 TMT 131/129 | Unique Peptides | Coverage (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A5D8V6 | VPS37C | Vacuolar protein sorting-associated protein 37C | 1 | Non-Exosome | 0.570 | 0.886 +/- 0.094 | -0.175 | 0.974 +/- 0.396 | -0.038 | 2 | 15.5 |
| 2 | B7ZAQ6-2 | GPR89A | Isoform 2 of Golgi pH regulator A | 1 | Non-Exosome | 0.900 | 0.333 +/- 0.053 | -1.585 | 0.225 +/- 0.085 | -2.152 | 1 | 4.8 |
| 3 | O00186 | STXBP3 | Syntaxin-binding protein 3 | 1 | Non-Exosome | 0.732 | 1.188 +/- 0.172 | 0.248 | 0.507 +/- 0.033 | -0.981 | 4 | 8.1 |
| 4 | O00220 | TNFRSF10A | Tumor necrosis factor receptor superfamily member 10A | 1 | Non-Exosome | 0.726 | 0.835 +/- 0.205 | -0.260 | 0.755 +/- 0.145 | -0.405 | 1 | 5.1 |
| 5 | O00232-2 | PSMD12 | Isoform 2 of 26S proteasome non-ATPase regulatory subunit 12 | 1 | Non-Exosome | 0.919 | 0.602 +/- 0.198 | -0.731 | 0.279 +/- 0.039 | -1.841 | 6 | 19.0 |
| 6 | O00442-2 | RTCA | Isoform 2 of RNA 3’-terminal phosphate cyclase | 1 | Non-Exosome | 0.848 | 0.248 +/- 0.038 | -2.010 | 0.192 +/- 0.002 | -2.383 | 1 | 8.2 |
| Uniprot Accession | Gene Name | Peptide Sequence | TMT 130/129 | TMT 131/129 | PSMs | Replicate |
|---|---|---|---|---|---|---|
| A0AVT1 | UBA6 | GMITVTDPDLIEK | 0.66 | 0.25 | 1 | 2 |
| A0AVT1 | UBA6 | LETGQFLTFR | 0.57 | 0.17 | 1 | 1 |
| A0AVT1 | UBA6 | QDVIITALDNVEAR | 0.10 | 0.26 | 1 | 2 |
| A0AVT1 | UBA6 | TVFFESLER | 0.71 | 0.44 | 2 | 1 |
| A0AVT1 | UBA6 | TVFFESLER | 0.28 | 0.42 | 1 | 2 |
| A0FGR8-2 | ESYT2 | ALALLEDEER | 1.05 | 0.53 | 1 | 1 |
| Gene Name | Description | Log2 TMT 130/129 | Log2 TMT 131/129 | Unique Peptides | Marker Class |
|---|---|---|---|---|---|
| SEC63 | Translocation protein SEC63 homolog | -0.873 | -2.000 | 7 | Non-Exosome |
| TMX3 | Protein disulfide-isomerase TMX3 | -1.247 | -2.059 | 3 | Non-Exosome |
| SDC1 | Syndecan-1 | 1.007 | 0.632 | 5 | Exosome |
| HK2 | Hexokinase-2 | -1.255 | -1.694 | 8 | Non-Exosome |
| CD9 | CD9 antigen | 2.513 | 2.304 | 5 | Exosome |
| CD81 | CD81 antigen | 3.034 | 2.474 | 3 | Exosome |
| Uniprot Accession | Protein Description | Gene Symbol | Protein Group | TMT 130/129 +/- SD | Log2 TMT 130/129 | TMT 131/129 +/- SD | Log2 TMT 131/129 | SVM Classification | SVM Probability |
|---|---|---|---|---|---|---|---|---|---|
| A5D8V6 | Vacuolar protein sorting-associated protein 37C | VPS37C | 1 | 0.886 +/- 0.094 | -0.175 | 0.974 +/- 0.396 | -0.038 | Non-Exosome | 0.570 |
| B7ZAQ6-2 | Isoform 2 of Golgi pH regulator A | GPR89A | 2 | 0.333 +/- 0.053 | -1.585 | 0.225 +/- 0.085 | -2.152 | Non-Exosome | 0.900 |
| O00186 | Syntaxin-binding protein 3 | STXBP3 | 3 | 1.188 +/- 0.172 | 0.248 | 0.507 +/- 0.033 | -0.981 | Non-Exosome | 0.732 |
| O00220 | Tumor necrosis factor receptor superfamily member 10A | TNFRSF10A | 4 | 0.835 +/- 0.205 | -0.260 | 0.755 +/- 0.145 | -0.405 | Non-Exosome | 0.726 |
| O00232-2 | Isoform 2 of 26S proteasome non-ATPase regulatory subunit 12 | PSMD12 | 5 | 0.602 +/- 0.198 | -0.731 | 0.279 +/- 0.039 | -1.841 | Non-Exosome | 0.919 |
| O00442-2 | Isoform 2 of RNA 3’-terminal phosphate cyclase | RTCA | 6 | 0.248 +/- 0.038 | -2.010 | 0.192 +/- 0.002 | -2.383 | Non-Exosome | 0.848 |
| SVM Classification | Gene Name | Description | Log2 TMT 130/129 | Log2 TMT 131/129 | SVM Probability |
|---|---|---|---|---|---|
| Exosome | ITGAV | Isoform 2 of Integrin alpha-V | 0.431 | -0.165 | 0.600 |
| Exosome | CD151 | CD151 antigen | 0.428 | -0.089 | 0.624 |
| Non-Exosome | ATP1A1 | Isoform 4 of Sodium/potassium-transporting ATPase subunit alpha-1 | 0.087 | -0.701 | 0.715 |
| Non-Exosome | YES1 | Tyrosine-protein kinase Yes | -0.595 | -1.161 | 0.904 |
| Non-Exosome | ATP2B3 | Isoform XA of Plasma membrane calcium-transporting ATPase 3 | 0.023 | -1.191 | 0.823 |
| Non-Exosome | ATP2B2 | Isoform YB of Plasma membrane calcium-transporting ATPase 2 | 0.012 | -0.887 | 0.781 |
| Uniprot Accession | Protein Description | Gene Symbol | Protein Group | ProteinProphet Probability |
|---|---|---|---|---|
| VP37C | Vacuolar protein sorting-associated protein 37C | VPS37C | 1 | 1 |
| SUSD5 | Sushi domain-containing protein 5 | SUSD5 | 10 | 1 |
| CHM4B | Charged multivesicular body protein 4b | CHMP4B | 100 | 1 |
| CDCP1 | CUB domain-containing protein 1 | CDCP1 | 101 | 1 |
| MUC5B | Mucin-5B | MUC5B | 102 | 1 |
| CD320 | CD320 antigen | CD320 | 103 | 1 |