Introduction

GitHub Repository

All of the code used in our analysis can be viewed in our GitHub repository.

Description

This collection of scripts was used to analyze the proteomics data from the manuscript titled “Redefining the breast cancer exosome proteome by tandem mass tag quantitative proteomics and multivariate cluster analysis” by David J Clark et al. Starting from the peptide ratios calculated by QuantiMORE (formerly IsoQuant), the scripts calculate the protein ratios and perform SVM multivariate analysis to classify proteins as likely exosomal or non-exosomal in origin.

Experiment Detail

Shotgun proteomics data were acquired using triplexed tandem mass tags (TMT), specifically TMT-129, TMT-130, and TMT-131, in 2 replicates. The 3 tags represent different points in a traditional exosome isolation strategy: TMT-129 is a 10,000 x g pellet, TMT-130 is a 100,000 x g supernatant, and TMT-131 is the exosome fraction from an Optiprep density gradient. The raw mass spectra were searched using Comet. Peptide IDs were validated using PeptideProphet, keeping all peptides above a PeptideProphet probability of 0.8 (1% FDR) for quantitation. ProteinProphet was used to infer protein evidence from all identified peptides. Peptide quantitation was performed on the filtered PeptideProphet results using QuantiMORE. These results are the input for the scripts.
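The probability filter described above can be sketched in a few lines of R. This is only an illustration: the data frame, its column names, and the peptide labels are hypothetical, and "above 0.8" is read here as a strict inequality, which is an assumption.

```r
# Hypothetical PeptideProphet-style results; names and values are made up.
psms <- data.frame(
  peptide     = c("PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"),
  probability = c(0.99, 0.75, 0.85, 0.80),
  stringsAsFactors = FALSE
)

min_probability <- 0.8  # threshold stated in the text

# Keep only peptides strictly above the threshold (assumed interpretation).
filtered <- psms[psms$probability > min_probability, ]
print(filtered$peptide)
```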

Inputs for analysis

Peptide Quantitation Results from QuantiMORE
“noC” indicates that we used Excel's search-and-replace to replace all commas with semicolons in order to avoid CSV parsing issues.
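The same comma-to-semicolon replacement could also be done in R rather than Excel. The sketch below uses a small made-up data frame (the column names and values are hypothetical, not the actual QuantiMORE layout) and shows that the file reads back cleanly once no field contains a comma.

```r
# Hypothetical QuantiMORE-style rows; field names and values are made up.
raw <- data.frame(
  peptide  = c("PEPTIDEA", "PEPTIDEB"),
  proteins = c("P11111,P22222", "P33333"),
  stringsAsFactors = FALSE
)

# Replace every literal comma in the character columns with a semicolon.
is_text <- vapply(raw, is.character, logical(1))
raw[is_text] <- lapply(raw[is_text], function(x) gsub(",", ";", x, fixed = TRUE))

# Write an unquoted CSV to a temporary file and read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(raw, csv_path, row.names = FALSE, quote = FALSE)
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
```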

Results from ProteinProphet
These are the results from ProteinProphet, exported as an Excel spreadsheet and then saved as a CSV file.

FASTA Protein Database
This is needed to get the protein descriptions.
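As a rough illustration of how descriptions can be pulled from a FASTA file, the sketch below parses header lines in base R. It assumes UniProt-style headers (`>db|accession|entry_name description OS=...`); the actual database format used in the analysis is not stated here, and the header lines, the second entry, and the sequences are placeholders.

```r
# Illustrative FASTA content; the second entry and both sequences are placeholders.
fasta_lines <- c(
  ">sp|A5D8V6|VP37C_HUMAN Vacuolar protein sorting-associated protein 37C OS=Homo sapiens GN=VPS37C",
  "ACDEFGHIK",
  ">sp|P00000|EXMPL_HUMAN Example protein OS=Homo sapiens GN=EXMPL",
  "ACDEFGHIK"
)

# Header lines start with ">".
headers <- fasta_lines[startsWith(fasta_lines, ">")]

# Capture groups: accession, entry name, description (text before " OS=").
pattern <- "^>[a-z]+\\|([^|]+)\\|(\\S+) (.*?) OS=.*$"

protein_info <- data.frame(
  accession   = sub(pattern, "\\1", headers, perl = TRUE),
  description = sub(pattern, "\\3", headers, perl = TRUE),
  gene        = sub(".* GN=(\\S+).*", "\\1", headers, perl = TRUE),
  stringsAsFactors = FALSE
)
```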

The Exosome and Non-Exosome Markers
These are the markers we hand-picked as a training set for our SVM analysis.

R scripts

  1. ProtQuant.R - Depends on ProtQuant_Functions.R. This script calculates the protein ratios for each replicate independently, based on the peptides assigned to each protein by ProteinProphet. The protein ratio is the average of the peptide ratios assigned to the protein, weighted by the number of quantified PSMs for each peptide. The calculation is shown below, where n is the number of peptides quantified for a given protein:

\[ \text{Protein Ratio} = \frac{\sum_{i=1}^{n} \text{PSMs}_{i} \cdot \text{PeptideRatio}_{i}}{\sum_{i=1}^{n} \text{PSMs}_{i}} \]

  2. ProtQuant_Functions.R - Contains the functions needed to read in the ProteinProphet files and perform the protein quantitation in ProtQuant.R.

  3. ClusterAnalysis.R - Performs the SVM cluster analysis on the protein ratios obtained from ProtQuant.R. For our final analysis, the SVM parameters were optimized over 100 iterations (times=100) with 5-fold cross-validation (xval=5). This follows the pRoloc tutorial on Bioconductor.

  4. Make_Plots.R - Depends on Plot_Functions.R. Creates figures to represent our data using ggplot2. These were further annotated in Adobe Illustrator to yield the figures seen in the published manuscript. Note that the results from this script are written to a “Figures” folder in your working directory.

  5. Plot_Functions.R - Contains the R code to create most of the presented figures using ggplot2.

  6. Make_Tables.R - Creates readable tables from the data frames used in this analysis. These tables are the unformatted versions of the tables seen in the published manuscript. Note that the results from this script are written to a “Tables” folder in your working directory.
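The PSM-weighted protein ratio described for ProtQuant.R can be illustrated with a small base R sketch. This is not the code from ProtQuant_Functions.R; the data frame, its column names, and the values are hypothetical and only demonstrate the formula.

```r
# Hypothetical peptide-level input for one replicate; values are made up.
peptides <- data.frame(
  protein = c("P1", "P1", "P1", "P2"),
  ratio   = c(0.50, 1.00, 2.00, 0.80),
  psms    = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)

# PSM-weighted mean of the peptide ratios, computed separately per protein:
# sum(PSMs_i * PeptideRatio_i) / sum(PSMs_i)
protein_ratio <- sapply(
  split(peptides, peptides$protein),
  function(d) sum(d$psms * d$ratio) / sum(d$psms)
)

print(protein_ratio)
```

For P1 this gives (1 x 0.50 + 2 x 1.00 + 1 x 2.00) / 4 = 1.125. Base R's `weighted.mean(d$ratio, d$psms)` computes the same quantity.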

Reproducing the Analysis

To perform the same analysis that is seen in our manuscript, the following code can be executed:

source("Scripts/ProtQuant.R")
source("Scripts/ClusterAnalysis.R")
source("Scripts/Make_Plots.R")
source("Scripts/Make_Tables.R")
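Because the plotting and table scripts write to “Figures” and “Tables” folders in the working directory, it may help to make sure those folders exist before running the code above. Whether the scripts create the folders themselves is not stated here, so the helper below is a precaution only; `ensure_output_dirs` is a hypothetical name and not part of the repository.

```r
# Create the output folders if they are missing.
# `ensure_output_dirs` is a hypothetical helper, not part of the repository.
ensure_output_dirs <- function(base = getwd(), dirs = c("Figures", "Tables")) {
  paths <- file.path(base, dirs)
  for (p in paths) {
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  invisible(paths)
}

# Usage, run from the project root before the source() calls:
# ensure_output_dirs()
```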

Final Outputs

Figures

  1. ClusterPlot.pdf and ClusterPlot.tiff - A plot of our SVM cluster analysis results with the Exosome and Non-Exosome markers shown in black.

  2. PM_MarkerPlot.pdf and PM_MarkerPlot.tiff - A plot of our SVM cluster analysis results with the plasma membrane markers from pRoloc shown in black.

  3. ScatterPlot.pdf and ScatterPlot.tiff - A scatter plot of the Log2 protein ratios for all of our quantified proteins.

  4. ScatterPlot_M.pdf and ScatterPlot_M.tiff - A scatter plot of the Log2 protein ratios for all of our quantified proteins, with the Exosome and Non-Exosome markers shown in color.

  5. CoverPlot.tiff - A version of our SVM cluster analysis for use as a cover picture.

  6. Cluster_rev.tiff - A plot of the TMT 130/129 vs TMT 131/130 protein ratios, with the proteins colored according to the SVM analysis above.

  7. Validation_Venn.tiff - An Euler diagram displaying the protein overlap between our size exclusion chromatography exosome preparation and the exosome and non-exosome clusters.

Tables

Below are the final tables output by the analysis scripts. Each entry gives the file name, a description, and the first 6 rows.

  1. SI-1_Proteins.csv - contains a detailed protein list with protein quantitation information.

| Protein Group | Uniprot Accession | Gene Symbol | Protein Description | ProteinProphet Probability | SVM Classification | SVM Probability | TMT 130/129 +/- SD | Log2 TMT 130/129 | TMT 131/129 +/- SD | Log2 TMT 131/129 | Unique Peptides | Coverage (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A5D8V6 | VPS37C | Vacuolar protein sorting-associated protein 37C | 1 | Non-Exosome | 0.570 | 0.886 +/- 0.094 | -0.175 | 0.974 +/- 0.396 | -0.038 | 2 | 15.5 |
| 2 | B7ZAQ6-2 | GPR89A | Isoform 2 of Golgi pH regulator A | 1 | Non-Exosome | 0.900 | 0.333 +/- 0.053 | -1.585 | 0.225 +/- 0.085 | -2.152 | 1 | 4.8 |
| 3 | O00186 | STXBP3 | Syntaxin-binding protein 3 | 1 | Non-Exosome | 0.732 | 1.188 +/- 0.172 | 0.248 | 0.507 +/- 0.033 | -0.981 | 4 | 8.1 |
| 4 | O00220 | TNFRSF10A | Tumor necrosis factor receptor superfamily member 10A | 1 | Non-Exosome | 0.726 | 0.835 +/- 0.205 | -0.260 | 0.755 +/- 0.145 | -0.405 | 1 | 5.1 |
| 5 | O00232-2 | PSMD12 | Isoform 2 of 26S proteasome non-ATPase regulatory subunit 12 | 1 | Non-Exosome | 0.919 | 0.602 +/- 0.198 | -0.731 | 0.279 +/- 0.039 | -1.841 | 6 | 19.0 |
| 6 | O00442-2 | RTCA | Isoform 2 of RNA 3’-terminal phosphate cyclase | 1 | Non-Exosome | 0.848 | 0.248 +/- 0.038 | -2.010 | 0.192 +/- 0.002 | -2.383 | 1 | 8.2 |
  2. SI-2_Peptides.csv - contains a peptide list with the assigned protein and peptide quantitation information.

| Uniprot Accession | Gene Name | Peptide Sequence | TMT 130/129 | TMT 131/129 | PSMs | Replicate |
|---|---|---|---|---|---|---|
| A0AVT1 | UBA6 | GMITVTDPDLIEK | 0.66 | 0.25 | 1 | 2 |
| A0AVT1 | UBA6 | LETGQFLTFR | 0.57 | 0.17 | 1 | 1 |
| A0AVT1 | UBA6 | QDVIITALDNVEAR | 0.10 | 0.26 | 1 | 2 |
| A0AVT1 | UBA6 | TVFFESLER | 0.71 | 0.44 | 2 | 1 |
| A0AVT1 | UBA6 | TVFFESLER | 0.28 | 0.42 | 1 | 2 |
| A0FGR8-2 | ESYT2 | ALALLEDEER | 1.05 | 0.53 | 1 | 1 |
  3. table_1_ExosomeMarkers.csv - contains the list of exosome and non-exosome markers with protein quantitation information.

| Gene Name | Description | Log2 TMT 130/129 | Log2 TMT 131/129 | Unique Peptides | Marker Class |
|---|---|---|---|---|---|
| SEC63 | Translocation protein SEC63 homolog | -0.873 | -2.000 | 7 | Non-Exosome |
| TMX3 | Protein disulfide-isomerase TMX3 | -1.247 | -2.059 | 3 | Non-Exosome |
| SDC1 | Syndecan-1 | 1.007 | 0.632 | 5 | Exosome |
| HK2 | Hexokinase-2 | -1.255 | -1.694 | 8 | Non-Exosome |
| CD9 | CD9 antigen | 2.513 | 2.304 | 5 | Exosome |
| CD81 | CD81 antigen | 3.034 | 2.474 | 3 | Exosome |
  4. table_S1_AllProteins.csv - contains a simplified protein list.

| Uniprot Accession | Protein Description | Gene Symbol | Protein Group | TMT 130/129 +/- SD | Log2 TMT 130/129 | TMT 131/129 +/- SD | Log2 TMT 131/129 | SVM Classification | SVM Probability |
|---|---|---|---|---|---|---|---|---|---|
| A5D8V6 | Vacuolar protein sorting-associated protein 37C | VPS37C | 1 | 0.886 +/- 0.094 | -0.175 | 0.974 +/- 0.396 | -0.038 | Non-Exosome | 0.570 |
| B7ZAQ6-2 | Isoform 2 of Golgi pH regulator A | GPR89A | 2 | 0.333 +/- 0.053 | -1.585 | 0.225 +/- 0.085 | -2.152 | Non-Exosome | 0.900 |
| O00186 | Syntaxin-binding protein 3 | STXBP3 | 3 | 1.188 +/- 0.172 | 0.248 | 0.507 +/- 0.033 | -0.981 | Non-Exosome | 0.732 |
| O00220 | Tumor necrosis factor receptor superfamily member 10A | TNFRSF10A | 4 | 0.835 +/- 0.205 | -0.260 | 0.755 +/- 0.145 | -0.405 | Non-Exosome | 0.726 |
| O00232-2 | Isoform 2 of 26S proteasome non-ATPase regulatory subunit 12 | PSMD12 | 5 | 0.602 +/- 0.198 | -0.731 | 0.279 +/- 0.039 | -1.841 | Non-Exosome | 0.919 |
| O00442-2 | Isoform 2 of RNA 3’-terminal phosphate cyclase | RTCA | 6 | 0.248 +/- 0.038 | -2.010 | 0.192 +/- 0.002 | -2.383 | Non-Exosome | 0.848 |
  5. table_S2_PMMarkers.csv - contains a list of the plasma membrane markers used from the pRoloc markers data set.

| SVM Classification | Gene Name | Description | Log2 TMT 130/129 | Log2 TMT 131/129 | SVM Probability |
|---|---|---|---|---|---|
| Exosome | ITGAV | Isoform 2 of Integrin alpha-V | 0.431 | -0.165 | 0.600 |
| Exosome | CD151 | CD151 antigen | 0.428 | -0.089 | 0.624 |
| Non-Exosome | ATP1A1 | Isoform 4 of Sodium/potassium-transporting ATPase subunit alpha-1 | 0.087 | -0.701 | 0.715 |
| Non-Exosome | YES1 | Tyrosine-protein kinase Yes | -0.595 | -1.161 | 0.904 |
| Non-Exosome | ATP2B3 | Isoform XA of Plasma membrane calcium-transporting ATPase 3 | 0.023 | -1.191 | 0.823 |
| Non-Exosome | ATP2B2 | Isoform YB of Plasma membrane calcium-transporting ATPase 2 | 0.012 | -0.887 | 0.781 |
  6. SI-3_SEC_Proteins.csv - contains a list of the protein identifications from our size exclusion chromatography exosome preparation.

| Uniprot Accession | Protein Description | Gene Symbol | Protein Group | ProteinProphet Probability |
|---|---|---|---|---|
| VP37C | Vacuolar protein sorting-associated protein 37C | VPS37C | 1 | 1 |
| SUSD5 | Sushi domain-containing protein 5 | SUSD5 | 10 | 1 |
| CHM4B | Charged multivesicular body protein 4b | CHMP4B | 100 | 1 |
| CDCP1 | CUB domain-containing protein 1 | CDCP1 | 101 | 1 |
| MUC5B | Mucin-5B | MUC5B | 102 | 1 |
| CD320 | CD320 antigen | CD320 | 103 | 1 |
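The “+/- SD” and “Log2” columns in the protein tables above can be related with a short sketch. It assumes the reported ratio is the mean across the 2 replicates, the SD is the standard deviation across those replicates, and the Log2 column is the base-2 logarithm of the reported ratio. Only the last point can be checked against the tables (log2 of 0.886 rounds to -0.175, matching the first row). The replicate values and the `summarise_ratio` helper are hypothetical.

```r
# Hypothetical helper: summarise replicate-level protein ratios the way the
# table columns appear to be formatted (an assumption, not the authors' code).
summarise_ratio <- function(replicate_ratios) {
  m <- mean(replicate_ratios)
  s <- sd(replicate_ratios)
  list(
    ratio_sd   = sprintf("%.3f +/- %.3f", m, s),
    log2_ratio = round(log2(m), 3)
  )
}

# Made-up ratios for one protein in two replicates.
example <- summarise_ratio(c(0.82, 0.95))

# Consistency check against the first table row: ratio 0.886, Log2 -0.175.
check <- round(log2(0.886), 3)
```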