Evaluation of Analysis Methods in Mass Spectrometry Based Proteomics

  • Caitlin Arthur

Student thesis: PhD

Abstract

Mass spectrometry (MS) has become indispensable in proteomics and is routinely used in biomarker discovery, which underpins personalised medicine. Effective MS analysis relies on sophisticated software tools capable of processing large quantities of complex data, so continual improvement of MS data analysis pipelines is essential to keep pace with advances in MS capabilities. Data-independent acquisition (DIA) is favoured in discovery proteomics because all peptides are fragmented without bias, which reportedly improves reproducibility. SWATH MS, SONAR and HDMSe are among the leading DIA MS methods available for proteomic biomarker discovery. To evaluate the effectiveness of MS pipelines, including the software tools commonly used with these DIA MS methods, we conducted a comparative analysis on a range of proteomic data sets. An initial comparison between SWATH MS data processed with OpenSWATH (OS) and HDMSe data processed with Progenesis QI P (PQIP) was performed on serum samples, the predominant sample type used in biomarker studies. Data processing methods were found to significantly influence outcomes. In particular, the ion accounting identification method used in PQIP reported fewer missing values (missingness) than the OS pipeline, which is beneficial in biomarker discovery because it allows more comprehensive differential expression analysis. Serum samples typically have low protein yields in MS analysis because of the large dynamic range of the plasma proteome; therefore, a second comparison was conducted on cell lines. Cell line samples were analysed using SWATH MS with OS, SONAR with PQIP, and HDMSe with PQIP. In the absence of a universal processing method, SONAR with PQIP was used to provide insight into whether the differences observed between SWATH MS with OS and HDMSe with PQIP arose from the MS method or the processing method. These findings provided further evidence that the PQIP ion accounting method is better able to alleviate missingness.
Other findings highlighted differences between HDMSe and SONAR outcomes that are likely attributable to advances in the SONAR MS method. Finally, two developments to the SWATH MS with OS pipeline were tested: a novel multi-library approach, compared against the standard single-library method, and a new machine learning element replacing standard p-value-based expression analysis. The aim was to determine which of these methods offers the most valuable insights for candidate biomarker discovery. Combining multiple libraries using z-scores increased both the total number of proteins quantified and the number found to be significantly differentially expressed, giving the potential biomarker panels higher predictive power. Throughout this research, the choice of processing method was found to significantly affect data interpretation and final results. Further efforts to improve reproducibility across the field are therefore needed.
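Missingness, as used above, is the proportion of expected quantities that a pipeline fails to report. The sketch below shows one simple way to measure it on a protein-by-sample matrix. It assumes that unquantified values are stored as None or NaN. The function names, the max_missing threshold and the toy values are hypothetical illustrations. They are not part of OpenSWATH, Progenesis QI P or the thesis pipelines.

```python
import math
from typing import Dict, List, Optional

# Hypothetical representation: protein ID -> one quantity per sample,
# with None or NaN meaning "not quantified in that sample".
Quant = Optional[float]
Matrix = Dict[str, List[Quant]]


def is_missing(value: Quant) -> bool:
    # A value counts as missing if it was not reported or is NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def missingness_per_protein(matrix: Matrix) -> Dict[str, float]:
    # Fraction of samples in which each protein has no quantity.
    return {
        protein: sum(is_missing(v) for v in values) / len(values)
        for protein, values in matrix.items()
        if values
    }


def overall_missingness(matrix: Matrix) -> float:
    # Fraction of all protein-by-sample cells that are missing.
    cells = [v for values in matrix.values() for v in values]
    return sum(is_missing(v) for v in cells) / len(cells) if cells else 0.0


def complete_proteins(matrix: Matrix, max_missing: float = 0.0) -> List[str]:
    # Proteins whose missingness does not exceed max_missing. With the default
    # of 0.0, these proteins could be tested without imputing any values.
    per_protein = missingness_per_protein(matrix)
    return [p for p, m in per_protein.items() if m <= max_missing]


if __name__ == "__main__":
    # Toy values for illustration only; not data from this thesis.
    toy = {
        "PROT_A": [1.2e6, 1.1e6, 1.3e6, 1.0e6],
        "PROT_B": [5.0e4, None, 4.8e4, float("nan")],
        "PROT_C": [None, None, 2.0e3, None],
    }
    print(missingness_per_protein(toy))
    print(round(overall_missingness(toy), 3))
    print(complete_proteins(toy))
```

Under this definition, comparing two pipelines reduces to computing these figures on each pipeline's output for the same samples. Pipelines that report fewer missing cells leave more proteins available for differential expression analysis.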
Date of Award: 31 Dec 2022
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Anthony Whetton, Andrew Pierce & Richard Unwin
