An integrative and predictive model for the influence of protein sequence, structure and excipients on aggregation propensity

  • Spyros Charonis

Student thesis: PhD


Loss of solubility and aggregation of proteins are important bottlenecks in modern bioprocessing pipelines, where formulation and large-scale production of therapeutic proteins such as antibodies are achieved. The mechanistic basis of protein aggregation propensity and solubility is actively investigated using experimental and computational techniques. A significant part of research in this field involves efforts to understand how sequence- and structure-based properties enable proteins to remain functional under conditions relevant to physiology and to the delivery of biotherapeutic agents.

Using sequence-based and structural features as well as physico-chemical properties, a model was developed to study how such descriptors can be used in a predictive capacity to separate soluble and insoluble proteins. Therapeutic protein datasets including antibody derivatives and non-antibody biologics were constructed so that their solubility could be studied using the descriptors. Surface charge, polarity, and sequence composition were tested against established thresholds for solubility of E. coli proteins. Surface non-polarity was verified as a consistent feature for separating soluble and insoluble therapeutics, in line with its widely recognised role as a key determinant of aggregation propensity. The ratio of lysine to arginine composition emerged as a novel sequence-based feature that contributes to solubility, where higher lysine composition favours solubility. There is potential to use this as a method for engineering proteins for higher solubility with minimal disruption to functionality.

The predictive model was subsequently expanded to include a broad array of sequence-based and 3D structural features. Quantitative proteomics studies with high-throughput data for protein solubility, abundance and concentration were used to construct datasets. Web-accessible repositories of protein abundance in several species and of plasma protein concentration were used to augment the data used to validate the model. Our findings agree with previous studies that identify protein length, charge-based properties and surface non-polarity as important descriptors for discriminating soluble and insoluble proteins. The sequence-level lysine/arginine ratio offers a novel perspective on potentially simple ways of protecting proteins against aggregation, which could prove useful for bioprocessing pipelines.

Protein-excipient interactions were studied using a dot product metric to measure the association of a set of crystallisation screen ligands with proteins in the PDB. Enrichment for predicted small molecule (sugars and buffers) binding sites was observed, although the underlying reasons remain unclear without more sophisticated structure-based techniques.
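
As an illustration of the sequence-level lysine/arginine feature mentioned above, the following is a minimal sketch of how such a ratio might be computed from a one-letter amino acid sequence. The function name `lys_arg_ratio`, the handling of sequences without arginine, and the example sequence are assumptions made for illustration; they are not taken from the thesis, which may define or normalise the feature differently.

```python
import math


def lys_arg_ratio(sequence: str) -> float:
    """Return the count of lysine (K) divided by the count of arginine (R).

    Hypothetical helper for illustration only. Sequences with no arginine
    return infinity if lysine is present and NaN if neither residue occurs.
    """
    seq = sequence.upper()
    n_lys = seq.count("K")
    n_arg = seq.count("R")
    if n_arg == 0:
        return math.inf if n_lys > 0 else math.nan
    return n_lys / n_arg


# Made-up example sequence: three lysines and one arginine.
example = "MKKRAK"
ratio = lys_arg_ratio(example)
print(f"K/R ratio for {example}: {ratio:.2f}")
```

Under the trend reported in the abstract, a higher value of this ratio would be associated with higher solubility, although any threshold for separating soluble from insoluble proteins would have to come from the thesis data.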
Date of Award: 1 Aug 2017
Original language: English
Awarding Institution
  • The University of Manchester
Supervisors: Robin Curtis & James Warwicker
