1. Crook OM, Chung CW, Deane CM. Challenges and Opportunities for Bayesian Statistics in Proteomics. J Proteome Res 2022; 21:849-864. [PMID: 35258980 PMCID: PMC8982455 DOI: 10.1021/acs.jproteome.1c00859] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/08/2021] [Indexed: 12/27/2022]
Abstract
Proteomics is a data-rich science with complex experimental designs and an intricate measurement process. To obtain insights from the large data sets produced, statistical methods, including machine learning, are routinely applied. For a quantity of interest, many of these approaches only produce a point estimate, such as a mean, leaving little room for more nuanced interpretations. By contrast, Bayesian statistics allows quantification of uncertainty through the use of probability distributions. These probability distributions enable scientists to ask complex questions of their proteomics data. Bayesian statistics also offers a modular framework for data analysis by making dependencies between data and parameters explicit. Hence, specifying complex hierarchies of parameter dependencies is straightforward in the Bayesian framework. This allows us to use a statistical methodology which equals, rather than neglects, the sophistication of experimental design and instrumentation present in proteomics. Here, we review Bayesian methods applied to proteomics, demonstrating their potential power, alongside the challenges posed by adopting this new statistical framework. To illustrate our review, we give a walk-through of the development of a Bayesian model for dynamic organic orthogonal phase-separation (OOPS) data.
Affiliation(s)
- Oliver M. Crook
  - Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Chun-wa Chung
  - Structural and Biophysical Sciences, GlaxoSmithKline R&D, Stevenage SG1 2NY, United Kingdom
- Charlotte M. Deane
  - Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
2. Crook OM, Mulvey CM, Kirk PDW, Lilley KS, Gatto L. A Bayesian mixture modelling approach for spatial proteomics. PLoS Comput Biol 2018; 14:e1006516. [PMID: 30481170 PMCID: PMC6258510 DOI: 10.1371/journal.pcbi.1006516] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 05/23/2018] [Accepted: 09/17/2018] [Indexed: 01/01/2023] [Open Access]
Abstract
Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context-specific protein function. Some proteins can be found at a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or reside within an unknown functional compartment. These considerations lead to uncertainty in associating a protein to a single location. Currently, mass spectrometry (MS) based spatial proteomics relies on supervised machine learning algorithms to assign proteins to sub-cellular locations based on common gradient profiles. However, such methods fail to quantify uncertainty associated with sub-cellular class assignment. Here we reformulate the framework on which we perform statistical analysis. We propose a Bayesian generative classifier based on Gaussian mixture models to assign proteins probabilistically to sub-cellular niches, so that each protein has a probability distribution over sub-cellular locations, with Bayesian computation performed using the expectation-maximisation (EM) algorithm, as well as Markov-chain Monte-Carlo (MCMC). Our methodology allows proteome-wide uncertainty quantification, thus adding a further layer to the analysis of spatial proteomics. Our framework is flexible, allowing many different systems to be analysed, and reveals new modelling opportunities for spatial proteomics. We find our methods perform competitively with current state-of-the-art machine learning methods, whilst simultaneously providing more information. We highlight several examples where classification based on the support vector machine is unable to make any conclusions, while uncertainty quantification using our approach provides biologically intriguing results. To our knowledge this is the first Bayesian model of MS-based spatial proteomics data.
Affiliation(s)
- Oliver M. Crook
  - Computational Proteomics Unit, Department of Biochemistry, University of Cambridge, Cambridge, UK
  - Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
  - MRC Biostatistics Unit, Cambridge Institute for Public Health, Cambridge, UK
- Claire M. Mulvey
  - Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
- Paul D. W. Kirk
  - MRC Biostatistics Unit, Cambridge Institute for Public Health, Cambridge, UK
- Kathryn S. Lilley
  - Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
- Laurent Gatto
  - Computational Proteomics Unit, Department of Biochemistry, University of Cambridge, Cambridge, UK
  - Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
3. Gilbert M, Schulze WX. Global Identification of Protein Complexes within the Membrane Proteome of Arabidopsis Roots Using a SEC-MS Approach. J Proteome Res 2018; 18:107-119. [PMID: 30370772 DOI: 10.1021/acs.jproteome.8b00382] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/18/2023]
Abstract
Biological processes consist of several consecutive and interacting steps as, for example, in signal transduction cascades or metabolic reaction chains. These processes are regulated by protein-protein interactions and the formation of larger protein complexes, which also occur within biological membranes. To gain a large-scale overview of complex-forming proteins and the composition of such complexes within the cellular membranes of Arabidopsis roots, we use the combination of size-exclusion chromatography and mass spectrometry. First, we identified complex-forming proteins by a retention shift analysis relative to expected retention times of monomeric proteins during size-exclusion chromatography. In a second step, we predicted complex composition through pairwise correlation of elution profiles. As a result, we present an interactome of 963 proteins within cellular membranes of Arabidopsis roots. Identification of complex-forming proteins was highly robust between two independently grown root proteomes. The protein complex composition derived from pairwise correlations of coeluting proteins reproducibly identified stable protein complexes (ribosomes, proteasome, mitochondrial respiratory chain supercomplexes) but showed higher variance between replicates regarding transient interactions (e.g., interactions with kinases) within membrane protein complexes.
Affiliation(s)
- Max Gilbert
  - Department of Plant Systems Biology, Universität Hohenheim, 70593 Stuttgart, Germany
- Waltraud X Schulze
  - Department of Plant Systems Biology, Universität Hohenheim, 70593 Stuttgart, Germany
4. Rosa-Fernandes L, Rocha VB, Carregari VC, Urbani A, Palmisano G. A Perspective on Extracellular Vesicles Proteomics. Front Chem 2017; 5:102. [PMID: 29209607 PMCID: PMC5702361 DOI: 10.3389/fchem.2017.00102] [Citation(s) in RCA: 82] [Impact Index Per Article: 11.7] [Received: 07/11/2017] [Accepted: 11/03/2017] [Indexed: 12/15/2022] [Open Access]
Abstract
Increasing attention has been given to secreted extracellular vesicles (EVs) in the past decades, especially in the portrayal of their molecular cargo and role as messengers in both homeostasis and pathophysiological conditions. This review presents the state-of-the-art proteomic technologies to identify and quantify EV proteins along with their PTMs, interacting partners and structural details. The rapid growth of mass spectrometry-based analytical strategies for protein sequencing, PTMs and structural characterization has improved the level of molecular detail that can be achieved from limited amounts of EVs isolated from different biological sources. Here we provide a perspective on the achievements and challenges in EV proteome characterization using mass spectrometry. A detailed bioinformatics approach will help us to picture the molecular fingerprint of EVs and better understand their pathophysiological function.
Affiliation(s)
- Livia Rosa-Fernandes
  - GlycoProteomics Laboratory, Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Victória Bombarda Rocha
  - GlycoProteomics Laboratory, Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Andrea Urbani
  - Proteomic and Metabonomic Laboratory, Fondazione Santa Lucia, Rome, Italy
  - Institute of Biochemistry and Biochemical Clinic, Università Cattolica del Sacro Cuore, Rome, Italy
- Giuseppe Palmisano
  - GlycoProteomics Laboratory, Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
  - Proteomic and Metabonomic Laboratory, Fondazione Santa Lucia, Rome, Italy
5. Meysman P, Titeca K, Eyckerman S, Tavernier J, Goethals B, Martens L, Valkenborg D, Laukens K. Protein complex analysis: From raw protein lists to protein interaction networks. Mass Spectrom Rev 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the best ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
Affiliation(s)
- Pieter Meysman
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Kevin Titeca
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Sven Eyckerman
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Jan Tavernier
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Bart Goethals
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Lennart Martens
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Dirk Valkenborg
  - Flemish Institute for Technological Research (VITO), Mol, Belgium
  - IBioStat, Hasselt University, Hasselt, Belgium
  - CFP-CeProMa, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
6. Reprint of “Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction”. Comput Biol Chem 2015; 59 Pt B:123-38. [DOI: 10.1016/j.compbiolchem.2015.08.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/09/2015] [Revised: 06/04/2015] [Accepted: 06/05/2015] [Indexed: 12/21/2022]
7. Kutzera J, Smilde AK, Wilderjans TF, Hoefsloot HCJ. Towards a Hierarchical Strategy to Explore Multi-Scale IP/MS Data for Protein Complexes. PLoS One 2015; 10:e0139704. [PMID: 26448546 PMCID: PMC4598013 DOI: 10.1371/journal.pone.0139704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2015] [Accepted: 09/16/2015] [Indexed: 11/24/2022] [Open Access]
Abstract
Protein interaction in cells can be described at different levels. At a low interaction level, proteins function together in small, stable complexes and at a higher level, in sets of interacting complexes. All interaction levels are crucial for the living organism, and one of the challenges in proteomics is to measure the proteins at their different interaction levels. One common method for such measurements is immunoprecipitation followed by mass spectrometry (IP/MS), which has the potential to probe the different protein interaction forms. However, IP/MS data are complex because proteins, in their diverse interaction forms, manifest themselves in different ways in the data. Numerous bioinformatic tools for finding protein complexes in IP/MS data are currently available, but most tools do not provide information about the interaction level of the discovered complexes, and no tool is geared specifically to unraveling and visualizing these different levels. We present a new bioinformatic tool to explore IP/MS datasets for protein complexes at different interaction levels and show its performance on several real-life datasets. Our tool creates clusters that represent protein complexes, but unlike previous methods, it arranges them in a tree-shaped structure, reporting why specific proteins are predicted to build a complex and where it can be divided into smaller complexes. In every data analysis method, parameters have to be chosen. Our method can suggest values for its parameters and comes with adapted visualization tools that display the effect of the parameters on the result. The tools provide fast graphical feedback and allow the user to interact with the data by changing the parameters and examining the result. The tools also allow for exploring the different organizational levels of the protein complexes in a given dataset. Our method is available as GNU-R source code and includes examples at www.bdagroup.nl.
Affiliation(s)
- Joachim Kutzera
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
  - Netherlands Institute for Systems Biology, University of Amsterdam, Amsterdam, The Netherlands
- Age K. Smilde
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
  - Netherlands Institute for Systems Biology, University of Amsterdam, Amsterdam, The Netherlands
- Tom F. Wilderjans
  - Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
  - Faculty of Social and Behavioural Sciences, Leiden University, Leiden, The Netherlands
- Huub C. J. Hoefsloot
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
  - Netherlands Institute for Systems Biology, University of Amsterdam, Amsterdam, The Netherlands
8. López-Fernández H, Santos HM, Capelo JL, Fdez-Riverola F, Glez-Peña D, Reboiro-Jato M. Mass-Up: an all-in-one open software application for MALDI-TOF mass spectrometry knowledge discovery. BMC Bioinformatics 2015; 16:318. [PMID: 26437641 PMCID: PMC4595311 DOI: 10.1186/s12859-015-0752-4] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 05/06/2015] [Accepted: 09/28/2015] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Mass spectrometry is one of the most important techniques in the field of proteomics. MALDI-TOF mass spectrometry has become popular during the last decade due to its high speed and sensitivity for detecting proteins and peptides. MALDI-TOF-MS can also be used in combination with machine learning techniques and statistical methods for knowledge discovery. Although there are many software libraries and tools that can be combined for these kinds of analysis, there is still a need for all-in-one solutions with user-friendly graphical interfaces that avoid the need for programming skills. RESULTS: Mass-Up, an open software multiplatform application for MALDI-TOF-MS knowledge discovery, is herein presented. Mass-Up allows data preprocessing, as well as subsequent analysis including (i) biomarker discovery, (ii) clustering, (iii) biclustering, (iv) three-dimensional PCA visualization and (v) classification of large sets of spectral data. CONCLUSIONS: Mass-Up brings knowledge discovery within reach of MALDI-TOF-MS researchers. Mass-Up is distributed under the GPLv3 license and is open and free to all users at http://sing.ei.uvigo.es/mass-up.
Affiliation(s)
- H López-Fernández
  - Informatics Department, Universidad de Vigo, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - Instituto de Investigación Biomédica de Vigo (IBIV), Vigo, Pontevedra, Spain
- H M Santos
  - BIOSCOPE Research Group, UCIBIO-REQUIMTE, Department of Chemistry, Faculty of Science and Technology, Universidade NOVA de Lisboa, Caparica, Setubal, Portugal
- J L Capelo
  - BIOSCOPE Research Group, UCIBIO-REQUIMTE, Department of Chemistry, Faculty of Science and Technology, Universidade NOVA de Lisboa, Caparica, Setubal, Portugal
- F Fdez-Riverola
  - Informatics Department, Universidad de Vigo, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - Instituto de Investigación Biomédica de Vigo (IBIV), Vigo, Pontevedra, Spain
- D Glez-Peña
  - Informatics Department, Universidad de Vigo, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - Instituto de Investigación Biomédica de Vigo (IBIV), Vigo, Pontevedra, Spain
- M Reboiro-Jato
  - Informatics Department, Universidad de Vigo, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - Instituto de Investigación Biomédica de Vigo (IBIV), Vigo, Pontevedra, Spain
9. Rouillard AD, Wang Z, Ma’ayan A. Publisher’s Note: Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction. Comput Biol Chem 2015; 58:104-19. [PMID: 26101093 PMCID: PMC4675694 DOI: 10.1016/j.compbiolchem.2015.06.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/09/2015] [Revised: 06/04/2015] [Accepted: 06/05/2015] [Indexed: 12/27/2022]
Abstract
With advances in genomics, transcriptomics, metabolomics and proteomics, and more expansive electronic clinical record monitoring, as well as advances in computation, we have entered the Big Data era in biomedical research. Data gathering is growing rapidly while only a small fraction of this data is converted to useful knowledge or reused in future studies. To improve this, an important concept that is often overlooked is data abstraction. To fuse and reuse biomedical datasets from diverse resources, data abstraction is frequently required. Here we summarize some of the major Big Data biomedical research resources for genomics, proteomics and phenotype data, collected from mammalian cells, tissues and organisms. We then suggest simple data abstraction methods for fusing this diverse but related data. Finally, we demonstrate examples of the potential utility of such data integration efforts, while warning about the inherent biases that exist within such data.
Affiliation(s)
- Andrew D. Rouillard
  - Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, NY 10029
  - BD2K-LINCS Data Coordination and Integration Center
  - Illuminating the Druggable Genome Knowledge Management Center
- Zichen Wang
  - Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, NY 10029
  - BD2K-LINCS Data Coordination and Integration Center
  - Illuminating the Druggable Genome Knowledge Management Center
- Avi Ma’ayan
  - Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, NY 10029
  - BD2K-LINCS Data Coordination and Integration Center
  - Illuminating the Druggable Genome Knowledge Management Center
10. Knight JDR, Liu G, Zhang JP, Pasculescu A, Choi H, Gingras AC. A web-tool for visualizing quantitative protein-protein interaction data. Proteomics 2015; 15:1432-6. [PMID: 25422071 DOI: 10.1002/pmic.201400429] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 09/04/2014] [Revised: 10/02/2014] [Accepted: 11/20/2014] [Indexed: 11/06/2022]
Abstract
Quantitative interaction proteomics data can be a challenge to analyze efficiently and then present to an audience in a simple, easy-to-understand format that still conveys sufficient information. Here we present freely accessible and open-source web tools for displaying multiple parameters from quantitative protein-protein interaction data sets in a visually intuitive format. Given a set of "bait" proteins with detected "prey" interactions, dot plots can be generated to display absolute spectral counts for the preys, relative spectral counts between baits and confidence levels for the interactions (e.g. as determined by SAINTexpress). Additional tools are available for displaying fold change results between numerous baits with their associated confidence level (e.g. resulting from intensity measurements) and pairwise bait analyses displaying spectral counts, confidence score and fold change differences in a scatter plot format. These tools make it easy for the user to identify important interaction changes, interpret their data, and present this information to others in an intuitive way.
Affiliation(s)
- James D R Knight
  - Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
11. Fabre B, Lambour T, Garrigues L, Amalric F, Vigneron N, Menneteau T, Stella A, Monsarrat B, Van den Eynde B, Burlet-Schiltz O, Bousquet-Dubouch MP. Deciphering preferential interactions within supramolecular protein complexes: the proteasome case. Mol Syst Biol 2015; 11:771. [PMID: 25561571 PMCID: PMC4332148 DOI: 10.15252/msb.20145497] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 11/09/2022] [Open Access]
Abstract
In eukaryotic cells, intracellular protein breakdown is mainly performed by the ubiquitin-proteasome system. Proteasomes are supramolecular protein complexes formed by the association of multiple sub-complexes and interacting proteins. Therefore, they exhibit a very high heterogeneity whose function is still not well understood. Here, using a newly developed method based on the combination of affinity purification and protein correlation profiling associated with high-resolution mass spectrometry, we comprehensively characterized proteasome heterogeneity and identified previously unknown preferential associations within proteasome sub-complexes. In particular, we showed for the first time that the two main proteasome subtypes, standard proteasome and immunoproteasome, interact with a different subset of important regulators. This trend was observed in very diverse human cell types and was confirmed by changing the relative proportions of both 20S proteasome forms using interferon-γ. The new method developed here constitutes an innovative and powerful strategy that could be broadly applied for unraveling the dynamic and heterogeneous nature of other biologically relevant supramolecular protein complexes.
Affiliation(s)
- Bertrand Fabre
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
- Thomas Lambour
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
- Luc Garrigues
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
- François Amalric
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
- Nathalie Vigneron
  - Ludwig Institute for Cancer Research, Brussels, Belgium
  - WELBIO (Walloon Excellence in Life Sciences and Biotechnology), Brussels, Belgium
  - de Duve Institute, Université catholique de Louvain, Brussels, Belgium
- Thomas Menneteau
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
- Alexandre Stella
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
- Bernard Monsarrat
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
- Benoît Van den Eynde
  - Ludwig Institute for Cancer Research, Brussels, Belgium
  - WELBIO (Walloon Excellence in Life Sciences and Biotechnology), Brussels, Belgium
  - de Duve Institute, Université catholique de Louvain, Brussels, Belgium
- Odile Burlet-Schiltz
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
- Marie-Pierre Bousquet-Dubouch
  - CNRS IPBS (Institut de Pharmacologie et de Biologie Structurale), Toulouse, France
  - Université de Toulouse UPS IPBS, Toulouse, France
12. Teng B, Zhao C, Liu X, He Z. Network inference from AP-MS data: computational challenges and solutions. Brief Bioinform 2014; 16:658-74. [DOI: 10.1093/bib/bbu038] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/10/2014] [Accepted: 09/30/2014] [Indexed: 02/04/2023] [Open Access]
13. Clancy T, Hovig E. From proteomes to complexomes in the era of systems biology. Proteomics 2014; 14:24-41. [PMID: 24243660 DOI: 10.1002/pmic.201300230] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 06/12/2013] [Revised: 10/22/2013] [Accepted: 11/06/2013] [Indexed: 01/16/2023]
Abstract
Protein complexes carry out almost all signaling and functional processes in the cell. The protein complex complement of a cell, and its network of complex-complex interactions, is referred to here as the complexome. Computational methods to predict protein complexes from proteomics data, resulting in network representations of complexomes, have recently been developed. In addition, key advances have been made toward understanding the network and structural organization of complexomes. We review these bioinformatics advances, and their discovery potential, as well as the merits of integrating proteomics data with emerging methods in systems biology to study protein complex signaling. It is envisioned that improved integration of proteomics and systems biology, incorporating the dynamics of protein complexes in space and time, may lead to more predictive models of cell signaling networks for effective modulation.
Affiliation(s)
- Trevor Clancy
  - Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
14. Henriques R, Madeira SC. BicSPAM: flexible biclustering using sequential patterns. BMC Bioinformatics 2014; 15:130. [PMID: 24885271 PMCID: PMC4071222 DOI: 10.1186/1471-2105-15-130] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 10/03/2013] [Accepted: 04/07/2014] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Biclustering is a critical task for biomedical applications. Order-preserving biclusters, submatrices where the values of rows induce the same linear ordering across columns, capture local regularities with constant, shifting, scaling and sequential assumptions. Additionally, biclustering approaches relying on pattern mining output deliver exhaustive solutions with an arbitrary number and positioning of biclusters. However, existing order-preserving approaches suffer from robustness, scalability and/or flexibility issues. Additionally, they are not able to discover biclusters with symmetries and parameterizable levels of noise. Results: We propose new biclustering algorithms to perform flexible, exhaustive and noise-tolerant biclustering based on sequential patterns (BicSPAM). Strategies are proposed to allow for symmetries and to seize efficiency gains from item-indexable properties and/or from partitioning methods with conservative distance guarantees. Results show BicSPAM's ability to capture symmetries, handle planted noise, and scale in terms of memory and time. BicSPAM also achieves the best match-scores for the recovery of hidden biclusters in synthetic datasets with varying noise distributions and levels of missing values. Finally, results on gene expression data lead to complete solutions, delivering new biclusters corresponding to putative modules with heightened biological relevance. Conclusions: BicSPAM provides an exhaustive way to discover flexible structures of order-preserving biclusters. To the best of our knowledge, BicSPAM is the first attempt to deal with order-preserving biclusters that allow for symmetries and that are robust to varying levels of noise.
Affiliation(s)
- Rui Henriques
- Knowledge Discovery and BIOInformatics group (KDBIO), INESC-ID, and Computer Science and Engineering (CSE) Department, Instituto Superior Técnico, Universidade de Lisboa, Av, Rovisco Pais, 1, 1049-001 Lisboa, Portugal.
|
15
|
Kutzera J, Hoefsloot HCJ, Malovannaya A, Smit AB, Van Mechelen I, Smilde AK. Inferring protein-protein interaction complexes from immunoprecipitation data. BMC Res Notes 2013; 6:468. [PMID: 24237943 PMCID: PMC3874675 DOI: 10.1186/1756-0500-6-468] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/12/2013] [Accepted: 10/31/2013] [Indexed: 11/26/2022]
Abstract
Background Protein–protein interactions in cells are widely explored using small–scale experiments. However, the search for protein complexes and their interactions in data from high throughput experiments such as immunoprecipitation is still a challenge. We present "4N", a novel method for detecting protein complexes in such data. Our method is a heuristic algorithm based on Near Neighbor Network (3N) clustering. It is written in R, it is faster than model-based methods, and has only a small number of tuning parameters. We explain the application of our new method to real immunoprecipitation results and two artificial datasets. We show that the method can infer protein complexes from protein immunoprecipitation datasets of different densities and sizes. Findings 4N was applied on the immunoprecipitation dataset that was presented by the authors of the original 3N in Cell 145:787–799, 2011. The test with our method shows that it can reproduce the original clustering results with fewer manually adapted parameters and, in addition, gives direct insight into the complex–complex interactions. We also tested 4N on the human "Tip49a/b" dataset. We conclude that 4N can handle the contaminants and can correctly infer complexes from this very dense dataset. Further tests were performed on two artificial datasets of different sizes. We proved that the method predicts the reference complexes in the two artificial datasets with high accuracy, even when the number of samples is reduced. Conclusions 4N has been implemented in R. We provide the source code of 4N and a user-friendly toolbox including two example calculations. Biologists can use this 4N-toolbox even if they have a limited knowledge of R. There are only a few tuning parameters to set, and each of these parameters has a biological interpretation. The run times for medium scale datasets are in the order of minutes on a standard desktop PC. Large datasets can typically be analyzed within a few hours.
Affiliation(s)
- Joachim Kutzera
- Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands.
|
16
|
Tucker G, Loh PR, Berger B. A sampling framework for incorporating quantitative mass spectrometry data in protein interaction analysis. BMC Bioinformatics 2013; 14:299. [PMID: 24093595 PMCID: PMC3851523 DOI: 10.1186/1471-2105-14-299] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/18/2013] [Accepted: 09/14/2013] [Indexed: 11/15/2022]
Abstract
Background Comprehensive protein-protein interaction (PPI) maps are a powerful resource for uncovering the molecular basis of genetic interactions and providing mechanistic insights. Over the past decade, high-throughput experimental techniques have been developed to generate PPI maps at proteome scale, first using yeast two-hybrid approaches and more recently via affinity purification combined with mass spectrometry (AP-MS). Unfortunately, data from both protocols are prone to both high false positive and false negative rates. To address these issues, many methods have been developed to post-process raw PPI data. However, with few exceptions, these methods only analyze binary experimental data (in which each potential interaction tested is deemed either observed or unobserved), neglecting quantitative information available from AP-MS such as spectral counts. Results We propose a novel method for incorporating quantitative information from AP-MS data into existing PPI inference methods that analyze binary interaction data. Our approach introduces a probabilistic framework that models the statistical noise inherent in observations of co-purifications. Using a sampling-based approach, we model the uncertainty of interactions with low spectral counts by generating an ensemble of possible alternative experimental outcomes. We then apply the existing method of choice to each alternative outcome and aggregate results over the ensemble. We validate our approach on three recent AP-MS data sets and demonstrate performance comparable to or better than state-of-the-art methods. Additionally, we provide an in-depth discussion comparing the theoretical bases of existing approaches and identify common aspects that may be key to their performance. Conclusions Our sampling framework extends the existing body of work on PPI analysis using binary interaction data to apply to the richer quantitative data now commonly available through AP-MS assays. 
This framework is quite general, and many enhancements are likely possible. Fruitful future directions may include investigating more sophisticated schemes for converting spectral counts to probabilities and applying the framework to direct protein complex prediction methods.
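The sampling idea described in this abstract (convert spectral counts to probabilities, draw an ensemble of alternative binary outcomes, run a binary method on each draw, and aggregate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exponential count-to-probability mapping, the `scale` parameter, and the Jaccard co-purification score standing in for "the existing method of choice" are all hypothetical.

```python
import math
import random


def count_to_probability(count, scale=2.0):
    """Hypothetical mapping from a spectral count to the probability
    that the observation is real (assumption; not from the paper)."""
    return 1.0 - math.exp(-count / scale)


def sample_binary_outcome(counts, rng, scale=2.0):
    """Draw one alternative experimental outcome as a 0/1 matrix
    (rows: purifications, columns: preys)."""
    return [
        [1 if rng.random() < count_to_probability(c, scale) else 0 for c in row]
        for row in counts
    ]


def jaccard_scores(binary):
    """Stand-in binary method: Jaccard similarity of the purification
    profiles of every prey pair."""
    n_preys = len(binary[0])
    scores = {}
    for a in range(n_preys):
        for b in range(a + 1, n_preys):
            both = sum(1 for row in binary if row[a] and row[b])
            either = sum(1 for row in binary if row[a] or row[b])
            scores[(a, b)] = both / either if either else 0.0
    return scores


def ensemble_scores(counts, n_samples=200, seed=0, scale=2.0):
    """Apply the binary method to each sampled outcome and average."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_samples):
        sampled = sample_binary_outcome(counts, rng, scale)
        for pair, score in jaccard_scores(sampled).items():
            totals[pair] = totals.get(pair, 0.0) + score
    return {pair: total / n_samples for pair, total in totals.items()}


# Toy spectral counts: preys 0 and 1 co-purify, prey 2 appears alone.
toy_counts = [
    [20, 18, 0],
    [15, 22, 0],
    [0, 0, 30],
]
toy_scores = ensemble_scores(toy_counts)
```

Averaging over the ensemble means that low-count observations contribute in proportion to how often they survive sampling, rather than being kept or discarded by a hard threshold.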
Affiliation(s)
- George Tucker
- Mathematics Department and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
|
17
|
Moussavi-Harami SF, Annis DS, Ma W, Berry SM, Coughlin EE, Strotman LN, Maurer LM, Westphall MS, Coon JJ, Mosher DF, Beebe DJ. Characterization of molecules binding to the 70K N-terminal region of fibronectin by IFAST purification coupled with mass spectrometry. J Proteome Res 2013; 12:3393-404. [PMID: 23750785 DOI: 10.1021/pr400225p] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/18/2023]
Abstract
Fibronectin (Fn) is a large glycoprotein present in plasma and extracellular matrix and is important for many processes. Within Fn the 70 kDa N-terminal region (70k-Fn) is involved in cell-mediated Fn assembly, a process that contributes to embryogenesis, development, and platelet thrombus formation. In addition, major human pathogens including Staphylococcus aureus and Streptococcus pyogenes bind the 70k-Fn region by a novel form of protein-protein interaction called β-zipper formation, facilitating bacterial spread and colonization. Knowledge of blood plasma and platelet proteins that interact with 70k-Fn by β-zipper formation is incomplete. In the current study, we aimed to characterize these proteins through affinity purification. For this affinity purification, we used a novel purification technique termed immiscible filtration assisted by surface tension (IFAST). The foundation of this technology is immiscible phase filtration, using a magnet to draw paramagnetic particle (PMP)-bound analyte through an immiscible barrier (oil or organic solvent) that separates an aqueous sample from an aqueous eluting buffer. The immiscible barrier functions to remove unbound proteins via exclusion rather than dilutive washing used in traditional isolation methods. We identified 31 interactors from plasma, of which only seven were previously known to interact with Fn. Furthermore, five proteins were identified to interact with 70k-Fn from platelet lysate, of which one was previously known. These results demonstrate that IFAST offers advantages for proteomic studies of interacting molecules in that the technique requires small sample volumes, can be done with high enough throughput to sample multiple interaction conditions, and is amenable to exploratory mass spectrometric and confirmatory immuno-blotting read-outs.
|
18
|
Nesvizhskii AI. Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments. Proteomics 2012; 12:1639-55. [PMID: 22611043 DOI: 10.1002/pmic.201100537] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 12/14/2022]
Abstract
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is the large number of false-positive protein interactions present in unfiltered data sets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS data sets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.
|
19
|
Abstract
Protein complex identification is an important goal of protein-protein interaction analysis. To date, development of computational methods for detecting protein complexes has been largely motivated by genome-scale interaction data sets from high-throughput assays such as yeast two-hybrid or tandem affinity purification coupled with mass spectrometry (TAP-MS). However, due to the popularity of small to intermediate-scale affinity purification-mass spectrometry (AP-MS) experiments, protein complex detection is increasingly discussed in local network analysis. In such data sets, protein complexes cannot be detected using binary interaction data alone because the data contain interactions with tagged proteins only and, as a result, interactions between all other proteins remain unobserved, limiting the scope of existing algorithms. In this article, we provide a pragmatic review of network graph-based computational algorithms for protein complex analysis in global interactome data, without requiring any computational background. We discuss the practical gap in applying these algorithms to recently surging small to intermediate-scale AP-MS data sets, and review alternative clustering algorithms using quantitative proteomics data and their limitations.
Affiliation(s)
- Hyungwon Choi
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore.
|
20
|
Saha S, Dazard JE, Xu H, Ewing RM. Computational framework for analysis of prey-prey associations in interaction proteomics identifies novel human protein-protein interactions and networks. J Proteome Res 2012; 11:4476-87. [PMID: 22845868 DOI: 10.1021/pr300227y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/15/2022]
Abstract
Large-scale protein-protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific "bait" protein and its associated "prey" proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait-prey and cocomplexed preys (prey-prey associations) into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing to improve the confidence of identified protein-protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait-prey and prey-prey interactions can be used to refine network topology and extend known protein networks.
Affiliation(s)
- Sudipto Saha
- Center for Proteomics and Bioinformatics, Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA
|
21
|
Abstract
In the life sciences, a new paradigm is emerging that places networks of interacting molecules between genotype and phenotype. These networks are dynamically modulated by a multitude of factors, and the properties emerging from the network as a whole determine observable phenotypes. This paradigm is usually referred to as systems biology, network biology, or integrative biology. Mass spectrometry (MS)-based proteomics is a central life science technology that has realized great progress toward the identification, quantification, and characterization of the proteins that constitute a proteome. Here, we review how MS-based proteomics has been applied to network biology to identify the nodes and edges of biological networks, to detect and quantify perturbation-induced network changes, and to correlate dynamic network rewiring with the cellular phenotype. We discuss future directions for MS-based proteomics within the network biology paradigm.
Affiliation(s)
- Ariel Bensimon
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, CH 8093, Switzerland.
|
22
|
Zeng T, Chen L. Tracing dynamic biological processes during phase transition. BMC Syst Biol 2012; 6 Suppl 1:S12. [PMID: 23046764 PMCID: PMC3403121 DOI: 10.1186/1752-0509-6-s1-s12] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
Background Phase transition widely exists in the biological world, such as transformation of cell cycle phases, cell differentiation stages, disease development, and so on. Such a nonlinear phenomenon is considered as the conversion of a biological system from one phenotype/state to another. Studies on the molecular mechanisms of biological phase transition have attracted much attention, in particular, on different genotypes (or expression variations) in a specific phase, but with less focus on cascade changes of genes' functions (or system state) during the phase shift or transition process. However, it is a fundamental but important mission to trace the temporal characteristics of a biological system during a specific phase transition process, which can offer clues for understanding dynamic behaviors of living organisms. Results By overcoming the hurdles of traditional time segmentation and temporal biclustering methods, a causal process model (CPM) in the present work is proposed to study the biological phase transition in a systematic manner, i.e. first, we make gene-specific segmentation on time-course expression data by developing a new boundary gene estimation scheme, and then infer functional cascade dynamics by constructing a temporal block network. After the computational validation on synthetic data, CPM was used to analyze the well-known yeast cell cycle data. It was found that the dynamics of the boundary genes are periodic and consistent with the phases of the cell cycle, and the temporal block network indeed demonstrates a meaningful cascade structure of the enriched biological functions. In addition, we further studied protein modules based on the temporal block network, which reflect temporal features in different cycles. 
Conclusions All of these results demonstrate that CPM is effective and efficient compared to traditional methods, and is able to elucidate the essential regulatory mechanisms of a biological system even with complicated nonlinear phase transitions.
Affiliation(s)
- Tao Zeng
- Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
23
|
Stukalov A, Superti-Furga G, Colinge J. Deconvolution of Targeted Protein–Protein Interaction Maps. J Proteome Res 2012; 11:4102-9. [DOI: 10.1021/pr300137n] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/21/2023]
Affiliation(s)
- Alexey Stukalov
- CeMM − Center for Molecular Medicine of the Austrian Academy of Sciences, AKH-BT 25.3, Lazarettgasse 14, A-1090 Vienna, Austria
- Giulio Superti-Furga
- CeMM − Center for Molecular Medicine of the Austrian Academy of Sciences, AKH-BT 25.3, Lazarettgasse 14, A-1090 Vienna, Austria
- Jacques Colinge
- CeMM − Center for Molecular Medicine of the Austrian Academy of Sciences, AKH-BT 25.3, Lazarettgasse 14, A-1090 Vienna, Austria
|
24
|
Bantscheff M, Lemeer S, Savitski MM, Kuster B. Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present. Anal Bioanal Chem 2012; 404:939-65. [PMID: 22772140 DOI: 10.1007/s00216-012-6203-4] [Citation(s) in RCA: 539] [Impact Index Per Article: 44.9] [Received: 04/10/2012] [Revised: 06/06/2012] [Accepted: 06/15/2012] [Indexed: 02/08/2023]
Abstract
Mass-spectrometry-based proteomics is continuing to make major contributions to the discovery of fundamental biological processes and, more recently, has also developed into an assay platform capable of measuring hundreds to thousands of proteins in any biological system. The field has progressed at an amazing rate over the past five years in terms of technology as well as the breadth and depth of applications in all areas of the life sciences. Some of the technical approaches that were at an experimental stage back then are considered the gold standard today, and the community is learning to come to grips with the volume and complexity of the data generated. The revolution in DNA/RNA sequencing technology extends the reach of proteomic research to practically any species, and the notion that mass spectrometry has the potential to eventually retire the western blot is no longer in the realm of science fiction. In this review, we focus on the major technical and conceptual developments since 2007 and illustrate these by important recent applications.
|
25
|
Choi H, Glatter T, Gstaiger M, Nesvizhskii AI. SAINT-MS1: protein-protein interaction scoring using label-free intensity data in affinity purification-mass spectrometry experiments. J Proteome Res 2012; 11:2619-24. [PMID: 22352807 DOI: 10.1021/pr201185r] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 12/31/2022]
Abstract
We present a statistical method SAINT-MS1 for scoring protein-protein interactions based on the label-free MS1 intensity data from affinity purification-mass spectrometry (AP-MS) experiments. The method is an extension of Significance Analysis of INTeractome (SAINT), a model-based method previously developed for spectral count data. We reformulated the statistical model for log-transformed intensity data, including adequate treatment of missing observations, that is, interactions identified in some but not all replicate purifications. We demonstrate the performance of SAINT-MS1 using two recently published data sets: a small LTQ-Orbitrap data set with three replicate purifications of single human bait protein and control purifications and a larger drosophila data set targeting insulin receptor/target of rapamycin signaling pathway generated using an LTQ-FT instrument. Using the drosophila data set, we also compare and discuss the performance of SAINT analysis based on spectral count and MS1 intensity data in terms of the recovery of orthologous and literature-curated interactions. Given rapid advances in high mass accuracy instrumentation and intensity-based label-free quantification software, we expect that SAINT-MS1 will become a useful tool allowing improved detection of protein interactions in label-free AP-MS data, especially in the low abundance range.
Affiliation(s)
- Hyungwon Choi
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
|
26
|
Abstract
Systems biology requires comprehensive data at all molecular levels. Mass spectrometry (MS)-based proteomics has emerged as a powerful and universal method for the global measurement of proteins. In the most widespread format, it uses liquid chromatography (LC) coupled to high-resolution tandem mass spectrometry (MS/MS) to identify and quantify peptides at a large scale. This peptide intensity information is the basic quantitative proteomic data type. It is used to quantify proteins between different proteome states, including the temporal variation of the proteome, to determine the complete primary structure of proteins including posttranslational modifications, to localize proteins to organelles, and to determine protein interactions. Here, we describe the principles of analysis and the areas of biology where proteomics can make unique contributions. The large-scale nature of proteomics data and its high accuracy pose special opportunities as well as challenges in systems biology that have been largely untapped so far.
Affiliation(s)
- Jürgen Cox
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried D-82152, Germany.
|
27
|
Remmerie N, De Vijlder T, Valkenborg D, Laukens K, Smets K, Vreeken J, Mertens I, Carpentier SC, Panis B, De Jaeger G, Blust R, Prinsen E, Witters E. Unraveling tobacco BY-2 protein complexes with BN PAGE/LC-MS/MS and clustering methods. J Proteomics 2011; 74:1201-17. [PMID: 21443973 DOI: 10.1016/j.jprot.2011.03.023] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 02/02/2011] [Revised: 03/13/2011] [Accepted: 03/21/2011] [Indexed: 11/26/2022]
Abstract
To understand physiological processes, insight into protein complexes is very important. Through a combination of blue native gel electrophoresis and LC-MS/MS, we were able to isolate protein complexes and identify their potential subunits from Nicotiana tabacum cv. Bright Yellow-2. For this purpose, a bioanalytical approach was used that works without a priori knowledge of the interacting proteins. Different clustering methods (e.g., k-means and hierarchical clustering) and a biclustering approach were evaluated according to their ability to group proteins by their migration profile and to correlate the proteins to a specific complex. The biclustering approach was identified as a very powerful tool for the exploration of protein complexes of whole cell lysates since it allows for the promiscuous nature of proteins. Furthermore, it searches for associations between proteins that co-occur frequently throughout the BN gel, which increases the confidence of the putative associations between co-migrating proteins. The statistical significance and biological relevance of the profile clusters were verified using functional gene ontology annotation. The proof of concept for identifying protein complexes by our BN PAGE/LC-MS/MS approach is provided through the analysis of known protein complexes. Both well characterized long-lived protein complexes as well as potential temporary sequential multi-enzyme complexes were characterized.
Affiliation(s)
- Noor Remmerie
- Center for Proteomics (CFP), Groenenborgerlaan 171, B-2020 Antwerp, Belgium
|
28
|
Fermin D, Basrur V, Yocum AK, Nesvizhskii AI. Abacus: a computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. Proteomics 2011; 11:1340-5. [PMID: 21360675 DOI: 10.1002/pmic.201000650] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.2] [Received: 10/13/2010] [Revised: 12/15/2010] [Accepted: 12/29/2010] [Indexed: 01/16/2023]
Abstract
We describe Abacus, a computational tool for extracting spectral counts from MS/MS data sets. The program aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison between gene and protein expression data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface making it easy to interact with the program. The main aim of Abacus is to streamline the analysis of spectral count data by providing an automated, easy to use solution for extracting this information from proteomic data sets for subsequent, more sophisticated statistical analysis.
Affiliation(s)
- Damian Fermin
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
|
29
|
Gavin AC, Maeda K, Kühner S. Recent advances in charting protein-protein interaction: mass spectrometry-based approaches. Curr Opin Biotechnol 2010; 22:42-9. [PMID: 20934865 DOI: 10.1016/j.copbio.2010.09.007] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.6] [Received: 08/25/2010] [Revised: 09/08/2010] [Accepted: 09/09/2010] [Indexed: 12/13/2022]
Abstract
Cellular functions are the result of the coordinated action of groups of proteins interacting in molecular assemblies or pathways. The systematic and unbiased charting of protein-protein networks in a variety of organisms has become an important challenge in systems biology. These protein-protein interaction networks contribute comprehensive cartographies of key pathways or biological processes relevant to health or disease by providing a molecular frame for the interpretation of genetic links. At a structural level protein-protein networks enabled the identification of the sequences, motifs and structural folds involved in the process of molecular recognition. A rapidly growing choice of technologies is available for the global charting of protein-protein interactions. In this review, we focus on recent developments in a suite of methods that enable the purification of protein complexes under native conditions and, in conjunction with protein mass spectrometry, identification of their constituents.
|