1
Serag A, Salem MA, Gong S, Wu JL, Farag MA. Decoding Metabolic Reprogramming in Plants under Pathogen Attacks, a Comprehensive Review of Emerging Metabolomics Technologies to Maximize Their Applications. Metabolites 2023; 13:424. [PMID: 36984864 PMCID: PMC10055942 DOI: 10.3390/metabo13030424] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/11/2023] [Revised: 03/01/2023] [Accepted: 03/09/2023] [Indexed: 03/15/2023] Open
Abstract
In their environment, plants interact with a multitude of living organisms and have to cope with a large variety of aggressions of biotic or abiotic origin. What has been known for several decades is that the extraordinary variety of chemical compounds the plants are capable of synthesizing may be estimated in the range of hundreds of thousands, but only a fraction has been fully characterized to be implicated in defense responses. Despite the vast importance of these metabolites for plants and also for human health, our knowledge about their biosynthetic pathways and functions is still fragmentary. Recent progress has been made particularly for the phenylpropanoids and oxylipids metabolism, which is more emphasized in this review. With an increasing interest in monitoring plant metabolic reprogramming, the development of advanced analysis methods should now follow. This review capitalizes on the advanced technologies used in metabolome mapping in planta, including different metabolomics approaches, imaging, flux analysis, and interpretation using bioinformatics tools. Advantages and limitations with regards to the application of each technique towards monitoring which metabolite class or type are highlighted, with special emphasis on the necessary future developments to better mirror such intricate metabolic interactions in planta.
Affiliation(s)
- Ahmed Serag
  - Pharmaceutical Analytical Chemistry Department, Faculty of Pharmacy, Al-Azhar University, Cairo 11751, Egypt
- Mohamed A. Salem
  - Department of Pharmacognosy and Natural Products, Faculty of Pharmacy, Menoufia University, Gamal Abd El Nasr st., Shibin Elkom 32511, Menoufia, Egypt
- Shilin Gong
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau 999078, China
- Jian-Lin Wu
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau 999078, China
- Mohamed A. Farag
  - Pharmacognosy Department, College of Pharmacy, Cairo University, Kasr el Aini St., Cairo 11562, Egypt
2
Lobanov V, Gobet A, Joyce A. Ecosystem-specific microbiota and microbiome databases in the era of big data. Environmental Microbiome 2022; 17:37. [PMID: 35842686 PMCID: PMC9287977 DOI: 10.1186/s40793-022-00433-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/10/2022] [Accepted: 06/29/2022] [Indexed: 05/05/2023]
Abstract
The rapid development of sequencing methods over the past decades has accelerated both the potential scope and depth of microbiota and microbiome studies. Recent developments in the field have been marked by an expansion away from purely categorical studies towards a greater investigation of community functionality. As in-depth genomic and environmental coverage is often distributed unequally across major taxa and ecosystems, it can be difficult to identify or substantiate relationships within microbial communities. Generic databases containing datasets from diverse ecosystems have opened a new era of data accessibility despite costs in terms of data quality and heterogeneity. This challenge is readily embodied in the integration of meta-omics data alongside habitat-specific standards which help contextualise datasets both in terms of sample processing and background within the ecosystem. A special case of large genomic repositories, ecosystem-specific databases (ES-DB's), have emerged to consolidate and better standardise sample processing and analysis protocols around individual ecosystems under study, allowing independent studies to produce comparable datasets. Here, we provide a comprehensive review of this emerging tool for microbial community analysis in relation to current trends in the field. We focus on the factors leading to the formation of ES-DB's, their comparison to traditional microbial databases, the potential for ES-DB integration with meta-omics platforms, as well as inherent limitations in the applicability of ES-DB's.
Affiliation(s)
- Victor Lobanov
  - Department of Marine Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
- Alyssa Joyce
  - Department of Marine Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
3
Rauh D, Blankenburg C, Fischer TG, Jung N, Kuhn S, Schatzschneider U, Schulze T, Neumann S. Data format standards in analytical chemistry. Pure Appl Chem 2022. [DOI: 10.1515/pac-2021-3101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Research data is an essential part of research and almost every publication in chemistry. The data itself can be valuable for reuse if sustainably deposited, annotated and archived. Thus, it is important to publish data following the FAIR principles, to make it findable, accessible, interoperable and reusable not only for humans but also in machine-readable form. This also improves transparency and reproducibility of research findings and fosters analytical work with scientific data to generate new insights, being only accessible with manifold and diverse datasets. Research data requires complete and informative metadata and use of open data formats to obtain interoperable data. Generic data formats like AnIML and JCAMP-DX have been used for many applications. Special formats for some analytical methods are already accepted, like mzML for mass spectrometry or nmrML and NMReDATA for NMR spectroscopy data. Other methods still lack common standards for data. Only a joint effort of chemists, instrument and software vendors, publishers and infrastructure maintainers can make sure that the analytical data will be of value in the future. In this review, we describe existing data formats in analytical chemistry and introduce guidelines for the development and use of standardized and open data formats.
Affiliation(s)
- David Rauh
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, Weinberg 3, 06120 Halle, Germany
- Claudia Blankenburg
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, Weinberg 3, 06120 Halle, Germany
- Tillmann G. Fischer
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, Weinberg 3, 06120 Halle, Germany
- Nicole Jung
  - Karlsruhe Institute of Technology, Institute for Chemical and Biological Systems (IBCS-FMS), Hermann von Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Stefan Kuhn
  - School of Computer Science and Informatics, De Montfort University, Leicester, UK
- Ulrich Schatzschneider
  - Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Am Hubland, D-97074 Würzburg, Germany
- Tobias Schulze
  - Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research – UFZ, Permoserstr. 15, 04318 Leipzig, Germany
- Steffen Neumann
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, Weinberg 3, 06120 Halle, Germany
4
Hoffmann N, Mayer G, Has C, Kopczynski D, Al Machot F, Schwudke D, Ahrends R, Marcus K, Eisenacher M, Turewicz M. A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics. Metabolites 2022; 12:584. [PMID: 35888710 PMCID: PMC9319858 DOI: 10.3390/metabo12070584] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/25/2022] [Revised: 06/17/2022] [Accepted: 06/19/2022] [Indexed: 12/13/2022] Open
Abstract
Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography-mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software tools. We further discuss how mzTab-M can serve as a bridge between data acquisition and lipid bioinformatics tools for interpretation, capturing their output and transmitting rich annotated data for downstream processing. However, we identified several challenges of currently available tools and standards. Potential areas for improvement were: adaptation of common nomenclature and standardized reporting to enable high throughput lipidomics and improve its data handling. Finally, we suggest specific areas where tools and repositories need to improve to become FAIRer.
Affiliation(s)
- Nils Hoffmann
  - Forschungszentrum Jülich GmbH, Institute for Bio- and Geosciences (IBG-5), 52425 Jülich, Germany
- Gerhard Mayer
  - Institute of Medical Systems Biology, Ulm University, 89081 Ulm, Germany
- Canan Has
  - Biological Mass Spectrometry, Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - University Hospital Carl Gustav Carus, 01307 Dresden, Germany
  - CENTOGENE GmbH, 18055 Rostock, Germany
- Dominik Kopczynski
  - Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Fadi Al Machot
  - Faculty of Science and Technology, Norwegian University for Life Science (NMBU), 1433 Ås, Norway
- Dominik Schwudke
  - Bioanalytical Chemistry, Forschungszentrum Borstel, Leibniz Lung Center, 23845 Borstel, Germany
  - Airway Research Center North, German Center for Lung Research (DZL), 23845 Borstel, Germany
  - German Center for Infection Research (DZIF), TTU Tuberculosis, 23845 Borstel, Germany
- Robert Ahrends
  - Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Katrin Marcus
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
- Martin Eisenacher
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
  - Faculty of Medicine, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Michael Turewicz
  - Institute for Clinical Biochemistry and Pathobiochemistry, German Diabetes Center (DDZ), Leibniz Center for Diabetes Research at Heinrich-Heine-University Düsseldorf, 40225 Düsseldorf, Germany
  - German Center for Diabetes Research (DZD), Partner Düsseldorf, 85764 Neuherberg, Germany
5
Bioinformatics in Lipidomics: Automating Large-Scale LC-MS-Based Untargeted Lipidomics Profiling with SimLipid Software. Methods Mol Biol 2021. [PMID: 34786685 DOI: 10.1007/978-1-0716-1822-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Liquid chromatography-mass spectrometry (LC-MS) provides one of the most popular platforms for untargeted plant lipidomics analysis (Shulaev and Chapman, Biochim Biophys Acta 1862(8):786-791, 2017; Rupasinghe and Roessner, Methods Mol Biol 1778:125-135, 2018; Welti et al., Front Biosci 12:2494-506, 2007; Shiva et al., Plant Methods 14:14, 2018). We have developed SimLipid software in order to streamline the analysis of large-volume datasets generated by LC-MS-based untargeted lipidomics methods. SimLipid contains a customizable library of lipid species; graphical user interfaces (GUIs) for visualization of raw data; the identified lipid molecules and their associated mass spectra annotated with fragment ions and parent ions; and detailed information of each identified lipid species all in a single workbench enabling users to rapidly review the results by examining the data for confident identifications of lipid molecular species. In this chapter, we present the functionality of the software and workflow for automating large-scale LC-MS-based untargeted lipidomics profiling.
6
Bell M, Blais JM. "-Omics" workflow for paleolimnological and geological archives: A review. The Science of the Total Environment 2019; 672:438-455. [PMID: 30965259 DOI: 10.1016/j.scitotenv.2019.03.477] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/01/2018] [Revised: 03/29/2019] [Accepted: 03/30/2019] [Indexed: 06/09/2023]
Abstract
"-Omics" is a powerful screening method with applications in molecular biology, toxicology, wildlife biology, natural product discovery, and many other fields. Genomics, proteomics, metabolomics, and lipidomics are common examples included under the "-omics" umbrella. This screening method uses combinations of untargeted, semi-targeted, and targeted analyses paired with data mining to facilitate researchers' understanding of the genome, proteins, and small organic molecules in biological systems. Recently, however, the use of "-omics" has expanded into the fields of geology, specifically petrology, and paleolimnology. Specifically, untargeted analyses stand to transform these fields as petroleomics and sediment-"omics" become more prevalent. "-Omics" facilitates the visualization of small molecule profiles from environmental matrices (i.e. oil and sediment). Small molecule profiles can provide improved understanding of small-molecule distributions throughout the environment, and how those compositions can change depending on conditions (e.g. climate change, weathering). "-Omics" also facilitates discovery of next-generation biomarkers that can be used for oil source identification and as proxies for reconstructing past environmental changes. Untargeted analyses paired with data mining and multivariate statistical analyses represent a powerful suite of tools for hypothesis generation and new method development for environmental reconstructions. Here we present an introduction to "-omics" methodology, technical terms, and examples of applications to paleolimnology and petrology. The purpose of this review is to highlight the important considerations at each step in the "-omics" workflow to produce high quality and statistically powerful data for petrological and paleolimnological applications.
Affiliation(s)
- Madison Bell
  - Laboratory for the Analysis of Natural and Synthetic Environmental Toxicants, Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Jules M Blais
  - Laboratory for the Analysis of Natural and Synthetic Environmental Toxicants, Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
7
Gorrochategui E, Jaumot J, Lacorte S, Tauler R. Data analysis strategies for targeted and untargeted LC-MS metabolomic studies: Overview and workflow. Trends Analyt Chem 2016. [DOI: 10.1016/j.trac.2016.07.004] [Citation(s) in RCA: 187] [Impact Index Per Article: 23.4] [Indexed: 11/29/2022]
8
Meitei NS, Apte A, Snovida SI, Rogers JC, Saba J. Automating mass spectrometry-based quantitative glycomics using aminoxy tandem mass tag reagents with SimGlycan. J Proteomics 2015; 127:211-22. [PMID: 26003531 DOI: 10.1016/j.jprot.2015.05.015] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 02/02/2015] [Revised: 04/08/2015] [Accepted: 05/14/2015] [Indexed: 11/26/2022]
Abstract
Protein glycosylation is a common post-translational modification, which serves critical roles in the biological processes of organisms. Monitoring of changes in the abundance and structure of glycans may be necessary to explain the correlations between protein glycosylation and various diseases. Hence, the growing importance of glycoproteomics necessitates in-depth qualitative and quantitative studies of glycans. One of the emerging trends in glycomics research is the innovation related to accurate mass spectrometry based quantitative analysis of glycans. Recently, we have introduced aminoxyTMT reagents, which enable efficient relative quantitation of carbohydrates, improved glycan ionization efficiency and increased analytical throughput. These reagents can be used for quantitative analysis of N-glycans by direct infusion or liquid chromatography (LC)-coupled to electrospray ionization mass spectrometry (ESI-MS). However, unlike in proteomics, one of the major challenges left unaddressed is the lack of informatics tools to automate the qualitative and quantitative analysis of generated data. This analysis typically includes identification/quantitation of glycans using MS/MS data and differential analysis across biological samples. We have developed software modules to streamline such protocols for quantitative analysis of aminoxyTMT labeled-glycans derived from complex mixtures. This article is part of a Special Issue entitled: Proteomics in India.
9
Deutsch EW, Albar JP, Binz PA, Eisenacher M, Jones AR, Mayer G, Omenn GS, Orchard S, Vizcaíno JA, Hermjakob H. Development of data representation standards by the human proteome organization proteomics standards initiative. J Am Med Inform Assoc 2015; 22:495-506. [PMID: 25726569 PMCID: PMC4457114 DOI: 10.1093/jamia/ocv001] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 07/14/2014] [Revised: 09/29/2014] [Accepted: 01/05/2015] [Indexed: 11/22/2022] Open
Abstract
OBJECTIVE To describe the goals of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization, the methods that the PSI has employed to create data standards, the resulting output of the PSI, lessons learned from the PSI's evolution, and future directions and synergies for the group. MATERIALS AND METHODS The PSI has 5 categories of deliverables that have guided the group. These are minimum information guidelines, data formats, controlled vocabularies, resources and software tools, and dissemination activities. These deliverables are produced via the leadership and working group organization of the initiative, driven by frequent workshops and ongoing communication within the working groups. Official standards are subjected to a rigorous document process that includes several levels of peer review prior to release. RESULTS We have produced and published minimum information guidelines describing what information should be provided when making data public, either via public repositories or other means. The PSI has produced a series of standard formats covering mass spectrometer input, mass spectrometer output, results of informatics analysis (both qualitative and quantitative analyses), reports of molecular interaction data, and gel electrophoresis analyses. We have produced controlled vocabularies that ensure that concepts are uniformly annotated in the formats and engaged in extensive software development and dissemination efforts so that the standards can efficiently be used by the community. CONCLUSION In its first dozen years of operation, the PSI has produced many standards that have accelerated the field of proteomics by facilitating data exchange and deposition to data repositories. We look to the future to continue developing standards for new proteomics technologies and workflows and mechanisms for integration with other omics data types. Our products facilitate the translation of genomics and proteomics findings to clinical and biological phenotypes. The PSI website can be accessed at http://www.psidev.info.
Affiliation(s)
- Juan Pablo Albar
  - Died July 18, 2014
  - Proteomics Facility, Centro Nacional de Biotecnología - CSIC, Madrid, Spain
  - ProteoRed Consortium, Spanish National Institute of Proteomics, Madrid, Spain
- Pierre-Alain Binz
  - CHUV Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Martin Eisenacher
  - Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Andrew R Jones
  - Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Gerhard Mayer
  - Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Gilbert S Omenn
  - Institute for Systems Biology, Seattle, USA
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, USA
- Sandra Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
10
Metabolomics as a tool for discovery of biomarkers of autism spectrum disorder in the blood plasma of children. PLoS One 2014; 9:e112445. [PMID: 25380056 PMCID: PMC4224480 DOI: 10.1371/journal.pone.0112445] [Citation(s) in RCA: 104] [Impact Index Per Article: 10.4] [Received: 06/02/2014] [Accepted: 10/06/2014] [Indexed: 12/14/2022] Open
Abstract
Background The diagnosis of autism spectrum disorder (ASD) at the earliest age possible is important for initiating optimally effective intervention. In the United States the average age of diagnosis is 4 years. Identifying metabolic biomarker signatures of ASD from blood samples offers an opportunity for development of diagnostic tests for detection of ASD at an early age. Objectives To discover metabolic features present in plasma samples that can discriminate children with ASD from typically developing (TD) children. The ultimate goal is to identify and develop blood-based ASD biomarkers that can be validated in larger clinical trials and deployed to guide individualized therapy and treatment. Methods Blood plasma was obtained from children aged 4 to 6, 52 with ASD and 30 age-matched TD children. Samples were analyzed using 5 mass spectrometry-based methods designed to orthogonally measure a broad range of metabolites. Univariate, multivariate and machine learning methods were used to develop models to rank the importance of features that could distinguish ASD from TD. Results A set of 179 statistically significant features resulting from univariate analysis were used for multivariate modeling. Subsets of these features properly classified the ASD and TD samples in the 61-sample training set with average accuracies of 84% and 86%, and with a maximum accuracy of 81% in an independent 21-sample validation set. Conclusions This analysis of blood plasma metabolites resulted in the discovery of biomarkers that may be valuable in the diagnosis of young children with ASD. 
The results will form the basis for additional discovery and validation research for 1) determining biomarkers to develop diagnostic tests to detect ASD earlier and improve patient outcomes, 2) gaining new insight into the biochemical mechanisms of various subtypes of ASD 3) identifying biomolecular targets for new modes of therapy, and 4) providing the basis for individualized treatment recommendations.
11
Beisken S, Earll M, Portwood D, Seymour M, Steinbeck C. MassCascade: Visual Programming for LC-MS Data Processing in Metabolomics. Mol Inform 2014; 33:307-310. [PMID: 26279687 PMCID: PMC4524413 DOI: 10.1002/minf.201400016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 02/12/2014] [Accepted: 03/10/2014] [Indexed: 01/02/2023]
Abstract
Liquid chromatography coupled to mass spectrometry (LC-MS) is commonly applied to investigate the small molecule complement of organisms. Several software tools are typically joined in custom pipelines to semi-automatically process and analyse the resulting data. General workflow environments like the Konstanz Information Miner (KNIME) offer the potential of an all-in-one solution to process LC-MS data by allowing easy integration of different tools and scripts. We describe MassCascade and its workflow plug-in for processing LC-MS data. The Java library integrates frequently used algorithms in a modular fashion, thus enabling it to serve as back-end for graphical front-ends. The functions available in MassCascade have been encapsulated in a plug-in for the workflow environment KNIME, allowing combined use with e.g. statistical workflow nodes from other providers and making the tool intuitive to use without knowledge of programming. The design of the software guarantees a high level of modularity where processing functions can be quickly replaced or concatenated. MassCascade is an open-source library for LC-MS data processing in metabolomics. It embraces the concept of visual programming through its KNIME plug-in, simplifying the process of building complex workflows. The library was validated using open data.
Affiliation(s)
- Stephan Beisken
  - European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- Mark Earll
  - Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire, UK
- David Portwood
  - Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire, UK
- Mark Seymour
  - Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire, UK
- Christoph Steinbeck
  - European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
12
Martínez-Bartolomé S, Binz PA, Albar JP. The Minimal Information about a Proteomics Experiment (MIAPE) from the Proteomics Standards Initiative. Methods Mol Biol 2014; 1072:765-80. [PMID: 24136562 DOI: 10.1007/978-1-62703-631-3_53] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
During the last 10 years, the Proteomics Standards Initiative from the Human Proteome Organization (HUPO-PSI) has worked on defining standards for proteomics data representation as well as guidelines that state the minimum information that should be included when reporting a proteomics experiment (MIAPE). Such minimum information must describe the complete experiment, including both experimental protocols and data processing methods, allowing a critical evaluation of the whole process and the potential recreation of the work. In this chapter we describe the standardization work performed by the HUPO-PSI, and then we concentrate on the MIAPE guidelines, highlighting its importance when publishing proteomics experiments particularly in specialized proteomics journals. Finally, we describe existing bioinformatics resources that generate MIAPE compliant reports or that check proteomics data files for MIAPE compliance.
13
Robbe MF, Both JP, Prideaux B, Klinkert I, Picaud V, Schramm T, Hester A, Guevara V, Stoeckli M, Roempp A, Heeren RMA, Spengler B, Gala O, Haan S. Software tools of the Computis European project to process mass spectrometry images. European Journal of Mass Spectrometry (Chichester, England) 2014; 20:351-360. [PMID: 25707124 DOI: 10.1255/ejms.1293] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 06/04/2023]
Abstract
Among the needs usually expressed by teams using mass spectrometry imaging, one that often arises is that for user-friendly software able to manage huge data volumes quickly and to provide efficient assistance for the interpretation of data. To answer this need, the Computis European project developed several complementary software tools to process mass spectrometry imaging data. Data Cube Explorer provides a simple spatial and spectral exploration for matrix-assisted laser desorption/ionisation-time of flight (MALDI-ToF) and time of flight-secondary-ion mass spectrometry (ToF-SIMS) data. SpectViewer offers visualisation functions, assistance to the interpretation of data, classification functionalities, peak list extraction to interrogate biological database and image overlay, and it can process data issued from MALDI-ToF, ToF-SIMS and desorption electrospray ionisation (DESI) equipment. EasyReg2D is able to register two images, in American Standard Code for Information Interchange (ASCII) format, issued from different technologies. The collaboration between the teams was hampered by the multiplicity of equipment and data formats, so the project also developed a common data format (imzML) to facilitate the exchange of experimental data and their interpretation by the different software tools. The BioMap platform for visualisation and exploration of MALDI-ToF and DESI images was adapted to parse imzML files, enabling its access to all project partners and, more globally, to a larger community of users. Considering the huge advantages brought by the imzML standard format, a specific editor (vBrowser) for imzML files and converters from proprietary formats to imzML were developed to enable the use of the imzML format by a broad scientific community. This initiative paves the way toward the development of a large panel of software tools able to process mass spectrometry imaging datasets in the future.
14
Kessler N, Neuweger H, Bonte A, Langenkämper G, Niehaus K, Nattkemper TW, Goesmann A. MeltDB 2.0-advances of the metabolomics software system. Bioinformatics 2013; 29:2452-9. [PMID: 23918246 PMCID: PMC3777109 DOI: 10.1093/bioinformatics/btt414] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 11/17/2022] Open
Abstract
Motivation: The research area of metabolomics has achieved tremendous popularity and development in the last couple of years. Owing to its unique interdisciplinarity, it requires combining knowledge from various scientific disciplines. Advances in high-throughput technology and the consequently growing quality and quantity of data put new demands on applied analytical and computational methods. Exploration of the generated and analyzed datasets furthermore relies on powerful tools for data mining and visualization. Results: To cover and keep up with these requirements, we have created MeltDB 2.0, a next-generation web application addressing storage, sharing, standardization, integration and analysis of metabolomics experiments. New features improve both the efficiency and effectiveness of the entire processing pipeline of chromatographic raw data from pre-processing to the derivation of new biological knowledge. First, the generation of high-quality metabolic datasets has been vastly simplified. Second, the new statistics toolbox allows these datasets to be investigated according to a wide spectrum of scientific and explorative questions. Availability: The system is publicly available at https://meltdb.cebitec.uni-bielefeld.de. A login is required but freely available. Contact: nkessler@cebitec.uni-bielefeld.de
Affiliation(s)
- Nikolas Kessler
- Biodata Mining Group, CeBiTec, Bielefeld University, Bielefeld, Germany, Computational Genomics, CeBiTec, Bielefeld University, Bielefeld, Germany, Bruker Daltonik GmbH, Bremen, Germany, Proteome and Metabolome Research, Bielefeld University, Bielefeld, Germany and Max Rubner-Institute, Detmold, Germany
|
15
|
Using R and Bioconductor for proteomics data analysis. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:42-51. [PMID: 23692960 DOI: 10.1016/j.bbapap.2013.04.032] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 09/28/2012] [Revised: 04/09/2013] [Accepted: 04/30/2013] [Indexed: 10/26/2022]
Abstract
This review presents how R, the popular statistical environment and programming language, can be used in the frame of proteomics data analysis. A short introduction to R is given, with special emphasis on some of the features that make R and its add-on packages premium software for sound and reproducible data analysis. The reader is also advised on how to find relevant R software for proteomics. Several use cases are then presented, illustrating data input/output, quality control, quantitative proteomics and data analysis. Detailed code and additional links to extensive documentation are available in the freely available companion package RforProteomics. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
|
16
|
Medina-Aunon JA, Krishna R, Ghali F, Albar JP, Jones AJ. A guide for integration of proteomic data standards into laboratory workflows. Proteomics 2013; 13:480-92. [DOI: 10.1002/pmic.201200268] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/29/2012] [Revised: 08/14/2012] [Accepted: 09/10/2012] [Indexed: 01/28/2023]
Affiliation(s)
- Ritesh Krishna
- Institute of Integrative Biology; University of Liverpool; Liverpool; UK
- Fawaz Ghali
- Institute of Integrative Biology; University of Liverpool; Liverpool; UK
- Juan P. Albar
- Centro Nacional de Biotecnología; CSIC; Madrid; Spain
- Andrew J. Jones
- Institute of Integrative Biology; University of Liverpool; Liverpool; UK
|
17
|
Deutsch EW. File formats commonly used in mass spectrometry proteomics. Mol Cell Proteomics 2012; 11:1612-21. [PMID: 22956731 PMCID: PMC3518119 DOI: 10.1074/mcp.r112.019695] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 04/16/2012] [Revised: 08/06/2012] [Indexed: 11/06/2022] Open
Abstract
The application of mass spectrometry (MS) to the analysis of proteomes has enabled the high-throughput identification and abundance measurement of hundreds to thousands of proteins per experiment. However, the formidable informatics challenge associated with analyzing MS data has required a wide variety of data file formats to encode the complex data types associated with MS workflows. These formats encompass the encoding of input instruction for instruments, output products of the instruments, and several levels of information and results used by and produced by the informatics analysis tools. A brief overview of the most common file formats in use today is presented here, along with a discussion of related topics.
|
18
|
Côté RG, Griss J, Dianes JA, Wang R, Wright JC, van den Toorn HWP, van Breukelen B, Heck AJR, Hulstaert N, Martens L, Reisinger F, Csordas A, Ovelleiro D, Perez-Riverol Y, Barsnes H, Hermjakob H, Vizcaíno JA. The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium. Mol Cell Proteomics 2012; 11:1682-9. [PMID: 22949509 PMCID: PMC3518121 DOI: 10.1074/mcp.o112.021543] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Indexed: 11/30/2022] Open
Abstract
The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.
Affiliation(s)
- Richard G Côté
- Proteomics Services Team, EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
19
|
MilQuant: A free, generic software tool for isobaric tagging-based quantitation. J Proteomics 2012; 75:5516-22. [DOI: 10.1016/j.jprot.2012.06.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/23/2012] [Revised: 06/16/2012] [Accepted: 06/29/2012] [Indexed: 11/21/2022]
|
20
|
Sharma V, Eng JK, Maccoss MJ, Riffle M. A mass spectrometry proteomics data management platform. Mol Cell Proteomics 2012; 11:824-31. [PMID: 22611296 DOI: 10.1074/mcp.o111.015149] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 11/06/2022] Open
Abstract
Mass spectrometry-based proteomics is increasingly being used in biomedical research. These experiments typically generate a large volume of highly complex data, and the volume and complexity are only increasing with time. There exist many software pipelines for analyzing these data (each typically with its own file formats), and as technology improves, these file formats change and new formats are developed. Files produced from these myriad software programs may accumulate on hard disks or tape drives over time, with older files being rendered progressively more obsolete and unusable with each successive technical advancement and data format change. Although initiatives exist to standardize the file formats used in proteomics, they do not address the core failings of a file-based data management system: (1) files are typically poorly annotated experimentally, (2) files are "organically" distributed across laboratory file systems in an ad hoc manner, (3) file formats become obsolete, and (4) searching the data and comparing and contrasting results across separate experiments is very inefficient (if possible at all). Here we present a relational database architecture and accompanying web application dubbed Mass Spectrometry Data Platform that is designed to address the failings of the file-based mass spectrometry data management approach. The database is designed such that the output of disparate software pipelines may be imported into a core set of unified tables, with these core tables being extended to support data generated by specific pipelines. Because the data are unified, they may be queried, viewed, and compared across multiple experiments using a common web interface. Mass Spectrometry Data Platform is open source and freely available at http://code.google.com/p/msdapl/.
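The "core tables plus pipeline-specific extension tables" design described in this abstract can be illustrated with a minimal sketch. It assumes an in-memory SQLite database; all table names, column names, pipeline labels and rows are hypothetical and are not taken from the Mass Spectrometry Data Platform schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a core table and one extension table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE experiment (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Core table: fields that every (hypothetical) pipeline can supply.
        CREATE TABLE search_result (
            id            INTEGER PRIMARY KEY,
            experiment_id INTEGER NOT NULL REFERENCES experiment(id),
            pipeline      TEXT NOT NULL,
            peptide       TEXT NOT NULL,
            charge        INTEGER NOT NULL
        );
        -- Extension table: a score that only the hypothetical pipeline_a reports.
        CREATE TABLE pipeline_a_score (
            result_id INTEGER PRIMARY KEY REFERENCES search_result(id),
            score     REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?)", [(1, "exp1"), (2, "exp2")]
    )
    conn.executemany(
        "INSERT INTO search_result VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "pipeline_a", "PEPTIDEK", 2),
            (2, 1, "pipeline_a", "SAMPLER", 3),
            (3, 2, "pipeline_b", "PEPTIDEK", 2),
        ],
    )
    conn.executemany(
        "INSERT INTO pipeline_a_score VALUES (?, ?)", [(1, 3.2), (2, 1.1)]
    )
    return conn


def peptides_shared_across_experiments(conn):
    """Return peptides reported in more than one experiment, whatever the pipeline."""
    rows = conn.execute(
        """
        SELECT peptide FROM search_result
        GROUP BY peptide
        HAVING COUNT(DISTINCT experiment_id) > 1
        ORDER BY peptide
        """
    )
    return [peptide for (peptide,) in rows]
```

Because both pipelines write to the same core table, the cross-experiment query never needs to know which software produced a row; pipeline-specific values stay in their own table and are joined in only when needed.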
Affiliation(s)
- Vagisha Sharma
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
21
|
Hoekman B, Breitling R, Suits F, Bischoff R, Horvatovich P. msCompare: a framework for quantitative analysis of label-free LC-MS data for comparative candidate biomarker studies. Mol Cell Proteomics 2012; 11:M111.015974. [PMID: 22318370 DOI: 10.1074/mcp.m111.015974] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 12/27/2022] Open
Abstract
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine and our in-house developed modules on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows, and interestingly, heterogeneous combinations of algorithms often performed better than the homogenous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogenous workflows in some cases. msCompare is open source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy) to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets.
Affiliation(s)
- Berend Hoekman
- Department of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
|
22
|
Medina-Aunon JA, Martínez-Bartolomé S, López-García MA, Salazar E, Navajas R, Jones AR, Paradela A, Albar JP. The ProteoRed MIAPE web toolkit: a user-friendly framework to connect and share proteomics standards. Mol Cell Proteomics 2012; 10:M111.008334. [PMID: 21983993 DOI: 10.1074/mcp.m111.008334] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022] Open
Abstract
The development of the HUPO-PSI's (Proteomics Standards Initiative) standard data formats and MIAPE (Minimum Information About a Proteomics Experiment) guidelines should improve proteomics data sharing within the scientific community. Proteomics journals have encouraged the use of these standards and guidelines to improve the quality of experimental reporting and ease the evaluation and publication of manuscripts. However, there is an evident lack of bioinformatics tools specifically designed to create and edit standard file formats and reports, or embed them within proteomics workflows. In this article, we describe a new web-based software suite (The ProteoRed MIAPE web toolkit) that performs several complementary roles related to proteomic data standards. First, it can verify that the reports fulfill the minimum information requirements of the corresponding MIAPE modules, highlighting inconsistencies or missing information. Second, the toolkit can convert several XML-based data standards directly into human readable MIAPE reports stored within the ProteoRed MIAPE repository. Finally, it can also perform the reverse operation, allowing users to export from MIAPE reports into XML files for computational processing, data sharing, or public database submission. The toolkit is thus the first application capable of automatically linking the PSI's MIAPE modules with the corresponding XML data exchange standards, enabling bidirectional conversions. This toolkit is freely available at http://www.proteored.org/MIAPE/.
|
23
|
Wilhelm M, Kirchner M, Steen JAJ, Steen H. mz5: space- and time-efficient storage of mass spectrometry data sets. Mol Cell Proteomics 2011; 11:O111.011379. [PMID: 21960719 DOI: 10.1074/mcp.o111.011379] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Indexed: 11/06/2022] Open
Abstract
Across a host of MS-driven-omics fields, researchers witness the acquisition of ever increasing amounts of high throughput MS data and face the need for their compact yet efficiently accessible storage. Addressing the need for an open data exchange format, the Proteomics Standards Initiative and the Seattle Proteome Center at the Institute for Systems Biology independently developed the mzData and mzXML formats, respectively. In a subsequent joint effort, they defined an ontology and associated controlled vocabulary that specifies the contents of MS data files, implemented as the newer mzML format. All three formats are based on XML and are thus not particularly efficient in either storage space requirements or read/write speed. This contribution introduces mz5, a complete reimplementation of the mzML ontology that is based on the efficient, industrial strength storage backend HDF5. Compared with the current mzML standard, this strategy yields an average file size reduction to ∼54% and increases linear read and write speeds ∼3-4-fold. The format is implemented as part of the ProteoWizard project and is available under a permissive Apache license. Additional information and download links are available from http://software.steenlab.org/mz5.
Affiliation(s)
- Mathias Wilhelm
- Proteomics Center, Children's Hospital Boston, Boston, Massachusetts; Faculty of Technology, University Bielefeld, Bielefeld, Germany; Department of Pathology, Children's Hospital Boston, Boston, Massachusetts
- Marc Kirchner
- Proteomics Center, Children's Hospital Boston, Boston, Massachusetts; Department of Pathology, Children's Hospital Boston, Boston, Massachusetts; Department of Pathology, Harvard Medical School, Boston, Massachusetts
- Judith A J Steen
- Proteomics Center, Children's Hospital Boston, Boston, Massachusetts; Department of Neurobiology, Harvard Medical School and F. M. Kirby Neurobiology Center, Children's Hospital, Boston, Massachusetts
- Hanno Steen
- Proteomics Center, Children's Hospital Boston, Boston, Massachusetts; Department of Pathology, Children's Hospital Boston, Boston, Massachusetts; Department of Pathology, Harvard Medical School, Boston, Massachusetts
|
24
|
Aranda B, Blankenburg H, Kerrien S, Brinkman FSL, Ceol A, Chautard E, Dana JM, De Las Rivas J, Dumousseau M, Galeota E, Gaulton A, Goll J, Hancock REW, Isserlin R, Jimenez RC, Kerssemakers J, Khadake J, Lynn DJ, Michaut M, O’Kelly G, Ono K, Orchard S, Prieto C, Razick S, Rigina O, Salwinski L, Simonovic M, Velankar S, Winter A, Wu G, Bader GD, Cesareni G, Donaldson IM, Eisenberg D, Kleywegt GJ, Overington J, Ricard-Blum S, Tyers M, Albrecht M, Hermjakob H. PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nat Methods 2011; 8:528-9. [PMID: 21716279 PMCID: PMC3246345 DOI: 10.1038/nmeth.1637] [Citation(s) in RCA: 209] [Impact Index Per Article: 16.1] [Indexed: 11/09/2022]
Affiliation(s)
- Bruno Aranda
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Samuel Kerrien
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Fiona S L Brinkman
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Arnaud Ceol
- Institute for Research in Biomedicine, Barcelona, Spain
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Emilie Chautard
- Institut de Biologie et Chimie des Protéines, Unité Mixte de Recherche 5086, Centre National de la Recherche Scientifique–Université Lyon 1, Lyon, France
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Jose M Dana
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Javier De Las Rivas
- Cancer Research Center, Centro de Investigación de Cáncer–Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas, Universidad de Salamanca, Salamanca, Spain
- Marine Dumousseau
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Eugenia Galeota
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Istituto di Ricovero e Cura a Carattere Scientifico, Fondazione S. Lucia, Rome, Italy
- Anna Gaulton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Robert E W Hancock
- Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, British Columbia, Canada
- Ruth Isserlin
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Rafael C Jimenez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Jules Kerssemakers
- Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Jyoti Khadake
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- David J Lynn
- Animal and Bioscience Research Department, Animal and Grassland Research Innovation Centre, Teagasc, Ireland
- Magali Michaut
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Gavin O’Kelly
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Keiichiro Ono
- University of California, Trey Ideker Lab, San Diego, School of Medicine, La Jolla, California, USA
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Carlos Prieto
- Cancer Research Center, Centro de Investigación de Cáncer–Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas, Universidad de Salamanca, Salamanca, Spain
- Institute of Biotechnology of León, León, Spain
- Sabry Razick
- The Biotechnology Centre of Oslo, University of Oslo, Oslo, Norway
- Biomedical Research Group, Department of Informatics, University of Oslo, Oslo, Norway
- Olga Rigina
- Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark, Kongens Lyngby, Denmark
- Lukasz Salwinski
- University of California, Los Angeles, Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, USA
- Milan Simonovic
- Faculty of Science, Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Andrew Winter
- Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland
- Guanming Wu
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Gianni Cesareni
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Istituto di Ricovero e Cura a Carattere Scientifico, Fondazione S. Lucia, Rome, Italy
- Ian M Donaldson
- The Biotechnology Centre of Oslo, University of Oslo, Oslo, Norway
- Department for Molecular Biosciences, University of Oslo, Oslo, Norway
- David Eisenberg
- University of California, Los Angeles, Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, USA
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California, USA
- Howard Hughes Medical Institute, University of California, Los Angeles, California, USA
- Gerard J Kleywegt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- John Overington
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Sylvie Ricard-Blum
- Institut de Biologie et Chimie des Protéines, Unité Mixte de Recherche 5086, Centre National de la Recherche Scientifique–Université Lyon 1, Lyon, France
- Mike Tyers
- Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
- Mario Albrecht
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
|
25
|
Turewicz M, Deutsch EW. Spectra, chromatograms, Metadata: mzML-the standard data format for mass spectrometer output. Methods Mol Biol 2011; 696:179-203. [PMID: 21063948 DOI: 10.1007/978-1-60761-987-1_11] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 05/30/2023]
Abstract
This chapter describes Mass Spectrometry Markup Language (mzML), an XML-based and vendor-neutral standard data format for storage and exchange of mass spectrometer output like raw spectra and peak lists. It is intended to replace its two precursor data formats (mzData and mzXML), which had been developed independently a few years earlier. Hence, with the release of mzML, the problem of having two different formats for the same purposes is solved, and with it the duplicated effort of maintaining and supporting two data formats. The new format has been developed by a broad-based consortium of major instrument vendors, software vendors, and academic researchers under the aegis of the Human Proteome Organisation (HUPO), Proteomics Standards Initiative (PSI), with full participation of the main developers of the precursor formats. This comprehensive approach helped mzML to become a generally accepted standard. Furthermore, the collaborative development ensured that mzML has adopted the best features of its precursor formats. In this chapter, we discuss mzML's development history, its design principles and use cases, as well as its main building components. We also present the available documentation, an example file, and validation software for mzML.
Affiliation(s)
- Michael Turewicz
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
|
26
|
Gilski MJ, Sadygov RG. Comparison of Programmatic Approaches for Efficient Accessing to mzML Files. ACTA ACUST UNITED AC 2011; 2. [PMID: 21766049 DOI: 10.4172/2153-0602.1000109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
The Human Proteome Organization (HUPO) Proteomics Standard Initiative has been tasked with developing file formats for storing raw data (mzML) and the results of spectral processing (protein identification and quantification) from proteomics experiments (mzIdentML). In order to fully characterize complex experiments, special data types have been designed. Standardized file formats will promote visualization, validation and dissemination of data independent of the vendor-specific binary data storage files. Innovative programmatic solutions for robust and efficient data access to standardized file formats will contribute to more rapid wide-scale acceptance of these file formats by the proteomics community. In this work, we compare algorithms for accessing spectral data in the mzML file format. As XML files, mzML files allow efficient parsing of data structures when using XML-specific class types. These classes provide only sequential access to files. However, random access to spectral data is needed in many algorithmic applications for processing proteomics datasets. Here, we demonstrate implementation of memory streams to convert a sequential access into random access. Our application preserves the elegant XML parsing capabilities. Benchmarking file access times in sequential and random access modes shows that while for a small number of spectra random access is more time efficient, when retrieving a large number of spectra sequential access becomes more efficient. We also provide comparisons to other file accessing methods from academia and industry.
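The difference between sequential and random access discussed in this abstract can be sketched as follows. This is a minimal illustration and not the authors' memory-stream implementation: it swaps in a plain byte-offset index, and it runs on a toy, heavily simplified mzML-like document whose element layout, ids and values are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET

# Toy stand-in for an mzML file (hypothetical and simplified: real mzML uses
# namespaces, controlled-vocabulary terms and base64-encoded binary arrays).
DOC = (
    b"<run>"
    b'<spectrum id="scan=1" index="0"><mz>100.0 200.0</mz></spectrum>'
    b'<spectrum id="scan=2" index="1"><mz>150.5 300.25</mz></spectrum>'
    b'<spectrum id="scan=3" index="2"><mz>99.9</mz></spectrum>'
    b"</run>"
)


def parse_mz(spectrum_elem):
    """Turn the whitespace-separated text of <mz> into a list of floats."""
    return [float(x) for x in spectrum_elem.findtext("mz").split()]


def read_sequential(stream, wanted_id):
    """Parse from the start of the stream until the wanted spectrum is complete."""
    stream.seek(0)
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "spectrum":
            if elem.get("id") == wanted_id:
                return parse_mz(elem)
            elem.clear()  # release the content of spectra that are skipped
    return None


def build_offset_index(data):
    """Scan the bytes once and map each spectrum id to its (start, end) offsets."""
    pattern = re.compile(rb'<spectrum id="([^"]+)".*?</spectrum>', re.S)
    return {m.group(1).decode(): (m.start(), m.end()) for m in pattern.finditer(data)}


def read_random(stream, index, wanted_id):
    """Seek straight to one spectrum and parse only that fragment."""
    start, end = index[wanted_id]
    stream.seek(start)
    return parse_mz(ET.fromstring(stream.read(end - start)))
```

Building the index costs one full pass, so it pays off when only a few spectra are needed; when most spectra are read anyway, a single streaming pass avoids the per-spectrum seek and parser setup, which is in line with the trade-off the abstract reports.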
Affiliation(s)
- Miroslaw J Gilski
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University Blvd., Galveston, TX, 77555, USA
|
27
|
Martens L, Chambers M, Sturm M, Kessner D, Levander F, Shofstahl J, Tang WH, Römpp A, Neumann S, Pizarro AD, Montecchi-Palazzi L, Tasman N, Coleman M, Reisinger F, Souda P, Hermjakob H, Binz PA, Deutsch EW. mzML--a community standard for mass spectrometry data. Mol Cell Proteomics 2011; 10:R110.000133. [PMID: 20716697 PMCID: PMC3013463 DOI: 10.1074/mcp.r110.000133] [Citation(s) in RCA: 461] [Impact Index Per Article: 35.5] [Received: 04/29/2010] [Revised: 07/26/2010] [Indexed: 12/27/2022] Open
Abstract
Mass spectrometry is a fundamental tool for discovery and analysis in the life sciences. With the rapid advances in mass spectrometry technology and methods, it has become imperative to provide a standard output format for mass spectrometry data that will facilitate data sharing and analysis. Initially, the efforts to develop a standard format for mass spectrometry data resulted in multiple formats, each designed with a different underlying philosophy. To resolve the issues associated with having multiple formats, vendors, researchers, and software developers convened under the banner of the HUPO PSI to develop a single standard. The new data format incorporated many of the desirable technical attributes from the previous data formats, while adding a number of improvements, including features such as a controlled vocabulary with validation tools to ensure consistent usage of the format, improved support for selected reaction monitoring data, and immediately available implementations to facilitate rapid adoption by the community. The resulting standard data format, mzML, is a well tested open-source format for mass spectrometer output files that can be readily utilized by the community and easily adapted for incremental advances in mass spectrometry technology.
Affiliation(s)
- Lennart Martens
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Marc Sturm
- Eberhard Karls University, 72074, Tübingen, Germany
- Darren Kessner
- University of Southern California, Los Angeles, CA, 90089, USA
- Fredrik Levander
- Department of Immunotechnology and CREATE Health, Lund University, 22362, Lund, Sweden
- Jim Shofstahl
- Thermo Fisher Scientific, San Jose, CA, 95134, USA
- Steffen Neumann
- Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- Luisa Montecchi-Palazzi
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB101SD, UK
- Florian Reisinger
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB101SD, UK
- Puneet Souda
- University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Henning Hermjakob
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB101SD, UK
- Pierre-Alain Binz
- Geneva Bioinformatics (GeneBio) SA, 1206 Geneva, Switzerland and Swiss Institute of Bioinformatics, Geneva, Switzerland
|
28
|
Abstract
In the past decades, a variety of publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. However, there is also an increasing confusion for the researchers who are trying to quickly find the appropriate resources to help them solve their problems. In this chapter, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases and resources that are relevant to comparative proteomics research. We conclude the chapter by discussing the challenges and opportunities for developing new protein bioinformatics databases.
|
29
|
Strassberger V, Fugmann T, Neri D, Roesli C. Chemical proteomic and bioinformatic strategies for the identification and quantification of vascular antigens in cancer. J Proteomics 2010; 73:1954-73. [DOI: 10.1016/j.jprot.2010.05.018] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/09/2009] [Revised: 05/27/2010] [Accepted: 05/27/2010] [Indexed: 10/19/2022]
|
30
|
Dunn WB, Broadhurst DI, Atherton HJ, Goodacre R, Griffin JL. Systems level studies of mammalian metabolomes: the roles of mass spectrometry and nuclear magnetic resonance spectroscopy. Chem Soc Rev 2010; 40:387-426. [PMID: 20717559 DOI: 10.1039/b906712b] [Citation(s) in RCA: 567] [Impact Index Per Article: 40.5] [Indexed: 12/11/2022]
Abstract
The study of biological systems in a holistic manner (systems biology) is increasingly being viewed as a necessity to provide qualitative and quantitative descriptions of the emergent properties of the complete system. Systems biology performs studies focussed on the complex interactions of system components; emphasising the whole system rather than the individual parts. Many perturbations to mammalian systems (diet, disease, drugs) are multi-factorial and the study of small parts of the system is insufficient to understand the complete phenotypic changes induced. Metabolomics is one functional level tool being employed to investigate the complex interactions of metabolites with other metabolites (metabolism) but also the regulatory role metabolites provide through interaction with genes, transcripts and proteins (e.g. allosteric regulation). Technological developments are the driving force behind advances in scientific knowledge. Recent advances in the two analytical platforms of mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy have driven forward the discipline of metabolomics. In this critical review, an introduction to metabolites, metabolomes, metabolomics and the role of MS and NMR spectroscopy will be provided. The applications of metabolomics in mammalian systems biology for the study of the health-disease continuum, drug efficacy and toxicity and dietary effects on mammalian health will be reviewed. The current limitations and future goals of metabolomics in systems biology will also be discussed (374 references).
Affiliation(s)
- Warwick B Dunn
- Manchester Centre for Integrative Systems Biology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.
|
31
|
Végvári A, Marko-Varga G. Clinical protein science and bioanalytical mass spectrometry with an emphasis on lung cancer. Chem Rev 2010; 110:3278-98. [PMID: 20415473 DOI: 10.1021/cr100011x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022]
Affiliation(s)
- Akos Végvári
- Division of Clinical Protein Science & Imaging, Biomedical Center, Department of Measurement Technology and Industrial Electrical Engineering, Lund University, BMC C13, SE-221 84 Lund, Sweden
|
32
|
Yu W, Taylor JA, Davis MT, Bonilla LE, Lee KA, Auger PL, Farnsworth CC, Welcher AA, Patterson SD. Maximizing the sensitivity and reliability of peptide identification in large-scale proteomic experiments by harnessing multiple search engines. Proteomics 2010; 10:1172-89. [PMID: 20101609 DOI: 10.1002/pmic.200900074] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 11/08/2022]
Abstract
Despite recent advances in qualitative proteomics, the automatic identification of peptides with optimal sensitivity and accuracy remains a difficult goal. To address this deficiency, a novel algorithm, Multiple Search Engines, Normalization and Consensus is described. The method employs six search engines and a re-scoring engine to search MS/MS spectra against protein and decoy sequences. After the peptide hits from each engine are normalized to error rates estimated from the decoy hits, peptide assignments are then deduced using a minimum consensus model. These assignments are produced in a series of progressively relaxed false-discovery rates, thus enabling a comprehensive interpretation of the data set. Additionally, the estimated false-discovery rate was found to have good concordance with the observed false-positive rate calculated from known identities. Benchmarking against standard proteins data sets (ISBv1, sPRG2006) and their published analysis, demonstrated that the Multiple Search Engines, Normalization and Consensus algorithm consistently achieved significantly higher sensitivity in peptide identifications, which led to increased or more robust protein identifications in all data sets compared with prior methods. The sensitivity and the false-positive rate of peptide identification exhibit an inverse-proportional and linear relationship with the number of participating search engines.
Affiliation(s)
- Wen Yu
- Computational Biology, Amgen Inc., Seattle, WA 98119-3105, USA.
|
33
|
An optimized data structure for high-throughput 3D proteomics data: mzRTree. J Proteomics 2010; 73:1176-82. [DOI: 10.1016/j.jprot.2010.02.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/15/2009] [Revised: 02/02/2010] [Accepted: 02/09/2010] [Indexed: 11/18/2022]
|
34
|
Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput "omics" Data. Adv Bioinformatics 2010:423589. [PMID: 20369061 PMCID: PMC2847380 DOI: 10.1155/2010/423589] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/04/2009] [Accepted: 01/05/2010] [Indexed: 12/26/2022] Open
Abstract
High-throughput “omics” technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data being maintained in heterogeneous and distributed environments plus the lack of well-defined data standard and standardized nomenclature imposes a major challenge which requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput “omics” data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput “omics” data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use scientific case study to illustrate how one can use varied “omics” data from different laboratories to make useful connections that could lead to new biological knowledge.
|
35
|
Abstract
Mass spectrometry has quickly become an essential tool in molecular biology laboratories. Here, we describe the Trans-Proteomic Pipeline, a collection of software tools, to facilitate the analysis, exchange, and comparison of MS data. The pipeline is instrument-independent and supports most commonly used proteomics workflows, including quantitative applications such as ICAT, iTRAQ, and SILAC. Importantly, the pipeline uses open, standard data formats and calculates accurate estimates of sensitivity and error rates, thus allowing for meaningful data exchange. In this chapter, we will introduce the various components of the pipeline in the context of three typical proteomic use-case scenarios.
Affiliation(s)
- Patrick G A Pedrioli
- Institute of Biochemistry, Swiss Federal Institute of Technology Zürich (ETHZ), Zürich, Switzerland.
|
36
|
Neuweger H, Albaum SP, Dondrup M, Persicke M, Watt T, Niehaus K, Stoye J, Goesmann A. MeltDB: a software platform for the analysis and integration of metabolomics experiment data. Bioinformatics 2008; 24:2726-32. [PMID: 18765459 DOI: 10.1093/bioinformatics/btn452] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION The recent advances in metabolomics have created the potential to measure the levels of hundreds of metabolites which are the end products of cellular regulatory processes. The automation of the sample acquisition and subsequent analysis in high-throughput instruments that are capable of measuring metabolites is posing a challenge on the necessary systematic storage and computational processing of the experimental datasets. Whereas a multitude of specialized software systems for individual instruments and preprocessing methods exists, there is clearly a need for a free and platform-independent system that allows the standardized and integrated storage and analysis of data obtained from metabolomics experiments. Currently there exists no such system that on the one hand supports preprocessing of raw datasets but also allows to visualize and integrate the results of higher level statistical analyses within a functional genomics context. RESULTS To facilitate the systematic storage, analysis and integration of metabolomics experiments, we have implemented MeltDB, a web-based software platform for the analysis and annotation of datasets from metabolomics experiments. MeltDB supports open file formats (netCDF, mzXML, mzDATA) and facilitates the integration and evaluation of existing preprocessing methods. The system provides researchers with means to consistently describe and store their experimental datasets. Comprehensive analysis and visualization features of metabolomics datasets are offered to the community through a web-based user interface. The system covers the process from raw data to the visualization of results in a knowledge-based background and is integrated into the context of existing software platforms of genomics and transcriptomics at Bielefeld University. 
We demonstrate the potential of MeltDB by means of a sample experiment where we dissect the influence of three different carbon sources on the gram-negative bacterium Xanthomonas campestris pv. campestris on the level of measured metabolites. Experimental data are stored, analyzed and annotated within MeltDB and accessible via the public MeltDB web server. AVAILABILITY The system is publicly available at http://meltdb.cebitec.uni-bielefeld.de.
Affiliation(s)
- Heiko Neuweger
- International NRW Graduate School in Bioinformatics and Genome Research, Bielefeld University, Germany.
|
37
|
Deus HF, Stanislaus R, Veiga DF, Behrens C, Wistuba II, Minna JD, Garner HR, Swisher SG, Roth JA, Correa AM, Broom B, Coombes K, Chang A, Vogel LH, Almeida JS. A Semantic Web management model for integrative biomedical informatics. PLoS One 2008; 3:e2946. [PMID: 18698353 PMCID: PMC2491554 DOI: 10.1371/journal.pone.0002946] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 03/25/2008] [Accepted: 07/12/2008] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defies automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high throughput molecular data. METHODOLOGY/PRINCIPAL FINDINGS The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model where multiple intertwined data structures can be hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas MD Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available with open source at www.s3db.org, was developed and its proposed design has been made publicly available as an open source instrument for shared, distributed data management. CONCLUSIONS/SIGNIFICANCE The Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems Biology and translational biomedical research. As this technology is incorporated into application development we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.
Affiliation(s)
- Helena F. Deus
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Lisboa, Portugal
- Romesh Stanislaus
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Diogo F. Veiga
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Carmen Behrens
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Ignacio I. Wistuba
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Department of Pathology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- John D. Minna
- Hamon Center for Therapeutic Oncology Research, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Harold R. Garner
- Hamon Center for Therapeutic Oncology Research, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Center for Biomedical Inventions, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Stephen G. Swisher
- Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Jack A. Roth
- Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Arlene M. Correa
- Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Bradley Broom
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Kevin Coombes
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Allen Chang
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Lynn H. Vogel
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- Jonas S. Almeida
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
|
38
|
Kessner D, Chambers M, Burke R, Agus D, Mallick P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008; 24:2534-6. [PMID: 18606607 PMCID: PMC2732273 DOI: 10.1093/bioinformatics/btn323] [Citation(s) in RCA: 1386] [Impact Index Per Article: 86.6] [Indexed: 11/28/2022]
Abstract
Summary: The ProteoWizard software project provides a modular and extensible set of open-source, cross-platform tools and libraries. The tools perform proteomics data analyses; the libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard proteomics and LCMS dataset computations. The library contains readers and writers of the mzML data format, which has been written using modern C++ techniques and design principles and supports a variety of platforms with native compilers. The software has been specifically released under the Apache v2 license to ensure it can be used in both academic and commercial projects. In addition to the library, we also introduce a rapidly growing set of companion tools whose implementation helps to illustrate the simplicity of developing applications on top of the ProteoWizard library. Availability: Cross-platform software that compiles using native compilers (i.e. GCC on Linux, MSVC on Windows and XCode on OSX) is available for download free of charge, at http://proteowizard.sourceforge.net. This website also provides code examples, and documentation. It is our hope the ProteoWizard project will become a standard platform for proteomics development; consequently, code use, contribution and further development are strongly encouraged. Contact: darren@proteowizard.org; parag@ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Darren Kessner
- Spielberg Family Center for Applied Proteomics, Cedars-Sinai Medical Center, USA.
|
39
|
Resolving the network of cell signaling pathways using the evolving yeast two-hybrid system. Biotechniques 2008; 44:655-62. [PMID: 18474041 DOI: 10.2144/000112797] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022] Open
Abstract
In 1983, while investigators had identified a few human proteins as important regulators of specific biological outcomes, how these proteins acted in the cell was essentially unknown in almost all cases. Twenty-five years later, our knowledge of the mechanistic basis of protein action has been transformed by our increasingly detailed understanding of protein-protein interactions, which have allowed us to define cellular machines. The advent of the yeast two-hybrid (Y2H) system in 1989 marked a milestone in the field of proteomics. Exploiting the modular nature of transcription factors, the Y2H system allows facile measurement of the activation of reporter genes based on interactions between two chimeric or "hybrid" proteins of interest. After a decade of service as a leading platform for individual investigators to use in exploring the interaction properties of interesting target proteins, the Y2H system has increasingly been applied in high-throughput applications intended to map genome-scale protein-protein interactions for model organisms and humans. Although some significant technical limitations apply, Y2H has made a great contribution to our general understanding of the topology of cellular signaling networks.
|
40
|
Chatr-Aryamontri A, Ceol A, Licata L, Cesareni G. Protein interactions: integration leads to belief. Trends Biochem Sci 2008; 33:241-2; author reply 242-3. [PMID: 18472267 DOI: 10.1016/j.tibs.2008.04.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 01/28/2008] [Accepted: 04/01/2008] [Indexed: 10/22/2022]
|