1
Peng H, Wang H, Kong W, Li J, Goh WWB. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat Commun 2024; 15:3922. [PMID: 38724498 PMCID: PMC11082229 DOI: 10.1038/s41467-024-47899-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 04/16/2024] [Indexed: 05/12/2024]
Abstract
Identification of differentially expressed proteins in a proteomics workflow typically encompasses five key steps: raw data quantification, expression matrix construction, matrix normalization, missing value imputation (MVI), and differential expression analysis. The plethora of options in each step makes it challenging to identify optimal workflows that maximize the identification of differentially expressed proteins. To identify optimal workflows and their common properties, we conduct an extensive study involving 34,576 combinatoric experiments on 24 gold standard spike-in datasets. Applying frequent pattern mining techniques to top-ranked workflows, we uncover high-performing rules that demonstrate optimality has conserved properties. Via machine learning, we confirm optimal workflows are indeed predictable, with average cross-validation F1 scores and Matthews correlation coefficients surpassing 0.84. We introduce ensemble inference to integrate results from individual top-performing workflows, expanding differential proteome coverage and resolving inconsistencies. Ensemble inference provides gains in pAUC (up to 4.61%) and G-mean (up to 11.14%) and facilitates effective aggregation of information across varied quantification approaches such as topN, directLFQ, MaxLFQ intensities, and spectral counts. However, further development and evaluation are needed to establish acceptable frameworks for conducting ensemble inference on multiple proteomics workflows.
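The abstract reports F1, Matthews correlation coefficient (MCC) and G-mean without defining them. As a rough illustration only, the sketch below computes the three scores from confusion-matrix counts, assuming the standard definitions (G-mean taken as the geometric mean of sensitivity and specificity). The function names and the toy spike-in labels are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(truth, called):
    """Count TP, FP, TN, FN from two parallel lists of booleans.

    truth[i]  -- True if protein i is a genuinely changed (spiked-in) protein
    called[i] -- True if a workflow reported protein i as differential
    """
    tp = sum(t and c for t, c in zip(truth, called))
    fp = sum((not t) and c for t, c in zip(truth, called))
    tn = sum((not t) and (not c) for t, c in zip(truth, called))
    fn = sum(t and (not c) for t, c in zip(truth, called))
    return tp, fp, tn, fn


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall (tn is unused but kept for a uniform signature)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy example: 3 truly changed proteins out of 8, one missed, one false call.
truth = [True, True, True, False, False, False, False, False]
called = [True, True, False, True, False, False, False, False]
counts = confusion_counts(truth, called)
print(counts, f1_score(*counts), mcc(*counts), g_mean(*counts))
```

All three scores use the same four counts, which is why they can be compared across workflows evaluated on the same spike-in ground truth.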
Affiliation(s)
- Hui Peng
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- He Wang
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Weijia Kong
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Jinyan Li
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Wilson Wen Bin Goh
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
  - Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
  - Center of AI in Medicine, Nanyang Technological University, Singapore, Singapore
  - Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, UK
2
Singh V, Singh R, Kushwaha R. Exploring novel protein biomarkers for early-stage diagnosis and prognosis of T-acute lymphoblastic leukemia (T-ALL). Hematol Transfus Cell Ther 2024:S2531-1379(24)00063-4. [PMID: 38584071 DOI: 10.1016/j.htct.2024.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 02/12/2024] [Indexed: 04/09/2024]
Abstract
BACKGROUND: Efficient classification of T-acute lymphoblastic leukemia (T-ALL) involves considering various factors, such as age, white blood cell count, and chromosomal alterations. However, studying protein markers is crucial to improving the diagnosis and treatment of T-ALL patients. A proteome expression study was conducted to identify promising early-stage biomarkers for T-ALL patients. METHODS: Label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to analyze the blood proteins of both patients and healthy individuals to identify new biomarkers for T-ALL. The findings were validated by RT-PCR, ELISA and computational analysis. RESULTS: The study identified 1467 proteins in the blood, of which nine were upregulated and 35 were downregulated by more than 2-fold. T-ALL patients showed a significant increase in specific disease-related proteins, such as eleven-nineteen lysine-rich leukemia protein, triggering receptor expressed on myeloid cells 1, cisplatin resistance-associated-overexpressed protein, X-ray radiation resistance-associated protein 1, tumor necrosis factor receptor superfamily member 10D, protein S100-A8, and copine-4, by more than 3-fold. CONCLUSION: The findings of this study provide a valuable protein map of leukemic cells and identify potential biomarkers for leukemic aggressiveness. However, further studies using larger T-ALL patient samples are needed to confirm these preliminary results.
Affiliation(s)
- Vivek Singh
  - King George's Medical University, Lucknow, UP, India
- Ranjana Singh
  - King George's Medical University, Lucknow, UP, India
3
Li J, Xiong Y, Feng S, Pan C, Guo X. CloudProteoAnalyzer: scalable processing of big data from proteomics using cloud computing. Bioinformatics Advances 2024; 4:vbae024. [PMID: 38495055 PMCID: PMC10942798 DOI: 10.1093/bioadv/vbae024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 02/05/2024] [Accepted: 02/21/2024] [Indexed: 03/19/2024]
Abstract
Summary: Shotgun proteomics is widely used in many systems biology studies to determine the global protein expression profiles of tissues, cultures, and microbiomes. Many non-distributed computer algorithms have been developed for users to process proteomics data on their local computers. However, the amount of data acquired in a typical proteomics study has grown rapidly in recent years, owing to the increasing throughput of mass spectrometry and the expanding scale of study designs. This presents a big data challenge for researchers who need to process proteomics data in a timely manner. To overcome this challenge, we developed a cloud-based parallel computing application to offer end-to-end proteomics data analysis software as a service (SaaS). A web interface was provided for users to upload mass spectrometry-based proteomics data, configure parameters, submit jobs, and monitor job status. The data processing was distributed across multiple nodes in a supercomputer to achieve scalability for large datasets. Our study demonstrated SaaS for proteomics as a viable solution for the community to scale up data processing using cloud computing. Availability and implementation: This application is available online at https://sipros.oscer.ou.edu/ or https://sipros.unt.edu for free use. The source code is available at https://github.com/Biocomputing-Research-Group/CloudProteoAnalyzer under the GPL version 3.0 license.
Affiliation(s)
- Jiancheng Li
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Yi Xiong
  - School of Biological Sciences, University of Oklahoma, Norman, OK 73019, United States
- Shichao Feng
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Chongle Pan
  - School of Biological Sciences, University of Oklahoma, Norman, OK 73019, United States
  - School of Computer Science, University of Oklahoma, Norman, OK 73019, United States
- Xuan Guo
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
4
Henke AN, Chilukuri S, Langan LM, Brooks BW. Reporting and reproducibility: Proteomics of fish models in environmental toxicology and ecotoxicology. The Science of the Total Environment 2024; 912:168455. [PMID: 37979845 DOI: 10.1016/j.scitotenv.2023.168455] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/05/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/20/2023]
Abstract
Environmental toxicology and ecotoxicology research efforts are employing proteomics with fish models as New Approach Methodologies, along with in silico, in vitro and other omics techniques to elucidate hazards of toxicants and toxins. We performed a critical review of toxicology studies with fish models using proteomics and reported fundamental parameters across experimental design, sample preparation, mass spectrometry, and bioinformatics of fish, which represent alternative vertebrate models in environmental toxicology, and routinely studied animals in ecotoxicology. We observed inconsistencies in reporting and methodologies among experimental designs, sample preparations, data acquisitions and bioinformatics, which can affect reproducibility of experimental results. We identified a distinct need to develop reporting guidelines for proteomics use in environmental toxicology and ecotoxicology, increased QA/QC throughout studies, and method optimization with an emphasis on reducing inconsistencies among studies. Several recommendations are offered as logical steps to advance development and application of this emerging research area to understand chemical hazards to public health and the environment.
Affiliation(s)
- Abigail N Henke
  - Department of Biology, Baylor University, Waco, TX, USA; Center for Reservoir and Aquatic Systems Research (CRASR), Baylor University, Waco, TX, USA
- Laura M Langan
  - Department of Environmental Science, Baylor University, Waco, TX, USA; Center for Reservoir and Aquatic Systems Research (CRASR), Baylor University, Waco, TX, USA
- Bryan W Brooks
  - Department of Environmental Science, Baylor University, Waco, TX, USA; Center for Reservoir and Aquatic Systems Research (CRASR), Baylor University, Waco, TX, USA
5
Carvalho LB, Teigas-Campos PAD, Jorge S, Protti M, Mercolini L, Dhir R, Wiśniewski JR, Lodeiro C, Santos HM, Capelo JL. Normalization methods in mass spectrometry-based analytical proteomics: A case study based on renal cell carcinoma datasets. Talanta 2024; 266:124953. [PMID: 37490822 DOI: 10.1016/j.talanta.2023.124953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 07/07/2023] [Accepted: 07/14/2023] [Indexed: 07/27/2023]
Abstract
Normalization is a crucial step in proteomics data analysis as it enables data adjustment and enhances comparability between datasets by minimizing multiple sources of variability, such as sampling, sample handling, storage, treatment, and mass spectrometry measurements. In this study, we investigated different normalization methods, including Z-score normalization, median divide normalization, and quantile normalization, to evaluate their performance using a case study based on renal cell carcinoma datasets. Our results demonstrate that when comparing datasets by pairs, both the Z-score and quantile normalization methods consistently provide better results in terms of the number of proteins identified and quantified as well as in identifying statistically significant up or down-regulated proteins. However, when three or more datasets are compared at the same time the differences are found to be negligible.
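For readers unfamiliar with the three methods named in the abstract, the following is a minimal sketch of Z-score, median-divide and quantile normalization in plain Python. It assumes each sample is a complete list of intensities with no missing values or ties, and the function names are illustrative. It is not the procedure used in the study, whose implementation details the abstract does not give.

```python
from statistics import mean, median, stdev


def zscore(sample):
    """Center a sample on its mean and scale by its sample standard deviation."""
    mu = mean(sample)
    sd = stdev(sample)
    return [(x - mu) / sd for x in sample]


def median_divide(sample):
    """Divide every intensity by the sample median."""
    med = median(sample)
    return [x / med for x in sample]


def quantile_normalize(samples):
    """Give every sample the same distribution.

    Each value is replaced by the mean, across samples, of the values
    that share its rank. All samples must have the same length.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(s[i] for s in sorted_samples) for i in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy samples measured on very different intensity scales.
a = [2.0, 4.0, 6.0]
b = [10.0, 30.0, 20.0]
print(zscore(a))
print(median_divide(a))
print(quantile_normalize([a, b]))
```

Z-score and median-divide act on one sample at a time, whereas quantile normalization needs all samples together, which is one practical difference when datasets are compared in pairs versus in larger groups.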
Affiliation(s)
- Luis B Carvalho
  - BIOSCOPE Group, LAQV-REQUIMTE, Chemistry Department, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal; PROTEOMASS Scientific Society, Madan Park, 2829-516, Caparica, Portugal
- Pedro A D Teigas-Campos
  - BIOSCOPE Group, LAQV-REQUIMTE, Chemistry Department, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal; PROTEOMASS Scientific Society, Madan Park, 2829-516, Caparica, Portugal
- Susana Jorge
  - BIOSCOPE Group, LAQV-REQUIMTE, Chemistry Department, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal; PROTEOMASS Scientific Society, Madan Park, 2829-516, Caparica, Portugal
- Michele Protti
  - Research Group of Pharmaco-Toxicological Analysis (PTA Lab), Department of Pharmacy and Biotechnology (FaBiT), Alma Mater Studiorum - University of Bologna, Via Belmeloro 6, 40126, Bologna, Italy
- Laura Mercolini
  - Research Group of Pharmaco-Toxicological Analysis (PTA Lab), Department of Pharmacy and Biotechnology (FaBiT), Alma Mater Studiorum - University of Bologna, Via Belmeloro 6, 40126, Bologna, Italy
- Rajiv Dhir
  - Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Jacek R Wiśniewski
  - Biochemical Proteomics Group, Department of Proteomics and Signal Transduction, Max-Planck-Institute of Biochemistry, Martinsried, Germany
- Carlos Lodeiro
  - BIOSCOPE Group, LAQV-REQUIMTE, Chemistry Department, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal; PROTEOMASS Scientific Society, Madan Park, 2829-516, Caparica, Portugal
- Hugo M Santos
  - BIOSCOPE Group, LAQV-REQUIMTE, Chemistry Department, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal; PROTEOMASS Scientific Society, Madan Park, 2829-516, Caparica, Portugal; Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- José L Capelo
  - BIOSCOPE Group, LAQV-REQUIMTE, Chemistry Department, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal; PROTEOMASS Scientific Society, Madan Park, 2829-516, Caparica, Portugal
6
Montero-Calle A, Garranzo-Asensio M, Rejas-González R, Feliu J, Mendiola M, Peláez-García A, Barderas R. Benefits of FAIMS to Improve the Proteome Coverage of Deteriorated and/or Cross-Linked TMT 10-Plex FFPE Tissue and Plasma-Derived Exosomes Samples. Proteomes 2023; 11:35. [PMID: 37987315 PMCID: PMC10661291 DOI: 10.3390/proteomes11040035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 09/20/2023] [Accepted: 10/20/2023] [Indexed: 11/22/2023]
Abstract
The proteome characterization of complex, deteriorated, or cross-linked protein mixtures as paired clinical FFPE or exosome samples isolated from low plasma volumes (250 µL) might be a challenge. In this work, we aimed at investigating the benefits of FAIMS technology coupled to the Orbitrap Exploris 480 mass spectrometer for the TMT quantitative proteomics analyses of these complex samples in comparison to the analysis of protein extracts from cells, frozen tissue, and exosomes isolated from large volume plasma samples (3 mL). TMT experiments were performed using a two-hour gradient LC-MS/MS with or without FAIMS and two compensation voltages (CV = -45 and CV = -60). In the TMT experiments of cells, frozen tissue, or exosomes isolated from large plasma volumes (3 mL) with FAIMS, a limited increase in the number of identified and quantified proteins accompanied by a decrease in the number of peptides identified and quantified was observed. However, we demonstrated here a noticeable improvement (>100%) in the number of peptide and protein identifications and quantifications for the plasma exosomes isolated from low plasma volumes (250 µL) and FFPE tissue samples in TMT experiments with FAIMS in comparison to the LC-MS/MS analysis without FAIMS. Our results highlight the potential of mass spectrometry analyses with FAIMS to increase the depth into the proteome of complex samples derived from deteriorated, cross-linked samples and/or those where the material was scarce, such as FFPE and plasma-derived exosomes from low plasma volumes (250 µL), which might aid in the characterization of their proteome and proteoforms and in the identification of dysregulated proteins that could be used as biomarkers.
Affiliation(s)
- Ana Montero-Calle
  - Chronic Disease Programme (UFIEC), Instituto de Salud Carlos III, 28220 Majadahonda, Spain
- María Garranzo-Asensio
  - Chronic Disease Programme (UFIEC), Instituto de Salud Carlos III, 28220 Majadahonda, Spain
- Raquel Rejas-González
  - Chronic Disease Programme (UFIEC), Instituto de Salud Carlos III, 28220 Majadahonda, Spain
- Jaime Feliu
  - Translational Oncology Group, La Paz University Hospital (IdiPAZ), 28046 Madrid, Spain
  - Center for Biomedical Research in the Cancer Network (CIBERONC), Instituto de Salud Carlos III, 28046 Madrid, Spain
- Marta Mendiola
  - Center for Biomedical Research in the Cancer Network (CIBERONC), Instituto de Salud Carlos III, 28046 Madrid, Spain
  - Molecular Pathology and Therapeutic Targets Group, La Paz University Hospital (IdiPAZ), 28046 Madrid, Spain
- Alberto Peláez-García
  - Molecular Pathology and Therapeutic Targets Group, La Paz University Hospital (IdiPAZ), 28046 Madrid, Spain
- Rodrigo Barderas
  - Chronic Disease Programme (UFIEC), Instituto de Salud Carlos III, 28220 Majadahonda, Spain
7
Bennike TB. Advances in proteomics: characterization of the innate immune system after birth and during inflammation. Front Immunol 2023; 14:1254948. [PMID: 37868984 PMCID: PMC10587584 DOI: 10.3389/fimmu.2023.1254948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 09/26/2023] [Indexed: 10/24/2023]
Abstract
Proteomics is the characterization of the protein composition, the proteome, of a biological sample. It involves the large-scale identification and quantification of proteins, peptides, and post-translational modifications. This review focuses on recent developments in mass spectrometry-based proteomics and provides an overview of available methods for sample preparation to study the innate immune system. Recent advancements in the proteomics workflows, including sample preparation, have significantly improved the sensitivity and proteome coverage of biological samples including the technically difficult blood plasma. Proteomics is often applied in immunology and has been used to characterize the levels of innate immune system components after perturbations such as birth or during chronic inflammatory diseases like rheumatoid arthritis (RA) and inflammatory bowel disease (IBD). In cancers, the tumor microenvironment may generate chronic inflammation and release cytokines to the circulation. In these situations, the innate immune system undergoes profound and long-lasting changes, the large-scale characterization of which may increase our biological understanding and help identify components with translational potential for guiding diagnosis and treatment decisions. With the ongoing technical development, proteomics will likely continue to provide increasing insights into complex biological processes and their implications for health and disease. Integrating proteomics with other omics data and utilizing multi-omics approaches have been demonstrated to give additional valuable insights into biological systems.
Affiliation(s)
- Tue Bjerg Bennike
  - Medical Microbiology and Immunology, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
8
Lou N, Wang G, Wang Y, Xu M, Zhou Y, Tan Q, Zhong Q, Zhang L, Zhang X, Liu S, Luo R, Wang S, Tang L, Yao J, Zhang Z, Shi Y, Yu X, Han X. Proteomics Identifies Circulating TIMP-1 as a Prognostic Biomarker for Diffuse Large B-Cell Lymphoma. Mol Cell Proteomics 2023; 22:100625. [PMID: 37500057 PMCID: PMC10470290 DOI: 10.1016/j.mcpro.2023.100625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 06/24/2023] [Accepted: 07/24/2023] [Indexed: 07/29/2023]
Abstract
Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous disease, although disease stratification using in-depth plasma proteomics has not been performed to date. By measuring more than 1000 proteins in the plasma of 147 DLBCL patients using data-independent acquisition mass spectrometry and antibody array, DLBCL patients were classified into four proteomic subtypes (PS-I-IV). Patients with the PS-IV subtype and worst prognosis had increased levels of proteins involved in inflammation, including a high expression of metalloproteinase inhibitor-1 (TIMP-1) that was associated with poor survival across two validation cohorts (n = 180). Notably, the combination of TIMP-1 with the international prognostic index (IPI) identified 64.00% to 88.24% of relapsed and 65.00% to 80.49% of deceased patients in the discovery and two validation cohorts, which represents a 24.00% to 41.67% and 20.00% to 31.70% improvement compared to the IPI score alone, respectively. Taken together, we demonstrate that DLBCL heterogeneity is reflected in the plasma proteome and that TIMP-1, together with the IPI, could improve the prognostic stratification of patients.
Affiliation(s)
- Ning Lou
  - Department of Clinical Laboratory, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Guibin Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yanrong Wang
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Meng Xu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yu Zhou
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Qiaoyun Tan
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Qiaofeng Zhong
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Lei Zhang
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Xiaomei Zhang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Shuxia Liu
  - Department of Clinical Laboratory, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Rongrong Luo
  - Department of Clinical Laboratory, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Shasha Wang
  - Department of Clinical Laboratory, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Le Tang
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Jiarui Yao
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Zhishang Zhang
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Yuankai Shi
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study on Anticancer Molecular Targeted Drugs, Beijing, China
- Xiaobo Yu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Xiaohong Han
  - Clinical Pharmacology Research Center, Peking Union Medical College Hospital, State Key Laboratory of Complex Severe and Rare Diseases, NMPA Key Laboratory for Clinical Research and Evaluation of Drug, Beijing Key Laboratory of Clinical PK & PD Investigation for Innovative Drugs, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
9
Kourti M, Aivaliotis M, Hatzipantelis E. Proteomics in Childhood Acute Lymphoblastic Leukemia: Challenges and Opportunities. Diagnostics (Basel) 2023; 13:2748. [PMID: 37685286 PMCID: PMC10487225 DOI: 10.3390/diagnostics13172748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 08/20/2023] [Accepted: 08/21/2023] [Indexed: 09/10/2023]
Abstract
Acute lymphoblastic leukemia (ALL) is the most common cancer in children and one of the success stories in cancer therapeutics. Risk-directed therapy based on clinical, biologic and genetic features has played a significant role in this accomplishment. Despite the observed improvement in survival rates, leukemia remains one of the leading causes of cancer-related deaths. Implementation of next-generation genomic and transcriptomic sequencing tools has illustrated the genomic landscape of ALL. However, the underlying dynamic changes at protein level still remain a challenge. Proteomics is a cutting-edge technology aimed at deciphering the mechanisms, pathways, and the degree to which the proteome impacts leukemia subtypes. Advances in mass spectrometry enable high-throughput collection of global proteomic profiles, representing an opportunity to unveil new biological markers and druggable targets. The purpose of this narrative review article is to provide a comprehensive overview of studies that have utilized applications of proteomics in an attempt to gain insight into the pathogenesis and identification of biomarkers in childhood ALL.
Affiliation(s)
- Maria Kourti
  - Third Department of Pediatrics, School of Medicine, Aristotle University and Hippokration General Hospital, 54642 Thessaloniki, Greece
- Michalis Aivaliotis
  - Laboratory of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Emmanouel Hatzipantelis
  - Children & Adolescent Hematology-Oncology Unit, Second Department of Pediatrics, School of Medicine, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
10
Boekweg H, Payne SH. Challenges and Opportunities for Single-cell Computational Proteomics. Mol Cell Proteomics 2023; 22:100518. [PMID: 36828128 PMCID: PMC10060113 DOI: 10.1016/j.mcpro.2023.100518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 02/15/2023] [Accepted: 02/17/2023] [Indexed: 02/25/2023]
Abstract
Single-cell proteomics is growing rapidly and has made several technological advancements. As most research has been focused on improving instrumentation and sample preparation methods, very little attention has been given to algorithms responsible for identifying and quantifying proteins. Given the inherent difference between bulk data and single-cell data, it is necessary to realize that current algorithms being employed on single-cell data were designed for bulk data and have underlying assumptions that may not hold true for single-cell data. In order to develop and optimize algorithms for single-cell data, we need to characterize the differences between single-cell data and bulk data and assess how current algorithms perform on single-cell data. Here, we present a review of algorithms responsible for identifying and quantifying peptides and proteins. We will give a review of how each type of algorithm works, assumptions it relies on, how it performs on single-cell data, and possible optimizations and solutions that could be used to address the differences in single-cell data.
Affiliation(s)
- Hannah Boekweg
  - Biology Department, Brigham Young University, Provo, Utah, USA
- Samuel H Payne
  - Biology Department, Brigham Young University, Provo, Utah, USA
11
Wolski WE, Nanni P, Grossmann J, d'Errico M, Schlapbach R, Panse C. prolfqua: A Comprehensive R-Package for Proteomics Differential Expression Analysis. J Proteome Res 2023; 22:1092-1104. [PMID: 36939687 PMCID: PMC10088014 DOI: 10.1021/acs.jproteome.2c00441] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 03/21/2023]
Abstract
Mass spectrometry is widely used for quantitative proteomics studies, relative protein quantification, and differential expression analysis of proteins. There is a large variety of quantification software and analysis tools. Nevertheless, there is a need for a modular, easy-to-use application programming interface in R that transparently supports a variety of well principled statistical procedures to make applying them to proteomics data, comparing and understanding their differences easy. The prolfqua package integrates essential steps of the mass spectrometry-based differential expression analysis workflow: quality control, data normalization, protein aggregation, statistical modeling, hypothesis testing, and sample size estimation. The package makes integrating new data formats easy. It can be used to model simple experimental designs with a single explanatory variable and complex experiments with multiple factors and hypothesis testing. The implemented methods allow sensitive and specific differential expression analysis. Furthermore, the package implements benchmark functionality that can help to compare data acquisition, data preprocessing, or data modeling methods using a gold standard data set. The application programmer interface of prolfqua strives to be clear, predictable, discoverable, and consistent to make proteomics data analysis application development easy and exciting. Finally, the prolfqua R-package is available on GitHub https://github.com/fgcz/prolfqua, distributed under the MIT license. It runs on all platforms supported by the R free software environment for statistical computing and graphics.
Affiliation(s)
- Witold E Wolski
  - Functional Genomics Center Zurich (FGCZ)-University of Zurich/ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland; Swiss Institute of Bioinformatics (SIB) Quartier Sorge-Batiment Amphipole, 1015 Lausanne, Switzerland
- Paolo Nanni
  - Functional Genomics Center Zurich (FGCZ)-University of Zurich/ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Jonas Grossmann
  - Functional Genomics Center Zurich (FGCZ)-University of Zurich/ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland; Swiss Institute of Bioinformatics (SIB) Quartier Sorge-Batiment Amphipole, 1015 Lausanne, Switzerland
- Maria d'Errico
  - Functional Genomics Center Zurich (FGCZ)-University of Zurich/ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland; Swiss Institute of Bioinformatics (SIB) Quartier Sorge-Batiment Amphipole, 1015 Lausanne, Switzerland
- Ralph Schlapbach
  - Functional Genomics Center Zurich (FGCZ)-University of Zurich/ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Christian Panse
  - Functional Genomics Center Zurich (FGCZ)-University of Zurich/ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland; Swiss Institute of Bioinformatics (SIB) Quartier Sorge-Batiment Amphipole, 1015 Lausanne, Switzerland
12
Fu J, Yang Q, Luo Y, Zhang S, Tang J, Zhang Y, Zhang H, Xu H, Zhu F. Label-free proteome quantification and evaluation. Brief Bioinform 2023; 24:6833644. [PMID: 36403090 DOI: 10.1093/bib/bbac477] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/13/2022] [Revised: 09/24/2022] [Accepted: 10/08/2022] [Indexed: 11/21/2022] Open
Abstract
Label-free quantification (LFQ) has emerged as an exceptional technique in proteomics owing to its broad proteome coverage, great dynamic range and enhanced analytical reproducibility. Because in-depth quantification is extremely difficult, LFQ chains incorporating a variety of transformation, pretreatment and imputation methods are required and have been constructed. However, it remains challenging to determine a well-performing chain, owing to its strong dependence on the studied data and the large number of possible integrated chains. In this study, an R package, EVALFQ, was therefore constructed to enable performance evaluation of >3000 LFQ chains. This package is unique in (a) automatically evaluating performance using multiple criteria, (b) exploring quantification accuracy based on spiked proteins and (c) discovering well-performing chains by comprehensive assessment. Because of its strengths in assessing from multiple perspectives and scanning over 3000 chains, this package is expected to attract broad interest from the field of proteomic quantification. The package is available at https://github.com/idrblab/EVALFQ.
Affiliation(s)
- Jianbo Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Qingxia Yang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Song Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jing Tang
  - Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China
- Ying Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Hongning Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Hanxiang Xu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
13
|
Välikangas T, Suomi T, Chandler CE, Scott AJ, Tran BQ, Ernst RK, Goodlett DR, Elo LL. Benchmarking tools for detecting longitudinal differential expression in proteomics data allows establishing a robust reproducibility optimization regression approach. Nat Commun 2022; 13:7877. [PMID: 36550114 PMCID: PMC9780321 DOI: 10.1038/s41467-022-35564-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 12/09/2022] [Indexed: 12/24/2022] Open
Abstract
Quantitative proteomics has matured into an established tool and longitudinal proteomics experiments have begun to emerge. However, no effective, simple-to-use differential expression method for longitudinal proteomics data has been released. Typically, such data is noisy, contains missing values, and has only few time points and biological replicates. To address this need, we provide a comprehensive evaluation of several existing differential expression methods for high-throughput longitudinal omics data and introduce a Robust longitudinal Differential Expression (RolDE) approach. The methods are evaluated using over 3000 semi-simulated spike-in proteomics datasets and three large experimental datasets. In the comparisons, RolDE performs overall best; it is most tolerant to missing values, displays good reproducibility and is the top method in ranking the results in a biologically meaningful way. Furthermore, RolDE is suitable for different types of data with typically unknown patterns in longitudinal expression and can be applied by non-experienced users.
Affiliation(s)
- Tommi Välikangas
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Tomi Suomi
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Alison J Scott
  - University of Maryland - Baltimore, Baltimore, MD, 21201, USA
- Bao Q Tran
  - US Army 20th Support Command CBRNE Analytical and Remediation Activity, Baltimore, MD, 21010-5424, USA
- Robert K Ernst
  - University of Maryland - Baltimore, Baltimore, MD, 21201, USA
- David R Goodlett
  - University of Victoria, Victoria, BC, V8P 3E6, Canada
  - International Centre for Cancer Vaccine Science, Gdansk, Poland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
  - Institute of Biomedicine, University of Turku, FI-20520, Turku, Finland
14
|
Comparative tissue proteomics reveals unique action mechanisms of vaccine adjuvants. iScience 2022; 26:105800. [PMID: 36619976 PMCID: PMC9813788 DOI: 10.1016/j.isci.2022.105800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2022] [Revised: 11/10/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022] Open
Abstract
Radiofrequency adjuvant (RFA) was recently developed to boost influenza vaccination; because of its physical nature, it avoids the safety concerns of chemical adjuvants. Yet the action mechanisms of RFA remain largely unknown. Omics techniques offer new opportunities to identify molecular mechanisms of RFA. This study utilized comparative tissue proteomics to explore the molecular mechanisms of the physical RFA. Comparison of tissue proteome changes induced by RFA and by chemical adjuvants (Alum, AddaVax, MPL, MPL/Alum) identified 14 proteins induced exclusively by RFA, among which heat shock protein (HSP) 70 was selected for further analysis due to its known immune-modulating functions. RFA showed a much weakened ability to boost ovalbumin and pandemic influenza vaccination in HSP70 knockout mice compared with wild-type mice, hinting at crucial roles of HSP70 in RFA effects. This study supports comparative tissue proteomics as an effective tool to study molecular mechanisms of vaccine adjuvants.
15
Dressler FF, Brägelmann J, Reischl M, Perner S. Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure. Mol Cell Proteomics 2022; 21:100269. [PMID: 35853575 PMCID: PMC9450154 DOI: 10.1016/j.mcpro.2022.100269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Revised: 06/16/2022] [Accepted: 07/13/2022] [Indexed: 11/17/2022] Open
Abstract
Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. Simultaneously, the increasing depth of proteomic analyses often requires the selection of subsets with a high probability of being DE to obtain meaningful results in downstream bioinformatical analyses. Based on the relationship of technical variation and (true) biological DE of an unknown share of proteins, we propose the “Normics” algorithm: Proteins are ranked based on their expression level–corrected variance and the mean correlation with all other proteins. The latter serves as a novel indicator of the non-DE likelihood of a protein in a given dataset. Subsequent normalization is based on a subset of non-DE proteins only. No a priori information such as batch, clinical, or replicate group is necessary. Simulation data demonstrated robust and superior performance across a wide range of stochastically chosen parameters. Five publicly available spike-in and biologically variant datasets were reliably and quantitatively accurately normalized by Normics with improved performance compared to standard variance stabilization as well as median, quantile, and LOESS normalizations. In complex biological datasets Normics correctly determined proteins as being DE that had been cross-validated by an independent transcriptome analysis of the same samples. In both complex datasets Normics identified the most DE proteins. We demonstrate that combining variance analysis and data-inherent correlation structure to identify non-DE proteins improves data normalization. Standard normalization algorithms can be consolidated against high shares of (one-sided) biological regulation. The statistical power of downstream analyses can be increased by focusing on Normics-selected subsets of high DE likelihood.
- Normics is a tool for the normalization of proteomic data based on existing algorithms.
- Specifically addresses data with high shares of differential expression.
- Combines variance and data-inherent correlation structure.
- Provides a ranking of differential expression likelihood.
- Enables normalization based on the most stable proteins.
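The rank-then-normalize idea described in this abstract can be illustrated with a rough sketch. This is not the Normics implementation: the plain (uncorrected) variance, the rank-sum scoring, the reading that a higher mean correlation marks a more stable protein, and the names `rank_stable_proteins` and `normalize_on_subset` are all assumptions made for illustration.

```python
from statistics import mean, median, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def rank_stable_proteins(log_intensity):
    """Order proteins from most to least likely non-DE (hypothetical scoring).

    log_intensity maps protein name -> list of log intensities, one per sample.
    Assumption: low variance and high mean correlation both suggest stability.
    """
    names = list(log_intensity)
    var = {p: pvariance(log_intensity[p]) for p in names}
    mean_cor = {
        p: mean(pearson(log_intensity[p], log_intensity[q]) for q in names if q != p)
        for p in names
    }
    by_var = sorted(names, key=lambda p: var[p])        # low variance first
    by_cor = sorted(names, key=lambda p: -mean_cor[p])  # high mean correlation first
    score = {p: by_var.index(p) + by_cor.index(p) for p in names}
    return sorted(names, key=lambda p: score[p])


def normalize_on_subset(log_intensity, reference):
    """Subtract, per sample, the median log intensity of the reference proteins."""
    n_samples = len(next(iter(log_intensity.values())))
    shift = [median(log_intensity[p][j] for p in reference) for j in range(n_samples)]
    return {p: [v - s for v, s in zip(vals, shift)] for p, vals in log_intensity.items()}


# Toy data: three stable proteins share a technical per-sample offset;
# protein "D" additionally changes in the last two samples.
offsets = [0.0, 1.0, -1.0, 0.5]
data = {
    "A": [10 + o for o in offsets],
    "B": [12 + o for o in offsets],
    "C": [14 + o for o in offsets],
    "D": [11 + o + e for o, e in zip(offsets, [0, 0, 3, 3])],
}
ranked = rank_stable_proteins(data)
normalized = normalize_on_subset(data, reference=ranked[:3])
```

Normalizing on the top-ranked subset rather than on all proteins is the point of the design: a large share of regulated proteins would otherwise pull the per-sample median.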
Affiliation(s)
- Franz F Dressler
  - Institute of Pathology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany; Institute of Pathology, University Medical Center Schleswig-Holstein, Luebeck Site, Luebeck, Germany
- Johannes Brägelmann
  - Mildred Scheel School of Oncology, University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany; Department of Translational Genomics, University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany; Center for Molecular Medicine Cologne, University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Markus Reischl
  - Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Sven Perner
  - Institute of Pathology, University Medical Center Schleswig-Holstein, Luebeck Site, Luebeck, Germany; Institute of Pathology, Research Center Borstel, Leibniz Lung Center, Borstel, Germany
16
|
Fröhlich K, Brombacher E, Fahrner M, Vogele D, Kook L, Pinter N, Bronsert P, Timme-Bronsert S, Schmidt A, Bärenfaller K, Kreutz C, Schilling O. Benchmarking of analysis strategies for data-independent acquisition proteomics using a large-scale dataset comprising inter-patient heterogeneity. Nat Commun 2022; 13:2622. [PMID: 35551187 PMCID: PMC9098472 DOI: 10.1038/s41467-022-30094-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 09/10/2021] [Accepted: 04/14/2022] [Indexed: 12/25/2022] Open
Abstract
Numerous software tools exist for data-independent acquisition (DIA) analysis of clinical samples, necessitating their comprehensive benchmarking. We present a benchmark dataset comprising real-world inter-patient heterogeneity, which we use for in-depth benchmarking of DIA data analysis workflows for clinical settings. Combining spectral libraries, DIA software, sparsity reduction, normalization, and statistical tests results in 1428 distinct data analysis workflows, which we evaluate based on their ability to correctly identify differentially abundant proteins. From our dataset, we derive bootstrap datasets of varying sample sizes and use the whole range of bootstrap datasets to robustly evaluate each workflow. We find that all DIA software suites benefit from using a gas-phase fractionated spectral library, irrespective of the library refinement used. Gas-phase fractionation-based libraries perform best against two out of three reference protein lists. Among all investigated statistical tests non-parametric permutation-based statistical tests consistently perform best. Data independent acquisition (DIA) has been gaining momentum in clinical proteomics. Here, the authors create a benchmark dataset comprising inter-patient heterogeneity to compare popular DIA data analysis workflows for identifying differentially abundant proteins.
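The bootstrap evaluation mentioned in this abstract (resampled datasets of varying sample sizes, each used to score a workflow) can be sketched generically. The abstract does not give the authors' exact resampling scheme, so the per-group sampling with replacement, the function names, and the toy scoring function below are assumptions for illustration only.

```python
import random
from statistics import mean


def bootstrap_datasets(group_a, group_b, sizes, n_boot, seed=0):
    """Yield (size, sample_a, sample_b), each drawn with replacement from its group."""
    rng = random.Random(seed)
    for size in sizes:
        for _ in range(n_boot):
            yield size, rng.choices(group_a, k=size), rng.choices(group_b, k=size)


def evaluate_workflow(score_fn, group_a, group_b, sizes, n_boot=100, seed=0):
    """Average a workflow's score over all bootstrap datasets of each sample size."""
    scores = {size: [] for size in sizes}
    for size, a, b in bootstrap_datasets(group_a, group_b, sizes, n_boot, seed):
        scores[size].append(score_fn(a, b))
    return {size: mean(vals) for size, vals in scores.items()}


# Hypothetical usage: the "workflow" is just a difference of group means
# on one protein's log intensities; a real workflow would return a
# performance measure against a reference protein list.
tumor = [20.1, 21.3, 19.8, 22.0, 20.7]
normal = [18.9, 19.5, 19.1, 18.4, 19.9]
result = evaluate_workflow(lambda a, b: mean(a) - mean(b), tumor, normal, sizes=[3, 5, 10])
```

Fixing the seed keeps the set of bootstrap datasets identical across workflows, so that score differences reflect the workflow rather than the resampling.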
Affiliation(s)
- Klemens Fröhlich
  - Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany; Faculty of Biology, University of Freiburg, Freiburg im Breisgau, Germany; Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg im Breisgau, Germany
- Eva Brombacher
  - Faculty of Biology, University of Freiburg, Freiburg im Breisgau, Germany; Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg im Breisgau, Germany; Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg im Breisgau, Germany; Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg im Breisgau, Germany
- Matthias Fahrner
  - Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany; Faculty of Biology, University of Freiburg, Freiburg im Breisgau, Germany; Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg im Breisgau, Germany
- Daniel Vogele
  - Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany; Faculty of Biology, University of Freiburg, Freiburg im Breisgau, Germany
- Lucas Kook
  - Epidemiology, Biostatistics & Prevention Institute, University of Zurich, Zurich, Switzerland; Institute for Data Analysis and Process Design, Zurich University of Applied Sciences, Winterthur, Switzerland
- Niko Pinter
  - Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany
- Peter Bronsert
  - Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany; German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany; Tumorbank Comprehensive Cancer Center Freiburg, Medical Center University of Freiburg, Freiburg im Breisgau, Germany
- Sylvia Timme-Bronsert
  - Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany; Tumorbank Comprehensive Cancer Center Freiburg, Medical Center University of Freiburg, Freiburg im Breisgau, Germany
- Alexander Schmidt
  - Proteomics Core Facility, Biozentrum, University of Basel, Basel, Switzerland
- Katja Bärenfaller
  - Swiss Institute of Allergy and Asthma Research (SIAF), University of Zurich, and Swiss Institute of Bioinformatics (SIB), Wolfgang, Switzerland
- Clemens Kreutz
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg im Breisgau, Germany; Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg im Breisgau, Germany
- Oliver Schilling
  - Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg im Breisgau, Germany; German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany; BIOSS Centre for Biological Signaling Studies, University of Freiburg, Freiburg im Breisgau, Germany
17
|
Lin MH, Wu PS, Wong TH, Lin IY, Lin J, Cox J, Yu SH. Benchmarking differential expression, imputation and quantification methods for proteomics data. Brief Bioinform 2022; 23:6566001. [PMID: 35397162 DOI: 10.1093/bib/bbac138] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/10/2022] [Revised: 03/22/2022] [Accepted: 03/25/2022] [Indexed: 11/14/2022] Open
Abstract
Data analysis is a critical part of quantitative proteomics studies in interpreting biological questions. Numerous computational tools for protein quantification, imputation and differential expression (DE) analysis were generated in the past decade and the search for optimal tools is still going on. Moreover, due to the rapid development of RNA sequencing (RNA-seq) technology, a vast number of DE analysis methods were created for that purpose. The applicability of these newly developed RNA-seq-oriented tools to proteomics data remains in doubt. In order to benchmark these analysis methods, a proteomics dataset consisting of proteins derived from humans, yeast and drosophila, in defined ratios, was generated in this study. Based on this dataset, DE analysis tools, including microarray- and RNA-seq-based ones, imputation algorithms and protein quantification methods were compared and benchmarked. Furthermore, applying these approaches to two public datasets showed that RNA-seq-based DE tools achieved higher accuracy (ACC) in identifying DEPs. This study provides useful guidelines for analyzing quantitative proteomics datasets. All the methods used in this study were integrated into the Perseus software, version 2.0.3.0, which is available at https://www.maxquant.org/perseus.
Affiliation(s)
- Miao-Hsia Lin
  - Graduate Institute and Department of Microbiology, College of Medicine, National Taiwan University, No.1 Jen Ai road section 1 Taipei 100 Taiwan
- Pei-Shan Wu
  - Genome and Systems Biology Degree Program, College of Life Science, National Taiwan University, Taipei, Taiwan
- Tzu-Hsuan Wong
  - Graduate Institute and Department of Microbiology, College of Medicine, National Taiwan University, No.1 Jen Ai road section 1 Taipei 100 Taiwan
- I-Ying Lin
  - Graduate Institute and Department of Microbiology, College of Medicine, National Taiwan University, No.1 Jen Ai road section 1 Taipei 100 Taiwan
- Johnathan Lin
  - Institute of Precision Medicine, National Sun Yat-sen University, No.70 Lien-hai Rd., Kaohsiung 80424, Taiwan
- Jürgen Cox
  - Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
- Sung-Huan Yu
  - Institute of Precision Medicine, National Sun Yat-sen University, No.70 Lien-hai Rd., Kaohsiung 80424, Taiwan
18
|
Suomi T, Elo LL. Statistical and machine learning methods to study human CD4+ T cell proteome profiles. Immunol Lett 2022; 245:8-17. [DOI: 10.1016/j.imlet.2022.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 11/05/2022]
19
Yang Y, Cheng J, Wang S, Yang H. StatsPro: Systematic integration and evaluation of statistical approaches for detecting differential expression in label-free quantitative proteomics. J Proteomics 2022; 250:104386. [PMID: 34600153 DOI: 10.1016/j.jprot.2021.104386] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/31/2021] [Revised: 09/16/2021] [Accepted: 09/21/2021] [Indexed: 02/08/2023]
Abstract
Quantitative label-free mass spectrometry (MS) is an increasingly powerful technology for profiling thousands of proteins from complex biological samples. One of the primary goals of analyses performed on such proteomics data is to detect differentially expressed proteins (DEPs) under different experimental conditions. Many statistical methods have been developed and assessed for DEP detection in various proteomics studies. However, it remains a challenge for many proteomics scientists to choose an appropriate statistical procedure. Therefore, in this study, we organized 12 common testing algorithms and 6 P-value combination methods and further provided Cohen's d effect size for every protein and three evaluation criteria to help proteomics scientists investigate their influence on DEP detection in a systematic manner. To promote the widespread use of these methods, we developed a user-friendly web tool, StatsPro, and presented two case studies involving label-free quantitative proteomics data obtained using data-dependent acquisition and data-independent acquisition to illustrate its practicability. This tool is freely available in our GitHub repository (https://github.com/YanglabWCH/StatsPro/).
SIGNIFICANCE: One of the primary goals of analyses performed on liquid chromatography-mass spectrometry (LC-MS) based proteomics data is to detect differentially expressed proteins (DEPs) under different experimental conditions. Although many methods have been proposed to detect DEPs, there is to date a scarcity of efficient, systematic, and easy-to-handle tools tailored to help proteomics scientists choose an appropriate statistical procedure. Herein, we present a new tool, StatsPro, to enable implementation and evaluation of different statistical methods for proteomics scientists. This tool has two significant advances compared to existing software: (a) It integrates up to 18 common statistical approaches (12 statistical tests and 6 P-value combination strategies) and systematically computes Cohen's d effect size for users; moreover, it provides a web-based interface that can be operated conveniently, even by users with a less extensive computational background. (b) It supports three performance evaluation criteria (number of DEPs, correlation coefficient between P-values and effect sizes, and area under the ROC curve) for users to review the final statistical results, which may guide method selection for DEP detection.
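Two of the quantities named in this abstract, Cohen's d and a combined P-value, can be shown in a short sketch. The abstract does not say which six combination methods StatsPro offers; Fisher's method is used here only as one well-known example, and the function names are hypothetical rather than taken from the tool.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d for two groups, using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def fisher_combine(p_values):
    """Combine independent P-values (each in (0, 1]) with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null; for even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    return math.exp(-half_stat) * sum(half_stat ** i / math.factorial(i) for i in range(k))


# Hypothetical usage for one protein: effect size between two conditions,
# and one combined P-value from two different tests on the same protein.
effect = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
combined = fisher_combine([0.05, 0.05])
```

Fisher's method assumes the combined P-values are independent, which P-values from different tests on the same data generally are not, so the combined value here is illustrative rather than a calibrated significance level.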
Affiliation(s)
- Yin Yang
  - Department of Clinical Research Management, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, China; Institutes for Systems Genetics and NHC Key Lab of Transplant Engineering and Immunology, West China Hospital, Sichuan University, Chengdu 610041, China
- Jingqiu Cheng
  - Institutes for Systems Genetics and NHC Key Lab of Transplant Engineering and Immunology, West China Hospital, Sichuan University, Chengdu 610041, China
- Shisheng Wang
  - Institutes for Systems Genetics and NHC Key Lab of Transplant Engineering and Immunology, West China Hospital, Sichuan University, Chengdu 610041, China; Sichuan Provincial Engineering Laboratory of Pathology in Clinical Application, West China Hospital, Sichuan University, Chengdu 610041, China
- Hao Yang
  - Institutes for Systems Genetics and NHC Key Lab of Transplant Engineering and Immunology, West China Hospital, Sichuan University, Chengdu 610041, China; Sichuan Provincial Engineering Laboratory of Pathology in Clinical Application, West China Hospital, Sichuan University, Chengdu 610041, China
20
|
Faux T, Rytkönen KT, Mahmoudian M, Paulin N, Junttila S, Laiho A, Elo LL. Differential ATAC-seq and ChIP-seq peak detection using ROTS. NAR Genom Bioinform 2021; 3:lqab059. [PMID: 34235431 PMCID: PMC8253552 DOI: 10.1093/nargab/lqab059] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/20/2021] [Revised: 05/12/2021] [Accepted: 06/11/2021] [Indexed: 12/30/2022] Open
Abstract
Changes in cellular chromatin states fine-tune transcriptional output and ultimately lead to phenotypic changes. Here we propose a novel application of our reproducibility-optimized test statistics (ROTS) to detect differential chromatin states (ATAC-seq) or differential chromatin modification states (ChIP-seq) between conditions. We compare the performance of ROTS to existing and widely used methods for ATAC-seq and ChIP-seq data using both synthetic and real datasets. Our results show that ROTS outperformed other commonly used methods when analyzing ATAC-seq data. ROTS also displayed the most accurate detection of small differences when modeling with synthetic data. We observed that two-step methods that require the use of a separate peak caller often more accurately called enrichment borders, whereas one-step methods without a separate peak calling step were more versatile in calling sub-peaks. The top ranked differential regions detected by the methods had marked correlation with transcriptional differences of the closest genes. Overall, our study provides evidence that ROTS is a useful addition to the available differential peak detection methods to study chromatin and performs especially well when applied to study differential chromatin states in ATAC-seq data.
Affiliation(s)
- Thomas Faux
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
- Kalle T Rytkönen
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
  - Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, 20014, Finland
- Mehrad Mahmoudian
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
  - Department of Future Technologies, University of Turku, FI-20014 Turku, Finland
- Niklas Paulin
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
- Sini Junttila
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
- Asta Laiho
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
  - Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, 20014, Finland
21
|
Tang J, Fu J, Wang Y, Li B, Li Y, Yang Q, Cui X, Hong J, Li X, Chen Y, Xue W, Zhu F. ANPELA: analysis and performance assessment of the label-free quantification workflow for metaproteomic studies. Brief Bioinform 2021; 21:621-636. [PMID: 30649171 PMCID: PMC7299298 DOI: 10.1093/bib/bby127] [Citation(s) in RCA: 131] [Impact Index Per Article: 43.7] [Received: 09/30/2018] [Revised: 11/19/2018] [Accepted: 12/06/2018] [Indexed: 12/13/2022] Open
Abstract
Label-free quantification (LFQ) with a specific and sequentially integrated workflow of acquisition technique, quantification tool and processing method has emerged as the popular technique employed in metaproteomic research to provide a comprehensive landscape of the adaptive response of microbes to external stimuli and their interactions with other organisms or host cells. The performance of a specific LFQ workflow is highly dependent on the studied data. Hence, it is essential to discover the most appropriate one for a specific data set. However, it is challenging to perform such discovery due to the large number of possible workflows and the multifaceted nature of the evaluation criteria. Herein, a web server ANPELA (https://idrblab.org/anpela/) was developed and validated as the first tool enabling performance assessment of whole LFQ workflow (collective assessment by five well-established criteria with distinct underlying theories), and it enabled the identification of the optimal LFQ workflow(s) by a comprehensive performance ranking. ANPELA not only automatically detects the diverse formats of data generated by all quantification tools but also provides the most complete set of processing methods among the available web servers and stand-alone tools. Systematic validation using metaproteomic benchmarks revealed ANPELA's capabilities in (1) discovering well-performing workflow(s), (2) enabling assessment from multiple perspectives and (3) validating LFQ accuracy using spiked proteins. ANPELA has a unique ability to evaluate the performance of whole LFQ workflow and enables the discovery of the optimal LFQs by the comprehensive performance ranking of all 560 workflows. Therefore, it has great potential for applications in metaproteomic and other studies requiring LFQ techniques, as many features are shared among proteomic studies.
Collapse
Affiliation(s)
- Jing Tang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Jianbo Fu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Bo Li
- School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Yinghong Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Qingxia Yang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Xuejiao Cui
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Jiajun Hong
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Xiaofeng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Yuzong Chen
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore, Singapore
| | - Weiwei Xue
- School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| |
|
22
|
Svecla M, Garrone G, Faré F, Aletti G, Norata GD, Beretta G. DDASSQ: an open-source, multiple peptide sequencing strategy for label free quantification based on an OpenMS pipeline in the KNIME analytics platform. Proteomics 2021; 21:e2000319. [PMID: 34312990 PMCID: PMC8459258 DOI: 10.1002/pmic.202000319] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/21/2020] [Revised: 07/08/2021] [Accepted: 07/12/2021] [Indexed: 11/16/2022]
Abstract
In this study we investigated the performance of a computational pipeline for protein identification and label free quantification (LFQ) of LC–MS/MS data sets from experimental animal tissue samples, as well as the impact of its specific peptide search combinatorial approach. The full pipeline workflow was composed of peptide search engine adapters based on different identification algorithms, in the frame of the open‐source OpenMS software running within the KNIME analytics platform. Two different in silico tryptic digestion, database‐search assisted approaches (X!Tandem and MS‐GF+), de novo peptide sequencing based on Novor and consensus library search (SpectraST), were tested for the processing of LC‐MS/MS raw data files obtained from proteomic LC‐MS experiments done on proteolytic extracts from mouse ex vivo liver samples. The results from proteomic LFQ were compared to those based on the application of the two software tools MaxQuant and Proteome Discoverer for protein inference and label‐free data analysis in shotgun proteomics. Data are available via ProteomeXchange with identifier PXD025097.
Affiliation(s)
- Monika Svecla
- Department of Excellence of Pharmacological and Biomolecular Sciences, University of Milan, Milan, Italy
| | - Giacomo Aletti
- Department of Environmental Science and Policy, University of Milan, Milan, Italy
| | - Giuseppe Danilo Norata
- Department of Excellence of Pharmacological and Biomolecular Sciences, University of Milan, Milan, Italy.,Centro Studio Aterosclerosi, Bassini Hospital, Cinisello Balsamo, Milan, Italy
| | - Giangiacomo Beretta
- Department of Environmental Science and Policy, University of Milan, Milan, Italy
| |
|
23
|
Egert J, Brombacher E, Warscheid B, Kreutz C. DIMA: Data-Driven Selection of an Imputation Algorithm. J Proteome Res 2021; 20:3489-3496. [PMID: 34062065 DOI: 10.1021/acs.jproteome.1c00119] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Imputation is a prominent strategy when dealing with missing values (MVs) in proteomics data analysis pipelines. However, the performance of different imputation methods is difficult to assess and varies strongly depending on data characteristics. To overcome this issue, we present the concept of a data-driven selection of an imputation algorithm (DIMA). The performance and broad applicability of DIMA are demonstrated on 142 quantitative proteomics data sets from the PRoteomics IDEntifications (PRIDE) database and on simulated data consisting of 5-50% MVs with different proportions of missing not at random and missing completely at random values. DIMA reliably suggests a high-performing imputation algorithm, which is always among the three best algorithms and results in a root mean square error difference (ΔRMSE) ≤ 10% in 80% of the cases. DIMA implementation is available in MATLAB at github.com/kreutz-lab/OmicsData and in R at github.com/kreutz-lab/DIMAR.
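The general idea of choosing an imputation method from the data itself can be illustrated with a small mask-and-score sketch: hide some values that were actually observed, let each candidate method fill them in, and keep the method with the lowest RMSE on the hidden values. This is only an illustration of that idea and not the DIMA implementation; the function names, the three single-value imputers, and the 20% masking fraction are all assumptions made for the example.

```python
import math
import random
import statistics


def rmse(truth, estimate):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


# Hypothetical candidate imputers: each maps the observed values to one fill value.
IMPUTERS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
}


def select_imputer(values, imputers=IMPUTERS, mask_fraction=0.2, seed=0):
    """Return (best_name, scores) where scores maps imputer name to RMSE.

    `values` is a list of floats with None marking missing entries.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(values) if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    # Hide a share of the observed values, but always keep at least one visible.
    n_mask = min(len(observed) - 1, max(1, int(len(observed) * mask_fraction)))
    masked = set(rng.sample(observed, n_mask))
    kept = [values[i] for i in observed if i not in masked]
    truth = [values[i] for i in sorted(masked)]
    scores = {}
    for name, impute in imputers.items():
        fill = impute(kept)  # one fill value per method in this toy setup
        scores[name] = rmse(truth, [fill] * len(truth))
    best = min(scores, key=scores.get)
    return best, scores


best, scores = select_imputer([10.0, 11.0, None, 9.5, 10.5, None, 10.2, 9.8])
print(best, scores)
```

A realistic version would mask values according to the missingness pattern seen in the data (for example, more masking at low intensities) and would compare full imputation algorithms; the sketch uses random masking and constant fills only to keep the scoring loop visible.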
Affiliation(s)
- Janine Egert
- Institute of Medical Biometry and Statistics (IMBI), Institute of Medicine and Medical Center Freiburg, 79104 Freiburg im Breisgau, Germany.,Centre for Integrative Biological Signalling Studies (CIBSS), Albert-Ludwigs-Universität Freiburg, 79104 Freiburg, Germany
| | - Eva Brombacher
- Institute of Medical Biometry and Statistics (IMBI), Institute of Medicine and Medical Center Freiburg, 79104 Freiburg im Breisgau, Germany.,Centre for Integrative Biological Signalling Studies (CIBSS), Albert-Ludwigs-Universität Freiburg, 79104 Freiburg, Germany.,Spemann Graduate School of Biology and Medicine (SGBM), Albert-Ludwigs-Universität Freiburg, 79104 Freiburg, Germany.,Faculty of Biology, Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany
| | - Bettina Warscheid
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany.,Signalling Research Centres BIOSS and CIBSS, Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany
| | - Clemens Kreutz
- Institute of Medical Biometry and Statistics (IMBI), Institute of Medicine and Medical Center Freiburg, 79104 Freiburg im Breisgau, Germany.,Signalling Research Centres BIOSS and CIBSS, Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany.,Center for Data Analysis and Modeling (FDM), Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany
| |
|
24
|
Dowell JA, Wright LJ, Armstrong EA, Denu JM. Benchmarking Quantitative Performance in Label-Free Proteomics. ACS OMEGA 2021; 6:2494-2504. [PMID: 33553868 PMCID: PMC7859943 DOI: 10.1021/acsomega.0c04030] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/20/2020] [Accepted: 01/11/2021] [Indexed: 05/07/2023]
Abstract
Previous benchmarking studies have demonstrated the importance of instrument acquisition methodology and statistical analysis on quantitative performance in label-free proteomics. However, the effects of these parameters in combination with replicate number and false discovery rate (FDR) corrections are not known. Using a benchmarking standard, we systematically evaluated the combined impact of acquisition methodology, replicate number, statistical approach, and FDR corrections. These analyses reveal a complex interaction between these parameters that greatly impacts the quantitative fidelity of protein- and peptide-level quantification. At a high replicate number (n = 8), both data-dependent acquisition (DDA) and data-independent acquisition (DIA) methodologies yield accurate protein quantification across statistical approaches. However, at a low replicate number (n = 4), only DIA in combination with linear models for microarrays (LIMMA) and reproducibility-optimized test statistic (ROTS) produced a high level of quantitative fidelity. Quantitative accuracy at low replicates is also greatly impacted by FDR corrections, with Benjamini-Hochberg and Storey corrections yielding variable true positive rates for DDA workflows. For peptide quantification, replicate number and acquisition methodology are even more critical. A higher number of replicates in combination with DIA and LIMMA produces high quantitative fidelity, while DDA performs poorly regardless of replicate number or statistical approach. These results underscore the importance of pairing instrument acquisition methodology with the appropriate replicate number and statistical approach for optimal quantification performance.
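For readers unfamiliar with the Benjamini-Hochberg correction named in this abstract, the standard step-up adjustment can be sketched in a few lines. This is the textbook procedure written out for illustration, not code from the cited study; the function name and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four proteins.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by m / rank, and the running minimum keeps the adjusted values non-decreasing in rank. With few replicates the raw p-values themselves are unstable, which is consistent with the variable true positive rates the abstract reports for DDA workflows.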
Affiliation(s)
- James A. Dowell
- Wisconsin Institute for Discovery, University of Wisconsin−Madison, 330 North Orchard Street, Madison, Wisconsin 53715, United States
| | - Logan J. Wright
- Wisconsin Institute for Discovery, University of Wisconsin−Madison, 330 North Orchard Street, Madison, Wisconsin 53715, United States
| | - Eric A. Armstrong
- Wisconsin Institute for Discovery, University of Wisconsin−Madison, 330 North Orchard Street, Madison, Wisconsin 53715, United States
| | - John M. Denu
- Wisconsin Institute for Discovery, University of Wisconsin−Madison, 330 North Orchard Street, Madison, Wisconsin 53715, United States
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, 420 Henry Mall Room 1135 Biochemistry Building, Madison, Wisconsin 53706, United States
| |
|
25
|
Barger PC, Liles MR, Beck BH, Newton JC. Differential production and secretion of potentially toxigenic extracellular proteins from hypervirulent Aeromonas hydrophila under biofilm and planktonic culture. BMC Microbiol 2021; 21:8. [PMID: 33407117 PMCID: PMC7788984 DOI: 10.1186/s12866-020-02065-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/31/2020] [Accepted: 12/07/2020] [Indexed: 12/17/2022] Open
Abstract
Background Hypervirulent Aeromonas hydrophila (vAh) is an emerging pathogen in freshwater aquaculture that results in the loss of over 3 million pounds of marketable channel catfish, Ictalurus punctatus, and channel catfish hybrids (I. punctatus, ♀ x blue catfish, I. furcatus, ♂) each year from freshwater catfish production systems in Alabama, U.S.A. vAh isolates are clonal in nature and are genetically unique from, and significantly more virulent than, traditional A. hydrophila isolates from fish. Even with the increased virulence, natural infections cannot be reproduced in aquaria challenges making it difficult to determine modes of infection and the pathophysiology behind the devastating mortalities that are commonly observed. Despite the intimate connection between environmental adaptation and plastic response, the role of environmental adaption on vAh pathogenicity and virulence has not been previously explored. In this study, secreted proteins of vAh cultured as free-living planktonic cells and within a biofilm were compared to elucidate the role of biofilm growth on virulence. Results Functional proteolytic assays found significantly increased degradative activity in biofilm secretomes; in contrast, planktonic secretomes had significantly increased hemolytic activity, suggesting higher toxigenic potential. Intramuscular injection challenges in a channel catfish model showed that in vitro degradative activity translated into in vivo tissue destruction. Identification of secreted proteins by HPLC-MS/MS revealed the presence of many putative virulence proteins under both growth conditions. Biofilm grown vAh produced higher levels of proteolytic enzymes and adhesins, whereas planktonically grown cells secreted higher levels of toxins, porins, and fimbrial proteins. Conclusions This study is the first comparison of the secreted proteomes of vAh when grown in two distinct ecological niches. 
These data on the adaptive physiological response of vAh based on growth condition increase our understanding of how environmental niche partitioning could affect vAh pathogenicity and virulence. Increased secretion of colonization factors and degradative enzymes during biofilm growth and residency may increase bacterial attachment and host invasiveness, while increased secretion of hemolysins, porins, and other potential toxins under planktonic growth (or after host invasion) could result in increased host mortality. The results of this research underscore the need to use culture methods that more closely mimic natural ecological habitat growth to improve our understanding of vAh pathogenesis. Supplementary Information The online version contains supplementary material available at 10.1186/s12866-020-02065-2.
Affiliation(s)
- Priscilla C Barger
- Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, AL, USA.,Biological Sciences, College of Sciences and Math, Auburn University, Auburn, AL, USA
| | - Mark R Liles
- Biological Sciences, College of Sciences and Math, Auburn University, Auburn, AL, USA
| | - Benjamin H Beck
- USDA ARS Aquatic Animal Health Research Unit, Auburn, AL, USA
| | - Joseph C Newton
- Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, AL, USA.
| |
|
26
|
Yadav P, Pandey VK, Shankar BS. Proteomic analysis of radio-resistant breast cancer xenografts: Increased TGF-β signaling and metabolism. Cell Biol Int 2020; 45:804-819. [PMID: 33325135 DOI: 10.1002/cbin.11525] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/18/2020] [Revised: 11/16/2020] [Accepted: 12/13/2020] [Indexed: 12/11/2022]
Abstract
Our previous studies have shown that the MCF-7 breast cancer cell line exposed to 6 Gy and allowed to recover for 7 days (D7-6G) developed radio-resistance. In this study, we have tested the ability of these cells to form tumors in severe combined immunodeficiency (SCID) mice and characterized these tumors by proteomic analyses. Untreated (MCF-C) and D7-6G cells (MCF-R) were injected s.c. in SCID mice and tumor growth was monitored. On Day 18, the mice were killed and tumor tissues were fixed in formalin or RNAlater. Expression of genes was assessed by reverse transcription-polymerase chain reaction and proteins by enzyme-linked immunosorbent assay/antibody labeling and flow cytometry. Label-free proteomic analysis was carried out by liquid chromatography-mass spectrometry. Metabolic analysis was carried out using Seahorse analyzer. MCF-R cells had a shorter latency and formed larger tumors. These tumors were characterized by an increased expression of transforming growth factor β (TGF-β) isoforms; its downstream genes pSMAD3, Snail-1, Zeb-1, HMGA2; hybrid epithelial/mesenchymal phenotype; migration, enrichment of cancer stem cells and radioresistance following challenge dose of radiation. Proteomic analysis of MCF-7R tumors resulted in identification of a total of 649 differentially expressed proteins and pathway analyses using protein annotation through evolutionary relationship indicated enrichment of genes involved in metabolism. Data are available via ProteomeXchange with identifier PXD022506. Seahorse analyzer confirmed increased metabolism in these cells with increased oxidative phosphorylation as well as glycolysis. Increased uptake of 2-NBDG further confirmed increased glycolysis. In summary, we demonstrate that radioresistant breast cancer cells had an enrichment of TGF-β signaling and increased metabolism.
Affiliation(s)
- Poonam Yadav
- Radiation Biology & Health Sciences Division, Bio-Science Group, Bhabha Atomic Research Center, Mumbai, Maharastra, India.,Department of Life Sciences, Homi Bhabha National Institute, Mumbai, Maharastra, India
| | - Vipul K Pandey
- Radiation Biology & Health Sciences Division, Bio-Science Group, Bhabha Atomic Research Center, Mumbai, Maharastra, India
| | - Bhavani S Shankar
- Radiation Biology & Health Sciences Division, Bio-Science Group, Bhabha Atomic Research Center, Mumbai, Maharastra, India.,Department of Life Sciences, Homi Bhabha National Institute, Mumbai, Maharastra, India
| |
|
27
|
Sperk M, van Domselaar R, Rodriguez JE, Mikaeloff F, Sá Vinhas B, Saccon E, Sönnerborg A, Singh K, Gupta S, Végvári Á, Neogi U. Utility of Proteomics in Emerging and Re-Emerging Infectious Diseases Caused by RNA Viruses. J Proteome Res 2020; 19:4259-4274. [PMID: 33095583 PMCID: PMC7640957 DOI: 10.1021/acs.jproteome.0c00380] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 06/01/2020] [Indexed: 12/21/2022]
Abstract
Emerging and re-emerging infectious diseases due to RNA viruses cause major negative consequences for the quality of life, public health, and overall economic development. Most of the RNA viruses causing illnesses in humans are of zoonotic origin. Zoonotic viruses can directly be transferred from animals to humans through adaptation, followed by human-to-human transmission, such as in human immunodeficiency virus (HIV), severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and, more recently, SARS coronavirus 2 (SARS-CoV-2), or they can be transferred through insects or vectors, as in the case of Crimean-Congo hemorrhagic fever virus (CCHFV), Zika virus (ZIKV), and dengue virus (DENV). At present, there are no vaccines or antiviral compounds against most of these viruses. Because proteins possess a vast array of functions in all known biological systems, proteomics-based strategies can provide important insights into the investigation of disease pathogenesis and the identification of promising antiviral drug targets during an epidemic or pandemic. Mass spectrometry technology has provided the capacity required for the precise identification and the sensitive and high-throughput analysis of proteins on a large scale and has contributed greatly to unravelling key protein-protein interactions, discovering signaling networks, and understanding disease mechanisms. In this Review, we present an account of quantitative proteomics and its application in some prominent recent examples of emerging and re-emerging RNA virus diseases like HIV-1, CCHFV, ZIKV, and DENV, with more detail with respect to coronaviruses (MERS-CoV and SARS-CoV) as well as the recent SARS-CoV-2 pandemic.
Affiliation(s)
- Maike Sperk
- Division of Clinical Microbiology, Department of Laboratory Medicine, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
| | - Robert van Domselaar
- Division of Infectious Diseases, Department of Medicine Huddinge, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
| | - Jimmy Esneider Rodriguez
- Division of Chemistry I, Department of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm 14152 Sweden
| | - Flora Mikaeloff
- Division of Clinical Microbiology, Department of Laboratory Medicine, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
| | - Beatriz Sá Vinhas
- Division of Clinical Microbiology, Department of Laboratory Medicine, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
| | - Elisa Saccon
- Division of Clinical Microbiology, Department of Laboratory Medicine, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
| | - Anders Sönnerborg
- Division of Clinical Microbiology, Department of Laboratory Medicine, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
- Division of Infectious Diseases, Department of Medicine Huddinge, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
| | - Kamal Singh
- Department of Molecular Microbiology and Immunology and the Bond Life Science Center, University of Missouri, Columbia, Missouri 65211, United States
| | - Soham Gupta
- Division of Clinical Microbiology, Department of Laboratory Medicine, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
| | - Ákos Végvári
- Division of Chemistry I, Department of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm 14152 Sweden
| | - Ujjwal Neogi
- Division of Clinical Microbiology, Department of Laboratory Medicine, Karolinska Institute, ANA Futura, Campus Flemingsberg, Stockholm 14152, Sweden
- Department of Molecular Microbiology and Immunology and the Bond Life Science Center, University of Missouri, Columbia, Missouri 65211, United States
| |
|
28
|
Kraus M, Mathew Stephen M, Schapranow MP. Eatomics: Shiny Exploration of Quantitative Proteomics Data. J Proteome Res 2020; 20:1070-1078. [PMID: 32954734 DOI: 10.1021/acs.jproteome.0c00398] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
Quantitative proteomics data are becoming increasingly available, and as a consequence are being analyzed and interpreted by a larger group of users. However, many of these users have less programming experience. Furthermore, experimental designs and setups are getting more complicated, especially when tissue biopsies are analyzed. Luckily, the proteomics community has already established some best practices on how to conduct quality control, differential abundance analysis and enrichment analysis. However, an easy-to-use application that wraps together all steps for the exploration and flexible analysis of quantitative proteomics data is not yet available. For Eatomics, we utilize the R Shiny framework to implement carefully chosen parts of established analysis workflows to (i) make them accessible in a user-friendly way, (ii) add a multitude of interactive exploration possibilities, and (iii) develop a unique experimental design setup module, which interactively translates a given research hypothesis into a differential abundance and enrichment analysis formula. In doing so, we aim to fulfill the needs of a growing group of inexperienced quantitative proteomics data analysts. Eatomics may be tested with demo data directly online via https://we.analyzegenomes.com/now/eatomics/ or with the user's own data by installation from the Github repository at https://github.com/Millchmaedchen/Eatomics.
Affiliation(s)
- Milena Kraus
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, 14482 Potsdam, Germany
| | - Mariet Mathew Stephen
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, 14482 Potsdam, Germany
| | - Matthieu-P Schapranow
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, 14482 Potsdam, Germany
| |
|
29
|
Tang J, Mou M, Wang Y, Luo Y, Zhu F. MetaFS: Performance assessment of biomarker discovery in metaproteomics. Brief Bioinform 2020; 22:5854399. [PMID: 32510556 DOI: 10.1093/bib/bbaa105] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 03/24/2020] [Revised: 04/17/2020] [Accepted: 05/05/2020] [Indexed: 12/19/2022] Open
Abstract
Metaproteomics suffers from the issues of dimensionality and sparsity. Data reduction methods can identify the relevant subset of significant differential features and reduce data redundancy. Feature selection (FS) methods are applied to obtain the significant differential subset. So far, a variety of feature selection methods have been developed for metaproteomic studies. However, because FS performance depends heavily on the data characteristics of a given study, a suitable feature selection method must be carefully chosen to obtain reproducible differential proteins. Moreover, it is critical to evaluate the performance of each FS method according to comprehensive criteria, because a single criterion is not sufficient to reflect the overall performance of an FS method. Therefore, we developed an online tool named MetaFS, which provides 13 types of FS methods and conducts a comprehensive evaluation of these FS methods using four widely accepted and independent criteria. Furthermore, the function and reliability of MetaFS were systematically tested and validated via two case studies. In sum, MetaFS could be a useful tool for discovering the FS method with the best overall performance for selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.
|
30
|
Zhu Y, Orre LM, Zhou Tran Y, Mermelekas G, Johansson HJ, Malyutina A, Anders S, Lehtiö J. DEqMS: A Method for Accurate Variance Estimation in Differential Protein Expression Analysis. Mol Cell Proteomics 2020; 19:1047-1057. [PMID: 32205417 PMCID: PMC7261819 DOI: 10.1074/mcp.tir119.001646] [Citation(s) in RCA: 99] [Impact Index Per Article: 24.8] [Received: 07/02/2019] [Revised: 03/20/2020] [Indexed: 12/19/2022] Open
Abstract
Quantitative proteomics by mass spectrometry is widely used in biomarker research and basic biology research for investigation of phenotype level cellular events. Despite the wide application, the methodology for statistical analysis of differentially expressed proteins has not been unified. Various methods such as t test, linear model and mixed effect models are used to define changes in proteomics experiments. However, none of these methods considers the specific structure of MS data. Choices between methods, often originally developed for other types of data, are based on compromises between features such as statistical power, general applicability and user friendliness. Furthermore, whether to include proteins identified with one peptide in statistical analysis of differential protein expression varies between studies. Here we present DEqMS, a robust statistical method developed specifically for differential protein expression analysis in mass spectrometry data. In all data sets investigated there is a clear dependence of variance on the number of PSMs or peptides used for protein quantification. DEqMS takes this feature into account when assessing differential protein expression. This allows for a more accurate data-dependent estimation of protein variance and inclusion of single peptide identifications without increasing false discoveries. The method was tested in several data sets including E. coli proteome spike-in data, using both label-free and TMT-labeled quantification. Compared with previous statistical methods used in quantitative proteomics, DEqMS showed consistently better accuracy in detecting altered protein levels in both label-free and labeled quantitative proteomics data. DEqMS is available as an R package in Bioconductor.
Collapse
Affiliation(s)
- Yafeng Zhu
- Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
| | - Lukas M Orre
- Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
| | - Yan Zhou Tran
- Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
| | - Georgios Mermelekas
- Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
| | - Henrik J Johansson
- Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
| | - Alina Malyutina
- Institute for Molecular Medicine, University of Helsinki, Helsinki, Finland
| | - Simon Anders
- Centre for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
| | - Janne Lehtiö
- Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden.
| |
|
31
|
SWATH-MS analysis of cerebrospinal fluid to generate a robust battery of biomarkers for Alzheimer's disease. Sci Rep 2020; 10:7423. [PMID: 32366888 PMCID: PMC7198522 DOI: 10.1038/s41598-020-64461-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/13/2019] [Accepted: 04/16/2020] [Indexed: 12/14/2022] Open
Abstract
Cerebrospinal fluid (CSF) Aβ42 and tau protein levels are established diagnostic biomarkers of Alzheimer's disease (AD). However, their inadequacy to represent clinical efficacy in drug trials indicates the need for new biomarkers. Sequential window acquisition of all theoretical fragment ion spectra (SWATH)-based mass spectrometry (MS) is an advanced proteomic tool for large-scale, high-quality quantification. In this study, SWATH-MS showed that VGF, chromogranin-A, secretogranin-1, and opioid-binding protein/cell adhesion molecule were significantly decreased in 42 AD patients compared to 39 controls, whereas 14-3-3ζ was increased (FDR < 0.05). In addition, 16 other proteins showed substantial changes (FDR < 0.2). The expressions of the top 21 analytes were closely interconnected, but were poorly correlated with CSF Aβ42, tTau, and pTau181 levels. Logistic regression analysis and data mining were used to establish the best algorithm for AD, which created novel biomarker panels with high diagnostic value (AUC = 0.889 and 0.924) and a strong correlation with clinical severity (all p < 0.001). Targeted proteomics was used to validate their usefulness in a different cohort (n = 36) that included patients with other brain disorders (all p < 0.05). This study provides a list of proteins (and combinations thereof) that could serve as new AD biomarkers.
|
32
|
A comparative proteomic study of plasma in Colombian childhood acute lymphoblastic leukemia. PLoS One 2019; 14:e0221509. [PMID: 31437251 PMCID: PMC6705836 DOI: 10.1371/journal.pone.0221509] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/09/2018] [Accepted: 08/08/2019] [Indexed: 01/24/2023] Open
Abstract
Acute lymphoblastic leukemia (ALL) is the most common childhood cancer. Owing to the incorporation of risk-adapted therapy and the arrival of new directed agents, the cure rate and survival of patients with ALL have improved dramatically, approaching 90%. In Latin American countries, the mortality rates of ALL are high; in Colombia, for example, ALL has been the most prevalent cancer among children between 0–14 years of age during the last decade. In the face of this public health problem, and given how little is known about the proteome of the child population, our investigation proposes the study of the plasma proteome of Colombian children diagnosed with B-cell ALL (B-ALL) to determine potential disease markers that could reflect processes altered by the presence of the disease or in response to it. A proteomic study by LC-MS/MS and quantification by label-free methods were performed in search of proteins differentially expressed between healthy children and those diagnosed with B-ALL. We quantified a total of 472 proteins in depleted blood plasma, and 25 of these proteins were differentially expressed (fold change >2, Bonferroni-adjusted P-values <0.05). Plasma Aggrecan core protein, alpha-2-HS-glycoprotein, coagulation factor XIII A chain and gelsolin protein were examined by ELISA assay and compared to shotgun proteomics results. Our data provide new information on the plasma proteome of Colombian children. Additionally, these proteins may also have potential as disease markers or as therapeutic targets in subsequent investigations.
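The selection rule quoted in this abstract (fold change >2 together with a Bonferroni-adjusted P-value <0.05) can be sketched as a simple filter. The sketch below is an illustration only: the record layout, the protein labels, the numbers, and the choice to treat a two-fold decrease the same as a two-fold increase are assumptions, not details taken from the study.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def differentially_expressed(records, fc_cutoff=2.0, alpha=0.05):
    """Return names of proteins passing both the fold-change and p-value cutoffs.

    Each record is a dict with hypothetical keys: 'protein', 'case_mean',
    'control_mean' (both positive intensities) and 'p' (raw p-value).
    """
    adjusted = bonferroni([r["p"] for r in records])
    hits = []
    for record, p_adj in zip(records, adjusted):
        ratio = record["case_mean"] / record["control_mean"]
        # Assumption: a change in either direction counts toward the cutoff.
        fold_change = max(ratio, 1.0 / ratio)
        if fold_change > fc_cutoff and p_adj < alpha:
            hits.append(record["protein"])
    return hits


# Made-up example values for three hypothetical proteins.
example = [
    {"protein": "P1", "case_mean": 10.0, "control_mean": 4.0, "p": 0.001},
    {"protein": "P2", "case_mean": 5.0, "control_mean": 4.0, "p": 0.001},
    {"protein": "P3", "case_mean": 2.0, "control_mean": 10.0, "p": 0.03},
]
print(differentially_expressed(example))
```

Bonferroni adjustment scales with the number of proteins tested, so with several hundred quantified proteins only very small raw p-values survive the 0.05 threshold.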
|
33
|
Tang J, Fu J, Wang Y, Luo Y, Yang Q, Li B, Tu G, Hong J, Cui X, Chen Y, Yao L, Xue W, Zhu F. Simultaneous Improvement in the Precision, Accuracy, and Robustness of Label-free Proteome Quantification by Optimizing Data Manipulation Chains. Mol Cell Proteomics 2019; 18:1683-1699. [PMID: 31097671 PMCID: PMC6682996 DOI: 10.1074/mcp.ra118.001169] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Received: 10/29/2018] [Revised: 04/28/2019] [Indexed: 12/13/2022] Open
Abstract
Label-free proteome quantification (LFQ) is a multistep workflow, collectively defined by quantification tools and subsequent data manipulation methods, that has been extensively applied in current biomedical, agricultural, and environmental studies. Despite recent advances, in-depth and high-quality quantification remains extremely challenging and requires the optimization of LFQs by comparatively evaluating their performance. However, the evaluation results using different criteria (precision, accuracy, and robustness) vary greatly, and the huge number of potential LFQs becomes one of the bottlenecks in comprehensively optimizing proteome quantification. In this study, a novel strategy was therefore proposed that enables the discovery of LFQs with simultaneously enhanced performance from thousands of workflows (integrating 18 quantification tools with 3,128 manipulation chains). First, the feasibility of achieving simultaneous improvement in the precision, accuracy, and robustness of LFQ was systematically assessed by collectively optimizing its multistep manipulation chains. Second, based on a variety of benchmark datasets acquired by various quantification measurements of different modes of acquisition, this novel strategy successfully identified a number of manipulation chains that simultaneously improved the performance across multiple criteria. Finally, to further enhance proteome quantification and discover the LFQs of optimal performance, an online tool (https://idrblab.org/anpela/) enabling collective performance assessment (from multiple perspectives) of the entire LFQ workflow was developed. This study confirmed the feasibility of achieving simultaneous improvement in precision, accuracy, and robustness. The novel strategy proposed and validated in this study, together with the online tool, might provide useful guidance for research fields requiring the mass-spectrometry-based LFQ technique.
Affiliation(s)
- Jing Tang
  - ‡College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; §School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China; ¶Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China
- Jianbo Fu
  - ‡College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
  - ‡College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - ‡College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Qingxia Yang
  - ‡College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; §School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Bo Li
  - §School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Gao Tu
  - ‡College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; §School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Jiajun Hong
  - ‡College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xuejiao Cui
  - §School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Yuzong Chen
  - ‖Department of Pharmacy, National University of Singapore, Singapore 117543, Singapore
- Lixia Yao
  - **Department of Health Sciences Research, Mayo Clinic, Rochester MN 55905, United States
- Weiwei Xue
  - §School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Feng Zhu
  - ‡College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; §School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
34
Liu HM, Yang D, Liu ZF, Hu SZ, Yan SH, He XW. Density distribution of gene expression profiles and evaluation of using maximal information coefficient to identify differentially expressed genes. PLoS One 2019; 14:e0219551. [PMID: 31314810 PMCID: PMC6636747 DOI: 10.1371/journal.pone.0219551] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/11/2018] [Accepted: 06/26/2019] [Indexed: 12/12/2022] Open
Abstract
Assumptions about the probability density distribution of the data strongly influence the design of a new statistical method. Based on the analysis of a group of real gene expression profiles, this study reveals that the primary density distributions of the real profiles are normal/log-normal and t distributions, accounting for 80% and 19%, respectively. According to these distributions, we generated a series of simulation data to assess more comprehensively a novel statistical method, the maximal information coefficient (MIC). The results show that MIC is not only in the top tier in overall performance at identifying differentially expressed genes, but also exhibits better adaptability and excellent noise immunity in comparison with existing methods.
Affiliation(s)
- Han-Ming Liu
  - School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, China
- Dan Yang
  - School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, China
- Zhao-Fa Liu
  - School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, China
- Sheng-Zhou Hu
  - School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, China
- Shen-Hai Yan
  - School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, China
- Xian-Wen He
  - School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, China
35
Barranger A, Langan LM, Sharma V, Rance GA, Aminot Y, Weston NJ, Akcha F, Moore MN, Arlt VM, Khlobystov AN, Readman JW, Jha AN. Antagonistic Interactions between Benzo[a]pyrene and Fullerene (C60) in Toxicological Response of Marine Mussels. Nanomaterials (Basel, Switzerland) 2019; 9:E987. [PMID: 31288459 PMCID: PMC6669530 DOI: 10.3390/nano9070987] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/15/2019] [Revised: 06/25/2019] [Accepted: 06/28/2019] [Indexed: 12/12/2022]
Abstract
This study aimed to assess the ecotoxicological effects of the interaction of fullerene (C60) and benzo[a]pyrene (B[a]P) on the marine mussel, Mytilus galloprovincialis. The uptake of nC60, B[a]P and mixtures of nC60 and B[a]P into tissues was confirmed by Gas Chromatography-Mass Spectrometry (GC-MS), Liquid Chromatography-High Resolution Mass Spectrometry (LC-HRMS) and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Biomarkers of DNA damage as well as proteomics analysis were applied to unravel the interactive effect of B[a]P and C60. Antagonistic responses were observed at the genotoxic and proteomic levels. Differentially expressed proteins (DEPs) were only identified in the B[a]P single exposure and the B[a]P mixture exposure groups containing 1 mg/L of C60, the majority of which were downregulated (~52%). No DEPs were identified at any of the concentrations of nC60 (p < 0.05, 1% FDR). Using DEPs identified at a threshold of p < 0.05 (B[a]P and B[a]P mixture with nC60), gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicated that these proteins were enriched in a broad spectrum of biological processes and pathways, including those broadly associated with protein processing, cellular processes and environmental information processing. Among the significantly enriched pathways, the ribosome was consistently the top enriched term irrespective of treatment or concentration; it plays an important role as the site of biological protein synthesis and translation. Our results demonstrate the complex multi-modal response to environmental stressors in M. galloprovincialis.
Affiliation(s)
- Audrey Barranger
  - School of Biological and Marine Sciences, University of Plymouth, Plymouth PL4 8AA, UK
- Laura M Langan
  - School of Biological and Marine Sciences, University of Plymouth, Plymouth PL4 8AA, UK
- Vikram Sharma
  - School of Biomedical Sciences, University of Plymouth, Plymouth PL4 8AA, UK
- Graham A Rance
  - School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK
  - Nanoscale and Microscale Research Centre, University of Nottingham, University Park, Nottingham NG7 2RD, UK
- Yann Aminot
  - Centre for Chemical Sciences, University of Plymouth, Plymouth PL4 8AA, UK
- Nicola J Weston
  - Nanoscale and Microscale Research Centre, University of Nottingham, University Park, Nottingham NG7 2RD, UK
- Farida Akcha
  - Ifremer, Laboratory of Ecotoxicology, F-44311, CEDEX 03 Nantes, France
- Michael N Moore
  - School of Biological and Marine Sciences, University of Plymouth, Plymouth PL4 8AA, UK
  - Plymouth Marine Laboratory, Prospect Place, The Hoe, Plymouth PL1 3HD, UK
  - European Centre for Environment & Human Health (ECEHH), University of Exeter Medical School, Knowledge Spa, Royal Cornwall Hospital, Cornwall TR1 3LJ, UK
- Volker M Arlt
  - Department of Analytical, Environmental and Forensic Sciences, King's College London, MRC-PHE Centre for Environmental & Health, London SE1 9NH, UK
  - NIHR Health Protection Research Unit in Health Impact of Environmental Hazards at King's College London in partnership with Public Health England and Imperial College London, London SE1 9NH, UK
- Andrei N Khlobystov
  - School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK
  - Nanoscale and Microscale Research Centre, University of Nottingham, University Park, Nottingham NG7 2RD, UK
- James W Readman
  - Centre for Chemical Sciences, University of Plymouth, Plymouth PL4 8AA, UK
- Awadhesh N Jha
  - School of Biological and Marine Sciences, University of Plymouth, Plymouth PL4 8AA, UK
36
Välikangas T, Suomi T, Elo LL. A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation. Brief Bioinform 2019; 19:1344-1355. [PMID: 28575146 PMCID: PMC6291797 DOI: 10.1093/bib/bbx054] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 03/03/2017] [Indexed: 01/15/2023] Open
Abstract
Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software packages exist to process the raw MS data into quantified protein abundances, including open source and commercial solutions. Each package includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the number of missing values produced by the different proteomics software and the capabilities of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing included the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change, and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that the local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of both data filtering and local least squares imputation increased performance the most in the tested data sets.
Affiliation(s)
- Tommi Välikangas
  - Computational Biomedicine Group, Turku Centre for Biotechnology Finland
- Tomi Suomi
  - Computational Biomedicine research group at the Turku Centre for Biotechnology Finland
- Laura L Elo
  - Biomathematics, Research Director in Bioinformatics and Group Leader in Computational Biomedicine at Turku Centre for Biotechnology, University of Turku, Finland
37
Tang J, Wang Y, Fu J, Zhou Y, Luo Y, Zhang Y, Li B, Yang Q, Xue W, Lou Y, Qiu Y, Zhu F. A critical assessment of the feature selection methods used for biomarker discovery in current metaproteomics studies. Brief Bioinform 2019; 21:1378-1390. [DOI: 10.1093/bib/bbz061] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 03/27/2019] [Revised: 04/14/2019] [Indexed: 02/06/2023] Open
Abstract
Microbial communities (MCs) have a great impact on mediating complex disease indications, biogeochemical cycling and agricultural productivities, which makes metaproteomics a powerful technique for quantifying the diverse and dynamic composition of proteins or peptides. The key role of biostatistical strategies in MC studies is reported to be underestimated; in particular, the appropriate application of feature selection methods (FSMs) is largely ignored. Although extensive efforts have been devoted to assessing the performance of FSMs, previous studies focused only on their classification accuracy without considering their ability to correctly and comprehensively identify the spiked proteins. In this study, the performances of 14 FSMs were comprehensively assessed based on two key criteria (both sample classification and spiked protein discovery) using a variety of metaproteomics benchmarks. First, the classification accuracies of those 14 FSMs were evaluated. Then, their abilities in identifying the proteins of different spiked concentrations were assessed. Finally, seven FSMs (FC, LMEB, OPLS-DA, PLS-DA, SAM, SVM-RFE and T-Test) were identified as performing consistently superior or good under both criteria, with PLS-DA performing consistently superior. In summary, this study served as a comprehensive analysis of the performances of current FSMs and could provide a valuable guideline for researchers in metaproteomics.
Affiliation(s)
- Jing Tang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Department of Bioinformatics, Chongqing Medical University, Chongqing, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jianbo Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ying Zhou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ying Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Bo Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Qingxia Yang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Yan Lou
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Yunqing Qiu
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
38
Yang D, Liu H. Maximal information coefficient applied to differentially expressed genes identification: A feasibility study. Technol Health Care 2019; 27:249-262. [PMID: 31045544 PMCID: PMC6597975 DOI: 10.3233/thc-199024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
BACKGROUND: The main obstacle encountered in microarray technology is how to mine the valuable information in the profiles and study gene function. OBJECTIVE: Maximal information coefficient (MIC) is a novel, non-parametric statistic that has been successfully applied to genome-wide association studies and differential gene and miRNA expression analysis. However, the data used in these applications are not gold standard but real data. METHODS: Therefore, this study attempts to test the feasibility of MIC for differentially expressed gene identification with simulation data. RESULTS: Our experiments indicate that MIC always performs better than Limma and is at almost the same level as SAM, ROTS or DESeq2. However, the count of AUC < 0.5 for MIC is significantly smaller than for those three methods, and MIC does not exhibit an abnormal phenomenon in which the AUC increases as the noise increases. CONCLUSIONS: Compared to the existing methods, our experiments show that MIC is not only in the first tier in identifying differentially expressed genes and in noise immunity, but also shows better robustness and stronger data/environment adaptability.
Affiliation(s)
- Hanming Liu
  - Corresponding author: Hanming Liu, School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, Jiangxi 341000, China
39
Jun J, Gim J, Kim Y, Kim H, Yu SJ, Yeo I, Park J, Yoo JJ, Cho YY, Lee DH, Cho EJ, Lee JH, Kim YJ, Lee S, Yoon JH, Kim Y, Park T. Analysis of significant protein abundance from multiple reaction-monitoring data. BMC Systems Biology 2018; 12:123. [PMID: 30598095 PMCID: PMC6311902 DOI: 10.1186/s12918-018-0656-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Background: Discovering reliable protein biomarkers is one of the most important issues in biomedical research. ELISA is a traditional technique for accurate quantitation of well-known proteins. Recently, multiple reaction-monitoring (MRM) mass spectrometry has been proposed for quantifying newly discovered proteins and has become a popular alternative to ELISA. Linear mixed modeling (LMM) has been used to analyze MRM data, and MSstats is one of the most widely used LMM-based tools for this purpose. However, LMMs often provide varying significance results, depending on model specification, and it can be difficult to specify a correct LMM for the analysis of MRM data. Here, we propose a new logistic regression-based method for Significance Analysis of Multiple Reaction Monitoring (LR-SAM). Results: Through simulation studies, we demonstrate that LMM methods may not preserve type I error, thus yielding high false-positive errors, depending on how random effects are specified. Our simulation study also shows that the LR-SAM approach performs as well as LMM approaches in most cases. However, LR-SAM performs better than the LMMs, particularly when the effect sizes of peptides from the same protein are heterogeneous. Our proposed method was applied to MRM data for identification of proteins associated with clinical responses of treatment of 115 hepatocellular carcinoma (HCC) patients with the tyrosine kinase inhibitor sorafenib. Of 124 candidate proteins, LMM approaches provided 6 results varying in significance, while LR-SAM, by contrast, yielded 18 significant results that were reproducibly consistent. Conclusion: As exemplified by an application to the HCC data set, LR-SAM more effectively identified proteins associated with clinical responses of treatment than LMM did.
Affiliation(s)
- Jongsu Jun
  - Department of Statistics, Seoul National University, Seoul, South Korea
- Jungsoo Gim
  - Graduate School of Public Health, Seoul National University, Seoul, South Korea
- Yongkang Kim
  - Department of Statistics, Seoul National University, Seoul, South Korea
- Hyunsoo Kim
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, South Korea
  - Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University College of Medicine, Seoul, South Korea
- Su Jong Yu
  - Department of Internal Medicine and Liver Research Institute, Seoul National University, Seoul, South Korea
- Injun Yeo
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, South Korea
- Jiyoung Park
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, South Korea
- Jeong-Ju Yoo
  - Department of Internal Medicine and Liver Research Institute, Seoul National University, Seoul, South Korea
- Young Youn Cho
  - Department of Internal Medicine and Liver Research Institute, Seoul National University, Seoul, South Korea
- Dong Hyeon Lee
  - Department of Internal Medicine and Liver Research Institute, Seoul National University, Seoul, South Korea
- Eun Ju Cho
  - Department of Internal Medicine and Liver Research Institute, Seoul National University, Seoul, South Korea
- Jeong-Hoon Lee
  - Department of Internal Medicine and Liver Research Institute, Seoul National University, Seoul, South Korea
- Yoon Jun Kim
  - Department of Internal Medicine and Liver Research Institute, Seoul National University, Seoul, South Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Seoul, South Korea
- Jung-Hwan Yoon
  - Department of Internal Medicine and Liver Research Institute, Seoul National University, Seoul, South Korea
- Youngsoo Kim
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, South Korea
  - Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University College of Medicine, Seoul, South Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, South Korea
  - Interdisciplinary program in Bioinformatics, Seoul National University, Seoul, South Korea
40
Välikangas T, Suomi T, Elo LL. A systematic evaluation of normalization methods in quantitative label-free proteomics. Brief Bioinform 2018; 19:1-11. [PMID: 27694351 PMCID: PMC5862339 DOI: 10.1093/bib/bbw095] [Citation(s) in RCA: 138] [Impact Index Per Article: 23.0] [Received: 06/09/2016] [Indexed: 12/25/2022] Open
Abstract
To date, mass spectrometry (MS) data remain inherently biased for reasons ranging from sample handling to differences caused by the instrumentation. Normalization is the process that aims to account for the bias and make samples more comparable. The selection of a proper normalization method is a pivotal task for the reliability of the downstream analysis and results. Many normalization methods commonly used in proteomics have been adapted from DNA microarray techniques. Previous studies comparing normalization methods in proteomics have focused mainly on intragroup variation. In this study, several popular and widely used normalization methods representing different normalization strategies are evaluated using three spike-in and one experimental mouse label-free proteomic data sets. The normalization methods are evaluated in terms of their ability to reduce variation between technical replicates, their effect on differential expression analysis and their effect on the estimation of logarithmic fold changes. Additionally, we examined whether normalizing the whole data globally or in segments for the differential expression analysis has an effect on the performance of the normalization methods. We found that variance stabilization normalization (Vsn) reduced variation the most between technical replicates in all examined data sets. Vsn also performed consistently well in the differential expression analysis. Linear regression normalization and local regression normalization also performed systematically well. Finally, we discuss the choice of a normalization method and some qualities of a suitable normalization method in the light of the results of our evaluation.
Affiliation(s)
- Tommi Välikangas
  - Computational Biomedicine Group at the Turku Centre for Biotechnology Finland
- Tomi Suomi
  - Computational Biomedicine research group at the Turku Centre for Biotechnology Finland
- Laura L Elo
  - Computational Biomedicine at Turku Centre for Biotechnology, University of Turku, Finland
  - Corresponding author. Laura L. Elo, Turku Centre for Biotechnology, FI-20520 Turku, Finland. Tel.: +358-2-333-8009; Fax: +358-2-251 8808
41
Jaakkola MK, Seyednasrollah F, Mehmood A, Elo LL. Comparison of methods to detect differentially expressed genes between single-cell populations. Brief Bioinform 2017; 18:735-743. [PMID: 27373736 PMCID: PMC5862313 DOI: 10.1093/bib/bbw057] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 02/12/2016] [Indexed: 01/20/2023] Open
Abstract
We compared five statistical methods to detect differentially expressed genes between two distinct single-cell populations. Currently, it remains unclear whether differential expression methods developed originally for conventional bulk RNA-seq data can also be applied to single-cell RNA-seq data analysis. Our results in three diverse comparison settings showed marked differences between the different methods in terms of the number of detections as well as their sensitivity and specificity. They, however, did not reveal systematic benefits of the currently available single-cell-specific methods. Instead, our previously introduced reproducibility-optimization method showed good performance in all comparison settings without any single-cell-specific modifications.
Affiliation(s)
- Maria K Jaakkola
  - Turku Centre of Biotechnology, University of Turku, Tykistökatu 6, Turku, Finland
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
  - Corresponding author: Maria K. Jaakkola, Turku Centre for Biotechnology, and Department of Mathematics and Statistics, University of Turku, Turku FI-20014, Finland. Tel.: +358-2-333-8566; Fax: +358-2-231-0311
- Fatemeh Seyednasrollah
  - Turku Centre of Biotechnology, University of Turku, Tykistökatu 6, Turku, Finland
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Arfa Mehmood
  - Turku Centre of Biotechnology, University of Turku, Tykistökatu 6, Turku, Finland
- Laura L Elo
  - Turku Centre of Biotechnology, University of Turku, Tykistökatu 6, Turku, Finland
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
42
Enhanced differential expression statistics for data-independent acquisition proteomics. Sci Rep 2017; 7:5869. [PMID: 28724900 PMCID: PMC5517573 DOI: 10.1038/s41598-017-05949-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/16/2016] [Accepted: 06/07/2017] [Indexed: 01/28/2023] Open
Abstract
We describe a new reproducibility-optimization method, ROPECA, for statistical analysis of proteomics data, with a specific focus on the emerging data-independent acquisition (DIA) mass spectrometry technology. ROPECA optimizes the reproducibility of statistical testing at the peptide level and aggregates the peptide-level changes to determine differential protein-level expression. Using 'gold standard' spike-in data and hybrid proteome benchmark data, we show the competitive performance of ROPECA over conventional protein-based analysis as well as state-of-the-art peptide-based tools, especially in DIA data with consistent peptide measurements. Furthermore, we demonstrate the improved accuracy of our method in clinical studies using proteomics data from a longitudinal human twin study.
43
Wang J, Li L, Chen T, Ma J, Zhu Y, Zhuang J, Chang C. In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values. Sci Rep 2017; 7:3367. [PMID: 28611393 PMCID: PMC5469784 DOI: 10.1038/s41598-017-03650-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 02/07/2017] [Accepted: 05/02/2017] [Indexed: 11/09/2022] Open
Abstract
Considered one of the major goals in quantitative proteomics, detection of differentially expressed proteins (DEPs) plays an important role in biomarker selection and clinical diagnostics. Many algorithms and tools focus on DEP detection in proteomics research. However, because these methods differ in application scope and experimental designs vary, the best choice for large-scale proteomics data analyses is not apparent. Moreover, given that proteomics data usually contain a high percentage of missing values (MVs) but few replicates, a systematic evaluation of the DEP detection methods combined with the MV imputation methods is essential and urgent. Here, we analyzed a total of four representative imputation methods and five DEP methods on different experimental and simulated datasets. The results showed that (i) MV imputation could not always improve the performance of DEP detection methods, and the imputation effects differed with the missing value percentage; (ii) the DEP detection methods had different statistical powers affected by the percentage of MVs. Two statistical methods (i.e. the empirical Bayesian random censoring threshold model, and the significance analysis of microarray) performed better than the other evaluated methods in terms of accuracy and sensitivity.
Collapse
Affiliation(s)
- Jinxia Wang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), National Engineering Research Center for Protein Drugs, Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China; Department of Mathematics, Dalian Maritime University, Dalian, 116026, P.R. China; Drug Research and Development Center, Shandong Drug and Food Vocational College, Weihai, 264210, P.R. China
- Liwei Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), National Engineering Research Center for Protein Drugs, Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Tao Chen
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), National Engineering Research Center for Protein Drugs, Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Jie Ma
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), National Engineering Research Center for Protein Drugs, Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), National Engineering Research Center for Protein Drugs, Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Jujuan Zhuang
- Department of Mathematics, Dalian Maritime University, Dalian, 116026, P.R. China
- Cheng Chang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), National Engineering Research Center for Protein Drugs, Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
44
Strbenac D, Zhong L, Raftery MJ, Wang P, Wilson SR, Armstrong NJ, Yang JYH. Quantitative Performance Evaluator for Proteomics (QPEP): Web-based Application for Reproducible Evaluation of Proteomics Preprocessing Methods. J Proteome Res 2017; 16:2359-2369. [DOI: 10.1021/acs.jproteome.6b00882] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/15/2023]
Affiliation(s)
- Dario Strbenac
- School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia
- Ling Zhong
- Bioanalytical Mass Spectrometry Facility, University of New South Wales, Sydney, New South Wales 2052, Australia
- Mark J. Raftery
- Bioanalytical Mass Spectrometry Facility, University of New South Wales, Sydney, New South Wales 2052, Australia
- Penghao Wang
- School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia
- Susan R. Wilson
- School of Mathematics & Statistics, University of New South Wales, Sydney, New South Wales 2052, Australia
- Centre for Mathematics and its Applications, Mathematical Sciences Institute, Australian National University, Canberra, Australian Capital Territory 0200, Australia
- Nicola J. Armstrong
- School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia
- Jean Y. H. Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia
45
Suomi T, Seyednasrollah F, Jaakkola MK, Faux T, Elo LL. ROTS: An R package for reproducibility-optimized statistical testing. PLoS Comput Biol 2017; 13:e1005562. [PMID: 28542205 PMCID: PMC5470739 DOI: 10.1371/journal.pcbi.1005562] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 12/07/2016] [Revised: 06/14/2017] [Accepted: 05/10/2017] [Indexed: 12/21/2022] Open
Abstract
Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies, involving proteomics and RNA-seq data from public repositories, including both bulk and single cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).
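As a rough illustration of what "a modified t-statistic" can look like, the sketch below computes a statistic of the form d = |mean(x) - mean(y)| / (a1 + a2 * s), where s is the pooled standard error. The abstract does not state this formula, so the form and the function and parameter names (`modified_t_statistic`, `rank_features`, `a1`, `a2`) are assumptions made here for illustration, and the code is written in Python rather than R. The data-driven step that selects `a1` and `a2` by optimizing reproducibility is not shown; both parameters are simply fixed by the caller.

```python
import numpy as np


def modified_t_statistic(x, y, a1=0.0, a2=1.0):
    """Per-feature statistic |mean(x) - mean(y)| / (a1 + a2 * s).

    x, y: arrays of shape (n_features, n_samples) for the two groups.
    a1, a2: fixed, hypothetical parameters. With a1=0, a2=1 this is the
    absolute two-sample t-statistic; with a1=1, a2=0 it is the absolute
    difference of group means.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[1], y.shape[1]

    # Absolute difference of group means for each feature (row).
    diff = np.abs(x.mean(axis=1) - y.mean(axis=1))

    # Pooled variance and the resulting standard error of the difference.
    pooled_var = (
        (n1 - 1) * x.var(axis=1, ddof=1) + (n2 - 1) * y.var(axis=1, ddof=1)
    ) / (n1 + n2 - 2)
    s = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

    return diff / (a1 + a2 * s)


def rank_features(x, y, a1=0.0, a2=1.0):
    """Return feature indices ordered from strongest to weakest evidence."""
    d = modified_t_statistic(x, y, a1=a1, a2=a2)
    return np.argsort(-d)
```

Calling `rank_features(x, y)` on two groups-by-replicates matrices returns row indices sorted by the statistic, which mirrors the feature ranking the abstract describes; any resemblance to the package's actual interface is not implied.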
Affiliation(s)
- Tomi Suomi
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
- Department of Future Technologies, University of Turku, Turku, Finland
- Fatemeh Seyednasrollah
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Maria K. Jaakkola
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Thomas Faux
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
- Laura L. Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
46
van Ooijen MP, Jong VL, Eijkemans MJC, Heck AJR, Andeweg AC, Binai NA, van den Ham HJ. Identification of differentially expressed peptides in high-throughput proteomics data. Brief Bioinform 2017; 19:971-981. [DOI: 10.1093/bib/bbx031] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/06/2016] [Indexed: 12/25/2022] Open
Affiliation(s)
- Victor L Jong
- Department of Biostatistics and Research Support, Julius Center, UMC Utrecht, Netherlands
- Marinus J C Eijkemans
- Julius Center for Health Sciences and Primary Care of the University Medical Center Utrecht, Netherlands
- Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics, Utrecht University, Netherlands
- Arno C Andeweg
- Department of Viroscience, Erasmus MC, CA Rotterdam, Netherlands
- Nadine A Binai
- Biomolecular Mass Spectrometry Group, Utrecht University, Netherlands
47
Waardenberg AJ. Statistical Analysis of ATM-Dependent Signaling in Quantitative Mass Spectrometry Phosphoproteomics. Methods Mol Biol 2017; 1599:229-244. [PMID: 28477123 DOI: 10.1007/978-1-4939-6955-5_17] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
Ataxia-telangiectasia mutated (ATM) is a serine/threonine protein kinase which, when perturbed, is associated with modified protein signaling that ultimately leads to a range of neurological and DNA repair defects. Recent advances in phospho-proteomics coupled with high-resolution mass-spectrometry provide new opportunities to dissect the signaling pathways that ATM utilizes under a number of conditions. This chapter begins by providing a brief overview of ATM function and its various regulatory roles, and then leads into a workflow focused on the use of the statistical programming language R, together with code, for the identification of ATM-dependent substrates in the cytoplasm. This chapter cannot cover statistical properties in depth nor the range of possible methods in great detail, but instead aims to equip researchers with a set of tools to perform analysis between two conditions through examples with R functions.
Affiliation(s)
- Ashley J Waardenberg
- Children's Medical Research Institute, University of Sydney, 214 Hawkesbury Road, Westmead, NSW, 2145, Australia.
48
Blein-Nicolas M, Zivy M. Thousand and one ways to quantify and compare protein abundances in label-free bottom-up proteomics. Biochimica et Biophysica Acta - Proteins and Proteomics 2016; 1864:883-95. [PMID: 26947242 DOI: 10.1016/j.bbapap.2016.02.019] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 11/05/2015] [Revised: 01/21/2016] [Accepted: 02/24/2016] [Indexed: 11/18/2022]
Abstract
How to process and analyze MS data to quantify and statistically compare protein abundances in bottom-up proteomics has been an open debate for nearly fifteen years. Two main approaches are generally used: the first is based on spectral data generated during the process of identification (e.g. peptide counting, spectral counting), while the second makes use of extracted ion currents to quantify chromatographic peaks and infer protein abundances based on peptide quantification. These two approaches actually refer to multiple methods which have been developed during the last decade, but were submitted to deep evaluations only recently. In this paper, we compiled these different methods as exhaustively as possible. We also summarized the way they address the different problems raised by bottom-up protein quantification such as normalization, the presence of shared peptides, unequal peptide measurability and missing data. This article is part of a Special Issue entitled: Plant Proteomics--a bridge between fundamental processes and crop production, edited by Dr. Hans-Peter Mock.
Affiliation(s)
- Mélisande Blein-Nicolas
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, F-91190 Gif-sur-Yvette, France
- Michel Zivy
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, F-91190 Gif-sur-Yvette, France.
49
Vehmas AP, Adam M, Laajala TD, Kastenmüller G, Prehn C, Rozman J, Ohlsson C, Fuchs H, Hrabě de Angelis M, Gailus-Durner V, Elo LL, Aittokallio T, Adamski J, Corthals G, Poutanen M, Strauss L. Liver lipid metabolism is altered by increased circulating estrogen to androgen ratio in male mouse. J Proteomics 2015; 133:66-75. [PMID: 26691839 DOI: 10.1016/j.jprot.2015.12.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/09/2015] [Revised: 10/26/2015] [Accepted: 12/05/2015] [Indexed: 02/05/2023]
Abstract
Estrogens are suggested to lower the risk of developing metabolic syndrome in both sexes. In this study, we investigated how an increased circulating estrogen-to-androgen ratio (E/A) alters liver lipid metabolism in males. The cytochrome P450 aromatase (P450arom) is an enzyme that converts androgens to estrogens. Male mice overexpressing the human aromatase enzyme (AROM+ mice), which therefore have a high circulating E/A, were used as a model in this study. Proteomics and gene expression analyses indicated an increase in peroxisomal β-oxidation in the liver of AROM+ mice compared with their wild-type littermates. Correspondingly, metabolomic analysis revealed a decrease in the amount of phosphatidylcholines with long-chain fatty acids in the plasma. Notably, the expression of the Cyp4a12a enzyme, which specifically metabolizes arachidonic acid (AA) to 20-hydroxy AA, was dramatically decreased in the AROM+ liver. As a consequence, increased amounts of phospholipids having AA as a fatty acid tail were detected in the plasma of the AROM+ mice. Overall, these observations demonstrate that a high circulating E/A in males is linked to indicators of higher peroxisomal β-oxidation and lower AA metabolism in the liver. Furthermore, the plasma phospholipid profile reflects the changes in liver lipid metabolism.
Affiliation(s)
- Anni P Vehmas
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland; Department of Physiology, Institute of Biomedicine, University of Turku, Turku, Finland
- Marion Adam
- Department of Physiology, Institute of Biomedicine, University of Turku, Turku, Finland; Turku Center for Disease Modeling, University of Turku, Turku, Finland
- Teemu D Laajala
- Turku Center for Disease Modeling, University of Turku, Turku, Finland; Department of Mathematics and Statistics, University of Turku, Turku, Finland; Drug Research Doctoral Programme, University of Turku, Finland; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland
- Gabi Kastenmüller
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Cornelia Prehn
- Genome Analysis Center, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Jan Rozman
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Neuherberg, Germany; Molecular Nutritional Medicine, Else Kröner-Fresenius Center, Technische Universität München, Freising-Weihenstephan, Germany
- Claes Ohlsson
- Centre for Bone and Arthritis Research, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
- Helmut Fuchs
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Martin Hrabě de Angelis
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Neuherberg, Germany; Experimental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Neuherberg, Germany
- Valérie Gailus-Durner
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Laura L Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland; Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Tero Aittokallio
- Department of Mathematics and Statistics, University of Turku, Turku, Finland; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland
- Jerzy Adamski
- Genome Analysis Center, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Neuherberg, Germany; Experimental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Neuherberg, Germany
- Garry Corthals
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland; Van 't Hoff Institute for Molecular Sciences, University of Amsterdam, The Netherlands
- Matti Poutanen
- Department of Physiology, Institute of Biomedicine, University of Turku, Turku, Finland; Turku Center for Disease Modeling, University of Turku, Turku, Finland; Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Sweden
- Leena Strauss
- Department of Physiology, Institute of Biomedicine, University of Turku, Turku, Finland; Turku Center for Disease Modeling, University of Turku, Turku, Finland