1
Ramsbottom KA, Prakash A, Perez-Riverol Y, Camacho OM, Sun Z, Kundu DJ, Bowler-Barnett E, Martin M, Fan J, Chebotarov D, McNally KL, Deutsch EW, Vizcaíno JA, Jones AR. Meta-Analysis of Rice Phosphoproteomics Data to Understand Variation in Cell Signaling Across the Rice Pan-Genome. J Proteome Res 2024. [PMID: 38810119 DOI: 10.1021/acs.jproteome.4c00187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,565 phosphosites on serine, threonine, and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety, and clustered the data to identify groups of sites with similar patterns across rice family groups. The data has been loaded into UniProt Knowledge-Base (enabling researchers to visualize sites alongside other data on rice proteins, e.g., structural models from AlphaFold2), PeptideAtlas, and the PRIDE database (enabling visualization of source evidence, including scores and supporting mass spectra).
Affiliation(s)
- Kerry A Ramsbottom
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, United Kingdom
- Ananth Prakash
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Oscar Martin Camacho
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, United Kingdom
- Zhi Sun
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Deepti J Kundu
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Emily Bowler-Barnett
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Maria Martin
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Jun Fan
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Dmytro Chebotarov
  - International Rice Research Institute, DAPO Box 7777, Manila 1301, Philippines
- Kenneth L McNally
  - International Rice Research Institute, DAPO Box 7777, Manila 1301, Philippines
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew R Jones
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, United Kingdom
2
Lin A, See D, Fondrie WE, Keich U, Noble WS. Target-decoy false discovery rate estimation using Crema. Proteomics 2024; 24:e2300084. [PMID: 38380501 DOI: 10.1002/pmic.202300084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 01/06/2024] [Accepted: 01/16/2024] [Indexed: 02/22/2024]
Abstract
Assigning statistical confidence estimates to discoveries produced by a tandem mass spectrometry proteomics experiment is critical to enabling principled interpretation of the results and assessing the cost/benefit ratio of experimental follow-up. The most common technique for computing such estimates is to use target-decoy competition (TDC), in which observed spectra are searched against a database of real (target) peptides and a database of shuffled or reversed (decoy) peptides. TDC procedures for estimating the false discovery rate (FDR) at a given score threshold have been developed for application at the level of spectra, peptides, or proteins. Although these techniques are relatively straightforward to implement, it is common in the literature to skip over the implementation details or even to make mistakes in how the TDC procedures are applied in practice. Here we present Crema, an open-source Python tool that implements several TDC methods of spectrum-, peptide- and protein-level FDR estimation. Crema is compatible with a variety of existing database search tools and provides a straightforward way to obtain robust FDR estimates.
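As an illustration of the target-decoy competition idea summarized in this abstract, the following is a minimal Python sketch of a spectrum-level FDR estimate at a score threshold. It is not Crema's API: the function name `tdc_fdr`, the `(score, is_decoy)` input layout, and the use of the common "+1" conservative correction in the numerator are assumptions made for this example only. The input is assumed to hold one best-scoring match per spectrum after searching targets and decoys together.

```python
from typing import List, Tuple


def tdc_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    `psms` holds one (score, is_decoy) pair per spectrum, i.e. the winner
    of the target-vs-decoy competition for that spectrum (hypothetical
    layout, assumed for this sketch). Higher scores are assumed better.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    # Decoy wins above the threshold stand in for incorrect target wins.
    # The +1 is a commonly used conservative correction; max() avoids
    # division by zero, and min() caps the estimate at 1.
    return min(1.0, (decoys + 1) / max(targets, 1))


if __name__ == "__main__":
    # Toy data: nine confident target matches and one decoy match.
    example = [(10.0, False)] * 9 + [(5.0, True)]
    print(tdc_fdr(example, threshold=6.0))
```

Peptide- and protein-level estimates follow the same counting logic but first collapse matches to one score per peptide or protein; that step is omitted here.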
Affiliation(s)
- Andy Lin
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Seattle, Washington, USA
- Donavan See
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
- Uri Keich
  - School of Mathematics and Statistics, University of Sydney, Sydney, Australia
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
3
Fields L, Vu NQ, Dang TC, Yen HC, Ma M, Wu W, Gray M, Li L. EndoGenius: Optimized Neuropeptide Identification from Mass Spectrometry Datasets. J Proteome Res 2024. [PMID: 38426863 DOI: 10.1021/acs.jproteome.3c00758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024]
Abstract
Neuropeptides represent a unique class of signaling molecules that have garnered much attention but require special consideration when identifications are gleaned from mass spectra. With highly variable sequence lengths, neuropeptides must be analyzed in their endogenous state. Further, neuropeptides share great homology within families, differing by as little as a single amino acid residue, complicating even routine analyses and necessitating optimized computational strategies for confident and accurate identifications. We present EndoGenius, a database searching strategy designed specifically for elucidating neuropeptide identifications from mass spectra by leveraging optimized peptide-spectrum matching approaches, an expansive motif database, and a novel scoring algorithm to achieve broader representation of the neuropeptidome and minimize reidentification. This work describes an algorithm capable of reporting more neuropeptide identifications at 1% false-discovery rate than alternative software in five Callinectes sapidus neuronal tissue types.
Affiliation(s)
- Lauren Fields
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Nhu Q Vu
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Tina C Dang
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Hsu-Ching Yen
  - Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, Wisconsin 53706, United States
- Min Ma
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Wenxin Wu
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Mitchell Gray
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lingjun Li
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
  - Lachman Institute for Pharmaceutical Development, School of Pharmacy, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Wisconsin Center for NanoBioSystems, School of Pharmacy, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
4
Langan LM, Lovin LM, Taylor RB, Scarlett KR, Chambliss CK, Chatterjee S, Scott JT, Brooks BW. Proteome changes in larval zebrafish (Danio rerio) and fathead minnow (Pimephales promelas) exposed to (±) anatoxin-a. Environ Int 2024; 185:108514. [PMID: 38394915 DOI: 10.1016/j.envint.2024.108514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 02/16/2024] [Accepted: 02/17/2024] [Indexed: 02/25/2024]
Abstract
Anatoxin-a and its analogues are potent neurotoxins produced by several genera of cyanobacteria. Due in part to their high toxicity and potential presence in drinking water, these toxins pose threats to public health, companion animals and the environment. Anatoxin-a primarily exerts toxicity as a cholinergic agonist, with high affinity at neuromuscular junctions, but molecular mechanisms by which it elicits toxicological responses are not fully understood. To advance understanding of this cyanotoxin, proteomic characterization (DIA shotgun proteomics) of two common fish models (zebrafish and fathead minnow) was performed following (±) anatoxin-a exposure. Specifically, proteome changes were identified and quantified in larval fish exposed for 96 h (0.01-3 mg/L (±) anatoxin-a and caffeine (a methodological positive control)), with environmentally relevant treatment levels examined based on environmental exposure distributions of surface water data. Proteomic concentration-response relationships revealed 48 and 29 proteins with concentration-response curves for zebrafish and fathead minnow, respectively. In contrast, the highest number of differentially expressed proteins (DEPs) varied between zebrafish (n = 145) and fathead minnow (n = 300), with only fatheads displaying DEPs at all treatment levels. For both species, genes associated with reproduction were significantly downregulated, with pathway analysis broadly clustering genes into groups associated with DNA repair mechanisms. Importantly, significant differences in proteome response between the species were also observed, consistent with prior observations of differences in response using both behavioral assays and gene expression, adding further support to model specific differences in organismal sensitivity and/or response. When DEPs were read across from humans to zebrafish, disease ontology enrichment identified diseases associated with cognition and muscle weakness consistent with the prior literature. Our observations highlight limited knowledge of how (±) anatoxin-a, a commonly used synthetic racemate surrogate, elicits responses at a molecular level and advance its toxicological understanding.
Affiliation(s)
- Laura M Langan
  - Department of Environmental Science, Baylor University, Waco, TX 76798, USA; Center for Reservoir and Aquatic Systems Research, Baylor University, Waco, TX 76798, USA; Department of Environmental Health Sciences, University of South Carolina, Columbia, SC 29208, USA
- Lea M Lovin
  - Department of Environmental Science, Baylor University, Waco, TX 76798, USA; Center for Reservoir and Aquatic Systems Research, Baylor University, Waco, TX 76798, USA; Department of Wildlife, Fish and Environmental Studies, Swedish University of Agricultural Sciences, Umeå, Sweden
- Raegyn B Taylor
  - Center for Reservoir and Aquatic Systems Research, Baylor University, Waco, TX 76798, USA; Department of Chemistry, Baylor University, Waco, TX 76798, USA
- Kendall R Scarlett
  - Department of Environmental Science, Baylor University, Waco, TX 76798, USA; Center for Reservoir and Aquatic Systems Research, Baylor University, Waco, TX 76798, USA
- C Kevin Chambliss
  - Center for Reservoir and Aquatic Systems Research, Baylor University, Waco, TX 76798, USA; Department of Chemistry, Baylor University, Waco, TX 76798, USA
- Saurabh Chatterjee
  - Department of Medicine, Department of Environmental and Occupational Health, University of California Irvine, Irvine, CA 92617, USA
- J Thad Scott
  - Center for Reservoir and Aquatic Systems Research, Baylor University, Waco, TX 76798, USA; Department of Biology, Baylor University, Waco, TX 76798, USA
- Bryan W Brooks
  - Department of Environmental Science, Baylor University, Waco, TX 76798, USA; Center for Reservoir and Aquatic Systems Research, Baylor University, Waco, TX 76798, USA
5
Schröder JM. Discovery of natural bispecific antibodies: Is psoriasis induced by a toxigenic Corynebacterium simulans and maintained by CIDAMPs as autoantigens? Exp Dermatol 2024; 33:e15014. [PMID: 38284202 DOI: 10.1111/exd.15014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 12/21/2023] [Accepted: 12/29/2023] [Indexed: 01/30/2024]
Abstract
The high abundance of Corynebacterium simulans in psoriasis skin suggests a contribution to the psoriasis aetiology. This hypothesis was tested in an exploratory study, where western blot (WB) analyses with extracts of heat-treated C. simulans and psoriasis serum-derived IgG exhibited a single 16 kDa-WB-band. Proteomic analyses revealed ribosomal proteins as candidate C. s.-antigens. A peptidomic analysis unexpectedly showed that psoriasis serum-derived IgG already contained 31 immunopeptides of Corynebacteria ssp., suggesting the presence of natural bispecific antibodies (BsAbs). Moreover, peptidomic analyses gave 372 DECOY-peptides with similarity to virus- and phage proteins, including Corynebacterium diphtheriae phage, and similarity to diphtheria toxin. Strikingly, a peptidomic analysis for human peptides revealed 64 epitopes of major psoriasis autoantigens such as the spacer region of filaggrin, hornerin repeats and others. Most identified immunopeptides represent potential cationic intrinsically disordered antimicrobial peptides (CIDAMPs), which are generated within the epidermis. These may form complexes with bacterial disordered protein regions, representing chimeric antigens containing discontinuous epitopes. In addition, among 128 low-abundance immunopeptides, 48 are putatively psoriasis-relevant such as epitope peptides of PGE2-, vitamin D3- and IL-10-receptors. Further, 47 immunopeptides originated from tumour antigens, and the endogenous retrovirus HERV-K. I propose that persistent infection with a toxigenic C. simulans initiates psoriasis, which is exacerbated as an autoimmune disease by CIDAMPs as autoantigens. The discovery of natural BsAbs allows the identification of antigen epitopes from microbes, viruses, autoantigens and tumour-antigens, and may help to develop epitope-specific peptide-vaccines and therapeutic approaches with antigen-specific regulatory T cells to improve immune tolerance in an autoimmune disease-specific-manner.
Affiliation(s)
- Jens-Michael Schröder
  - Department of Dermatology, University-Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
6
Ramsbottom KA, Prakash A, Riverol YP, Camacho OM, Sun Z, Kundu DJ, Bowler-Barnett E, Martin M, Fan J, Chebotarov D, McNally KL, Deutsch EW, Vizcaíno JA, Jones AR. A meta-analysis of rice phosphoproteomics data to understand variation in cell signalling across the rice pan-genome. bioRxiv 2023:2023.11.17.567512. [PMID: 38014076 PMCID: PMC10680829 DOI: 10.1101/2023.11.17.567512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety. The data was clustered to identify groups of sites with similar patterns across rice family groups, for example those highly conserved in Japonica but mostly absent in Aus type rice varieties, which are known to have different responses to drought. These resources can assist rice researchers to discover alleles with significantly different functional effects across rice varieties. The data has been loaded into UniProt Knowledge-Base (enabling researchers to visualise sites alongside other data on rice proteins, e.g. structural models from AlphaFold2), PeptideAtlas and the PRIDE database (enabling visualisation of source evidence, including scores and supporting mass spectra).
Affiliation(s)
- Kerry A Ramsbottom
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7BE, United Kingdom
- Ananth Prakash
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Yasset Perez Riverol
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Oscar Martin Camacho
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7BE, United Kingdom
- Zhi Sun
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Deepti J. Kundu
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Emily Bowler-Barnett
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Maria Martin
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Jun Fan
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Dmytro Chebotarov
  - International Rice Research Institute, DAPO 7777, Manila 1301, Philippines
- Kenneth L McNally
  - International Rice Research Institute, DAPO 7777, Manila 1301, Philippines
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Andrew R Jones
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7BE, United Kingdom
7
Higgins L, Gerdes H, Cutillas PR. Principles of phosphoproteomics and applications in cancer research. Biochem J 2023; 480:403-420. [PMID: 36961757 PMCID: PMC10212522 DOI: 10.1042/bcj20220220] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/13/2022] [Revised: 02/24/2023] [Accepted: 02/28/2023] [Indexed: 03/25/2023]
Abstract
Phosphorylation constitutes the most common and best-studied regulatory post-translational modification in biological systems and archetypal signalling pathways driven by protein and lipid kinases are disrupted in essentially all cancer types. Thus, the study of the phosphoproteome stands to provide unique biological information on signalling pathway activity and on kinase network circuitry that is not captured by genetic or transcriptomic technologies. Here, we discuss the methods and tools used in phosphoproteomics and highlight how this technique has been used, and can be used in the future, for cancer research. Challenges still exist in mass spectrometry phosphoproteomics and in the software required to provide biological information from these datasets. Nevertheless, improvements in mass spectrometers with enhanced scan rates, separation capabilities and sensitivity, in biochemical methods for sample preparation and in computational pipelines are enabling an increasingly deep analysis of the phosphoproteome, where previous bottlenecks in data acquisition, processing and interpretation are being relieved. These powerful hardware and algorithmic innovations are not only providing exciting new mechanistic insights into tumour biology, from where new drug targets may be derived, but are also leading to the discovery of phosphoproteins as mediators of drug sensitivity and resistance and as classifiers of disease subtypes. These studies are, therefore, uncovering phosphoproteins as a new generation of disruptive biomarkers to improve personalised anti-cancer therapies.
Affiliation(s)
- Luke Higgins
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K.
- Henry Gerdes
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K.
- Pedro R. Cutillas
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K.
  - Alan Turing Institute, The British Library, London, U.K.
  - Digital Environment Research Institute, Queen Mary University of London, London, U.K.
8
Madej D, Lam H. Modeling Lower-Order Statistics to Enable Decoy-Free FDR Estimation in Proteomics. J Proteome Res 2023; 22:1159-1171. [PMID: 36962508 DOI: 10.1021/acs.jproteome.2c00604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2023]
Abstract
One of the chief objectives in mass spectrometry-based peptide identification in proteomics is the statistical validation of top-scoring peptide-spectrum matches (PSMs) in the form of false discovery rate (FDR) estimation. Existing methods construct a null model that captures the characteristics of incorrect target PSMs to estimate the FDR, most often with the help of decoys. Decoy-based methods, however, increase the computational cost and rely on the difficult-to-verify assumption that decoy PSMs constitute a sufficient and representative sample of the population of possible incorrect target PSMs. On the other hand, the possibility of FDR estimation assisted by the plentiful non-top-scoring PSMs, which are almost always incorrect, has been scarcely explored. In this work, we propose a novel decoy-free procedure for developing null models for top-scoring PSMs using the transformed e-value (TEV) score and the distributions of non-top-scoring target PSMs. The method relies on a theoretically derivable relationship between the parameters of the distributions of lower-order statistics of the TEV score and a necessary empirical optimization to fit a single parameter to actual data. The framework was tested on multiple different data sets and two search engines. We present evidence that our method is comparable to and occasionally outperforms popular decoy-free and decoy-based methods in FDR estimation.
Affiliation(s)
- Dominik Madej
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
- Henry Lam
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
9
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
Affiliation(s)
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Luis Mendoza
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Zhi Sun
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Jimmy K Eng
  - Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
- Robert L Moritz
  - Institute for Systems Biology, Seattle, Washington 98109, United States
10
Etourneau L, Burger T. Challenging Targets or Describing Mismatches? A Comment on Common Decoy Distribution by Madej et al. J Proteome Res 2022; 21:2840-2845. [PMID: 36305797 DOI: 10.1021/acs.jproteome.2c00279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
In their recent article, Madej et al. (Madej, D.; Wu, L.; Lam, H. Common Decoy Distributions Simplify False Discovery Rate Estimation in Shotgun Proteomics. J. Proteome Res. 2022, 21 (2), 339-348) proposed an original way to solve the recurrent issue of controlling for the false discovery rate (FDR) in peptide-spectrum-match (PSM) validation. Briefly, they proposed to derive a single precise distribution of decoy matches termed the Common Decoy Distribution (CDD) and to use it to control for FDR during a target-only search. Conceptually, this approach is appealing as it takes the best of two worlds, i.e., decoy-based approaches (which leverage a large-scale collection of empirical mismatches) and decoy-free approaches (which are not subject to the randomness of decoy generation while sparing an additional database search). Interestingly, CDD also corresponds to a middle-of-the-road approach in statistics with respect to the two main families of FDR control procedures: Although historically based on estimating the false-positive distribution, FDR control has recently been demonstrated to be possible thanks to competition between the original variables (in proteomics, target sequences) and their fictional counterparts (in proteomics, decoys). Discriminating between these two theoretical trends is of prime importance for computational proteomics. In addition to highlighting why proteomics was a source of inspiration for theoretical biostatistics, it provides practical insights into the improvements that can be made to FDR control methods used in proteomics, including CDD.
Affiliation(s)
- Lucas Etourneau
  - Univ. Grenoble Alpes, CNRS, CEA, Inserm, ProFI, FR2048, Grenoble, France
- Thomas Burger
  - Univ. Grenoble Alpes, CNRS, CEA, Inserm, ProFI, FR2048, Grenoble, France
11
Lee S, Park H, Kim H. False discovery rate estimation using candidate peptides for each spectrum. BMC Bioinformatics 2022; 23:454. [PMID: 36319948 PMCID: PMC9623924 DOI: 10.1186/s12859-022-05002-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 10/25/2022] [Indexed: 11/06/2022]
Abstract
BACKGROUND: False discovery rate (FDR) estimation is very important in proteomics. The target-decoy strategy (TDS), which is often used for FDR estimation, estimates the FDR under the assumption that when spectra are identified incorrectly, the probabilities of the spectra matching the target or decoy peptides are identical. However, no spectra matching target or decoy peptide probabilities are identical. We propose cTDS (target-decoy strategy with candidate peptides) for accurate estimation of the FDR using the probability that the spectrum is identified incorrectly as a target or decoy peptide. RESULTS: Most spectrum cases result in a probability of having the spectrum identified incorrectly as a target or decoy peptide of close to 0.5, but only about 1.14-4.85% of the total spectra have an exact probability of 0.5. We used an entrapment sequence method to demonstrate the accuracy of cTDS. For fixed FDR thresholds (1-10%), the false match rate (FMR) in cTDS is closer than the FMR in TDS. We compared the number of peptide-spectrum matches (PSMs) obtained with TDS and cTDS at a 1% FDR threshold with the HEK293 dataset. In the first and third replications, the number of PSMs obtained with cTDS for the reverse, pseudo-reverse, shuffle, and de Bruijn databases exceeded those obtained with TDS (about 0.001-0.132%), with the pseudo-shuffle database containing less compared to TDS (about 0.05-0.126%). In the second replication, the number of PSMs obtained with cTDS for all databases exceeds that obtained with TDS (about 0.013-0.274%). CONCLUSIONS: When spectra are actually identified incorrectly, most probabilities of the spectra matching a target or decoy peptide are not identical. Therefore, we propose cTDS, which estimates the FDR more accurately using the probability of the spectrum being identified incorrectly as a target or decoy peptide.
Collapse
Affiliation(s)
- Sangjeong Lee
- grid.49606.3d0000 0001 1364 9317Department of Computer Science, Hanyang University, Seoul, 06978 Republic of Korea
| | - Heejin Park
- grid.49606.3d0000 0001 1364 9317Department of Computer Science, Hanyang University, Seoul, 06978 Republic of Korea
| | - Hyunwoo Kim
- grid.249964.40000 0001 0523 5253Biomedical Informatics Team, Korea Institute of Science and Technology Information, Daejeon, 34141 Republic of Korea
| |
|
12
|
Ramsbottom KA, Prakash A, Riverol YP, Camacho OM, Martin MJ, Vizcaíno JA, Deutsch EW, Jones AR. Method for Independent Estimation of the False Localization Rate for Phosphoproteomics. J Proteome Res 2022; 21:1603-1615. [PMID: 35640880 PMCID: PMC9251759 DOI: 10.1021/acs.jproteome.1c00827] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 12/11/2022]
Abstract
Phosphoproteomic methods are commonly employed to identify and quantify phosphorylation sites on proteins. In recent years, various tools have been developed, incorporating scores or statistics related to whether a given phosphosite has been correctly identified or to estimate the global false localization rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic datasets, and their statistical reliability on real datasets is largely unknown, potentially leading to studies reporting incorrectly localized phosphosites, due to inadequate statistical control. In this work, we develop the concept of scoring modifications on a decoy amino acid, that is, one that cannot be modified, to allow for independent estimation of global FLR. We test a variety of amino acids, on both synthetic and real data sets, demonstrating that the selection can make a substantial difference to the estimated global FLR. We conclude that while several different amino acids might be appropriate, the most reliable FLR results were achieved using alanine and leucine as decoys. We propose the use of a decoy amino acid to control false reporting in the literature and in public databases that re-distribute the data. Data are available via ProteomeXchange with identifier PXD028840.
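The decoy amino acid idea can be sketched in Python as follows. The abstract does not state the estimator, so everything here is an assumption for illustration only: the function name `decoy_aa_flr`, the `(score, residue)` input layout, and in particular the normalization of decoy hits by the ratio of candidate target residues (S, T, Y) to candidate decoy residues. It should not be read as the authors' published formula.

```python
def decoy_aa_flr(sites, threshold, n_target_residues, n_decoy_residues, decoy="A"):
    """Illustrative global FLR estimate from modifications placed on a decoy residue.

    sites: iterable of (localization_score, residue) pairs (hypothetical layout).
    n_target_residues / n_decoy_residues: counts of S/T/Y and decoy residues in
    the identified peptides. Scaling by their ratio is an ASSUMED normalization,
    not a formula taken from the abstract.
    """
    target_hits = sum(1 for s, r in sites if s >= threshold and r in "STY")
    decoy_hits = sum(1 for s, r in sites if s >= threshold and r == decoy)
    if target_hits == 0 or n_decoy_residues == 0:
        return 0.0
    # A site on the decoy residue is wrong by construction; scale the count to
    # the number of places where a wrong localization could land on S/T/Y.
    estimated_false = decoy_hits * (n_target_residues / n_decoy_residues)
    return min(1.0, estimated_false / target_hits)


# Toy data: four S/T/Y sites and one alanine site score at or above 0.7.
example_sites = [(0.99, "S"), (0.95, "T"), (0.9, "A"), (0.8, "Y"), (0.7, "S"), (0.6, "A")]
example_flr = decoy_aa_flr(example_sites, 0.7, n_target_residues=300, n_decoy_residues=150)
```

With these toy numbers one decoy hit is scaled by 300/150 and divided by four target hits, giving 0.5.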
Affiliation(s)
- Kerry A Ramsbottom
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, U.K
| | - Ananth Prakash
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, U.K
| | - Yasset Perez Riverol
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, U.K
| | - Oscar Martin Camacho
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, U.K
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, U.K
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, U.K
| | - Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Andrew R Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, U.K
| |
|
13
|
Simple, efficient and thorough shotgun proteomic analysis with PatternLab V. Nat Protoc 2022; 17:1553-1578. [DOI: 10.1038/s41596-022-00690-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/14/2021] [Accepted: 02/08/2022] [Indexed: 11/08/2022]
|
14
|
Gholamizoj S, Ma B. SPEQ: quality assessment of peptide tandem mass spectra with deep learning. Bioinformatics 2022; 38:1568-1574. [PMID: 34978568 PMCID: PMC8896601 DOI: 10.1093/bioinformatics/btab874] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/16/2021] [Revised: 12/25/2021] [Accepted: 12/30/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION In proteomics, database search programs are routinely used for peptide identification from tandem mass spectrometry data. However, many low-quality spectra cannot be interpreted by any program. Meanwhile, certain high-quality spectra may not be identified due to incompleteness of the database or failure of the software. Thus, spectrum quality (SPEQ) assessment tools are helpful programs that can eliminate poor-quality spectra before the database search and highlight the high-quality spectra that are not identified in the initial search. These spectra may be valuable candidates for further analyses. RESULTS We propose SPEQ: a spectrum quality assessment tool that uses a deep neural network to classify spectra into high-quality, which are worthy candidates for interpretation, and low-quality, which lack sufficient information for identification. SPEQ was compared with a few other prediction models and demonstrated improved prediction accuracy. AVAILABILITY AND IMPLEMENTATION Source code and scripts are freely available at github.com/sor8sh/SPEQ, implemented in Python.
Affiliation(s)
- Soroosh Gholamizoj
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
| | - Bin Ma
- To whom correspondence should be addressed.
| |
|
15
|
Maixner F, Sarhan MS, Huang KD, Tett A, Schoenafinger A, Zingale S, Blanco-Míguez A, Manghi P, Cemper-Kiesslich J, Rosendahl W, Kusebauch U, Morrone SR, Hoopmann MR, Rota-Stabelli O, Rattei T, Moritz RL, Oeggl K, Segata N, Zink A, Reschreiter H, Kowarik K. Hallstatt miners consumed blue cheese and beer during the Iron Age and retained a non-Westernized gut microbiome until the Baroque period. Curr Biol 2021; 31:5149-5162.e6. [PMID: 34648730 PMCID: PMC8660109 DOI: 10.1016/j.cub.2021.09.031] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/27/2021] [Revised: 08/16/2021] [Accepted: 09/14/2021] [Indexed: 02/06/2023]
Abstract
We subjected human paleofeces dating from the Bronze Age to the Baroque period (18th century AD) to in-depth microscopic, metagenomic, and proteomic analyses. The paleofeces were preserved in the underground salt mines of the UNESCO World Heritage site of Hallstatt in Austria. This allowed us to reconstruct the diet of the former population and gain insights into their ancient gut microbiome composition. Our dietary survey identified bran and glumes of different cereals as some of the most prevalent plant fragments. This highly fibrous, carbohydrate-rich diet was supplemented with proteins from broad beans and occasionally with fruits, nuts, or animal food products. Due to these traditional dietary habits, all ancient miners up to the Baroque period have gut microbiome structures akin to modern non-Westernized individuals whose diets are also mainly composed of unprocessed foods and fresh fruits and vegetables. This may indicate a shift in the gut community composition of modern Westernized populations due to quite recent dietary and lifestyle changes. When we extended our microbial survey to fungi present in the paleofeces, in one of the Iron Age samples, we observed a high abundance of Penicillium roqueforti and Saccharomyces cerevisiae DNA. Genome-wide analysis indicates that both fungi were involved in food fermentation and provides the first molecular evidence for blue cheese and beer consumption in Iron Age Europe.
Affiliation(s)
- Frank Maixner
- Institute for Mummy Studies, EURAC Research, Viale Druso 1, 39100 Bolzano, Italy
| | - Mohamed S Sarhan
- Institute for Mummy Studies, EURAC Research, Viale Druso 1, 39100 Bolzano, Italy
| | - Kun D Huang
- Department CIBIO, University of Trento, Via Sommarive 9, 38123 Povo (Trento), Italy; Department of Sustainable Agro-Ecosystems and Bioresources, Fondazione Edmund Mach, Via Edmund Mach 1, 38010 San Michele all'Adige (TN), Italy
| | - Adrian Tett
- Department CIBIO, University of Trento, Via Sommarive 9, 38123 Povo (Trento), Italy; CUBE (Division of Computational Systems Biology), Centre for Microbiology and Environmental Systems Science, University of Vienna, Althanstraße 14, 1090 Vienna, Austria
| | - Alexander Schoenafinger
- Institute for Mummy Studies, EURAC Research, Viale Druso 1, 39100 Bolzano, Italy; Institute of Botany, University of Innsbruck, Sternwartestraße 15, 6020 Innsbruck, Austria
| | - Stefania Zingale
- Institute for Mummy Studies, EURAC Research, Viale Druso 1, 39100 Bolzano, Italy
| | - Aitor Blanco-Míguez
- Department CIBIO, University of Trento, Via Sommarive 9, 38123 Povo (Trento), Italy
| | - Paolo Manghi
- Department CIBIO, University of Trento, Via Sommarive 9, 38123 Povo (Trento), Italy
| | - Jan Cemper-Kiesslich
- Interfaculty Department of Legal Medicine & Department of Classics, University of Salzburg, Ignaz-Harrer-Straße 79, 5020 Salzburg, Austria
| | - Wilfried Rosendahl
- Reiss-Engelhorn-Museen, Zeughaus C5, 68159 Mannheim, Germany; Curt-Engelhorn-Zentrum Archäometrie, D6, 3, 68159 Mannheim, Germany
| | - Ulrike Kusebauch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
| | - Seamus R Morrone
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
| | - Michael R Hoopmann
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
| | - Omar Rota-Stabelli
- Center Agriculture Food Environment (C3A), University of Trento, 38010 San Michele all'Adige (TN), Italy
| | - Thomas Rattei
- CUBE (Division of Computational Systems Biology), Centre for Microbiology and Environmental Systems Science, University of Vienna, Althanstraße 14, 1090 Vienna, Austria
| | - Robert L Moritz
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
| | - Klaus Oeggl
- Institute of Botany, University of Innsbruck, Sternwartestraße 15, 6020 Innsbruck, Austria
| | - Nicola Segata
- Department CIBIO, University of Trento, Via Sommarive 9, 38123 Povo (Trento), Italy
| | - Albert Zink
- Institute for Mummy Studies, EURAC Research, Viale Druso 1, 39100 Bolzano, Italy
| | - Hans Reschreiter
- Prehistoric Department, Museum of Natural History Vienna, Burgring 7, 1010 Vienna, Austria
| | - Kerstin Kowarik
- Prehistoric Department, Museum of Natural History Vienna, Burgring 7, 1010 Vienna, Austria
| |
|
16
|
Lee S, Park H, Kim H. Comparison of false-discovery rates of various decoy databases. Proteome Sci 2021; 19:11. [PMID: 34537052 PMCID: PMC8449453 DOI: 10.1186/s12953-021-00179-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/18/2021] [Accepted: 09/01/2021] [Indexed: 02/15/2024] Open
Abstract
BACKGROUND The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or under-estimated depending on which decoy database is used because the ratios of redundant peptides in the target databases are different, that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ. RESULTS We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, human and S. cerevisiae six-frame translation databases, which are large databases, showed outcomes similar to those from the UniProt human protein database. CONCLUSION The FDR must be estimated using the correction factor proposed by Elias and Gygi or that by Kim et al. when (pseudo) shuffle decoy databases are used.
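Three of the decoy construction methods named in this abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the tryptic digest simply cleaves after every K or R (ignoring the proline rule and missed cleavages), and pseudo-reverse is taken to mean reversing each tryptic peptide while keeping its C-terminal K or R in place. The pseudo-shuffle and de Bruijn methods are omitted.

```python
import random
import re


def tryptic_peptides(protein):
    """Split after every K or R (simplified digest: no proline rule)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def reverse_decoy(protein):
    """Reverse the whole protein sequence."""
    return protein[::-1]


def pseudo_reverse_decoy(protein):
    """Reverse each tryptic peptide but keep its C-terminal K/R in place."""
    parts = []
    for pep in tryptic_peptides(protein):
        if pep[-1] in "KR":
            parts.append(pep[:-1][::-1] + pep[-1])
        else:
            # Last peptide of the protein may lack a cleavage residue.
            parts.append(pep[::-1])
    return "".join(parts)


def shuffle_decoy(protein, seed=0):
    """Randomly permute the residues of the whole protein (seeded for repeatability)."""
    residues = list(protein)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


target = "ACDKEFGRHIL"
reversed_seq = reverse_decoy(target)               # "LIHRGFEKDCA"
pseudo_reversed_seq = pseudo_reverse_decoy(target)  # "DCAKGFERLIH"
shuffled_seq = shuffle_decoy(target)
```

Pseudo-reverse preserves peptide boundaries and therefore peptide masses, whereas whole-protein reversal and shuffling move the cleavage sites, which is one reason the number of unique decoy peptides can differ from the number of unique target peptides.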
Affiliation(s)
- Sangjeong Lee
- Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea
| | - Heejin Park
- Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea
| | - Hyunwoo Kim
- Center for Supercomputing Applications, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
| |
|
17
|
Hoopmann MR, Kusebauch U, Palmblad M, Bandeira N, Shteynberg DD, He L, Xia B, Stoychev SH, Omenn GS, Weintraub ST, Moritz RL. Insights from the First Phosphopeptide Challenge of the MS Resource Pillar of the HUPO Human Proteome Project. J Proteome Res 2020; 19:4754-4765. [PMID: 33166149 DOI: 10.1021/acs.jproteome.0c00648] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]
Abstract
Mass spectrometry has greatly improved the analysis of phosphorylation events in complex biological systems and on a large scale. Despite considerable progress, the correct identification of phosphorylated sites, their quantification, and their interpretation regarding physiological relevance remain challenging. The MS Resource Pillar of the Human Proteome Organization (HUPO) Human Proteome Project (HPP) initiated the Phosphopeptide Challenge as a resource to help the community evaluate methods, learn procedures and data analysis routines, and establish their own workflows by comparing results obtained from a standard set of 94 phosphopeptides (serine, threonine, tyrosine) and their nonphosphorylated counterparts mixed at different ratios in a neat sample and a yeast background. Participants analyzed both samples with their method(s) of choice to report the identification and site localization of these peptides, determine their relative abundances, and enrich for the phosphorylated peptides in the yeast background. We discuss the results from 22 laboratories that used a range of different methods, instruments, and analysis software. We reanalyzed submitted data with a single software pipeline and highlight the successes and challenges in correct phosphosite localization. All of the data from this collaborative endeavor are shared as a resource to encourage the development of even better methods and tools for diverse phosphoproteomic applications. All submitted data and search results were uploaded to MassIVE (https://massive.ucsd.edu/) as data set MSV000085932 with ProteomeXchange identifier PXD020801.
Affiliation(s)
| | - Ulrike Kusebauch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, United States
| | | | - Lingjie He
- Synpeptide Co., Ltd., Shanghai 201204, China
| | - Bin Xia
- Synpeptide Co., Ltd., Shanghai 201204, China
| | | | - Gilbert S Omenn
- Institute for Systems Biology, Seattle, Washington 98109, United States; Departments of Computational Medicine and Bioinformatics, Internal Medicine, and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Susan T Weintraub
- Department of Biochemistry and Structural Biology, The University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229, United States
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
|
18
|
Guan S, Taylor PP, Han Z, Moran MF, Ma B. Data Dependent-Independent Acquisition (DDIA) Proteomics. J Proteome Res 2020; 19:3230-3237. [PMID: 32539411 DOI: 10.1021/acs.jproteome.0c00186] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 01/07/2023]
Abstract
Data dependent acquisition (DDA) and data independent acquisition (DIA) are traditionally separate experimental paradigms in bottom-up proteomics. In this work, we developed a strategy combining the two experimental methods into a single LC-MS/MS run. We call the novel strategy data dependent-independent acquisition proteomics, or DDIA for short. Peptides identified from DDA scans by a conventional and robust DDA identification workflow provide useful information for interrogation of DIA scans. Deep learning based LC-MS/MS property prediction tools, developed previously, can be used repeatedly to produce spectral libraries facilitating DIA scan extraction. A complete DDIA data processing pipeline, including the modules for iRT vs RT calibration curve generation, DIA extraction classifier training, and false discovery rate control, has been developed. Compared to another spectral library-free method, DIA-Umpire, the DDIA method produced a similar number of peptide identifications, but nearly twice as many protein group identifications. The primary advantage of the DDIA method is that it requires minimal information for processing its data.
Affiliation(s)
- Shenheng Guan
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; Program in Cell Biology and SPARC BioCentre, Hospital for Sick Children, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada
| | - Paul P Taylor
- Rapid Novor Inc., Unit 450, 137 Glasgow Street, Kitchener, Ontario N2G 4X8, Canada
| | - Ziwei Han
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Michael F Moran
- Program in Cell Biology and SPARC BioCentre, Hospital for Sick Children, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada
| | - Bin Ma
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| |
|