1
Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou LP, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Sci 2022; 31:8-22. [PMID: 34717010 PMCID: PMC8740835 DOI: 10.1002/pro.4218] [Citation(s) in RCA: 450] [Impact Index Per Article: 225.0] [Received: 08/26/2021] [Revised: 10/24/2021] [Accepted: 10/26/2021] [Indexed: 02/03/2023]
Abstract
Phylogenetics is a powerful tool for analyzing protein sequences, by inferring their evolutionary relationships to other proteins. However, phylogenetics analyses can be challenging: they are computationally expensive and must be performed carefully in order to avoid systematic errors and artifacts. Protein Analysis THrough Evolutionary Relationships (PANTHER; http://pantherdb.org) is a publicly available, user-focused knowledgebase that stores the results of an extensive phylogenetic reconstruction pipeline that includes computational and manual processes and quality control steps. First, fully reconciled phylogenetic trees (including ancestral protein sequences) are reconstructed for a set of "reference" protein sequences obtained from fully sequenced genomes of organisms across the tree of life. Second, the resulting phylogenetic trees are manually reviewed and annotated with function evolution events: inferred gains and losses of protein function along branches of the phylogenetic tree. Here, we describe in detail the current contents of PANTHER, how those contents are generated, and how they can be used in a variety of applications. The PANTHER knowledgebase can be downloaded or accessed via an extensive API. In addition, PANTHER provides software tools to facilitate the application of the knowledgebase to common protein sequence analysis tasks: exploring an annotated genome by gene function; performing "enrichment analysis" of lists of genes; annotating a single sequence or large batch of sequences by homology; and assessing the likelihood that a genetic variant at a particular site in a protein will have deleterious effects.
Affiliation(s)
- Paul D. Thomas
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Dustin Ebert
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Anushya Muruganujan
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Tremayne Mushayahama
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Laurent‐Philippe Albou
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Huaiyu Mi
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
2
Vega Yon GG, Thomas DC, Morrison J, Mi H, Thomas PD, Marjoram P. Bayesian parameter estimation for automatic annotation of gene functions using observational data and phylogenetic trees. PLoS Comput Biol 2021; 17:e1007948. [PMID: 33600408 PMCID: PMC7924801 DOI: 10.1371/journal.pcbi.1007948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2020] [Revised: 03/02/2021] [Accepted: 12/30/2020] [Indexed: 11/29/2022]
Abstract
Gene function annotation is important for a variety of downstream analyses of genetic data. But experimental characterization of function remains costly and slow, making computational prediction an important endeavor. Phylogenetic approaches to prediction have been developed, but implementation of a practical Bayesian framework for parameter estimation remains an outstanding challenge. We have developed a computationally efficient model of evolution of gene annotations using phylogenies based on a Bayesian framework using Markov Chain Monte Carlo for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well on leave-one-out cross-validation, and we further validated some of the predictions in the experimental scientific literature.
Affiliation(s)
- George G. Vega Yon
- Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- Duncan C. Thomas
- Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- John Morrison
- Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- Paul D. Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- Paul Marjoram
- Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
3
Puri D, Swamy CVB, Dhawan J, Mishra RK. Comparative nuclear matrix proteome analysis of skeletal muscle cells in different cellular states. Cell Biol Int 2021; 45:580-598. [PMID: 33200434 DOI: 10.1002/cbin.11499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/13/2020] [Revised: 10/01/2020] [Accepted: 11/11/2020] [Indexed: 12/20/2022]
Abstract
The nuclear matrix (NuMat) serves as the structural framework for organizing and maintaining nuclear architecture; however, the mechanisms by which this non-chromatin compartment is constructed and regulated are poorly understood. This study presents a proteomic analysis of the NuMat isolated from cultured skeletal muscle cells in three distinct cellular states: proliferating myoblasts (MBs), terminally differentiated myotubes (MTs), and mitotically quiescent (G0) myoblasts. About 40% of the proteins identified were common to the NuMat proteomes of these morphologically and functionally distinct cell states. These proteins, termed the "core NuMat," define the stable, conserved, structural constituent of the nucleus, with functions such as RNA splicing, cytoskeletal organization, and chromatin modification, while the remaining NuMat proteins showed cell-state specificity, consistent with a more dynamic and potentially regulatory function. Specifically, myoblast NuMat was enriched in cell cycle, DNA replication and repair proteins, myotube NuMat in muscle differentiation and muscle function proteins, while G0 NuMat was enriched in metabolic, transcription, and transport proteins. These findings offer a new perspective for a cell-state-specific role of nuclear architecture and spatial organization, integrated with diverse cellular processes, and implicate NuMat proteins in the control of the cell cycle, lineage commitment, and differentiation.
Affiliation(s)
- Deepika Puri
- Centre for Cellular and Molecular Biology, Council for Scientific and Industrial Research, Hyderabad, India
- Ch V B Swamy
- Centre for Cellular and Molecular Biology, Council for Scientific and Industrial Research, Hyderabad, India
- Jyotsna Dhawan
- Centre for Cellular and Molecular Biology, Council for Scientific and Industrial Research, Hyderabad, India
- Rakesh K Mishra
- Centre for Cellular and Molecular Biology, Council for Scientific and Industrial Research, Hyderabad, India
4
Zhang P, Berardini TZ, Ebert D, Li Q, Mi H, Muruganujan A, Prithvi T, Reiser L, Sawant S, Thomas PD, Huala E. PhyloGenes: An online phylogenetics and functional genomics resource for plant gene function inference. Plant Direct 2020; 4:e00293. [PMID: 33392435 PMCID: PMC7773024 DOI: 10.1002/pld3.293] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/04/2020] [Accepted: 11/11/2020] [Indexed: 05/22/2023]
Abstract
We aim to enable the accurate and efficient transfer of knowledge about gene function gained from Arabidopsis thaliana and other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions. In such cases, functional inference requires more than a simple sequence similarity calculation. We have developed an online resource, PhyloGenes (phylogenes.org), that displays precomputed phylogenetic trees for plant gene families along with experimentally validated function information for individual genes within the families. A total of 40 plant genomes and 10 non-plant model organisms are represented in over 8,000 gene families. Evolutionary events such as speciation and duplication are clearly labeled on gene trees to distinguish orthologs from paralogs. Nearly 6,000 families have at least one member with an experimentally supported annotation to a Gene Ontology (GO) molecular function or biological process term. By displaying experimentally validated gene functions associated to individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes facilitates the use of evolutionary history to determine the most likely function of genes that have not been experimentally characterized. Future work will enrich the resource by incorporating additional gene function datasets such as plant gene expression atlas data.
Affiliation(s)
- Dustin Ebert
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Qian Li
- Phoenix Bioinformatics, Fremont, CA, USA
- Huaiyu Mi
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Anushya Muruganujan
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Paul D. Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
5
Mi H, Muruganujan A, Ebert D, Huang X, Thomas PD. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res 2019; 47:D419-D426. [PMID: 30407594 PMCID: PMC6323939 DOI: 10.1093/nar/gky1038] [Citation(s) in RCA: 1820] [Impact Index Per Article: 455.0] [Received: 09/15/2018] [Accepted: 10/17/2018] [Indexed: 12/16/2022]
Abstract
PANTHER (Protein Analysis Through Evolutionary Relationships, http://pantherdb.org) is a resource for the evolutionary and functional classification of genes from organisms across the tree of life. We report the improvements we have made to the resource during the past two years. For evolutionary classifications, we have added more prokaryotic and plant genomes to the phylogenetic gene trees, expanding the representation of gene evolution in these lineages. We have refined many protein family boundaries, and have aligned PANTHER with the MEROPS resource for protease and protease inhibitor families. For functional classifications, we have developed an entirely new PANTHER GO-slim, containing over four times as many Gene Ontology terms as our previous GO-slim, as well as curated associations of genes to these terms. Lastly, we have made substantial improvements to the enrichment analysis tools available on the PANTHER website: users can now analyze over 900 different genomes, using updated statistical tests with false discovery rate corrections for multiple testing. The overrepresentation test is also available as a web service, for easy addition to third-party sites.
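The enrichment workflow mentioned in this abstract (an overrepresentation test followed by a false discovery rate correction) can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test and the Benjamini-Hochberg procedure, neither of which the abstract names; the function names and counts are hypothetical and are not PANTHER's API.

```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """One-sided overrepresentation p-value, P(X >= k).

    X is hypergeometric: N genes in the reference genome, K of them
    annotated to the term, n genes in the user's list, k of those annotated.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: (genes in list with term, genes in genome with term).
    terms = {"term_A": (8, 200), "term_B": (2, 150), "term_C": (1, 40)}
    list_size, genome_size = 100, 20000
    raw = [fisher_upper_tail(k, list_size, K, genome_size) for k, K in terms.values()]
    for name, p, q in zip(terms, raw, benjamini_hochberg(raw)):
        print(f"{name}: p={p:.3g} FDR={q:.3g}")
```

Testing each term separately and then adjusting the whole set of p-values is what keeps the expected share of false positives bounded when many terms are examined at once.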
Affiliation(s)
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Anushya Muruganujan
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Dustin Ebert
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Xiaosong Huang
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA; School of Life Sciences, Guangzhou University, Guangzhou 510006, China
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
6
Huang X, Albou LP, Mushayahama T, Muruganujan A, Tang H, Thomas PD. Ancestral Genomes: a resource for reconstructed ancestral genes and genomes across the tree of life. Nucleic Acids Res 2019; 47:D271-D279. [PMID: 30371900 PMCID: PMC6323951 DOI: 10.1093/nar/gky1009] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/16/2018] [Accepted: 10/10/2018] [Indexed: 11/23/2022]
Abstract
A growing number of whole genome sequencing projects, in combination with development of phylogenetic methods for reconstructing gene evolution, have provided us with a window into genomes that existed millions, and even billions, of years ago. Ancestral Genomes (http://ancestralgenomes.org) is a resource for comprehensive reconstructions of these ‘fossil genomes’. Comprehensive sets of protein-coding genes have been reconstructed for 78 genomes of now-extinct species that were the common ancestors of extant species from across the tree of life. The reconstructed genes are based on the extensive library of over 15 000 gene family trees from the PANTHER database, and are updated on a yearly basis. For each ancestral gene, we assign a stable identifier, and provide additional information designed to facilitate analysis: an inferred name, a reconstructed protein sequence, a set of inferred Gene Ontology (GO) annotations, and a ‘proxy gene’ for each ancestral gene, defined as the least-diverged descendant of the ancestral gene in a given extant genome. On the Ancestral Genomes website, users can browse the Ancestral Genomes by selecting nodes in a species tree, and can compare an extant genome with any of its reconstructed ancestors to understand how the genome evolved.
Affiliation(s)
- Xiaosong Huang
- School of Life Sciences, Guangzhou University, Guangzhou 510006, China; Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Laurent-Philippe Albou
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Tremayne Mushayahama
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Anushya Muruganujan
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Haiming Tang
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
7
Rawlings ND, Barrett AJ, Thomas PD, Huang X, Bateman A, Finn RD. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res 2018; 46:D624-D632. [PMID: 29145643 PMCID: PMC5753285 DOI: 10.1093/nar/gkx1134] [Citation(s) in RCA: 962] [Impact Index Per Article: 192.4] [Received: 09/28/2017] [Accepted: 10/30/2017] [Indexed: 12/15/2022]
Abstract
The MEROPS database (http://www.ebi.ac.uk/merops/) is an integrated source of information about peptidases, their substrates and inhibitors. The hierarchical classification is: protein-species, family, clan, with an identifier at each level. The MEROPS website moved to the EMBL-EBI in 2017, requiring refactoring of the code-base and services provided. The interface to sequence searching has changed and the MEROPS protein sequence libraries can be searched at the EMBL-EBI with HMMER, FastA and BLASTP. Cross-references have been established between MEROPS and the PANTHER database at both the family and protein-species level, which will help to improve curation and coverage between the resources. Because of the increasing size of the MEROPS sequence collection, in future only sequences of characterized proteins, and from completely sequenced genomes of organisms of evolutionary, medical or commercial significance will be added. As an example, peptidase homologues in four proteomes from the Asgard superphylum of Archaea have been identified and compared to other archaean, bacterial and eukaryote proteomes. This has given insights into the origins and evolution of peptidase families, including an expansion in the number of proteasome components in Asgard archaeotes and as organisms increase in complexity. Novel structures for proteasome complexes in archaea are postulated.
Affiliation(s)
- Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Alan J Barrett
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, 1450 Biggy St, NRT 2502, Los Angeles, CA 90033, USA
- Xiaosong Huang
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, 1450 Biggy St, NRT 2502, Los Angeles, CA 90033, USA
- Alex Bateman
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Robert D Finn
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
8
Kinfe TM, Asif M, Chakravarthy KV, Deer TR, Kramer JM, Yearwood TL, Hurlemann R, Hussain MS, Motameny S, Wagle P, Nürnberg P, Gravius S, Randau T, Gravius N, Chaudhry SR, Muhammad S. Unilateral L4-dorsal root ganglion stimulation evokes pain relief in chronic neuropathic postsurgical knee pain and changes of inflammatory markers: part II whole transcriptome profiling. J Transl Med 2019; 17:205. [PMID: 31217010 PMCID: PMC6585082 DOI: 10.1186/s12967-019-1952-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/06/2019] [Accepted: 06/09/2019] [Indexed: 01/08/2023]
Abstract
Background: In our recent clinical trial, increased peripheral concentrations of pro-inflammatory molecular mediators were determined in complex regional pain syndrome (CRPS) patients. After 3 months adjunctive unilateral, selective L4 dorsal root ganglion stimulation (L4-DRGSTIM), significantly decreased serum IL-10 and increased saliva oxytocin levels were assessed along with an improved pain and functional state. The current study extended molecular profiling towards gene expression analysis of genes known to be involved in the gonadotropin releasing hormone receptor and neuroinflammatory (cytokines/chemokines) signaling pathways.
Methods: Blood samples were collected from 12 CRPS patients for whole-transcriptome profiling in order to assay 18,845 inflammation-associated genes from frozen blood at baseline and after 3 months L4-DRGSTIM using the PANTHER™ pathway enrichment analysis tool.
Results: Pathway enrichment analysis tools (GOrilla™ and PANTHER™) showed predominant involvement of inflammation mediated by chemokines/cytokines and gonadotropin releasing hormone receptor pathways. Further, screening of differentially regulated genes showed changes in innate immune response related genes. Transcriptomic analysis showed that 21 genes (predominantly immunoinflammatory) were significantly changed after L4-DRGSTIM. Seven genes including TLR1, FFAR2, IL1RAP, ILRN, C5, PKB and IL18 were downregulated and fourteen genes including CXCL2, CCL11, IL36G, CRP, SCGB1A1, IL-17F, TNFRSF4, PLA2G2A, CREB3L3, ADAMTS12, IL1F10, NOX1, CHIA and BDKRB1 were upregulated.
Conclusions: In our sub-group analysis of L4-DRGSTIM treated CRPS patients, we found either upregulated or downregulated genes involved in immunoinflammatory circuits relevant for the pathophysiology of CRPS, indicating a possible relation. However, large biobank-based approaches are recommended to establish genetic phenotyping as a quantitative outcome measure in CRPS patients.
Trial registration: The study protocol was registered on 15.11.2016 in the German Register for Clinical Trials (DRKS ID 00011267). https://www.drks.de/drks_web/navigate.do?navigationId=trial.HTML&TRIAL_ID=DRKS00011267
Affiliation(s)
- Thomas M Kinfe
- Department of Psychiatry, Rheinische Friedrich-Wilhelms University, Sigmund-Freud Street 25, 53105, Bonn, Germany; Division of Medical Psychology (NEMO Neuromodulation of Emotions), Rheinische Friedrich-Wilhelms University, Bonn, Germany; University Hospital Bonn, Rheinische Friedrich-Wilhelms University, Bonn, Germany
- Maria Asif
- Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany; Institute of Biochemistry I, Medical Faculty, University of Cologne, Cologne, Germany; Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Krishnan V Chakravarthy
- Department of Anesthesiology and Pain Medicine, University of California, San Diego, CA, USA; San Diego Health Sciences, VA San Diego Healthcare System, San Diego, CA, USA
- Timothy R Deer
- The Spine and Nerve Center of the Virginias, Charleston, WV, USA
- Rene Hurlemann
- Department of Psychiatry, Rheinische Friedrich-Wilhelms University, Sigmund-Freud Street 25, 53105, Bonn, Germany; Division of Medical Psychology (NEMO Neuromodulation of Emotions), Rheinische Friedrich-Wilhelms University, Bonn, Germany; University Hospital Bonn, Rheinische Friedrich-Wilhelms University, Bonn, Germany
- Muhammad Sajid Hussain
- Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany; Institute of Biochemistry I, Medical Faculty, University of Cologne, Cologne, Germany; Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany
- Susanne Motameny
- Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
- Prerana Wagle
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Peter Nürnberg
- Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany; Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Sascha Gravius
- University Hospital Bonn, Rheinische Friedrich-Wilhelms University, Bonn, Germany; Department of Orthopedics and Trauma Surgery, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Thomas Randau
- University Hospital Bonn, Rheinische Friedrich-Wilhelms University, Bonn, Germany; Department of Orthopedics and Trauma Surgery, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Nadine Gravius
- University Hospital Bonn, Rheinische Friedrich-Wilhelms University, Bonn, Germany; Department of Orthopedics and Trauma Surgery, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Shafqat R Chaudhry
- Dept. of Basic Medical Sciences Shifa College of Pharmaceutical Sciences, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Sajjad Muhammad
- Department of Neurosurgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
9
Liu L, Anderson C, Pearl D, Edwards SV. Modern Phylogenomics: Building Phylogenetic Trees Using the Multispecies Coalescent Model. Methods Mol Biol 2019; 1910:211-239. [PMID: 31278666 DOI: 10.1007/978-1-4939-9074-0_7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 06/09/2023]
Abstract
The multispecies coalescent (MSC) model provides a compelling framework for building phylogenetic trees from multilocus DNA sequence data. The pure MSC is best thought of as a special case of so-called "multispecies network coalescent" models, in which gene flow is allowed among branches of the tree, whereas MSC methods assume there is no gene flow between diverging species. Early implementations of the MSC, such as "parsimony" or "democratic vote" approaches to combining information from multiple gene trees, as well as concatenation, in which DNA sequences from multiple gene trees are combined into a single "supergene," were quickly shown to be inconsistent in some regions of tree space, in so far as they converged on the incorrect species tree as more gene trees and sequence data were accumulated. The anomaly zone, a region of tree space in which the most frequent gene tree is different from the species tree, is one such region where many so-called "coalescent" methods are inconsistent. Second-generation implementations of the MSC employed Bayesian or likelihood models; these are consistent in all regions of gene tree space, but Bayesian methods in particular are incapable of handling the large phylogenomic data sets currently available. Two-step methods, such as MP-EST and ASTRAL, in which gene trees are first estimated and then combined to estimate an overarching species tree, are currently popular in part because they can handle large phylogenomic data sets. These methods are consistent in the anomaly zone but can sometimes provide inappropriate measures of tree support or apportion error and signal in the data inappropriately. MP-EST in particular employs a likelihood model which can be conveniently manipulated to perform statistical tests of competing species trees, incorporating the likelihood of the collected gene trees on each species tree in a likelihood ratio test. Such tests provide a useful alternative to the multilocus bootstrap, which only indirectly tests the appropriateness of competing species trees. We illustrate these tests and implementations of the MSC with examples and suggest that MSC methods are a useful class of models effectively using information from multiple loci to build phylogenetic trees.
Affiliation(s)
- Liang Liu
- Department of Statistics, University of Georgia, Athens, GA, USA
- Dennis Pearl
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
- Scott V Edwards
- Department of Organismic and Evolutionary Biology & Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
10
GATC: a genetic algorithm for gene tree construction under the Duplication-Transfer-Loss model of evolution. BMC Genomics 2018; 19:102. [PMID: 29764363 PMCID: PMC5954287 DOI: 10.1186/s12864-018-4455-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
11
Abstract
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has led to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships. In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
12
Making sense of genomes of parasitic worms: Tackling bioinformatic challenges. Biotechnol Adv 2016; 34:663-686. [DOI: 10.1016/j.biotechadv.2016.03.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/05/2015] [Revised: 02/25/2016] [Accepted: 03/01/2016] [Indexed: 01/25/2023]
13
Noutahi E, Semeria M, Lafond M, Seguin J, Boussau B, Guéguen L, El-Mabrouk N, Tannier E. Efficient Gene Tree Correction Guided by Genome Evolution. PLoS One 2016; 11:e0159559. [PMID: 27513924 PMCID: PMC4981423 DOI: 10.1371/journal.pone.0159559] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 02/09/2016] [Accepted: 07/04/2016] [Indexed: 12/31/2022]
Abstract
MOTIVATIONS: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases.
RESULTS: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny.
AVAILABILITY: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available.
Affiliation(s)
- Emmanuel Noutahi
  - Département d’Informatique (DIRO), Université de Montréal, H3C3J7 Montréal, Canada
- Magali Semeria
  - LBBE, UMR CNRS 5558, Université de Lyon 1, F-69622 Villeurbanne, France
- Manuel Lafond
  - Département d’Informatique (DIRO), Université de Montréal, H3C3J7 Montréal, Canada
- Jonathan Seguin
  - Département d’Informatique (DIRO), Université de Montréal, H3C3J7 Montréal, Canada
- Bastien Boussau
  - LBBE, UMR CNRS 5558, Université de Lyon 1, F-69622 Villeurbanne, France
- Laurent Guéguen
  - LBBE, UMR CNRS 5558, Université de Lyon 1, F-69622 Villeurbanne, France
- Nadia El-Mabrouk
  - Département d’Informatique (DIRO), Université de Montréal, H3C3J7 Montréal, Canada
- Eric Tannier
  - LBBE, UMR CNRS 5558, Université de Lyon 1, F-69622 Villeurbanne, France
  - INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
|
14
|
Standardized benchmarking in the quest for orthologs. Nat Methods 2016; 13:425-30. [PMID: 27043882 PMCID: PMC4827703 DOI: 10.1038/nmeth.3830] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Received: 01/28/2016] [Accepted: 03/09/2016] [Indexed: 11/23/2022]
Abstract
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision–recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
|
15
|
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. Genomics Insights 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties in assessing orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling, together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events, appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
  - Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
|
16
|
Mi H, Poudel S, Muruganujan A, Casagrande JT, Thomas PD. PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res 2015; 44:D336-42. [PMID: 26578592 PMCID: PMC4702852 DOI: 10.1093/nar/gkv1194] [Citation(s) in RCA: 647] [Impact Index Per Article: 71.9] [Received: 09/28/2015] [Accepted: 10/23/2015] [Indexed: 11/16/2022]
Abstract
PANTHER (Protein Analysis THrough Evolutionary Relationships, http://pantherdb.org) is a widely used online resource for comprehensive protein evolutionary and functional classification, and includes tools for large-scale biological data analysis. Recent development has been focused in three main areas: genome coverage, functional information (‘annotation’) coverage and accuracy, and improved genomic data analysis tools. The latest version of PANTHER, 10.0, includes almost 5000 new protein families (for a total of over 12 000 families), each with a reference phylogenetic tree including protein-coding genes from 104 fully sequenced genomes spanning all kingdoms of life. Phylogenetic trees now include inference of horizontal transfer events in addition to speciation and gene duplication events. Functional annotations are regularly updated using the models generated by the Gene Ontology Phylogenetic Annotation Project. For the data analysis tools, PANTHER has expanded the number of different ‘functional annotation sets’ available for functional enrichment testing, allowing analyses to access all Gene Ontology annotations—updated monthly from the Gene Ontology database—in addition to the annotations that have been inferred through evolutionary relationships. The Prowler (data browser) has been updated to enable users to more efficiently browse the entire database, and to create custom gene lists using the multiple axes of classification in PANTHER.
Affiliation(s)
- Huaiyu Mi
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90089, USA
- Sagar Poudel
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90089, USA
- Anushya Muruganujan
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90089, USA
- John T Casagrande
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90089, USA
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90089, USA
|
17
|
Hamed M, Spaniol C, Nazarieh M, Helms V. TFmiR: a web server for constructing and analyzing disease-specific transcription factor and miRNA co-regulatory networks. Nucleic Acids Res 2015; 43:W283-8. [PMID: 25943543 PMCID: PMC4489273 DOI: 10.1093/nar/gkv418] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 02/27/2015] [Accepted: 04/18/2015] [Indexed: 01/27/2023]
Abstract
TFmiR is a freely available web server for deep and integrative analysis of combinatorial regulatory interactions between transcription factors, microRNAs and target genes that are involved in disease pathogenesis. Since the inner workings of cells rely on the correct functioning of an enormously complex system of activating and repressing interactions that can be perturbed in many ways, TFmiR helps to better elucidate cellular mechanisms at the molecular level from a network perspective. The provided topological and functional analyses promote TFmiR as a reliable systems biology tool for researchers across the life science communities. TFmiR web server is accessible through the following URL: http://service.bioinformatik.uni-saarland.de/tfmir.
Affiliation(s)
- Mohamed Hamed
  - Center for Bioinformatics, Saarland University, 66041 Saarbrucken, Germany
- Christian Spaniol
  - Center for Bioinformatics, Saarland University, 66041 Saarbrucken, Germany
- Maryam Nazarieh
  - Center for Bioinformatics, Saarland University, 66041 Saarbrucken, Germany
- Volkhard Helms
  - Center for Bioinformatics, Saarland University, 66041 Saarbrucken, Germany
|
18
|
Lafond M, Chauve C, Dondi R, El-Mabrouk N. Polytomy refinement for the correction of dubious duplications in gene trees. Bioinformatics 2015; 30:i519-26. [PMID: 25161242 PMCID: PMC4147916 DOI: 10.1093/bioinformatics/btu463] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 01/23/2023]
Abstract
Motivation: Large-scale methods for inferring gene trees are error-prone. Correcting gene trees for weakly supported features often results in non-binary trees, i.e. trees with polytomies, thus raising the natural question of refining such polytomies into binary trees. Duplications that are not supported by the presence of multiple gene copies are one feature pointing toward potential errors in gene trees. Results: We introduce the problem of refining polytomies in a gene tree while minimizing the number of created non-apparent duplications in the resulting tree. We show that this problem can be described as a graph-theoretical optimization problem. We provide a bounded heuristic with guaranteed optimality for well-characterized instances. We apply our algorithm to a set of ray-finned fish gene trees from the Ensembl database to illustrate its ability to correct dubious duplications. Availability and implementation: The C++ source code for the algorithms and simulations described in the article is available at http://www-ens.iro.umontreal.ca/~lafonman/software.php. Contact: lafonman@iro.umontreal.ca or mabrouk@iro.umontreal.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manuel Lafond
  - Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Università degli Studi di Bergamo, Bergamo 24129 IT, Italy
- Cedric Chauve
  - Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Università degli Studi di Bergamo, Bergamo 24129 IT, Italy
- Riccardo Dondi
  - Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Università degli Studi di Bergamo, Bergamo 24129 IT, Italy
- Nadia El-Mabrouk
  - Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Università degli Studi di Bergamo, Bergamo 24129 IT, Italy
|
19
|
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Affiliation(s)
- Gergely J Szöllősi
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Eric Tannier
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Vincent Daubin
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Bastien Boussau
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
|
20
|
Lafond M, Semeria M, Swenson KM, Tannier E, El-Mabrouk N. Gene tree correction guided by orthology. BMC Bioinformatics 2013; 14 Suppl 15:S5. [PMID: 24564227 PMCID: PMC3851885 DOI: 10.1186/1471-2105-14-s15-s5] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 02/01/2023]
Abstract
Background: Reconciled gene trees yield orthology and paralogy relationships between genes. This information may however contradict other information on orthology and paralogy provided by other footprints of evolution, such as conserved synteny. Results: We explore a way to include external information on orthology in the process of gene tree construction. Given an initial gene tree and a set of orthology constraints on pairs of genes or on clades, we give polynomial-time algorithms for producing a modified gene tree satisfying the set of constraints, that is as close as possible to the original one according to the Robinson-Foulds distance. We assess the validity of the modifications we propose by computing the likelihood ratio between initial and modified trees according to sequence alignments on Ensembl trees, showing that often the two trees are statistically equivalent. Availability: Software and data available upon request to the corresponding author.
|
21
|
Abstract
The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
|
23
|
Mi H, Muruganujan A, Thomas PD. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res 2012. [PMID: 23193289 PMCID: PMC3531194 DOI: 10.1093/nar/gks1118] [Citation(s) in RCA: 1277] [Impact Index Per Article: 106.4] [Indexed: 02/07/2023]
Abstract
The data and tools in PANTHER, a comprehensive, curated database of protein families, trees, subfamilies and functions available at http://pantherdb.org, have undergone continual, extensive improvement for over a decade. Here, we describe the current PANTHER process as a whole, as well as the website tools for analysis of user-uploaded data. The main goals of PANTHER remain essentially unchanged: the accurate inference (and practical application) of gene and protein function over large sequence databases, using phylogenetic trees to extrapolate from the relatively sparse experimental information from a few model organisms. Yet the focus of PANTHER has continually shifted toward more accurate and detailed representations of evolutionary events in gene family histories. The trees are now designed to represent gene family evolution, including inference of evolutionary events, such as speciation and gene duplication. Subfamilies are still curated and used to define HMMs, but gene ontology functional annotations can now be made at any node in the tree, and are designed to represent gain and loss of function by ancestral genes during evolution. Finally, PANTHER now includes stable database identifiers for inferred ancestral genes, which are used to associate inferred gene attributes with particular genes in the common ancestral genomes of extant species.
Affiliation(s)
- Huaiyu Mi
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
|
24
|
Gaudet P, Livstone MS, Lewis SE, Thomas PD. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Brief Bioinform 2011; 12:449-62. [PMID: 21873635 PMCID: PMC3178059 DOI: 10.1093/bib/bbr042] [Citation(s) in RCA: 573] [Impact Index Per Article: 44.1] [Indexed: 12/13/2022]
Abstract
The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make inferences as accurate as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT), helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other highly used homology-based methods.
Affiliation(s)
- Pascale Gaudet
  - Swiss Institute for Bioinformatics, CMU, 1 Rue Michel Servet, 1211 Geneva 4, Switzerland
|
25
|
Lévesque CA, Brouwer H, Cano L, Hamilton JP, Holt C, Huitema E, Raffaele S, Robideau GP, Thines M, Win J, Zerillo MM, Beakes GW, Boore JL, Busam D, Dumas B, Ferriera S, Fuerstenberg SI, Gachon CMM, Gaulin E, Govers F, Grenville-Briggs L, Horner N, Hostetler J, Jiang RHY, Johnson J, Krajaejun T, Lin H, Meijer HJG, Moore B, Morris P, Phuntmart V, Puiu D, Shetty J, Stajich JE, Tripathy S, Wawra S, van West P, Whitty BR, Coutinho PM, Henrissat B, Martin F, Thomas PD, Tyler BM, De Vries RP, Kamoun S, Yandell M, Tisserat N, Buell CR. Genome sequence of the necrotrophic plant pathogen Pythium ultimum reveals original pathogenicity mechanisms and effector repertoire. Genome Biol 2010; 11:R73. [PMID: 20626842 PMCID: PMC2926784 DOI: 10.1186/gb-2010-11-7-r73] [Citation(s) in RCA: 257] [Impact Index Per Article: 18.4] [Received: 02/17/2010] [Revised: 05/02/2010] [Accepted: 07/13/2010] [Indexed: 01/08/2023]
Abstract
BACKGROUND: Pythium ultimum is a ubiquitous oomycete plant pathogen responsible for a variety of diseases on a broad range of crop and ornamental species. RESULTS: The P. ultimum genome (42.8 Mb) encodes 15,290 genes and has extensive sequence similarity and synteny with related Phytophthora species, including the potato blight pathogen Phytophthora infestans. Whole transcriptome sequencing revealed expression of 86% of genes, with detectable differential expression of suites of genes under abiotic stress and in the presence of a host. The predicted proteome includes a large repertoire of proteins involved in plant pathogen interactions, although, surprisingly, the P. ultimum genome does not encode any classical RXLR effectors and relatively few Crinkler genes in comparison to related phytopathogenic oomycetes. A lower number of enzymes involved in carbohydrate metabolism were present compared to Phytophthora species, with the notable absence of cutinases, suggesting a significant difference in virulence mechanisms between P. ultimum and more host-specific oomycete species. Although we observed a high degree of orthology with Phytophthora genomes, there were novel features of the P. ultimum proteome, including an expansion of genes involved in proteolysis and genes unique to Pythium. We identified a small gene family of cadherins, proteins involved in cell adhesion, the first report of these in a genome outside the metazoans. CONCLUSIONS: Access to the P. ultimum genome has revealed not only core pathogenic mechanisms within the oomycetes but also lineage-specific genes associated with the alternative virulence and lifestyles found within the pythiaceous lineages compared to the Peronosporaceae.
Collapse
Affiliation(s)
- C André Lévesque
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON, K1A 0C6, Canada
- Department of Biology, Carleton University, Ottawa, ON, K1S 5B6, Canada
| | - Henk Brouwer
- CBS-KNAW, Fungal Biodiversity Centre, Uppsalalaan 8, Utrecht, 3584 CT, The Netherlands
| | | | - John P Hamilton
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Carson Holt
- Eccles Institute of Human Genetics, University of Utah, 15 North 2030 East, Room 2100, Salt Lake City, UT 84112-5330, USA
| | | | | | - Gregg P Robideau
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON, K1A 0C6, Canada
- Department of Biology, Carleton University, Ottawa, ON, K1S 5B6, Canada
| | - Marco Thines
- Biodiversity and Climate Research Centre, Georg-Voigt-Str 14-16, D-60325, Frankfurt, Germany
- Department of Biological Sciences, Insitute of Ecology, Evolution and Diversity, Johann Wolfgang Goethe University, Siesmayerstr. 70, D-60323 Frankfurt, Germany
| | - Joe Win
- The Sainsbury Laboratory, Norwich, NR4 7UH, UK
| | - Marcelo M Zerillo
- Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO 80523-1177, USA
| | - Gordon W Beakes
- School of Biology, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
| | - Jeffrey L Boore
- Genome Project Solutions, 1024 Promenade Street, Hercules, CA 94547, USA
| | - Dana Busam
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Bernard Dumas
- Surfaces Cellulaires et Signalisation chez les Végétaux, UMR5546 CNRS-Université de Toulouse, 24 chemin de Borde Rouge, BP42617, Auzeville, Castanet-Tolosan, F-31326, France
| | - Steve Ferriera
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | | | | | - Elodie Gaulin
- Surfaces Cellulaires et Signalisation chez les Végétaux, UMR5546 CNRS-Université de Toulouse, 24 chemin de Borde Rouge, BP42617, Auzeville, Castanet-Tolosan, F-31326, France
| | - Francine Govers
- Laboratory of Phytopathology, Wageningen University, NL-1-6708 PB, Wageningen, The Netherlands
- Centre for BioSystems Genomics (CBSG), PO Box 98, 6700 AB Wageningen, The Netherlands
| | - Laura Grenville-Briggs
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
| | - Neil Horner
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
| | - Jessica Hostetler
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Rays HY Jiang
- The Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Justin Johnson
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Theerapong Krajaejun
- Department of Pathology, Faculty of Medicine-Ramathibodi Hospital, Mahidol University, Rama 6 Road, Bangkok, 10400, Thailand
| | - Haining Lin
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Harold JG Meijer
- Laboratory of Phytopathology, Wageningen University, NL-1-6708 PB, Wageningen, The Netherlands
| | - Barry Moore
- Eccles Institute of Human Genetics, University of Utah, 15 North 2030 East, Room 2100, Salt Lake City, UT 84112-5330, USA
| | - Paul Morris
- Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403, USA
| | - Vipaporn Phuntmart
- Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403, USA
| | - Daniela Puiu
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Jyoti Shetty
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Jason E Stajich
- Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521, USA
| | - Sucheta Tripathy
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Washington Street, Blacksburg, VA 24061-0477, USA
| | - Stephan Wawra
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
| | - Pieter van West
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
| | - Brett R Whitty
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Pedro M Coutinho
- Architecture et Fonction des Macromolecules Biologiques, UMR6098, CNRS, Univ. Aix-Marseille I & II, 163 Avenue de Luminy, 13288 Marseille, France
| | - Bernard Henrissat
- Architecture et Fonction des Macromolecules Biologiques, UMR6098, CNRS, Univ. Aix-Marseille I & II, 163 Avenue de Luminy, 13288 Marseille, France
| | - Frank Martin
- USDA-ARS, 1636 East Alisal St, Salinas, CA 93905, USA
| | - Paul D Thomas
- Evolutionary Systems Biology, SRI International, Room AE207, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
| | - Brett M Tyler
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Washington Street, Blacksburg, VA 24061-0477, USA
| | - Ronald P De Vries
- CBS-KNAW, Fungal Biodiversity Centre, Uppsalalaan 8, Utrecht, 3584 CT, The Netherlands
| | - Mark Yandell
- Eccles Institute of Human Genetics, University of Utah, 15 North 2030 East, Room 2100, Salt Lake City, UT 84112-5330, USA
| | - Ned Tisserat
- Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO 80523-1177, USA
| | - C Robin Buell
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
| |
|