1
Li Y, Chai Q, Chen Y, Ma Y, Wang Y, Zhao J. Genome-wide investigation of the OR gene family in Helicoverpa armigera and functional analysis of OR48 and OR75 in metamorphosis development. Int J Biol Macromol 2024; 278:134646. [PMID: 39128738 DOI: 10.1016/j.ijbiomac.2024.134646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 07/24/2024] [Accepted: 08/08/2024] [Indexed: 08/13/2024]
Abstract
The cotton bollworm, Helicoverpa armigera, is a significant global agricultural pest, particularly detrimental during its larval feeding period. Insects' odorant receptors (ORs) are crucial for their crop-feeding activities, yet a comprehensive analysis of H. armigera ORs has been lacking, and the influence of hormones on ORs remains understudied. Herein, we conducted a genome-wide study and identified 81 ORs, categorized into 15 distinct groups. Analyses of protein motifs and gene structures revealed both conservation within groups and divergence among them. Comparative gene duplication analysis between H. armigera and Bombyx mori highlighted different duplication patterns. We further investigated subcellular localization and protein interactions within the odorant receptor family, providing valuable insights for future functional and interaction studies of ORs. Specifically, we identified that OR48 and OR75 were abundantly expressed during molting/metamorphosis and feeding stages, respectively. We demonstrated that 20E induced the upregulation of OR48 via EcR, while insulin upregulated OR75 expression through InR. Moreover, 20E induced the translocation of OR48 to the cell membrane, mediating its effects. Functional studies involving the knockdown of OR48 and OR75 revealed their roles in metamorphosis development, with OR48 knockdown resulting in delayed pupation and OR75 knockdown leading to premature pupation. OR48 can promote autophagy and apoptosis in the fat body, while OR75 can significantly inhibit apoptosis and autophagy. These findings significantly contribute to our understanding of OR function in H. armigera and shed light on potential avenues for pest control strategies.
Affiliation(s)
- Yanli Li
  - Institute of Industrial Crops, Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
- Qichao Chai
  - Institute of Industrial Crops, Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
- Ying Chen
  - Crop Research Institute, Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
- Yujia Ma
  - College of Life Sciences, Shandong Normal University, Jinan 250300, Shandong, China
- Yongcui Wang
  - Institute of Industrial Crops, Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
- Junsheng Zhao
  - Institute of Industrial Crops, Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
2
Raghuraman P, Ramireddy S, Raman G, Park S, Sudandiradoss C. Understanding a point mutation signature D54K in the caspase activation recruitment domain of NOD1 capitulating concerted immunity via atomistic simulation. J Biomol Struct Dyn 2024:1-17. [PMID: 38415678 DOI: 10.1080/07391102.2024.2322618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 12/11/2023] [Indexed: 02/29/2024]
Abstract
Point mutation D54K in the human N-terminal caspase recruitment domain (CARD) of nucleotide-binding oligomerization domain-1 (NOD1) abrogates an imperative downstream interaction with receptor-interacting protein kinase (RIPK2) that entails combating bacterial infections and inflammatory dysfunction. Here, we addressed the molecular details concerning conformational changes and interaction patterns (monomeric-dimeric states) of D54K by signature-based molecular dynamics simulation. Initially, the sequence analysis prioritized D54K as a pathogenic mutation, among other variants, based on a sequence signature. Since the mutation is highly conserved, we derived the distant ortholog to predict the sequence and structural similarity between native and mutant. This analysis showed the utility of 33 communal core residues associated with structural-functional preservation and variations, and concurrently served to infer the cryptic hotspots Cys39, Glu53, Asp54, Glu56, Ile57, Leu74, and Lys78 determining the inter-helical fold forming homodimers for putative receptor interaction. Subsequently, the atomistic simulations with free energy (MM/PB(GB)SA) calculations predicted a structural alteration in the N-terminal mutant CARD, where coils changed to helices (residues 45-83: α3-L4-α4-L6-α6) in contrast to native (residues 45-83: T2-L4-α4-L6-T4). Likewise, the C-terminal helices (residues 93-105: T1-α7) connected to the loops were distorted compared to native (residues 93-105: α6-L7), which may result in conformational misfolding that promotes functional regulation and activation. These structural perturbations of D54K possibly destabilize the flexible adaptation of critical homotypic NOD1 CARD-RIPK2 CARD interactions (Asp42 of α4 with Arg488 of α5, and Phe86 of α6 with Lys471 of α4), which is consistent with earlier experimental reports.
Altogether, our findings unveil the conformational plasticity of mutation-dependent immunomodulatory response and may aid in functional validation exploring clinical investigation on CARD-regulated immunotherapies to prevent systemic infection and inflammation. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- P Raghuraman
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
- Sriroopreddy Ramireddy
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
  - Department of Genetics and Molecular Biology, School of Health Sciences, The Apollo University, Chittoor, India
- Gurusamy Raman
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
- SeonJoo Park
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
- C Sudandiradoss
  - Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
3
Pinto-Pinho P, Soares J, Esteves P, Pinto-Leite R, Fardilha M, Colaço B. Comparative Bioinformatic Analysis of the Proteomes of Rabbit and Human Sex Chromosomes. Animals (Basel) 2024; 14:217. [PMID: 38254386 PMCID: PMC10812427 DOI: 10.3390/ani14020217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 12/18/2023] [Accepted: 12/28/2023] [Indexed: 01/24/2024]
Abstract
Studying proteins associated with sex chromosomes can provide insights into sex-specific proteins. Membrane proteins accessible through the cell surface may serve as excellent targets for diagnostic, therapeutic, or even technological purposes, such as sperm sexing technologies. In this context, proteins encoded by sex chromosomes have the potential to become targets for X- or Y-chromosome-bearing spermatozoa. Due to the limited availability of proteomic studies on rabbit spermatozoa and poorly annotated databases for rabbits compared to humans, a bioinformatic analysis of the available rabbit X chromosome proteome (RX), as well as the human X (HX) and Y (HY) chromosome proteomes, was conducted to identify potential targets that could be accessible from the cell surface and to predict which of the potential targets identified in humans might also exist in rabbits. We identified 100, 211, and 3 proteins associated with the plasma membrane or cell surface for RX, HX, and HY, respectively, of which 61, 132, and 3 proteins exhibit potential as targets, as they were predicted to be accessible from the cell surface. Cross-referencing the potential HX targets with the rabbit proteome revealed an additional 60 proteins with the potential to be RX targets, resulting in a total of 121 potential RX targets. In addition, at least 53 possible common HX and RX targets have been previously identified in human spermatozoa, emphasizing their potential as targets of X-chromosome-bearing spermatozoa. Further proteomic studies on rabbit sperm will be essential to identify and validate the usefulness of these proteins for application in rabbit sperm sorting techniques as targets of X-chromosome-bearing spermatozoa.
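The RX target total reported above follows from simple set arithmetic: 61 targets predicted directly from the rabbit X proteome plus 60 additional proteins found by cross-referencing human X targets give 121. The sketch below only illustrates that bookkeeping; the function name `combine_targets` and all protein identifiers are hypothetical placeholders, not data or code from the study.

```python
def combine_targets(direct_targets, cross_referenced_targets):
    """Merge two candidate sets and report which cross-referenced entries are new.

    Returns (all_targets, additional), where `additional` holds only the
    cross-referenced candidates not already present among the direct ones.
    """
    direct = set(direct_targets)
    additional = set(cross_referenced_targets) - direct
    return direct | additional, additional


# Placeholder identifiers (hypothetical): 61 direct rabbit X candidates,
# and human-derived candidates that include one overlap with the direct set.
rx_direct = {f"RX_direct_{i}" for i in range(61)}
hx_mapped = {f"HX_mapped_{i}" for i in range(60)} | {"RX_direct_0"}

all_targets, additional = combine_targets(rx_direct, hx_mapped)
print(len(additional), len(all_targets))
```

Subtracting the direct set before taking the union keeps overlapping proteins from being counted twice, which is why the "additional" figure can be reported separately from the total.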
Affiliation(s)
- Patrícia Pinto-Pinho
  - Centre for the Research and Technology of Agro-Environmental and Biological Sciences, University of Trás-os-Montes and Alto Douro, 5000-801 Vila Real, Portugal
  - Laboratory of Signal Transduction, Institute of Biomedicine, Department of Medical Sciences, University of Aveiro, 3810-193 Aveiro, Portugal
  - Laboratory of Genetics and Andrology, Hospital Center of Trás-os-Montes and Alto Douro, E.P.E., 5000-508 Vila Real, Portugal
  - Experimental Pathology and Therapeutics Group, IPO Porto Research Center, Portuguese Institute of Oncology of Porto Francisco Gentil, E.P.E., 4200-072 Porto, Portugal
- João Soares
  - Department of Computer Science, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
  - Center for Research in Advanced Computing Systems, Institute for Systems and Computer Engineering, Technology and Science (CRACS—INESC TEC), 4150-179 Porto, Portugal
- Pedro Esteves
  - Department of Computer Science, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
  - CIBIO—Research Centre in Biodiversity and Genetic Resources, InBIO Associate Laboratory, 4485-661 Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, Research Centre in Biodiversity and Genetic Resources, 4485-661 Vairão, Portugal
- Rosário Pinto-Leite
  - Laboratory of Genetics and Andrology, Hospital Center of Trás-os-Montes and Alto Douro, E.P.E., 5000-508 Vila Real, Portugal
  - Experimental Pathology and Therapeutics Group, IPO Porto Research Center, Portuguese Institute of Oncology of Porto Francisco Gentil, E.P.E., 4200-072 Porto, Portugal
- Margarida Fardilha
  - Laboratory of Signal Transduction, Institute of Biomedicine, Department of Medical Sciences, University of Aveiro, 3810-193 Aveiro, Portugal
- Bruno Colaço
  - Centre for the Research and Technology of Agro-Environmental and Biological Sciences, University of Trás-os-Montes and Alto Douro, 5000-801 Vila Real, Portugal
  - Animal and Veterinary Research Centre, University of Trás-os-Montes and Alto Douro, 5001-801 Vila Real, Portugal
4
Parakkunnel R, Bhojaraja Naik K, Susmita C, Girimalla V, Bhaskar KU, Sripathy KV, Shantharaja CS, Aravindan S, Kumar S, Lakhanpaul S, Bhat KV. Evolution and co-evolution: insights into the divergence of plant heat shock factor genes. Physiology and Molecular Biology of Plants: An International Journal of Functional Plant Biology 2022; 28:1029-1047. [PMID: 35722513 PMCID: PMC9203654 DOI: 10.1007/s12298-022-01183-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 04/27/2022] [Accepted: 05/04/2022] [Indexed: 05/03/2023]
Abstract
The Heat Shock Factor (Hsf) genes are widely distributed across the plant kingdom and regulate the plant response to various abiotic stresses. In addition to natural selection, breeding and accelerated selection changed the structure and function of Hsf genes. A total of 1076 Hsf genes from 30 genera, from primitive algae to the most advanced plant species and major crop plants, were used for phylogenetic analysis. The interspecific divergence was studied with 11 members of genus Oryza, while intraspecific divergence was studied with the sesame pan-genome adapted to diverse ecological niches. B2 genes in eudicots and monocots originated separately, while A1 gave rise to the recently evolved Class-C genes, and land colonization happened with the evolution of A1 genes. An increase in the number of lineages in the Oryza clade with the evolution of the AA genome indicated independent domestication, and positive selection was observed in > 53% of loci, whereas the highly conserved homologues were under purifying selection. The paralogous genes under positive selection exhibited more domain changes for diversified function and increased fitness. A significant co-evolving cluster involving the amino acids Phenylalanine, Lysine and Valine played a crucial role in maintaining the hydrophobic core along with highly conserved Tryptophan residues. A mutation of Glutamic acid to Glutamine was observed in A8 genes of Lamiales affecting protein solvency. Breeding resulted in accumulation of mutations reducing the hydrophobicity of proteins and a further reduction in protein aggregation. This study identifies genome duplications, non-neutral selection and co-evolving residues as causing drastic changes in the conserved domain of Hsf proteins. Supplementary information: The online version contains supplementary material available at 10.1007/s12298-022-01183-7.
Affiliation(s)
- Ramya Parakkunnel
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- K Bhojaraja Naik
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- C Susmita
  - ICAR- Indian Institute of Seed Science, Mau, Uttar Pradesh 275103 India
- Vanishree Girimalla
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- K Udaya Bhaskar
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- KV Sripathy
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- CS Shantharaja
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- S Aravindan
  - Division of Genomic Resources, ICAR- National Bureau of Plant Genetic Resources, New Delhi, 110012 India
- Sanjay Kumar
  - ICAR- Indian Institute of Seed Science, Mau, Uttar Pradesh 275103 India
- KV Bhat
  - Division of Genomic Resources, ICAR- National Bureau of Plant Genetic Resources, New Delhi, 110012 India
5
Du H, Ong YS, Knittel M, Mawhorter R, Liu N, Gross G, Tojo R, Libeskind-Hadas R, Wu YC. Multiple Optimal Reconciliations Under the Duplication-Loss-Coalescence Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2144-2156. [PMID: 31199267 DOI: 10.1109/tcbb.2019.2922337] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Gene trees can differ from species trees due to a variety of biological phenomena, the most prevalent being gene duplication, horizontal gene transfer, gene loss, and coalescence. To explain topological incongruence between the two trees, researchers apply reconciliation methods, often relying on a maximum parsimony framework. However, while several studies have investigated the space of maximum parsimony reconciliations (MPRs) under the duplication-loss and duplication-transfer-loss models, the space of MPRs under the duplication-loss-coalescence (DLC) model remains poorly understood. To address this problem, we present new algorithms for computing the size of MPR space under the DLC model and sampling from this space uniformly at random. Our algorithms are efficient in practice, with runtime polynomial in the size of the species and gene tree when the number of genes that map to any given species is fixed, thus proving that the MPR problem is fixed-parameter tractable. We have applied our methods to a biological data set of 16 fungal species to provide the first key insights into the space of MPRs under the DLC model. Our results show that a plurality reconciliation, and underlying events, are likely to be representative of MPR space.
6
Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes. Int J Mol Sci 2021; 22:ijms221810019. [PMID: 34576183 PMCID: PMC8468833 DOI: 10.3390/ijms221810019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/27/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 12/15/2022]
Abstract
Functional annotation of genes of unknown function reveals unidentified functions that can enhance our understanding of complex genome communications. A common approach for inferring gene function involves the ortholog-based method. However, genetic data alone are often not enough to provide information for function annotation. Thus, integrating other sources of data can potentially increase the possibility of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network between human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of genes of unknown function. The results show high performance of the model's predictions, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for inferring the functions using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved with muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network and a hybrid deep learning technique can allow us to identify unknown gene functions of malaria parasites. This approach is general and can be applied to other diseases, which may benefit the field of biomedical science.
7
Gangele K, Gulati K, Joshi N, Kumar D, Poluri KM. Molecular insights into the differential structure-dynamics-stability features of interleukin-8 orthologs: Implications to functional specificity. Int J Biol Macromol 2020; 164:3221-3234. [PMID: 32853623 DOI: 10.1016/j.ijbiomac.2020.08.176] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/16/2020] [Revised: 08/20/2020] [Accepted: 08/21/2020] [Indexed: 11/17/2022]
Abstract
Chemokines are a sub-group of chemotactic cytokines that regulate leukocyte migration by binding to G-protein coupled receptors (GPCRs) and cell surface glycosaminoglycans (GAGs). Interleukin-8 (CXCL8/IL8) is one of the most essential CXC chemokines and has been reported to be involved in various pathophysiological conditions. Structure-function relationships of human IL8 have been studied extensively. However, no such detailed information is available on IL8 orthologs, although they exhibit significant functional divergence. In order to unravel the differential structure-dynamics-stability-function relationship of IL8 orthologs, comparative molecular analysis was performed on canine (laurasians) and human (primates) IL8 proteins using in-silico molecular evolutionary analysis and solution NMR spectroscopy methods. The residue-level NMR studies suggested that, although the overall structural architecture of canine IL8 is similar to that of human IL8, systematic differences were observed in their backbone dynamics and low-energy excited states due to amino acid substitutions. Further, these substitutions also resulted in attenuation of stability and heparin binding affinity in canine IL8 as compared to its human counterpart. Indeed, structural and sequence analyses revealed specificity of molecular interactions with the cognate receptor (CXCR1) and glycosaminoglycan (heparin), thus providing evidence for a noticeable functional specificity and divergence between the two IL8 orthologs.
Affiliation(s)
- Krishnakant Gangele
  - Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
- Khushboo Gulati
  - Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
- Nidhi Joshi
  - Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
- Dinesh Kumar
  - Centre of Biomedical Research, SGPGIMS Campus, Lucknow 226014, Uttar Pradesh, India
- Krishna Mohan Poluri
  - Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
  - Centre for Nanotechnology, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
8
Sang ER, Tian Y, Gong Y, Miller LC, Sang Y. Integrate structural analysis, isoform diversity, and interferon-inductive propensity of ACE2 to predict SARS-CoV2 susceptibility in vertebrates. Heliyon 2020; 6:e04818. [PMID: 32904785 PMCID: PMC7458074 DOI: 10.1016/j.heliyon.2020.e04818] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/14/2020] [Revised: 08/18/2020] [Accepted: 08/26/2020] [Indexed: 12/22/2022]
Abstract
The current new coronavirus disease (COVID-19) has caused globally over 0.4 million confirmed deaths and 6 million infected cases across more than 200 countries. As the etiological coronavirus (a.k.a. SARS-CoV2) may putatively have a bat origin, its intermediate reservoir between bats and humans, and especially its tropism in wild and domestic animals, remain mostly unknown. This constitutes a major public health concern for the current pandemic and potential zoonosis. Previous reports using structural analysis of the viral spike protein (S) binding its cell receptor, angiotensin-converting enzyme 2 (ACE2), indicate a broad potential of SARS-CoV2 susceptibility in wild and particularly domestic animals. Through integration of key immunogenetic factors, including the existence of S-binding-void ACE2 isoforms and the disparity of ACE2 expression upon early innate immune response, we further refine the SARS-CoV2 susceptibility prediction to fit recent experimental validation. In addition to showing a broad susceptibility potential across mammalian species based on structural analysis, our results also reveal that domestic animals including dogs, pigs, cattle and goats may evolve ACE2-related immunogenetic diversity to restrict SARS-CoV2 infections. Thus, we propose that domestic animals may be unlikely to play a role as amplifying hosts unless the virus has further species-specific adaptation. These findings may relieve relevant public concerns regarding COVID-19-like risk in domestic animals, highlight virus-host coevolution, and evoke disease intervention through targeting ACE2 molecular diversity and interferon optimization.
Affiliation(s)
- Eric R. Sang
  - Department of Agricultural and Environmental Sciences, College of Agriculture, Tennessee State University, 3500 John A. Merritt Boulevard, Nashville, TN 37209, USA
- Yun Tian
  - Department of Agricultural and Environmental Sciences, College of Agriculture, Tennessee State University, 3500 John A. Merritt Boulevard, Nashville, TN 37209, USA
- Yuanying Gong
  - Department of Agricultural and Environmental Sciences, College of Agriculture, Tennessee State University, 3500 John A. Merritt Boulevard, Nashville, TN 37209, USA
- Laura C. Miller
  - Virus and Prion Diseases of Livestock Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, USA
- Yongming Sang
  - Department of Agricultural and Environmental Sciences, College of Agriculture, Tennessee State University, 3500 John A. Merritt Boulevard, Nashville, TN 37209, USA
9
Stamboulian M, Guerrero RF, Hahn MW, Radivojac P. The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. Bioinformatics 2020; 36:i219-i226. [PMID: 32657391 PMCID: PMC7355290 DOI: 10.1093/bioinformatics/btaa468] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]
Abstract
MOTIVATION: The computational prediction of gene function is a key step in making full use of newly sequenced genomes. Function is generally predicted by transferring annotations from homologous genes or proteins for which experimental evidence exists. The 'ortholog conjecture' proposes that orthologous genes should be preferred when making such predictions, as they evolve functions more slowly than paralogous genes. Previous research has provided little support for the ortholog conjecture, though the incomplete nature of the data casts doubt on the conclusions. RESULTS: We use experimental annotations from over 40 000 proteins, drawn from over 80 000 publications, to revisit the ortholog conjecture in two pairs of species: (i) Homo sapiens and Mus musculus and (ii) Saccharomyces cerevisiae and Schizosaccharomyces pombe. By making a distinction between questions about the evolution of function versus questions about the prediction of function, we find strong evidence against the ortholog conjecture in the context of function prediction, though questions about the evolution of function remain difficult to address. In both pairs of species, we quantify the amount of information that would be ignored if paralogs are discarded, as well as the resulting loss in prediction accuracy. Taken as a whole, our results support the view that the types of homologs used for function transfer are largely irrelevant to the task of function prediction. Maximizing the amount of data used for this task, regardless of whether it comes from orthologs or paralogs, is most likely to lead to higher prediction accuracy. AVAILABILITY AND IMPLEMENTATION: https://github.com/predragradivojac/oc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Moses Stamboulian
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Rafael F Guerrero
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Matthew W Hahn
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
10
Lal D, May P, Perez-Palma E, Samocha KE, Kosmicki JA, Robinson EB, Møller RS, Krause R, Nürnberg P, Weckhuysen S, De Jonghe P, Guerrini R, Niestroj LM, Du J, Marini C, Ware JS, Kurki M, Gormley P, Tang S, Wu S, Biskup S, Poduri A, Neubauer BA, Koeleman BPC, Helbig KL, Weber YG, Helbig I, Majithia AR, Palotie A, Daly MJ. Gene family information facilitates variant interpretation and identification of disease-associated genes in neurodevelopmental disorders. Genome Med 2020; 12:28. [PMID: 32183904 PMCID: PMC7079346 DOI: 10.1186/s13073-020-00725-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/15/2019] [Accepted: 02/21/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Classifying pathogenicity of missense variants represents a major challenge in clinical practice during the diagnosis of rare and genetically heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog-conserved or non-conserved sites in human gene families are important in NDDs. METHODS: Gene family information was collected from Ensembl. Paralog-conserved sites were defined based on paralog sequence alignments; 10,068 NDD patients and 2078 controls were statistically evaluated for de novo variant burden in gene families. RESULTS: We demonstrate that disease-associated missense variants are enriched at paralog-conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families including 98 de novo variant carrying genes in NDD patients, of which 28 represent novel candidate genes for NDD that are brain expressed and under evolutionary constraint. CONCLUSION: This study represents the first method to incorporate gene family information into a statistical framework to interpret variant data for NDDs and to discover new NDD-associated genes.
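To make the notion of a "paralog-conserved site" concrete, here is a minimal sketch that marks alignment columns where every paralog carries the same residue. The toy alignment, the function name `paralog_conserved_sites`, and the strict-identity criterion are assumptions for illustration only; the abstract does not state the conservation threshold or scoring the authors used.

```python
def paralog_conserved_sites(alignment, gap="-"):
    """Return 0-based column indices where all sequences share one non-gap residue.

    `alignment` is a list of equal-length aligned protein sequences
    (one per paralog). Strict identity is an illustrative assumption.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        # Conserved only if a single residue type is present and it is not a gap.
        if len(residues) == 1 and gap not in residues:
            conserved.append(col)
    return conserved


# Hypothetical three-paralog alignment, not taken from the study.
toy_alignment = [
    "MKT-LLV",
    "MKTALIV",
    "MRTALLV",
]
sites = paralog_conserved_sites(toy_alignment)
print(sites)
```

A missense variant could then be labelled as falling on a paralog-conserved or non-conserved site by checking whether its alignment column appears in the returned list.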
Collapse
Affiliation(s)
- Dennis Lal
- Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA.
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA.
- Cologne Center for Genomics, University of Cologne, Cologne, Germany.
- Genomic Medicine Institute, Lerner Research Institute Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA.
| | - Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6, Avenue du Swing, 4367, Belvaux, Luxembourg.
| | - Eduardo Perez-Palma
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Genomic Medicine Institute, Lerner Research Institute Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
| | - Kaitlin E Samocha
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Jack A Kosmicki
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA
| | - Elise B Robinson
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Rikke S Møller
- The Danish Epilepsy Centre, Dianalund, Denmark
- Institute for Regional Health research, University of Southern Denmark, Odense, Denmark
| | - Roland Krause
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6, Avenue du Swing, 4367, Belvaux, Luxembourg
- Peter Nürnberg
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany
- Sarah Weckhuysen
- Division of Neurology, Antwerp University Hospital, Antwerp, Belgium
- Neurogenetics Group, Center for Molecular Neurology, VIB, Antwerp, Belgium
- Laboratory of Neurogenetics, Institute Born-Bunge, University of Antwerp, Antwerp, Belgium
- Peter De Jonghe
- Division of Neurology, Antwerp University Hospital, Antwerp, Belgium
- Renzo Guerrini
- Pediatric Neurology and Neuroscience Department, Children's Hospital Anna Meyer, University of Florence, Florence, Italy
- Lisa M Niestroj
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Juliana Du
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Carla Marini
- Pediatric Neurology and Neuroscience Department, Children's Hospital Anna Meyer, University of Florence, Florence, Italy
- James S Ware
- National Heart & Lung Institute and MRC London Institute of Medical Science, Imperial College London, London, UK
- Mitja Kurki
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA
- Padhraig Gormley
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA
- Sha Tang
- Division of Clinical Genomics, Ambry Genetics, Aliso Viejo, CA, USA
- Sitao Wu
- Division of Clinical Genomics, Ambry Genetics, Aliso Viejo, CA, USA
- Saskia Biskup
- CeGat and Practice for Human Genetics, Tübingen, Germany
- Annapurna Poduri
- Epilepsy Genetics Program, Boston Children's Hospital, Boston, MA, USA
- Bernd A Neubauer
- Department of Neuropediatrics UKGM, University of Giessen, Giessen, Germany
- Bobby P C Koeleman
- Department of Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
- Katherine L Helbig
- Division of Clinical Genomics, Ambry Genetics, Aliso Viejo, CA, USA
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Yvonne G Weber
- Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Department of Epileptology and Neurology, University of Aachen, Aachen, Germany
- Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Amit R Majithia
- Division of Endocrinology, Department of Medicine, University of California, San Diego, CA, USA
- Aarno Palotie
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Mark J Daly
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA.
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
|
11
|
Malinverni D, Barducci A. Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting. Entropy (Basel, Switzerland) 2020; 21:1127. [PMID: 32002010 PMCID: PMC6992422 DOI: 10.3390/e21111127] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/17/2019] [Accepted: 11/14/2019] [Indexed: 01/07/2023]
Abstract
Extracting structural information from sequence co-variation has become a common computational biology practice in recent years, mainly due to the availability of large sequence alignments of protein families. However, identifying features that are specific to sub-classes and not shared by all members of the family using sequence-based approaches has remained an elusive problem. Here we present a coevolution-based method to differentially analyze subfamily-specific structural features by a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test its predictive capabilities on the Response Regulator family, whose subfamilies have been previously shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting allows assessing whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
Affiliation(s)
- Duccio Malinverni
- Medical Research Council (MRC) Laboratory of Molecular Biology, Cambridge CB20QH, UK
- Alessandro Barducci
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
|
12
|
Fan J, Cannistra A, Fried I, Lim T, Schaffner T, Crovella M, Hescott B, Leiserson MDM. Functional protein representations from biological networks enable diverse cross-species inference. Nucleic Acids Res 2019; 47:e51. [PMID: 30847485 PMCID: PMC6511848 DOI: 10.1093/nar/gkz132] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/27/2018] [Revised: 01/09/2019] [Accepted: 02/18/2019] [Indexed: 12/31/2022] Open
Abstract
Transferring knowledge between species is key for many biological applications, but is complicated by divergent and convergent evolution. Many current approaches to this problem leverage sequence and interaction network data to transfer knowledge across species, exemplified by network alignment methods. While these techniques do well, they are limited in scope, creating metrics to address one specific problem or task. We take a different approach by creating an environment where multiple knowledge transfer tasks can be performed using the same protein representations. Specifically, our kernel-based method, MUNK, integrates sequence and network structure to create functional protein representations, embedding proteins from different species in the same vector space. First, we show that proteins in different species that are close in MUNK-space are functionally similar. Next, we use these representations to share knowledge of synthetic lethal interactions between species. Importantly, we find that the results using MUNK-representations are at least as accurate as existing algorithms for these tasks. Finally, we generalize the notion of a phenolog ('orthologous phenotype') to use functionally similar proteins (i.e. those with similar representations). We demonstrate the utility of this broadened notion by using it to identify known phenologs and novel non-obvious ones supported by current research.
Affiliation(s)
- Jason Fan
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
- Inbar Fried
- University of North Carolina Medical School, USA
- Tim Lim
- Department of Computer Science, Boston University, USA
- Mark Crovella
- Department of Computer Science, Boston University, USA
- Benjamin Hescott
- College of Computer and Information Science, Northeastern University, USA
- Mark D M Leiserson
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
|
13
|
Pascual-García A, Arenas M, Bastolla U. The Molecular Clock in the Evolution of Protein Structures. Syst Biol 2019; 68:987-1002. [DOI: 10.1093/sysbio/syz022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/08/2018] [Revised: 03/20/2019] [Accepted: 04/09/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular clock hypothesis, which states that substitutions accumulate in protein sequences at a constant rate, plays a fundamental role in molecular evolution, but it is violated when selective or mutational processes vary with time. Such violations of the molecular clock have been widely investigated for protein sequences, but not yet for protein structures. Here, we introduce a novel statistical test (Significant Clock Violations) and perform a large-scale assessment of the molecular clock in the evolution of both protein sequences and structures in three large superfamilies. After validating our method with computer simulations, we find that clock violations are generally consistent in sequence and structure evolution, but they tend to be larger and more significant in structure evolution. Moreover, changes of function assessed through Gene Ontology and InterPro terms are associated with large and significant clock violations in structure evolution. We found that almost one third of significant clock violations are significant in structure evolution but not in sequence evolution, highlighting the advantage of using structure information for assessing accelerated evolution and gathering hints of positive selection. Clock violations between closely related pairs are frequently significant in sequence evolution, consistent with the observed time dependence of the substitution rate attributed to segregation of neutral and slightly deleterious polymorphisms, but not in structure evolution, suggesting that these substitutions do not affect protein structure although they may affect stability. These results are consistent with the view that natural selection, both negative and positive, constrains protein structures more strongly than protein sequences. Our code for computing clock violations is freely available at https://github.com/ugobas/Molecular_clock.
Affiliation(s)
- Alberto Pascual-García
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, UK
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Miguel Arenas
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain
- Ugo Bastolla
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
|
14
|
Xiao J, Hu R, Gu T, Han J, Qiu D, Su P, Feng J, Chang J, Yang G, He G. Genome-wide identification and expression profiling of trihelix gene family under abiotic stresses in wheat. BMC Genomics 2019; 20:287. [PMID: 30975075 PMCID: PMC6460849 DOI: 10.1186/s12864-019-5632-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 12/21/2018] [Accepted: 03/21/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND The trihelix gene family is a plant-specific transcription factor family that plays important roles in plant growth, development, and responses to abiotic stresses. However, no systematic characterization of the trihelix genes has yet been conducted in wheat and its close relatives. RESULTS We identified a total of 94 trihelix genes in wheat, as well as 22 trihelix genes in Triticum urartu, 29 in Aegilops tauschii, and 31 in Brachypodium distachyon. We analyzed the chromosomal locations and orthology relations of the identified trihelix genes, and no trihelix gene was found to be located on chromosome 7A, 7B, or 7D of wheat, reflecting the uneven distribution of wheat trihelix genes. Phylogenetic analysis indicated that the 186 identified trihelix proteins in wheat, rice, B. distachyon, and Arabidopsis clustered into five major clades. The trihelix genes belonging to the same clade usually shared similar motif compositions and exon/intron structural patterns. Five pairs of tandem duplication genes and three pairs of segmental duplication genes were identified in the wheat trihelix gene family, supporting the supposition that more intrachromosomal gene duplication events occur in the genome of wheat than in that of other grass species. The tissue-specific expression and differential expression profiles of the identified genes under cold and drought stresses were analyzed using RNA-seq data. qRT-PCR was also used to confirm the expression profiles of ten selected wheat trihelix genes under multiple abiotic stresses, and we found that these genes mainly responded to salt and cold stresses. CONCLUSIONS In this study, we identified trihelix genes in wheat and its close relatives and found that gene duplication events are the main driving force for trihelix gene evolution in wheat. Our expression profiling analysis demonstrated that wheat trihelix genes responded to multiple abiotic stresses, especially salt and cold stresses. The results of our study provide a basis for further investigation of the functions of wheat trihelix genes and candidate genes for stress-resistant wheat breeding programs.
Affiliation(s)
- Jie Xiao
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Rui Hu
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Ting Gu
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Jiapeng Han
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Ding Qiu
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Peipei Su
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Jialu Feng
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Junli Chang
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Guangxiao Yang
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
- Guangyuan He
- The Genetic Engineering International Cooperation Base of Chinese Ministry of Science and Technology, Key Laboratory of Molecular Biophysics of Chinese Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, 430074 China
|
15
|
Dabral D, Coorssen JR. Combined targeted Omic and Functional Assays Identify Phospholipases A₂ that Regulate Docking/Priming in Calcium-Triggered Exocytosis. Cells 2019; 8:303. [PMID: 30986994 PMCID: PMC6523306 DOI: 10.3390/cells8040303] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/24/2019] [Revised: 03/24/2019] [Accepted: 03/28/2019] [Indexed: 12/12/2022] Open
Abstract
The fundamental molecular mechanism underlying the membrane merger steps of regulated exocytosis is highly conserved across cell types. Although involvement of Phospholipase A₂ (PLA₂) in regulated exocytosis has long been suggested, its function, or that of its metabolites (a lyso-phospholipid and a free fatty acid), remains somewhat speculative. Here, using a combined bioinformatics and top-down discovery proteomics approach, coupled with lipidomic analyses, PLA₂ were found to be associated with release-ready cortical secretory vesicles (CV) that possess the minimal molecular machinery for docking, Ca²⁺ sensing and membrane fusion. Tightly coupling the molecular analyses with well-established quantitative fusion assays, we show for the first time that inhibition of a CV surface calcium-independent intracellular PLA₂ and a luminal secretory PLA₂ significantly reduces docking/priming in the late steps of regulated exocytosis, indicating key regulatory roles in the critical step(s) preceding membrane merger.
Affiliation(s)
- Deepti Dabral
- Molecular Physiology and Molecular Medicine Research Group, School of Medicine, Western Sydney University, Campbelltown Campus, NSW 2560, Australia.
- Jens R Coorssen
- Department of Health Sciences, Faculty of Applied Health Sciences and Department of Biological Sciences, Faculty of Mathematics & Science, Brock University, St. Catharines, ON L2S 3A1, Canada.
|
16
|
Yang X, Wang J, Bing G, Bie P, De Y, Lyu Y, Wu Q. Ortholog-based screening and identification of genes related to intracellular survival. Gene 2018; 651:134-142. [DOI: 10.1016/j.gene.2018.01.059] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/10/2017] [Revised: 10/29/2017] [Accepted: 01/17/2018] [Indexed: 12/29/2022]
|
17
|
Jahangiri-Tazehkand S, Wong L, Eslahchi C. OrthoGNC: A Software for Accurate Identification of Orthologs Based on Gene Neighborhood Conservation. Genomics Proteomics & Bioinformatics 2017; 15:361-370. [PMID: 29133277 PMCID: PMC5828658 DOI: 10.1016/j.gpb.2017.07.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/22/2017] [Revised: 07/17/2017] [Accepted: 07/28/2017] [Indexed: 11/17/2022]
Abstract
Orthology relations can be used to transfer annotations from one gene (or protein) to another. Hence, detecting orthology relations has become an important task in the post-genomic era. Various genomic events, such as duplication and horizontal gene transfer, can cause erroneous assignment of orthology relations. In closely related species, gene neighborhood information can be used to resolve many ambiguities in orthology inference. Here we present OrthoGNC, a software tool for accurately predicting pairwise orthology relations based on gene neighborhood conservation. Analyses on simulated and real data reveal the high accuracy of OrthoGNC. In addition to orthology detection, OrthoGNC can be employed to investigate the conservation of genomic context among potential orthologs detected by other methods. OrthoGNC is freely available online at http://bs.ipm.ir/softwares/orthognc and http://tinyurl.com/orthoGNC.
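To make the idea of gene neighborhood conservation concrete, the following sketch scores a candidate gene pair by the fraction of one gene's neighbors that have a homolog among the other gene's neighbors. This is not OrthoGNC's algorithm, which the abstract does not describe; the function `neighborhood_conservation`, the `window` parameter, and the toy genomes are all hypothetical and chosen only for illustration.

```python
def neighborhood_conservation(genome_a, genome_b, homologs, gene_a, gene_b, window=2):
    """Fraction of gene_a's neighbors with a homolog among gene_b's neighbors.

    genome_a, genome_b: lists of gene IDs in chromosomal order.
    homologs: dict mapping a gene in genome_a to a set of homologous genes in genome_b.
    window: number of flanking genes considered on each side (an assumed parameter).
    """
    idx_a = genome_a.index(gene_a)
    idx_b = genome_b.index(gene_b)
    # Neighbors are genes within `window` positions, excluding the gene itself.
    neigh_a = [g for i, g in enumerate(genome_a) if i != idx_a and abs(i - idx_a) <= window]
    neigh_b = {g for i, g in enumerate(genome_b) if i != idx_b and abs(i - idx_b) <= window}
    if not neigh_a:
        return 0.0
    matched = sum(1 for g in neigh_a if homologs.get(g, set()) & neigh_b)
    return matched / len(neigh_a)


# Toy example: a3 has two candidate homologs, b3 (syntenic) and b6 (a distant duplicate).
genome_a = ["a1", "a2", "a3", "a4", "a5"]
genome_b = ["b1", "b2", "b3", "b4", "b5", "b6"]
homologs = {"a1": {"b1"}, "a2": {"b2"}, "a3": {"b3", "b6"}, "a4": {"b4"}, "a5": {"b5"}}

print(neighborhood_conservation(genome_a, genome_b, homologs, "a3", "b3"))
print(neighborhood_conservation(genome_a, genome_b, homologs, "a3", "b6"))
```

In this toy case the syntenic candidate scores higher than the distant duplicate, which is the kind of ambiguity that neighborhood information is meant to resolve when sequence similarity alone cannot separate the two.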
Affiliation(s)
- Limsoon Wong
- School of Computing, National University of Singapore, Singapore 117417, Singapore
- Changiz Eslahchi
- Department of Computer Science, Shahid Beheshti University, Tehran 1983969411, Iran.
|
18
|
Bharathi M, Chellapandi P. Intergenomic evolution and metabolic cross-talk between rumen and thermophilic autotrophic methanogenic archaea. Mol Phylogenet Evol 2016; 107:293-304. [PMID: 27864137 DOI: 10.1016/j.ympev.2016.11.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/30/2016] [Revised: 09/17/2016] [Accepted: 11/13/2016] [Indexed: 02/01/2023]
Abstract
Methanobrevibacter ruminantium M1 (MRU) is a rumen methanogenic archaeon that can utilize formate and CO2/H2 as growth substrates. Extensive analysis of its evolutionary genomic contexts was conducted herein to unravel its intergenomic relationship with, and the metabolic adjustments acquired from, the genomic content of Methanothermobacter thermautotrophicus ΔH. We examined intergenomic distance, genome function, syntenic homologs and gene families, origin of replication, and methanogenesis to reveal the evolutionary relationships between Methanobrevibacter and Methanothermobacter. Comparison of phylogenetic and metabolic markers suggested that its archaeal metabolic core lineage might have evolved from Methanothermobacter. Orthologous genes involved in its hydrogenotrophic methanogenesis might have been acquired from the intergenomic ancestry of Methanothermobacter via Methanobacterium formicicum. The formate dehydrogenase (fdhAB) coding gene cluster and the carbon monoxide dehydrogenase (cooF) coding gene might have evolved from duplication events within the Methanobrevibacter-Methanothermobacter lineage, and the fdhCD gene cluster might have been acquired from bacterial origins. A genome-wide metabolic survey found four novel pathways in MRU, viz. l-tyrosine catabolism, mevalonate pathway II, acyl-carrier protein metabolism II and glutathione redox reactions II. These pathways suggest that MRU has the metabolic potential for tolerance of molecular oxygen, antimicrobial metabolite biosynthesis and an atypical lipid composition in the cell wall, acquired through metabolic cross-talk with mammalian bacterial origins. We conclude that coevolution of genomic contents between Methanobrevibacter and Methanothermobacter provides a clue to understanding the metabolic adaptation of MRU to different environmental niches in the rumen.
Affiliation(s)
- M Bharathi
- Molecular Systems Engineering Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, Tamil Nadu, India
- P Chellapandi
- Molecular Systems Engineering Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, Tamil Nadu, India.
|
19
|
Varshney D, Jaiswar A, Adholeya A, Prasad P. Phylogenetic analyses reveal molecular signatures associated with functional divergence among Subtilisin like Serine Proteases are linked to lifestyle transitions in Hypocreales. BMC Evol Biol 2016; 16:220. [PMID: 27756202 PMCID: PMC5069783 DOI: 10.1186/s12862-016-0793-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/02/2016] [Accepted: 10/07/2016] [Indexed: 11/15/2022] Open
Abstract
Background Subtilisin-like serine proteases, or Subtilases, in fungi are important for penetration and colonization of the host. In Hypocreales, these proteins share several properties with other fungal, bacterial, plant and mammalian homologs. However, adoption of specific roles in entomopathogenesis may be governed by the attainment of unique biochemical and structural features during the course of evolution. Owing to such functional shifts, Subtilases encoded by different family members of Hypocreales acquire distinct features according to their respective hosts and lifestyles. We conducted phylogenetic and DIVERGE analyses and identified important protein residues that putatively confer functional specificity on Subtilases in fungal families/species of the order Hypocreales. Results A total of 161 Subtilases encoded by 10 species from five different families of the fungal order Hypocreales were included in the analysis. Based on the presence of conserved domains, the Subtilase genes were divided into three subfamilies: Subtilisin (S08.005), Proteinase K (S08.054) and Serine-carboxyl peptidases (S53.001). These subfamilies were investigated for phylogenetic associations, protein residues under positive selection and functional divergence among paralogous clades. The observations were correlated with the lifestyles of the fungal families/species. Phylogenetic and divergence analyses of the Subtilisin (S08.005) and Proteinase K (S08.054) families of proteins revealed that the paralogous clades clearly reflected the familial origin of the protein sequences. We observed divergence between the paralogous clades of plant-pathogenic fungi (Nectriaceae), insect-pathogenic fungi (Cordycipitaceae/Clavicipitaceae) and nematophagous fungi (Ophiocordycipitaceae). In addition, Subtilase genes from the nematode-parasitic fungus Purpureocillium lilacinum formed a unique cluster, which putatively indicates that the fungus might have developed distinctive mechanisms for nematode pathogenesis. Our evolutionary genetics analysis revealed evidence of positive selection on the Subtilisin (S08.005) and Proteinase K (S08.054) protein sequences of the entomopathogenic and nematophagous species belonging to the Cordycipitaceae, Clavicipitaceae and Ophiocordycipitaceae families of Hypocreales. Conclusions Our study provides new insights into the evolution of Subtilisin-like serine proteases in Hypocreales, a fungal order largely consisting of biological control species. Subtilisin (S08.005) and Proteinase K (S08.054) proteins seem to play important roles during lifestyle modifications among different families and species of Hypocreales. Protein residues found to be significant in the functional divergence analysis in the present study may support future protein engineering. Electronic supplementary material The online version of this article (doi:10.1186/s12862-016-0793-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Deepti Varshney
- TERI Deakin Nanobiotechnology Centre, TERI Gram, The Energy and Resources Institute, Gual Pahari, Faridabad Road, Gurgaon, Haryana, 122001, India
- Akanksha Jaiswar
- TERI Deakin Nanobiotechnology Centre, TERI Gram, The Energy and Resources Institute, Gual Pahari, Faridabad Road, Gurgaon, Haryana, 122001, India
- Alok Adholeya
- TERI Deakin Nanobiotechnology Centre, TERI Gram, The Energy and Resources Institute, Gual Pahari, Faridabad Road, Gurgaon, Haryana, 122001, India
- Pushplata Prasad
- TERI Deakin Nanobiotechnology Centre, TERI Gram, The Energy and Resources Institute, Gual Pahari, Faridabad Road, Gurgaon, Haryana, 122001, India.
|
20
|
Kajla M, Kakani P, Choudhury TP, Gupta K, Gupta L, Kumar S. Characterization and expression analysis of gene encoding heme peroxidase HPX15 in major Indian malaria vector Anopheles stephensi (Diptera: Culicidae). Acta Trop 2016; 158:107-116. [PMID: 26943999 DOI: 10.1016/j.actatropica.2016.02.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/10/2016] [Revised: 02/27/2016] [Accepted: 02/29/2016] [Indexed: 10/22/2022]
Abstract
The interaction of the mosquito immune system with Plasmodium is critical in determining vector competence. Thus, blocking the crucial mosquito molecules that regulate parasite development might be effective in controlling disease transmission. In this study, we characterized a full-length AsHPX15 gene from the major Indian malaria vector Anopheles stephensi. This gene is a true ortholog of Anopheles gambiae heme peroxidase AgHPX15 (AGAP013327), which modulates midgut immunity and regulates Plasmodium falciparum development. We found that AsHPX15 is highly induced in mosquito developmental stages and blood-fed midguts. In addition, this is a lineage-specific gene that has identical features and 65-99% amino acid identity with other HPX15 genes present in eighteen worldwide-distributed anophelines. We discuss the possibility that the conserved HPX15 gene might serve as a common target to manipulate mosquito immunity and arrest Plasmodium development inside the vector host.
|
21
|
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. Genomics Insights 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties in assessing orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling, together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events, appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
|
22
|
Mazzucotelli E, Trono D. Cloning, expression analysis, and functional characterization of two secretory phospholipases A2 in durum wheat (Triticum durum Desf.). Plant Science: An International Journal of Experimental Plant Biology 2015; 241:295-306. [PMID: 26706080 DOI: 10.1016/j.plantsci.2015.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2015] [Revised: 10/16/2015] [Accepted: 10/17/2015] [Indexed: 06/05/2023]
Abstract
We previously isolated four cDNAs in durum wheat, TdsPLA2I, TdsPLA2II, TdsPLA2III and TdsPLA2IV, that encode proteins with homology to plant secretory phospholipases A2 (sPLA2s) (Verlotta et al., Int. J. Mol. Sci., 14, 2013, 5146-5169). In this study, we have further characterized TdsPLA2II and TdsPLA2III sequences that, on the basis of our previous findings, might encode sPLA2 isoforms with different features. Functional analysis revealed that, similarly to other known sPLA2s, TdsPLA2II and TdsPLA2III have an optimum at pH 9.0, require Ca(2+), are heat stable, and are inhibited by the disulfide-bond-reducing agent dithiothreitol. However, differences emerged between these TdsPLA2 isoforms. Transcript analysis revealed that the TdsPLA2III gene is highly up-regulated under different environmental stresses; conversely, the TdsPLA2II gene is expressed at constant levels under almost all of the stress conditions examined. Moreover, TdsPLA2II is saturated at micromolar substrate and Ca(2+) concentrations, whereas TdsPLA2III requires millimolar concentrations to reach maximal activity. This suggests that TdsPLA2II normally functions under optimal conditions in vivo, whereas TdsPLA2III is only partially activated, depending on the specific phospholipid and Ca(2+) levels. Altogether these data lead to the hypothesis that in vivo TdsPLA2II and TdsPLA2III are differently regulated at both molecular and biochemical level and that TdsPLA2III plays a major role in durum wheat response to adverse environmental conditions.
MESH Headings
- Amino Acid Sequence
- Cloning, Molecular
- DNA, Complementary/genetics
- DNA, Complementary/metabolism
- DNA, Plant/genetics
- DNA, Plant/metabolism
- Gene Expression Regulation, Plant
- Molecular Sequence Data
- Phospholipases A2, Secretory/genetics
- Phospholipases A2, Secretory/metabolism
- Phylogeny
- Plant Proteins/genetics
- Plant Proteins/metabolism
- RNA, Plant/genetics
- RNA, Plant/metabolism
- Sequence Alignment
- Triticum/enzymology
- Triticum/genetics
- Triticum/metabolism
Affiliation(s)
- Elisabetta Mazzucotelli
- Consiglio per la Ricerca in Agricoltura e l'Analisi dell'Economia Agraria, Centro di Ricerca per la Genomica Vegetale, Via San Protaso 302, 29017 Fiorenzuola d'Arda, Italy
- Daniela Trono
- Consiglio per la Ricerca in Agricoltura e l'Analisi dell'Economia Agraria, Centro di Ricerca per la Cerealicoltura, S.S. 673, Km 25,200, 71122 Foggia, Italy
23
Kaitetzidou E, Chatzifotis S, Antonopoulou E, Sarropoulou E. Identification, Phylogeny, and Function of fabp2 Paralogs in Two Non-Model Teleost Fish Species. Marine Biotechnology (New York, N.Y.) 2015; 17:663-677. [PMID: 26272429 DOI: 10.1007/s10126-015-9648-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/15/2015] [Accepted: 06/22/2015] [Indexed: 06/04/2023]
Abstract
Intestinal fatty-acid-binding protein (IFABP or FABP2) is a cytosolic transporter of long-chain fatty acids, which is mainly expressed in cells of intestinal tissue. Fatty acids in teleosts are an important source of energy for growth, reproduction, and swimming and a main ingredient in the yolk sac of embryos and larvae. The fabp2 paralogs, fabp2a and fabp2b, were identified for 26 teleost fish species including the paralogs for the two non-model teleost fish species, namely the gilthead sea bream (Sparus aurata) and the European sea bass (Dicentrarchus labrax). Despite the high similarity of fabp2 paralogs, as well as the identical organization in four exons, paralogs were mapped to different chromosomes/linkage groups supporting the hypothesis that the identified transcripts are true paralogs originating from a single ancestor gene after genome duplication. This was also confirmed by phylogenetic analysis using fabp2 sequences of 26 teleosts and by synteny analysis carried out with ten teleosts. Differential expression analysis of the gilthead sea bream and European sea bass fabp2 paralogs in the intestine after fasting and refeeding experiment further revealed their altered implication in metabolism. Additional expression studies in seven developmental stages of the two species detected fabp2 paralogs relatively early in the embryonic development as well as possible complementary or separated roles of the paralogs. The identification and characterization of the two fabp2 paralogs will contribute significantly to the understanding of the fabp2 evolution as well as of the divergences in fatty acid metabolism.
Affiliation(s)
- Elisavet Kaitetzidou
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Thalassocosmos, Gournes Pediados, P.O. Box 2214, 71003, Heraklion, Crete, Greece
24
Sudha G, Naveenkumar N, Srinivasan N. Evolutionary and structural analyses of heterodimeric proteins composed of subunits with same fold. Proteins 2015; 83:1766-86. [PMID: 26148218 DOI: 10.1002/prot.24849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2015] [Revised: 05/30/2015] [Accepted: 06/21/2015] [Indexed: 11/10/2022]
Abstract
Heterodimeric proteins with homologous subunits of same fold are involved in various biological processes. The objective of this study is to understand the evolution of structural and functional features of such heterodimers. Using a non-redundant dataset of 70 such heterodimers of known 3D structure and an independent dataset of 173 heterodimers from yeast, we note that the mean sequence identity between interacting homologous subunits is only 23-24% suggesting that, generally, highly diverged paralogues assemble to form such a heterodimer. We also note that the functional roles of interacting subunits/domains are generally quite different. This suggests that, though the interacting subunits/domains are homologous, the high evolutionary divergence characterize their high functional divergence which contributes to a gross function for the heterodimer considered as a whole. The inverse relationship between sequence identity and RMSD of interacting homologues in heterodimers is not followed. We also addressed the question of formation of homodimers of the subunits of heterodimers by generating models of fictitious homodimers on the basis of the 3D structures of the heterodimers. Interaction energies associated with these homodimers suggests that, in overwhelming majority of the cases, such homodimers are unlikely to be stable. Majority of the homologues of heterodimers of known structures form heterodimers (51.8%) and a small proportion (14.6%) form homodimers. Comparison of 3D structures of heterodimers with homologous homodimers suggests that interfacial nature of residues is not well conserved. In over 90% of the cases we note that the interacting subunits of heterodimers are co-localized in the cell.
Affiliation(s)
- Govindarajan Sudha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Nagarajan Naveenkumar
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, 560065, India; Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
25
Musungu B, Bhatnagar D, Brown RL, Fakhoury AM, Geisler M. A predicted protein interactome identifies conserved global networks and disease resistance subnetworks in maize. Front Genet 2015; 6:201. [PMID: 26089837 PMCID: PMC4454876 DOI: 10.3389/fgene.2015.00201] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 02/23/2015] [Accepted: 05/21/2015] [Indexed: 12/30/2022] Open
Abstract
Interactomes are genome-wide roadmaps of protein-protein interactions. They have been produced for humans, yeast, the fruit fly, and Arabidopsis thaliana and have become invaluable tools for generating and testing hypotheses. A predicted interactome for Zea mays (PiZeaM) is presented here as an aid to the research community for this valuable crop species. PiZeaM was built using a proven method of interologs (interacting orthologs) that were identified using both one-to-one and many-to-many orthology between genomes of maize and reference species. Where both maize orthologs occurred for an experimentally determined interaction in the reference species, we predicted a likely interaction in maize. A total of 49,026 unique interactions for 6004 maize proteins were predicted. These interactions are enriched for processes that are evolutionarily conserved, but include many otherwise poorly annotated proteins in maize. The predicted maize interactions were further analyzed by comparing annotation of interacting proteins, including different layers of ontology. A map of pairwise gene co-expression was also generated and compared to predicted interactions. Two global subnetworks were constructed for highly conserved interactions. These subnetworks showed clear clustering of proteins by function. Another subnetwork was created for disease response using a bait and prey strategy to capture interacting partners for proteins that respond to other organisms. Closer examination of this subnetwork revealed the connectivity between biotic and abiotic hormone stress pathways. We believe PiZeaM will provide a useful tool for the prediction of protein function and analysis of pathways for Z. mays researchers and is presented in this paper as a reference tool for the exploration of protein interactions in maize.
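As a rough illustration of the interolog idea summarized in this abstract, the sketch below transfers each reference-species interaction to every pair of maize orthologs of the two partners. It is not the PiZeaM pipeline: the function name, the identifiers, and the toy data are hypothetical, and the many-to-many handling is an assumption based only on the abstract's wording.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple


def predict_interologs(
    reference_interactions: Iterable[Tuple[str, str]],
    orthologs: Dict[str, List[str]],
) -> Set[Tuple[str, str]]:
    """Predict target-species interactions from reference interactions.

    An interaction (ref_a, ref_b) is transferred only when BOTH partners
    have at least one ortholog in the target species. With many-to-many
    orthology, every ortholog pair is predicted (an assumption).
    """
    predicted: Set[Tuple[str, str]] = set()
    for ref_a, ref_b in reference_interactions:
        targets_a = orthologs.get(ref_a, [])
        targets_b = orthologs.get(ref_b, [])
        for t_a, t_b in product(targets_a, targets_b):
            # Store pairs in sorted order so A-B and B-A count once.
            predicted.add(tuple(sorted((t_a, t_b))))
    return predicted


# Hypothetical toy data: R3 has no maize ortholog, so R2-R3 is not transferred.
reference_interactions = [("R1", "R2"), ("R2", "R3")]
orthologs = {"R1": ["Zm1"], "R2": ["Zm2a", "Zm2b"]}

predicted = predict_interologs(reference_interactions, orthologs)
print(sorted(predicted))
```

Storing each pair in sorted order keeps the predicted set free of duplicate undirected edges, which matters when counting unique interactions.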
Affiliation(s)
- Bryan Musungu
- Department of Plant Biology, Southern Illinois University Carbondale, IL, USA
- Deepak Bhatnagar
- Food and Feed Safety Research, Southern Regional Research Center, United States Department of Agriculture, Agricultural Research Service New Orleans, LA, USA
- Robert L Brown
- Food and Feed Safety Research, Southern Regional Research Center, United States Department of Agriculture, Agricultural Research Service New Orleans, LA, USA
- Ahmad M Fakhoury
- Department of Plant Soil and Agriculture Systems, Southern Illinois University Carbondale, IL, USA
- Matt Geisler
- Department of Plant Biology, Southern Illinois University Carbondale, IL, USA
26
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 256] [Impact Index Per Article: 28.4] [Received: 10/20/2014] [Indexed: 12/21/2022]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
Affiliation(s)
- Andrew Currin
- Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
- Neil Swainston
- Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
- School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Philip J. Day
- Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
- Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, UK
- Douglas B. Kell
- Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
27
Junges Â, Boldo JT, Souza BK, Guedes RLM, Sbaraini N, Kmetzsch L, Thompson CE, Staats CC, de Almeida LGP, de Vasconcelos ATR, Vainstein MH, Schrank A. Genomic analyses and transcriptional profiles of the glycoside hydrolase family 18 genes of the entomopathogenic fungus Metarhizium anisopliae. PLoS One 2014; 9:e107864. [PMID: 25232743 PMCID: PMC4169460 DOI: 10.1371/journal.pone.0107864] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 04/08/2014] [Accepted: 08/16/2014] [Indexed: 12/26/2022] Open
Abstract
Fungal chitin metabolism involves diverse processes such as metabolically active cell wall maintenance, basic nutrition, and different aspects of virulence. Chitinases are enzymes belonging to the glycoside hydrolase family 18 (GH18) and 19 (GH19) and are responsible for the hydrolysis of β-1,4-linkages in chitin. This linear homopolymer of N-acetyl-β-D-glucosamine is an essential constituent of fungal cell walls and arthropod exoskeletons. Several chitinases have been directly implicated in structural, morphogenetic, autolytic and nutritional activities of fungal cells. In the entomopathogen Metarhizium anisopliae, chitinases are also involved in virulence. Filamentous fungi genomes exhibit a higher number of chitinase-coding genes than bacteria or yeasts. The survey performed in the M. anisopliae genome has successfully identified 24 genes belonging to glycoside hydrolase family 18, including three previously experimentally determined chitinase-coding genes named chit1, chi2 and chi3. These putative chitinases were classified based on domain organization and phylogenetic analysis into the previously described A, B and C chitinase subgroups, and into a new subgroup D. Moreover, three GH18 proteins could be classified as putative endo-N-acetyl-β-D-glucosaminidases, enzymes that are associated with deglycosylation and were therefore assigned to a new subgroup E. The transcriptional profile of the GH18 genes was evaluated by qPCR with RNA extracted from eight culture conditions, representing different stages of development or different nutritional states. The transcripts from the GH18 genes were detected in at least one of the different M. anisopliae developmental stages, thus validating the proposed genes. Moreover, not all members from the same chitinase subgroup presented equal patterns of transcript expression under the eight distinct conditions studied. The determination of M. anisopliae chitinases and ENGases and a more detailed study concerning the enzymes’ roles in morphological or nutritional functions will allow comprehensive insights into the chitinolytic potential of this highly infective entomopathogenic fungus.
Affiliation(s)
- Ângela Junges
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Bárbara Kunzler Souza
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Nicolau Sbaraini
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Lívia Kmetzsch
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Augusto Schrank
- Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
28
Puggioni V, Dondi A, Folli C, Shin I, Rhee S, Percudani R. Gene Context Analysis Reveals Functional Divergence between Hypothetically Equivalent Enzymes of the Purine–Ureide Pathway. Biochemistry 2014; 53:735-45. [DOI: 10.1021/bi4010107] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Affiliation(s)
- Vincenzo Puggioni
- Laboratory of Biochemistry, Molecular Biology, and Bioinformatics, Department of Life Sciences, University of Parma, Italy
- Ambra Dondi
- Laboratory of Biochemistry, Molecular Biology, and Bioinformatics, Department of Life Sciences, University of Parma, Italy
- Claudia Folli
- Department of Food Science, University of Parma, Italy
- Inchul Shin
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
- Sangkee Rhee
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
- Riccardo Percudani
- Laboratory of Biochemistry, Molecular Biology, and Bioinformatics, Department of Life Sciences, University of Parma, Italy
29
Wu YC, Rasmussen MD, Bansal MS, Kellis M. Most parsimonious reconciliation in the presence of gene duplication, loss, and deep coalescence using labeled coalescent trees. Genome Res 2013; 24:475-86. [PMID: 24310000 PMCID: PMC3941112 DOI: 10.1101/gr.161968.113] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 11/25/2022]
Abstract
Accurate gene tree-species tree reconciliation is fundamental to inferring the evolutionary history of a gene family. However, although it has long been appreciated that population-related effects such as incomplete lineage sorting (ILS) can dramatically affect the gene tree, many of the most popular reconciliation methods consider discordance only due to gene duplication and loss (and sometimes horizontal gene transfer). Methods that do model ILS are either highly parameterized or consider a restricted set of histories, thus limiting their applicability and accuracy. To address these challenges, we present a novel algorithm DLCpar for inferring a most parsimonious (MP) history of a gene family in the presence of duplications, losses, and ILS. Our algorithm relies on a new reconciliation structure, the labeled coalescent tree (LCT), that simultaneously describes coalescent and duplication-loss history. We show that the LCT representation enables an exhaustive and efficient search over the space of reconciliations, and, for most gene families, the least common ancestor (LCA) mapping is an optimal solution for the species mapping between the gene tree and species tree in an MP LCT. Applying our algorithm to a variety of clades, including flies, fungi, and primates, as well as to simulated phylogenies, we achieve high accuracy, comparable to sophisticated probabilistic reconciliation methods, at reduced run time and with far fewer parameters. These properties enable inferences of the complex evolution of gene families across a broad range of species and large data sets.
Affiliation(s)
- Yi-Chieh Wu
- Department of Electrical Engineering and Computer Science, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
30
Kumar S, Biswal DK, Tandon V. In-silico analysis of caspase-3 and -7 proteases from blood-parasitic Schistosoma species (Trematoda) and their human host. Bioinformation 2013; 9:456-63. [PMID: 23847399 PMCID: PMC3705615 DOI: 10.6026/97320630009456] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/14/2012] [Revised: 10/21/2012] [Accepted: 04/17/2013] [Indexed: 12/24/2022] Open
Abstract
Proteolytic enzymes of the caspase family, which reside as latent precursors in most nucleated metazoan cells, are core effectors of apoptosis. Of them, the executioner caspases- 3 and -7 exist within the cytosol as inactive dimers and are activated by a process called dimerization. Caspase inhibition is looked upon as a promising approach for treating multiple diseases. Though caspases have been extensively studied in the human system, their role in eukaryotic pathogens and parasites of human hosts has not drawn enough attention. In protein sequence analysis, caspases of blood flukes (Schistosoma spp) were revealed to have a low sequence identity with their counterparts in human and other mammalian hosts, which encouraged us to analyse interacting domains that participate in dimerization of caspases in the parasite and to reveal differences, if any, between the host-parasite systems. Significant differences in the molecular surface arrangement of the dimer interfaces reveal that in schistosomal caspases only eight out of forty dimer conformations are similar to human caspase structures. Thus, the parasite-specific dimer conformations (that are different from caspases of the host) may emerge as potential drug targets of therapeutic value against schistosomal infections. Three important factors namely, the size of amino acids, secondary structures and geometrical arrangement of interacting domains influence the pattern of caspase dimer formation, which, in turn, is manifested in varied structural conformations of caspases in the parasite and its human hosts.
Affiliation(s)
- Shakti Kumar
- Bioinformatics Centre, North-Eastern Hill University, Shillong 793022, Meghalaya, India
- Department of Zoology, North-Eastern Hill University, Shillong 793022, Meghalaya, India
- Devendra Kumar Biswal
- Bioinformatics Centre, North-Eastern Hill University, Shillong 793022, Meghalaya, India
- Veena Tandon
- Bioinformatics Centre, North-Eastern Hill University, Shillong 793022, Meghalaya, India
- Department of Zoology, North-Eastern Hill University, Shillong 793022, Meghalaya, India
31
Wolf YI, Koonin EV. A tight link between orthologs and bidirectional best hits in bacterial and archaeal genomes. Genome Biol Evol 2013; 4:1286-94. [PMID: 23160176 PMCID: PMC3542571 DOI: 10.1093/gbe/evs100] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Indexed: 11/29/2022] Open
Abstract
Orthologous relationships between genes are routinely inferred from bidirectional best hits (BBH) in pairwise genome comparisons. However, to our knowledge, it has never been quantitatively demonstrated that orthologs form BBH. To test this “BBH-orthology conjecture,” we take advantage of the operon organization of bacterial and archaeal genomes and assume that, when two genes in compared genomes are flanked by two BBH show statistically significant sequence similarity to one another, these genes are bona fide orthologs. Under this assumption, we tested whether middle genes in “syntenic orthologous gene triplets” form BBH. We found that this was the case in more than 95% of the syntenic gene triplets in all genome comparisons. A detailed examination of the exceptions to this pattern, including maximum likelihood phylogenetic tree analysis, showed that some of these deviations involved artifacts of genome annotation, whereas very small fractions represented random assignment of the best hit to one of closely related in-paralogs, paralogous displacement in situ, or even less frequent genuine violations of the BBH–orthology conjecture caused by acceleration of evolution in one of the orthologs. We conclude that, at least in prokaryotes, genes for which independent evidence of orthology is available typically form BBH and, conversely, BBH can serve as a strong indication of gene orthology.
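To make the bidirectional best hit (BBH) criterion mentioned in this abstract concrete, here is a minimal sketch that pairs genes from two genomes when each is the other's top-scoring hit. It is not the authors' procedure: the score dictionaries, gene names, and function names are hypothetical, and the scores stand in for whatever similarity measure (for example, alignment bit scores) a real comparison would supply.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query gene, subject gene) -> score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for each query gene, its single highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit AND a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical toy scores: a2 and a3 both hit b2 best, but b2 prefers a3.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 200.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a3"): 205.0, ("b2", "a2"): 175.0}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data a2 has a best hit (b2) but is not part of any BBH, which is the kind of case where close in-paralogs can compete for the same best hit.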
32
Abstract
Orthologues and paralogues are types of homologous genes that are related by speciation or duplication, respectively. Orthologous genes are generally assumed to retain equivalent functions in different organisms and to share other key properties. Several recent comparative genomic studies have focused on testing these expectations. Here we discuss the complexity of the evolution of gene-phenotype relationships and assess the validity of the key implications of orthology and paralogy relationships as general statistical trends and guiding principles.
33
Frings O, Mank JE, Alexeyenko A, Sonnhammer ELL. Network analysis of functional genomics data: application to avian sex-biased gene expression. ScientificWorldJournal 2012; 2012:130491. [PMID: 23319882 PMCID: PMC3540752 DOI: 10.1100/2012/130491] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/25/2012] [Accepted: 11/25/2012] [Indexed: 12/03/2022] Open
Abstract
Gene expression analysis is often used to investigate the molecular and functional underpinnings of a phenotype. However, differential expression of individual genes is limited in that it does not consider how the genes interact with each other in networks. To address this shortcoming we propose a number of network-based analyses that give additional functional insights into the studied process. These were applied to a dataset of sex-specific gene expression in the chicken gonad and brain at different developmental stages. We first constructed a global chicken interaction network. Combining the network with the expression data showed that most sex-biased genes tend to have lower network connectivity, that is, act within local network environments, although some interesting exceptions were found. Genes of the same sex bias were generally more strongly connected with each other than expected. We further studied the fates of duplicated sex-biased genes and found that there is a significant trend to keep the same pattern of sex bias after duplication. We also identified sex-biased modules in the network, which reveal pathways or complexes involved in sex-specific processes. Altogether, this work integrates evolutionary genomics with systems biology in a novel way, offering new insights into the modular nature of sex-biased genes.
Affiliation(s)
- Oliver Frings
- Stockholm Bioinformatics Centre, Science for Life Laboratory, Box 1031, SE-171 21 Solna, Sweden
34
Whiteside MD, Winsor GL, Laird MR, Brinkman FSL. OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis. Nucleic Acids Res 2012. [PMID: 23203876 PMCID: PMC3531125 DOI: 10.1093/nar/gks1241] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Indexed: 12/18/2022] Open
Abstract
Prediction of orthologs (homologous genes that diverged because of speciation) is an integral component of many comparative genomics methods. Although orthologs are more likely to have similar function versus paralogs (genes that diverged because of duplication), recent studies have shown that their degree of functional conservation is variable. Also, there are inherent problems with several large-scale ortholog prediction approaches. To address these issues, we previously developed Ortholuge, which uses phylogenetic distance ratios to provide more precise ortholog assessments for a set of predicted orthologs. However, the original version of Ortholuge required manual intervention and was not easily accessible; therefore, we now report the development of OrtholugeDB, available online at http://www.pathogenomics.sfu.ca/ortholugedb. OrtholugeDB provides ortholog predictions for completely sequenced bacterial and archaeal genomes from NCBI based on reciprocal best Basic Local Alignment Search Tool hits, supplemented with further evaluation by the more precise Ortholuge method. The OrtholugeDB web interface facilitates user-friendly and flexible ortholog analysis, from single genes to genomes, plus flexible data download options. We compare Ortholuge with similar methods, showing how it may more consistently identify orthologs with conserved features across a wide range of taxonomic distances. OrtholugeDB facilitates rapid, and more accurate, bacterial and archaeal comparative genomic analysis and large-scale ortholog predictions.
Affiliation(s)
- Matthew D Whiteside
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
35
Improving N-terminal protein annotation of Plasmodium species based on signal peptide prediction of orthologous proteins. Malar J 2012; 11:375. [PMID: 23153225 PMCID: PMC3529677 DOI: 10.1186/1475-2875-11-375] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/06/2012] [Accepted: 10/31/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The signal peptide is one of the most important motifs involved in protein trafficking and it ultimately influences protein function. Considering the expected functional conservation among orthologs, it was hypothesized that divergence in signal peptides within orthologous groups is mainly due to N-terminal protein sequence misannotation. Thus, discrepancies in signal peptide prediction of orthologous proteins were used to identify misannotated proteins in five Plasmodium species.
METHODS: Signal peptide prediction (SignalP) and orthology (OrthoMCL) were combined in an innovative strategy to identify orthologous groups showing discrepancies in signal peptide prediction among their protein members (Mixed groups). In a comparative analysis, multiple alignments for each of these groups and gene models were visually inspected in search of misannotated proteins and, whenever possible, alternative gene models were proposed. Thresholds for signal peptide prediction parameters were also modified to reduce their impact as a possible source of discrepancy among orthologs. Validation of new gene models was based on RT-PCR (few examples) or on experimental evidence already published (ApiLoc).
RESULTS: The rate of misannotated proteins was significantly higher in Mixed groups than in Positive or Negative groups, corroborating the proposed hypothesis. A total of 478 proteins were reannotated, and a change of signal peptide prediction from negative to positive was the most common outcome. Reannotations triggered the conversion of almost 50% of all Mixed groups, which were further reduced by optimization of signal peptide prediction parameters.
CONCLUSIONS: The methodological novelty proposed here, combining orthology and signal peptide prediction, proved to be an effective strategy for identifying proteins with wrongly annotated N-terminal sequences, and it might have an important impact on the data available for genome-wide searches for potential vaccine and drug targets and for proteins involved in host/parasite interactions, as demonstrated for five Plasmodium species.
|
36
|
Chen C, DeClerck G, Tian F, Spooner W, McCouch S, Buckler E. PICARA, an analytical pipeline providing probabilistic inference about a priori candidates genes underlying genome-wide association QTL in plants. PLoS One 2012; 7:e46596. [PMID: 23144785 PMCID: PMC3492367 DOI: 10.1371/journal.pone.0046596] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/22/2012] [Accepted: 09/04/2012] [Indexed: 01/28/2023]
Abstract
PICARA is an analytical pipeline designed to systematically summarize observed SNP/trait associations identified by genome wide association studies (GWAS) and to identify candidate genes involved in the regulation of complex trait variation. The pipeline provides probabilistic inference about a priori candidate genes using integrated information derived from genome-wide association signals, gene homology, and curated gene sets embedded in pathway descriptions. In this paper, we demonstrate the performance of PICARA using data for flowering time variation in maize – a key trait for geographical and seasonal adaption of plants. Among 406 curated flowering time-related genes from Arabidopsis, we identify 61 orthologs in maize that are significantly enriched for GWAS SNP signals, including key regulators such as FT (Flowering Locus T) and GI (GIGANTEA), and genes centered in the Arabidopsis circadian pathway, including TOC1 (Timing of CAB Expression 1) and LHY (Late Elongated Hypocotyl). In addition, we discover a regulatory feature that is characteristic of these a priori flowering time candidates in maize. This new probabilistic analytical pipeline helps researchers infer the functional significance of candidate genes associated with complex traits and helps guide future experiments by providing statistical support for gene candidates based on the integration of heterogeneous biological information.
Affiliation(s)
- Charles Chen
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America.
|
37
|
Molecular characterization of an α-N-acetylgalactosaminidase from Clonorchis sinensis. Parasitol Res 2012; 111:2149-56. [PMID: 22926676 DOI: 10.1007/s00436-012-3063-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/21/2012] [Accepted: 07/24/2012] [Indexed: 10/28/2022]
Abstract
The α-N-acetylgalactosaminidase (α-NAGAL) is an exoglycosidase that selectively cleaves terminal α-linked N-acetylgalactosamines from a variety of sugar chains. A complementary DNA (cDNA) clone encoding a novel Clonorchis sinensis α-NAGAL (Cs-α-NAGAL) was identified in the expressed sequence tags database of the adult C. sinensis liver fluke. The complete coding sequence was 1,308 bp long and encoded a 436-residue protein. The selected glycosidase was manually curated as α-NAGAL (EC 3.2.1.49) based on a composite bioinformatics analysis including a search for orthologues, comparative structure modeling, and the generation of a phylogenetic tree. One orthologue of Cs-α-NAGAL was the Rattus norvegicus α-NAGAL (accession number: NP_001012120) that does not exist in C. sinensis. Cs-α-NAGAL belongs to the GH27 family and the GH-D clan. A phylogenetic analysis revealed that the GH27 family of Cs-α-NAGAL was distinct from GH31 and GH36 within the GH-D clan. The putative 3D structure of Cs-α-NAGAL was built using SWISS-MODEL with a Gallus gallus α-NAGAL template (PDB code 1ktb chain A); this model demonstrated the superimposition of a TIM barrel fold (α/β) structure and substrate binding pocket. Cs-α-NAGAL transcripts were detected in the adult worm and egg cDNA libraries of C. sinensis but not in the metacercaria. Recombinant Cs-α-NAGAL (rCs-α-NAGAL) was expressed in Escherichia coli, and the purified rCs-α-NAGAL was recognized specifically by the C. sinensis-infected human sera. This is the first report of an α-NAGAL protein in the Trematode class, suggesting that it is a potential diagnostic or vaccine candidate with strong antigenicity.
|
38
|
Altenhoff AM, Studer RA, Robinson-Rechavi M, Dessimoz C. Resolving the ortholog conjecture: orthologs tend to be weakly, but significantly, more similar in function than paralogs. PLoS Comput Biol 2012; 8:e1002514. [PMID: 22615551 PMCID: PMC3355068 DOI: 10.1371/journal.pcbi.1002514] [Citation(s) in RCA: 143] [Impact Index Per Article: 11.9] [Received: 10/25/2011] [Accepted: 03/26/2012] [Indexed: 02/07/2023]
Abstract
The function of most proteins is not determined experimentally, but is extrapolated from homologs. According to the “ortholog conjecture”, or standard model of phylogenomics, protein function changes rapidly after duplication, leading to paralogs with different functions, while orthologs retain the ancestral function. We report here that a comparison of experimentally supported functional annotations among homologs from 13 genomes mostly supports this model. We show that to analyze GO annotation effectively, several confounding factors need to be controlled: authorship bias, variation of GO term frequency among species, variation of background similarity among species pairs, and propagated annotation bias. After controlling for these biases, we observe that orthologs have generally more similar functional annotations than paralogs. This is especially strong for sub-cellular localization. We observe only a weak decrease in functional similarity with increasing sequence divergence. These findings hold over a large diversity of species; notably orthologs from model organisms such as E. coli, yeast or mouse have conserved function with human proteins.
To infer the function of an unknown gene, possibly the most effective way is to identify a well-characterized evolutionarily related gene, and assume that they have both kept their ancestral function. If several such homologs are available, all else being equal, it has long been assumed that those that diverged by speciation (“orthologs”) are functionally closer than those that diverged by duplication (“paralogs”); thus function is more reliably inferred from the former. But despite its prevalence, this model mostly rests on first principles, as for the longest time we have not had sufficient data to test it empirically. Recently, some studies began investigating this question and have cast doubt on the validity of this model. Here, we show that by considering a wide range of organisms and data, and, crucially, by correcting for several easily overlooked biases affecting functional annotations, the standard model is corroborated by the presently available experimental data.
Affiliation(s)
- Adrian M. Altenhoff
- ETH Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Romain A. Studer
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, United Kingdom
- Marc Robinson-Rechavi
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
- ETH Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- EMBL-European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
|
39
|
Dessimoz C, Gabaldón T, Roos DS, Sonnhammer ELL, Herrero J. Toward community standards in the quest for orthologs. Bioinformatics 2012; 28:900-4. [PMID: 22332236 PMCID: PMC3307119 DOI: 10.1093/bioinformatics/bts050] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Indexed: 01/24/2023]
Abstract
The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second ‘Quest for Orthologs’ meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications. Contact: dessimoz@ebi.ac.uk
|
40
|
Kruger FA, Overington JP. Global analysis of small molecule binding to related protein targets. PLoS Comput Biol 2012; 8:e1002333. [PMID: 22253582 PMCID: PMC3257267 DOI: 10.1371/journal.pcbi.1002333] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 08/05/2011] [Accepted: 11/16/2011] [Indexed: 11/29/2022]
Abstract
We report on the integration of pharmacological data and homology information for a large scale analysis of small molecule binding to related targets. Differences in small molecule binding have been assessed for curated pairs of human to rat orthologs and also for recently diverged human paralogs. Our analysis shows that in general, small molecule binding is conserved for pairs of human to rat orthologs. Using statistical tests, we identified a small number of cases where small molecule binding is different between human and rat, some of which had previously been reported in the literature. Knowledge of species specific pharmacology can be advantageous for drug discovery, where rats are frequently used as a model system. For human paralogs, we demonstrate a global correlation between sequence identity and the binding of small molecules with equivalent affinity. Our findings provide an initial general model relating small molecule binding and sequence divergence, containing the foundations for a general model to anticipate and predict within-target-family selectivity.
Many drugs are small molecules that specifically bind to proteins involved in disease related processes. In this way, drugs modulate the function of a targeted protein and ultimately the process causing the disease. The development of drugs crucially relies on assays that measure the potency of the effect a small molecule exerts on its protein target. We compared the potencies of small molecules measured for human proteins and the corresponding (orthologous) protein in rat. Our results suggest that, after subtraction of statistical noise, most human proteins are equally susceptible to small molecule binding as their orthologs in rats. However, we identified a small number of exceptions to this rule, for example the histamine H3 receptor, a protein of the central nervous system. We also compared the potency of small molecules measured against a human protein and another member of the same protein family. In drug development it is often desired to target a protein selectively over other related proteins. The observed differences were generally greater than the statistical noise, indicating that most of the small molecules in our study have some degree of selectivity within protein families.
|
41
|
Abeysundera M, Field C, Gu H. Phylogenetic analysis based on spectral methods. Mol Biol Evol 2011; 29:579-97. [PMID: 21880577 DOI: 10.1093/molbev/msr205] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
Whole-genome or multiple gene phylogenetic analysis is of interest since single gene analysis often results in poorly resolved trees. Here, the use of spectral techniques for analyzing multigene data sets is explored. The protein sequences are treated as categorical time series, and a measure of similarity between a pair of sequences, the spectral covariance, is based on the common periodicity between these two sequences. Unlike the other methods, the spectral covariance method focuses on the relationship between the sites of genetic sequences. By properly scaling the dissimilarity measures derived from different genes between a pair of species, we can use the mean of these scaled dissimilarity measures as a summary statistic to measure the taxonomic distances across multiple genes. The methods are applied to three different data sets, one noncontroversial and two with some dispute over the correct placement of the taxa in the tree. Trees are constructed using two distance-based methods, BIONJ and FITCH. A variation of block bootstrap sampling method is used for inference. The methods are able to recover all major clades in the corresponding reference trees with moderate to high bootstrap support. Through simulations, we show that the covariance-based methods effectively capture phylogenetic signal even when structural information is not fully retained. Comparisons of simulation results with the bootstrap permutation results indicate that the covariance-based methods are fairly robust under perturbations in sequence similarity but more sensitive to perturbations in structural similarity.
Affiliation(s)
- Melanie Abeysundera
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
|
42
|
Forslund K, Pekkari I, Sonnhammer ELL. Domain architecture conservation in orthologs. BMC Bioinformatics 2011; 12:326. [PMID: 21819573 PMCID: PMC3215765 DOI: 10.1186/1471-2105-12-326] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 07/08/2011] [Accepted: 08/05/2011] [Indexed: 11/16/2022]
Abstract
Background: As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.
Results: The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.
Conclusions: On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.
Affiliation(s)
- Kristoffer Forslund
- Stockholm Bioinformatics Centre, Science for Life Laboratory, Box 1031, Solna, 17121 Sweden
|
43
|
Kristensen DM, Wolf YI, Mushegian AR, Koonin EV. Computational methods for Gene Orthology inference. Brief Bioinform 2011; 12:379-91. [PMID: 21690100 DOI: 10.1093/bib/bbr030] [Citation(s) in RCA: 150] [Impact Index Per Article: 11.5] [Indexed: 12/14/2022]
Abstract
Accurate inference of orthologous genes is a pre-requisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal for the purpose of orthology identification in principle, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple 'tree-like' mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny also can aid in identification of orthologs. Often, tree-based, sequence similarity- and synteny-based approaches can be combined into flexible hybrid methods.
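The tree-based definition in this abstract (homologs related by speciation, rather than duplication, at their most recent point of origin) can be sketched in code. The example below swaps in the species-overlap heuristic for full gene tree/species tree reconciliation: an internal node is treated as a duplication if its two subtrees share a species, and as a speciation otherwise. The `Node` class, the function names, and the toy tree are assumptions for illustration, and a rooted binary gene tree is assumed.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical rooted binary gene-tree node; leaves carry gene and species."""
    gene: Optional[str] = None
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[Tuple[str, str]]:
    """Collect (gene, species) for every leaf below this node."""
    if not node.children:
        return [(node.gene, node.species)]
    collected: List[Tuple[str, str]] = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


def classify_pairs(root: Node) -> Tuple[Set[FrozenSet[str]], Set[FrozenSet[str]]]:
    """Label each gene pair by the event at its most recent common ancestor."""
    orthologs: Set[FrozenSet[str]] = set()
    paralogs: Set[FrozenSet[str]] = set()

    def visit(node: Node) -> None:
        if len(node.children) != 2:  # leaf: nothing to classify
            return
        left, right = node.children
        left_leaves, right_leaves = leaves(left), leaves(right)
        shared = {s for _, s in left_leaves} & {s for _, s in right_leaves}
        # Shared species on both sides implies a duplication under this heuristic.
        target = paralogs if shared else orthologs
        for (g1, _), (g2, _) in product(left_leaves, right_leaves):
            target.add(frozenset((g1, g2)))
        visit(left)
        visit(right)

    visit(root)
    return orthologs, paralogs


# Toy tree: a duplication before the human/mouse split gives copies A and B.
copy_a = Node(children=[Node("humA", "human"), Node("mouseA", "mouse")])
copy_b = Node(children=[Node("humB", "human"), Node("mouseB", "mouse")])
orthologs, paralogs = classify_pairs(Node(children=[copy_a, copy_b]))
```

The sketch inherits the weakness the abstract names: if the gene tree is wrong, the labels are wrong, which is one reason faster heuristic methods remain in use.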
Affiliation(s)
- David M Kristensen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
44
|
Linard B, Thompson JD, Poch O, Lecompte O. OrthoInspector: comprehensive orthology analysis and visual exploration. BMC Bioinformatics 2011; 12:11. [PMID: 21219603 PMCID: PMC3024942 DOI: 10.1186/1471-2105-12-11] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 06/03/2010] [Accepted: 01/10/2011] [Indexed: 01/28/2023]
Abstract
Background: The accurate determination of orthology and inparalogy relationships is essential for comparative sequence analysis, functional gene annotation and evolutionary studies. Various methods have been developed based on simple BLAST all-versus-all pairwise comparisons, time-consuming phylogenetic tree analyses, or both.
Results: We have developed OrthoInspector, a new software system incorporating an original algorithm for the rapid detection of orthology and inparalogy relations between different species. In comparisons with existing methods, OrthoInspector improves detection sensitivity, with a minimal loss of specificity. In addition, several visualization tools have been developed to facilitate in-depth studies based on these predictions. The software has been used to study the orthology/in-paralogy relationships for a large set of 940,855 protein sequences from 59 different eukaryotic species.
Conclusion: OrthoInspector is a new software system for orthology/paralogy analysis. It is made available as an independent software suite that can be downloaded and installed for local use. Command line querying facilitates the integration of the software in high throughput processing pipelines and a graphical interface provides easy, intuitive access to results for the non-expert.
Affiliation(s)
- Benjamin Linard
- Laboratoire de bioinformatique et genomique integratives, Département de Biologie et Génomique Structurales CNRS/INSERM/UDS, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 1 rue Laurent Fries, 67404, Illkirch, Cedex, France.
|
45
|
Protein structure determination by exhaustive search of Protein Data Bank derived databases. Proc Natl Acad Sci U S A 2010; 107:21476-81. [PMID: 21098306 DOI: 10.1073/pnas.1012095107] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 11/18/2022]
Abstract
Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.
|
46
|
Knutson BA. Insights into the domain and repeat architecture of target of rapamycin. J Struct Biol 2010; 170:354-63. [PMID: 20060908 DOI: 10.1016/j.jsb.2010.01.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 10/29/2009] [Revised: 12/21/2009] [Accepted: 01/04/2010] [Indexed: 01/09/2023]
Abstract
A simple and efficient protein sequence analysis strategy was developed to predict the number and location of structural repeats in the TOR protein. This strategy uses multiple HHpred alignments against proteins of known 3D structure to enable protein repeats referenced from the 3D structure to be traced back to the query protein sequence by using user-directed repeat assignments. The HHpred strategy performed with high sensitivity by predicting 100% of the repeat units within a test set of HEAT- and TPR-repeat-containing proteins of known three-dimensional structure. The HHpred strategy predicts that TOR contains 32 tandem HEAT repeats extending from the N-terminus to the FAT domain, which is itself comprised of 16 tandem TPR repeats. These findings were used to assemble a 3D atomic model for the TOR protein.
Affiliation(s)
- Bruce A Knutson
- Fred Hutchinson Cancer Research Center, Basic Sciences Division, 1100 Fairview Ave N, P.O. Box 19024, Mailstop A1-162, Seattle, WA 98109, USA.
|