1
Cosentino S, Sriswasdi S, Iwasaki W. SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models. Genome Biol 2024; 25:195. [PMID: 39054525 PMCID: PMC11270883 DOI: 10.1186/s13059-024-03298-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 06/04/2024] [Indexed: 07/27/2024]
Abstract
Accurate inference of orthologous genes constitutes a prerequisite for comparative and evolutionary genomics. SonicParanoid is one of the fastest tools for orthology inference; however, its scalability and accuracy have been hampered by time-consuming all-versus-all alignments and the existence of proteins with complex domain architectures. Here, we present a substantial update of SonicParanoid, where a gradient boosting predictor halves the execution time and a language model doubles the recall. Application to empirical large-scale and standardized benchmark datasets shows that SonicParanoid2 is much faster than comparable methods and also the most accurate. SonicParanoid2 is available at https://gitlab.com/salvo981/sonicparanoid2 and https://zenodo.org/doi/10.5281/zenodo.11371108 .
Affiliation(s)
- Salvatore Cosentino
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan
- Sira Sriswasdi
- Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Wataru Iwasaki
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan.
- Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Bunkyo-ku, Japan.
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan.
- Atmosphere and Ocean Research Institute, the University of Tokyo, Kashiwa, Japan.
- Institute for Quantitative Biosciences, the University of Tokyo, Bunkyo-ku, Japan.
- Collaborative Research Institute for Innovative Microbiology, the University of Tokyo, Bunkyo-ku, Japan.
2
Aufiero G, Fruggiero C, D’Angelo D, D’Agostino N. Homoeologs in Allopolyploids: Navigating Redundancy as Both an Evolutionary Opportunity and a Technical Challenge-A Transcriptomics Perspective. Genes (Basel) 2024; 15:977. [PMID: 39202338 PMCID: PMC11353593 DOI: 10.3390/genes15080977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/22/2024] [Accepted: 07/23/2024] [Indexed: 09/03/2024]
Abstract
Allopolyploidy in plants involves the merging of two or more distinct parental genomes into a single nucleus, a significant evolutionary process in the plant kingdom. Transcriptomic analysis provides invaluable insights into allopolyploid plants by elucidating the fate of duplicated genes, revealing evolutionary novelties and uncovering their environmental adaptations. By examining gene expression profiles, scientists can discern how duplicated genes have evolved to acquire new functions or regulatory roles. This process often leads to the development of novel traits and adaptive strategies that allopolyploid plants leverage to thrive in diverse ecological niches. Understanding these molecular mechanisms not only enhances our appreciation of the genetic complexity underlying allopolyploidy but also underscores their importance in agriculture and ecosystem resilience. However, transcriptome profiling is challenging due to genomic redundancy, which is further complicated by the presence of multiple chromosome sets and the variations among homoeologs and allelic genes. Prior to transcriptome analysis, sub-genome phasing and homoeology inference are essential for obtaining a comprehensive view of gene expression. This review aims to clarify the terminology in this field, identify the most challenging aspects of transcriptome analysis, explain their inherent difficulties, and suggest reliable analytic strategies. Furthermore, bulk RNA-seq is highlighted as a primary method for studying allopolyploid gene expression, with a focus on critical steps like read mapping and normalization in differential gene expression analysis. This approach effectively captures gene expression from both parental genomes, facilitating a comprehensive analysis of their combined profiles.
Its sensitivity in detecting low-abundance transcripts allows subtle differences between parental genomes to be identified, which is crucial for understanding regulatory dynamics and gene expression balance in allopolyploids.
Affiliation(s)
- Nunzio D’Agostino
- Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, Italy; (G.A.); (C.F.); (D.D.)
3
Guo L, Wang S, Jiao X, Ye X, Deng D, Liu H, Li Y, Van de Peer Y, Wu W. Convergent and/or parallel evolution of RNA-binding proteins in angiosperms after polyploidization. New Phytol 2024; 242:1377-1393. [PMID: 38436132 DOI: 10.1111/nph.19656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 02/20/2024] [Indexed: 03/05/2024]
Abstract
A growing number of studies suggest that the biased retention of stress-related transcription factors (TFs) after whole-genome duplications (WGDs) could rewire gene transcriptional networks, facilitating plant adaptation to challenging environments. However, the role of posttranscriptional factors (e.g. RNA-binding proteins, RBPs) following WGDs has been largely ignored. Identifying thousands of RBPs in 21 representative angiosperm species, we integrate genomic, transcriptomic, regulatomic, and paleotemperature datasets to unravel their evolutionary trajectories and roles in adapting to challenging environments. We reveal functional enrichments of RBP genes in stress responses and identify their convergent retention across diverse angiosperms from independent WGDs, coinciding with global cooling periods. Numerous RBP duplicates derived from WGDs are then identified as cold-induced. A significant overlap of 29 orthogroups between WGD-derived and cold-induced RBP genes across diverse angiosperms highlights a correlation between WGD and cold stress. Notably, we unveil an orthogroup (Glycine-rich RNA-binding Proteins 7/8, GRP7/8) and relevant TF duplicates (CCA1/LHY, RVE4/8, CBF2/4, etc.), co-retained in different angiosperms post-WGDs. Finally, we illustrate their roles in rewiring circadian and cold-regulatory networks at both transcriptional and posttranscriptional levels during global cooling. Altogether, we underline the adaptive evolution of RBPs in angiosperms after WGDs during global cooling, improving our understanding of how plants survive periods of environmental turmoil.
Affiliation(s)
- Liangyu Guo
- State Key Laboratory of Subtropical Silviculture, School of Forestry and Biotechnology, Zhejiang A&F University, Lin'an, Hangzhou, 311300, China
- Shuo Wang
- State Key Laboratory of Subtropical Silviculture, School of Forestry and Biotechnology, Zhejiang A&F University, Lin'an, Hangzhou, 311300, China
- Xi Jiao
- State Key Laboratory of Subtropical Silviculture, School of Forestry and Biotechnology, Zhejiang A&F University, Lin'an, Hangzhou, 311300, China
- Xiaoxue Ye
- Institute of Tropical Biosciences and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, 571101, China
- Deyin Deng
- State Key Laboratory of Subtropical Silviculture, School of Forestry and Biotechnology, Zhejiang A&F University, Lin'an, Hangzhou, 311300, China
- Hua Liu
- State Key Laboratory of Subtropical Silviculture, School of Forestry and Biotechnology, Zhejiang A&F University, Lin'an, Hangzhou, 311300, China
- Yan Li
- State Key Laboratory of Subtropical Silviculture, School of Forestry and Biotechnology, Zhejiang A&F University, Lin'an, Hangzhou, 311300, China
- Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, VIB - UGent Center for Plant Systems Biology, Ghent University, B-9052, Ghent, Belgium
- College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, 210095, China
- Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0028, South Africa
- Wenwu Wu
- State Key Laboratory of Subtropical Silviculture, School of Forestry and Biotechnology, Zhejiang A&F University, Lin'an, Hangzhou, 311300, China
4
Steinbinder J, Sachslehner AP, Holthaus KB, Eckhart L. Comparative genomics of monotremes provides insights into the early evolution of mammalian epidermal differentiation genes. Sci Rep 2024; 14:1437. [PMID: 38228724 PMCID: PMC10791643 DOI: 10.1038/s41598-024-51926-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 01/11/2024] [Indexed: 01/18/2024]
Abstract
The function of the skin as a barrier against the environment depends on the differentiation of epidermal keratinocytes into highly resilient corneocytes that form the outermost skin layer. Many genes encoding structural components of corneocytes are clustered in the epidermal differentiation complex (EDC), which has been described in placental and marsupial mammals as well as non-mammalian tetrapods. Here, we analyzed the genomes of the platypus (Ornithorhynchus anatinus) and the echidna (Tachyglossus aculeatus) to determine the gene composition of the EDC in the basal clade of mammals, the monotremes. We report that mammal-specific subfamilies of EDC genes encoding small proline-rich proteins (SPRRs) and late cornified envelope proteins as well as single-copy EDC genes such as involucrin are conserved in monotremes, suggesting that they have originated in stem mammals. Monotremes have at least one gene homologous to the group of filaggrin (FLG), FLG2 and hornerin (HRNR) in placental mammals, but no clear one-to-one pairwise ortholog of either FLG, FLG2 or HRNR. Caspase-14, a keratinocyte differentiation-associated protease implicated in the processing of filaggrin, is encoded by at least 3 gene copies in the echidna. Our results reveal evolutionarily conserved and clade-specific features of the genetic regulation of epidermal differentiation in monotremes.
Affiliation(s)
- Julia Steinbinder
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Leopold Eckhart
- Department of Dermatology, Medical University of Vienna, Vienna, Austria.
5
Singh V, Singh V. Inferring Interaction Networks from Transcriptomic Data: Methods and Applications. Methods Mol Biol 2024; 2812:11-37. [PMID: 39068355 DOI: 10.1007/978-1-0716-3886-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Transcriptomic data are a treasure trove in modern molecular biology, as they offer a comprehensive view of the intricate gene expression dynamics underlying biological systems. This information must be used to infer biomolecular interaction networks that can provide insights into the complex regulatory mechanisms underpinning dynamic cellular processes. Gene regulatory networks and protein-protein interaction networks are two major classes of such networks. This chapter surveys the wide range of methods used to extract insights from transcriptomic data, including association-based methods (based on correlation among expression vectors), probabilistic models (using Bayesian and Gaussian models), and interologous methods. We review different approaches for evaluating the significance of interactions based on the network topology and the biological functions of the interacting molecules, and discuss various strategies for identifying functional modules. The chapter concludes by highlighting network-based techniques for prioritizing key genes, outlining centrality-based, diffusion-based, and subgraph-based methods. It provides a detailed framework for analyzing transcriptomic data to reconstruct complex molecular networks and to adapt those analyses across a broad spectrum of biological domains.
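As a hedged illustration of the association-based approach and the centrality-based prioritization named in this abstract, the sketch below links genes whose expression vectors are strongly correlated and then ranks genes by their number of links. It is not taken from the chapter: Pearson correlation, the |r| >= 0.8 cutoff, degree as the centrality measure, and the gene names and values are all assumptions chosen for illustration.

```python
# Minimal sketch of an association-based (co-expression) network.
# Assumptions, not from the chapter: Pearson correlation as the association
# measure, a fixed |r| >= 0.8 cutoff, degree as the centrality score, and
# hypothetical gene names and expression values.
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)


def coexpression_edges(
    expr: dict[str, list[float]], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Undirected edges (gene1, gene2, r) whose |r| reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def degree_centrality(genes, edges) -> dict[str, int]:
    """Number of edges per gene: the simplest centrality-based ranking."""
    degree = {g: 0 for g in genes}
    for g1, g2, _ in edges:
        degree[g1] += 1
        degree[g2] += 1
    return degree


if __name__ == "__main__":
    # Hypothetical expression profiles over five samples.
    expr = {
        "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
        "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
        "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
        "geneD": [1.0, 3.0, 1.0, 3.0, 1.0],
    }
    edges = coexpression_edges(expr)
    ranking = sorted(degree_centrality(expr, edges).items(), key=lambda kv: -kv[1])
    print(ranking)
```

With these hypothetical profiles, geneA, geneB and geneC form a triangle of strong positive or negative correlations, while geneD, whose profile is uncorrelated with the others, stays isolated. A fixed cutoff is the simplest association rule; real pipelines typically tune it or replace it with a significance test.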
Affiliation(s)
- Vikram Singh
- Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, India
- Vikram Singh
- Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, India.
6
Ebu SM, Ray L, Panda AN, Gouda SK. De novo assembly and comparative genome analysis for polyhydroxyalkanoates-producing Bacillus sp. BNPI-92 strain. J Genet Eng Biotechnol 2023; 21:132. [PMID: 37991636 PMCID: PMC10665291 DOI: 10.1186/s43141-023-00578-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 10/26/2023] [Indexed: 11/23/2023]
Abstract
BACKGROUND Certain Bacillus species play a vital role in polyhydroxyalkanoate (PHA) production. However, most reported isolates have not been properly identified to the species level. RESULTS From NGS analysis, 5719 genes were predicted in the de novo genome assembly. RAST annotation of the 5,527,513-bp assembly predicted 5679 protein-coding sequences. The assembly has a GC content of 35.1% and consists of 156 contigs. RAST reported 43% subsystem coverage and 57% non-subsystem coverage. OrthoVenn comparative genome analysis indicated that Bacillus sp. BNPI-92 shared 2930 gene clusters (core genes) with B. cereus ATCC 14579T (AE016877), B. paranthracis Mn5T (MACE01000012), B. thuringiensis ATCC 10792T (ACNF01000156), and B. anthracis Ames (AE016879). For our strain, the largest shared gene-cluster set (190) was with B. cereus ATCC 14579T (AE016877). In OrthoVenn pairwise analysis, the greatest number of overlapping gene clusters (5414) was detected between Bacillus sp. BNPI-92 and B. cereus ATCC 14579T. Average nucleotide identity (ANI; OriginalANI and OrthoANI), in silico DNA-DNA hybridization (isDDH), the Type (Strain) Genome Server (TYGS), and the Genome-to-Genome Distance Calculator (GGDC) all showed Bacillus sp. BNPI-92 to be most closely related to B. cereus ATCC 14579T. Therefore, based on the combined RAST annotation, OrthoVenn, ANI, and isDDH results, Bacillus sp. BNPI-92 was confidently identified as B. cereus and designated B. cereus strain BNPI-92. In the whole-genome sequence of B. cereus BNPI-92, the PHA biosynthesis genes phaP, phaQ, phaR (encoding the PHA synthesis repressor), phaB/phbB, and phaC were predicted to lie in a single operon, designated phaPQRBC, whereas phaA was located outside this operon.
CONCLUSIONS Comparative genomic analysis indicated that this newly obtained isolate is a new strain, and it is a potential candidate for PHA biosynthesis.
Affiliation(s)
- Seid Mohammed Ebu
- Department of Applied Biology, SoANS, Adama Science and Technology University, Oromia, Ethiopia.
- Lopamudra Ray
- School of Law, Campus-16; Adjunct Faculty, School of Biotech, Campus-11, KIIT University, Bhubaneswar, Odisha, 751024, India
- Ananta N Panda
- School of Biotechnology, Campus-11 KIIT University, Bhubaneswar, Odisha, 751024, India
- Sudhansu K Gouda
- School of Biotechnology, Campus-11 KIIT University, Bhubaneswar, Odisha, 751024, India
7
Manzano-Morales S, Liu Y, González-Bodí S, Huerta-Cepas J, Iranzo J. Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses. Genome Biol 2023; 24:250. [PMID: 37904249 PMCID: PMC10614367 DOI: 10.1186/s13059-023-03089-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023]
Abstract
BACKGROUND A key step for comparative genomics is to group open reading frames into functionally and evolutionarily meaningful gene clusters. Gene clustering is complicated by intraspecific duplications and horizontal gene transfers that are frequent in prokaryotes. In consequence, gene clustering methods must deal with a trade-off between identifying vertically transmitted representatives of multicopy gene families, which are recognizable by synteny conservation, and retrieving complete sets of species-level orthologs. We studied the implications of adopting homology, orthology, or synteny conservation as formal criteria for gene clustering by performing comparative analyses of 125 prokaryotic pangenomes. RESULTS Clustering criteria affect pangenome functional characterization, core genome inference, and reconstruction of ancestral gene content to different extents. Species-wise estimates of pangenome and core genome sizes change by the same factor when using different clustering criteria, allowing robust cross-species comparisons regardless of the clustering criterion. However, cross-species comparisons of genome plasticity and functional profiles are substantially affected by inconsistencies among clustering criteria. Such inconsistencies are driven not only by mobile genetic elements, but also by genes involved in defense, secondary metabolism, and other accessory functions. In some pangenome features, the variability attributed to methodological inconsistencies can even exceed the effect sizes of ecological and phylogenetic variables. CONCLUSIONS Choosing an appropriate criterion for gene clustering is critical to conduct unbiased pangenome analyses. We provide practical guidelines to choose the right method depending on the research goals and the quality of genome assemblies, and a benchmarking dataset to assess the robustness and reproducibility of future comparative studies.
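The abstract compares pangenome and core genome sizes obtained under different clustering criteria. As a minimal sketch, assuming genes have already been grouped into clusters by one such criterion, both sizes can be read off a genome-to-cluster presence table: the pangenome is every cluster found in at least one genome, and the core is the set of clusters found in at least a chosen number of genomes (all of them by default). The function name, the `min_genomes` parameter and the example IDs are hypothetical.

```python
# Sketch: pangenome, core and accessory sizes from a presence/absence table.
# Assumptions, not from the study: genes are already clustered by some
# criterion (homology, orthology or synteny); cluster and genome IDs are
# hypothetical; "core" means present in at least `min_genomes` genomes
# (default: all genomes, i.e. the strict core).
from collections import Counter
from typing import Optional


def pangenome_summary(
    presence: dict[str, set[str]], min_genomes: Optional[int] = None
) -> dict[str, int]:
    """Summarize a genome -> {cluster IDs} mapping into pangenome counts."""
    if min_genomes is None:
        min_genomes = len(presence)  # strict core: present in every genome
    # Number of genomes in which each gene cluster occurs.
    counts = Counter(c for clusters in presence.values() for c in clusters)
    core = sum(1 for k in counts.values() if k >= min_genomes)
    return {
        "pangenome": len(counts),          # clusters seen in >= 1 genome
        "core": core,
        "accessory": len(counts) - core,   # every cluster outside the core
        "singletons": sum(1 for k in counts.values() if k == 1),
    }


if __name__ == "__main__":
    presence = {
        "genome1": {"c1", "c2", "c3", "c4"},
        "genome2": {"c1", "c2", "c3", "c5"},
        "genome3": {"c1", "c2", "c6"},
    }
    print(pangenome_summary(presence))
    print(pangenome_summary(presence, min_genomes=2))
```

Running the same summary on tables produced by different clustering criteria is one simple way to see how the choice of criterion shifts these counts.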
Affiliation(s)
- Saioa Manzano-Morales
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Barcelona Supercomputing Centre (BSC-CNS) - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Yang Liu
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
- Sara González-Bodí
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain.
- Jaime Iranzo
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain.
- Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain.
8
Dobbelaere J, Su TY, Erdi B, Schleiffer A, Dammermann A. A phylogenetic profiling approach identifies novel ciliogenesis genes in Drosophila and C. elegans. EMBO J 2023; 42:e113616. [PMID: 37317646 PMCID: PMC10425847 DOI: 10.15252/embj.2023113616] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/26/2023] [Revised: 05/22/2023] [Accepted: 06/01/2023] [Indexed: 06/16/2023]
Abstract
Cilia are cellular projections that perform sensory and motile functions in eukaryotic cells. A defining feature of cilia is that they are evolutionarily ancient, yet not universally conserved. In this study, we have used the resulting presence and absence pattern in the genomes of diverse eukaryotes to identify a set of 386 human genes associated with cilium assembly or motility. Comprehensive tissue-specific RNAi in Drosophila and mutant analysis in C. elegans revealed signature ciliary defects for 70-80% of novel genes, a percentage similar to that for known genes within the cluster. Further characterization identified different phenotypic classes, including a set of genes related to the cartwheel component Bld10/CEP135 and two highly conserved regulators of cilium biogenesis. We propose this dataset defines the core set of genes required for cilium assembly and motility across eukaryotes and presents a valuable resource for future studies of cilium biology and associated disorders.
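Phylogenetic profiling, as described in this abstract, exploits the pattern of gene presence and absence across genomes. The sketch below illustrates only the general idea, not the authors' pipeline: each gene is a 0/1 vector over the same ordered set of genomes, and candidates are ranked by Jaccard similarity to a reference profile, for example that of a known ciliary gene. The similarity measure, the genome layout and the gene names are assumptions.

```python
# Sketch of phylogenetic profiling by presence/absence similarity.
# Assumptions, not the authors' pipeline: each gene is a 0/1 vector over the
# same ordered list of genomes, Jaccard similarity is the score, and all gene
# names and profiles are hypothetical.
from typing import Sequence


def jaccard(p: Sequence[int], q: Sequence[int]) -> float:
    """Shared presences divided by presences in either profile."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def rank_by_profile(
    reference: Sequence[int], profiles: dict[str, Sequence[int]]
) -> list[tuple[str, float]]:
    """Rank genes by similarity to the reference profile (ties by name)."""
    scores = {gene: jaccard(reference, prof) for gene, prof in profiles.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Columns: six hypothetical genomes; 1 = gene present, 0 = absent.
    # Genomes 1, 2, 3 and 5 are assumed to be ciliated.
    reference = (1, 1, 1, 0, 1, 0)
    profiles = {
        "candidate1": (1, 1, 1, 0, 1, 0),
        "candidate2": (1, 1, 0, 0, 1, 0),
        "candidate3": (1, 1, 1, 1, 1, 1),
        "candidate4": (0, 0, 0, 1, 0, 1),
    }
    for gene, score in rank_by_profile(reference, profiles):
        print(f"{gene}\t{score:.2f}")
```

Jaccard similarity ignores genomes in which both genes are absent, so a candidate is not rewarded merely for missing from the same non-ciliated genomes; a plain match count would treat those joint absences as agreement.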
Affiliation(s)
- Jeroen Dobbelaere
- Max Perutz Labs, University of Vienna, Vienna Biocenter (VBC), Vienna, Austria
- Tiffany Y Su
- Max Perutz Labs, University of Vienna, Vienna Biocenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Balazs Erdi
- Max Perutz Labs, University of Vienna, Vienna Biocenter (VBC), Vienna, Austria
- Alexander Schleiffer
- Research Institute of Molecular Pathology, Vienna Biocenter (VBC), Vienna, Austria
- Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Vienna Biocenter (VBC), Vienna, Austria
9
Lyubetsky VA, Rubanov LI, Tereshina MB, Ivanova AS, Araslanova KR, Uroshlev LA, Goremykina GI, Yang JR, Kanovei VG, Zverkov OA, Shitikov AD, Korotkova DD, Zaraisky AG. Wide-scale identification of novel/eliminated genes responsible for evolutionary transformations. Biol Direct 2023; 18:45. [PMID: 37568147 PMCID: PMC10416458 DOI: 10.1186/s13062-023-00405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 08/07/2023] [Indexed: 08/13/2023]
Abstract
BACKGROUND It is generally accepted that most evolutionary transformations at the phenotype level are associated either with rearrangements of genomic regulatory elements, which control the activity of gene networks, or with changes in the amino acid sequences of proteins. Recently, evidence has accumulated that significant evolutionary transformations could also be associated with the loss/emergence of whole genes. The targeted identification of such genes is a challenging problem for both bioinformatics and evo-devo research. RESULTS To solve this problem, we propose the WINEGRET method, named after the first letters of the title. Its main idea is to search for genes that satisfy two requirements: first, the desired genes were lost/emerged at the same evolutionary stage at which the phenotypic trait of interest was lost/emerged, and second, the expression of these genes changes significantly during the development of the trait of interest in the model organism. To verify the first requirement, we do not use existing databases of orthologs, but rely purely on gene homology and local synteny, using novel, quickly computable conditions. Genes satisfying the second requirement are found by deep RNA sequencing. As a proof of principle, we used our method to find genes absent in extant amniotes (reptiles, birds, mammals) but present in anamniotes (fish and amphibians), in which these genes are involved in the regeneration of large body appendages. As a result, 57 genes were identified. For three of them, c-c motif chemokine 4, eotaxin-like, and a previously unknown gene called here sod4, essential roles in tail regeneration were demonstrated. Notably, we established that the latter gene belongs to a novel family of Cu/Zn-superoxide dismutases lost by amniotes, SOD4. CONCLUSIONS We present a method for targeted identification of genes whose loss/emergence in evolution could be associated with the loss/emergence of a phenotypic trait of interest.
In a proof-of-principle study, we identified genes absent in amniotes that participate in body appendage regeneration in anamniotes. Our method provides a wide range of opportunities for studying the relationship between the loss/emergence of phenotypic traits and the loss/emergence of specific genes in evolution.
Affiliation(s)
- Vassily A Lyubetsky
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Department of Mechanics and Mathematics, Lomonosov Moscow State University, Kolmogorova Str., 1, Moscow, Russia, 119234
- Lev I Rubanov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Maria B Tereshina
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Pirogov Russian National Research Medical University, Moscow, Russia
- Anastasiya S Ivanova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, USA
- Karina R Araslanova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Leonid A Uroshlev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32, Vavilova Str., Moscow, Russia, 119991
- Galina I Goremykina
- Plekhanov Russian University of Economics, Stremyanny Lane 36, Moscow, Russia
- Jian-Rong Yang
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Vladimir G Kanovei
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Oleg A Zverkov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Alexander D Shitikov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Daria D Korotkova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Global Health Institute, School of Life Sciences, EPFL, Lausanne, Switzerland
- Andrey G Zaraisky
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997.
- Pirogov Russian National Research Medical University, Moscow, Russia.
10
Xiong W, Risse J, Berke L, Zhao T, van de Geest H, Oplaat C, Busscher M, Ferreira de Carvalho J, van der Meer IM, Verhoeven KJF, Schranz ME, Vijverberg K. Phylogenomic analysis provides insights into MADS-box and TCP gene diversification and floral development of the Asteraceae, supported by de novo genome and transcriptome sequences from dandelion (Taraxacum officinale). Front Plant Sci 2023; 14:1198909. [PMCID: PMC10338227 DOI: 10.3389/fpls.2023.1198909] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/02/2023] [Accepted: 05/26/2023] [Indexed: 07/15/2023]
Abstract
The Asteraceae is the largest angiosperm family, with more than 25,000 species. Individual studies have shown that MADS-box and TCP transcription factors are regulators of the development and symmetry of flowers, contributing to their iconic flower-head (capitulum) and floret. However, a systematic study of MADS-box and TCP genes across the Asteraceae is lacking. We performed a comparative analysis of genome sequences of 33 angiosperm species including our de novo assembly of diploid sexual dandelion (Taraxacum officinale) and 11 other Asteraceae to investigate the lineage-specific evolution of MADS-box and TCP genes in the Asteraceae. We compared the phylogenomic results of MADS-box and TCP genes with their expression in T. officinale floral tissues at different developmental stages to demonstrate the regulation of genes with Asteraceae-specific attributes. Here, we show that MADS-box MIKCc and TCP-CYCLOIDEA (CYC) genes have expanded in the Asteraceae. The phylogenomic analysis identified AGAMOUS-like (AG-like: SEEDSTICK [STK]-like), SEPALLATA-like (SEP3-like), and TCP-PROLIFERATING CELL FACTOR (PCF)-like copies with lineage-specific genomic contexts in the Asteraceae, Cichorioideae, or dandelion. Different expression patterns of some of these gene copies suggest functional divergence. We also confirm the presence and revisit the evolutionary history of previously named “Asteraceae-Specific MADS-box genes (AS-MADS).” Specifically, we identify non-Asteraceae homologs, indicating a more ancient origin of this gene clade. Syntenic relationships support that AS-MADS is paralogous to FLOWERING LOCUS C (FLC) as demonstrated by the shared ancient duplication of FLC and SEP3.
Affiliation(s)
- Wei Xiong
- Biosystematics Group, Wageningen University and Research, Wageningen, Netherlands
- Judith Risse
- Bioinformatics Group, Wageningen University and Research, Wageningen, Netherlands
- Department of Terrestrial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, Netherlands
- Lidija Berke
- Biosystematics Group, Wageningen University and Research, Wageningen, Netherlands
- Tao Zhao
- Biosystematics Group, Wageningen University and Research, Wageningen, Netherlands
- Carla Oplaat
- Biosystematics Group, Wageningen University and Research, Wageningen, Netherlands
- Marco Busscher
- Biosystematics Group, Wageningen University and Research, Wageningen, Netherlands
- Bioscience, Wageningen University and Research, Wageningen, Netherlands
- Julie Ferreira de Carvalho
- Department of Terrestrial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, Netherlands
- Koen J. F. Verhoeven
- Department of Terrestrial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, Netherlands
- M. Eric Schranz
- Biosystematics Group, Wageningen University and Research, Wageningen, Netherlands
| | - Kitty Vijverberg
- Biosystematics Group, Wageningen University and Research, Wageningen, Netherlands
11
Watanabe T, Kure A, Horiike T. OrthoPhy: A Program to Construct Ortholog Data Sets Using Taxonomic Information. Genome Biol Evol 2023; 15:7044703. [PMID: 36799928 PMCID: PMC9991595 DOI: 10.1093/gbe/evad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 01/30/2023] [Accepted: 02/13/2023] [Indexed: 02/18/2023] Open
Abstract
Species phylogenetic trees represent the evolutionary processes of organisms and are fundamental in evolutionary research. Therefore, new methods have been developed to obtain more reliable species phylogenetic trees. A highly reliable approach is to construct an ortholog data set from gene sequence information and then use it to infer the species phylogenetic tree. However, although methods for constructing ortholog data sets for species phylogenetic analysis have been developed, they cannot remove some paralogs, and removing them is necessary for reliable species phylogenetic inference. To address this limitation, we developed OrthoPhy, a program that excludes paralogs and constructs highly accurate ortholog data sets using taxonomic information that divides the analyzed species into monophyletic groups. OrthoPhy removes paralogs by detecting inconsistencies between this taxonomic information and the phylogenetic trees of candidate ortholog groups clustered by sequence similarity. Performance tests using evolutionarily simulated sequences and real sequences of 40 bacteria revealed that the precision of ortholog inference by OrthoPhy is higher than that of existing programs. Additionally, species phylogenies were more accurate when inferred from ortholog data sets constructed by OrthoPhy than from data sets constructed by existing programs. Furthermore, in the Quest for Orthologs benchmark using real sequence data, the concordance rate between the phylogenetic trees of orthologs inferred by OrthoPhy and the species trees was higher than the rates obtained with other ortholog inference programs.
Therefore, ortholog data sets constructed using OrthoPhy enabled more accurate species phylogenetic analysis than those constructed using existing programs, and OrthoPhy can be used for species phylogenetic analysis even for distantly related species that have experienced many evolutionary events.
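The paralog-filtering idea above can be illustrated with one simple consistency test: do the genes from each taxonomic group form a single clade in a candidate ortholog group's tree? The sketch below is not OrthoPhy's code or its exact criterion. It assumes the gene tree is already rooted and encoded as nested tuples (a leaf is a gene ID string), and the names `leaf_set`, `clade_sets` and `inconsistent_groups` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a gene ID, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the gene IDs found under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clade_sets(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every subtree, including the whole (assumed rooted) tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clade_sets(child)


def inconsistent_groups(
    tree: Tree,
    gene_to_species: Dict[str, str],
    taxon_groups: Dict[str, Set[str]],
) -> List[str]:
    """Return the names of taxonomic groups whose genes do not form one clade."""
    genes_in_tree = leaf_set(tree)
    clades = set(clade_sets(tree))
    flagged = []
    for group_name, species in taxon_groups.items():
        members = frozenset(g for g in genes_in_tree if gene_to_species[g] in species)
        # A group represented by 0 or 1 gene cannot contradict the tree.
        if len(members) > 1 and members not in clades:
            flagged.append(group_name)
    return flagged


genes = {"a1": "A", "b1": "B", "c1": "C", "d1": "D", "e1": "E"}
groups = {"G1": {"A", "B", "C"}, "G2": {"D", "E"}}
consistent = ((("a1", "b1"), "c1"), ("d1", "e1"))
conflicting = (("a1", "d1"), (("b1", "c1"), "e1"))
print(inconsistent_groups(consistent, genes, groups))   # []
print(inconsistent_groups(conflicting, genes, groups))  # ['G1', 'G2']
```

A second gene from species A (a paralog) nested inside the D/E clade would be flagged the same way, because G1's genes would no longer form one clade. Handling unrooted trees and deciding which genes to drop are left out of this sketch.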
Affiliation(s)
- Tomoaki Watanabe
- United Graduate School of Agricultural Science, Gifu University, Gifu, Japan
- Akinori Kure
- Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Tokumasa Horiike
- Department of Bioresource Sciences, Shizuoka University, Shizuoka, Japan
12
Yu C, Chen H, Zhu L, Song Y, Jiang Q, Zhang Y, Ali Q, Gu Q, Gao X, Borriss R, Dong S, Wu H. Profiling of Antimicrobial Metabolites Synthesized by the Endophytic and Genetically Amenable Biocontrol Strain Bacillus velezensis DMW1. Microbiol Spectr 2023; 11:e0003823. [PMID: 36809029 PMCID: PMC10100683 DOI: 10.1128/spectrum.00038-23] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/04/2023] [Accepted: 01/26/2023] [Indexed: 02/23/2023] Open
Abstract
The genus Bacillus is one of the most important genera for the biological control of plant diseases caused by various phytopathogens. The endophytic Bacillus strain DMW1 was isolated from the inner tissues of potato tubers and exhibited strong biocontrol activity. Based on its whole-genome sequence, DMW1 belongs to the species Bacillus velezensis and is similar to the model strain B. velezensis FZB42. Twelve secondary metabolite biosynthetic gene clusters (BGCs), including two of unknown function, were detected in the DMW1 genome. The strain was shown to be genetically amenable, and seven secondary metabolites acting antagonistically against plant pathogens were identified by a combined genetic and chemical approach. Strain DMW1 significantly improved the growth of tomato and soybean seedlings and was able to control Phytophthora sojae and Ralstonia solanacearum present in the seedlings. Owing to these properties, the endophytic strain DMW1 appears to be a promising candidate for comparative investigations alongside the Gram-positive model rhizobacterium FZB42, which can colonize only the rhizoplane.
IMPORTANCE Phytopathogens are responsible for the wide spread of plant diseases and for great losses in crop yields. Current strategies to control plant disease, including the development of resistant cultivars and chemical control, may become ineffective because of the adaptive evolution of pathogens. The use of beneficial microorganisms to deal with plant diseases has therefore attracted great attention. In the present study, a new strain, DMW1, belonging to the species B. velezensis, was discovered with outstanding biocontrol properties. Under greenhouse conditions, it showed plant growth promotion and disease control abilities comparable with those of B. velezensis FZB42. Genomic and bioactive-metabolite analyses detected genes responsible for promoting plant growth and identified metabolites with different antagonistic activities. Our data provide a basis for further developing and applying DMW1 as a biopesticide, similar to the closely related model strain FZB42.
Affiliation(s)
- Chenjie Yu
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Han Chen
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Linli Zhu
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Yan Song
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Qifan Jiang
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Yaming Zhang
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Qurban Ali
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Qin Gu
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Xuewen Gao
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Rainer Borriss
- Humboldt University Berlin, Institut für Biologie, Berlin, Germany
- Suomeng Dong
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
- Huijun Wu
- Department of Plant Pathology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Ministry of Education, Nanjing, China
13
Liu K, Chen Q, Huang GH. An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF. Genes (Basel) 2023; 14:421. [PMID: 36833348 PMCID: PMC9957060 DOI: 10.3390/genes14020421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 02/10/2023] Open
Abstract
Gene families, which are part of a genome's information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on the characteristics of gene families, such as function, homology, or phenotype. However, statistical and correlation analyses of the distribution of gene family members in the genome have yet to be conducted. Here, a novel framework incorporating gene family analysis and genome selection based on NMF-ReliefF is reported. Specifically, the proposed method first obtains gene families from the TreeFam database and determines the number of gene families within the feature matrix. Then, NMF-ReliefF, a new feature selection algorithm that overcomes the inefficiencies of traditional methods, is used to select features from the gene feature matrix. Finally, a support vector machine performs classification on the selected features. The framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also used four microarray gene data sets to evaluate the performance of the NMF-ReliefF algorithm. The results suggest that the proposed method may strike a balance between robustness and discrimination. Additionally, the classification performance of the proposed method is superior to that of state-of-the-art feature selection approaches.
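The ReliefF half of the feature-selection step can be illustrated with the classic multi-class ReliefF weighting scheme. This is a sketch under stated assumptions, not the authors' NMF-ReliefF: it omits the NMF step entirely, uses every sample as a query (no random subsampling), measures Manhattan distance on min-max-scaled features, and takes `k` nearest hits and misses per class. The function name `relieff_weights` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def relieff_weights(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 1) -> List[float]:
    """Classic ReliefF feature weights; a higher weight means more class-discriminative."""
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale each difference into [0, 1] (constant features -> 1.0).
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j: int, a: int, b: int) -> float:
        return abs(X[a][j] - X[b][j]) / span[j]

    def dist(a: int, b: int) -> float:
        return sum(diff(j, a, b) for j in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        # Other samples grouped by class label, nearest first.
        others = [o for o in range(n) if o != i]
        neighbours: Dict[int, List[int]] = {}
        for o in sorted(others, key=lambda o: dist(i, o)):
            neighbours.setdefault(y[o], []).append(o)
        hits = neighbours.get(y[i], [])[:k]
        for j in range(d):
            # Differences to the nearest same-class samples lower the weight.
            if hits:
                w[j] -= sum(diff(j, i, h) for h in hits) / (len(hits) * n)
            # Differences to the nearest other-class samples raise it,
            # each miss class weighted by its prior probability.
            for c, members in neighbours.items():
                if c == y[i]:
                    continue
                misses = members[:k]
                scale = prior[c] / (1.0 - prior[y[i]])
                w[j] += scale * sum(diff(j, i, m) for m in misses) / (len(misses) * n)
    return w


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0, 5], [0, 1], [1, 4], [1, 2]]
y = [0, 0, 1, 1]
print(relieff_weights(X, y, k=1))  # [1.0, -0.5]
```

In a pipeline of this shape, features would be ranked by weight and the top-ranked ones passed to the classifier; how NMF-ReliefF combines this ranking with the NMF factors is not stated in the abstract.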
Affiliation(s)
- Kai Liu
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Qi Chen
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- Guo-Hua Huang
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
14
Iqbal S, Qasim M, Rahman H, Khan N, Paracha RZ, Bhatti MF, Javed A, Janjua HA. Genome mining, antimicrobial and plant growth-promoting potentials of halotolerant Bacillus paralicheniformis ES-1 isolated from salt mine. Mol Genet Genomics 2023; 298:79-93. [PMID: 36301366 DOI: 10.1007/s00438-022-01964-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 10/11/2022] [Indexed: 01/10/2023]
Abstract
Salinity severely affects crop yield by hindering nitrogen uptake and reducing plant growth. Plant growth-promoting bacteria (PGPB) can provide cross-protection against biotic/abiotic stresses and facilitate plant growth. Genome-level knowledge of PGPB is necessary to translate them into products such as efficient biofertilizers and biocontrol agents. The current study aimed to isolate and characterize indigenous plant growth-promoting strains with the potential to promote plant growth under various stress conditions. To this end, 72 bacterial strains were isolated from various saline-sodic soils/lakes; 19 exhibited multiple in vitro plant growth-promoting traits, including indole-3-acetic acid production, phosphate solubilization, siderophore synthesis, lytic enzyme production, biofilm formation, and antibacterial activity. To gain in-depth insight into genome composition and diversity, whole-genome sequencing and genome mining of one promising strain, Bacillus paralicheniformis ES-1, were performed. The ES-1 genome carries 12 biosynthetic gene clusters, at least six genomic islands, and four prophage regions. Genome mining identified genes conferring plant growth-promoting traits, such as phosphate solubilization, nitrogen fixation, tryptophan production, siderophore, acetoin, butanediol, chitinase, and hydrogen sulfate synthesis, chemotaxis, and motility. Comparative genome analysis indicates regions of genome plasticity that shape the structure and function of B. paralicheniformis and play a crucial role in habitat adaptation. Strain ES-1 has a relatively large accessory genome of 649 genes (~19%) and 180 unique genes. Overall, these results provide valuable insight into the bioactivity and genomics of B. paralicheniformis strain ES-1 and its potential use in sustainable agriculture.
Affiliation(s)
- Sajid Iqbal
- Department of Industrial Biotechnology, Atta-Ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
- Muhammad Qasim
- Department of Microbiology, Kohat University of Science and Technology (KUST), Kohat, Pakistan
- Hazir Rahman
- Department of Microbiology, Abdul Wali Khan University Mardan (AWKUM), Mardan, Pakistan
- Naeem Khan
- Department of Agronomy, University of Florida, Gainesville, FL, 32611, USA
- Rehan Zafar Paracha
- School of Interdisciplinary Engineering and Science (SINES), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
- Muhammad Faraz Bhatti
- Department of Plant Biotechnology, Atta-Ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
- Aneela Javed
- Department of Healthcare Biotechnology, Atta-Ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
- Hussnain Ahmed Janjua
- Department of Industrial Biotechnology, Atta-Ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan.
15
Conant GC. POInT: Modeling Polyploidy in the Era of Ubiquitous Genomics. Methods Mol Biol 2023; 2545:77-90. [PMID: 36720808 DOI: 10.1007/978-1-0716-2561-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Thirteen years ago, we described an evolutionary modeling tool that could resolve the orthology relationships among the homologous genomic regions created by a whole-genome duplication. This tool, which we subsequently named POInT (the Polyploid Orthology Inference Tool), was originally useful only for studying a genome duplication known from baker's yeast and its relatives. Now that hundreds of genome sequences containing the relicts of ancient polyploidy are available, POInT can be used to study dozens of different polyploidies, asking questions both about the history of individual events and about the commonalities and differences among those events. In this chapter, I give a brief history of the development of POInT as an illustration of the interconnected nature of computational biology research. I then describe how POInT operates and some of the strengths and drawbacks of its structure. I close with a few examples of discoveries we have made using it.
Affiliation(s)
- Gavin C Conant
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA.
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
- Program in Genetics, North Carolina State University, Raleigh, NC, USA.
16
Duan G, Wu G, Chen X, Tian D, Li Z, Sun Y, Du Z, Hao L, Song S, Gao Y, Xiao J, Zhang Z, Bao Y, Tang B, Zhao W. HGD: an integrated homologous gene database across multiple species. Nucleic Acids Res 2022; 51:D994-D1002. [PMID: 36318261 PMCID: PMC9825607 DOI: 10.1093/nar/gkac970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 09/28/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022] Open
Abstract
Homology is fundamental to inferring genes' evolutionary processes and their relationships through shared ancestry. Existing homologous gene resources vary in inference methods, types of homologous relationships and identifiers, which inevitably makes it difficult to choose among them and to map homology results from one to another. Here, we present HGD (Homologous Gene Database, https://ngdc.cncb.ac.cn/hgd), a comprehensive homolog resource integrating multiple species, resources and omics data types, which complements existing resources by providing a public, one-stop data service. Currently, HGD houses a total of 112 383 644 homologous pairs for 37 species, including 19 animals, 16 plants and 2 microorganisms. HGD also integrates various annotations from public resources, including 16 909 homologs with traits, 276 670 homologs with variants, 398 573 homologs with expression and 536 852 homologs with gene ontology (GO) annotations. HGD provides a wide range of omics-based gene function annotations to help users gain a deeper understanding of gene function.
Affiliation(s)
- Xiaoning Chen
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Dongmei Tian
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Zhaohua Li
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yanling Sun
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Zhenglin Du
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Lili Hao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Shuhui Song
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yuan Gao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jingfa Xiao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yiming Bao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Bixia Tang
- Correspondence may also be addressed to Bixia Tang.
- Wenming Zhao
- To whom correspondence should be addressed. Tel: +86 1084097636; Fax: +86 1084097720.
17
Magid M, Wold JR, Moraga R, Cubrinovska I, Houston DM, Gartrell BD, Steeves TE. Leveraging an existing whole-genome resequencing population data set to characterize toll-like receptor gene diversity in a threatened bird. Mol Ecol Resour 2022; 22:2810-2825. [PMID: 35635119 PMCID: PMC9543821 DOI: 10.1111/1755-0998.13656] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/10/2022] [Revised: 04/29/2022] [Accepted: 05/26/2022] [Indexed: 11/27/2022]
Abstract
Species recovery programs are increasingly using genomic data to measure neutral genetic diversity and calculate metrics such as relatedness. While these measures can inform conservation management, determining the mechanisms underlying inbreeding depression requires information about functional genes associated with adaptive or maladaptive traits. Toll-like receptors (TLRs) are one family of functional genes that play a crucial role in pathogen recognition and activation of the immune system. Previously, these genes were analysed using species-specific primers and PCR. Here, we leverage an existing short-read reference genome, a whole-genome resequencing population data set, and bioinformatic tools to characterize TLR gene diversity in captive and wild tchūriwat'/tūturuatu/shore plover (Thinornis novaeseelandiae), a threatened bird endemic to Aotearoa New Zealand. Our results show that TLR gene diversity in tchūriwat'/tūturuatu is low and forms two distinct genetic clusters, one captive and one wild. The bioinformatic approach presented here has broad applicability to other threatened species with existing genomic resources in Aotearoa New Zealand and beyond.
Affiliation(s)
- Molly Magid
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Jana R. Wold
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Roger Moraga
- Tea Break Bioinformatics, Ltd, Palmerston North, New Zealand
- Ilina Cubrinovska
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Dave M. Houston
- Department of Conservation, Biodiversity Group, Auckland, New Zealand
- Brett D. Gartrell
- Institute of Veterinary, Animal, and Biomedical Sciences, Wildbase, Massey University, Palmerston North, New Zealand
- Tammy E. Steeves
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
18
Ncube D, Tallafuss A, Serafin J, Bruckner J, Farnsworth DR, Miller AC, Eisen JS, Washbourne P. A conserved transcriptional fingerprint of multi-neurotransmitter neurons necessary for social behavior. BMC Genomics 2022; 23:675. [PMID: 36175871 PMCID: PMC9523972 DOI: 10.1186/s12864-022-08879-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/11/2022] [Accepted: 09/02/2022] [Indexed: 11/11/2022] Open
Abstract
Background: An essential determinant of a neuron’s functionality is its neurotransmitter phenotype. We previously identified a defined subpopulation of cholinergic neurons required for social orienting behavior in zebrafish.
Results: We transcriptionally profiled these neurons and discovered that they are capable of synthesizing both acetylcholine and GABA. We also established a constellation of transcription factors and neurotransmitter markers that can be used as a “transcriptomic fingerprint” to recognize a homologous neuronal population in another vertebrate.
Conclusion: Our results suggest that this transcriptomic fingerprint and the cholinergic-GABAergic neuronal subtype that it defines are evolutionarily conserved.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08879-w.
Affiliation(s)
- Denver Ncube
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR, 97403, USA
- Alexandra Tallafuss
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR, 97403, USA
- Jen Serafin
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR, 97403, USA
- Joseph Bruckner
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR, 97403, USA
- Dylan R Farnsworth
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR, 97403, USA
- Adam C Miller
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR, 97403, USA
- Judith S Eisen
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR, 97403, USA
- Philip Washbourne
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR, 97403, USA.
19
Benndorf R, Velazquez R, Zehr JD, Pond SLK, Martin JL, Lucaci AG. Human HspB1, HspB3, HspB5 and HspB8: Shaping these disease factors during vertebrate evolution. Cell Stress Chaperones 2022; 27:309-323. [PMID: 35678958 PMCID: PMC9346038 DOI: 10.1007/s12192-022-01268-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/09/2022] [Revised: 03/21/2022] [Accepted: 03/22/2022] [Indexed: 12/05/2022] Open
Abstract
Small heat shock proteins (sHSPs) emerged early in evolution and occur in all domains of life and in nearly all species, including humans. Mutations in four sHSPs (HspB1, HspB3, HspB5, HspB8) are associated with neuromuscular disorders. The aim of this study is to investigate the evolutionary forces shaping these sHSPs during vertebrate evolution. We performed comparative evolutionary analyses on a set of orthologous sHSP sequences, based on the ratio of non-synonymous to synonymous substitution rates for each codon. We found that these sHSPs had historically been exposed to different degrees of purifying selection, decreasing in this order: HspB8 > HspB1, HspB5 > HspB3. Within each sHSP, regions with different degrees of purifying selection can be discerned, resulting in characteristic selective pressure profiles. The conserved α-crystallin domains were exposed to more stringent purifying selection than the flanking regions, supporting a 'dimorphic pattern' of evolution. Thus, during vertebrate evolution, the different sequence partitions were exposed to different and measurable degrees of selective pressure. Among the disease-associated mutations, most are missense mutations, primarily in HspB1 and to a lesser extent in the other sHSPs. Our data provide an explanation for this disparate incidence. Contrary to expectation, most missense mutations cause dominant disease phenotypes. Theoretical considerations support a connection between the historic exposure of these sHSP genes to a high degree of purifying selection and the unusual prevalence of genetic dominance among the associated disease phenotypes. Our study places the genetics of inheritable sHSP-borne diseases in the context of vertebrate evolution.
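For reference, the selection measure named in the abstract is conventionally written as the ratio below, where $d_N$ and $d_S$ are the non-synonymous and synonymous substitution rates (estimated per codon site in this study). The interpretation bands are the standard textbook ones, not thresholds reported by the authors.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

On this scale, stronger purifying selection corresponds to values of $\omega$ further below 1.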
Affiliation(s)
- Ryan Velazquez
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122 USA
- Jordan D. Zehr
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122 USA
- Sergei L. Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122 USA
- Jody L. Martin
- Cell and Molecular Core, Cardiovascular Research Institute, University of California at Davis, Davis, CA USA
- Alexander G. Lucaci
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122 USA
20
Peng W, Yang Y, Xu J, Peng E, Dai S, Dai L, Wang Y, Yi T, Wang B, Li D, Song N. TALE Transcription Factors in Sweet Orange (Citrus sinensis): Genome-Wide Identification, Characterization, and Expression in Response to Biotic and Abiotic Stresses. Front Plant Sci 2022; 12:814252. [PMID: 35126435 PMCID: PMC8811264 DOI: 10.3389/fpls.2021.814252] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/13/2021] [Accepted: 12/13/2021] [Indexed: 06/14/2023]
Abstract
Three-amino-acid-loop-extension (TALE) transcription factors comprise one of the largest gene families in plants, where they contribute to the regulation of a wide variety of biological processes, including plant growth and development, and govern stress responses. Although sweet orange (Citrus sinensis) is among the most commercially important fruit crops cultivated worldwide, there have been relatively few functional studies on TALE genes in this species. In this study, we investigated 18 CsTALE gene family members with respect to their phylogeny, physicochemical properties, conserved motif/domain sequences, gene structures, chromosomal locations, cis-acting regulatory elements, and protein-protein interactions (PPIs). These CsTALE genes were classified into two subfamilies based on sequence homology and phylogenetic analyses, a classification equally strongly supported by their highly conserved gene structures and motif/domain compositions. CsTALEs were found to be unevenly distributed on the chromosomes, and duplication analysis revealed that segmental duplication and purifying selection have been major driving forces in the evolution of these genes. Expression profile analysis indicated that CsTALE genes exhibit discernible spatial expression patterns across tissues and differing expression patterns in response to different biotic/abiotic stresses. Of the 18 CsTALE genes examined, 10 were found to be responsive to high temperature, four to low temperature, eight to salt, and four to wounding. Moreover, the expression of CsTALE3/8/12/16 was induced in response to infection with the fungal pathogen Diaporthe citri and the bacterial pathogen Candidatus Liberibacter asiaticus, whereas the expression of CsTALE15/17 was strongly suppressed.
The transcriptional activity of CsTALE proteins was also verified in yeast, and yeast two-hybrid assays indicated that the pairs CsTALE3/CsTALE8, CsTALE3/CsTALE11, CsTALE10/CsTALE12, CsTALE14/CsTALE8, and CsTALE14/CsTALE11 can each form heterodimers. These findings could lay the foundation for elucidating the biological functions of the TALE family genes in sweet orange and contribute to the breeding of stress-tolerant plants.
Affiliation(s)
- Weiye Peng
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Yang Yang
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Jing Xu
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Erping Peng
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Suming Dai
- Horticulture College, Hunan Agricultural University, Changsha, China
- National Center for Citrus Improvement Changsha, Changsha, China
- Liangying Dai
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Yunsheng Wang
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Tuyong Yi
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Bing Wang
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Dazhi Li
- Horticulture College, Hunan Agricultural University, Changsha, China
- National Center for Citrus Improvement Changsha, Changsha, China
- Na Song
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
21
Huang LC, Taujale R, Gravel N, Venkat A, Yeung W, Byrne DP, Eyers PA, Kannan N. KinOrtho: a method for mapping human kinase orthologs across the tree of life and illuminating understudied kinases. BMC Bioinformatics 2021; 22:446. [PMID: 34537014 PMCID: PMC8449880 DOI: 10.1186/s12859-021-04358-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/10/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND Protein kinases are among the largest druggable families of signaling proteins and are involved in various human diseases, including cancers and neurodegenerative disorders. Despite their clinical relevance, nearly 30% of the 545 human protein kinases remain highly understudied. Comparative genomics is a powerful approach for predicting and investigating the functions of understudied kinases. However, incomplete knowledge of kinase orthologs across fully sequenced kinomes severely limits the use of comparative genomics to illuminate understudied kinases. Here, we introduce KinOrtho, a query- and graph-based orthology inference method that combines full-length and domain-based approaches to map one-to-one kinase orthologs across 17 thousand species. RESULTS Using multiple metrics, we show that KinOrtho performed better than existing methods in identifying kinase orthologs across evolutionarily divergent species. It also eliminated potential false positives by flagging sequences without a proper kinase domain for further evaluation. We demonstrate the advantage of domain-based approaches for identifying domain fusion events, highlighting a fusion between the understudied serine/threonine kinase TAOK1 and the metabolic kinase PIK3C2A, which are highly co-expressed in human cells. We also identify evolutionary fission events involving the understudied OBSCN kinase domains, further highlighting the value of domain-based orthology inference. Using KinOrtho-defined orthologs, Gene Ontology annotations, and machine learning, we propose putative biological functions for several understudied kinases. These include a role for TP53RK in cell cycle checkpoint(s), the involvement of TSSK3 and TSSK6 in acrosomal vesicle localization, and potential functions for the ULK4 pseudokinase in neuronal development.
CONCLUSIONS In sum, KinOrtho is a novel query-based tool for identifying one-to-one orthologous relationships across thousands of proteomes. We apply it here to identify kinase orthologs and show that the resulting well-curated kinome ortholog set can serve as a valuable resource for illuminating understudied kinases. The KinOrtho framework can be extended to any protein family of interest.
Affiliation(s)
- Liang-Chin Huang
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Rahil Taujale
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Nathan Gravel
- PREP@UGA, University of Georgia, 500 D.W. Brooks Drive, Athens, GA 30602 USA
- Aarya Venkat
- Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Wayland Yeung
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Dominic P. Byrne
- Department of Biochemistry and Systems Biology, University of Liverpool, Crown St, Liverpool, UK
- Patrick A. Eyers
- Department of Biochemistry and Systems Biology, University of Liverpool, Crown St, Liverpool, UK
- Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green St., Athens, GA 30602 USA
22
Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes. Int J Mol Sci 2021; 22:ijms221810019. [PMID: 34576183 PMCID: PMC8468833 DOI: 10.3390/ijms221810019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/27/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 12/15/2022]
Abstract
Functional annotation of genes of unknown function can reveal previously unidentified functions that enhance our understanding of complex genome communications. A common approach for inferring gene function is the ortholog-based method. However, genetic data alone are often insufficient for functional annotation, so integrating other sources of data can increase the chance of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network of human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict orthologs of genes of unknown function. The model's predictions performed well, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for functional inference using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved in muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network with a hybrid deep learning technique can help identify the functions of unknown genes in malaria parasites. The approach is general and could be applied to other diseases, benefiting the broader field of biomedical science.
23
Molecular underpinnings of the early brain developmental response to differential feeding in the honey bee Apis mellifera. Biochim Biophys Acta Gene Regul Mech 2021; 1864:194732. [PMID: 34242825 DOI: 10.1016/j.bbagrm.2021.194732] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/15/2021] [Revised: 06/25/2021] [Accepted: 06/29/2021] [Indexed: 12/14/2022]
Abstract
Differential brain morphogenesis in females is one of the major phenotypic manifestations of caste development in honey bees. Brain diphenism appears at the fourth larval phase as a result of the differential feeding regimes to which developing females are subjected during early larval development. Here, we used a forward genetics approach to test the early molecular response of the brain to differential feeding, which leads to the brain diphenism observed at later developmental phases. Using RNA sequencing, we identified 53 differentially expressed genes (DEGs) between the brains of queens and workers at the third larval phase. Since miRNAs have been suggested to play a role in caste differentiation after horizontal and vertical transmission, we tested their potential participation in regulating the DEGs. The miRNA-mRNA interaction network, including the DEGs and the miRNA populations enriched in royal and worker jelly, revealed a subset of miRNAs potentially involved in regulating DEG expression. The interactions of miR-34, miR-210, and miR-317 with the Takeout, Neurotrophin-1, Forked, and Masquerade genes were experimentally confirmed using a luciferase reporter system. Taken together, our results reconstruct the regulatory network that governs the development of the early brain diphenism in honey bees.
24
Harris CD, Torrance EL, Raymann K, Bobay LM. CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets. Mol Biol Evol 2021; 38:727-734. [PMID: 32886787 PMCID: PMC7826169 DOI: 10.1093/molbev/msaa224] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/03/2023]
Abstract
The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on comparing all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons; instead, it uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Our results indicate that, in addition to being much faster than current methods, our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from: https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the Python library NumPy and either USEARCH or BLAST. Certain options require the programs MUSCLE or MAFFT.
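The core-genome definition above (genes shared by all, or nearly all, strains) can be made concrete with a small presence/absence sketch. This is not CoreCruncher's identity-score heuristic. It assumes that gene families have already been clustered across genomes, and the function name `core_genome` and the `min_fraction` cutoff are hypothetical illustrations.

```python
from typing import Dict, Iterable, Set


def core_genome(
    presence: Dict[str, Set[str]],
    genomes: Iterable[str],
    min_fraction: float = 0.95,
) -> Set[str]:
    """Return gene families found in at least `min_fraction` of the genomes.

    `presence` maps a (hypothetical, pre-computed) gene-family ID to the set of
    genome IDs that carry it. min_fraction=1.0 gives a strict core ("all
    strains"); values slightly below 1.0 give a soft core ("nearly all").
    """
    genome_set = set(genomes)
    if not genome_set:
        raise ValueError("at least one genome is required")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    n = len(genome_set)
    core = set()
    for family, carriers in presence.items():
        # Only count carriers that belong to the genome set under study.
        if len(carriers & genome_set) / n >= min_fraction:
            core.add(family)
    return core


if __name__ == "__main__":
    families = {
        "famA": {"g1", "g2", "g3", "g4"},
        "famB": {"g1", "g2", "g3"},
        "famC": {"g1"},
    }
    strains = ["g1", "g2", "g3", "g4"]
    print(sorted(core_genome(families, strains, min_fraction=1.0)))   # ['famA']
    print(sorted(core_genome(families, strains, min_fraction=0.75)))  # ['famA', 'famB']
```

A strict core (`min_fraction=1.0`) discards any family missing from even one genome, for example because of an incomplete draft assembly. This is one common reason relaxed "nearly all" thresholds are used instead.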
Affiliation(s)
- Connor D Harris
- Department of Biology, University of North Carolina Greensboro, Greensboro, NC
- Ellis L Torrance
- Department of Biology, University of North Carolina Greensboro, Greensboro, NC
- Kasie Raymann
- Department of Biology, University of North Carolina Greensboro, Greensboro, NC
- Louis-Marie Bobay
- Department of Biology, University of North Carolina Greensboro, Greensboro, NC
25
Hao Y, Lee HJ, Baraboo M, Burch K, Maurer T, Somarelli JA, Conant GC. Baby Genomics: Tracing the Evolutionary Changes That Gave Rise to Placentation. Genome Biol Evol 2021; 12:35-47. [PMID: 32053193 PMCID: PMC7144826 DOI: 10.1093/gbe/evaa026] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Accepted: 02/02/2020] [Indexed: 12/12/2022]
Abstract
It has long been challenging to uncover the molecular mechanisms behind striking morphological innovations such as mammalian pregnancy. We studied the power of a robust comparative orthology pipeline based on gene synteny to address such problems. We inferred orthology relations between human genes and genes from each of 43 other vertebrate genomes, resulting in ∼18,000 orthologous pairs for each genome comparison. We hypothesized that identifying genes that first appear coincident with the origin of the placental mammals would define a subset of the genome enriched for genes that played a role in placental evolution. We thus pinpointed orthologs that appeared before and after the divergence of eutherian mammals from marsupials. Reinforcing previous work, we found instead that much of the genetic toolkit of mammalian pregnancy evolved through the repurposing of preexisting genes to new roles. These genes acquired regulatory controls for their novel roles from a group of regulatory genes, many of which did in fact originate at the appearance of the eutherians. Thus, orthologs appearing at the origin of the eutherians are enriched in functions such as transcriptional regulation by Krüppel-associated box-zinc-finger proteins, innate immune responses, keratinization, and the melanoma-associated antigen protein class. Because the cellular mechanisms of invasive placentae are similar to those of metastatic cancers, we then used our orthology inferences to explore the association between placental invasion and cancer metastasis. Again echoing previous work, we found that phylogenetically older genes are more likely to be implicated in cancer development.
Affiliation(s)
- Yue Hao
- Bioinformatics Research Center, North Carolina State University
- Hyuk Jin Lee
- Division of Biological Sciences, University of Missouri-Columbia
- Jason A Somarelli
- Duke Cancer Institute, Duke University Medical Center
- Department of Medicine, Duke University School of Medicine
- Gavin C Conant
- Bioinformatics Research Center, North Carolina State University
- Division of Animal Sciences, University of Missouri-Columbia
- Program in Genetics, North Carolina State University
- Department of Biological Sciences, North Carolina State University
26
Boyle JH, Rastas PMA, Huang X, Garner AG, Vythilingam I, Armbruster PA. A Linkage-Based Genome Assembly for the Mosquito Aedes albopictus and Identification of Chromosomal Regions Affecting Diapause. Insects 2021; 12:167. [PMID: 33669192 PMCID: PMC7919801 DOI: 10.3390/insects12020167] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/31/2020] [Revised: 02/08/2021] [Accepted: 02/10/2021] [Indexed: 12/16/2022]
Abstract
The Asian tiger mosquito, Aedes albopictus, is an invasive vector mosquito of substantial public health concern. Its large genome (~1.19-1.28 Gb by cytofluorometric estimates), composed of ~68% repetitive DNA sequences, has made it difficult to produce a high-quality genome assembly for this species. We constructed a high-density linkage map for Ae. albopictus based on 111,328 informative SNPs obtained by RNAseq. We then performed a linkage-map-anchored reassembly of AalbF2, the genome assembly produced by Palatini et al. (2020). Our reassembled genome sequence, AalbF3, offers several improvements relative to AalbF2. First, the AalbF3 assembly is 1.45 Gb, almost half the size of AalbF2. Furthermore, relative to AalbF2, AalbF3 contains a higher proportion of complete and single-copy BUSCO genes (84.3%) and a higher proportion of aligned RNAseq reads that map concordantly to a single location in the genome (46%). We demonstrate the utility of AalbF3 by using it as a reference for a bulk-segregant-based comparative genomics analysis. That analysis identifies chromosomal regions with clusters of candidate SNPs putatively associated with photoperiodic diapause, a crucial ecological adaptation underpinning the rapid range expansion and climatic adaptation of Ae. albopictus.
Affiliation(s)
- John H. Boyle
- Department of Biology, Georgetown University, 37th and O St, Washington, DC 20057, USA
- Department of Biology, University of Mary, Bismarck, ND 58504, USA
- Pasi M. A. Rastas
- Institute of Biotechnology, Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland
- Xin Huang
- Department of Biology, Georgetown University, 37th and O St, Washington, DC 20057, USA
- Austin G. Garner
- Department of Biology, Georgetown University, 37th and O St, Washington, DC 20057, USA
- Indra Vythilingam
- Department of Parasitology, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Peter A. Armbruster
- Department of Biology, Georgetown University, 37th and O St, Washington, DC 20057, USA
27
Koonin EV, Makarova KS, Wolf YI. Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century. Trends Microbiol 2021; 29:582-592. [PMID: 33541841 DOI: 10.1016/j.tim.2021.01.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 12/03/2020] [Revised: 01/07/2021] [Accepted: 01/08/2021] [Indexed: 12/20/2022]
Abstract
Prokaryote genomics started in earnest in 1995, with the complete sequences of two small bacterial genomes, those of Haemophilus influenzae and Mycoplasma genitalium. Over the following quarter century, the prokaryote genome database grew exponentially, with no saturation in sight. For most of these 25 years, genome sequencing remained limited to cultivable microbes. Together with next-generation sequencing methods, advances in metagenomics and single-cell genomics have lifted this limitation, providing an increasingly unbiased characterization of global prokaryote diversity. Advances in computational genomics followed the progress of genome sequencing, even if occasionally lagging behind. Several major new branches of bacteria and archaea were discovered. They include the Asgard archaea, the apparent closest relatives of eukaryotes, and expansive groups of bacteria and archaea with small genomes thought to be symbionts of other prokaryotes. Comparative analysis of numerous prokaryote genomes spanning a wide range of evolutionary distances changed the conceptual foundations of microbiology. The notion of species genomes with fixed gene sets gave way to that of dynamic pangenomes, and the notion of a single Tree of Life (ToL) gave way to a statistical tree-like trend among individual gene trees. Strides were also made towards a theory and quantitative laws of prokaryote genome evolution.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
28
Hernández-Salmerón JE, Moreno-Hagelsieb G. Progress in quickly finding orthologs as reciprocal best hits: comparing blast, last, diamond and MMseqs2. BMC Genomics 2020; 21:741. [PMID: 33099302 PMCID: PMC7585182 DOI: 10.1186/s12864-020-07132-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/26/2020] [Accepted: 10/09/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Finding orthologs remains an important bottleneck in comparative genomics analyses. Authors of software for quickly comparing protein sequences evaluate its speed and compare their results against the most commonly used software for the task. However, they rarely evaluate their software for more specific uses, such as finding orthologs as reciprocal best hits (RBH). Here we compared RBH results obtained using software that runs faster than blastp, namely lastal, diamond, and MMseqs2. RESULTS We found that lastal required the least time to produce results. However, it yielded fewer results than any other program when comparing the proteins encoded by evolutionarily distant genomes. The program whose number of RBH was closest to blastp's was diamond run with the "ultra-sensitive" option. However, this option was diamond's slowest, and the "very-sensitive" option offered the best balance between speed and RBH results. The speed-up was much more evident with eukaryotic genomes, which encode more proteins. For example, lastal took a median of approx. 1.5% of the blastp time to run with bacterial proteomes and 0.6% with eukaryotic ones, while diamond with the very-sensitive option took 7.4% and 5.2%, respectively. Although estimated error rates were very similar among the RBH obtained with all programs, RBH obtained with MMseqs2 had the lowest error rates among the programs tested. CONCLUSIONS The fast algorithms for pairwise protein comparison produced results very similar to blast in a fraction of the time. Diamond offered the best compromise in speed, sensitivity and quality, as long as a sensitivity option other than the default was chosen.
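Reciprocal best hits, the orthology criterion benchmarked above, can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline. It assumes hits in the 12-column tabular layout produced by blastp `-outfmt 6` and by diamond's default tabular output, with the bit score in the last column, and it breaks score ties by keeping the first hit seen. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column BLAST-style tabular hits (bit score in column 12)."""
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        hits.append((fields[0], fields[1], float(fields[11])))
    return hits


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject; the first hit wins ties."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
              ("a2", "b2", 180.0), ("a3", "b2", 120.0)]
    b_vs_a = [("b1", "a1", 248.0), ("b2", "a2", 175.0), ("b2", "a3", 119.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

In the example, `a3`'s best hit is `b2`, but `b2`'s best hit is `a2`, so `a3` gets no partner. RBH reports only one-to-one pairs, which is why additional in-paralogs like `a3` are dropped.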
Affiliation(s)
- Gabriel Moreno-Hagelsieb
- Wilfrid Laurier University, Department of Biology, 75 University Ave W, Waterloo, N2L 3C5 ON Canada
29
Nucleotide substitution rates of diatom plastid encoded protein genes are positively correlated with genome architecture. Sci Rep 2020; 10:14358. [PMID: 32873883 PMCID: PMC7462845 DOI: 10.1038/s41598-020-71473-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/06/2019] [Accepted: 08/17/2020] [Indexed: 01/02/2023]
Abstract
Diatoms are the largest group of heterokont algae, with more than 100,000 species. As single-celled photosynthetic organisms inhabiting marine, aquatic and terrestrial ecosystems, diatoms contribute ~45% of global primary production. Despite their ubiquity and environmental significance, very few diatom plastid genomes (plastomes) have been sequenced and studied. This study explored patterns of nucleotide substitution rates in diatom plastids across the entire suite of plastome protein-coding genes for 40 taxa representing the major clades. The highest substitution rates were lineage-specific, occurring within the araphid 2 taxon Astrosyne radiata and the radial 2 taxon Proboscia sp. Rate heterogeneity was also evident among functional classes and individual genes. As in land plants, protein genes involved in photosynthetic metabolism have lower synonymous and nonsynonymous substitution rates than those involved in transcription and translation. Significant positive correlations were identified between substitution rates and measures of genomic rearrangement, including indels and inversions, similar to findings in legumes. This work advances the understanding of the molecular evolution of diatom plastomes and provides a foundation for future studies.
30
Owen CL, Stern DB, Hilton SK, Crandall KA. Hemiptera phylogenomic resources: Tree-based orthology prediction and conserved exon identification. Mol Ecol Resour 2020; 20:1346-1360. [DOI: 10.1111/1755-0998.13180] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/09/2018] [Revised: 04/02/2020] [Accepted: 04/27/2020] [Indexed: 12/21/2022]
Affiliation(s)
- Christopher L. Owen
- Computational Biology Institute, George Washington University, Washington, DC, USA
- Systematic Entomology Laboratory, USDA-ARS, Beltsville, MD, USA
- David B. Stern
- Computational Biology Institute, George Washington University, Washington, DC, USA
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Sarah K. Hilton
- Computational Biology Institute, George Washington University, Washington, DC, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Keith A. Crandall
- Computational Biology Institute, George Washington University, Washington, DC, USA
31
Abstract
Knowing phylogenetic relationships among species is fundamental for many studies in biology. An accurate phylogenetic tree underpins our understanding of the major transitions in evolution, such as the emergence of new body plans or metabolism, and is key to inferring the origin of new genes, detecting molecular adaptation, understanding morphological character evolution and reconstructing demographic changes in recently diverged species. Although data are ever more plentiful and powerful analysis methods are available, there remain many challenges to reliable tree building. Here, we discuss the major steps of phylogenetic analysis, including identification of orthologous genes or proteins, multiple sequence alignment, and choice of substitution models and inference methodologies. Understanding the different sources of errors and the strategies to mitigate them is essential for assembling an accurate tree of life.
32
Galperin MY, Kristensen DM, Makarova KS, Wolf YI, Koonin EV. Microbial genome analysis: the COG approach. Brief Bioinform 2020; 20:1063-1070. [PMID: 28968633 DOI: 10.1093/bib/bbx117] [Citation(s) in RCA: 152] [Impact Index Per Article: 38.0] [Received: 05/30/2017] [Revised: 08/01/2017] [Indexed: 11/15/2022]
Abstract
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for tasks such as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.
33
Prasanna AN, Gerber D, Kijpornyongpan T, Aime MC, Doyle VP, Nagy LG. Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships. Syst Biol 2020; 69:17-37. [PMID: 31062852 DOI: 10.1093/sysbio/syz029] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/19/2017] [Revised: 04/21/2019] [Accepted: 04/26/2019] [Indexed: 11/12/2022]
Abstract
Resolving deep divergences in the tree of life is challenging even for analyses of genome-scale phylogenetic data sets. The relationships between the Basidiomycota subphyla have proven particularly recalcitrant to both traditional multigene and genome-scale phylogenetics. These subphyla are the rusts and allies (Pucciniomycotina), the smuts and allies (Ustilaginomycotina), and the mushroom-forming fungi and allies (Agaricomycotina). Here, we address basal Basidiomycota relationships using concatenated and gene tree-based analyses of various phylogenomic data sets to examine the contribution of several potential sources of bias. We evaluate the contribution of biological causes (hard polytomy, incomplete lineage sorting) versus unmodeled evolutionary processes and factors that exacerbate their effects (e.g., fast-evolving sites and long-branch taxa) to inferences of basal Basidiomycota relationships. Bayesian Markov chain Monte Carlo and likelihood mapping analyses reject the hard polytomy with confidence. In concatenated analyses, fast-evolving sites and oversimplified models of amino acid substitution favored the grouping of smuts with mushroom-forming fungi, often leading to maximal bootstrap support in both concatenation and coalescent analyses. In contrast, the most conserved data subsets grouped rusts and allies with mushroom-forming fungi, although this relationship proved labile and sensitive to model choice, to different data subsets and to missing data. Excluding putative long-branch taxa, genes with high proportions of missing data and/or genes with strong signal failed to reveal a consistent trend toward one or the other topology, suggesting that additional sources of conflict are at play.
While concatenated analyses yielded strong but conflicting support, individual gene trees mostly provided poor support for any resolution of rusts, smuts, and mushroom-forming fungi. This suggests that the true Basidiomycota tree might lie in a part of tree space that is difficult to access using both concatenation and gene tree-based approaches. Inference-based assessments of absolute model fit strongly reject best-fit models for the vast majority of genes, indicating a poor fit of even the most commonly used models. While this is consistent with previous assessments of site-homogeneous models of amino acid evolution, it does not appear to be the sole source of confounding signal. Our analyses suggest that topologies uniting smuts with mushroom-forming fungi can arise from inappropriate modeling of amino acid sites that might be prone to systematic bias. We speculate that improved models of sequence evolution could shed more light on the basal splits in the Basidiomycota, which, for now, remain unresolved despite the use of whole-genome data.
Affiliation(s)
- Arun N Prasanna
- Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
- Daniel Gerber
- Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
- Institute of Archaeology, Research Centre for the Humanities, Hungarian Academy of Sciences, Budapest 1097, Hungary
- M Catherine Aime
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA
- Vinson P Doyle
- Department of Plant Pathology and Crop Physiology, Louisiana State University AgCenter, Baton Rouge, LA 70803, USA
- Laszlo G Nagy
- Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
34
Comparative Genomics and CAZyme Genome Repertoires of Marine Zobellia amurskyensis KMM 3526T and Zobellia laminariae KMM 3676T. Mar Drugs 2019; 17:md17120661. [PMID: 31771309 PMCID: PMC6950322 DOI: 10.3390/md17120661] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/31/2019] [Revised: 11/21/2019] [Accepted: 11/22/2019] [Indexed: 01/01/2023]
Abstract
We obtained two novel draft genomes of type Zobellia strains, with estimated genome sizes of 5.14 Mb for Z. amurskyensis KMM 3526T and 5.16 Mb for Z. laminariae KMM 3676T. A comparative genomic analysis was carried out between these new genomes and the previously known genomes of Zobellia representatives. The pan-genome of the Zobellia genus is composed of 4853 orthologous clusters, and the core genome was estimated at 2963 clusters. The genus CAZome was represented by 775 GHs classified into 62 families, 297 GTs of 16 families, 100 PLs of 13 families, 112 CEs of 13 families, 186 CBMs of 18 families and 42 AAs of six families. A closer inspection of the carbohydrate-active enzyme (CAZyme) genomic repertoires revealed members of new putative subfamilies of GH16 and GH117, which could be biotechnologically promising for the production of oligosaccharides and rare monomers with different bioactivities. We analyzed AA3s, among them putative FAD-dependent glycoside oxidoreductases (FAD-GOs), which are of particular interest as promising biocatalysts for glycoside deglycosylation in the food and pharmaceutical industries.
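The pan-genome and core-genome sizes reported above follow the usual set definitions. The pan-genome is every orthologous cluster found in at least one genome, and the core genome is the subset of clusters found in all of them. The sketch below computes both counts. It assumes the clusters are already available as a hypothetical mapping from cluster ID to the set of genomes contributing a gene. The genome labels and toy clusters are invented for illustration and are not data from the study.

```python
from typing import Dict, Set, Tuple


def pan_core_sizes(clusters: Dict[str, Set[str]], genomes: Set[str]) -> Tuple[int, int]:
    """Return (pan-genome size, core-genome size), counted in clusters.

    `clusters` maps a cluster ID to the set of genomes that contribute at
    least one gene to that cluster (hypothetical input format).
    """
    # Pan-genome: clusters observed in at least one of the genomes considered.
    pan = {cid for cid, members in clusters.items() if members & genomes}
    # Core genome: clusters with at least one member in every genome.
    core = {cid for cid in pan if genomes <= clusters[cid]}
    return len(pan), len(core)


if __name__ == "__main__":
    genomes = {"genome_A", "genome_B", "genome_C"}
    toy_clusters = {
        "OG1": {"genome_A", "genome_B", "genome_C"},  # shared by all: core
        "OG2": {"genome_A", "genome_B"},              # accessory
        "OG3": {"genome_C"},                          # genome-specific
    }
    print(pan_core_sizes(toy_clusters, genomes))  # (3, 1)
```

In this toy example, only OG1 is shared by all three genomes. The result is therefore three clusters in the pan-genome and one in the core genome.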
35
Rubanov LI, Zaraisky AG, Shilovsky GA, Seliverstov AV, Zverkov OA, Lyubetsky VA. Screening for mouse genes lost in mammals with long lifespans. BioData Min 2019; 12:20. [PMID: 31728160 PMCID: PMC6842137 DOI: 10.1186/s13040-019-0208-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/07/2019] [Accepted: 10/25/2019] [Indexed: 12/23/2022]
Abstract
Background: Gerontogenes include genes that modulate life expectancy in various species and may be the actual longevity genes. We believe that a long (relative to body weight) lifespan in individual rodent and primate species can be due, among other things, to the loss of particular genes that are present in short-lived species of the same orders. These genes could also explain the widely different rates of aging among species, as well as why similarly sized rodents or primates sometimes have anomalous life expectancies (e.g., naked mole-rats and humans). Here, we consider gene loss in the context of Williams’ theory, which concerns the reallocation of an organism's physiological resources between active reproduction (r-strategy) and self-maintenance (K-strategy). We identified such lost genes using an original computer-aided approach; the software treats the loss of a gene as a disruption in gene orthology, local gene synteny or both. Results: We propose a method and software that identify genes absent from one predefined set of species but present in another. Examples of such pairs of sets include long-lived vs short-lived, homeothermic vs poikilothermic, amniotic vs anamniotic, aquatic vs terrestrial, and neotenic vs nonneotenic species, among others. Species are assigned to one of the two sets according to the property of interest, such as longevity or homeothermy. The program is universal with respect to these pairs, i.e., to the underlying property, although the sets should include species with high-quality genome assemblies. Here, the method was applied to study the longevity of Euarchontoglires species. It predominantly predicted genes that are highly expressed in the testis, epididymis, uterus, mammary glands, vomeronasal organ and other reproduction-related organs. This agrees with Williams’ theory, which hypothesizes a species transition from r-strategy to K-strategy.
For instance, the method predicts the mouse gene Smpd5, whose expression level in the testis is 20 times greater than in organs unrelated to reproduction, as experimentally demonstrated elsewhere. At the same time, its paralog Smpd3 is not predicted by the program and is widely expressed in many organs not specifically related to reproduction. Conclusions: The method and program, applied here to screen for gene losses that can accompany increased lifespan, were also applied to study reduced regenerative capacity, development of the telencephalon, neoteny, etc. Some of these results have been carefully tested experimentally. Therefore, we assume that the method is widely applicable.
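At its core, the screen described above contrasts gene presence and absence between two species sets. The sketch below is a strong simplification and not the authors' software. It assumes that orthology has already been resolved into a hypothetical table mapping each gene to the species in which an ortholog was found, and it ignores the synteny criterion. It also applies a strict rule (present in every species of one set, absent from every species of the other) that may differ from the published method. All gene labels and presence data are placeholders.

```python
from typing import Dict, Iterable, List, Set


def candidate_lost_genes(
    presence: Dict[str, Set[str]],
    present_in: Iterable[str],
    absent_from: Iterable[str],
) -> List[str]:
    """Return genes present in every species of `present_in` and in none of `absent_from`.

    `presence` maps a gene ID to the set of species with a detected ortholog
    (hypothetical input; local synteny is not considered here).
    """
    required = set(present_in)
    excluded = set(absent_from)
    hits = [
        gene
        for gene, species in presence.items()
        # Strict rule: ortholog found in all species of the first set
        # and in no species of the second set.
        if required <= species and not (species & excluded)
    ]
    return sorted(hits)


if __name__ == "__main__":
    presence = {
        "geneA": {"mouse", "rat"},
        "geneB": {"mouse", "rat", "human", "naked_mole_rat"},
        "geneC": {"mouse", "human"},
    }
    short_lived = ["mouse", "rat"]
    long_lived = ["human", "naked_mole_rat"]
    print(candidate_lost_genes(presence, short_lived, long_lived))  # ['geneA']
```

The same function can be reused for any other pair of sets mentioned above (e.g., homeothermic vs poikilothermic) by swapping the two species lists.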
Affiliation(s)
- Lev I Rubanov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
- Andrey G Zaraisky
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences (IBCH RAS) 16/10, Miklukho-Maklaya str., Moscow, 117997 Russia
- Gregory A Shilovsky
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
- Alexandr V Seliverstov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
- Oleg A Zverkov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
- Vassily A Lyubetsky
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
36
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 2019; 20:238. [PMID: 31727128 PMCID: PMC6857279 DOI: 10.1186/s13059-019-1832-y] [Citation(s) in RCA: 2815] [Impact Index Per Article: 563.0] [Received: 04/24/2019] [Accepted: 09/23/2019] [Indexed: 12/22/2022]
Abstract
Here, we present a major advance in the OrthoFinder method. This extends OrthoFinder's high-accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics. Each output is benchmarked on appropriate real or simulated datasets, and where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder's comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
Affiliation(s)
- David M Emms
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
37
Andrade CH, Neves BJ, Melo-Filho CC, Rodrigues J, Silva DC, Braga RC, Cravo PVL. In Silico Chemogenomics Drug Repositioning Strategies for Neglected Tropical Diseases. Curr Med Chem 2019. [DOI: 10.2174/0929867325666180309114824] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Abstract
Only ~1% of all drug candidates against Neglected Tropical Diseases (NTDs) have reached clinical trials in the last decades, underscoring the need for new, safe and effective treatments. In this context, drug repositioning, which finds novel indications for approved drugs whose pharmacokinetic and safety profiles are already known, is emerging as a promising strategy for tackling NTDs. Chemogenomics is a direct descendant of the typical drug discovery process, which involves the systematic screening of chemical compounds against drug targets in high-throughput screening (HTS) efforts to identify lead compounds. However, unlike the one-drug-one-target paradigm, chemogenomics attempts to identify all potential ligands for all possible targets and diseases. In this review, we summarize current methodological development efforts in drug repositioning that use state-of-the-art computational ligand- and structure-based chemogenomics approaches. Furthermore, we highlight recent progress in computational drug repositioning for some NTDs, based on the curation and modeling of genomic, biological, and chemical data. Additionally, we present in-house and other successful examples and suggest possible solutions to existing pitfalls.
Affiliation(s)
- Carolina Horta Andrade
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Bruno Junior Neves
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Cleber Camilo Melo-Filho
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Juliana Rodrigues
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Diego Cabral Silva
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Rodolpho Campos Braga
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Pedro Vitor Lemos Cravo
- Laboratory of Cheminformatics, Centro Universitario de Anapolis (UniEVANGELICA), Anapolis, GO, 75083-515, Brazil
38
Testis-specific Arf promoter expression in a transposase-aided BAC transgenic mouse model. Mol Biol Rep 2019; 46:6243-6252. [PMID: 31583563 DOI: 10.1007/s11033-019-05063-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2019] [Accepted: 09/04/2019] [Indexed: 10/25/2022]
Abstract
CDKN2A is an evolutionarily conserved gene encoding proteins implicated in tumor suppression, ocular development, aging, and metabolic diseases. Like the human form, mouse Cdkn2a encodes two distinct proteins (p16Ink4a, which blocks cyclin-dependent kinase activity, and p19Arf, which is best known as a positive regulator of the p53 tumor suppressor), and their functions have been well studied in genetically engineered mouse models. Relatively little is known about how expression of the two transcripts is controlled in normal development and in certain disease states. To better understand their coordinated and transcript-specific expression in situ, we used a transposase-aided approach to generate a new BAC transgenic mouse model in which the first exons encoding Arf and Ink4a are replaced by fluorescent reporters. We show that mouse embryo fibroblasts generated from the transgenic lines faithfully display induction of each transgenic reporter in cell culture models, and we demonstrate the expected expression of the Arf reporter in the normal testis, one of the few places where that promoter is normally active. Interestingly, the TGFβ-2-dependent induction of the Arf reporter in the eye (a process essential for normal eye development) does not occur. Our findings illustrate the value of BAC transgenesis in mapping key regulatory elements in the mouse by revealing the genomic DNA required for Cdkn2a induction in cultured cells and the developing testis, and the apparent lack of elements driving expression in the developing eye.
39
Hu X, Friedberg I. SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier. Gigascience 2019; 8:giz118. [PMID: 31648300 PMCID: PMC6812468 DOI: 10.1093/gigascience/giz118] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/13/2019] [Revised: 06/07/2019] [Accepted: 09/05/2019] [Indexed: 11/13/2022]
Abstract
BACKGROUND: Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools have been developed to perform homology classification across genomes of different species. However, when applied to large genomic data sets, these tools require high memory and CPU usage, typically available only in computational clusters. FINDINGS: Here we present a new graph-based orthology analysis tool, SwiftOrtho, which is optimized for speed and memory usage when applied to large-scale data. SwiftOrtho uses long k-mers to speed up homology search, while using a reduced amino acid alphabet and spaced seeds to compensate for the loss of sensitivity due to long k-mers. In addition, it uses an affinity propagation algorithm to reduce memory usage when clustering large-scale orthology relationships into orthologous groups. In our tests, SwiftOrtho was the only tool that completed orthology analysis of proteins from 1,760 bacterial genomes on a computer with only 4 GB RAM. Using various standard orthology data sets, we also show that SwiftOrtho has high accuracy. CONCLUSIONS: SwiftOrtho enables accurate comparative genomic analyses of thousands of genomes using low-memory computers. SwiftOrtho is available at https://github.com/Rinoahu/SwiftOrtho.
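The combination of a reduced amino acid alphabet and spaced seeds can be illustrated in a few lines. This is a hedged sketch of the general idea and not SwiftOrtho's implementation. The five-group alphabet and the seed pattern `1101011` are hypothetical choices made for this example. Residues in the same physicochemical group collapse to one letter, so a conservative substitution (here I to L) no longer prevents a seed match. The `0` positions in the seed are ignored regardless of the residue found there.

```python
from typing import Set

# Hypothetical reduced alphabet: 20 amino acids collapsed into 5 groups.
# SwiftOrtho's actual grouping may differ.
GROUPS = {
    "a": "AGPST",  # small
    "b": "CILMV",  # hydrophobic
    "c": "FWY",    # aromatic
    "d": "DENQ",   # acidic / amide
    "e": "HKR",    # basic
}
REDUCE = {aa: code for code, members in GROUPS.items() for aa in members}


def reduce_seq(seq: str) -> str:
    """Translate a protein sequence into the reduced alphabet ('x' for unknown residues)."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def spaced_kmers(seq: str, seed: str = "1101011") -> Set[str]:
    """Extract spaced-seed k-mers: only positions marked '1' in the seed are kept."""
    reduced = reduce_seq(seq)
    keep = [i for i, flag in enumerate(seed) if flag == "1"]
    span = len(seed)
    return {
        "".join(reduced[start + i] for i in keep)
        for start in range(len(reduced) - span + 1)
    }


def shared_seeds(a: str, b: str, seed: str = "1101011") -> Set[str]:
    """Seeds shared by two sequences; in a seed-and-extend search these mark candidate pairs."""
    return spaced_kmers(a, seed) & spaced_kmers(b, seed)


if __name__ == "__main__":
    s1 = "MKTAYIAKQR"
    s2 = "MKTAYLAKQR"  # I -> L substitution, same reduced group
    print(len(shared_seeds(s1, s2)))  # 4
```

Both toy sequences reduce to the same string, so all four seeds are shared. With the full 20-letter alphabet, every seed window that keeps the substituted position would fail to match.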
Affiliation(s)
- Xiao Hu
- Department of Veterinary Microbiology and Preventive Medicine, 2118 Veterinary Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, 50011, USA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, 2118 Veterinary Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, 50011, USA
40
Lafond M, Meghdari Miardan M, Sankoff D. Accurate prediction of orthologs in the presence of divergence after duplication. Bioinformatics 2019; 34:i366-i375. [PMID: 29950018 PMCID: PMC6022570 DOI: 10.1093/bioinformatics/bty242] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/23/2022]
Abstract
Motivation: When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences for the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we distinguish between primary orthologs, which have not been affected by accelerated mutation rates on their evolutionary path, and secondary orthologs, which have. Similarity-based prediction methods tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary and secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types. Results: We formalize the notion of divergence after duplication and provide a theoretical basis for the inference of primary and secondary orthologs. We then put these ideas into practice with the Hybrid Prediction of Paralogs and Orthologs (HyPPO) framework, which combines ideas from both similarity and phylogeny approaches. We apply our method to simulated and empirical datasets and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs. Availability and implementation: HyPPO is a modular framework with a core developed in Python and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manuel Lafond
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
- David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
41
Verbruggen B, Gunnarsson L, Kristiansson E, Österlund T, Owen SF, Snape JR, Tyler CR. ECOdrug: a database connecting drugs and conservation of their targets across species. Nucleic Acids Res 2019; 46:D930-D936. [PMID: 29140522 PMCID: PMC5753218 DOI: 10.1093/nar/gkx1024] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 08/14/2017] [Accepted: 10/23/2017] [Indexed: 12/12/2022]
Abstract
Pharmaceuticals are designed to interact with specific molecular targets in humans, and these targets generally have orthologs in other species. This provides opportunities for the drug discovery community to use alternative model species for drug development. It also means, however, that there is potential for mode-of-action-related effects in non-target wildlife species, as many pharmaceuticals reach the environment through patient use and manufacturing wastes. Gaining insight into drug target ortholog predictions across species and taxonomic groups has proven difficult because of the lack of an optimal strategy and because the necessary information is spread across multiple and diverse sources and platforms. We introduce a new research platform tool, ECOdrug, that reliably connects drugs to their protein targets across divergent species. It harmonizes ortholog predictions from multiple sources via a simple user interface, underpinning critical applications for a wide range of studies in pharmacology, ecotoxicology and comparative evolutionary biology. ECOdrug can be used to identify species with drug targets and to identify drugs that interact with those targets. As such, it can support intelligent, targeted drug safety testing by ensuring that appropriate and relevant species are selected in ecological risk assessments. ECOdrug is freely accessible at: http://www.ecodrug.org.
Affiliation(s)
- Bas Verbruggen
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Exeter EX4 4QD, UK
- Lina Gunnarsson
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Exeter EX4 4QD, UK
- Erik Kristiansson
- Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg SE-416 12, Sweden
- Tobias Österlund
- Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg SE-416 12, Sweden
- Jason R Snape
- Global Environment, AstraZeneca, Cheshire SK10 4TF, UK
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Charles R Tyler
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Exeter EX4 4QD, UK
42
Xu L, Dong Z, Fang L, Luo Y, Wei Z, Guo H, Zhang G, Gu YQ, Coleman-Derr D, Xia Q, Wang Y. OrthoVenn2: a web server for whole-genome comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res 2019; 47:W52-W58. [PMID: 31053848 PMCID: PMC6602458 DOI: 10.1093/nar/gkz333] [Citation(s) in RCA: 569] [Impact Index Per Article: 113.8] [Received: 02/14/2019] [Revised: 04/16/2019] [Accepted: 04/25/2019] [Indexed: 12/28/2022]
Abstract
OrthoVenn is a powerful web platform for the comparison and analysis of whole-genome orthologous clusters. Here we present an updated version, OrthoVenn2, which provides new features that facilitate the comparative analysis of orthologous clusters among up to 12 species. Additionally, this update offers improvements to data visualization and interpretation, including an occurrence pattern table for interrogating the overlap of each orthologous group for the queried species. Within the occurrence table, the functional annotations and summaries of the disjunctions and intersections of clusters between the chosen species can be displayed through an interactive Venn diagram. To facilitate a broader range of comparisons, a larger number of species, including vertebrates, metazoa, protists, fungi, plants and bacteria, have been added in OrthoVenn2. Finally, a stand-alone version is available to perform large dataset comparisons and to visualize results locally without limitation of species number. In summary, OrthoVenn2 is an efficient and user-friendly web server freely accessible at https://orthovenn2.bioinfotoolkits.net.
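The occurrence pattern table can be read as a count of clusters per Venn region, where each cluster is encoded by which of the queried species it contains. The minimal sketch below assumes a hypothetical mapping from cluster ID to member species. It is not OrthoVenn2's code or input format.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def occurrence_patterns(
    clusters: Dict[str, Set[str]], species: List[str]
) -> Counter:
    """Count clusters per presence/absence pattern over the queried species.

    Each pattern is a tuple of 0/1 flags in the order of `species`, so
    (1, 1, 0) means "present in the first two species only".
    """
    counts: Counter = Counter()
    for members in clusters.values():
        pattern: Tuple[int, ...] = tuple(int(sp in members) for sp in species)
        if any(pattern):  # skip clusters containing none of the queried species
            counts[pattern] += 1
    return counts


if __name__ == "__main__":
    species = ["human", "mouse", "fly"]
    clusters = {
        "c1": {"human", "mouse", "fly"},
        "c2": {"human", "mouse"},
        "c3": {"human", "mouse"},
        "c4": {"fly"},
    }
    for pattern, n in sorted(occurrence_patterns(clusters, species).items(), reverse=True):
        print(pattern, n)
```

Each pattern tuple corresponds to one region of a Venn diagram over the queried species. The all-ones pattern counts the clusters shared by every species.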
Affiliation(s)
- Ling Xu
- Biological Science Research Center, Southwest University, Chongqing 400715, China
- Zhaobin Dong
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94710, USA
- USDA-ARS, Plant Gene Expression Center, Albany, CA 94706, USA
- Lu Fang
- Biological Science Research Center, Southwest University, Chongqing 400715, China
- Yongjiang Luo
- Biological Science Research Center, Southwest University, Chongqing 400715, China
- Zhaoyuan Wei
- Biological Science Research Center, Southwest University, Chongqing 400715, China
- Hailong Guo
- Biological Science Research Center, Southwest University, Chongqing 400715, China
- Guoqing Zhang
- Biological Science Research Center, Southwest University, Chongqing 400715, China
- Yong Q Gu
- USDA-ARS, Western Regional Research Center, Crop Improvement and Genetics Research Unit, Albany, CA 94706, USA
- Devin Coleman-Derr
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94710, USA
- USDA-ARS, Plant Gene Expression Center, Albany, CA 94706, USA
- Qingyou Xia
- Biological Science Research Center, Southwest University, Chongqing 400715, China
- Yi Wang
- Biological Science Research Center, Southwest University, Chongqing 400715, China
43
Rey C, Veber P, Boussau B, Sémon M. CAARS: comparative assembly and annotation of RNA-Seq data. Bioinformatics 2019; 35:2199-2207. [PMID: 30452539 PMCID: PMC6596894 DOI: 10.1093/bioinformatics/bty903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/28/2017] [Revised: 09/13/2018] [Accepted: 11/16/2018] [Indexed: 02/05/2023]
Abstract
MOTIVATION: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. RESULTS: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed that CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. AVAILABILITY AND IMPLEMENTATION: CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carine Rey
- UnivLyon, Université Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France
- Philippe Veber
- UnivLyon, Université Claude Bernard Lyon 1, CNRS, UMR, LBBE, F-69100, Villeurbanne, France
- Bastien Boussau
- UnivLyon, Université Claude Bernard Lyon 1, CNRS, UMR, LBBE, F-69100, Villeurbanne, France
- Marie Sémon
- UnivLyon, Université Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France
44
Nigam K, Sanyal S, Gupta S, Gupta OP, Mahdi AA, Bhatt MLB. Alteration of the Risk of Oral Pre-Cancer and Cancer in North India Population by CYP1A1 Polymorphism Genotypes and Haplotype. Asian Pac J Cancer Prev 2019; 20:345-354. [PMID: 30803192 PMCID: PMC6897020 DOI: 10.31557/apjcp.2019.20.2.345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Background: The aim of this study was to evaluate any association between CYP1A1 (T6235C, C4887A and A4889G) gene polymorphisms and the risk of oral pre-cancer and cancer. Methods: In the present study, 250 patients with oral pre-cancer and/or cancer and 250 healthy controls were genotyped for the CYP1A1 T6235C, C4887A and A4889G polymorphisms by the PCR-RFLP method. Results: None of the CYP1A1 polymorphisms was associated with the risk of either oral cancer or pre-cancer, nor were any links with clinical parameters of oral cancer found. However, among consumers of areca nut/pan masala, the TC, CA and AG genotypes of the CYP1A1 T6235C, C4887A and A4889G polymorphisms, respectively, were significantly more frequent in controls than in cases (p values for cases vs. controls of 0.0032, 0.0019 and 0.0009, respectively). Similarly, compared with the TCA haplotype, the TAG haplotype formed by the CYP1A1 T6235C, C4887A and A4889G polymorphisms was more common in controls (6.88%) than in cases (4.07%). Conclusion: Our results suggest that CYP1A1 polymorphism genotypes may modulate the risk of oral cancer and pre-cancer among areca nut/pan masala consumers. The haplotype may also exert an influence in our north Indian population.
Affiliation(s)
- Kumud Nigam
- Department of Oral Pathology and Microbiology, King George’s Medical University, Lucknow, India.
45
Torres Manno MA, Pizarro MD, Prunello M, Magni C, Daurelio LD, Espariz M. GeM-Pro: a tool for genome functional mining and microbial profiling. Appl Microbiol Biotechnol 2019; 103:3123-3134. [DOI: 10.1007/s00253-019-09648-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/07/2018] [Revised: 01/11/2019] [Accepted: 01/14/2019] [Indexed: 11/30/2022]
46
Guillén Y, Casillas S, Ruiz A. Genome-Wide Patterns of Sequence Divergence of Protein-Coding Genes Between Drosophila buzzatii and D. mojavensis. J Hered 2019; 110:92-101. [PMID: 30124907 DOI: 10.1093/jhered/esy041] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/23/2018] [Accepted: 08/14/2018] [Indexed: 12/15/2022]
Abstract
Evolutionary rates for protein-coding genes are determined not only by natural selection but also by multiple genomic factors including mutation rates, recombination, gene expression levels, and chromosomal location. To investigate the joint effects of different genomic determinants on protein evolution, we compared the coding sequences of 9017 single-copy orthologs between 2 cactophilic species from the Drosophila subgenus, Drosophila mojavensis and D. buzzatii, whose genomes have been previously sequenced. We assessed the impact of 7 genomic determinants, that is, chromosome type, recombination, chromosomal inversions, expression breadth, expression level, gene length, and the number of exons, on divergence rates of protein-coding genes to understand patterns of evolutionary variation. Integrative analysis of these factors revealed that 1) X-linked and autosomal genes evolve at significantly different rates in agreement with the faster-X hypothesis, 2) genes located on the dot chromosome and pericentromeric regions have higher divergence rates, 3) genes located at chromosomes with more fixed inversions have higher pairwise divergence than those located at nearly collinear chromosomes, and 4) gene expression patterns can be considered the strongest determinant of protein evolution. In addition, the number of exons and protein length had a significant effect on pairwise divergence at synonymous sites. All in all, our results show the relative importance of each genomic factor on the rates of protein evolution and functional constraint in these 2 cactophilic Drosophila species.
Affiliation(s)
- Yolanda Guillén
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Sònia Casillas
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- The Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Alfredo Ruiz
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
47
Medeiros Filho F, do Nascimento APB, dos Santos MT, Carvalho-Assef APD, da Silva FAB. Gene regulatory network inference and analysis of multidrug-resistant Pseudomonas aeruginosa. Mem Inst Oswaldo Cruz 2019; 114:e190105. [PMID: 31389522 PMCID: PMC6684008 DOI: 10.1590/0074-02760190105] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/26/2019] [Accepted: 06/26/2019] [Indexed: 12/04/2022]
Abstract
BACKGROUND Healthcare-associated infections caused by bacteria such as Pseudomonas aeruginosa are a major public health problem worldwide. Gene regulatory networks (GRN) computationally represent interactions among regulatory genes and their targets. They are an important approach to help understand bacterial behaviour and to provide novel ways of overcoming scientific challenges, including the identification of potential therapeutic targets and the development of new drugs.
OBJECTIVES The goal of this study was to reconstruct the multidrug-resistant (MDR) P. aeruginosa GRN and to analyse its topological properties.
METHODS The methodology used in this study was based on gene orthology inference using the reciprocal best hit method. We used the genome of P. aeruginosa CCBH4851 as the basis of the reconstruction process. This MDR strain is representative of sequence type 277, which was involved in an endemic outbreak in Brazil.
FINDINGS We obtained a network with a larger number of regulatory genes, target genes and interactions than the previously reported network. Topological analysis results are in accordance with the complex network representation of biological processes.
MAIN CONCLUSIONS The properties of the network were consistent with the biological features of P. aeruginosa. To the best of our knowledge, the P. aeruginosa GRN presented here is the most complete version available to date.
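The reciprocal best hit (RBH) criterion named in the METHODS section pairs two genes as putative orthologs when each is the other's highest-scoring hit in the opposite genome. The following is a minimal Python sketch, assuming similarity scores (for example BLAST bit scores) have already been computed in both search directions and are held in plain dictionaries; the input format, function names and gene IDs are hypothetical, and this is not the authors' actual pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: scores[query][subject] = similarity score
# (e.g. a BLAST bit score) from searching one proteome against the other.
Scores = Dict[str, Dict[str, float]]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return the single highest-scoring subject for each query.

    Ties are broken by the alphabetically first subject ID so results are
    deterministic; queries with no hits are skipped.
    """
    best: Dict[str, str] = {}
    for query, hits in scores.items():
        if hits:
            # Highest score first, then alphabetical subject ID.
            best[query] = min(hits, key=lambda s: (-hits[s], s))
    return best


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return sorted pairs (a, b) where b is a's best hit in genome B
    and a is, in turn, b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Toy scores for two genomes A and B (made-up IDs and values).
    a_vs_b = {"a1": {"b1": 250.0, "b2": 90.0}, "a2": {"b1": 120.0, "b2": 80.0}}
    b_vs_a = {"b1": {"a1": 245.0, "a2": 118.0}, "b2": {"a2": 85.0, "a1": 88.0}}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # prints [('a1', 'b1')]
```

In the toy data, a2's best hit is b1, but b1's best hit is a1, so the one-directional pair (a2, b1) is rejected; only (a1, b1) passes the reciprocity check.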
48
Oti M, Pane A, Sammeth M. Comparative Genomics in Drosophila. Methods Mol Biol 2018; 1704:433-450. [PMID: 29277877 DOI: 10.1007/978-1-4939-7463-4_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]
Abstract
Since the pioneering studies of Thomas Hunt Morgan and coworkers at the dawn of the twentieth century, Drosophila melanogaster and its sister species have contributed tremendously to unveiling the rules underlying animal genetics, development, behavior, evolution, and human disease. Recent advances in DNA sequencing technologies launched Drosophila into the post-genomic era and paved the way for unprecedented comparative genomics investigations. The complete sequencing and systematic comparison of the genomes of 12 Drosophila species represents a milestone achievement in modern biology, which enabled a plethora of studies ranging from the annotation of known and novel genomic features to the evolution of chromosomes and, ultimately, of entire genomes. Despite the efforts of countless laboratories worldwide, the vast amount of data produced over the past 15 years is far from being fully explored. In this chapter, we review some of the bioinformatic approaches developed to interrogate the genomes of the 12 Drosophila species. Starting from alignments of the entire genomic sequences, the degree of conservation can be evaluated separately for every region of the genome, providing first hints about elements that are under purifying selection and therefore likely functional. Furthermore, careful analysis of repeated sequences sheds light on the evolutionary dynamics of transposons, an enigmatic and fascinating class of mobile elements housed in the genomes of animals and plants. Comparative genomics also aids the computational identification of the transcriptionally active part of the genome, first and foremost protein-coding loci, but also transcribed yet apparently noncoding regions, which were once considered "junk" DNA. Finally, the synergy between functional and comparative genomics also facilitates in silico and in vivo studies of cis-acting regulatory elements, such as transcription factor binding sites, which, owing to their high sequence variability, usually pose greater challenges for bioinformatics approaches.
Affiliation(s)
- Martin Oti
- Institute of Biophysics Carlos Chagas Filho (IBCCF), Federal University of Rio de Janeiro (UFRJ), Avenida Carlos Chagas Filho 373, 21941-902, Rio de Janeiro, RJ, Brazil
- Attilio Pane
- Institute of Biomedical Sciences (ICB), Federal University of Rio de Janeiro (UFRJ), 21941-902, Rio de Janeiro, RJ, Brazil
- Michael Sammeth
- Institute of Biophysics Carlos Chagas Filho (IBCCF), Federal University of Rio de Janeiro (UFRJ), Avenida Carlos Chagas Filho 373, 21941-902, Rio de Janeiro, RJ, Brazil.
49
Ambrosino L, Ruggieri V, Bostan H, Miralto M, Vitulo N, Zouine M, Barone A, Bouzayen M, Frusciante L, Pezzotti M, Valle G, Chiusano ML. Multilevel comparative bioinformatics to investigate evolutionary relationships and specificities in gene annotations: an example for tomato and grapevine. BMC Bioinformatics 2018; 19:435. [PMID: 30497367 PMCID: PMC6266932 DOI: 10.1186/s12859-018-2420-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/01/2022]
Abstract
Background “Omics” approaches may provide useful information for a deeper understanding of speciation events, diversification, and functional innovation. This can be achieved by investigating the molecular similarities between species at the sequence level, allowing the definition of ortholog and paralog genes. However, the growing number of sequenced genomes, often with still-preliminary annotations, requires suitable bioinformatics approaches to be exploited appropriately in this framework.
Results We present here a multilevel comparative approach to investigate the genome evolutionary relationships and peculiarities of two fleshy-fruit species of agronomic relevance, Solanum lycopersicum (tomato) and Vitis vinifera (grapevine). We defined 17,823 orthology relationships between the tomato and grapevine reference gene annotations. The resulting orthologs are associated with the paralogs detected in each species, permitting the definition of gene networks that are useful for investigating the different relationships. The compared collections were also reconciled by updating their functional descriptions. All the results were made accessible in ComParaLogs, a dedicated bioinformatics platform available at http://biosrv.cab.unina.it/comparalogs/gene/search.
Conclusions The aim of the work was to propose a reliable approach to detect all similarities of gene loci between two species by integrating results from different levels of information, such as the gene, transcript, and protein sequences, overcoming possible limits due to exclusive protein-versus-protein comparisons. This makes it possible to define reliable ortholog and paralog genes, as well as species-specific gene loci in the two species, overcoming limits due to the possibly draft nature of preliminary gene annotations. Moreover, reconciled functional descriptions, as well as common or peculiar enzymatic classes and protein domains from tomato and grapevine, together with the definition of species-specific gene sets after the pairwise comparisons, contributed a comprehensive set of information useful for comparatively exploiting the two species' gene annotations and for investigating differences between species with climacteric and non-climacteric fruits. In addition, the definition of networks of ortholog genes and associated paralogs, and the organization of web-based interfaces for exploring the results, provided a user-friendly computational workbench supporting comparative analyses between two species.
Electronic supplementary material The online version of this article (10.1186/s12859-018-2420-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Luca Ambrosino
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Current address: Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
- Valentino Ruggieri
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Current address: Center for Research in Agricultural Genomics, Cerdanyola, Barcelona, Spain
- Hamed Bostan
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Current address: Plants for Human Health Institute, North Carolina State University, Kannapolis, NC, USA
- Marco Miralto
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Current address: Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
- Nicola Vitulo
- Department of Biotechnology, University of Verona, Verona, Italy
- Mohamed Zouine
- Génomique et Biotechnologie des Fruits, UMR990 INRA / INP-Toulouse, Université de Toulouse, Castanet-Tolosan, France
- Amalia Barone
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
- Mondher Bouzayen
- Génomique et Biotechnologie des Fruits, UMR990 INRA / INP-Toulouse, Université de Toulouse, Castanet-Tolosan, France
- Luigi Frusciante
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
- Mario Pezzotti
- Department of Biotechnology, University of Verona, Verona, Italy
- Giorgio Valle
- CRIBI Biotechnology Centre, University of Padova, Padova, Italy
- Maria Luisa Chiusano
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy.
50
Lassalle F, Planel R, Penel S, Chapulliot D, Barbe V, Dubost A, Calteau A, Vallenet D, Mornico D, Bigot T, Guéguen L, Vial L, Muller D, Daubin V, Nesme X. Ancestral Genome Estimation Reveals the History of Ecological Diversification in Agrobacterium. Genome Biol Evol 2018; 9:3413-3431. [PMID: 29220487 PMCID: PMC5739047 DOI: 10.1093/gbe/evx255] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Accepted: 12/01/2017] [Indexed: 12/12/2022]
Abstract
Horizontal gene transfer (HGT) is considered a major source of innovation in bacteria, and as such is expected to drive adaptation to new ecological niches. However, among the many genes acquired through HGT during the diversification history of genomes, only a fraction may have actively contributed to sustained ecological adaptation. We used a phylogenetic approach accounting for the transfer of genes (or groups of genes) to estimate the history of genomes in Agrobacterium biovar 1, a diverse group of soil- and plant-dwelling bacterial species. We identified clade-specific blocks of cotransferred genes encoding coherent biochemical pathways that may have contributed to the evolutionary success of key Agrobacterium clades. This pattern of gene coevolution rejects a neutral model of transfer, in which neighboring genes would be transferred independently of their function, and instead suggests purifying selection on collectively coded acquired pathways. The acquisition of these synapomorphic blocks of cofunctioning genes probably drove the ecological diversification of Agrobacterium and defined features of ancestral ecological niches, which consistently hint at a strong selective role of host plant rhizospheres.
Affiliation(s)
- Florent Lassalle
- Ecologie Microbienne, CNRS, INRA, VetAgro Sup, UCBL, Université de Lyon, Villeurbanne, France; Biométrie et Biologie Evolutive, CNRS, UCBL, Université de Lyon, Villeurbanne, France; Ecole Normale Supérieure de Lyon, Lyon, France
- Rémi Planel
- Biométrie et Biologie Evolutive, CNRS, UCBL, Université de Lyon, Villeurbanne, France
- Simon Penel
- Biométrie et Biologie Evolutive, CNRS, UCBL, Université de Lyon, Villeurbanne, France
- David Chapulliot
- Ecologie Microbienne, CNRS, INRA, VetAgro Sup, UCBL, Université de Lyon, Villeurbanne, France
- Valérie Barbe
- Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA) Direction de la Recherche Fondamentale, Institut de Biologie Francois-Jacob (IBFJ), Genoscope, Evry, France
- Audrey Dubost
- Ecologie Microbienne, CNRS, INRA, VetAgro Sup, UCBL, Université de Lyon, Villeurbanne, France
- Alexandra Calteau
- Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA) Direction de la Recherche Fondamentale, Institut de Biologie Francois-Jacob (IBFJ), Genoscope, Evry, France; Laboratoire d'Analyse Bioinformatiques pour la Génomique et le Métabolisme, CNRS, UMR 8030, Evry, France; UEVE, Université d'Evry Val d'Essonne, France
- David Vallenet
- Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA) Direction de la Recherche Fondamentale, Institut de Biologie Francois-Jacob (IBFJ), Genoscope, Evry, France; Laboratoire d'Analyse Bioinformatiques pour la Génomique et le Métabolisme, CNRS, UMR 8030, Evry, France; UEVE, Université d'Evry Val d'Essonne, France
- Damien Mornico
- Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA) Direction de la Recherche Fondamentale, Institut de Biologie Francois-Jacob (IBFJ), Genoscope, Evry, France; Laboratoire d'Analyse Bioinformatiques pour la Génomique et le Métabolisme, CNRS, UMR 8030, Evry, France; UEVE, Université d'Evry Val d'Essonne, France
- Thomas Bigot
- Biométrie et Biologie Evolutive, CNRS, UCBL, Université de Lyon, Villeurbanne, France
- Laurent Guéguen
- Biométrie et Biologie Evolutive, CNRS, UCBL, Université de Lyon, Villeurbanne, France
- Ludovic Vial
- Ecologie Microbienne, CNRS, INRA, VetAgro Sup, UCBL, Université de Lyon, Villeurbanne, France
- Daniel Muller
- Ecologie Microbienne, CNRS, INRA, VetAgro Sup, UCBL, Université de Lyon, Villeurbanne, France
- Vincent Daubin
- Biométrie et Biologie Evolutive, CNRS, UCBL, Université de Lyon, Villeurbanne, France
- Xavier Nesme
- Ecologie Microbienne, CNRS, INRA, VetAgro Sup, UCBL, Université de Lyon, Villeurbanne, France