1. Xu L, Yue XL, Li HZ, Jian SL, Shu WS, Cui L, Xu XW. Aerobic Anoxygenic Phototrophic Bacteria in the Marine Environments Revealed by Raman/Fluorescence-Guided Single-Cell Sorting and Targeted Metagenomics. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:7087-7098. [PMID: 38651173 DOI: 10.1021/acs.est.4c02881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
Aerobic anoxygenic phototrophic bacteria (AAPB) contribute profoundly to the global carbon cycle. However, most AAPB in marine environments are uncultured and occur at low abundance, hampering the recognition of their functions and molecular mechanisms. In this study, we developed a new culture-independent method to identify and sort AAPB using single-cell Raman/fluorescence spectroscopy. Characteristic Raman and fluorescence bands specific to bacteriochlorophyll a (Bchl a) in AAPB were determined by comparing multiple known AAPB with non-AAPB isolates. Using these spectroscopic biomarkers, AAPB in coastal seawater, pelagic seawater, and hydrothermal sediment samples were screened, sorted, and sequenced. 16S rRNA gene analysis and functional gene annotation of the sorted cells revealed novel AAPB members and functional genes, including one species belonging to the genus Sphingomonas, two genera affiliated with the classes Betaproteobacteria and Gammaproteobacteria, and the functional genes bchCDIX, pucC2, and pufL, which are related to Bchl a biosynthesis and photosynthetic reaction center assembly. Metagenome-assembled genomes (MAGs) of sorted cells from pelagic seawater and deep-sea hydrothermal sediment belonged to Erythrobacter sanguineus, a species considered to be an AAPB, and to the genus Sphingomonas, respectively. Moreover, multiple photosynthesis-related genes were annotated in both MAGs, and comparative genomic analysis revealed several exclusive genes involved in amino acid and inorganic ion metabolism and transport. This study used a new single-cell spectroscopy method to detect AAPB, not only broadening the known taxonomic and genetic repertoire of AAPB in marine environments but also revealing their genetic mechanisms at the single-genome level.
Affiliation(s)
- Lin Xu
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, P. R. China
  - College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, P. R. China
- Xiao-Lan Yue
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, P. R. China
  - School of Oceanography, Shanghai Jiao Tong University, Shanghai 200030, P. R. China
- Hong-Zhe Li
  - Key Lab of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, P. R. China
- Shu-Ling Jian
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, P. R. China
  - Key Lab of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, P. R. China
- Wen-Sheng Shu
  - Institute of Ecological Science, School of Life Science, South China Normal University, Guangzhou 510631, P. R. China
- Li Cui
  - Key Lab of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, P. R. China
- Xue-Wei Xu
  - Key Laboratory of Marine Ecosystem Dynamics, Ministry of Natural Resources & Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, P. R. China
  - School of Oceanography, Shanghai Jiao Tong University, Shanghai 200030, P. R. China
2. Penumarthi LR, Baptista RP, Beaudry MS, Glenn TC, Kissinger JC. A new chromosome-level genome assembly and annotation of Cryptosporidium meleagridis. bioRxiv: The Preprint Server for Biology 2024:2024.02.16.580748. [PMID: 38405792 PMCID: PMC10888889 DOI: 10.1101/2024.02.16.580748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Cryptosporidium spp. are medically and scientifically relevant protozoan parasites that cause severe diarrheal illness in infants, immunosuppressed people, and animals. Although most human Cryptosporidium infections are caused by C. parvum and C. hominis, several other species also infect humans, including C. meleagridis, which is commonly observed in developing countries. Here, we polished and annotated a long-read genome sequence assembly for C. meleagridis TU1867, a species that infects both birds and humans. The genome sequence was generated using a combination of whole genome amplification (WGA) and long-read Oxford Nanopore Technologies sequencing, and the assembly was then polished with Illumina data. The chromosome-level genome assembly is 9.2 Mb with a contig N50 of 1.1 Mb. Annotation revealed 3,923 protein-coding genes. A BUSCO analysis indicates a completeness of 96.6% (n=446), including 430 (96.4%) single-copy and 1 (0.224%) duplicated apicomplexan conserved gene(s). The new C. meleagridis genome assembly is nearly gap-free and provides a valuable new resource for the Cryptosporidium community and for future studies on evolution and host specificity.
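Contig N50 and BUSCO percentages are standard assembly statistics. The sketch below shows one common way to compute N50 (the contig length at which contigs of that length or longer cover half the assembly) and checks that the BUSCO figures quoted above follow from n = 446. The contig lengths are hypothetical illustration values, not the C. meleagridis data, and the function names are our own.

```python
def n50(lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    covered = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


def busco_percentages(single_copy, duplicated, n_buscos):
    """Return (complete %, single-copy %, duplicated %) for a BUSCO run."""
    complete = single_copy + duplicated
    return tuple(100 * x / n_buscos for x in (complete, single_copy, duplicated))


if __name__ == "__main__":
    # Hypothetical contig lengths in Mb (not real assembly data).
    print(n50([5.0, 4.0, 3.0, 2.0, 1.0]))  # 4.0
    complete, single, dup = busco_percentages(430, 1, 446)
    # Prints: 96.6% complete, 96.4% single-copy, 0.224% duplicated
    print(f"{complete:.1f}% complete, {single:.1f}% single-copy, {dup:.3f}% duplicated")
```

Sorting once and accumulating from the longest contig keeps the calculation O(m log m) for m contigs, which is ample for assemblies of this size.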
Affiliation(s)
- Lasya R Penumarthi
  - Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
  - Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia 30602, USA
- Rodrigo P Baptista
  - Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
  - Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia 30602, USA
- Megan S Beaudry
  - Department of Environmental Health Science, University of Georgia, Athens, GA, USA
- Travis C Glenn
  - Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
  - Department of Environmental Health Science, University of Georgia, Athens, GA, USA
  - Department of Genetics, University of Georgia, Athens, Georgia 30602, USA
- Jessica C Kissinger
  - Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA
  - Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia 30602, USA
  - Department of Genetics, University of Georgia, Athens, Georgia 30602, USA
3. Shin J, Rychel K, Palsson BO. Systems biology of competency in Vibrio natriegens is revealed by applying novel data analytics to the transcriptome. Cell Rep 2023; 42:112619. [PMID: 37285268 DOI: 10.1016/j.celrep.2023.112619] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/07/2022] [Revised: 04/27/2023] [Accepted: 05/22/2023] [Indexed: 06/09/2023]
Abstract
Vibrio natriegens regulates natural competence through the TfoX and QstR transcription factors, which are involved in external DNA capture and transport. However, the broader genetic and transcriptional regulatory basis of competency remains unknown. We used a machine-learning approach to decompose the Vibrio natriegens transcriptome into 45 independently modulated sets of genes (iModulons). Our findings show that competency is associated with the repression of two housekeeping iModulons (iron metabolism and translation) and the activation of six iModulons: the TfoX and QstR iModulons, a novel iModulon of unknown function, and three housekeeping iModulons (representing motility, polycations, and reactive oxygen species [ROS] responses). Phenotypic screening of 83 gene deletion strains demonstrates that loss of iModulon function reduces or eliminates competency. This database-iModulon-discovery cycle unveils the transcriptomic basis for competency and its relationship to housekeeping functions. These results provide the genetic basis for a systems biology of competency in this organism.
Affiliation(s)
- Jongoh Shin
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Kevin Rychel
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Bernhard O Palsson
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
4. Behjati A, Zare-Mirakabad F, Arab SS, Nowzari-Dalini A. Protein sequence profile prediction using ProtAlbert transformer. Comput Biol Chem 2022; 99:107717. [DOI: 10.1016/j.compbiolchem.2022.107717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Revised: 06/03/2022] [Accepted: 06/21/2022] [Indexed: 11/03/2022]
5. Crow M, Suresh H, Lee J, Gillis J. Coexpression reveals conserved gene programs that co-vary with cell type across kingdoms. Nucleic Acids Res 2022; 50:4302-4314. [PMID: 35451481 PMCID: PMC9071420 DOI: 10.1093/nar/gkac276] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/29/2021] [Revised: 03/30/2022] [Accepted: 04/08/2022] [Indexed: 12/24/2022]
Abstract
What makes a mouse a mouse, and not a hamster? Differences in gene regulation between the two organisms play a critical role. Comparative analysis of gene coexpression networks provides a general framework for investigating the evolution of gene regulation across species. Here, we compare coexpression networks from 37 species and quantify the conservation of gene activity (1) as a function of evolutionary time, (2) across orthology prediction algorithms, and (3) with reference to cell- and tissue-specificity. We find that ancient genes are expressed in multiple cell types and have well-conserved coexpression patterns; however, they are expressed at different levels across cell types. Thus, differential regulation of ancient gene programs contributes to transcriptional cell identity. We propose that this differential regulation may play a role in cell diversification in both the animal and plant kingdoms.
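The abstract does not say how each coexpression network is built. As a minimal sketch, assuming a network whose edge weights are pairwise Pearson correlations of expression profiles across samples (a common but not the only construction), the core step looks like this. Gene names and values are hypothetical, and the function names are our own.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / (sd_x * sd_y)


def coexpression_network(expression):
    """Map each unordered gene pair to the correlation of its profiles.

    `expression` maps gene name -> list of expression values, one per sample.
    """
    genes = sorted(expression)
    return {
        (g, h): pearson(expression[g], expression[h])
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical genes and expression values across four samples.
    expr = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [4, 3, 2, 1]}
    for pair, r in coexpression_network(expr).items():
        print(pair, round(r, 3))
```

Comparing such networks between species would additionally require mapping genes through orthologs, which is where the choice of orthology prediction algorithm mentioned above comes in.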
Affiliation(s)
- Megan Crow
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA
- Hamsini Suresh
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA
- John Lee
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA
- Jesse Gillis
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA
6. Huang LC, Taujale R, Gravel N, Venkat A, Yeung W, Byrne DP, Eyers PA, Kannan N. KinOrtho: a method for mapping human kinase orthologs across the tree of life and illuminating understudied kinases. BMC Bioinformatics 2021; 22:446. [PMID: 34537014 PMCID: PMC8449880 DOI: 10.1186/s12859-021-04358-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/10/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND Protein kinases are among the largest druggable families of signaling proteins and are involved in various human diseases, including cancers and neurodegenerative disorders. Despite their clinical relevance, nearly 30% of the 545 human protein kinases remain highly understudied. Comparative genomics is a powerful approach for predicting and investigating the functions of understudied kinases. However, incomplete knowledge of kinase orthologs across fully sequenced kinomes severely limits the application of comparative genomics approaches to illuminating understudied kinases. Here, we introduce KinOrtho, a query- and graph-based orthology inference method that combines full-length and domain-based approaches to map one-to-one kinase orthologs across 17 thousand species. RESULTS Using multiple metrics, we show that KinOrtho performed better than existing methods in identifying kinase orthologs across evolutionarily divergent species and eliminated potential false positives by flagging sequences without a proper kinase domain for further evaluation. We demonstrate the advantage of using domain-based approaches for identifying domain fusion events, highlighting a case involving the understudied serine/threonine kinase TAOK1 and the metabolic kinase PIK3C2A, which show high co-expression in human cells. We also identify evolutionary fission events involving the understudied OBSCN kinase domains, further highlighting the value of domain-based orthology inference approaches. Using KinOrtho-defined orthologs, Gene Ontology annotations, and machine learning, we propose putative biological functions for several understudied kinases, including the role of TP53RK in cell cycle checkpoint(s), the involvement of TSSK3 and TSSK6 in acrosomal vesicle localization, and potential functions for the ULK4 pseudokinase in neuronal development.
CONCLUSIONS In sum, KinOrtho is a novel query-based tool for identifying one-to-one orthologous relationships across thousands of proteomes. We use KinOrtho here to identify kinase orthologs and show that its well-curated kinome ortholog set can serve as a valuable resource for illuminating understudied kinases; the KinOrtho framework can be extended to any protein family of interest.
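KinOrtho's own query- and graph-based procedure, which combines full-length and domain-based searches, is not reproduced here. For orientation only, the sketch below shows a much simpler, widely used heuristic for one-to-one orthologs, reciprocal best hits (RBH): two proteins from different proteomes are paired when each is the other's top-scoring hit. The protein IDs and scores are hypothetical stand-ins for similarity-search results such as bit scores.

```python
def best_hits(scores):
    """Return {query: best-scoring subject} from {(query, subject): score}.

    Higher scores are better (e.g. alignment bit scores). Ties keep the
    first subject encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """One-to-one ortholog candidates: pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical protein IDs and scores for two proteomes A and B.
    a_vs_b = {("a1", "b1"): 300, ("a1", "b2"): 120, ("a2", "b2"): 250,
              ("a2", "b1"): 90, ("a3", "b1"): 150}
    b_vs_a = {("b1", "a1"): 290, ("b1", "a3"): 140, ("b2", "a2"): 240,
              ("b2", "a1"): 110}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Here a3 is excluded because its best hit, b1, prefers a1. Plain RBH is blind to domain fusion and fission events, which is one motivation the abstract gives for adding domain-based searches.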
Affiliation(s)
- Liang-Chin Huang
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602, USA
- Rahil Taujale
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602, USA
- Nathan Gravel
  - PREP@UGA, University of Georgia, 500 D.W. Brooks Drive, Athens, GA 30602, USA
- Aarya Venkat
  - Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green St., Athens, GA 30602, USA
- Wayland Yeung
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602, USA
- Dominic P. Byrne
  - Department of Biochemistry and Systems Biology, University of Liverpool, Crown St, Liverpool, UK
- Patrick A. Eyers
  - Department of Biochemistry and Systems Biology, University of Liverpool, Crown St, Liverpool, UK
- Natarajan Kannan
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602, USA
  - Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green St., Athens, GA 30602, USA
7. Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes. Int J Mol Sci 2021; 22:ijms221810019. [PMID: 34576183 PMCID: PMC8468833 DOI: 10.3390/ijms221810019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/27/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 12/15/2022]
Abstract
Functional annotation of genes of unknown function can reveal previously unidentified functions and enhance our understanding of complex genome communications. A common approach for inferring gene function is the ortholog-based method. However, genetic data alone are often insufficient for functional annotation, so integrating other sources of data can increase the chance of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network of human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of genes of unknown function. The model's predictions performed well, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for function inference using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved in muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network with a hybrid deep learning technique can identify unknown gene functions in malaria parasites. This approach is generalizable and can be applied to other diseases, advancing the field of biomedical science.
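The AUC of 0.89 quoted above is the area under the ROC curve. One equivalent way to compute it, shown here only to make the metric concrete, is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The scores and labels below are hypothetical, and this pairwise version is a sketch rather than the evaluation code used in the study.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(score of a positive > score of a negative), ties = 0.5.

    scores: predicted scores (higher means more likely positive)
    labels: 1 for positive pairs, 0 for negative pairs
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive with every negative (O(P * N); fine for a sketch).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical prediction scores for two true and two false ortholog pairs.
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

For large evaluation sets, a rank-based formulation (sorting once) gives the same value in O(m log m) time.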
8. Kiran K, Rawal HC, Dubey H, Jaswal R, Bhardwaj SC, Deshmukh R, Sharma TR. Genome-Wide Analysis of Four Pathotypes of Wheat Rust Pathogen (Puccinia graminis) Reveals Structural Variations and Diversifying Selection. J Fungi (Basel) 2021; 7:701. [PMID: 34575739 PMCID: PMC8468629 DOI: 10.3390/jof7090701] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/13/2021] [Revised: 08/19/2021] [Accepted: 08/21/2021] [Indexed: 12/28/2022]
Abstract
Diseases caused by Puccinia graminis are among the most devastating diseases of wheat. Extensive genomic understanding of the pathogen has proven helpful not only in understanding host-pathogen interaction but also in finding appropriate control measures. In the present study, whole-genome sequencing of four diverse P. graminis pathotypes was performed to understand their genetic variation and evolution. An average of 63.5 Gb of data per pathotype, with about 100× average genomic coverage, was obtained by 100-base paired-end sequencing on an Illumina HiSeq 1000. Genome structural annotations collectively predicted 9273 functional proteins, including ~583 extracellular secreted proteins. Approximately 7.4% of the genes showed similarity to entries in the PHI database, suggesting their significance in pathogenesis. Genome-wide analysis indicated that pathotype 117-6 is likely distinct and descended through a different lineage. The 3-6% more SNPs in its regulatory regions, together with 154 genes under positive selection whose orthologs are under negative selection in the other three pathotypes, further supported pathotype 117-6 being highly diverse. The genomic information generated in the present study could serve as an important source for comparative genomic studies across the genus Puccinia and lead to better rust management in wheat.
Affiliation(s)
- Kanti Kiran
  - Pusa Campus, ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Hukam C. Rawal
  - Pusa Campus, ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Himanshu Dubey
  - Pusa Campus, ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Rajdeep Jaswal
  - Pusa Campus, ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Subhash C. Bhardwaj
  - Regional Station, ICAR-Indian Institute of Wheat and Barley Research, Shimla 171002, India
- Rupesh Deshmukh
  - National Agri-Food Biotechnology Institute, Punjab 140306, India
- Tilak Raj Sharma
  - Pusa Campus, ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
  - Division of Crop Science, ICAR-Indian Council of Agricultural Research, New Delhi 110001, India
9. Fu H, Zhang L, Fan C, Liu C, Li W, Li J, Zhao X, Jia S, Zhang Y. Domestication Shapes the Community Structure and Functional Metagenomic Content of the Yak Fecal Microbiota. Front Microbiol 2021; 12:594075. [PMID: 33897627 PMCID: PMC8059439 DOI: 10.3389/fmicb.2021.594075] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/12/2020] [Accepted: 03/05/2021] [Indexed: 01/07/2023]
Abstract
Domestication is a key driver of genetic variation; however, the mechanism by which domestication alters the gut microbiota is poorly understood. Here, to explore variation in the structure, function, rapidly evolved genes (REGs), and cellulase and hemicellulase enzyme profiles of the fecal microbiota, we studied the fecal microbiota of wild, half-blood, and domestic yaks using 16S rDNA sequencing, shotgun metagenomic sequencing, and measurement of short-chain fatty acid (SCFA) concentrations. Results indicated that wild and half-blood yaks harbored an increased abundance of the phylum Firmicutes and a reduced abundance of the genus Akkermansia, both of which are associated with efficient energy harvesting. Gut microbial diversity was lower in domestic yaks. Shotgun metagenomic sequencing showed that wild yaks harbored an increased abundance of microbial pathways that play crucial roles in host digestion and growth, whereas domestic yaks harbored an increased abundance of methane-metabolism-related pathways. Wild yaks were enriched in REGs in energy and carbohydrate metabolism pathways and had a significantly higher abundance of cellulases and endohemicellulases in the glycoside hydrolase family than domestic yaks. Concentrations of acetic, propionic, n-butyric, i-butyric, n-valeric, and i-valeric acid were highest in wild yaks. Our study demonstrated the effect of domestication on the composition and function of the gut microbiota and on microbiota-associated SCFAs, which are closely associated with livestock growth performance. These findings may help researchers establish further links between economic traits and the gut microbiota, and develop new commercial livestock strains based on gut microbiota biotechnology.
Affiliation(s)
- Haibo Fu
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, China
  - University of Chinese Academy of Sciences, Beijing, China
- Liangzhi Zhang
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, China
- Chao Fan
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, China
  - University of Chinese Academy of Sciences, Beijing, China
- Chuanfa Liu
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, China
- Wenjing Li
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, China
- Jiye Li
  - Datong Yak Breeding Farm of Qinghai Province, Datong, China
- Xinquan Zhao
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, China
- Shangang Jia
  - College of Grassland Science and Technology, China Agricultural University, Beijing, China
- Yanming Zhang
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, China
10. Ferretti L, Krämer-Eis A, Schiffer PH. Conserved Patterns in Developmental Processes and Phases, Rather than Genes, Unite the Highly Divergent Bilateria. Life (Basel) 2020; 10:E182. [PMID: 32899936 PMCID: PMC7555945 DOI: 10.3390/life10090182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/05/2020] [Revised: 08/31/2020] [Accepted: 09/02/2020] [Indexed: 01/03/2023]
Abstract
Bilateria are the predominant clade of animals on Earth. Despite having evolved a wide variety of body plans and developmental modes, they are characterized by common morphological traits. By default, researchers have tried to link clade-specific genes to these traits, thus distinguishing bilaterians from non-bilaterians by their gene content. Here we argue that it is instead biological processes that unite Bilateria and set them apart from their non-bilaterian sister groups, which have less complex body morphologies. To test this hypothesis, we compared proteomes of bilaterian and non-bilaterian species in an elaborate computational pipeline, aiming to search for a set of bilaterian-specific genes. Despite limited confidence in their bilaterian specificity, we nevertheless detected Bilateria-specific functional and developmental patterns in the subset of genes conserved in distantly related Bilateria. Using a novel multi-species GO-enrichment method, we determined the functional repertoire of genes that are widely conserved among Bilateria. Analyzing expression profiles in three very distantly related model species (D. melanogaster, D. rerio, and C. elegans), we find characteristic peaks at comparable stages of development and a delayed onset of expression in embryos. In particular, the expression of the conserved genes appears to peak at the phylotypic stage of different bilaterian phyla. In summary, our study illustrates how development connects distantly related Bilateria after millions of years of divergence, pointing to processes potentially separating them from non-bilaterians. We argue that evolutionary biologists should move away from a purely gene-centric view of evolution and place more focus on analyzing and defining conserved developmental processes and periods.
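The authors' multi-species GO-enrichment method is described as novel and its details are not given here. As background only, the conventional single-species building block is a one-sided hypergeometric test: given N background genes, K of which carry a GO term, how surprising is it to see k annotated genes in a study set of n genes? A minimal sketch, with hypothetical counts:

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for GO-term enrichment.

    N: genes in the background (universe)
    K: background genes annotated with the GO term
    n: genes in the study set (e.g. genes conserved across a clade)
    k: study-set genes annotated with the GO term
    """
    if not (0 <= n <= N and 0 <= K <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k, k+1, ..., min(n, K) annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical: all 5 study genes carry a term held by 5 of 10 background genes.
    p = enrichment_p_value(k=5, n=5, K=5, N=10)
    print(f"{p:.5f}")  # 0.00397 (= 1/252)
```

When many GO terms are tested at once, the resulting p-values would normally also be corrected for multiple testing; how the authors combine evidence across species is not covered by this sketch.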
Affiliation(s)
- Luca Ferretti
  - The Pirbright Institute, Ash Road, Pirbright, Surrey GU24 0NF, UK
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Andrea Krämer-Eis
  - Institut für Genetik, Universität zu Köln, Zülpicher Straße 47a, 50674 Köln, Germany
- Philipp H. Schiffer
  - Institut für Zoologie, Universität zu Köln, Zülpicher Straße 47b, 50674 Köln, Germany
11. Norsigian CJ, Fang X, Seif Y, Monk JM, Palsson BO. A workflow for generating multi-strain genome-scale metabolic models of prokaryotes. Nat Protoc 2020; 15:1-14. [PMID: 31863076 PMCID: PMC7017905 DOI: 10.1038/s41596-019-0254-3] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 05/16/2019] [Accepted: 10/08/2019] [Indexed: 11/09/2022]
Abstract
Genome-scale models (GEMs) of bacterial strains' metabolism have been formulated and used over the past 20 years. Recently, with the number of genome sequences increasing exponentially, multi-strain GEMs have proved valuable for defining the properties of a species. Here, through four major stages, we extend the original Protocol used to generate a GEM for a single strain to enable multi-strain GEMs: (i) obtain or generate a high-quality model of a reference strain; (ii) compare the genome sequences of the reference strain and the target strains to generate a homology matrix; (iii) generate draft strain-specific models from the homology matrix; and (iv) manually curate the draft models. These multi-strain GEMs can be used to study pan-metabolic capabilities and strain-specific differences across a species, thus providing insights into its range of lifestyles. Unlike the original Protocol, this procedure is scalable and can be partly automated with the Supplementary Jupyter notebook Tutorial. This Protocol Extension joins the ranks of other comparable methods for generating models, such as CarveMe and KBase. This extension of the original Protocol takes on the order of weeks to multiple months to complete, depending on the availability of a suitable reference model.
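Stage (iii) above, turning a homology matrix into draft strain-specific models, can be illustrated with a deliberately simplified sketch. It assumes a binary homology matrix (homolog present or absent per strain) and treats every gene-protein-reaction rule as a plain OR over its genes; real GEMs use rules that mix AND and OR, and the protocol's own notebook and tooling are not reproduced here. All reaction, gene, and strain IDs are hypothetical.

```python
def draft_strain_model(reaction_genes, homology, strain):
    """Return the reactions retained in a draft model for `strain`.

    reaction_genes: {reaction_id: set of reference-strain gene ids}; an empty
        set marks a reaction with no gene association (kept unconditionally).
    homology: {gene_id: {strain_id: True/False}}, i.e. the homology matrix.
    Simplification: a reaction is kept if ANY of its genes has a homolog,
    treating every gene-protein-reaction rule as a plain OR.
    """
    kept = []
    for reaction, genes in reaction_genes.items():
        has_homolog = any(homology.get(g, {}).get(strain, False) for g in genes)
        if not genes or has_homolog:
            kept.append(reaction)
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical reference model and homology matrix for one target strain.
    reaction_genes = {"R1": {"g1"}, "R2": {"g2", "g3"}, "R_exchange": set()}
    homology = {"g1": {"strainX": False}, "g2": {"strainX": False},
                "g3": {"strainX": True}}
    print(draft_strain_model(reaction_genes, homology, "strainX"))  # ['R2', 'R_exchange']
```

Reactions dropped this way (R1 here) are exactly the candidates that stage (iv), manual curation, would need to revisit, for example when a missing homolog reflects an annotation gap rather than a true gene loss.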
Affiliation(s)
- Charles J Norsigian
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Xin Fang
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Yara Seif
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Jonathan M Monk
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Bernhard O Palsson
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
12. Robinson MD, Vitek O. Benchmarking comes of age. Genome Biol 2019; 20:205. [PMID: 31597556 PMCID: PMC6785869 DOI: 10.1186/s13059-019-1846-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/01/2019] [Accepted: 10/01/2019] [Indexed: 11/25/2022]
Affiliation(s)
- Mark D Robinson
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Olga Vitek
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
13. Hu X, Friedberg I. SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier. Gigascience 2019; 8:giz118. [PMID: 31648300 PMCID: PMC6812468 DOI: 10.1093/gigascience/giz118] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/13/2019] [Revised: 06/07/2019] [Accepted: 09/05/2019] [Indexed: 11/13/2022]
Abstract
BACKGROUND Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools has been developed to perform homology classification across genomes of different species. However, when applied to large genomic data sets, these tools require high memory and CPU usage, typically available only on computational clusters. FINDINGS Here we present a new graph-based orthology analysis tool, SwiftOrtho, which is optimized for speed and memory usage when applied to large-scale data. SwiftOrtho uses long k-mers to speed up homology search, while using a reduced amino acid alphabet and spaced seeds to compensate for the loss of sensitivity due to long k-mers. In addition, it uses an affinity propagation algorithm to reduce memory usage when clustering large-scale orthology relationships into orthologous groups. In our tests, SwiftOrtho was the only tool that completed orthology analysis of proteins from 1,760 bacterial genomes on a computer with only 4 GB RAM. Using various standard orthology data sets, we also show that SwiftOrtho has high accuracy. CONCLUSIONS SwiftOrtho enables accurate comparative genomic analyses of thousands of genomes on low-memory computers. SwiftOrtho is available at https://github.com/Rinoahu/SwiftOrtho.
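To make the "reduced alphabet plus spaced seeds" idea concrete, the sketch below maps amino acids to coarse physicochemical classes and then extracts k-mers in which only the seed's "1" positions are compared. Both the six-class grouping and the seed pattern "1101011" are illustrative assumptions, not SwiftOrtho's actual alphabet or seeds; see the repository above for the real implementation.

```python
# Illustrative grouping only; SwiftOrtho's actual reduced alphabet is not given here.
REDUCED_GROUPS = {
    "AVLIMC": "h",  # hydrophobic
    "FWY": "a",     # aromatic
    "STNQ": "p",    # polar, uncharged
    "DE": "n",      # negatively charged
    "KRH": "b",     # basic
    "GP": "s",      # special backbone conformations
}
AA_TO_GROUP = {aa: code for group, code in REDUCED_GROUPS.items() for aa in group}


def reduce_sequence(seq):
    """Map a protein sequence to the reduced alphabet ('x' for unknown residues)."""
    return "".join(AA_TO_GROUP.get(aa, "x") for aa in seq.upper())


def spaced_kmers(seq, seed="1101011"):
    """Yield spaced k-mers: only positions marked '1' in the seed are kept.

    The '0' positions act as wildcards, so a mismatch there does not break
    the match, which recovers some sensitivity lost by using long k-mers.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    reduced = reduce_sequence(seq)
    for start in range(len(reduced) - len(seed) + 1):
        yield "".join(reduced[start + i] for i in care)


if __name__ == "__main__":
    # Conservative substitutions (I/L, V/I, K/R) collapse to the same reduced string.
    print(reduce_sequence("IVKDE") == reduce_sequence("LIRDE"))  # True
    print(list(spaced_kmers("ACDEFGH", seed="101")))  # ['hn', 'hn', 'na', 'ns', 'ab']
```

Two homologous proteins that differ only by within-class substitutions, or by substitutions at wildcard positions, still share spaced k-mers, so a hash-based seed lookup can find them even with long seeds.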
Affiliation(s)
- Xiao Hu
- Department of Veterinary Microbiology and Preventive Medicine, 2118 Veterinary Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, 50011, USA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, 2118 Veterinary Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, 50011, USA
14
Evolutionary Patterns of Non-Coding RNA in Cardiovascular Biology. Noncoding RNA 2019; 5:ncrna5010015. [PMID: 30709035 PMCID: PMC6468844 DOI: 10.3390/ncrna5010015] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/21/2018] [Revised: 01/26/2019] [Accepted: 01/29/2019] [Indexed: 12/15/2022]
Abstract
Cardiovascular diseases (CVDs) affect the heart and the vascular system with a high prevalence and place a huge burden on society as well as the healthcare system. These complex diseases are often the result of multiple genetic and environmental risk factors and pose a great challenge to understanding their etiology and consequences. With the advent of next generation sequencing, many non-coding RNA transcripts, especially long non-coding RNAs (lncRNAs), have been linked to the pathogenesis of CVD. Despite increasing evidence, the proper functional characterization of most of these molecules is still lacking. The exploration of conservation of sequences across related species has been used to functionally annotate protein coding genes. In contrast, the rapid evolutionary turnover and weak sequence conservation of lncRNAs make it difficult to characterize functional homologs for these sequences. Recent studies have tried to explore other dimensions of interspecies conservation to elucidate the functional role of these novel transcripts. In this review, we summarize various methodologies adopted to explore the evolutionary conservation of cardiovascular non-coding RNAs at the sequence, secondary structure, syntenic, and expression levels.
15
Abstract
The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.
16
OrthoList 2: A New Comparative Genomic Analysis of Human and Caenorhabditis elegans Genes. Genetics 2018; 210:445-461. [PMID: 30120140 DOI: 10.1534/genetics.118.301307] [Citation(s) in RCA: 176] [Impact Index Per Article: 29.3] [Received: 06/27/2018] [Accepted: 08/15/2018] [Indexed: 11/18/2022]
Abstract
OrthoList, a compendium of Caenorhabditis elegans genes with human orthologs compiled in 2011 by a meta-analysis of four orthology-prediction methods, has been a popular tool for identifying conserved genes for research into biological and disease mechanisms. However, the efficacy of orthology prediction depends on the accuracy of gene-model predictions, an ongoing process, and orthology-prediction algorithms have also been updated over time. Here we present OrthoList 2 (OL2), a new comparative genomic analysis between C. elegans and humans, and the first assessment of how changes over time affect the landscape of predicted orthologs between two species. Although we find that updates to the orthology-prediction methods significantly changed the landscape of C. elegans-human orthologs predicted by individual programs and, unexpectedly, reduced agreement among them, we also show that our meta-analysis approach "buffered" against changes in gene content. We show that adding results from more programs did not lead to many additions to the list and discuss reasons to avoid assigning "scores" based on support by individual orthology-prediction programs; the treatment of "legacy" genes no longer predicted by these programs; and the practical difficulties of updating due to encountering deprecated, changed, or retired gene identifiers. In addition, we consider what other criteria may support claims of orthology and alternative approaches to find potential orthologs that elude identification by these programs. Finally, we created a new web-based tool that allows for rapid searches of OL2 by gene identifiers, protein domains [InterPro and SMART (Simple Modular Architecture Research Tool)], or human disease associations [OMIM (Online Mendelian Inheritance in Man)], and also includes available RNA-interference resources to facilitate potential translational cross-species studies.
17
Galpert D, Fernández A, Herrera F, Antunes A, Molina-Ruiz R, Agüero-Chapin G. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers. BMC Bioinformatics 2018; 19:166. [PMID: 29724166 PMCID: PMC5934817 DOI: 10.1186/s12859-018-2148-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/13/2017] [Accepted: 04/04/2018] [Indexed: 12/24/2022]
Abstract
BACKGROUND The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in a supervised big data approach, they were all, to some extent, alignment-based features, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. RESULTS The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built either with alignment-based similarity measures alone or with these combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved for a yeast proteome pair that underwent a whole genome duplication.
The feature selection study showed that alignment-based features were top-ranked for the best classifiers, while the runners-up were alignment-free features related to amino acid composition. CONCLUSIONS The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes relative to the classification quality achieved with alignment-based similarity measures alone. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
Affiliation(s)
- Deborah Galpert
- Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Alberto Fernández
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Francisco Herrera
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Agostinho Antunes
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal; Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
18
Positive diversifying selection is a pervasive adaptive force throughout the Drosophila radiation. Mol Phylogenet Evol 2017; 112:230-243. [DOI: 10.1016/j.ympev.2017.04.023] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/25/2016] [Revised: 04/26/2017] [Accepted: 04/26/2017] [Indexed: 01/02/2023]
19
Battenberg K, Lee EK, Chiu JC, Berry AM, Potter D. OrthoReD: a rapid and accurate orthology prediction tool with low computational requirement. BMC Bioinformatics 2017. [PMID: 28633662 PMCID: PMC5479036 DOI: 10.1186/s12859-017-1726-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Background Identifying orthologous genes is an initial step required for phylogenetics, and it is also a common strategy employed in functional genetics to find candidates for functionally equivalent genes across multiple species. At the same time, in silico orthology prediction tools often require large computational resources only available on computing clusters. Here we present OrthoReD, an open-source orthology prediction tool with accuracy comparable to published tools that requires only a desktop computer. The low computational resource requirement of OrthoReD is achieved by repeating orthology searches on one gene of interest at a time, thereby generating a reduced dataset to limit the scope of orthology search for each gene of interest. Results The output of OrthoReD was highly similar to the outputs of two other published orthology prediction tools, OrthologID and/or OrthoDB, for the three datasets tested, which represented three phyla with different ranges of species diversity and different numbers of genomes. Median CPU time for ortholog prediction per gene by OrthoReD executed on a desktop computer was <15 min even for the largest dataset tested, which included all coding sequences of 100 bacterial species. Conclusions With high-throughput sequencing, unprecedented numbers of genes from non-model organisms are available with increasing need for clear information about their orthologies and/or functional equivalents in model organisms. OrthoReD is not only fast and accurate as an orthology prediction tool, but also gives researchers flexibility in the number of genes analyzed at a time, without requiring a high-performance computing cluster. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1726-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kai Battenberg
- Department of Plant Sciences, University of California, Davis, CA, USA.
- Ernest K Lee
- Department of Entomology and Nematology, University of California, Davis, CA, USA
- Joanna C Chiu
- Department of Entomology and Nematology, University of California, Davis, CA, USA
- Alison M Berry
- Department of Plant Sciences, University of California, Davis, CA, USA
- Daniel Potter
- Department of Plant Sciences, University of California, Davis, CA, USA
20
Kaduk M, Sonnhammer E. Improved orthology inference with Hieranoid 2. Bioinformatics 2017; 33:1154-1159. [PMID: 28096085 DOI: 10.1093/bioinformatics/btw774] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/18/2016] [Accepted: 12/07/2016] [Indexed: 11/13/2022]
Abstract
Motivation The initial step in many orthology inference methods is the computationally demanding establishment of all pairwise protein similarities across all analysed proteomes. The quadratic scaling with proteomes has become a major bottleneck. A remedy is offered by the Hieranoid algorithm which reduces the complexity to linear by hierarchically aggregating ortholog groups from InParanoid along a species tree. Results We have further developed the Hieranoid algorithm in many ways. Major improvements have been made to the construction of multiple sequence alignments and consensus sequences. Hieranoid version 2 was evaluated with standard benchmarks that reveal a dramatic increase in the coverage/accuracy tradeoff over version 1, such that it now compares favourably with the best methods. The new parallelized cluster mode allows Hieranoid to be run on large data sets in a much shorter timespan than InParanoid, yet at similar accuracy. Contact mateusz.kaduk@scilifelab.se. Availability and Implementation Perl code freely available at http://hieranoid.sbc.su.se/ . Supplementary information Supplementary data are available at Bioinformatics online.
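To make the "quadratic versus linear" claim concrete, the sketch below counts proteome-level comparisons only. It assumes a rooted binary species tree in which each internal node triggers one comparison of its two child (pseudo-)proteomes, so n species need n - 1 comparisons instead of n(n - 1)/2. It is an illustrative count, not Hieranoid's code, and it ignores that merged pseudo-proteomes can be larger than single proteomes; the species labels are hypothetical.

```python
def all_vs_all_comparisons(n_proteomes: int) -> int:
    """Proteome-vs-proteome comparisons when every pair is compared directly."""
    return n_proteomes * (n_proteomes - 1) // 2


def count_merges(tree) -> int:
    """Count pairwise merge steps for a rooted binary species tree.

    The tree is written as nested 2-tuples whose leaves are species-name strings;
    each internal node stands for one comparison of its two child (pseudo-)proteomes.
    """
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + count_merges(left) + count_merges(right)


# Hypothetical five-species tree: 4 merges versus 10 all-vs-all comparisons.
species_tree = ((("sp1", "sp2"), "sp3"), ("sp4", "sp5"))
print(count_merges(species_tree), all_vs_all_comparisons(5))  # 4 10
print(all_vs_all_comparisons(100))                            # 4950
```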
Affiliation(s)
- Mateusz Kaduk
- Department of Biochemistry and Biophysics, Stockholm University; Science for Life Laboratory (SciLifeLab), Tomtebodavagen 23, Solna, Sweden
- Erik Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University; Science for Life Laboratory (SciLifeLab), Tomtebodavagen 23, Solna, Sweden
21
Vahdati Nia B, Kang C, Tran MG, Lee D, Murakami S. Meta Analysis of Human AlzGene Database: Benefits and Limitations of Using C. elegans for the Study of Alzheimer's Disease and Co-morbid Conditions. Front Genet 2017; 8:55. [PMID: 28553317 PMCID: PMC5427079 DOI: 10.3389/fgene.2017.00055] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/01/2017] [Accepted: 04/19/2017] [Indexed: 11/21/2022]
Abstract
Human genome-wide association studies (GWAS) and linkage studies have identified 695 genes associated with Alzheimer's disease (AD), the vast majority of which are associated with late-onset AD. Although orthologs of these AD genes have been studied in several model species, orthologs in the nematode Caenorhabditis elegans remain incompletely identified, with orthologs to only 17 AD-related genes identified in the C. elegans database, WormBase. Therefore, we performed a comprehensive search for additional C. elegans orthologs of AD genes using well-established programs, including OrthoList, which utilizes four orthology prediction programs. We also validated 680 of the AD genes as unique genes from the AlzGene database, including 431 genes (63%) that are predicted to have orthologs in C. elegans. Another 178 human AD genes (26%) were associated with one or more other neurological diseases, including amyotrophic lateral sclerosis, multiple sclerosis, Parkinson's disease, and schizophrenia. Of these, there were 105 genes (59%) with orthologs in C. elegans. Interestingly, three AD genes (ACE, TNF, and MTHFR) were associated with all four of the other neurological diseases. The human AD genes were enriched in three major ontology pathway groups, including lipoprotein metabolism, hemostasis, and extracellular matrix organization, as well as in pathways that are amyloid related (NOTCH signaling) and associated with neural (neurotransmitter clearance) and immune (advanced glycation end-product receptor signaling and TRAF6-NF-kappaB) systems. Thus, the results from this study provide a potentially useful system for assessing comorbidities that may be associated with late-onset AD and other neurological conditions. The technical advantages and limitations of the ortholog searches are further discussed.
Affiliation(s)
- Behrad Vahdati Nia
- Department of Basic Sciences, College of Osteopathic Medicine, Touro University California, Vallejo, CA, USA
- Christine Kang
- Department of Basic Sciences, College of Osteopathic Medicine, Touro University California, Vallejo, CA, USA
- Michelle G Tran
- Department of Basic Sciences, College of Osteopathic Medicine, Touro University California, Vallejo, CA, USA
- Deborah Lee
- Department of Basic Sciences, College of Osteopathic Medicine, Touro University California, Vallejo, CA, USA
- Shin Murakami
- Department of Basic Sciences, College of Osteopathic Medicine, Touro University California, Vallejo, CA, USA
22
Ambrosino L, Chiusano ML. Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships. Bioinform Biol Insights 2017; 11:1177932217690136. [PMID: 28469416 PMCID: PMC5348085 DOI: 10.1177/1177932217690136] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/19/2016] [Accepted: 12/17/2016] [Indexed: 12/17/2022]
Abstract
The detection of orthologs is a key approach in genomics, useful for understanding gene evolution and phylogenetic relationships and essential for gene function prediction. However, reliable annotation of the encoded protein regions is still a limiting aspect in genomics, mainly due to the lack of confirmatory experimental evidence at the proteome level. Nevertheless, current ortholog collections are generally based on protein sequence comparisons, despite the availability of large transcriptome sequence collections. We developed Transcriptologs, a method for the prediction of orthologs based on similarities of translated fragments from messenger RNAs of 2 species. We implemented a procedure to extend BLAST-based alignments and to define orthologs based on the Bidirectional Best Hit approach. Results from a test case on Arabidopsis thaliana and Sorghum bicolor transcript collections showed that in some cases Transcriptologs outperformed a classical protein-based analysis in terms of alignment quality, revealing similarities otherwise not detectable.
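The Bidirectional Best Hit criterion mentioned above can be sketched as follows. This is a generic illustration, not Transcriptologs' implementation: it assumes the similarity searches in both directions have already been reduced to (query, subject) -> score dictionaries (for example, bit scores parsed from BLAST output), it omits the alignment-extension step, and the transcript IDs and scores are hypothetical.

```python
def best_hits(scores: dict) -> dict:
    """Return the top-scoring subject for each query.

    scores maps (query, subject) -> similarity score, e.g. a BLAST bit score.
    Ties keep whichever pair was seen first.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: dict, b_vs_a: dict) -> list:
    """Pairs (a, b) where b is a's best hit and a is, in turn, b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical transcript IDs and scores for two species.
a_vs_b = {("At1", "Sb1"): 250.0, ("At1", "Sb2"): 90.0,
          ("At2", "Sb2"): 120.0, ("At3", "Sb2"): 200.0}
b_vs_a = {("Sb1", "At1"): 245.0, ("Sb2", "At3"): 198.0, ("Sb2", "At2"): 118.0}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('At1', 'Sb1'), ('At3', 'Sb2')]
```

At2 is not reported even though its best hit is Sb2, because Sb2's own best hit is At3; this reciprocity requirement is what distinguishes BBH from a one-way best hit.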
Affiliation(s)
- Luca Ambrosino
- Department of Agriculture, University of Naples "Federico II," Portici, Italy
- Maria Luisa Chiusano
- Department of Agriculture, University of Naples "Federico II," Portici, Italy; Research Infrastructures for Marine Biological Resources (RIMAR), Stazione Zoologica Anton Dohrn Napoli, Naples, Italy
23
FLAGdb++: A Bioinformatic Environment to Study and Compare Plant Genomes. Methods Mol Biol 2016. [PMID: 27987165 DOI: 10.1007/978-1-4939-6658-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Today, the growing knowledge and data accumulation on plant genomes do not solve in a simple way the task of gene function inference. Because data of different types are coming from various sources, we need to integrate and analyze them to help biologists in this task. We created FLAGdb++ ( http://tools.ips2.u-psud.fr/FLAGdb ) to take up this challenge for a selection of plant genomes. In order to enrich gene function predictions, structural and functional annotations of the genomes are explored to generate meta-data and to compare them. Since data are numerous and complex, we focused on accessibility and visualization with an original and user-friendly interface. In this chapter we present the main tools of FLAGdb++ and a use-case to explore a gene family: the structural and functional properties of this family and the search for orthologous genes in the other plant genomes.
24
Kristensen DM, Wolf YI, Koonin EV. ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation. Nucleic Acids Res 2016; 45:D210-D218. [PMID: 28053163 PMCID: PMC5210634 DOI: 10.1093/nar/gkw934] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/26/2016] [Revised: 10/05/2016] [Accepted: 10/12/2016] [Indexed: 11/14/2022]
Abstract
The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of ‘index’ orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html.
Affiliation(s)
- David M Kristensen
- Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
25
Standardized benchmarking in the quest for orthologs. Nat Methods 2016; 13:425-30. [PMID: 27043882 PMCID: PMC4827703 DOI: 10.1038/nmeth.3830] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Received: 01/28/2016] [Accepted: 03/09/2016] [Indexed: 11/23/2022]
Abstract
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision–recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
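The precision-recall trade-off mentioned above can be illustrated with the simplest possible setting: predicted ortholog pairs scored against a reference set of true pairs. This is a simplifying assumption for illustration; as the abstract notes, the true evolutionary history is generally unknown, so the benchmarks in the service rely on other proxies that this sketch does not reproduce. Gene IDs are hypothetical.

```python
def normalize(pairs) -> set:
    """Orthology is symmetric, so store (a, b) and (b, a) as one sorted pair."""
    return {tuple(sorted(pair)) for pair in pairs}


def precision_recall(predicted, reference) -> tuple:
    """Precision and recall of predicted ortholog pairs against a reference set."""
    pred, ref = normalize(predicted), normalize(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical gene IDs: 2 of 3 predictions are correct, 2 of 4 reference pairs found.
predicted = [("h1", "m1"), ("m2", "h2"), ("h3", "m4")]
reference = [("h1", "m1"), ("h2", "m2"), ("h3", "m3"), ("h4", "m4")]
p, r = precision_recall(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

A method that predicts more pairs tends to raise recall at the cost of precision, which is why different applications may prefer different inference methods.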
26
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. GENOMICS INSIGHTS 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties to assess orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
27
Cheng Y, Perocchi F. ProtPhylo: identification of protein-phenotype and protein-protein functional associations via phylogenetic profiling. Nucleic Acids Res 2015; 43:W160-8. [PMID: 25956654 PMCID: PMC4489284 DOI: 10.1093/nar/gkv455] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/25/2015] [Accepted: 04/24/2015] [Indexed: 01/17/2023]
Abstract
ProtPhylo is a web-based tool to identify proteins that are functionally linked to either a phenotype or a protein of interest based on co-evolution. ProtPhylo infers functional associations by comparing protein phylogenetic profiles (co-occurrence patterns of orthology relationships) for more than 9.7 million non-redundant protein sequences from all three domains of life. Users can query any of 2048 fully sequenced organisms, including 1678 bacteria, 255 eukaryotes and 115 archaea. In addition, they can tailor ProtPhylo to a particular kind of biological question by choosing among four main orthology inference methods based either on pair-wise sequence comparisons (One-way Best Hits and Best Reciprocal Hits) or clustering of orthologous proteins across multiple species (OrthoMCL and eggNOG). Next, ProtPhylo ranks phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance as a measure of similarity between pairs of phylogenetic profiles. Candidate hits can be easily and flexibly prioritized by complementary clues on subcellular localization, known protein–protein interactions, membrane spanning regions and protein domains. The resulting protein list can be quickly exported into a csv text file for further analyses. ProtPhylo is freely available at http://www.protphylo.org.
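The ranking step described above, Hamming distance between phylogenetic profiles, can be sketched as below. Assumptions: each profile is a 0/1 presence/absence vector over the same fixed list of organisms, the query may be a protein's profile or a phenotype's profile, and ties are broken alphabetically. This is an illustration of the measure, not ProtPhylo's code; the profiles and protein names are hypothetical.

```python
def hamming(profile_a, profile_b) -> int:
    """Number of organisms in which exactly one of the two binary profiles is 1."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same organisms in the same order")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def rank_neighbors(query, profiles: dict) -> list:
    """Sort candidate names by Hamming distance to the query (ties broken by name)."""
    return sorted(profiles, key=lambda name: (hamming(query, profiles[name]), name))


# Hypothetical presence (1) / absence (0) profiles over five organisms.
query = (1, 1, 0, 1, 0)
profiles = {"protC": (0, 0, 1, 0, 1), "protA": (1, 1, 0, 1, 0), "protB": (1, 0, 0, 1, 0)}
print(rank_neighbors(query, profiles))  # ['protA', 'protB', 'protC']
```

The profiles themselves depend on which orthology inference method defines "presence", which is why ProtPhylo lets users choose among several.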
Affiliation(s)
- Yiming Cheng
- Gene Center, Ludwig-Maximilians-University, Munich, Bavaria 81377, Germany; Institute of Human Genetics, Helmholtz Zentrum München, Neuherberg, Bavaria 85764, Germany
- Fabiana Perocchi
- Gene Center, Ludwig-Maximilians-University, Munich, Bavaria 81377, Germany; Institute of Human Genetics, Helmholtz Zentrum München, Neuherberg, Bavaria 85764, Germany
28
Qiu Y, Liu SL, Adams KL. Frequent changes in expression profile and accelerated sequence evolution of duplicated imprinted genes in Arabidopsis. Genome Biol Evol 2015; 6:1830-42. [PMID: 25115008 PMCID: PMC4122942 DOI: 10.1093/gbe/evu144] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022]
Abstract
Eukaryotic genomes have large numbers of duplicated genes that can evolve new functions or expression patterns by changes in coding and regulatory sequences, referred to as neofunctionalization. In flowering plants, some duplicated genes are imprinted in the endosperm, where only one allele is expressed depending on its parental origin. We found that 125 imprinted genes in Arabidopsis arose from gene duplication events during the evolution of the Brassicales. Analyses of 46 gene pairs duplicated by an ancient whole-genome duplication (alpha WGD) indicated that many imprinted genes show an accelerated rate of amino acid changes compared with their paralogs. Analyses of microarray expression data from 63 organ types and developmental stages indicated that many imprinted genes have expression patterns restricted to flowers and/or seeds in contrast to their broadly expressed paralogs. Assays of expression in orthologs from outgroup species revealed that some imprinted genes have acquired an organ-specific expression pattern restricted to flowers and/or seeds. The changes in expression pattern and the accelerated sequence evolution in the imprinted genes suggest that some of them may have undergone neofunctionalization. The imprinted genes MPC, HOMEODOMAIN GLABROUS6 (HDG6), and HDG3 are particularly interesting cases that have different functions from their paralogs. This study indicates that a large number of imprinted genes in Arabidopsis are evolutionarily recent duplicates and that many of them show changes in expression profiles and accelerated sequence evolution. Acquisition of imprinting is a mode of duplicate gene divergence in plants that is more common than previously thought.
Affiliation(s)
- Yichun Qiu
- Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
- Shao-Lun Liu
- Department of Life Science, Tunghai University, Taichung, Taiwan
- Keith L. Adams
- Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
- *Corresponding author: E-mail:
| |
Collapse
|
29
|
Rock, paper, scissors: harnessing complementarity in ortholog detection methods improves comparative genomic inference. G3-GENES GENOMES GENETICS 2015; 5:629-38. [PMID: 25711833 PMCID: PMC4390578 DOI: 10.1534/g3.115.017095] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
Ortholog detection (OD) is a lynchpin of most statistical methods in comparative genomics. This task involves accurately identifying genes across species that descend from a common ancestral sequence. OD methods comprise a wide variety of approaches, each with its own benefits and costs under a variety of evolutionary and practical scenarios. In this article, we examine the proteomes of ten mammals by using four methodologically distinct, rigorously filtered OD methods. In head-to-head comparisons, we find that these algorithms significantly outperform one another for 38–45% of the genes analyzed. We leverage this high complementarity through the development of MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization, the first tool for integrating methodologically diverse OD methods. Relative to the four methods examined, MOSAIC more than quintuples the number of alignments for which all species are present while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, this improvement in alignment quality yields more confidently aligned sites and higher levels of overall conservation, while detecting up to 180% more positively selected sites. We close by highlighting MOSAIC-specific positively selected sites near the active site of TPSAB1, an enzyme linked to asthma, heart disease, and irritable bowel disease. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC.
|
30
|
Sonnhammer ELL, Östlund G. InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic. Nucleic Acids Res 2014; 43:D234-9. [PMID: 25429972 PMCID: PMC4383983 DOI: 10.1093/nar/gku1203] [Citation(s) in RCA: 345] [Impact Index Per Article: 34.5] [Indexed: 11/20/2022]
Abstract
The InParanoid database (http://InParanoid.sbc.su.se) provides a user interface to orthologs inferred by the InParanoid algorithm. As there are now international efforts to curate and standardize complete proteomes, we have switched to using these resources rather than gathering and curating the proteomes ourselves. InParanoid release 8 is based on the 66 reference proteomes that the ‘Quest for Orthologs’ community has agreed on using, plus 207 additional proteomes from the UniProt complete proteomes—in total 273 species. These represent 246 eukaryotes, 20 bacteria and seven archaea. Compared to the previous release, this increases the number of species by 173% and the number of pairwise species comparisons by 650%. In turn, the number of ortholog groups has increased by 423%. We present the contents and usages of InParanoid 8, and a detailed analysis of how the proteome content has changed since the previous release.
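The growth figures in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that a "pairwise species comparison" is an unordered pair of species, n(n - 1)/2 (the abstract does not give the formula), and infers the previous release's species count from the stated 173% increase rather than from the paper itself.

```python
# Cross-check of the InParanoid 8 growth figures quoted in the abstract.
# Assumption (not stated in the abstract): a "pairwise species comparison"
# is an unordered pair of species, so n species give n * (n - 1) / 2 pairs.


def unordered_pairs(n: int) -> int:
    """Number of distinct unordered species pairs among n species."""
    return n * (n - 1) // 2


new_species = 273
# The abstract's breakdown: 246 eukaryotes, 20 bacteria and 7 archaea.
assert 246 + 20 + 7 == new_species

# A 173% increase means new = old * (1 + 1.73); solve for the old count.
old_species = round(new_species / (1 + 1.73))

old_pairs = unordered_pairs(old_species)
new_pairs = unordered_pairs(new_species)
pair_increase_pct = (new_pairs / old_pairs - 1) * 100

print(old_species, old_pairs, new_pairs, round(pair_increase_pct))
# prints: 100 4950 37128 650
```

The derived 650% matches the abstract, which is consistent with (though not proof of) the unordered-pair reading.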
Affiliation(s)
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden
- Gabriel Östlund
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden
|
31
|
Trachana K, Forslund K, Larsson T, Powell S, Doerks T, von Mering C, Bork P. A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation. PLoS One 2014; 9:e111122. [PMID: 25369365 PMCID: PMC4219706 DOI: 10.1371/journal.pone.0111122] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/15/2014] [Accepted: 09/23/2014] [Indexed: 11/19/2022]
Abstract
Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, functional annotation between orthologs serves as a performance proxy. However, this violates the fundamental principle of orthology as an evolutionary definition, and it is often not applicable due to limited experimental evidence for most species. Therefore, we constructed high-quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments, misjudging the true performance of orthology inference methods. We also examined how our dataset can instruct the selection of a “core” species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.
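One common way to quantify errors against a curated reference such as the Reference Orthologous Groups is to expand both the reference and the predicted groups into gene pairs and score their overlap. The sketch below does that; it is an illustrative pair-level precision/recall calculation, not necessarily the error measures used in the paper, and the function names and gene identifiers are hypothetical.

```python
from itertools import combinations


def pairs_from_groups(groups):
    """Expand orthologous groups (sets of gene IDs) into unordered gene pairs."""
    pairs = set()
    for group in groups:
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


def benchmark(predicted_groups, reference_groups):
    """Return (precision, recall) of predicted pairs against reference pairs."""
    predicted = pairs_from_groups(predicted_groups)
    reference = pairs_from_groups(reference_groups)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data: one curated reference group of four genes.
reference = [{"ecoA", "bsuA", "pae1", "sty1"}]
# A prediction that pulls in an unrelated gene (xyzB) and misses sty1.
predicted = [{"ecoA", "bsuA", "pae1", "xyzB"}]
print(benchmark(predicted, reference))  # prints: (0.5, 0.5)
```

Pair-level scoring penalizes both wrongly merged genes (lower precision) and missed members (lower recall), which is the kind of false assignment the abstract says function-based tests tend to overlook.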
Affiliation(s)
- Kalliopi Trachana
- Institute for Systems Biology, Seattle, WA, United States of America
- Kristoffer Forslund
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Tomas Larsson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Sean Powell
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Tobias Doerks
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich and Swiss Institute of Bioinformatics, Zurich, Switzerland
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max-Delbruck-Centre for Molecular Medicine, Berlin, Germany
|
32
|
Hwang S, Kim E, Yang S, Marcotte EM, Lee I. MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network. Nucleic Acids Res 2014; 42:W147-53. [PMID: 24861622 PMCID: PMC4086117 DOI: 10.1093/nar/gku434] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Abstract
Despite recent advances in human genetics, model organisms are indispensable for human disease research. Most human disease pathways are evolutionarily conserved among other species, where they may phenocopy the human condition or be associated with seemingly unrelated phenotypes. Much of the known gene-to-phenotype association information is distributed across diverse databases and is growing rapidly owing to new experimental techniques. Accessible bioinformatics tools will therefore facilitate translation of discoveries from model organisms into human disease biology. Here, we present a web-based discovery tool for human disease studies, MORPHIN (model organisms projected on a human integrated gene network), which prioritizes the most relevant human diseases for a given set of model organism genes, potentially highlighting new model systems for human diseases and providing context to model organism studies. Conceptually, MORPHIN investigates human diseases by an orthology-based projection of a set of model organism genes onto a genome-scale human gene network. MORPHIN then prioritizes human diseases by relevance to the projected model organism genes using two distinct methods: a conventional overlap-based gene set enrichment analysis and a network-based measure of closeness between the query and disease gene sets capable of detecting associations undetectable by the conventional overlap-based methods. MORPHIN is freely accessible at http://www.inetbio.org/morphin.
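A conventional overlap-based gene set enrichment analysis is often implemented as a one-sided hypergeometric test on the size of the overlap between two gene sets. The sketch below assumes that formulation (the abstract does not name the exact statistic MORPHIN uses) and covers only the overlap-based method, not the network-based closeness measure; the function name and the gene counts in the example are hypothetical.

```python
from math import comb


def overlap_enrichment_p(universe: int, disease: int, query: int, overlap: int) -> float:
    """One-sided hypergeometric tail probability P(X >= overlap).

    universe: number of genes in the background (e.g. the human network)
    disease:  number of genes annotated to the disease
    query:    number of projected model-organism genes
    overlap:  number of genes shared by the query and disease sets
    """
    total = comb(universe, query)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(
        comb(disease, k) * comb(universe - disease, query - k)
        for k in range(overlap, min(disease, query) + 1)
    )
    return tail / total


# Hypothetical example: 20 network genes, 5 disease genes, 4 query genes,
# 3 of which are disease genes.
p = overlap_enrichment_p(universe=20, disease=5, query=4, overlap=3)
print(f"{p:.4f}")  # prints: 0.0320
```

Because it only counts shared genes, this test cannot see a query set that sits close to a disease module in the network without overlapping it, which is the gap the abstract says the network-based measure addresses.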
Affiliation(s)
- Sohyun Hwang
- Department of Biotechnology, Yonsei University, Seoul, 120-749, Korea
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, TX 78712, USA
- Eiru Kim
- Department of Biotechnology, Yonsei University, Seoul, 120-749, Korea
- Sunmo Yang
- Department of Biotechnology, Yonsei University, Seoul, 120-749, Korea
- Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, TX 78712, USA
- Insuk Lee
- Department of Biotechnology, Yonsei University, Seoul, 120-749, Korea
|
33
|
Scheider J, Afonso-Grunz F, Hoffmeier K, Horres R, Groher F, Rycak L, Oehlmann J, Winter P. Gene expression of chicken gonads is sex- and side-specific. Sex Dev 2014; 8:178-91. [PMID: 24820130 DOI: 10.1159/000362259] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Accepted: 12/05/2013] [Indexed: 11/19/2022]
Abstract
In chickens, the left and right female gonads follow completely different developmental programs. To learn more about the molecular factors underlying side-specific development and to identify potential sex- and side-specific genes in developing gonads, we separately performed next-generation sequencing-based deepSuperSAGE transcription profiling of left and right, female and male gonads of 19-day-old chicken embryos. A total of 836 transcript variants were significantly differentially expressed (p < 10⁻⁵) between combined male and female gonads. Left-right comparison revealed 1,056 and 822 differentially expressed (p < 10⁻⁵) transcript variants for male and female gonads, respectively, of which 72 are side-specific in both sexes. At least some of these may represent key players in lateral development in birds. Additionally, several genes with laterally differential expression in the ovaries seem to determine female gonads for growth or regression, whereas right-left differences in testes are mostly limited to the differentially expressed genes present in both sexes. With a few exceptions, side-specific genes are not located on the sex chromosomes. The large differences in lateral gene expression in the ovaries in almost all metabolic pathways suggest that the regressing right gonad might have undergone a change of function during evolution.
Affiliation(s)
- Jessica Scheider
- Institute for Ecology, Evolution and Diversity, Goethe University Frankfurt am Main, Frankfurt/M., Germany
|
34
|
Dalquen DA, Dessimoz C. Bidirectional best hits miss many orthologs in duplication-rich clades such as plants and animals. Genome Biol Evol 2014; 5:1800-6. [PMID: 24013106 PMCID: PMC3814191 DOI: 10.1093/gbe/evt132] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Indexed: 11/12/2022]
Abstract
Bidirectional best hits (BBH), which entails identifying the pairs of genes in two different genomes that are more similar to each other than either is to any other gene in the other genome, is a simple and widely used method to infer orthology. A recent study has analyzed the link between BBH and orthology in bacteria and archaea and concluded that, given the very high consistency in BBH they observed among triplets of neighboring genes, a high proportion of BBH are likely to be bona fide orthologs. However, limited by their analysis setup, the previous study could not easily test the reverse question: which proportion of orthologs are BBH? In this follow-up study, we consider this question in theory and answer it based on conceptual arguments, simulated data, and real biological data from all three domains of life. Our analyses corroborate the findings of the previous study, but also show that because of the high rate of gene duplication in plants and animals, as much as 60% of orthologous relations are missed by the BBH criterion.
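The BBH criterion described above, and the way it can miss orthologs after a lineage-specific duplication, can be illustrated in a few lines of Python. This is a minimal sketch: it assumes a precomputed table of similarity scores (for example, BLAST bit scores) between genes of genome A and genome B, breaks ties by keeping the first hit seen, and uses a hypothetical function name, gene names and scores.

```python
def bidirectional_best_hits(scores):
    """Return the set of BBH pairs from a similarity table.

    scores: {(gene_in_A, gene_in_B): similarity}, higher means more similar.
    Ties are broken by keeping the first pair encountered.
    """
    best_in_b = {}  # gene in A -> (score, best-matching gene in B)
    best_in_a = {}  # gene in B -> (score, best-matching gene in A)
    for (a, b), s in scores.items():
        if a not in best_in_b or s > best_in_b[a][0]:
            best_in_b[a] = (s, b)
        if b not in best_in_a or s > best_in_a[b][0]:
            best_in_a[b] = (s, a)
    # Keep a pair only if each gene is the other's best hit.
    return {(a, b) for a, (_, b) in best_in_b.items() if best_in_a[b][1] == a}


# Hypothetical scores: b1 and b1_dup arose by duplication in lineage B after
# speciation, so both are orthologs of a1, but only one can be a1's best hit.
scores = {
    ("a1", "b1"): 95.0,
    ("a1", "b1_dup"): 90.0,
    ("a2", "b2"): 80.0,
    ("a2", "b1"): 20.0,
}
print(sorted(bidirectional_best_hits(scores)))
# prints: [('a1', 'b1'), ('a2', 'b2')]
```

The a1/b1_dup orthology relation is absent from the output: each gene can take part in at most one BBH, so every extra co-ortholog created by duplication is missed, which is why the loss grows with the duplication rate.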
Affiliation(s)
- Daniel A Dalquen
- Computational Biochemistry Research Group, ETH Zurich, Zürich, Switzerland
|
35
|
Zhang W, Kunte K, Kronforst MR. Genome-wide characterization of adaptation and speciation in tiger swallowtail butterflies using de novo transcriptome assemblies. Genome Biol Evol 2013; 5:1233-45. [PMID: 23737327 PMCID: PMC3698933 DOI: 10.1093/gbe/evt090] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 12/21/2022]
Abstract
Hybrid speciation appears to be rare in animals, yet characterizing possible examples can shed light on the genomic consequences of this unique phenomenon, as well as on more general processes such as the role of adaptation in speciation. Here, we first generate transcriptome assemblies for a putative hybrid butterfly species, Papilio appalachiensis, its parental species, P. glaucus and P. canadensis, and an outgroup, P. polytes. Then, we use these data to infer genome-wide patterns of introgression and genomic mosaicism using both phylogenetic and population genetic approaches. Our results reveal that there is little genetic divergence among all three of the focal species, but the subset of gene trees that strongly support a specific tree topology suggest widespread sharing of genetic variation between P. appalachiensis and both parental species, likely as a result of hybrid speciation. We also find evidence for substantial shared genetic variation between P. glaucus and P. canadensis, which may be due to gene flow or ancestral variation. Consistent with previous work, we show that P. appalachiensis is more similar to P. canadensis at Z-linked genes and more similar to P. glaucus at mitochondrial genes. We also identify a variety of targets of adaptive evolution, which appear to be enriched for traits that are likely to be important in the evolution of this butterfly system, such as pigmentation, hormone sensitivity, developmental processes, and cuticle formation. Overall, our results provide a genome-wide portrait of divergence and introgression associated with adaptation and speciation in an iconic butterfly radiation.
Affiliation(s)
- Wei Zhang
- Department of Ecology & Evolution, University of Chicago, USA
|
36
|
Zhang J, Franks RG, Liu X, Kang M, Keebler JEM, Schaff JE, Huang HW, Xiang QY(J). De novo sequencing, characterization, and comparison of inflorescence transcriptomes of Cornus canadensis and C. florida (Cornaceae). PLoS One 2013; 8:e82674. [PMID: 24386108 PMCID: PMC3873919 DOI: 10.1371/journal.pone.0082674] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 08/25/2013] [Accepted: 10/25/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND Transcriptome sequencing analysis is a powerful tool in molecular genetics and evolutionary biology. Here we report the results of de novo 454 sequencing, characterization, and comparison of inflorescence transcriptomes of two closely related dogwood species, Cornus canadensis and C. florida (Cornaceae). Our goals were to build a preliminary source of genome sequence data, and to identify genes potentially expressed differentially between the inflorescence transcriptomes of these important horticultural species. RESULTS The sequencing of cDNAs from inflorescence buds of C. canadensis (cc) and C. florida (cf), and of normalized cDNAs from leaves of C. canadensis, resulted in 251799 (ccBud), 96245 (ccLeaf) and 114648 (cfBud) raw reads, respectively. The de novo assembly of the high quality (HQ) reads resulted in 36088, 17802 and 21210 unigenes for ccBud, ccLeaf and cfBud. A reference transcriptome for C. canadensis, containing 40884 unigenes, was built by assembling the HQ reads of ccBud and ccLeaf. Reference mapping and comparative analyses found that 10926 sequences were putatively specific to ccBud and 6979 putatively specific to cfBud. Putative differentially expressed genes between ccBud and cfBud that are related to flower development and/or stress response were identified among the 7718 sequences shared by ccBud and cfBud. Bi-directional BLAST found that 87 of 208 (41.83%) Arabidopsis genes related to inflorescence development had putative orthologs in the dogwood transcriptomes. Comparisons of the sequences shared by ccBud and cfBud yielded 65931 high-quality SNPs between the two species. The twenty unigenes with the most SNPs are listed as potential genetic markers for evolutionary studies. CONCLUSIONS The data provide an important, although preliminary, information platform for functional genomics and evolutionary developmental biology in Cornus.
The study identified putative candidates potentially involved in the genetic regulation of inflorescence evolution and/or disease resistance in dogwoods for future analyses. Results of the study also provide markers useful for dogwood phylogenomic studies.
Affiliation(s)
- Jian Zhang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, P.R. China
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, North Carolina, United States of America
- Robert G. Franks
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, North Carolina, United States of America
- Xiang Liu
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, North Carolina, United States of America
- Ming Kang
- CAS Key Laboratory of Plant Resource Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, P.R. China
- Jonathan E. M. Keebler
- Bioinformatics Analyst and Consultant, Genomic Sciences Laboratory, North Carolina State University, Raleigh, North Carolina, United States of America
- Jennifer E. Schaff
- Bioinformatics Analyst and Consultant, Genomic Sciences Laboratory, North Carolina State University, Raleigh, North Carolina, United States of America
- Hong-Wen Huang
- CAS Key Laboratory of Plant Resource Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, P.R. China
- Qiu-Yun (Jenny) Xiang
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, North Carolina, United States of America
|
37
|
Immanen J, Nieminen K, Duchens Silva H, Rodríguez Rojas F, Meisel LA, Silva H, Albert VA, Hvidsten TR, Helariutta Y. Characterization of cytokinin signaling and homeostasis gene families in two hardwood tree species: Populus trichocarpa and Prunus persica. BMC Genomics 2013; 14:885. [PMID: 24341635 PMCID: PMC3866579 DOI: 10.1186/1471-2164-14-885] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 07/24/2013] [Accepted: 11/27/2013] [Indexed: 01/01/2023]
Abstract
Background Through the diversity of cytokinin-regulated processes, this phytohormone has a profound impact on plant growth and development. Cytokinin signaling is involved in the control of apical and lateral meristem activity, branching pattern of the shoot, and leaf senescence. These processes influence several traits, including the stem diameter, shoot architecture, and perennial life cycle, which define the development of woody plants. To facilitate research on the role of cytokinin in regulation of woody plant development, we have identified genes associated with cytokinin signaling and homeostasis pathways from two hardwood tree species. Results Taking advantage of the sequenced black cottonwood (Populus trichocarpa) and peach (Prunus persica) genomes, we have compiled a comprehensive list of genes involved in these pathways. We identified genes belonging to the six families of cytokinin oxidases (CKXs), isopentenyl transferases (IPTs), LONELY GUY genes (LOGs), two-component receptors, histidine containing phosphotransmitters (HPts), and response regulators (RRs). Altogether, 85 Populus and 45 Prunus genes were identified and compared to their Arabidopsis orthologs through phylogenetic analyses. Conclusions In general, when compared to Arabidopsis, differences in gene family structure were often seen in only one of the two tree species. However, one class of genes associated with cytokinin signal transduction, the CKI1-like family of two-component histidine kinases, was larger in both Populus and Prunus than in Arabidopsis.
Affiliation(s)
- Ykä Helariutta
- Institute of Biotechnology and Department of Biosciences, University of Helsinki, FI-00014 Helsinki, Finland.
|
38
|
Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, Gabaldón T, Rattei T, Creevey C, Kuhn M, Jensen LJ, von Mering C, Bork P. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res 2013; 42:D231-9. [PMID: 24297252 PMCID: PMC3964997 DOI: 10.1093/nar/gkt1253] [Citation(s) in RCA: 422] [Impact Index Per Article: 38.4] [Indexed: 12/20/2022]
Abstract
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Affiliation(s)
- Sean Powell
- European Molecular Biology Laboratory, Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany, University of Zurich and Swiss Institute of Bioinformatics, Institute of Molecular Life Sciences, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109-5234, USA, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), C/Dr. Aiguader 88, 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, 1090 Vienna, Austria, Institute of Biological, Environmental & Rural Sciences, Aberystwyth University, Penglais, Aberystwyth, Ceredigion, SY23 3FG, UK, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200, Copenhagen N, Denmark and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
|
39
|
Cho YJ, Yi H, Chun J, Cho SN, Daley CL, Koh WJ, Jae Shin S. The genome sequence of 'Mycobacterium massiliense' strain CIP 108297 suggests the independent taxonomic status of the Mycobacterium abscessus complex at the subspecies level. PLoS One 2013; 8:e81560. [PMID: 24312320 PMCID: PMC3842311 DOI: 10.1371/journal.pone.0081560] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 08/08/2013] [Accepted: 10/23/2013] [Indexed: 11/18/2022]
Abstract
Members of the Mycobacterium abscessus complex are rapidly growing mycobacteria that are emerging as human pathogens. The M. abscessus complex was previously composed of three species, namely M. abscessus sensu stricto, 'M. massiliense', and 'M. bolletii'. In 2011, 'M. massiliense' and 'M. bolletii' were united and reclassified as a single subspecies within M. abscessus: M. abscessus subsp. bolletii. However, the placement of 'M. massiliense' within the boundary of M. abscessus subsp. bolletii remains highly controversial with regard to clinical aspects. In this study, we revisited the taxonomic status of members of the M. abscessus complex based on comparative analysis of the whole-genome sequences of 53 strains. The genome sequence of the previous type strain of 'Mycobacterium massiliense' (CIP 108297) was determined using next-generation sequencing. The genome tree based on average nucleotide identity (ANI) values supported the differentiation of 'M. bolletii' and 'M. massiliense' at the subspecies level. The genome tree also clearly illustrated that 'M. bolletii' and 'M. massiliense' form a distinct phylogenetic clade within the radiation of the M. abscessus complex. The genomic distances observed in this study suggest that the current M. abscessus subsp. bolletii taxon should be divided into two subspecies, M. abscessus subsp. massiliense subsp. nov. and M. abscessus subsp. bolletii, to correspondingly accommodate the previously known 'M. massiliense' and 'M. bolletii' strains.
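The genome tree described here is built from pairwise average nucleotide identity (ANI) values. As a rough sketch, ANI can be taken as the mean percent identity of aligned genome fragments that pass an identity cutoff, and a distance for tree building as 100 minus ANI. The function names, the cutoff, the fragment identities and the strain labels below are hypothetical; published ANI pipelines also apply fragment-length and alignment-coverage filters and often compute the value in both directions, none of which is shown.

```python
from statistics import mean


def ani(fragment_identities, min_identity):
    """Mean percent identity over aligned fragments at or above min_identity."""
    kept = [pid for pid in fragment_identities if pid >= min_identity]
    if not kept:
        raise ValueError("no fragment passes the identity cutoff")
    return mean(kept)


def ani_distances(pairwise_ani, strains):
    """Turn symmetric pairwise ANI values (%) into distances of 100 - ANI."""
    dist = {}
    for a in strains:
        for b in strains:
            if a == b:
                dist[(a, b)] = 0.0
            else:
                # Pairwise values are stored once per unordered strain pair.
                key = (a, b) if (a, b) in pairwise_ani else (b, a)
                dist[(a, b)] = 100.0 - pairwise_ani[key]
    return dist


# Hypothetical fragment identities; the 20% fragment falls below the cutoff.
print(ani([99.0, 98.0, 97.0, 20.0], min_identity=30.0))  # prints: 98.0

# Hypothetical pairwise ANI values for three strains.
pairwise = {("s1", "s2"): 98.5, ("s1", "s3"): 96.0, ("s2", "s3"): 96.5}
d = ani_distances(pairwise, ["s1", "s2", "s3"])
print(d[("s3", "s1")])  # prints: 4.0
```

The resulting distance matrix can be passed to any standard distance-based tree method (for example, neighbor joining); which method the authors used is not stated in the abstract.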
Affiliation(s)
- Yong-Joon Cho
- Chunlab, Inc., Seoul National University, Seoul, Korea
- Hana Yi
- Department of Public Health Sciences, Graduate School, Korea University, Seoul, Korea
- Korea University Guro Hospital, Korea University College of Medicine, Seoul, Korea
- Jongsik Chun
- Chunlab, Inc., Seoul National University, Seoul, Korea
- Sang-Nae Cho
- Department of Microbiology and Institute for Immunology and Immunological Diseases, Yonsei University College of Medicine, Seoul, Korea
- Charles L. Daley
- Division of Mycobacterial and Respiratory Infections, Department of Medicine, National Jewish Health, Denver, Colorado, United States of America
- Won-Jung Koh
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Samsung Medical Center, Sungkyunkwan University, School of Medicine, Seoul, Korea
- Sung Jae Shin
- Department of Microbiology and Institute for Immunology and Immunological Diseases, Yonsei University College of Medicine, Seoul, Korea
|
40
|
Eisman RC, Kaufman TC. Probing the boundaries of orthology: the unanticipated rapid evolution of Drosophila centrosomin. Genetics 2013; 194:903-26. [PMID: 23749319 PMCID: PMC3730919 DOI: 10.1534/genetics.113.152546] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/25/2013] [Accepted: 05/28/2013] [Indexed: 11/18/2022]
Abstract
The rapid evolution of essential developmental genes and their protein products is both intriguing and problematic. The rapid evolution of gene products with simple protein folds and a lack of well-characterized functional domains typically results in a low discovery rate of orthologous genes. Additionally, in the absence of orthologs it is difficult to study the processes and mechanisms underlying rapid evolution. In this study, we have investigated the rapid evolution of centrosomin (cnn), an essential gene encoding centrosomal protein isoforms required during syncytial development in Drosophila melanogaster. Until recently the rapid divergence of cnn made identification of orthologs difficult and questionable because Cnn violates many of the assumptions underlying models for protein evolution. To overcome these limitations, we have identified a group of insect orthologs and present conserved features likely to be required for the functions attributed to cnn in D. melanogaster. We also show that the rapid divergence of Cnn isoforms is apparently due to frequent coding sequence indels and an accelerated rate of intronic additions and eliminations. These changes appear to be buffered by multi-exon and multi-reading frame maximum potential ORFs, simple protein folds, and the splicing machinery. These buffering features also occur in other genes in Drosophila and may help prevent potentially deleterious mutations due to indels in genes with large coding exons and exon-dense regions separated by small introns. This work promises to be useful for future investigations of cnn and potentially other rapidly evolving genes and proteins.
Affiliation(s)
- Robert C. Eisman
- Department of Biology, Indiana University, Bloomington, Indiana 47405
- Thomas C. Kaufman
- Department of Biology, Indiana University, Bloomington, Indiana 47405
|
41
|
Wolf YI, Koonin EV. A tight link between orthologs and bidirectional best hits in bacterial and archaeal genomes. Genome Biol Evol 2013; 4:1286-94. [PMID: 23160176 PMCID: PMC3542571 DOI: 10.1093/gbe/evs100] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Indexed: 11/29/2022]
Abstract
Orthologous relationships between genes are routinely inferred from bidirectional best hits (BBH) in pairwise genome comparisons. However, to our knowledge, it has never been quantitatively demonstrated that orthologs form BBH. To test this “BBH-orthology conjecture,” we take advantage of the operon organization of bacterial and archaeal genomes and assume that, when two genes in compared genomes that are flanked by two BBH show statistically significant sequence similarity to one another, these genes are bona fide orthologs. Under this assumption, we tested whether middle genes in “syntenic orthologous gene triplets” form BBH. We found that this was the case in more than 95% of the syntenic gene triplets in all genome comparisons. A detailed examination of the exceptions to this pattern, including maximum likelihood phylogenetic tree analysis, showed that some of these deviations involved artifacts of genome annotation, whereas very small fractions represented random assignment of the best hit to one of several closely related in-paralogs, paralogous displacement in situ, or even less frequent genuine violations of the BBH–orthology conjecture caused by acceleration of evolution in one of the orthologs. We conclude that, at least in prokaryotes, genes for which independent evidence of orthology is available typically form BBH and, conversely, BBH can serve as a strong indication of gene orthology.
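The syntenic-triplet test can be sketched as follows: walk along genome A, and whenever the two genes flanking a middle gene have BBH partners that enclose exactly one gene in genome B, and the two middle genes are significantly similar, count the triplet and check whether the middle genes are themselves a BBH. This is a simplified sketch assuming single linear gene orders, precomputed BBH and similarity sets, and a hypothetical function name and gene names; it does not include the follow-up examination of exceptions described in the abstract.

```python
def syntenic_triplet_test(order_a, order_b, bbh, similar):
    """Count syntenic triplets and how many have BBH middle genes.

    order_a, order_b: gene IDs in chromosomal order for genomes A and B
    bbh: set of (gene_in_A, gene_in_B) bidirectional best hit pairs
    similar: set of (gene_in_A, gene_in_B) pairs with significant similarity
    Returns (triplets tested, triplets whose middle genes form a BBH).
    """
    partner = dict(bbh)  # gene in A -> its BBH partner in B
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    tested = middle_is_bbh = 0
    for i in range(1, len(order_a) - 1):
        left, mid, right = order_a[i - 1], order_a[i], order_a[i + 1]
        if left not in partner or right not in partner:
            continue
        j_left, j_right = pos_b[partner[left]], pos_b[partner[right]]
        # The flanking partners must enclose exactly one gene in B,
        # in either orientation.
        if abs(j_left - j_right) != 2:
            continue
        mid_b = order_b[(j_left + j_right) // 2]
        if (mid, mid_b) not in similar:
            continue
        tested += 1
        if (mid, mid_b) in bbh:
            middle_is_bbh += 1
    return tested, middle_is_bbh


# Hypothetical genomes: a4's BBH is a paralog (b6) outside the syntenic block,
# so the (a3, a4, a5) triplet is an exception to the BBH-orthology conjecture.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5", "b6"]
bbh = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b6"), ("a5", "b5")}
similar = {("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a4", "b6")}
print(syntenic_triplet_test(order_a, order_b, bbh, similar))  # prints: (2, 1)
```

The ratio of the two counts corresponds to the "more than 95%" figure reported above; in this toy example it is 1 of 2.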
42
Kim E, Kim H, Lee I. JiffyNet: a web-based instant protein network modeler for newly sequenced species. Nucleic Acids Res 2013; 41:W192-7. [PMID: 23685435 PMCID: PMC3692116 DOI: 10.1093/nar/gkt419] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]
Abstract
Revolutionary DNA sequencing technology has enabled affordable genome sequencing for numerous species. Thousands of species already have completely decoded genomes, and tens of thousands more are in progress. A parallel expansion of the library of functional parts lists is naturally anticipated, yet a genome-level understanding of function also requires maps of functional relationships, such as functional protein networks. Such networks have been constructed for many sequenced species, including common model organisms. Nevertheless, the majority of species with sequenced genomes still have no protein network models available. Moreover, biologists may want to obtain protein networks for their species of interest as soon as the genome projects are completed. There is therefore high demand for accessible means to automatically construct genome-scale protein networks from the sequence information of genome projects alone. Here, we present a public web server, JiffyNet, specifically designed to instantly construct genome-scale protein networks based on associalogs (functional associations transferred from a template network by orthology) for a query species for which only protein sequences are provided. Assessment of the networks built by JiffyNet demonstrated generally high predictive ability for pathway annotations. Furthermore, JiffyNet provides network visualization and analysis pages for a wide variety of molecular concepts to facilitate network-guided hypothesis generation. JiffyNet is freely accessible at http://www.jiffynet.org.
Affiliation(s)
- Eiru Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, 120-749, Korea
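The associalog idea in the JiffyNet abstract, transferring functional links from a template network to a query species through orthology, can be illustrated with a short Python sketch. It is hypothetical: JiffyNet's own orthology assignment and scoring are not described in the abstract, so the sketch simply assumes a precomputed mapping from template genes to query orthologs and keeps the highest template weight when several template links land on the same query pair.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def transfer_associations(template: Dict[Edge, float],
                          orthologs: Dict[str, List[str]]) -> Dict[Edge, float]:
    """Project weighted template-network links onto a query species.

    template:  (gene_u, gene_v) -> association weight in the template species
    orthologs: template gene -> list of its orthologs in the query species
    """
    query: Dict[Edge, float] = {}
    for (u, v), weight in template.items():
        # Every combination of orthologs of u and v inherits the template link.
        for qu, qv in product(orthologs.get(u, []), orthologs.get(v, [])):
            if qu == qv:
                continue  # both ends map to the same query gene: no self-links
            key = (qu, qv) if qu < qv else (qv, qu)  # store as an undirected edge
            query[key] = max(weight, query.get(key, weight))
    return query


# Hypothetical template network and ortholog mapping; D has no query ortholog.
template = {("A", "B"): 0.9, ("B", "C"): 0.7, ("C", "D"): 0.5}
orthologs = {"A": ["q1"], "B": ["q2", "q3"], "C": ["q4"]}
network = transfer_associations(template, orthologs)
print(network)
# → {('q1', 'q2'): 0.9, ('q1', 'q3'): 0.9, ('q2', 'q4'): 0.7, ('q3', 'q4'): 0.7}
```

Links whose template genes lack query orthologs (here C-D) are simply not transferred, which is why coverage of such a network depends on how many query genes receive orthology assignments.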
43
Vollmers J, Voget S, Dietrich S, Gollnow K, Smits M, Meyer K, Brinkhoff T, Simon M, Daniel R. Poles apart: Arctic and Antarctic Octadecabacter strains share high genome plasticity and a new type of xanthorhodopsin. PLoS One 2013; 8:e63422. [PMID: 23671678 PMCID: PMC3646047 DOI: 10.1371/journal.pone.0063422] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Received: 02/07/2013] [Accepted: 04/03/2013] [Indexed: 12/11/2022]
Abstract
The genus Octadecabacter is a member of the ubiquitous marine Roseobacter clade. The two described species of this genus, Octadecabacter arcticus and Octadecabacter antarcticus, are psychrophilic and display a bipolar distribution. Here we provide the manually annotated and finished genome sequences of the type strains O. arcticus 238 and O. antarcticus 307, isolated from sea ice of the Arctic and Antarctic, respectively. Both genomes exhibit a high genome plasticity caused by an unusually high density and diversity of transposable elements. This could explain the discrepancy between the low genome synteny and the high 16S rRNA gene sequence similarity of the two strains. Numerous characteristic features were identified in the Octadecabacter genomes, which show indications of horizontal gene transfer and may represent specific adaptations to the habitats of the strains. These include a gene cluster encoding the synthesis and degradation of cyanophycin in O. arcticus 238, which is absent in O. antarcticus 307 and unique among the Roseobacter clade. Furthermore, genes representing a new subgroup of xanthorhodopsins as an adaptation to icy environments are present in both Octadecabacter strains. This new xanthorhodopsin subgroup differs from the previously characterized xanthorhodopsins of Salinibacter ruber and Gloeobacter violaceus in phylogeny, biogeography and the potential to bind 4-keto-carotenoids. Biochemical characterization of the Octadecabacter xanthorhodopsins revealed that they function as light-driven proton pumps.
Affiliation(s)
- John Vollmers
- Department of Genomic and Applied Microbiology and Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August University of Göttingen, Göttingen, Germany
- Sonja Voget
- Department of Genomic and Applied Microbiology and Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August University of Göttingen, Göttingen, Germany
- Sascha Dietrich
- Department of Genomic and Applied Microbiology and Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August University of Göttingen, Göttingen, Germany
- Kathleen Gollnow
- Department of Genomic and Applied Microbiology and Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August University of Göttingen, Göttingen, Germany
- Maike Smits
- Department of Genomic and Applied Microbiology and Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August University of Göttingen, Göttingen, Germany
- Katja Meyer
- Department of Genomic and Applied Microbiology and Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August University of Göttingen, Göttingen, Germany
- Thorsten Brinkhoff
- Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, Oldenburg, Germany
- Meinhard Simon
- Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, Oldenburg, Germany
- Rolf Daniel
- Department of Genomic and Applied Microbiology and Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August University of Göttingen, Göttingen, Germany
44
Zorzan S, Lorenzetto E, Ettorre M, Pontelli V, Laudanna C, Buffelli M. HOMECAT: consensus homologs mapping for interspecific knowledge transfer and functional genomic data integration. Bioinformatics 2013; 29:1574-6. [PMID: 23620364 DOI: 10.1093/bioinformatics/btt189] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
MOTIVATION: Comparative studies are encouraged by the rapid increase in data availability from the latest high-throughput techniques, in particular from functional genomic studies. Yet the size of the datasets, the difficulty of finding complete sets of orthologs and, not least, the variety of identifier formats make information integration challenging. With HOMECAT, we aim to facilitate cross-species relationship identification and data mapping by combining, in a Cytoscape plug-in, orthology predictions from several publicly available sources with a convenient interface for high-throughput data download and automatic identifier conversion; the plug-in provides both integration with a large set of bioinformatics tools and a user-friendly interface. AVAILABILITY: HOMECAT and the Supplementary Materials are freely available at http://www.cbmc.it/homecat/.
Affiliation(s)
- Simone Zorzan
- Department of Neurological, Neuropsychological, Morphological and Motor Sciences, Section of Physiology, University of Verona, Strada le Grazie 8, 37134, Verona, Italy.
45
Chung WC, Chen LL, Lo WS, Lin CP, Kuo CH. Comparative analysis of the peanut witches'-broom phytoplasma genome reveals horizontal transfer of potential mobile units and effectors. PLoS One 2013; 8:e62770. [PMID: 23626855 PMCID: PMC3633829 DOI: 10.1371/journal.pone.0062770] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Received: 11/14/2012] [Accepted: 03/25/2013] [Indexed: 11/18/2022]
Abstract
Phytoplasmas are a group of bacteria associated with hundreds of plant diseases. Owing to their economic importance and the difficulties involved in the experimental study of these obligate pathogens, genome sequencing and comparative analysis have been used as powerful tools to understand phytoplasma biology. To date, four complete phytoplasma genome sequences have been published. However, these four strains represent limited phylogenetic diversity. In this study, we report the shotgun sequencing and evolutionary analysis of a peanut witches'-broom (PnWB) phytoplasma genome. This genome provides the first representative of the 16SrII group and substantially improves the taxon sampling available to investigate genome evolution. The draft genome assembly contains 13 chromosomal contigs with a total size of 562,473 bp, covering ∼90% of the chromosome. Additionally, a complete plasmid sequence is included. Comparisons among the five available phytoplasma genomes reveal differences in gene content and metabolic capacity. Notably, phylogenetic inferences of the potential mobile units (PMUs) in these genomes indicate that horizontal transfer may have occurred between divergent phytoplasma lineages. Because many effectors are associated with PMUs, the horizontal transfer of these transposon-like elements can contribute to the adaptation and diversification of these pathogens. In summary, the findings from this study highlight the importance of improving taxon sampling when investigating genome evolution. Moreover, the currently available sequences are inadequate to fully characterize the pan-genome of phytoplasmas. Future genome sequencing efforts to expand phylogenetic diversity are essential for improving our understanding of phytoplasma evolution.
Affiliation(s)
- Wan-Chia Chung
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Ling-Ling Chen
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Wen-Sui Lo
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, National Chung Hsing University and Academia Sinica, Taipei, Taiwan
- Graduate Institute of Biotechnology, National Chung Hsing University, Taichung, Taiwan
- Chan-Pin Lin
- Department of Plant Pathology and Microbiology, National Taiwan University, Taipei, Taiwan
- Chih-Horng Kuo
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, National Chung Hsing University and Academia Sinica, Taipei, Taiwan
- Graduate Institute of Biotechnology, National Chung Hsing University, Taichung, Taiwan
46
Abstract
Orthologues and paralogues are types of homologous genes that are related by speciation or duplication, respectively. Orthologous genes are generally assumed to retain equivalent functions in different organisms and to share other key properties. Several recent comparative genomic studies have focused on testing these expectations. Here we discuss the complexity of the evolution of gene-phenotype relationships and assess the validity of the key implications of orthology and paralogy relationships as general statistical trends and guiding principles.
47
Rawal HC, Singh NK, Sharma TR. Conservation, Divergence, and Genome-Wide Distribution of PAL and POX A Gene Families in Plants. Int J Genomics 2013; 2013:678969. [PMID: 23671845 PMCID: PMC3647544 DOI: 10.1155/2013/678969] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 09/24/2012] [Revised: 01/04/2013] [Accepted: 01/11/2013] [Indexed: 01/03/2023]
Abstract
Genome-wide identification and phylogenetic and syntenic comparisons were performed for the genes encoding phenylalanine ammonia lyase (PAL) and peroxidase A (POX A) enzymes in nine plant species representing very diverse groups: legumes (Glycine max and Medicago truncatula), fruits (Vitis vinifera), cereals (Sorghum bicolor, Zea mays, and Oryza sativa), trees (Populus trichocarpa), and model dicot (Arabidopsis thaliana) and monocot (Brachypodium distachyon) species. A total of 87 and 1045 genes in the PAL and POX A gene families, respectively, were identified in these species. The phylogenetic and syntenic comparisons, along with motif distributions, show a high degree of conservation of PAL genes, suggesting that these genes may predate the monocot/eudicot divergence. The POX A family genes, present in clusters at the subtelomeric regions of chromosomes, might be evolving and expanding at a higher rate than the PAL gene family. Our analysis showed that during the expansion of the POX A gene family, many groups and subgroups have evolved, resulting in a high level of functional divergence among monocots and dicots. These results will serve as a first step toward understanding monocot/eudicot evolution and toward the functional characterization of these gene families.
Affiliation(s)
- T. R. Sharma
- Genoinformatics Laboratory, National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, Pusa Campus, New Delhi 110 012, India
48
Dalquen DA, Altenhoff AM, Gonnet GH, Dessimoz C. The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study. PLoS One 2013; 8:e56925. [PMID: 23451112 PMCID: PMC3581572 DOI: 10.1371/journal.pone.0056925] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 08/29/2012] [Accepted: 01/16/2013] [Indexed: 11/19/2022]
Abstract
The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data have not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation from one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit with different trade-offs between precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
Affiliation(s)
- Daniel A. Dalquen
- Eidgenössische Technische Hochschule Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Zürich, Switzerland
- Adrian M. Altenhoff
- Eidgenössische Technische Hochschule Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Zürich, Switzerland
- Gaston H. Gonnet
- Eidgenössische Technische Hochschule Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Zürich, Switzerland
- Christophe Dessimoz
- Eidgenössische Technische Hochschule Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Zürich, Switzerland
- European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
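Because every orthology relationship in simulated data is known, the predicted ortholog pairs of each method in the Dalquen et al. benchmark can be scored directly against the truth. The minimal Python sketch below assumes orthology is represented as unordered pairs of (hypothetical) gene identifiers and computes pair-level precision and recall; the study's own evaluation may define these measures differently.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def as_pairs(pairs: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Orthology is symmetric, so (a, b) and (b, a) count as the same pair."""
    return {frozenset(p) for p in pairs}


def precision_recall(predicted: Iterable[Tuple[str, str]],
                     truth: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Pair-level precision and recall of predicted orthologs against known truth."""
    pred, true = as_pairs(predicted), as_pairs(truth)
    tp = len(pred & true)  # correctly predicted ortholog pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall


# Hypothetical simulated truth and one method's predictions.
truth = [("h1", "m1"), ("h2", "m2"), ("h3", "m3"), ("h4", "m4")]
predicted = [("h1", "m1"), ("m2", "h2"), ("h3", "m4")]
p, r = precision_recall(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # → precision=0.67 recall=0.50
```

Reporting both numbers matters because, as the abstract notes, methods can handle the same evolutionary scenario with different trade-offs between them.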
49
Whiteside MD, Winsor GL, Laird MR, Brinkman FSL. OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis. Nucleic Acids Res 2012. [PMID: 23203876 PMCID: PMC3531125 DOI: 10.1093/nar/gks1241] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Indexed: 12/18/2022]
Abstract
Prediction of orthologs (homologous genes that diverged because of speciation) is an integral component of many comparative genomics methods. Although orthologs are more likely than paralogs (genes that diverged because of duplication) to have similar functions, recent studies have shown that their degree of functional conservation is variable. In addition, several large-scale ortholog prediction approaches have inherent problems. To address these issues, we previously developed Ortholuge, which uses phylogenetic distance ratios to provide more precise ortholog assessments for a set of predicted orthologs. However, the original version of Ortholuge required manual intervention and was not easily accessible; we therefore now report the development of OrtholugeDB, available online at http://www.pathogenomics.sfu.ca/ortholugedb. OrtholugeDB provides ortholog predictions for completely sequenced bacterial and archaeal genomes from NCBI based on reciprocal best BLAST (Basic Local Alignment Search Tool) hits, supplemented with further evaluation by the more precise Ortholuge method. The OrtholugeDB web interface facilitates user-friendly and flexible ortholog analysis, from single genes to genomes, plus flexible data download options. We compare Ortholuge with similar methods, showing how it may more consistently identify orthologs with conserved features across a wide range of taxonomic distances. OrtholugeDB facilitates rapid and more accurate bacterial and archaeal comparative genomic analysis and large-scale ortholog predictions.
Affiliation(s)
- Matthew D Whiteside
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
50
Tine M, Kuhl H, Jastroch M, Reinhardt R. Genomic characterization of the European sea bass Dicentrarchus labrax reveals the presence of a novel uncoupling protein (UCP) gene family member in the teleost fish lineage. BMC Evol Biol 2012; 12:62. [PMID: 22577775 PMCID: PMC3428666 DOI: 10.1186/1471-2148-12-62] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 10/08/2011] [Accepted: 05/11/2012] [Indexed: 01/12/2023]
Abstract
Background: Uncoupling proteins (UCPs) are evolutionarily conserved mitochondrial carriers that control energy metabolism and therefore play important roles in several physiological processes, such as thermogenesis, regulation of reactive oxygen species (ROS), growth control, lipid metabolism and regulation of insulin secretion. Despite their importance in various physiological processes, their molecular function remains controversial. Their evolution and phylogenetic distribution may help to identify their general biological function and structure-function relationships. The exact number of uncoupling protein genes in fish genomes and their evolution remain unresolved. Results: Here we report the first characterisation of UCP gene family members in the sea bass, Dicentrarchus labrax, and then retrace the evolution of the protein family in vertebrates. Four UCP genes that are shared by five other fish species were identified in the sea bass genome. Phylogenetic reconstruction among vertebrate species and synteny analysis revealed that UCP1, UCP2 and UCP3 evolved from duplication events that occurred in the common ancestor of vertebrates, whereas the novel fourth UCP originated specifically in the teleost lineage. Functional divergence analysis among teleost species revealed specific amino acid positions that have been subjected to altered functional constraints after duplications. Conclusions: This work provides the first unambiguous evidence for the presence of a fourth UCP gene in teleost fish genomes and brings new insights into the evolutionary history of the gene family. Our results suggest functional divergence among paralogues, which might result from long-term and differential selective pressures, and therefore indicate that UCP genes may have diverse physiological functions in teleost fishes. Further experimental analysis of the critical amino acids identified here may provide valuable information on the physiological functions of UCP genes.
Affiliation(s)
- Mbaye Tine
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195, Berlin, Germany.