1
Sabat AJ, Durfee T, Baldwin S, Akkerboom V, Voss A, Friedrich AW, Bathoorn E. The complete genome sequence of unculturable Mycoplasma faucium obtained through clinical metagenomic next-generation sequencing. Front Cell Infect Microbiol 2024; 14:1368923. [PMID: 38694516 PMCID: PMC11062135 DOI: 10.3389/fcimb.2024.1368923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 03/25/2024] [Indexed: 05/04/2024] Open
Abstract
Introduction: Diagnosing Mycoplasma faucium poses challenges, and it is unclear whether its rare isolation is due to infrequent occurrence or to its fastidious nutritional requirements. Methods: This study analyzes the complete genome sequence of M. faucium, obtained directly from the pus of a sternum infection in a lung transplant patient using metagenomic sequencing. Results: Genome analysis revealed limited therapeutic options for the M. faucium infection, primarily susceptibility to tetracyclines. Three classes of mobile genetic elements were identified: two new insertion sequences, a new prophage (phiUMCG-1), and a species-specific variant of a mycoplasma integrative and conjugative element (MICE). Additionally, a Type I Restriction-Modification system was identified, featuring 5'-terminally truncated hsdS pseudogenes with overlapping repeats, indicating the potential for forming alternative hsdS variants through recombination. Conclusion: This study represents the first-ever acquisition of a complete circularized bacterial genome directly from a patient sample obtained from invasive infection of a primary sterile site using culture-independent, PCR-free clinical metagenomics.
Affiliation(s)
- Artur J. Sabat
  - Department of Medical Microbiology and Infection Prevention, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Tim Durfee
  - DNASTAR, Inc., Madison, WI, United States
- Viktoria Akkerboom
  - Department of Medical Microbiology and Infection Prevention, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Andreas Voss
  - Department of Medical Microbiology and Infection Prevention, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Erik Bathoorn
  - Department of Medical Microbiology and Infection Prevention, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
2
Persson E, Sonnhammer ELL. InParanoiDB 9: Ortholog Groups for Protein Domains and Full-Length Proteins. J Mol Biol 2023:168001. [PMID: 36764355 DOI: 10.1016/j.jmb.2023.168001] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/30/2022] [Revised: 01/20/2023] [Accepted: 02/01/2023] [Indexed: 02/11/2023]
Abstract
Prediction of orthologs is an important bioinformatics pursuit that is frequently used for inferring protein function and evolutionary analyses. The InParanoid database is a well known resource of ortholog predictions between a wide variety of organisms. Although orthologs have historically been inferred at the level of full-length protein sequences, many proteins consist of several independent protein domains that may be orthologous to domains in other proteins in a way that differs from the full-length protein case. To be able to capture all types of orthologous relations, conventional full-length protein orthologs can be complemented with orthologs inferred at the domain level. We here present InParanoiDB 9, covering 640 species and providing orthologs for both protein domains and full-length proteins. InParanoiDB 9 was built using the faster InParanoid-DIAMOND algorithm for orthology analysis, as well as Domainoid and Pfam to infer orthologous domains. InParanoiDB 9 is based on proteomes from 447 eukaryotes, 158 bacteria and 35 archaea, and includes over one billion predicted ortholog groups. A new website has been built for the database, providing multiple search options as well as visualization of groups of orthologs and orthologous domains. This release constitutes a major upgrade of the InParanoid database in terms of the number of species as well as the new capability to operate on the domain level. InParanoiDB 9 is available at https://inparanoidb.sbc.su.se/.
Affiliation(s)
- Emma Persson
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden. https://twitter.com/eriksonnhammer
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden.
3
Hu W, Yang X, Wang L, Zhu X. MADGAN: A microbe-disease association prediction model based on generative adversarial networks. Front Microbiol 2023; 14:1159076. [PMID: 37032881 PMCID: PMC10076708 DOI: 10.3389/fmicb.2023.1159076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 03/02/2023] [Indexed: 04/11/2023] Open
Abstract
Research has demonstrated that microorganisms are indispensable for nutrient transport, growth and development in the human body, and that disorder and imbalance of the microbiota may lead to disease. Therefore, it is crucial to study the relationships between microbes and diseases. In this manuscript, we propose a novel prediction model named MADGAN to infer potential microbe-disease associations by combining biological information on microbes and diseases with generative adversarial networks. To our knowledge, this is the first attempt to use a generative adversarial network for this task. In MADGAN, we first constructed different features for microbes and diseases based on multiple similarity metrics. We then adopted a graph convolutional neural network (GCN) to derive different features for microbes and diseases automatically. Finally, we trained MADGAN to identify latent microbe-disease associations through games between the generation network and the decision network. In particular, to prevent over-smoothing during model training, we introduced a cross-level weight distribution structure that increases the depth of the network, based on the idea of the residual network. Moreover, to validate the performance of MADGAN, we conducted comprehensive experiments and case studies based on the HMDAD and Disbiome databases, and the experimental results demonstrated that MADGAN not only achieved satisfactory prediction performance but also outperformed existing state-of-the-art prediction models.
Affiliation(s)
- Weixin Hu
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Xiaoyu Yang
  - Institute of Bioinformatics Complex Network Big Data, Changsha University, Changsha, China
- Lei Wang
  - Institute of Bioinformatics Complex Network Big Data, Changsha University, Changsha, China
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
  - Correspondence: Lei Wang
- Xianyou Zhu
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
  - Correspondence: Xianyou Zhu
4
Zhang Y, Zhang H, Zhang Z, Qian Q, Zhang Z, Xiao J. ProPan: a comprehensive database for profiling prokaryotic pan-genome dynamics. Nucleic Acids Res 2022; 51:D767-D776. [PMID: 36169225 PMCID: PMC9825599 DOI: 10.1093/nar/gkac832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/09/2022] [Accepted: 09/16/2022] [Indexed: 01/30/2023] Open
Abstract
Compared with conventional comparative genomics, the recent studies in pan-genomics have provided further insights into species genomic dynamics, taxonomy and identification, pathogenicity and environmental adaptation. To better understand genome characteristics of species of interest and to fully excavate key metabolic and resistant genes and their conservations and variations, here we present ProPan (https://ngdc.cncb.ac.cn/propan), a public database covering 23 archaeal species and 1,481 bacterial species (in a total of 51,882 strains) for comprehensively profiling prokaryotic pan-genome dynamics. By analyzing and integrating these massive datasets, ProPan offers three major aspects for the pan-genome dynamics of the species of interest: 1) the evaluations of various species' characteristics and composition in pan-genome dynamics; 2) the visualization of map association, the functional annotation and presence/absence variation for all contained species' gene clusters; 3) the typical characteristics of the environmental adaptation, including resistance genes prediction of 126 substances (biocide, antimicrobial drug and metal) and evaluation of 31 metabolic cycle processes. Besides, ProPan develops a very user-friendly interface, flexible retrieval and multi-level real-time statistical visualization. Taken together, ProPan will serve as a weighty resource for the studies of prokaryotic pan-genome dynamics, taxonomy and identification as well as environmental adaptation.
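As a rough illustration of what "presence/absence variation" of gene clusters means in pan-genome analysis, the sketch below partitions clusters by how many strains carry them. It is not ProPan's pipeline: the function name `partition_pan_genome`, the core/accessory/unique thresholds, and the toy strain and cluster names are all assumptions made for this example.

```python
def partition_pan_genome(presence):
    """Split gene clusters into core, accessory and unique sets.

    `presence` maps a gene-cluster name to the set of strains that carry it.
    Assumed definitions (not taken from ProPan): core = present in every
    strain, unique = present in exactly one strain, accessory = the rest.
    """
    # Collect every strain that appears anywhere in the table.
    strains = set()
    for members in presence.values():
        strains |= members
    n_strains = len(strains)

    core, accessory, unique = [], [], []
    for cluster, members in sorted(presence.items()):
        if len(members) == n_strains:
            core.append(cluster)
        elif len(members) == 1:
            unique.append(cluster)
        else:
            accessory.append(cluster)
    return core, accessory, unique


# Hypothetical toy data: three strains, four gene clusters.
toy_presence = {
    "clusterA": {"strain1", "strain2", "strain3"},
    "clusterB": {"strain1", "strain2"},
    "clusterC": {"strain3"},
    "clusterD": {"strain1", "strain2", "strain3"},
}
core, accessory, unique = partition_pan_genome(toy_presence)
```

Real pan-genome tools often relax the core definition (for example, presence in most rather than all strains), so the thresholds here should be treated as placeholders.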
Affiliation(s)
- Zaichao Zhang
  - Department of Biology, The University of Western Ontario, London, Ontario N6A 5B7, Canada
- Qiheng Qian
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhewen Zhang
  - Correspondence may also be addressed to Zhewen Zhang.
- Jingfa Xiao
  - To whom correspondence should be addressed. Tel: +86 10 8409 7443; Fax: +86 10 8409 7720
5
Keith MF, Gopalakrishna KP, Bhavana VH, Hillebrand GH, Elder JL, Megli CJ, Sadovsky Y, Hooven TA. Nitric Oxide Production and Effects in Group B Streptococcus Chorioamnionitis. Pathogens 2022; 11:1115. [PMID: 36297171 PMCID: PMC9608865 DOI: 10.3390/pathogens11101115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Revised: 09/21/2022] [Accepted: 09/23/2022] [Indexed: 11/16/2022] Open
Abstract
Intrauterine infection, or chorioamnionitis, due to group B Streptococcus (GBS) is a common cause of miscarriage and preterm birth. To cause chorioamnionitis, GBS must bypass maternal-fetal innate immune defenses including nitric oxide (NO), a microbicidal gas produced by nitric oxide synthases (NOS). This study examined placental NO production and its role in host-pathogen interactions in GBS chorioamnionitis. In a murine model of ascending GBS chorioamnionitis, placental NOS isoform expression quantified by RT-qPCR revealed a four-fold expression increase in inducible NOS, no significant change in expression of endothelial NOS, and decreased expression of neuronal NOS. These NOS expression results were recapitulated ex vivo in freshly collected human placental samples that were co-incubated with GBS. Immunohistochemistry of wild type C57BL/6 murine placentas with GBS chorioamnionitis demonstrated diffuse inducible NOS expression with high-expression foci in the junctional zone and areas of abscess. Pregnancy outcomes between wild type and inducible NOS-deficient mice did not differ significantly although wild type dams had a trend toward more frequent preterm delivery. We also identified possible molecular mechanisms that GBS uses to survive in a NO-rich environment. In vitro exposure of GBS to NO resulted in dose-dependent growth inhibition that varied by serovar. RNA-seq on two GBS strains with distinct NO resistance phenotypes revealed that both GBS strains shared several detoxification pathways that were differentially expressed during NO exposure. These results demonstrate that the placental immune response to GBS chorioamnionitis includes induced NO production and indicate that GBS activates conserved stress pathways in response to NO exposure.
Affiliation(s)
- Mary Frances Keith
  - Department of Pediatrics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Gideon Hayden Hillebrand
  - Department of Pediatrics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15224, USA
- Jordan Lynn Elder
  - Manual Hematology and Coagulation Department, The Cleveland Clinic, Cleveland, OH 44195, USA
- Christina Joann Megli
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
  - UPMC Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
- Yoel Sadovsky
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
  - UPMC Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
- Thomas Alexander Hooven
  - Department of Pediatrics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15224, USA
  - UPMC Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
  - UPMC Children’s Hospital of Pittsburgh Richard King Mellon Institute for Pediatric Research, Pittsburgh, PA 15224, USA
  - UPMC Children’s Hospital of Pittsburgh, 4401 Penn Ave. Rangos Research Building #8128, Pittsburgh, PA 15224, USA
6
Wang Y, Lei X, Lu C, Pan Y. Predicting Microbe-Disease Association Based on Multiple Similarities and LINE Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2399-2408. [PMID: 34014827 DOI: 10.1109/tcbb.2021.3082183] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
Numerous microbes have been found to have vital impacts on human health by affecting biological processes. Therefore, exploring potential associations between microbes and diseases will promote the understanding and diagnosis of diseases. In this study, we present a novel computational model, named MSLINE, to infer potential microbe-disease associations by integrating Multiple Similarities and Large-scale Information Network Embedding (LINE) based on known associations. Specifically, starting from the known microbe-disease associations in the Human Microbe-Disease Association Database, we first expand the set of known associations by collecting proven associations from the existing literature. We then construct a microbe-disease heterogeneous network (MDHN) by integrating known associations and multiple similarities (including Gaussian interaction profile kernel similarity, microbe function similarity, disease semantic similarity and disease-symptom similarity). After that, we apply a random walk and the LINE algorithm to the MDHN to learn its structural information. Finally, we score the microbe-disease associations according to the structural information of every node. In leave-one-out cross validation and 5-fold cross validation, MSLINE performs better than other existing methods. Moreover, case studies of different diseases showed that MSLINE could predict potential microbe-disease associations efficiently.
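The abstract names Gaussian interaction profile (GIP) kernel similarity without stating a formula. The sketch below uses the commonly cited formulation, K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2) with gamma equal to a bandwidth parameter divided by the mean squared norm of the interaction profiles. Whether MSLINE uses exactly this normalization is an assumption; the function name `gip_kernel`, the parameter `gamma_prime`, and the toy association matrix are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows.

    Each row of `profiles` is a binary interaction profile, e.g. one
    microbe's known associations across all diseases. The bandwidth
    normalization below is an assumed, commonly used choice.
    """
    n = len(profiles)
    # Squared Euclidean norm of each interaction profile.
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


# Made-up association matrix: rows are microbes, columns are diseases.
associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
microbe_sim = gip_kernel(associations)
# Disease similarity comes from the transposed matrix (column profiles).
disease_sim = gip_kernel([list(col) for col in zip(*associations)])
```

In this toy matrix the first two microbes share a disease, so their similarity is higher than that between the first and third, which share none.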
7
Malla MA, Dubey A, Raj A, Kumar A, Upadhyay N, Yadav S. Emerging frontiers in microbe-mediated pesticide remediation: Unveiling role of omics and In silico approaches in engineered environment. Environmental Pollution (Barking, Essex: 1987) 2022; 299:118851. [PMID: 35085655 DOI: 10.1016/j.envpol.2022.118851] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 09/09/2021] [Revised: 01/09/2022] [Accepted: 01/11/2022] [Indexed: 06/14/2023]
Abstract
The overuse of pesticides for augmenting agriculture productivity always comes at the cost of environment, biodiversity, and human health and has put the land, water, and environmental footprints under severe threat throughout the globe. Underpinning and maximizing the microbiome functions in pesticide-contaminated environments has become a prerequisite for a sustainable environment and resilient agriculture. It is imperative to elucidate the metabolic network of the microbial communities and environmental variables at the contaminated site to predict the best strategy for remediation and soil microbe-pesticide interactions. High throughput next-generation sequencing and in silico analysis allow us to identify and discern the members and characteristics of core microbiomes at the contaminated site. Integration of modern high throughput multi-omics investigations and informatics pipelines provide novel approaches and pathways to capitalize on the core microbiomes for enhancing environmental functioning and mitigation. The role of eco-genomics tools in visualising the microbial network, taxonomy, functional potential, and environmental variables in contaminated habitats is discussed in this review. The integrated role of the potential microbe identification as individual or consortia, mechanistic approach for pesticide degradation, identification of responsible enzymes/genes, and in silico approach is emphasized for the prospects of the area.
Affiliation(s)
- Muneer Ahmad Malla
  - Department of Zoology, Dr. Harisingh Gour University (Central University), Sagar, 470003, MP, India; Metagenomics and Secretomics Research Laboratory, Department of Botany, Dr. Harisingh Gour University (Central University), Sagar, 470003, MP, India
- Anamika Dubey
  - Metagenomics and Secretomics Research Laboratory, Department of Botany, Dr. Harisingh Gour University (Central University), Sagar, 470003, MP, India
- Aman Raj
  - Metagenomics and Secretomics Research Laboratory, Department of Botany, Dr. Harisingh Gour University (Central University), Sagar, 470003, MP, India
- Ashwani Kumar
  - Metagenomics and Secretomics Research Laboratory, Department of Botany, Dr. Harisingh Gour University (Central University), Sagar, 470003, MP, India.
- Niraj Upadhyay
  - Department of Chemistry, Dr. Harisingh Gour University (Central University), Sagar, 470003, MP, India
- Shweta Yadav
  - Department of Zoology, Dr. Harisingh Gour University (Central University), Sagar, 470003, MP, India
8
Dong AY, Wang Z, Huang JJ, Song BA, Hao GF. Bioinformatic tools support decision-making in plant disease management. Trends in Plant Science 2021; 26:953-967. [PMID: 34039514 DOI: 10.1016/j.tplants.2021.05.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/14/2020] [Revised: 02/10/2021] [Accepted: 05/01/2021] [Indexed: 06/12/2023]
Abstract
Food loss due to pathogens is a major concern in agriculture, requiring the need for advanced disease detection and prevention measures to minimize pathogen damage to plants. Novel bioinformatic tools have opened doors for the low-cost rapid identification of pathogens and prevention of disease. The number of these tools is growing fast and a comprehensive and comparative summary of these resources is currently lacking. Here, we review all current bioinformatic tools used to identify the mechanisms of pathogen pathogenicity, plant resistance protein identification, and the detection and treatment of plant disease. We compare functionality, data volume, data sources, performance, and applicability of all tools to provide a comprehensive toolbox for researchers in plant disease management.
Affiliation(s)
- An-Yu Dong
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, P. R. China
- Zheng Wang
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, P. R. China
- Jun-Jie Huang
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, P. R. China
- Bao-An Song
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, P. R. China
- Ge-Fei Hao
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, P. R. China.
9
Complete Genome Sequence of Ferrigenium kumadai An22, a Microaerophilic Iron-Oxidizing Bacterium Isolated from a Paddy Field Soil. Microbiol Resour Announc 2021; 10:e0034621. [PMID: 34236217 PMCID: PMC8265223 DOI: 10.1128/mra.00346-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
Ferrigenium kumadai An22T (= JCM 30584T = NBRC 112974T = ATCC TSD-51T) is a microaerophilic iron oxidizer isolated from paddy field soil and belongs to the family Gallionellaceae. Here, we report the complete genome sequence of F. kumadai An22T, which was obtained from the hybrid data of Oxford Nanopore long-read and Illumina short-read sequencing.
10
Riediger M, Spät P, Bilger R, Voigt K, Maček B, Hess WR. Analysis of a photosynthetic cyanobacterium rich in internal membrane systems via gradient profiling by sequencing (Grad-seq). The Plant Cell 2021; 33:248-269. [PMID: 33793824 PMCID: PMC8136920 DOI: 10.1093/plcell/koaa017] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/10/2020] [Accepted: 11/12/2020] [Indexed: 05/23/2023]
Abstract
Although regulatory small RNAs have been reported in photosynthetic cyanobacteria, the lack of clear RNA chaperones involved in their regulation poses a conundrum. Here, we analyzed the full complement of cellular RNAs and proteins using gradient profiling by sequencing (Grad-seq) in Synechocystis 6803. Complexes with overlapping subunits such as the CpcG1-type versus the CpcL-type phycobilisomes or the PsaK1 versus PsaK2 photosystem I pre(complexes) could be distinguished, supporting the high quality of this approach. Clustering of the in-gradient distribution profiles followed by several additional criteria yielded a short list of potential RNA chaperones that include an YlxR homolog and a cyanobacterial homolog of the KhpA/B complex. The data suggest previously undetected complexes between accessory proteins and CRISPR-Cas systems, such as a Csx1-Csm6 ribonucleolytic defense complex. Moreover, the exclusive association of either RpoZ or 6S RNA with the core RNA polymerase complex and the existence of a reservoir of inactive sigma-antisigma complexes is suggested. The Synechocystis Grad-seq resource is available online at https://sunshine.biologie.uni-freiburg.de/GradSeqExplorer/ providing a comprehensive resource for the functional assignment of RNA-protein complexes and multisubunit protein complexes in a photosynthetic organism.
Affiliation(s)
- Matthias Riediger
  - Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
- Philipp Spät
  - Department of Quantitative Proteomics, Interfaculty Institute for Cell Biology, University of Tübingen, Auf der Morgenstelle 15, 72076 Tübingen, Germany
- Raphael Bilger
  - Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
- Karsten Voigt
  - IT Administration, Institute of Biology 3, Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
- Boris Maček
  - Department of Quantitative Proteomics, Interfaculty Institute for Cell Biology, University of Tübingen, Auf der Morgenstelle 15, 72076 Tübingen, Germany
- Wolfgang R Hess
  - Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
11
Linard B, Ebersberger I, McGlynn SE, Glover N, Mochizuki T, Patricio M, Lecompte O, Nevers Y, Thomas PD, Gabaldón T, Sonnhammer E, Dessimoz C, Uchiyama I. Ten Years of Collaborative Progress in the Quest for Orthologs. Mol Biol Evol 2021; 38:3033-3045. [PMID: 33822172 PMCID: PMC8321534 DOI: 10.1093/molbev/msab098] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/27/2020] [Revised: 02/07/2021] [Accepted: 04/01/2021] [Indexed: 12/19/2022] Open
Abstract
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought into light several challenges to come: leveraging orthology computations to get the most of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
Affiliation(s)
- Benjamin Linard
  - LIRMM, University of Montpellier, CNRS, Montpellier, France; SPYGEN, Le Bourget-du-Lac, France
- Ingo Ebersberger
  - Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany; Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, Germany; LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt, Germany
- Shawn E McGlynn
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan; Blue Marble Space Institute of Science, Seattle, WA, USA
- Natasha Glover
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Tomohiro Mochizuki
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
- Mateus Patricio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Odile Lecompte
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Yannis Nevers
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Toni Gabaldón
  - Barcelona Supercomputing Centre (BCS-CNS), Jordi Girona, Barcelona, Spain; Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Erik Sonnhammer
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Christophe Dessimoz
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Department of Computer Science, University College London, London, United Kingdom; Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Ikuo Uchiyama
  - Department of Theoretical Biology, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
12
Tang J, Wu X, Mou M, Wang C, Wang L, Li F, Guo M, Yin J, Xie W, Wang X, Wang Y, Ding Y, Xue W, Zhu F. GIMICA: host genetic and immune factors shaping human microbiota. Nucleic Acids Res 2021; 49:D715-D722. [PMID: 33045729 PMCID: PMC7779047 DOI: 10.1093/nar/gkaa851] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/14/2020] [Revised: 09/09/2020] [Accepted: 10/08/2020] [Indexed: 01/09/2023] Open
Abstract
Besides the environmental factors that have tremendous impacts on the composition of the microbial community, host factors have recently gained extensive attention for their roles in shaping the human microbiota. There are two major types of host factors: host genetic factors (HGFs) and host immune factors (HIFs). Factors of each type are essential for defining the chemical and physical landscapes inhabited by microbiota, and the collective consideration of both types has great implications for comprehensive health management. However, no database was available to provide comprehensive factors of both types. Herein, a database entitled 'Host Genetic and Immune Factors Shaping Human Microbiota (GIMICA)' was constructed. Based on the 4257 microbes confirmed to inhabit nine sites of the human body, 2851 HGFs (1368 single nucleotide polymorphisms (SNPs), 186 copy number variations (CNVs), and 1297 non-coding ribonucleic acids (RNAs)) modulating the expression of 370 microbes were collected, and 549 HIFs (126 lymphocytes and phagocytes, 387 immune proteins, and 36 immune pathways) regulating the abundance of 455 microbes were also provided. All in all, GIMICA enables the collective consideration not only of different types of host factor but also of host and environmental factors, and is freely accessible without login requirement at: https://idrblab.org/gimica/.
Affiliation(s)
- Jing Tang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.,College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Xianglu Wu
- Joint International Research Lab of Reproductive and Development, Department of Reproductive Biology, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Chuan Wang
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lidan Wang
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Maiyuan Guo
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Wenqin Xie
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Xiaona Wang
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
| | - Yingxiong Wang
- College of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.,Joint International Research Lab of Reproductive and Development, Department of Reproductive Biology, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Yubin Ding
- Joint International Research Lab of Reproductive and Development, Department of Reproductive Biology, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|
13
|
Galperin MY, Wolf YI, Makarova KS, Vera Alvarez R, Landsman D, Koonin EV. COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res 2021; 49:D274-D281. [PMID: 33167031 DOI: 10.1093/nar/gkaa1018] [Citation(s) in RCA: 387] [Impact Index Per Article: 129.0] [Received: 09/10/2020] [Revised: 10/13/2020] [Accepted: 10/14/2020] [Indexed: 12/20/2022] Open
Abstract
The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently, in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically, with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) the recently deprecated NCBI's gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank/ENA/DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families with corresponding references and PDB links, where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs containing multiple paralogs, and continued refinement of COG annotations.
Affiliation(s)
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Roberto Vera Alvarez
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - David Landsman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
14
|
Sato N, Obayashi T. Lipid Pathway Databases with a Focus on Algae. Methods Mol Biol 2021; 2295:455-468. [PMID: 34047993 DOI: 10.1007/978-1-0716-1362-7_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Pathways of lipid biosynthesis are highly complex and have been established in model organisms such as Arabidopsis thaliana and Chlamydomonas reinhardtii, whereas in other organisms, we need bioinformatic tools to map individual enzymes onto reference pathways. In this chapter, we explain representative tools that are useful in identifying algal orthologs of lipid biosynthetic enzymes and finding new enzymes that are possibly involved in the pathway of interest. All descriptions in this chapter refer to in silico (i.e., computer-based) methods rather than laboratory experiments.
Affiliation(s)
- Naoki Sato
- Department of Life Sciences, University of Tokyo, Tokyo, Japan.
|
15
|
Fasim A, More VS, More SS. Large-scale production of enzymes for biotechnology uses. Curr Opin Biotechnol 2020; 69:68-76. [PMID: 33388493 DOI: 10.1016/j.copbio.2020.12.002] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 07/30/2020] [Revised: 11/12/2020] [Accepted: 12/08/2020] [Indexed: 01/08/2023]
Abstract
Enzymes are biocatalysts that speed up chemical reactions to yield valuable final products. Biotechnology has revolutionized the use of traditional enzymes, making them applicable in industries such as food, beverage, personal and household care, agriculture, bioenergy, pharmaceutical, and various other segments. Given the exponential growth in the use of enzymes across biotech industries, it becomes important to highlight the advancements and impact of enzyme technology over recent years. In this review article, we discuss the existing and emerging production approaches, applications, developments, and global need for enzymes. Special emphasis is given to the predominantly utilized hydrolytic microbial enzymes in industrial bioprocesses.
Affiliation(s)
- Aneesa Fasim
- School of Basic and Applied Sciences, Dayananda Sagar University, Bengaluru 560 111, Karnataka, India
| | - Veena S More
- Department of Biotechnology, Sapthagiri College of Engineering, Bengaluru 560 057 Karnataka, India
| | - Sunil S More
- School of Basic and Applied Sciences, Dayananda Sagar University, Bengaluru 560 111, Karnataka, India.
|
16
|
Jackson LK, Potter B, Schneider S, Fitzgibbon M, Blair K, Farah H, Krishna U, Bedford T, Peek RM, Salama NR. Helicobacter pylori diversification during chronic infection within a single host generates sub-populations with distinct phenotypes. PLoS Pathog 2020; 16:e1008686. [PMID: 33370399 PMCID: PMC7794030 DOI: 10.1371/journal.ppat.1008686] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/04/2020] [Revised: 01/08/2021] [Accepted: 10/22/2020] [Indexed: 12/15/2022] Open
Abstract
Helicobacter pylori chronically infects the stomach of approximately half of the world's population. Manifestation of clinical diseases associated with H. pylori infection, including cancer, is driven by strain properties and host responses; and as chronic infection persists, both are subject to change. Previous studies have documented frequent and extensive within-host bacterial genetic variation. To define how within-host diversity contributes to phenotypes related to H. pylori pathogenesis, this project leverages a collection of 39 clinical isolates acquired prospectively from a single subject at two time points and from multiple gastric sites. During the six years separating collection of these isolates, this individual, initially harboring a duodenal ulcer, progressed to gastric atrophy and concomitant loss of acid secretion. Whole genome sequence analysis identified 1,767 unique single nucleotide polymorphisms (SNPs) across isolates and a nucleotide substitution rate of 1.3 × 10⁻⁴ substitutions/site/year. Gene ontology analysis identified cell envelope genes among the genes with excess accumulation of nonsynonymous SNPs (nSNPs). A maximum likelihood tree based on genetic similarity clusters isolates from each time point separately. Within time points, there is segregation of subgroups with phenotypic differences in bacterial morphology, ability to induce inflammatory cytokines, and mouse colonization. Higher inflammatory cytokine induction in recent isolates maps to shared polymorphisms in the Cag PAI protein, CagY, while rod morphology in a subgroup of recent isolates mapped to eight mutations in three distinct helical cell shape determining (csd) genes. The presence of subgroups with unique genetic and phenotypic properties suggests complex selective forces and multiple niches within the stomach during chronic infection.
Affiliation(s)
- Laura K. Jackson
- Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, United States of America
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
| | - Barney Potter
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
| | - Sean Schneider
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
| | - Matthew Fitzgibbon
- Genomics & Bioinformatics Shared Resource, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
| | - Kris Blair
- Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, United States of America
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
| | - Hajirah Farah
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, United States of America
| | - Uma Krishna
- Division of Gastroenterology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
| | - Trevor Bedford
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
| | - Richard M. Peek
- Division of Gastroenterology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
| | - Nina R. Salama
- Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, United States of America
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, United States of America
|
17
|
Altenhoff AM, Garrayo-Ventas J, Cosentino S, Emms D, Glover NM, Hernández-Plaza A, Nevers Y, Sundesha V, Szklarczyk D, Fernández JM, Codó L, The Quest for Orthologs Consortium, Gelpi JL, Huerta-Cepas J, Iwasaki W, Kelly S, Lecompte O, Muffato M, Martin MJ, Capella-Gutierrez S, Thomas PD, Sonnhammer E, Dessimoz C. The Quest for Orthologs benchmark service and consensus calls in 2020. Nucleic Acids Res 2020; 48:W538-W545. [PMID: 32374845 PMCID: PMC7319555 DOI: 10.1093/nar/gkaa308] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/20/2020] [Revised: 04/16/2020] [Accepted: 04/20/2020] [Indexed: 12/18/2022] Open
Abstract
The identification of orthologs—genes in different species which descended from the same gene in their last common ancestor—is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.
Affiliation(s)
- Adrian M Altenhoff
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,ETH Zurich, Department of Computer Science, Zurich, Switzerland
| | | | - Salvatore Cosentino
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| | - David Emms
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, UK
| | - Natasha M Glover
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Ana Hernández-Plaza
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
| | - Yannis Nevers
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
| | - Vicky Sundesha
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Damian Szklarczyk
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
| | - José M Fernández
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Laia Codó
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | | | - Josep Ll Gelpi
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain.,Department of Biochemistry and Molecular Biomedicine. University of Barcelona. Barcelona, Spain
| | - Jaime Huerta-Cepas
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
| | - Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| | - Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, UK
| | - Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, USA
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution & Environment, University College London, London, UK.,Department of Computer Science, University College London, London, UK
|
18
|
Long Y, Luo J, Zhang Y, Xia Y. Predicting human microbe-disease associations via graph attention networks with inductive matrix completion. Brief Bioinform 2020; 22:5876591. [PMID: 32725163 DOI: 10.1093/bib/bbaa146] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/23/2020] [Revised: 06/07/2020] [Accepted: 06/11/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: Human microbes play a critical role in an extensive range of complex human diseases and have become a new target in precision medicine. In silico methods of identifying microbe-disease associations not only can provide deep insight into the pathogenic mechanisms of complex human diseases but also assist pharmacologists in screening candidate targets for drug development. However, the majority of existing approaches are based on linear models or label propagation, which suffer from limitations in capturing nonlinear associations between microbes and diseases. Besides, it is still a great challenge for most previous methods to make predictions for new diseases (or new microbes) with few or no observed associations. RESULTS: In this work, we construct features for microbes and diseases by fully exploiting multiple sources of biomedical data, and then propose a novel deep learning framework of graph attention networks with inductive matrix completion for human microbe-disease association prediction, named GATMDA. To our knowledge, this is the first attempt to leverage graph attention networks for this important task. In particular, we develop an optimized graph attention network with talking-heads to learn representations for nodes (i.e. microbes and diseases). To focus on more important neighbours and filter out noise, we further design a bi-interaction aggregator to enforce representation aggregation of similar neighbours. In addition, we combine inductive matrix completion to reconstruct microbe-disease associations to capture the complicated associations between diseases and microbes. Comprehensive experiments on two data sets (i.e. HMDAD and Disbiome) demonstrated that our proposed model consistently outperformed baseline methods. Case studies on two diseases, i.e. asthma and inflammatory bowel disease, further confirmed the effectiveness of our proposed model of GATMDA.
AVAILABILITY: Python code and data sets are available at: https://github.com/yahuilong/GATMDA. CONTACT: luojiawei@hnu.edu.cn.
Affiliation(s)
- Yahui Long
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China.,School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China
| | - Yu Zhang
- School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
| | - Yan Xia
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China
|
19
|
Li J, Chen Z, Wang Y. Contents, Construction Methods, Data Resources, and Functions Comparative Analysis of Bacteria Databases. Int J Biol Sci 2020; 16:838-848. [PMID: 32071553 PMCID: PMC7019132 DOI: 10.7150/ijbs.39289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/13/2019] [Accepted: 11/11/2019] [Indexed: 11/20/2022] Open
Abstract
Many bacteria-related databases have been developed to meet researchers' needs for searching and analyzing bacterial information. However, these databases differ in their data resources, construction methods, data formats, and analysis tools, which makes it difficult for researchers to select appropriate databases and analysis tools for their research. In this paper, we compare the contents, construction methods, data sources, update frequency, scope and scale of data, analysis tools, and features of nine well-known bacterial databases (CARD, EffectiveDB, MBGD, MPD, PATRIC, PHI-base, VFDB, gcMeta and SILVA) to help researchers make better use of them. We also hope this review can help researchers develop more comprehensive databases and better tools.
Affiliation(s)
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, China
| | - Zhuo Chen
- School of Computer Science and Technology, Harbin Institute of Technology, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, China
|
20
|
Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, da Silva AS, Wang Y, Sonnhammer E, Thomas PD. Advances and Applications in the Quest for Orthologs. Mol Biol Evol 2020; 36:2157-2164. [PMID: 31241141 PMCID: PMC6759064 DOI: 10.1093/molbev/msz150] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Indexed: 01/02/2023] Open
Abstract
Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.
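The event-based definitions in this abstract can be made concrete with a small sketch. It assumes a gene tree whose internal nodes are already labeled as speciation or duplication; the tuple encoding, the event labels, and the gene names are hypothetical and chosen only for illustration. Two genes are called orthologs or paralogs according to the event at their last common ancestor. Horizontal transfer and gene loss are not modeled here.

```python
# A gene tree node is either a leaf (a gene name, str) or a tuple
# (event, left, right), where event is "speciation" or "duplication".
# This encoding is a hypothetical one used only for this sketch.

def leaves(tree):
    """Return the set of gene names below a node."""
    if isinstance(tree, str):
        return {tree}
    _, left, right = tree
    return leaves(left) | leaves(right)

def lca_event(tree, a, b):
    """Return the event label at the last common ancestor of genes a and b."""
    if isinstance(tree, str) or not {a, b} <= leaves(tree) or a == b:
        raise ValueError("a and b must be two distinct leaves of the tree")
    event, left, right = tree
    for child in (left, right):
        # If both genes sit under one child, their ancestor is deeper down.
        if {a, b} <= leaves(child):
            return lca_event(child, a, b)
    return event  # the genes split at this node

def relation(tree, a, b):
    """Classify a gene pair from the event at its last common ancestor."""
    return {"speciation": "orthologs", "duplication": "paralogs"}[lca_event(tree, a, b)]

# Toy family: an ancient duplication (copies A and B), then a speciation
# separating human and mouse in each copy.
family = ("duplication",
          ("speciation", "human_A", "mouse_A"),
          ("speciation", "human_B", "mouse_B"))

print(relation(family, "human_A", "mouse_A"))  # orthologs
print(relation(family, "human_A", "mouse_B"))  # paralogs
```

Real orthology inference has to estimate the tree and its event labels from sequence data, which is the hard part the review discusses; the sketch only shows how the definitions apply once those labels are given.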
Affiliation(s)
- Natasha Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution & Environment, University College London, London, United Kingdom.,Department of Computer Science, University College London, London, United Kingdom
| | - Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany.,Senckenberg Biodiversity and Climate Research Centre (BIK-F), Frankfurt, Germany.,LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
| | - Sofia K Forslund
- Experimental and Clinical Research Center, A Cooperation of Charité-Universitätsmedizin Berlin and Max Delbruck Center for Molecular Medicine, Berlin, Germany.,Max Delbruck Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany.,Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany.,Berlin Institute of Health (BIH), Berlin, Germany.,Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Toni Gabaldón
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain.,ICREA, Barcelona, Spain
| | - Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Cécile Pereira
- Eura Nova, Marseille, France.,Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL
| | - Alan Sousa da Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Yan Wang
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California-Riverside, Riverside, CA
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA
|
21
|
Sima AC, Dessimoz C, Stockinger K, Zahn-Zabal M, Mendes de Farias T. A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL. F1000Res 2019; 8:1822. [PMID: 32612807 PMCID: PMC7324951 DOI: 10.12688/f1000research.21027.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 07/09/2020] [Indexed: 11/20/2022] Open
Abstract
The increasing use of Semantic Web technologies in the life sciences, in particular the use of the Resource Description Framework (RDF) and the RDF query language SPARQL, opens the path for novel integrative analyses, combining information from multiple data sources. However, analyzing evolutionary data in RDF is not trivial, due to the steep learning curve required to understand both the data models adopted by different RDF data sources, as well as the equivalent SPARQL constructs required to benefit from this data - in particular, recursive property paths. In this article, we provide a hands-on introduction to querying evolutionary data across several data sources that publish orthology information in RDF, namely: The Orthologous MAtrix (OMA), the European Bioinformatics Institute (EBI) RDF platform, the Database of Orthologous Groups (OrthoDB) and the Microbial Genome Database (MBGD). We present four protocols in increasing order of complexity. In these protocols, we demonstrate through SPARQL queries how to retrieve pairwise orthologs, homologous groups, and hierarchical orthologous groups. Finally, we show how orthology information in different data sources can be compared, through the use of federated SPARQL queries.
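As a rough illustration of the two SPARQL constructs this abstract names, recursive property paths and federated queries, the sketch below composes a query string in Python. It is not taken from the article's protocols. The `orth:` terms are assumed to follow the Orthology Ontology, the endpoint and protein IRIs are placeholders, and the helper name `federated_ortholog_query` is hypothetical. Nothing is sent over the network.

```python
def federated_ortholog_query(remote_endpoint: str, protein_iri: str) -> str:
    """Compose a SPARQL query string (assumed vocabulary, placeholder IRIs)."""
    return f"""PREFIX orth: <http://purl.org/net/orth#>
SELECT DISTINCT ?member ?remote_group WHERE {{
  # Local source: the '*' property path walks nested groups of any depth.
  ?group a orth:OrthologsCluster ;
         orth:hasHomologousMember* ?query_member ;
         orth:hasHomologousMember* ?member .
  FILTER(?query_member = <{protein_iri}> && ?member != ?query_member)
  # Remote source: SERVICE delegates this pattern to a second endpoint.
  SERVICE <{remote_endpoint}> {{
    ?remote_group orth:hasHomologousMember* ?member .
  }}
}}"""

# Placeholder IRIs; substitute a real endpoint and protein identifier.
query = federated_ortholog_query("https://example.org/sparql",
                                 "https://example.org/protein/P1")
print(query)
```

Whether two sources can actually be joined this way depends on their sharing identifiers for `?member`, which in practice usually needs an explicit cross-reference mapping step.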
Affiliation(s)
- Ana Claudia Sima
- ZHAW Zurich University of Applied Sciences, Winterthur, Zurich, Switzerland.,Department of Computational Biology, University of Lausanne, Lausanne, Vaud, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Vaud, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Vaud, Switzerland.,Department of Computer Science, University College London, London, UK.,Department of Genetics, Evolution, and Environment, University College London, London, UK
| | - Kurt Stockinger
- ZHAW Zurich University of Applied Sciences, Winterthur, Zurich, Switzerland
| | - Monique Zahn-Zabal
- Department of Computational Biology, University of Lausanne, Lausanne, Vaud, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, Switzerland
| | - Tarcisio Mendes de Farias
- Department of Computational Biology, University of Lausanne, Lausanne, Vaud, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Vaud, Switzerland.,Department of Ecology and Evolution, University of Lausanne, Lausanne, Vaud, Switzerland
|
22
|
Sima AC, Dessimoz C, Stockinger K, Zahn-Zabal M, Mendes de Farias T. A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL. F1000Res 2019; 8:1822. [PMID: 32612807 PMCID: PMC7324951 DOI: 10.12688/f1000research.21027.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 10/22/2019] [Indexed: 08/01/2024] Open
Abstract
The increasing use of Semantic Web technologies in the life sciences, in particular the use of the Resource Description Framework (RDF) and the RDF query language SPARQL, opens the path for novel integrative analyses, combining information from multiple sources. However, analyzing evolutionary data in RDF is not trivial, due to the steep learning curve required to understand both the data models adopted by different RDF data sources, as well as the SPARQL query language. In this article, we provide a hands-on introduction to querying evolutionary data across multiple sources that publish orthology information in RDF, namely: The Orthologous MAtrix (OMA), the European Bioinformatics Institute (EBI) RDF platform, the Database of Orthologous Groups (OrthoDB) and the Microbial Genome Database (MBGD). We present four protocols in increasing order of complexity. In these protocols, we demonstrate through SPARQL queries how to retrieve pairwise orthologs, homologous groups, and hierarchical orthologous groups. Finally, we show how orthology information in different sources can be compared, through the use of federated SPARQL queries.
Affiliation(s)
- Ana Claudia Sima
- ZHAW Zurich University of Applied Sciences, Winterthur, Zurich, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Vaud, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Vaud, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Vaud, Switzerland
- Department of Computer Science, University College London, London, UK
- Department of Genetics, Evolution, and Environment, University College London, London, UK
| | - Kurt Stockinger
- ZHAW Zurich University of Applied Sciences, Winterthur, Zurich, Switzerland
| | - Monique Zahn-Zabal
- Department of Computational Biology, University of Lausanne, Lausanne, Vaud, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, Switzerland
| | - Tarcisio Mendes de Farias
- Department of Computational Biology, University of Lausanne, Lausanne, Vaud, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Vaud, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Vaud, Switzerland
|