1
Pavia MJ, Chede A, Wu Z, Cadillo-Quiroz H, Zhu Q. BinaRena: a dedicated interactive platform for human-guided exploration and binning of metagenomes. Microbiome 2023; 11:186. [PMID: 37596696 PMCID: PMC10439608 DOI: 10.1186/s40168-023-01625-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 07/16/2023] [Indexed: 08/20/2023]
Abstract
BACKGROUND: Exploring metagenomic contigs and "binning" them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite the advances in automated binning algorithms, their capabilities in recovering MAGs with accuracy and biological relevance are so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process, however, is expertise-demanding and labor-intensive, and it deserves to be supported by software infrastructure.
RESULTS: We present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time. In demonstration of BinaRena's usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic genomes within closely related populations from the gut microbiota of diarrheal human subjects. It significantly improved overall binning quality after curating results of automated binners using a simulated marine dataset.
CONCLUSIONS: BinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted at https://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data.
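The abstract above names the adjusted Rand index as one metric for comparing binning plans. As a minimal sketch of how that metric works, and not BinaRena's own implementation, the Python function below computes it for two plans given as bin labels listed in the same contig order. The function name and the toy labels are assumptions made for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(plan_a, plan_b):
    """Adjusted Rand index between two binning plans.

    Each plan is a sequence of bin labels, one per contig, in the same
    contig order (a hypothetical input layout chosen for this sketch).
    """
    if len(plan_a) != len(plan_b):
        raise ValueError("both plans must label the same contigs")
    pairs_total = comb(len(plan_a), 2)
    if pairs_total == 0:
        return 1.0  # fewer than two contigs: the plans trivially agree
    # Contig pairs placed together in both plans, in plan A, and in plan B.
    both = sum(comb(c, 2) for c in Counter(zip(plan_a, plan_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(plan_a).values())
    in_b = sum(comb(c, 2) for c in Counter(plan_b).values())
    expected = in_a * in_b / pairs_total  # agreement expected by chance
    max_index = (in_a + in_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every contig in one bin
    return (both - expected) / (max_index - expected)


# Toy example: bin names differ, but the grouping of contigs is identical.
automated = ["bin1", "bin1", "bin2", "bin2"]
curated = ["x", "x", "y", "y"]
print(adjusted_rand_index(automated, curated))
```

The index depends only on which contigs share a bin, so renaming bins does not change it; identical groupings give 1.0, and values near 0 indicate agreement no better than chance.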
Affiliation(s)
- Michael J Pavia
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
  - Biodesign Swette Center for Environmental Biotechnology, Arizona State University, Tempe, AZ, USA
- Abhinav Chede
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- Zijun Wu
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Hinsby Cadillo-Quiroz
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
  - Biodesign Swette Center for Environmental Biotechnology, Arizona State University, Tempe, AZ, USA
- Qiyun Zhu
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
2
Zafar H, Saier MH. Understanding the Relationship of the Human Bacteriome with COVID-19 Severity and Recovery. Cells 2023; 12:1213. [PMID: 37174613 PMCID: PMC10177376 DOI: 10.3390/cells12091213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/05/2023] [Accepted: 04/11/2023] [Indexed: 05/15/2023]
Abstract
The Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) first emerged in 2019 in China and has resulted in millions of human morbidities and mortalities across the globe. Evidence has been provided that this novel virus originated in animals, mutated, and made the cross-species jump to humans. At the time of this communication, the Coronavirus disease (COVID-19) may be on its way to an endemic form; however, the threat of the virus is more for susceptible (older and immunocompromised) people. The human body has millions of bacterial cells that influence health and disease. As a consequence, the bacteriomes in the human body substantially influence human health and disease. The bacteriomes in the body and the immune system seem to be in constant association during bacterial and viral infections. In this review, we identify various bacterial spp. in major bacteriomes (oral, nasal, lung, and gut) of the body in healthy humans and compare them with dysbiotic bacteriomes of COVID-19 patients. We try to identify key bacterial spp. that have a positive effect on the functionality of the immune system and human health. These select bacterial spp. could be used as potential probiotics to counter or prevent COVID-19 infections. In addition, we try to identify key metabolites produced by probiotic bacterial spp. that could have potential anti-viral effects against SARS-CoV-2. These metabolites could be subject to future therapeutic trials to determine their anti-viral efficacies.
Affiliation(s)
- Hassan Zafar
  - Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, CA 92093-0116, USA
  - Central European Institute of Technology, Masaryk University, 625 00 Brno, Czech Republic
- Milton H Saier
  - Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, CA 92093-0116, USA
3
Iordache D, Baci GM, Căpriță O, Farkas A, Lup A, Butiuc-Keul A. Correlation between CRISPR Loci Diversity in Three Enterobacterial Taxa. Int J Mol Sci 2022; 23:12766. [PMID: 36361556 PMCID: PMC9658729 DOI: 10.3390/ijms232112766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Revised: 10/20/2022] [Accepted: 10/20/2022] [Indexed: 11/05/2022]
Abstract
CRISPR-Cas is an adaptive immunity system of prokaryotes, composed of CRISPR arrays and the associated proteins. The successive addition of spacer sequences in the CRISPR array has made the system a valuable molecular marker, with multiple applications. Due to the high degree of polymorphism of the CRISPR loci, their comparison in bacteria from various sources may provide insights into the evolution and spread of the CRISPR-Cas systems. The aim of this study was to establish a correlation between the enterobacterial CRISPR loci, the sequence of direct repeats (DR), and the number of spacer units, along with the geographical origin and collection source. For this purpose, 3474 genomes containing CRISPR loci from the CRISPRCasdb of Salmonella enterica, Escherichia coli, and Klebsiella pneumoniae were analyzed, and the information regarding the isolates was recorded from the NCBI database. The most prevalent was the I-E CRISPR-Cas system in all three studied taxa. E. coli also presents the I-F type, but in a much lesser percentage. The systems found in K. pneumoniae can be classified into I-E and I-E*. The I-E and I-F systems have two CRISPR loci, while I-E* has only one locus upstream of the Cas cluster. PCR primers have been developed in this study for each CRISPR locus. Distinct clustering was not evident, but statistically significant relationships occurred between the different CRISPR loci and the number of spacer units. For each of the queried taxa, the number of spacers was significantly different (p < 0.01) by origin (Africa, Asia, Australia and Oceania, Europe, North America, and South America) but was not linked to the isolation source type (human, animal, plant, food, or laboratory strains).
Affiliation(s)
- Dumitrana Iordache
  - Doctoral School of Integrative Biology, Babeș-Bolyai University, 44 Republicii Street, 400015 Cluj-Napoca, Romania
  - Department of Molecular Biology and Biotechnology, Faculty of Biology and Geology, Babeș-Bolyai University, 1 M. Kogalniceanu Street, 400084 Cluj-Napoca, Romania
  - Centre for Systems Biology, Biodiversity and Bioresources, Babeș-Bolyai University, 5–7 Clinicilor Street, 400006 Cluj-Napoca, Romania
- Gabriela-Maria Baci
  - Faculty of Animal Science and Biotechnology, University of Agricultural Sciences and Veterinary Medicine, 400372 Cluj-Napoca, Romania
- Oana Căpriță
  - Department of Molecular Biology and Biotechnology, Faculty of Biology and Geology, Babeș-Bolyai University, 1 M. Kogalniceanu Street, 400084 Cluj-Napoca, Romania
- Anca Farkas (corresponding author)
  - Department of Molecular Biology and Biotechnology, Faculty of Biology and Geology, Babeș-Bolyai University, 1 M. Kogalniceanu Street, 400084 Cluj-Napoca, Romania
  - Centre for Systems Biology, Biodiversity and Bioresources, Babeș-Bolyai University, 5–7 Clinicilor Street, 400006 Cluj-Napoca, Romania
- Andreea Lup
  - Department of Molecular Biology and Biotechnology, Faculty of Biology and Geology, Babeș-Bolyai University, 1 M. Kogalniceanu Street, 400084 Cluj-Napoca, Romania
- Anca Butiuc-Keul
  - Department of Molecular Biology and Biotechnology, Faculty of Biology and Geology, Babeș-Bolyai University, 1 M. Kogalniceanu Street, 400084 Cluj-Napoca, Romania
  - Centre for Systems Biology, Biodiversity and Bioresources, Babeș-Bolyai University, 5–7 Clinicilor Street, 400006 Cluj-Napoca, Romania
4
Chitcharoen S, Sivapornnukul P, Payungporn S. Revolutionized virome research using systems microbiology approaches. Exp Biol Med (Maywood) 2022; 247:1135-1147. [PMID: 35723062 PMCID: PMC9335507 DOI: 10.1177/15353702221102895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023]
Abstract
Currently, both pathogenic and commensal viruses are continuously being discovered and acknowledged as ubiquitous components of microbial communities. Advances in systems microbiology approaches have changed the face of virome research. Here, we focus on the viral metagenomic approach to studying viral communities and their interactions with other microbial members as well as their hosts. This review also summarizes the challenges, limitations, and benefits of current virome approaches. Potentially, virome studies can be further applied in various biological and clinical fields.
Affiliation(s)
- Suwalak Chitcharoen
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Pavaret Sivapornnukul
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Sunchai Payungporn
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
5
Mekbib Y, Tesfaye K, Dong X, Saina JK, Hu GW, Wang QF. Whole-genome resequencing of Coffea arabica L. (Rubiaceae) genotypes identify SNP and unravels distinct groups showing a strong geographical pattern. BMC Plant Biology 2022; 22:69. [PMID: 35164709 PMCID: PMC8842891 DOI: 10.1186/s12870-022-03449-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/26/2021] [Accepted: 01/27/2022] [Indexed: 06/04/2023]
Abstract
BACKGROUND: Coffea arabica L. is an economically important agricultural crop and the most popular beverage worldwide. As a perennial crop with recalcitrant seed, conservation of the genetic resources of coffee can be achieved through the complementary approach of in-situ and ex-situ field genebank. In Ethiopia, a large collection of C. arabica L. germplasm is preserved in field gene banks. Here, we report the whole-genome resequencing of 90 accessions from Choche germplasm bank representing garden and forest-based coffee production systems using Illumina sequencing technology.
RESULTS: The genome sequencing generated 6.41 billion paired-end reads, with a mean of 71.19 million reads per sample. More than 93% of the clean reads were mapped onto the C. arabica L. reference genome. A total of 11.08 million variants were identified, among which 9.74 million (87.9%) were SNPs (single nucleotide polymorphisms) and 1.34 million (12.1%) were InDels. In all accessions, genomic variants were unevenly distributed across the coffee genome. The phylogenetic analysis using the SNP markers displayed distinct groups.
CONCLUSIONS: Resequencing of the coffee accessions has allowed identification of genetic markers, such as SNPs and InDels. The SNPs discovered in this study might contribute to the variation in important pathways of genes for important agronomic traits such as caffeine content, yield, disease, and pest in coffee. Moreover, the genome resequencing data and the genetic markers identified from 90 accessions provide insight into the genetic variation of the coffee germplasm and facilitate a broad range of genetic studies.
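The read and variant figures reported in this abstract can be cross-checked with simple arithmetic. The short Python sketch below uses only the rounded numbers quoted in the abstract; the variable names are chosen here for illustration. Because the inputs are rounded, the recomputed mean (about 71.2 million reads per sample) agrees with the reported 71.19 million only to within rounding.

```python
# Figures as quoted in the abstract, in millions (rounded by the authors).
total_reads = 6410.0   # 6.41 billion paired-end reads
accessions = 90
variants = 11.08
snps = 9.74
indels = 1.34

mean_reads = total_reads / accessions   # millions of reads per sample
snp_share = 100 * snps / variants       # percent of variants that are SNPs
indel_share = 100 * indels / variants   # percent of variants that are InDels

print(f"mean reads per sample: {mean_reads:.1f} million")
print(f"SNP share: {snp_share:.1f}%  InDel share: {indel_share:.1f}%")
```

The two shares round to 87.9% and 12.1%, matching the abstract, and the SNP and InDel counts sum to the stated 11.08 million variants.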
Affiliation(s)
- Yeshitila Mekbib
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
  - Ethiopian Biodiversity Institute, P.O. Box 30726, Addis Ababa, Ethiopia
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan, 430074, China
- Kassahun Tesfaye
  - Department of Microbial, Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
  - Ethiopian Biotechnology Institute, Ministry of Innovation and Technology, Addis Ababa, Ethiopia
- Xiang Dong
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan, 430074, China
- Josphat K Saina
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan, 430074, China
  - Centre for Integrative Conservation, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, 666303, China
- Guang-Wan Hu
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan, 430074, China
- Qing-Feng Wang
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan, 430074, China
6
Identification of discriminative gene-level and protein-level features associated with pathogenic gain-of-function and loss-of-function variants. Am J Hum Genet 2021; 108:2301-2318. [PMID: 34762822 DOI: 10.1016/j.ajhg.2021.10.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/08/2021] [Accepted: 10/19/2021] [Indexed: 12/13/2022]
Abstract
Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.
7
Ejigu GF, Jung J. Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing. Biology 2020; 9:E295. [PMID: 32962098 PMCID: PMC7565776 DOI: 10.3390/biology9090295] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 08/21/2020] [Revised: 09/13/2020] [Accepted: 09/16/2020] [Indexed: 12/16/2022]
Abstract
Next-Generation Sequencing (NGS) has made it easier to obtain genome-wide sequence data and it has shifted the research focus into genome annotation. The challenging tasks involved in annotation rely on the currently available tools and techniques to decode the information contained in nucleotide sequences. This information will improve our understanding of general aspects of life and evolution and improve our ability to diagnose genetic disorders. Here, we present a summary of both structural and functional annotations, as well as the associated comparative annotation tools and pipelines. We highlight visualization tools that immensely aid the annotation process and the contributions of the scientific community to the annotation. Further, we discuss quality-control practices and the need for re-annotation, and highlight the future of annotation.
Affiliation(s)
- Jaehee Jung
  - Department of Information and Communication Engineering, Myongji University, Yongin-si 17058, Gyeonggi-do, Korea
8
Khan Mirzaei M, Xue J, Costa R, Ru J, Schulz S, Taranu ZE, Deng L. Challenges of Studying the Human Virome - Relevant Emerging Technologies. Trends Microbiol 2020; 29:171-181. [PMID: 32622559 DOI: 10.1016/j.tim.2020.05.021] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 03/16/2020] [Revised: 05/27/2020] [Accepted: 05/28/2020] [Indexed: 01/17/2023]
Abstract
In this review we provide an overview of current challenges and advances in bacteriophage research within the growing field of viromics. In particular, we discuss, from a human virome study perspective, the current and emerging technologies available, their limitations in terms of de novo discoveries, and possible solutions to overcome present experimental and computational biases associated with low abundance of viral DNA or RNA. We summarize recent breakthroughs in metagenomics assembling tools and single-cell analysis, which have the potential to increase our understanding of phage biology, diversity, and interactions with both the microbial community and the human body. We expect that these recent and future advances in the field of viromics will have a strong impact on how we develop phage-based therapeutic approaches.
Affiliation(s)
- Mohammadali Khan Mirzaei
  - Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
- Jinling Xue
  - Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
- Rita Costa
  - Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
- Jinlong Ru
  - Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
- Sarah Schulz
  - Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
- Zofia E Taranu
  - Aquatic Contaminants Research Division (ACRD), Environment and Climate Change Canada (ECCC), Montréal, QC H2Y 2E7, Canada
- Li Deng
  - Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
9
Zhang N, Hu G, Myers TG, Williamson PR. Protocols for the Analysis of microRNA Expression, Biogenesis, and Function in Immune Cells. Current Protocols in Immunology 2019; 126:e78. [PMID: 31483103 PMCID: PMC6727972 DOI: 10.1002/cpim.78] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/15/2022]
Abstract
MicroRNAs (miRNAs) are short (19- to 25-nucleotide) noncoding RNA molecules that target mRNAs to repress gene expression and that play important roles in regulating many fundamental biological functions including cell differentiation, development, growth, and metabolism. They are well conserved in eukaryotic cells and are considered essential ancient elements of gene regulation. miRNA genes are transcribed by RNA polymerase II to generate primary miRNAs (pri-miRNAs), which are cleaved by the microprocessor complex in the nucleus to generate stem-loop structures known as pre-miRNAs. Pre-miRNAs are translocated to the cytoplasm and cleaved by Dicer to form the mature miRNAs, which mediate mRNA degradation through their loading to the RNA-induced silencing complex (RISC) and binding to complementary sequences within target mRNAs to repress their translation by mRNA degradation and/or translation inhibition. Because ∼1900 miRNA genes are reported in the human genome, many associated with disease, appropriate methods to study miRNA expression and regulation under physiological and pathological conditions have become increasingly important to the study of many aspects of human biology, including immune regulation. As with small interfering RNA (siRNA), the mechanism of miRNA-mediated targeting has been used to develop miRNA-based therapeutics. For a complete and systematic analysis, it is critical to utilize a variety of different tools to analyze the expression of pri-miRNAs, pre-miRNAs, and mature miRNAs and characterize their targets both in vitro and in vivo. Such studies will facilitate future novel drug design and development.
This unit provides six basic protocols for miRNA analysis, covering next-generation sequencing, quantitative real-time PCR (qRT-PCR), and digoxigenin-based expression analysis of pri-miRNAs, pre-miRNAs, and mature miRNAs; mapping of pri-miRNAs and their cleavage sites by rapid amplification of cDNA ends (RACE); electrophoretic mobility shift assays (EMSAs) or biotin-based nonradioactive detection of miRNA-protein complexes (miRNPs); and functional analysis of miRNAs using miRNA mimics and inhibitors. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Nannan Zhang
  - Laboratory of Clinical Immunology and Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Guowu Hu
  - Laboratory of Clinical Immunology and Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Timothy G. Myers
  - Genomic Technologies Section, Research Technologies Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Peter R. Williamson
  - Laboratory of Clinical Immunology and Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
10
Mitra S. Multiple Data Analyses and Statistical Approaches for Analyzing Data from Metagenomic Studies and Clinical Trials. Methods Mol Biol 2019; 1910:605-634. [PMID: 31278679 DOI: 10.1007/978-1-4939-9074-0_20] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/09/2023]
Abstract
Metagenomics, also known as environmental genomics, is the study of the genomic content of a sample of organisms (microbes) obtained from a common habitat. Metagenomics and other "omics" disciplines have captured the attention of researchers for several decades. The effect of microbes in our body is a relevant concern for health studies. There are plenty of studies using metagenomics which examine microorganisms that inhabit niches in the human body, sometimes causing disease, and are often correlated with multiple treatment conditions. No matter from which environment it comes, the analyses are often aimed at determining either the presence or absence of specific species of interest in a given metagenome or comparing the biological diversity and the functional activity of a wider range of microorganisms within their communities. The importance increases for comparison within different environments such as multiple patients with different conditions, multiple drugs, and multiple time points of same treatment or same patient. Thus, no matter how many hypotheses we have, we need a good understanding of genomics, bioinformatics, and statistics to work together to analyze and interpret these datasets in a meaningful way. This chapter provides an overview of different data analyses and statistical approaches (with example scenarios) to analyze metagenomics samples from different medical projects or clinical trials.
Affiliation(s)
- Suparna Mitra
  - Leeds Institute of Medical Research, University of Leeds, Microbiology, Old Medical School, Leeds General Infirmary, Leeds LS1 3EX, West Yorkshire, UK
11
Thirugnanasambandam R, Inbakandan D, Kumar C, Subashni B, Vasantharaja R, Stanley Abraham L, Ayyadurai N, Sriyutha Murthy P, Kirubagaran R, Ajmal Khan S, Balasubramanian T. Genomic insights of Vibrio harveyi RT-6 strain, from infected “Whiteleg shrimp” (Litopenaeus vannamei) using Illumina platform. Mol Phylogenet Evol 2019; 130:35-44. [DOI: 10.1016/j.ympev.2018.09.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/19/2018] [Revised: 09/20/2018] [Accepted: 09/24/2018] [Indexed: 10/28/2022]
12
Pavlopoulos GA, Kontou PI, Pavlopoulou A, Bouyioukos C, Markou E, Bagos PG. Bipartite graphs in systems biology and medicine: a survey of methods and applications. Gigascience 2018; 7:1-31. [PMID: 29648623 PMCID: PMC6333914 DOI: 10.1093/gigascience/giy014] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 07/09/2017] [Revised: 01/15/2018] [Accepted: 02/13/2018] [Indexed: 11/14/2022]
Abstract
The latest advances in high-throughput techniques during the past decade allowed the systems biology field to expand significantly. Today, the focus of biologists has shifted from the study of individual biological components to the study of complex biological systems and their dynamics at a larger scale. Through the discovery of novel bioentity relationships, researchers reveal new information about biological functions and processes. Graphs are widely used to represent bioentities such as proteins, genes, small molecules, ligands, and others as nodes and their connections as edges within a network. In this review, special focus is given to the usability of bipartite graphs and their impact on the field of network biology and medicine. Furthermore, their topological properties and how these can be applied to certain biological case studies are discussed. Finally, available methodologies and software are presented, and useful insights on how bipartite graphs can shape the path toward the solution of challenging biological problems are provided.
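To make the bipartite-graph idea in this abstract concrete, the Python sketch below stores a small bipartite network as (left node, right node) edges and derives its one-mode projection, a standard operation in which two left-side nodes are linked when they share a right-side neighbour. The gene and disease names, and the function name, are hypothetical illustrations and are not taken from the review.

```python
from collections import defaultdict
from itertools import combinations


def project(edges):
    """One-mode projection of a bipartite graph onto its left node set.

    `edges` is an iterable of (left, right) pairs. Two left nodes are
    connected in the projection when they share a right neighbour; the
    weight counts how many right neighbours they share.
    """
    members = defaultdict(set)
    for left, right in edges:
        members[right].add(left)
    weights = defaultdict(int)
    for lefts in members.values():
        for a, b in combinations(sorted(lefts), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical gene-disease associations (no edges within either set).
edges = [
    ("geneA", "disease1"),
    ("geneB", "disease1"),
    ("geneA", "disease2"),
    ("geneB", "disease2"),
    ("geneC", "disease2"),
]
print(project(edges))
```

Here geneA and geneB share two diseases, so their projected edge has weight 2, while each pair involving geneC has weight 1.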
Affiliation(s)
- Georgios A Pavlopoulos
  - Lawrence Berkeley Labs, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Panagiota I Kontou
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
- Athanasia Pavlopoulou
  - Izmir International Biomedicine and Genome Institute (iBG-Izmir), Dokuz Eylül University, 35340, Turkey
- Costas Bouyioukos
  - Université Paris Diderot, Sorbonne Paris Cité, Epigenetics and Cell Fate, UMR7216, CNRS, France
- Evripides Markou
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
- Pantelis G Bagos
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
13
Wadapurkar RM, Vyas R. Computational analysis of next generation sequencing data and its applications in clinical oncology. Informatics in Medicine Unlocked 2018. [DOI: 10.1016/j.imu.2018.05.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
14
Khomtchouk BB, Hennessy JR, Wahlestedt C. shinyheatmap: Ultra fast low memory heatmap web interface for big data genomics. PLoS One 2017; 12:e0176334. [PMID: 28493881 PMCID: PMC5426587 DOI: 10.1371/journal.pone.0176334] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 12/17/2016] [Accepted: 04/10/2017] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Transcriptomics, metabolomics, metagenomics, and other various next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources as well as programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers-to-entry for creating highly customizable heatmaps.
RESULTS: We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low memory footprint program, making it particularly well-suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5-10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed.
CONCLUSIONS: shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap. Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on Github: https://github.com/Bohdan-Khomtchouk/fastheatmap.
Affiliation(s)
- Bohdan B. Khomtchouk
  - Center for Therapeutic Innovation, University of Miami Miller School of Medicine, 1501 NW 10th Ave., Miami, FL, 33136, United States of America
  - Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, 33136, United States of America
- James R. Hennessy
  - Department of Mathematics, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33146, United States of America
- Claes Wahlestedt
  - Center for Therapeutic Innovation, University of Miami Miller School of Medicine, 1501 NW 10th Ave., Miami, FL, 33136, United States of America
  - Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, 33136, United States of America
|
15
|
Comparative genomics of Tunisian Leishmania major isolates causing human cutaneous leishmaniasis with contrasting clinical severity. INFECTION GENETICS AND EVOLUTION 2016; 50:110-120. [PMID: 27818279 DOI: 10.1016/j.meegid.2016.10.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/01/2016] [Revised: 09/27/2016] [Accepted: 10/29/2016] [Indexed: 12/23/2022]
Abstract
Zoonotic cutaneous leishmaniasis caused by Leishmania (L.) major parasites affects urban and suburban areas in the center and south of Tunisia where the disease is endemo-epidemic. Several cases were reported in human patients for which infection due to L. major induced lesions with a broad range of severity. However, very little is known about the mechanisms underlying this diversity. Our hypothesis is that parasite genomic variability could, in addition to the host immunological background, contribute to the intra-species clinical variability observed in patients and explain the lesion size differences observed in the experimental model. Based on several epidemiological, in vivo and in vitro experiments, we focused on two clinical isolates showing contrasting severity in patients and in the BALB/c experimental mouse model. We used DNA-seq as a high-throughput technology to facilitate the identification of genetic variants with discriminating potential between both isolates. Our results demonstrate that various levels of heterogeneity could be found between both L. major isolates in terms of chromosome or gene copy number variation (CNV), and that the intra-species divergence could surprisingly be related to single nucleotide polymorphisms (SNPs) and Insertion/Deletion (InDels) events. Interestingly, we particularly focused here on genes affected by both types of variants and correlated them with the observed gene CNV. Whether these differences are sufficient to explain the severity in patients is obviously still open to debate, but we do believe that additional layers of -omic information are needed to complement the genomic screen in order to draw a more complete map of severity determinants.
|
16
|
A Comprehensive In Silico Analysis on the Structural and Functional Impact of SNPs in the Congenital Heart Defects Associated with NKX2-5 Gene: A Molecular Dynamic Simulation Approach. PLoS One 2016; 11:e0153999. [PMID: 27152669 PMCID: PMC4859487 DOI: 10.1371/journal.pone.0153999] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 01/14/2016] [Accepted: 03/21/2016] [Indexed: 11/23/2022]
Abstract
Congenital heart defects (CHD), which present as structural defects in the heart and blood vessels at birth, constitute an important cause of childhood morbidity and mortality worldwide. Many single nucleotide polymorphisms (SNPs) in different genes have been associated with various types of congenital heart defects. The NKX2-5 gene is one among them; it encodes a homeobox-containing transcription factor that plays a crucial role during the initial phases of heart formation and development. Mutations in this gene could cause different types of congenital heart defects, including atrial septal defect (ASD), atrial ventricular block (AVB), tetralogy of Fallot and ventricular septal defect. This highlights the importance of studying the impact of different SNPs found within this gene that might cause structural and functional modification of its encoded protein. In this study, we retrieved SNPs from the database (dbSNP), followed by identification of potentially deleterious non-synonymous single nucleotide polymorphisms (nsSNPs) and prediction of their effect on proteins by computational screening using SIFT and Polyphen. Furthermore, we have carried out molecular dynamic simulation (MDS) in order to uncover the SNPs that would cause the most structural damage to the protein, altering its biological function. The most important SNP that was found using our approach was rs137852685 R161P, which was predicted to cause the most damage to the structural features of the protein. Mapping nsSNPs in genes such as NKX2-5 would provide valuable information about individuals carrying these polymorphisms, where such variations could be used as diagnostic markers.
|
17
|
Guan P, Sung WK. Structural variation detection using next-generation sequencing data: A comparative technical review. Methods 2016; 102:36-49. [PMID: 26845461 DOI: 10.1016/j.ymeth.2016.01.020] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Received: 10/18/2015] [Revised: 01/09/2016] [Accepted: 01/31/2016] [Indexed: 12/11/2022]
Abstract
Structural variations (SVs) are mutations in the genome of size at least fifty nucleotides. They contribute to the phenotypic differences among healthy individuals, cause severe diseases and even cancers by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed due to the significant cost reduction of NGS experiments and their ability to unbiasedly detect SVs to the base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. Besides, noise in the data (both platform-specific sequencing error and artificial chimeric reads) impedes the specificity of SV detection. Poorly characterized regions in the human genome (e.g., repeat regions) greatly impact read mapping and in turn affect the SV calling accuracy. Calling of complex SVs requires specialized SV callers. Apart from accuracy, the processing speed of an SV caller is another factor deciding its usability. Knowing the pros and cons of different SV calling techniques and the objectives of the biological study is essential for biologists and bioinformaticians to make informed decisions. This paper describes different components in the SV calling pipeline and reviews the techniques used by existing SV callers. Through a simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies.
Affiliation(s)
- Peiyong Guan
  - School of Computing, National University of Singapore, 117543, Singapore
- Wing-Kin Sung
  - School of Computing, National University of Singapore, 117543, Singapore
  - Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore
|
18
|
Impact of germline and somatic missense variations on drug binding sites. THE PHARMACOGENOMICS JOURNAL 2016; 17:128-136. [PMID: 26810135 PMCID: PMC5380835 DOI: 10.1038/tpj.2015.97] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/17/2015] [Revised: 11/02/2015] [Accepted: 11/13/2015] [Indexed: 11/10/2022]
Abstract
Advancements in next-generation sequencing (NGS) technologies are generating a vast amount of data. This exacerbates the current challenge of translating NGS data into actionable clinical interpretations. We have comprehensively combined germline and somatic nonsynonymous single-nucleotide variations (nsSNVs) that affect drug binding sites in order to investigate their prevalence. The integrated data thus generated in conjunction with exome or whole-genome sequencing can be used to identify patients who may not respond to a specific drug because of alterations in drug binding efficacy due to nsSNVs in the target protein's gene. To identify the nsSNVs that may affect drug binding, protein–drug complex structures were retrieved from Protein Data Bank (PDB) followed by identification of amino acids in the protein–drug binding sites using an occluded surface method. Then, the germline and somatic mutations were mapped to these amino acids to identify which of these alter protein–drug binding sites. Using this method we identified 12 993 amino acid–drug binding sites across 253 unique proteins bound to 235 unique drugs. The integration of amino acid–drug binding sites data with both germline and somatic nsSNVs data sets revealed 3133 nsSNVs affecting amino acid–drug binding sites. In addition, a comprehensive drug target discovery was conducted based on protein structure similarity and conservation of amino acid–drug binding sites. Using this method, 81 paralogs were identified that could serve as alternative drug targets. In addition, non-human mammalian proteins bound to drugs were used to identify 142 homologs in humans that can potentially bind to drugs. In the current protein–drug pairs that contain somatic mutations within their binding site, we identified 85 proteins with significant differential gene expression changes associated with specific cancer types. 
Information on protein–drug binding, predicted drug target proteins, and the prevalence of both somatic and germline nsSNVs that disrupt these binding sites can provide valuable knowledge for personalized medicine treatment. A web portal is available where nsSNVs from an individual patient can be checked by scanning against DrugVar to determine whether any of the SNVs affect the binding of any drug in the database.
|
19
|
Abstract
A next-generation sequencing experiment can generate billions of short reads for each sample, and processing of the raw reads adds further information. Various file formats have been introduced or developed to store and manipulate this information. This chapter presents an overview of the file formats, including FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF, that are commonly used in the analysis of next-generation sequencing data.
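As an illustration of one of the formats named above, the following is a minimal sketch of reading FASTQ records in Python. It is not taken from the chapter. It assumes the common four-line-per-record layout (header starting with `@`, sequence, separator starting with `+`, quality string) and Phred+33 quality encoding; the names `FastqRecord`, `parse_fastq`, and `phred33_scores` are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # header text without the leading "@"
    sequence: str   # base calls
    quality: str    # per-base quality characters, same length as sequence


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:          # empty string means end of file
            return
        header = header.rstrip("\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred33_scores(quality: str) -> List[int]:
    """Convert a quality string to integer scores, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nII!5\n")
    for record in parse_fastq(example):
        print(record.name, record.sequence, phred33_scores(record.quality))
```

Real FASTQ files may wrap sequences over several lines or use other quality offsets, so an established parser is usually the safer choice for production work.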
Affiliation(s)
- Hongen Zhang
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Drive, Room 6138, Bethesda, MD, 20892, USA
|
20
|
A review of genome-wide association studies for multiple sclerosis: classical and hypothesis-driven approaches. Hum Genet 2015; 134:1143-62. [DOI: 10.1007/s00439-015-1601-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 07/13/2015] [Accepted: 09/10/2015] [Indexed: 12/17/2022]
|
21
|
Pavlopoulos GA, Malliarakis D, Papanikolaou N, Theodosiou T, Enright AJ, Iliopoulos I. Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future. Gigascience 2015; 4:38. [PMID: 26309733 PMCID: PMC4548842 DOI: 10.1186/s13742-015-0077-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 05/26/2015] [Accepted: 08/03/2015] [Indexed: 01/31/2023]
Abstract
"A picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? The life sciences are among the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on the future human-computer interaction trends that would enable further enhancement of visualization.
Affiliation(s)
- Georgios A Pavlopoulos
  - Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete, Greece
- Nikolas Papanikolaou
  - Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete, Greece
- Theodosis Theodosiou
  - Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete, Greece
- Anton J Enright
  - EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Ioannis Iliopoulos
  - Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete, Greece
|
22
|
Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet 2015. [PMID: 26217378 PMCID: PMC4493402 DOI: 10.3389/fgene.2015.00235] [Citation(s) in RCA: 109] [Impact Index Per Article: 12.1] [Indexed: 12/30/2022]
Abstract
Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.
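The abstract does not say which statistics the review recommends. As a hedged illustration only, one common way to score a variant caller is to compare its calls with a trusted truth set and report precision and recall. The sketch below assumes each variant is a `(contig, position, ref, alt)` tuple and that exact matching is adequate; the names `Variant`, `CallMetrics`, and `evaluate_calls` are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

# Assumed representation: (contig, 1-based position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


class CallMetrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def evaluate_calls(called: Set[Variant], truth: Set[Variant]) -> CallMetrics:
    """Compare called variants with a truth set using exact tuple matching."""
    true_positives = len(called & truth)    # called and present in truth
    false_positives = len(called - truth)   # called but absent from truth
    false_negatives = len(truth - called)   # in truth but missed by the caller
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return CallMetrics(true_positives, false_positives, false_negatives,
                       precision, recall)


if __name__ == "__main__":
    truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 300, "G", "A")}
    called = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 400, "T", "C")}
    print(evaluate_calls(called, truth))
```

Exact matching ignores differences in how equivalent variants can be represented, so real evaluations typically normalize variants before comparison.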
Affiliation(s)
- Nathan D Olson
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Steven P Lund
  - Statistical Engineering Division, Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Rebecca E Colman
  - Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Jeffrey T Foster
  - Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jason W Sahl
  - Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
  - Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- James M Schupp
  - Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Paul Keim
  - Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
  - Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jayne B Morrow
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Marc L Salit
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Justin M Zook
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
|
23
|
Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, Arvanitidis C, Iliopoulos I. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights 2015; 9:75-88. [PMID: 25983555 PMCID: PMC4426941 DOI: 10.4137/bbi.s12462] [Citation(s) in RCA: 177] [Impact Index Per Article: 19.7] [Received: 12/05/2014] [Revised: 03/09/2015] [Accepted: 03/13/2015] [Indexed: 12/14/2022]
Abstract
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of "metagenomics", often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.
Affiliation(s)
- Anastasis Oulas
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Christina Pavloudi
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
  - Department of Biology, University of Ghent, Ghent, Belgium
  - Department of Microbial Ecophysiology, University of Bremen, Bremen, Germany
- Paraskevi Polymenakou
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Georgios A Pavlopoulos
  - Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
- Nikolas Papanikolaou
  - Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
- Georgios Kotoulas
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Christos Arvanitidis
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Ioannis Iliopoulos
  - Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
|
24
|
Katsila T, Patrinos GP. Whole genome sequencing in pharmacogenomics. Front Pharmacol 2015; 6:61. [PMID: 25859217 PMCID: PMC4374451 DOI: 10.3389/fphar.2015.00061] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 01/12/2015] [Accepted: 03/09/2015] [Indexed: 11/13/2022]
Abstract
Pharmacogenomics aims to shed light on the role of genes and genomic variants in clinical treatment response. Although several drug-gene relationships have been characterized to date, many challenges remain in applying pharmacogenomics in the clinic; clinical guidelines for pharmacogenomic testing are still in their infancy, whereas the emerging high throughput genotyping technologies produce a tsunami of new findings. Herein, the potential of whole genome sequencing for pharmacogenomics research and clinical application is highlighted.
Affiliation(s)
- Theodora Katsila
  - Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
- George P Patrinos
  - Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
|
25
|
Benjak A, Sala C, Hartkoorn RC. Whole-genome sequencing for comparative genomics and de novo genome assembly. Methods Mol Biol 2015; 1285:1-16. [PMID: 25779307 DOI: 10.1007/978-1-4939-2450-9_1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 06/04/2023]
Abstract
Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).
Affiliation(s)
- Andrej Benjak
  - École polytechnique fédérale de Lausanne (EPFL), Global Health Institute, Lausanne, CH-1015, Switzerland
|
26
|
Favorova OO, Bashinskaya VV, Kulakova OG, Favorov AV, Boyko AN. Genome-wide association study as a method to analyze the genome architecture in polygenic diseases, with the example of multiple sclerosis. Mol Biol 2014. [DOI: 10.1134/s0026893314040037] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
|
27
|
Li MJ, Yan B, Sham PC, Wang J. Exploring the function of genetic variants in the non-coding genomic regions: approaches for identifying human regulatory variants affecting gene expression. Brief Bioinform 2014; 16:393-412. [PMID: 24916300 DOI: 10.1093/bib/bbu018] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 01/22/2014] [Accepted: 04/23/2014] [Indexed: 12/13/2022]
Abstract
Understanding the genetic basis of human traits/diseases and the underlying mechanisms of how these traits/diseases are affected by genetic variations is critical for public health. Current genome-wide functional genomics data have uncovered a large number of functional elements in the noncoding regions of the human genome, providing new opportunities to study regulatory variants (RVs). RVs play important roles in transcription factor binding, chromatin states and epigenetic modifications. Here, we systematically review an array of methods currently used to map RVs as well as the computational approaches in annotating and interpreting their regulatory effects, with emphasis on regulatory single-nucleotide polymorphisms. We also briefly introduce experimental methods to validate these functional RVs.
|
28
|
Abstract
The advent of the polymerase chain reaction and the availability of data from various global human genome projects should make it possible, using a DNA sample isolated from white blood cells, to diagnose rapidly and accurately almost any monogenic condition resulting from single nucleotide changes. DNA-based diagnosis for malignant hyperthermia (MH) is an attractive proposition, because it could replace the invasive and morbid caffeine-halothane/in vitro contracture tests of skeletal muscle biopsy tissue. Moreover, MH is preventable if an accurate diagnosis of susceptibility can be made before general anesthesia, the most common trigger of an MH episode. Diagnosis of MH using DNA was suggested as early as 1990 when the skeletal muscle ryanodine receptor gene (RYR1), and a single point mutation therein, was linked to MH susceptibility. In 1994, a single point mutation in the α 1 subunit of the dihydropyridine receptor gene (CACNA1S) was identified and also subsequently shown to be causative of MH. In the succeeding years, the number of identified mutations in RYR1 has grown, as has the number of potential susceptibility loci, although no other gene has yet been definitively associated with MH. In addition, it has become clear that MH is associated with either of these 2 genes (RYR1 and CACNA1S) in only 50% to 70% of affected families. While DNA testing for MH susceptibility has now become widespread, it still does not replace the in vitro contracture tests. Whole exome sequence analysis makes it potentially possible to identify all variants within human coding regions, but the complexity of the genome, the heterogeneity of MH, the limitations of bioinformatic tools, and the lack of precise genotype/phenotype correlations are all confounding factors. In addition, the requirement for demonstration of causality, by in vitro functional analysis, of any familial mutation currently precludes DNA-based diagnosis as the sole test for MH susceptibility. 
Nevertheless, familial DNA testing for MH susceptibility is now widespread although limited to a positive diagnosis and to those few mutations that have been functionally characterized. Identification of new susceptibility genes remains elusive. When new genes are identified, it will be the role of the biochemists, physiologists, and biophysicists to devise functional assays in appropriate systems. This will remain the bottleneck unless high throughput platforms can be designed for functional work. Analysis of entire genomes from several individuals simultaneously is a reality. DNA testing for MH, based on current criteria, remains the dream.
Affiliation(s)
- Kathryn M Stowell
  - Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
|
29
|
Yoshikawa T, Kanazawa H, Fujimoto S, Hirata K. Epistatic effects of multiple receptor genes on pathophysiology of asthma - its limits and potential for clinical application. Med Sci Monit 2014; 20:64-71. [PMID: 24435185 PMCID: PMC3907491 DOI: 10.12659/msm.889754] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/05/2013] [Accepted: 11/09/2013] [Indexed: 01/31/2023]
Abstract
To date, genome-wide association studies (GWAS) permit a comprehensive scan of the genome in an unbiased manner, with high sensitivity, and thereby have the potential to identify candidate genes for the prevalence or development of multifactorial diseases such as bronchial asthma. However, most studies have only managed to explain a small additional percentage of heritability estimates, and often fail to show consistent results among studies despite large sample sizes. Epistasis is defined as the interaction between multiple different genes affecting phenotypes. By applying epistatic analysis to clinical genetic research, we can analyze interactions among more than 2 molecules (genes) considering the whole system of the human body, illuminating dynamic molecular mechanisms. An increasing number of genetic studies have investigated epistatic effects on the risk for development of asthma. The present review highlights a concept of epistasis to overcome traditional genetic studies in humans and provides an update of evidence on epistatic effects on asthma. Furthermore, we review concerns regarding recent trends in epistatic analyses from the perspective of clinical physicians. These concerns include biological plausibility of genes identified by computational statistics, and definition of the diagnostic label of 'physician-diagnosed asthma'. In terms of these issues, further application of epistatic analysis will prompt identification of susceptibility of diseases and lead to the development of a new generation of pharmacological strategies to treat asthma.
Affiliation(s)
- Takahiro Yoshikawa
  - Department of Sports Medicine, Osaka City University Graduate School of Medicine, Osaka, Japan
- Hiroshi Kanazawa
  - Department of Respiratory Medicine, Osaka City University Graduate School of Medicine, Osaka, Japan
- Shigeo Fujimoto
  - Department of Sports Medicine, Osaka City University Graduate School of Medicine, Osaka, Japan
- Kazuto Hirata
  - Department of Respiratory Medicine, Osaka City University Graduate School of Medicine, Osaka, Japan
|
30
|
Abstract
The bioinformatics requirements within the clinical environment are very specific, and analytic techniques need to be fit for purpose, robust, and predictable. At the same time, the bewildering amount of information produced during these analyses needs to be carefully managed, used and interpreted correctly. The challenge for clinical laboratories now is to implement production analytical processes that are capable of handling different experimental approaches on current equipment, as well as to incorporate ways for these systems to evolve to take account of developments likely to make impacts in the near future. This is complicated by the many options available at each of the critical processing steps and a clear method needs to be developed to assemble appropriate pipelines. Here, I discuss the issues relevant to the development of an informatics pipeline that meets these criteria that should allow individual laboratories to assess their proposed strategies.
Affiliation(s)
- Richard James Nigel Allcock
- School of Pathology and Laboratory Medicine, University of Western Australia, M574 Stirling Highway, Nedlands, WA, 6009, Australia
|
31
|
Jakupciak JP, Wells JM, Karalus RJ, Pawlowski DR, Lin JS, Feldman AB. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis. J Nucleic Acids 2013; 2013:801505. [PMID: 24455204 PMCID: PMC3877622 DOI: 10.1155/2013/801505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2013] [Revised: 10/01/2013] [Accepted: 10/02/2013] [Indexed: 11/18/2022]
Abstract
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples depends on accurate measurements of genetic variation between isolates, with resolution down to the strain level. Often, single-biomarker sensitivity is augmented by the use of multiple biomarkers or biomarker panels. In parallel with single-biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate the use of direct sequencing, as do differences caused by sample preparation protocols, including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness, and sought to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate the use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Affiliation(s)
- Jeffrey S. Lin
- The Johns Hopkins University, Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723, USA
- Andrew B. Feldman
- The Johns Hopkins University, Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723, USA
|