1
Sinha R, Pal RK, De RK. ENLIGHTENMENT: A Scalable Annotated Database of Genomics and NGS-Based Nucleotide Level Profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:155-168. [PMID: 38055361 DOI: 10.1109/tcbb.2023.3340067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
The revolution in sequencing technologies has enabled human genomes to be sequenced quickly and at very low cost, leading to exponential growth in the availability of whole-genome sequences. However, a complete understanding of our genome and its association with cancer remains a long way off. Researchers are striving to detect new variants and find their association with diseases, which creates a need to aggregate this Big Data into a common, standard, scalable platform. In this work, a database named Enlightenment has been implemented, which makes genomic data integrated from eight public databases, together with DNA sequencing profiles of H. sapiens, available in a single platform. Annotated results with respect to cancer-specific biomarkers, pharmacogenetic biomarkers and their association with variability in drug response, and DNA profiles along with novel copy number variants are computed and stored, and are accessible through a web interface. To overcome the challenge of storing and processing NGS-based whole-genome DNA sequences, Enlightenment has been extended and deployed to HBase, a flexible and horizontally scalable database distributed over a Hadoop cluster, which would enable the integration of other omics data into the database, lighting the path towards the eradication of cancer.
2
Aradhya S, Facio FM, Metz H, Manders T, Colavin A, Kobayashi Y, Nykamp K, Johnson B, Nussbaum RL. Applications of artificial intelligence in clinical laboratory genomics. American Journal of Medical Genetics. Part C, Seminars in Medical Genetics 2023; 193:e32057. [PMID: 37507620 DOI: 10.1002/ajmg.c.32057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Affiliation(s)
- Swaroop Aradhya
- Invitae Corporation, San Francisco, California, USA
- Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Hillery Metz
- Invitae Corporation, San Francisco, California, USA
- Toby Manders
- Invitae Corporation, San Francisco, California, USA
- Keith Nykamp
- Invitae Corporation, San Francisco, California, USA
- Robert L Nussbaum
- Invitae Corporation, San Francisco, California, USA
- Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA
3
Borowiec ML, Dikow RB, Frandsen PB, McKeeken A, Valentini G, White AE. Deep learning as a tool for ecology and evolution. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13901] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Affiliation(s)
- Marek L. Borowiec
- Entomology, Plant Pathology and Nematology University of Idaho Moscow ID USA
- Institute for Bioinformatics and Evolutionary Studies (IBEST) University of Idaho Moscow ID USA
- Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer Smithsonian Institution Washington DC USA
- Paul B. Frandsen
- Data Science Lab, Office of the Chief Information Officer Smithsonian Institution Washington DC USA
- Department of Plant and Wildlife Sciences Brigham Young University Provo UT USA
- Alexander McKeeken
- Entomology, Plant Pathology and Nematology University of Idaho Moscow ID USA
- Alexander E. White
- Data Science Lab, Office of the Chief Information Officer Smithsonian Institution Washington DC USA
- Department of Botany, National Museum of Natural History Smithsonian Institution Washington DC USA
4
Liu Z, Roberts R, Mercer TR, Xu J, Sedlazeck FJ, Tong W. Towards accurate and reliable resolution of structural variants for clinical diagnosis. Genome Biol 2022; 23:68. [PMID: 35241127 PMCID: PMC8892125 DOI: 10.1186/s13059-022-02636-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/27/2021] [Accepted: 02/15/2022] [Indexed: 12/17/2022] Open
Abstract
Structural variants (SVs) are a major source of human genetic diversity and have been associated with different diseases and phenotypes. The detection of SVs is difficult, and a diverse range of detection methods and data analysis protocols has been developed. This difficulty and diversity make the detection of SVs for clinical applications challenging and call for a framework to ensure accuracy and reproducibility. Here, we discuss current developments in the diagnosis of SVs and propose a roadmap for the accurate and reproducible detection of SVs that includes case studies provided from the FDA-led SEquencing Quality Control Phase II (SEQC-II) and other consortium efforts.
Affiliation(s)
- Zhichao Liu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Ruth Roberts
- ApconiX, BioHub at Alderley Park, Alderley Edge, SK10 4TG, UK; University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Timothy R Mercer
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia; Garvan Institute of Medical Research, Sydney, NSW, Australia; St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
- Joshua Xu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.
5
Pillay NS, Ross OA, Christoffels A, Bardien S. Current Status of Next-Generation Sequencing Approaches for Candidate Gene Discovery in Familial Parkinson's Disease. Front Genet 2022; 13:781816. [PMID: 35299952 PMCID: PMC8921601 DOI: 10.3389/fgene.2022.781816] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/23/2021] [Accepted: 01/12/2022] [Indexed: 11/13/2022] Open
Abstract
Parkinson’s disease (PD) is a neurodegenerative disorder with a heterogeneous genetic etiology. The advent of next-generation sequencing (NGS) technologies has aided novel gene discovery in several complex diseases, including PD. This Perspective article aimed to explore the use of NGS approaches to identify novel loci in familial PD, and to consider their current relevance. A total of 17 studies, spanning various populations (including Asian, Middle Eastern and European ancestry), were identified. All the studies used whole-exome sequencing (WES), with only one study incorporating both WES and whole-genome sequencing. It is worth noting how additional genetic analyses (including linkage analysis, haplotyping and homozygosity mapping) were incorporated to enhance the efficacy of some studies. Also, the use of consanguineous families and the specific search for de novo mutations appeared to facilitate the finding of causal mutations. Across the studies, similarities and differences were observed in downstream analysis methods and in the types of bioinformatic tools used. Although these studies serve as a practical guide for novel gene discovery in familial PD, these approaches have not significantly resolved the “missing heritability” of PD. We speculate that what is needed is the use of third-generation sequencing technologies to identify complex genomic rearrangements and new sequence variation missed with existing methods. Additionally, the study of ancestrally diverse populations (in particular those of Black African ancestry), with the concomitant optimization and tailoring of sequencing and analytic workflows to these populations, is critical. Only then will the way be paved for exciting new discoveries in the field.
Affiliation(s)
- Nikita Simone Pillay
- South African National Bioinformatics Institute (SANBI), South African Medical Research Council Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Owen A. Ross
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, United States
- Department of Clinical Genomics, Mayo Clinic, Jacksonville, FL, United States
- Alan Christoffels
- South African National Bioinformatics Institute (SANBI), South African Medical Research Council Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Africa Centres for Disease Control and Prevention, African Union Headquarters, Addis Ababa, Ethiopia
- Soraya Bardien
- Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Research Unit, Cape Town, South Africa
- *Correspondence: Soraya Bardien,
6
Sinha R, Pal RK, De RK. GenSeg and MR-GenSeg: A Novel Segmentation Algorithm and its Parallel MapReduce Based Approach for Identifying Genomic Regions With Copy Number Variations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:443-454. [PMID: 32750860 DOI: 10.1109/tcbb.2020.3000661] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Identifying intragenic as well as intergenic DNA sequences that have structural alterations is an important research area, since such alterations may be the root cause of many neurological and autoimmune diseases, including cancer. Working with whole-genome NGS data has provided new insight in this regard, but has led to an explosion of data that is growing exponentially. Hence, the challenge lies in storing and processing this big data efficiently. In this study, we have developed a novel segmentation algorithm, called GenSeg, and its parallel MapReduce-based algorithm, called MR-GenSeg, for detecting copy number variations (CNVs). In order to annotate CNVs, segments formed by GenSeg/MR-GenSeg have been represented in a novel way using a binary tree, where each node is a CNV event. GenSeg considers the position-specific data of the whole-genome DNA sequence, so that precise identification of breakpoints is possible. GenSeg/MR-GenSeg has been compared with twelve popular CNV detection algorithms, where it has outperformed the others in terms of sensitivity, and has achieved a good F-score value. MR-GenSeg has excelled in terms of SpeedUp when compared with these algorithms. The effect of CNVs on immunoglobulin (IG) genes has also been analysed in this study. Availability: The source codes are available at https://github.com/rituparna-sinha/MapReduce-GENSEG.
7
Herai RH, Szeto RA, Trujillo CA, Muotri AR. Response to Comment on "Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment". Science 2021; 374:eabi9881. [PMID: 34648331 DOI: 10.1126/science.abi9881] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/02/2022]
Abstract
Maricic et al. performed an undisclosed in silico-only whole-exome sequencing analysis of our data and found genomic alterations previously undetected in some clones. Some of the predicted alterations, if true, could change the original genotype of the clones. We failed to experimentally validate all but one of these genomic alterations, which did not affect our previous results or data interpretation.
Affiliation(s)
- Roberto H Herai
- Experimental Multiuser Laboratory (LEM), Graduate Program in Health Sciences, School of Medicine, Pontifícia Universidade Católica do Paraná, Curitiba, PR 80215-901, Brazil
- Ryan A Szeto
- Department of Pediatrics and Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92037, USA
- Cleber A Trujillo
- Department of Pediatrics and Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92037, USA
- Alysson R Muotri
- Department of Pediatrics and Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92037, USA; Department of Pediatrics and Department of Cellular and Molecular Medicine, School of Medicine, Center for Academic Research and Training in Anthropogeny (CARTA), Kavli Institute for Brain and Mind, Archealization Center (ArchC), University of California, San Diego, La Jolla, CA 92037, USA
8
Hill T, Rosales-Stephens HL, Unckless RL. Rapid divergence of the male reproductive proteins in the Drosophila dunni group and implications for postmating incompatibilities between species. G3 (Bethesda, Md.) 2021; 11:jkab050. [PMID: 33599779 PMCID: PMC8759818 DOI: 10.1093/g3journal/jkab050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2020] [Accepted: 02/17/2021] [Indexed: 11/17/2022]
Abstract
Proteins involved in post-copulatory interactions between males and females are among the fastest evolving in many species, which is usually attributed to their involvement in reproductive conflict. As a result, these proteins are thought to often be involved in the formation of postmating-prezygotic incompatibilities between species. The Drosophila dunni subgroup consists of a dozen recently diverged species found across the Caribbean islands with varying levels of hybrid incompatibility. We performed experimental crosses between species in the dunni group and see some evidence of hybrid incompatibilities. We also find evidence of reduced survival following hybrid mating, likely due to postmating-prezygotic incompatibilities. We assessed rates of evolution between these species' genomes and find evidence of rapid evolution and divergence of some reproductive proteins, specifically the seminal fluid proteins. This work suggests the rapid evolution of seminal fluid proteins may be associated with postmating-prezygotic isolation, which acts as a barrier to gene flow between even the most closely related species.
Affiliation(s)
- Tom Hill
- The Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045, USA
- Robert L Unckless
- The Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045, USA
9
Searles Quick VB, Wang B, State MW. Leveraging large genomic datasets to illuminate the pathobiology of autism spectrum disorders. Neuropsychopharmacology 2021; 46:55-69. [PMID: 32668441 PMCID: PMC7688655 DOI: 10.1038/s41386-020-0768-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/09/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 12/15/2022]
Abstract
"Big data" approaches in the form of large-scale human genomic studies have led to striking advances in autism spectrum disorder (ASD) genetics. As with many other psychiatric syndromes, advances in genotyping technology, allowing for inexpensive genome-wide assays, have confirmed the contribution of polygenic inheritance involving common alleles of small effect, a handful of which have now been definitively identified. However, the past decade of gene discovery in ASD has been most notable for the application, in large family-based cohorts, of high-density microarray studies of submicroscopic chromosomal structure as well as high-throughput DNA sequencing, leading to the identification of an increasingly long list of risk regions and genes disrupted by rare, de novo germline mutations of large effect. This genomic architecture offers particular advantages for the illumination of biological mechanisms but also presents distinctive challenges. While the tremendous locus heterogeneity and functional pleiotropy associated with the more than 100 identified ASD-risk genes and regions is daunting, a growing armamentarium of comprehensive, large, foundational -omics databases, across species and capturing developmental trajectories, is increasingly contributing to a deeper understanding of ASD pathology.
Affiliation(s)
- Veronica B Searles Quick
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, 94143, USA
- Belinda Wang
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, 94143, USA
- Matthew W State
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, 94143, USA.
10
Tigano A. A population genomics approach to uncover the CNVs, and their evolutionary significance, hidden in reduced-representation sequencing data sets. Mol Ecol 2020; 29:4749-4753. [PMID: 32997366 DOI: 10.1111/mec.15665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2020] [Accepted: 09/11/2020] [Indexed: 12/01/2022]
Abstract
The importance of structural variation in adaptation and speciation is becoming increasingly evident in the literature. Among structural variants (SVs), copy number variants (CNVs) are known to affect phenotypes through changes in gene expression and can potentially reduce recombination between alleles with different copy numbers. However, little is known about their abundance, distribution and frequency in natural populations. In a "From the Cover" article in this issue of Molecular Ecology, Dorant et al. (2020) present a new cost-effective approach to genotype CNVs from large reduced-representation sequencing (RRS) data sets in nonmodel organisms, and thus to analyse sequence and structural variation jointly. They show that in American lobsters (Homarus americanus), CNVs exhibit strong population structure and several significant associations with annual variance in sea surface temperature, while SNPs fail to uncover any population structure or genotype-environment associations. Their results clearly illustrate that structural variants like CNVs can potentially store important information on differentiation and adaptive differences that cannot be retrieved from the analysis of sequence variation alone. To better understand the factors affecting the evolution of CNVs and their role in adaptation and speciation, we need to compare and synthesize data from a wide variety of species with different demographic histories and genome structure. The approach developed by Dorant et al. (2020) now makes it possible to gain crucial knowledge on CNVs in a cost-effective way, even in species with limited genomic resources.
Affiliation(s)
- Anna Tigano
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA; Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA
11
Meggendorfer M, Walter W, Haferlach T. WGS and WTS in leukaemia: A tool for diagnostics? Best Pract Res Clin Haematol 2020; 33:101190. [DOI: 10.1016/j.beha.2020.101190] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/01/2020] [Accepted: 05/27/2020] [Indexed: 12/20/2022]
12
Mérot C, Oomen RA, Tigano A, Wellenreuther M. A Roadmap for Understanding the Evolutionary Significance of Structural Genomic Variation. Trends Ecol Evol 2020; 35:561-572. [PMID: 32521241 DOI: 10.1016/j.tree.2020.03.002] [Citation(s) in RCA: 135] [Impact Index Per Article: 33.8] [Received: 12/10/2019] [Revised: 02/25/2020] [Accepted: 03/03/2020] [Indexed: 12/12/2022]
Abstract
Structural genomic variants (SVs) are ubiquitous and play a major role in adaptation and speciation. Yet, comparative and population genomics have focused predominantly on gene duplications and large-effect inversions. The lack of a common framework for studying all SVs is hampering progress towards a more systematic assessment of their evolutionary significance. Here we (i) review how different types of SVs affect ecological and evolutionary processes; (ii) suggest unifying definitions and recommendations for future studies; and (iii) provide a roadmap for the integration of SVs in ecoevolutionary studies. In doing so, we lay the foundation for population genomics, theoretical, and experimental approaches to understand how the full spectrum of SVs impacts ecological and evolutionary processes.
Affiliation(s)
- Claire Mérot
- Université Laval, Institut de Biologie Intégrative des Systèmes, 1030 Avenue de la Médecine, G1V 0A6, Québec, QC, Canada.
- Rebekah A Oomen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Blindernveien 31, 0371 Oslo, Norway; Centre for Coastal Research, University of Agder, Universitetsveien 25, 4630 Kristiansand, Norway.
- Anna Tigano
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA; Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA.
- Maren Wellenreuther
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand; The New Zealand Institute for Plant & Food Research Ltd, Nelson, New Zealand.