251
Comparative Phylogenomics, a Stepping Stone for Bird Biodiversity Studies. DIVERSITY-BASEL 2019. [DOI: 10.3390/d11070115] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 01/02/2023]
Abstract
Birds are a group with immense availability of genomic resources, and hundreds of forthcoming genomes at the doorstep. We review recent developments in whole genome sequencing, phylogenomics, and comparative genomics of birds. Short read based genome assemblies are common, largely due to efforts of the Bird 10K genome project (B10K). Chromosome-level assemblies are expected to increase due to improved long-read sequencing. The available genomic data has enabled the reconstruction of the bird tree of life with increasing confidence and resolution, but challenges remain in the early splits of Neoaves due to their explosive diversification after the Cretaceous-Paleogene (K-Pg) event. Continued genomic sampling of the bird tree of life will not just better reflect their evolutionary history but also shine new light onto the organization of phylogenetic signal and conflict across the genome. The comparatively simple architecture of avian genomes makes them a powerful system to study the molecular foundation of bird specific traits. Birds are on the verge of becoming an extremely resourceful system to study biodiversity from the nucleotide up.
252
253
Nasykhova YA, Barbitoff YA, Serebryakova EA, Katserov DS, Glotov AS. Recent advances and perspectives in next generation sequencing application to the genetic research of type 2 diabetes. World J Diabetes 2019; 10:376-395. [PMID: 31363385 PMCID: PMC6656706 DOI: 10.4239/wjd.v10.i7.376] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/21/2019] [Revised: 05/23/2019] [Accepted: 06/11/2019] [Indexed: 02/05/2023]
Abstract
Type 2 diabetes (T2D) mellitus is a common complex disease that currently affects more than 400 million people worldwide and has become a global health problem. High-throughput sequencing technologies such as whole-genome and whole-exome sequencing approaches have provided numerous new insights into the molecular bases of T2D. Recent advances in the application of sequencing technologies to T2D research include, but are not limited to: (1) Fine mapping of causal rare and common genetic variants; (2) Identification of confident gene-level associations; (3) Identification of novel candidate genes by specific scoring approaches; (4) Interrogation of disease-relevant genes and pathways by transcriptional profiling and epigenome mapping techniques; and (5) Investigation of microbial community alterations in patients with T2D. In this work we review these advances in application of next-generation sequencing methods for elucidation of T2D pathogenesis, as well as progress and challenges in implementation of this new knowledge about T2D genetics in diagnosis, prevention, and treatment of the disease.
Affiliation(s)
- Yulia A Nasykhova
  - Laboratory of Biobanking and Genomic Medicine of Institute of Translation Biomedicine, St. Petersburg State University, St. Petersburg 199034, Russia
  - Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, St. Petersburg 199034, Russia
- Yury A Barbitoff
  - Laboratory of Biobanking and Genomic Medicine of Institute of Translation Biomedicine, St. Petersburg State University, St. Petersburg 199034, Russia
  - Bioinformatics Institute, St. Petersburg 194021, Russia
  - Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg 199034, Russia
- Elena A Serebryakova
  - Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, St. Petersburg 199034, Russia
  - Department of Genetics, City Hospital No. 40, St. Petersburg 197706, Russia
- Dmitry S Katserov
  - Institute of Living Systems, Immanuel Kant Baltic Federal University, Kaliningrad 236016, Russia
- Andrey S Glotov
  - Laboratory of Biobanking and Genomic Medicine of Institute of Translation Biomedicine, St. Petersburg State University, St. Petersburg 199034, Russia
  - Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, St. Petersburg 199034, Russia
  - Department of Genetics, City Hospital No. 40, St. Petersburg 197706, Russia
  - Institute of Living Systems, Immanuel Kant Baltic Federal University, Kaliningrad 236016, Russia
254
Schmid M, Frei D, Patrignani A, Schlapbach R, Frey JE, Remus-Emsermann MNP, Ahrens CH. Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats. Nucleic Acids Res 2019; 46:8953-8965. [PMID: 30137508 PMCID: PMC6158609 DOI: 10.1093/nar/gky726] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 04/18/2018] [Accepted: 08/15/2018] [Indexed: 12/16/2022]
Abstract
Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length, which contained several genes that may confer fitness advantages to the strain. Its complex genome, which also included a variable shufflon region, could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from Oxford Nanopore Technologies. Importantly, a repeat analysis, whose results we release for over 9600 prokaryotes, indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial and a handful of 293 complete archaeal genomes represented this ‘dark matter’ for de novo genome assembly of prokaryotes. Several of these ‘dark matter’ genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed employing labor-intensive steps like cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assembly algorithms capable of resolving long, near identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.
Affiliation(s)
- Michael Schmid
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics, Wädenswil CH-8820, Switzerland
  - SIB Swiss Institute of Bioinformatics, Wädenswil CH-8820, Switzerland
- Daniel Frei
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics, Wädenswil CH-8820, Switzerland
- Andrea Patrignani
  - Functional Genomics Center Zurich, University of Zurich & ETH Zurich, Zurich CH-8057, Switzerland
- Ralph Schlapbach
  - Functional Genomics Center Zurich, University of Zurich & ETH Zurich, Zurich CH-8057, Switzerland
- Jürg E Frey
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics, Wädenswil CH-8820, Switzerland
- Mitja N P Remus-Emsermann
  - School of Biological Sciences, University of Canterbury, Christchurch 8140, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Christchurch 8140, New Zealand
- Christian H Ahrens
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics, Wädenswil CH-8820, Switzerland
  - SIB Swiss Institute of Bioinformatics, Wädenswil CH-8820, Switzerland
255
Ebler J, Haukness M, Pesout T, Marschall T, Paten B. Haplotype-aware diplotyping from noisy long reads. Genome Biol 2019; 20:116. [PMID: 31159868 PMCID: PMC6547545 DOI: 10.1186/s13059-019-1709-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/25/2018] [Accepted: 05/06/2019] [Indexed: 12/19/2022]
Abstract
Current genotyping approaches for single-nucleotide variations rely on short, accurate reads from second-generation sequencing devices. Presently, third-generation sequencing platforms are rapidly becoming more widespread, yet approaches for leveraging their long but error-prone reads for genotyping are lacking. Here, we introduce a novel statistical framework for the joint inference of haplotypes and genotypes from noisy long reads, which we term diplotyping. Our technique takes full advantage of linkage information provided by long reads. We validate hundreds of thousands of candidate variants that have not yet been included in the high-confidence reference set of the Genome-in-a-Bottle effort.
Affiliation(s)
- Jana Ebler
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany
  - Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
  - Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, Saarbrücken, Germany
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA
- Trevor Pesout
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA
- Tobias Marschall
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany
  - Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA
256
Abstract
The computational reconstruction of genome sequences from shotgun sequencing data has been greatly simplified by the advent of sequencing technologies that generate long reads. In the case of relatively small genomes (e.g., bacterial or viral), complete genome sequences can frequently be reconstructed computationally without the need for further experiments. However, large and complex genomes, such as those of most animals and plants, continue to pose significant challenges. In such genomes, assembly software produces incomplete and fragmented reconstructions that require additional experimentally derived information and manual intervention in order to reconstruct individual chromosome arms. Recent technologies originally designed to capture chromatin structure have been shown to effectively complement sequencing data, leading to much more contiguous reconstructions of genomes than previously possible. Here, we survey these technologies and the algorithms used to assemble and analyze large eukaryotic genomes, placed within the historical context of genome scaffolding technologies that have been in existence since the dawn of the genomic era.
Affiliation(s)
- Jay Ghurye
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Mihai Pop
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
257
Mantere T, Kersten S, Hoischen A. Long-Read Sequencing Emerging in Medical Genetics. Front Genet 2019; 10:426. [PMID: 31134132 PMCID: PMC6514244 DOI: 10.3389/fgene.2019.00426] [Citation(s) in RCA: 226] [Impact Index Per Article: 45.2] [Received: 10/25/2018] [Accepted: 04/18/2019] [Indexed: 12/12/2022]
Abstract
The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches pose a limitation for the identification of structural variants, sequencing repetitive regions, phasing of alleles and distinguishing highly homologous genomic regions. These limitations may significantly contribute to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, like whole exome or even genome sequencing. Now, the emerging long-read sequencing (LRS) technologies may offer improvements in the characterization of genetic variation and regions that are difficult to assess with the prevailing NGS approaches. LRS has so far mainly been used to investigate genetic disorders with previously known or strongly suspected disease loci. While these targeted approaches already show the potential of LRS, it remains to be seen whether LRS technologies can soon enable true whole genome sequencing routinely. Ultimately, this could allow the de novo assembly of individual whole genomes used as a generic test for genetic disorders. In this article, we summarize the current LRS-based research on human genetic disorders and discuss the potential of these technologies to facilitate the next major advancements in medical genetics.
Affiliation(s)
- Tuomo Mantere
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, Netherlands
  - Laboratory of Cancer Genetics and Tumor Biology, Cancer and Translational Medicine Research Unit and Biocenter Oulu, University of Oulu, Oulu, Finland
- Simone Kersten
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, Netherlands
  - Department of Internal Medicine, Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Alexander Hoischen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, Netherlands
  - Department of Internal Medicine, Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, Netherlands
  - Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
258
A highly flexible and repeatable genotyping method for aquaculture studies based on target amplicon sequencing using next-generation sequencing technology. Sci Rep 2019; 9:6904. [PMID: 31061473 PMCID: PMC6502806 DOI: 10.1038/s41598-019-43336-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/25/2018] [Accepted: 04/18/2019] [Indexed: 11/08/2022]
Abstract
Studies using genome-wide single nucleotide polymorphisms (SNPs) have become commonplace in genetics and genomics, due to advances in high-throughput sequencing technologies. Since the numbers of required SNPs and samples vary depending on each research goal, genotyping technologies with high flexibility in the number of SNPs/samples and high repeatability have been intensively investigated. For example, the ultrahigh-multiplexed amplicon sequencing method Ion AmpliSeq has been used as a high-throughput genotyping method, mainly for diagnostic purposes. Here, we designed a custom panel targeting 3,187 genome-wide SNPs of fugu, Takifugu rubripes, and applied it to genotyping farmed fugu to test its feasibility in aquaculture studies. We sequenced two libraries consisting of different pools of individuals (n = 326 each) on the Illumina MiSeq sequencer. Consequently, over 99% of the target regions (3,178 SNPs) were amplified and 2,655 SNPs were available after filtering steps. Strong correlation was observed in the mean depth of coverage of each SNP between duplicate runs (r = 0.993). Genetic analysis using these genotype data successfully detected the known population structure and the sex-determining locus of fugu. These results show the method is superior in repeatability and flexibility, and suits genetic studies including molecular breeding, such as marker-assisted and genomic selection.
259
Lakdawala SS, Lee N, Brooke CB. Teaching an Old Virus New Tricks: A Review on New Approaches to Study Age-Old Questions in Influenza Biology. J Mol Biol 2019; 431:4247-4258. [PMID: 31051174 DOI: 10.1016/j.jmb.2019.04.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/24/2019] [Revised: 04/12/2019] [Accepted: 04/23/2019] [Indexed: 01/31/2023]
Abstract
Influenza viruses have been studied for over 80 years, yet much about the basic viral lifecycle remains unknown. However, new imaging, biochemical, and sequencing techniques have revealed significant insight into many age-old questions of influenza virus biology. In this review, we will cover the role of imaging techniques to describe unique aspects of influenza virus assembly, biochemical techniques to study viral genomic organization, and next-generation sequencing to explore influenza genomic evolution. Our goal is to provide a brief overview of how emerging techniques are being used to answer basic questions about influenza viruses. This is not a comprehensive list of emerging techniques, but rather a selection of those that we feel will continue to make significant contributions to the field of influenza biology.
Affiliation(s)
- Seema S Lakdawala
  - Department of Microbiology and Molecular Genetics, University of Pittsburgh, School of Medicine, Pittsburgh, PA 15219, USA
- Nara Lee
  - Department of Microbiology and Molecular Genetics, University of Pittsburgh, School of Medicine, Pittsburgh, PA 15219, USA
- Christopher B Brooke
  - Department of Microbiology, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
260
Huang Y, Zheng S, Wang R, Tang C, Zhu J, Li J. CCL5 and related genes might be the potential diagnostic biomarkers for the therapeutic strategies of rheumatoid arthritis. Clin Rheumatol 2019; 38:2629-2635. [PMID: 31011897 DOI: 10.1007/s10067-019-04533-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/30/2018] [Revised: 03/07/2019] [Accepted: 03/25/2019] [Indexed: 12/11/2022]
Abstract
OBJECTIVE: Rheumatoid arthritis (RA) is a common rheumatic disease. The aim of this study was to identify gene signatures in RA and uncover their potential mechanisms.
METHOD: Gene expression profiles of GSE1919, GSE55235, GSE55457, and GSE77928 were downloaded from the GEO database. These four series contained 76 samples, including 44 RA patients and 32 normal controls. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed, and a protein-protein interaction (PPI) network of the differentially expressed genes (DEGs) was constructed with Cytoscape software.
RESULTS: Up-regulated DEGs were significantly enriched in biological processes including immune response, positive regulation of immune system process, and regulation of immune system process, while down-regulated DEGs were significantly enriched in biological processes including response to oxygen-containing compound, cellular lipid metabolic process, and lipid metabolic process. KEGG pathway analysis showed the up-regulated DEGs were enriched in cytokine-cytokine receptor interaction, chemokine signaling pathway, and primary immunodeficiency. The 104 hub genes, which were significantly differentially expressed between patients and normal controls in at least two datasets, were identified from the PPI network, and subnetworks revealed that these genes were involved in significant pathways, including cytokine-cytokine receptor interaction, chemokine signaling pathway, and primary immunodeficiency.
CONCLUSION: The present study indicated that the identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of RA; some of them, such as C-C motif chemokine 5 (CCL5), might have a negative impact on the development of RA. CCL5 and its related genes might be potential diagnostic biomarkers for the therapeutic strategies of RA.
Affiliation(s)
- Yinger Huang
  - The Department of Internal Medicine of Traditional Chinese Medicine, College of Traditional Chinese Medicine, Southern Medical University, Guangzhou, 510515, Guangdong, China
- Songyuan Zheng
  - The Department of Internal Medicine of Traditional Chinese Medicine, College of Traditional Chinese Medicine, Southern Medical University, Guangzhou, 510515, Guangdong, China
- Ran Wang
  - The Department of Internal Medicine of Traditional Chinese Medicine, College of Traditional Chinese Medicine, Southern Medical University, Guangzhou, 510515, Guangdong, China
- Cuiping Tang
  - The Department of Internal Medicine of Traditional Chinese Medicine, College of Traditional Chinese Medicine, Southern Medical University, Guangzhou, 510515, Guangdong, China
- Junqing Zhu
  - The Department of Internal Medicine of Traditional Chinese Medicine, College of Traditional Chinese Medicine, Southern Medical University, Guangzhou, 510515, Guangdong, China
  - The Department of Rheumatology, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, Guangdong, China
- Juan Li
  - The Department of Internal Medicine of Traditional Chinese Medicine, College of Traditional Chinese Medicine, Southern Medical University, Guangzhou, 510515, Guangdong, China
  - The Department of Rheumatology, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, Guangdong, China
261
Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions. Nat Commun 2019; 10:1702. [PMID: 30979905 PMCID: PMC6461651 DOI: 10.1038/s41467-019-09575-2] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 07/23/2018] [Accepted: 03/19/2019] [Indexed: 12/14/2022]
Abstract
The ultimate goal for diploid genome determination is to completely decode homologous chromosomes independently, and several programs that phase haplotypes from consensus sequences have been developed. These methods work well for genomes with low heterozygosity, but many species have high heterozygosity. Additionally, there are highly divergent regions (HDRs), where the haplotype sequences differ considerably. Because HDRs are likely to direct various interesting biological phenomena, many genomic analysis targets fall within these regions. However, they cannot be accessed by existing phasing methods, and costly traditional methods have to be adopted instead. Here, we develop a de novo haplotype assembler, Platanus-allee (http://platanus.bio.titech.ac.jp/platanus2), which initially constructs each haplotype sequence and then untangles the assembly graphs utilizing sequence links and synteny information. A comprehensive benchmark analysis reveals that Platanus-allee exhibits high recall and precision, particularly for HDRs. Using this approach, previously unknown HDRs are detected in the human genome, which may uncover novel aspects of genome variability.
262
Wallberg A, Bunikis I, Pettersson OV, Mosbech MB, Childers AK, Evans JD, Mikheyev AS, Robertson HM, Robinson GE, Webster MT. A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds. BMC Genomics 2019; 20:275. [PMID: 30961563 PMCID: PMC6454739 DOI: 10.1186/s12864-019-5642-0] [Citation(s) in RCA: 120] [Impact Index Per Article: 24.0] [Received: 07/27/2018] [Accepted: 03/24/2019] [Indexed: 01/27/2023]
Abstract
Background: The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map.
Results: Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor > 98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features.
Conclusions: The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.
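For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal sketch under the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length. The function name `n50` and the toy contig lengths are illustrative assumptions, not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (illustrative helper)."""
    # Walk the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy example (made-up lengths, arbitrary units): the total is 32,
# and the two longest contigs (8 + 8) already cover half of it.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

A higher N50 at a similar total length indicates that the sequence sits in fewer, longer contigs, which is the sense in which the abstract compares the two assemblies.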
Affiliation(s)
- Andreas Wallberg
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ignas Bunikis
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Olga Vinnere Pettersson
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Mai-Britt Mosbech
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Anna K Childers
  - USDA-ARS Insect Genetics and Biochemistry Research Unit, Fargo, ND, USA
  - USDA-ARS Bee Research Lab, Beltsville, MD, USA
- Jay D Evans
  - USDA-ARS Bee Research Lab, Beltsville, MD, USA
- Hugh M Robertson
  - Department of Entomology and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Gene E Robinson
  - Department of Entomology and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Matthew T Webster
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
263
Jiang T, Fu Y, Liu B, Wang Y. Long-Read Based Novel Sequence Insertion Detection With rCANID. IEEE Trans Nanobioscience 2019; 18:343-352. [PMID: 30946672 DOI: 10.1109/tnb.2019.2908438] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
Novel sequence insertion (NSI) is an essential category of genome structural variation (SV), representing DNA segments absent from the reference genome assembly. NSIs have important biological functions and strong correlations with phenotypes and diseases. The rapid development of long-read sequencing technologies provides opportunities to discover NSIs more sensitively, since the much longer reads are helpful for the assembly and location of the novel sequences. However, most state-of-the-art long-read-based SV detection approaches are generically designed to detect various kinds of SVs, and they are either not suited to detecting NSIs or computationally expensive. Herein, we propose a read clustering and assembly-based novel insertion detection tool (rCANID). It applies tailored clustering of chimerically aligned and unaligned reads and lightweight local assembly methods to reconstruct inserted sequences with low computational cost. Benchmarks on both simulated and real datasets demonstrate that rCANID can discover NSIs sensitively and efficiently, especially for NSI events with long inserted sequences, which remain a non-trivial task for state-of-the-art approaches. With its good NSI detection ability, rCANID is suited to be integrated into computational pipelines to play important roles in many cutting-edge genomics studies.
264
Zhao L, Zhang H, Kohnen MV, Prasad KVSK, Gu L, Reddy ASN. Analysis of Transcriptome and Epitranscriptome in Plants Using PacBio Iso-Seq and Nanopore-Based Direct RNA Sequencing. Front Genet 2019; 10:253. [PMID: 30949200 PMCID: PMC6438080 DOI: 10.3389/fgene.2019.00253] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 10/15/2018] [Accepted: 03/06/2019] [Indexed: 12/18/2022]
Abstract
Nanopore sequencing from Oxford Nanopore Technologies (ONT) and Pacific BioSciences (PacBio) single-molecule real-time (SMRT) long-read isoform sequencing (Iso-Seq) are revolutionizing the way transcriptomes are analyzed. These methods offer many advantages over the most widely used high-throughput short-read RNA sequencing (RNA-Seq) approaches and allow a comprehensive analysis of transcriptomes in identifying full-length splice isoforms and several other post-transcriptional events. In addition, direct RNA-Seq provides valuable information about RNA modifications, which are lost during the PCR amplification step in other methods. Here, we present a comprehensive summary of important applications of these technologies in plants, including identification of complex alternative splicing (AS), full-length splice variants, fusion transcripts, and alternative polyadenylation (APA) events. Furthermore, we discuss the impact of the newly developed nanopore direct RNA-Seq in advancing epitranscriptome research in plants. Additionally, we summarize computational tools for identifying and quantifying full-length isoforms and other co/post-transcriptional events and discuss some of the limitations of these methods. Sequencing of transcriptomes using these new single-molecule long-read methods will unravel many aspects of transcriptome complexity in unprecedented ways as compared to previous short-read sequencing approaches. Analysis of plant transcriptomes with these powerful new methods, which require minimal sample processing, is likely to become the norm and is expected to uncover novel co/post-transcriptional gene regulatory mechanisms that control biological outcomes during plant development and in response to various stresses.
Collapse
Affiliation(s)
- Liangzhen Zhao
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Hangxiao Zhang
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Markus V. Kohnen
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Kasavajhala V. S. K. Prasad
- Program in Cell and Molecular Biology, Department of Biology, Colorado State University, Fort Collins, CO, United States
| | - Lianfeng Gu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Anireddy S. N. Reddy
- Program in Cell and Molecular Biology, Department of Biology, Colorado State University, Fort Collins, CO, United States
| |
|
265
|
Parivesh A, Barseghyan H, Délot E, Vilain E. Translating genomics to the clinical diagnosis of disorders/differences of sex development. Curr Top Dev Biol 2019; 134:317-375. [PMID: 30999980 PMCID: PMC7382024 DOI: 10.1016/bs.ctdb.2019.01.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/15/2022]
Abstract
The medical and psychosocial challenges faced by patients living with Disorders/Differences of Sex Development (DSD) and their families can be alleviated by a rapid and accurate diagnostic process. Clinical diagnosis of DSD is limited by a lack of standardization of anatomical and endocrine phenotyping and genetic testing, as well as poor genotype/phenotype correlation. Historically, DSD genes have been identified through positional cloning of disease-associated variants segregating in families and validation of candidates in animal and in vitro modeling of variant pathogenicity. Owing to the complexity of conditions grouped under DSD, genome-wide scanning methods are better suited for identifying disease-causing gene variant(s) and providing a clinical diagnosis. Here, we review a number of established genomic tools (karyotyping, chromosomal microarrays and exome sequencing) used in the clinic for DSD diagnosis, as well as emerging genomic technologies such as whole-genome (short-read) sequencing, long-read sequencing, and optical mapping used for novel DSD gene discovery. These, together with gene expression and epigenetic studies, can improve DSD diagnostic rates and enhance the outcomes for patients and families.
Affiliation(s)
- Abhinav Parivesh
- Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC, United States
| | - Hayk Barseghyan
- Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC, United States; Department of Genomics and Precision Medicine, The George Washington University, Washington, DC, United States
| | - Emmanuèle Délot
- Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC, United States; Department of Genomics and Precision Medicine, The George Washington University, Washington, DC, United States.
| | - Eric Vilain
- Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC, United States; Department of Genomics and Precision Medicine, The George Washington University, Washington, DC, United States.
| |
|
266
|
Alonge M, Schatz MC. A master regulator of regeneration. Science 2019; 363:1152-1153. [PMID: 30872508 DOI: 10.1126/science.aaw6258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Affiliation(s)
- Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| |
|
267
|
Luo R, Sedlazeck FJ, Lam TW, Schatz MC. A multi-task convolutional deep neural network for variant calling in single molecule sequencing. Nat Commun 2019; 10:998. [PMID: 30824707 PMCID: PMC6397153 DOI: 10.1038/s41467-019-09025-z] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 04/29/2018] [Accepted: 02/15/2019] [Indexed: 12/22/2022] Open
Abstract
The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source ( https://github.com/aquaskyline/Clairvoyante ), with modules to train, utilize and visualize the model.
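The F1-scores quoted in this abstract combine precision and recall of the called variants against a truth set. As a minimal sketch of how such a score is computed (not Clairvoyante's own evaluation code), the Python below compares two sets of variant records; the function name `variant_f1` and the `(chrom, pos, ref, alt)` tuple layout are assumptions made for illustration only.

```python
from typing import Set, Tuple

# Hypothetical variant record: (chromosome, position, reference allele, alternative allele)
Variant = Tuple[str, int, str, str]


def variant_f1(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for a call set against a truth set."""
    true_pos = len(called & truth)    # variants called and present in the truth set
    false_pos = len(called - truth)   # variants called but absent from the truth set
    false_neg = len(truth - called)   # truth variants that were missed

    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for this example.
truth_set = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr1", 300, "G", "A"), ("chr1", 400, "T", "C")}
call_set = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
            ("chr1", 300, "G", "A"), ("chr1", 999, "A", "T")}

precision, recall, f1 = variant_f1(call_set, truth_set)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Exact matching on position and alleles is a simplification; benchmarking tools normally normalize variant representations before comparing.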
Affiliation(s)
- Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China.
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA.
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, 77030, TX, USA
| | - Tak-Wah Lam
- Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA
| |
|
268
|
Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel) 2019; 10:E87. [PMID: 30696086 PMCID: PMC6410075 DOI: 10.3390/genes10020087] [Citation(s) in RCA: 153] [Impact Index Per Article: 30.6] [Received: 12/02/2018] [Revised: 01/08/2019] [Accepted: 01/21/2019] [Indexed: 12/11/2022] Open
Abstract
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Affiliation(s)
- Bilal Mirza
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
| | - Wei Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
| | - Jie Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
| | - Howard Choi
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
| | - Neo Christopher Chung
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
| | - Peipei Ping
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Medicine (Cardiology), University of California Los Angeles, Los Angeles, CA 90095, USA.
| |
|
269
|
Abstract
We polled the Editorial Board of Genome Biology to ask where they see genomics going in the next few years. Here are some of their responses.
|
270
|
Marchet C, Lecompte L, Silva CD, Cruaud C, Aury JM, Nicolas J, Peterlongo P. De novo clustering of long reads by gene from transcriptomics data. Nucleic Acids Res 2019; 47:e2. [PMID: 30260405 PMCID: PMC6326815 DOI: 10.1093/nar/gky834] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 04/06/2018] [Revised: 09/04/2018] [Accepted: 09/10/2018] [Indexed: 02/07/2023] Open
Abstract
Long-read sequencing currently provides sequences of several thousand base pairs. It is therefore possible to obtain complete transcripts, offering an unprecedented view of the cellular transcriptome. However, the literature lacks tools for de novo clustering of such data, in particular for Oxford Nanopore Technologies reads, because of their inherently high error rate compared to short reads. Our goal is to process reads from whole transcriptome sequencing data accurately and without a reference genome in order to reliably group reads coming from the same gene. This de novo approach is therefore particularly suitable for non-model species, but can also serve as a useful pre-processing step to improve read mapping. Our contribution is both a new algorithm adapted to clustering reads by gene and a practical, freely available tool that scales to the complete processing of eukaryotic transcriptomes. We sequenced a mouse RNA sample using the MinION device. This dataset is used to compare our solution to other algorithms used in the context of biological clustering. We demonstrate that it is the best approach for transcriptomics long reads. When a reference is available to enable mapping, we show that it stands as an alternative method that predicts complementary clusters.
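To make the idea of reference-free clustering of reads by gene concrete, here is a deliberately naive Python sketch. It is not the algorithm proposed in this paper, which the abstract does not describe. It assumes that reads from the same gene share many k-mers, links read pairs whose k-mer Jaccard similarity reaches a threshold, and reports the connected components using union-find. The function names, `k`, and `min_similarity` are all hypothetical choices.

```python
from itertools import combinations
from typing import List, Set


def kmer_set(read: str, k: int) -> Set[str]:
    """All distinct substrings of length k in the read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_reads(reads: List[str], k: int = 5, min_similarity: float = 0.3) -> List[List[int]]:
    """Group read indices into connected components of the similarity graph."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmer_set(read, k) for read in reads]
    # All-pairs comparison: quadratic, so only suitable for toy inputs.
    for i, j in combinations(range(len(reads)), 2):
        if jaccard(profiles[i], profiles[j]) >= min_similarity:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy reads, invented for this example: two pairs that differ by one base each.
toy_reads = ["ACGTACGTTAGC", "ACGTACGTTAGG", "TTTTGGGGCCCCAA", "TTTTGGGGCCCCAT"]
print(cluster_reads(toy_reads))
```

With the error rates typical of raw long reads, exact k-mer sharing degrades quickly, which is one reason dedicated methods are needed for real data.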
Affiliation(s)
| | | | - Corinne Da Silva
- Commissariat à l’Énergie Atomique (CEA), Institut de Biologie François Jacob, Genoscope, 91000 Evry, France
| | - Corinne Cruaud
- Commissariat à l’Énergie Atomique (CEA), Institut de Biologie François Jacob, Genoscope, 91000 Evry, France
| | - Jean-Marc Aury
- Commissariat à l’Énergie Atomique (CEA), Institut de Biologie François Jacob, Genoscope, 91000 Evry, France
| | | | | |
|
271
|
Single-Molecule Sequencing: Towards Clinical Applications. Trends Biotechnol 2019; 37:72-85. [DOI: 10.1016/j.tibtech.2018.07.013] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 05/04/2018] [Revised: 07/16/2018] [Accepted: 07/18/2018] [Indexed: 12/31/2022]
|
272
|
Zascavage RR, Thorson K, Planz JV. Nanopore sequencing: An enrichment-free alternative to mitochondrial DNA sequencing. Electrophoresis 2019; 40:272-280. [PMID: 30511783 PMCID: PMC6590251 DOI: 10.1002/elps.201800083] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/14/2018] [Revised: 10/25/2018] [Accepted: 11/03/2018] [Indexed: 12/31/2022]
Abstract
Mitochondrial DNA sequence data are often utilized in disease studies, conservation genetics and forensic identification. The current approaches for sequencing the full mtGenome typically require several rounds of PCR enrichment during Sanger or MPS protocols followed by fairly tedious assembly and analysis. Here we describe an efficient approach to sequencing directly from genomic DNA samples without prior enrichment or extensive library preparation steps. A comparison is made between libraries sequenced directly from native DNA and the same samples sequenced from libraries generated with nine overlapping mtDNA amplicons on the Oxford Nanopore MinION™ device. The native and amplicon library preparation methods and alternative base calling strategies were assessed to establish error rates and identify trends of discordance between the two library preparation approaches. For the complete mtGenome, 16 569 nucleotides, an overall error rate of approximately 1.00% was observed. As expected with mtDNA, the majority of error was detected in homopolymeric regions. The use of a modified basecaller that corrects for ambiguous signal in homopolymeric stretches reduced the error rate for both library preparation methods to approximately 0.30%. Our study indicates that direct mtDNA sequencing from native DNA on the MinION™ device provides comparable results to those obtained from common mtDNA sequencing methods and is a reliable alternative to approaches using PCR-enriched libraries.
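For a sense of scale, the error rates reported above can be converted into approximate numbers of erroneous positions across the 16 569-nucleotide mtGenome, and homopolymeric stretches (where most errors were found) can be located with a simple scan. The Python below is an illustrative sketch only; the helper names and the minimum run length of 4 are assumptions, not values taken from the study.

```python
import re
from typing import List, Tuple

MT_GENOME_LENGTH = 16_569  # length of the complete mtGenome stated in the abstract


def expected_errors(error_rate: float, length: int = MT_GENOME_LENGTH) -> float:
    """Expected number of erroneous positions at a given per-base error rate."""
    return error_rate * length


def homopolymer_runs(seq: str, min_len: int = 4) -> List[Tuple[int, str, int]]:
    """Return (start, base, run_length) for each run of one base with length >= min_len."""
    runs = []
    # ([ACGT])\1* matches one base followed by any number of repeats of that base.
    for match in re.finditer(r"([ACGT])\1*", seq):
        run_length = len(match.group(0))
        if run_length >= min_len:
            runs.append((match.start(), match.group(1), run_length))
    return runs


print(round(expected_errors(0.01)))    # about 1.00% overall error rate
print(round(expected_errors(0.003)))   # about 0.30% with the modified basecaller
print(homopolymer_runs("ACCCCCTAGGGGT"))  # toy sequence, invented for this example
```

At these rates, the improvement corresponds to roughly 166 versus 50 erroneous positions per mtGenome.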
Affiliation(s)
- Roxanne R. Zascavage
- Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
- Department of Criminology and Criminal Justice, University of Texas at Arlington, Arlington, TX, USA
| | - Kelcie Thorson
- Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
- Zoetis Inc., Parsippany, NJ, USA
| | - John V. Planz
- Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
| |
|
273
|
Abstract
Somatic structural variants undoubtedly play important roles in driving tumourigenesis. This is evident despite the substantial technical challenges that remain in accurately detecting structural variants and their breakpoints in tumours and in spite of our incomplete understanding of the impact of structural variants on cellular function. Developments in these areas of research contribute to the ongoing discovery of structural variation with a clear impact on the evolution of the tumour and on the clinical importance to the patient. Recent large whole genome sequencing studies have reinforced our impression of each tumour as a unique combination of mutations but paradoxically have also discovered similar genome-wide patterns of single-nucleotide and structural variation between tumours. Statistical methods have been developed to deconvolute mutation patterns, or signatures, that recur across samples, providing information about the mutagens and repair processes that may be active in a given tumour. These signatures can guide treatment by, for example, highlighting vulnerabilities in a particular tumour to a particular chemotherapy. Thus, although the complete reconstruction of the full evolutionary trajectory of a tumour genome remains currently out of reach, valuable data are already emerging to improve the treatment of cancer.
Affiliation(s)
- Ailith Ewing
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH42XU, UK
| | - Colin Semple
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH42XU, UK
| |
|
274
|
|
275
|
Lin YY, Wu PC, Chen PL, Oyang YJ, Chen CY. HAHap: a read-based haplotyping method using hierarchical assembly. PeerJ 2018; 6:e5852. [PMID: 30397550 PMCID: PMC6214236 DOI: 10.7717/peerj.5852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2018] [Accepted: 09/27/2018] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND: The need for read-based phasing arises with advances in sequencing technologies. The minimum error correction (MEC) approach is the primary trend to resolve haplotypes by reducing conflicts in a single nucleotide polymorphism-fragment matrix. However, it is frequently observed that the solution with the optimal MEC might not be the real haplotypes, because MEC methods consider all positions together and conflicts in noisy regions can mislead the selection of corrections. To tackle this problem, we present a hierarchical assembly-based method designed to progressively resolve local conflicts. RESULTS: This study presents HAHap, a new phasing algorithm based on hierarchical assembly. HAHap leverages high-confidence variant pairs to build haplotypes progressively. The phasing results by HAHap on both real and simulated data, compared to other MEC-based methods, revealed better phasing error rates for constructing haplotypes using short reads from whole-genome sequencing. We compared the number of error corrections (ECs) on real data with other methods, and the comparison shows that HAHap predicts haplotypes with a lower number of ECs. We also used simulated data to investigate the behavior of HAHap under different sequencing conditions, highlighting the applicability of HAHap in certain situations.
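The MEC criterion mentioned above can be illustrated with a short Python sketch. It scores a candidate haplotype against a SNP-fragment matrix by counting, for each fragment, the allele calls that would have to be corrected for the fragment to agree with the closer of the two complementary haplotypes. This is a generic illustration of the MEC score, not HAHap's hierarchical assembly. The encoding (0/1 alleles, `None` for sites a fragment does not cover) and the function name `mec_score` are assumptions.

```python
from typing import List, Optional, Sequence

Fragment = Sequence[Optional[int]]  # one row of the SNP-fragment matrix


def mismatches(fragment: Fragment, haplotype: Sequence[int]) -> int:
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for allele, hap in zip(fragment, haplotype)
               if allele is not None and allele != hap)


def mec_score(fragments: List[Fragment], haplotype: Sequence[int]) -> int:
    """Minimum number of corrections needed to make every fragment match one of the two haplotypes."""
    complement = [1 - allele for allele in haplotype]  # the second haplotype of a diploid
    return sum(min(mismatches(fragment, haplotype), mismatches(fragment, complement))
               for fragment in fragments)


# Toy matrix over four heterozygous sites, invented for this example.
toy_fragments = [
    [0, 0, None, None],
    [None, 0, 1, None],
    [1, 1, None, None],
    [None, None, 0, 0],
    [None, 1, 0, 1],   # conflicts with the other fragments at one site
]

print(mec_score(toy_fragments, [0, 0, 1, 1]))
print(mec_score(toy_fragments, [0, 0, 0, 0]))
```

On this toy input the haplotype `[0, 0, 1, 1]` needs one correction and `[0, 0, 0, 0]` needs two, so an MEC-based method would prefer the former. The abstract's point is that the lowest-scoring solution is not guaranteed to be the true haplotype when conflicts cluster in noisy regions.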
Affiliation(s)
- Yu-Yu Lin
- Department of Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
| | - Ping Chun Wu
- Taipei Blood Center, Taiwan Blood Services Foundation, Taipei, Taiwan
| | - Pei-Lung Chen
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
| | - Yen-Jen Oyang
- Department of Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
| | - Chien-Yu Chen
- Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei, Taiwan
| |
|
276
|
Hehir-Kwa JY, Tops BBJ, Kemmeren P. The clinical implementation of copy number detection in the age of next-generation sequencing. Expert Rev Mol Diagn 2018; 18:907-915. [PMID: 30221560 DOI: 10.1080/14737159.2018.1523723] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
INTRODUCTION: The role of copy number variants (CNVs) in disease is now well established. In parallel, NGS technologies, including long-read technologies, are under continual development, and data analysis methods continue to be refined. Clinical exome sequencing data is now a reality for many diagnostic laboratories in both congenital genetics and oncology. This provides the ability to detect and report both SNVs and structural variants, including CNVs, using a single assay for a wide range of patient cohorts. Areas covered: Currently, whole-genome sequencing is mainly restricted to research applications and clinical utility studies. Furthermore, detecting the full-size spectrum of CNVs as well as somatic events remains difficult for both exome and whole-genome sequencing. As a result, the full extent of genomic variants in an individual's genome is still largely unknown. Recently, new sequencing technologies have been introduced which maintain the long-range genomic context, aiding the detection of CNVs and structural variants. Expert commentary: The development of long-read sequencing promises to resolve many CNV and SV detection issues but is yet to become established. The current challenge for clinical CNV detection is how to fully exploit all the data generated by high-throughput sequencing technologies.
Affiliation(s)
- Jayne Y Hehir-Kwa
- Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
| | - Bastiaan B J Tops
- Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
| | - Patrick Kemmeren
- Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
| |
|
277
|
Piégu B, Arensburger P, Guillou F, Bigot Y. But where did the centromeres go in the chicken genome models? Chromosome Res 2018; 26:297-306. [PMID: 30225548 DOI: 10.1007/s10577-018-9585-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/17/2018] [Revised: 08/31/2018] [Accepted: 09/03/2018] [Indexed: 11/30/2022]
Abstract
The chicken genome was the third vertebrate to be sequenced. To date, its sequence and feature annotations are used as the reference for avian models in genome sequencing projects developed on birds and other Sauropsida species, and in genetic studies of domesticated birds of economic and evolutionary biology interest. Therefore, an accurate description of this genome model is important to a wide number of scientists. Here, we review the location and features of a very basic element, the centromeres of chromosomes in the galGal5 genome model. Centromeres are elements that are not determined by their DNA sequence but by their epigenetic status, in particular by the accumulation of the histone-like protein CENP-A. Comparison of data from several public sources (primarily marker probes flanking centromeres using fluorescent in situ hybridization done on giant lampbrush chromosomes and CENP-A ChIP-seq datasets) with galGal5 annotations revealed that centromeres are likely inappropriately mapped in 9 of the 16 galGal5 chromosome models in which they are described. Analysis of karyology data confirmed that the location of the main CENP-A peaks in chromosomes is the best means of locating the centromeres in 25 galGal5 chromosome models, the majority of which (16) are fully sequenced and assembled. This data re-analysis reaffirms that several sources of information should be examined to produce accurate genome annotations, particularly for basic structures such as centromeres that are epigenetically determined.
Affiliation(s)
- Benoît Piégu
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France
| | - Peter Arensburger
- Biological Sciences Department, California State Polytechnic University, Pomona, CA, 91768, USA
| | - Florian Guillou
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France
| | - Yves Bigot
- PRC, UMR INRA0085, CNRS 7247, Centre INRA Val de Loire, 37380, Nouzilly, France.
| |
|
278
|
Peona V, Weissensteiner MH, Suh A. How complete are “complete” genome assemblies?-An avian perspective. Mol Ecol Resour 2018; 18:1188-1195. [DOI: 10.1111/1755-0998.12933] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 03/19/2018] [Revised: 06/11/2018] [Accepted: 07/06/2018] [Indexed: 12/26/2022]
Affiliation(s)
- Valentina Peona
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
| | - Matthias H. Weissensteiner
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Division of Evolutionary Biology; Faculty of Biology; Ludwig-Maximilian University of Munich; Planegg-Martinsried Germany
| | - Alexander Suh
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
| |
|
279
|
Rogers J. Adding resolution and dimensionality to comparative genomics: moving from reference genomes to clade genomics. Genome Biol 2018; 19:115. [PMID: 30107805 PMCID: PMC6090731 DOI: 10.1186/s13059-018-1500-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open
Abstract
The main goal and promise of comparative genomics has been to create a comprehensive catalog of genomic information and function across the phenomenal diversity of living systems. A recent study has demonstrated the evolutionary insights possible by generating high-quality whole-genome assemblies from multiple species of a clade.
Affiliation(s)
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
| |
|
280
|
Guppy JL, Jones DB, Jerry DR, Wade NM, Raadsma HW, Huerlimann R, Zenger KR. The State of "Omics" Research for Farmed Penaeids: Advances in Research and Impediments to Industry Utilization. Front Genet 2018; 9:282. [PMID: 30123237 PMCID: PMC6085479 DOI: 10.3389/fgene.2018.00282] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 05/14/2018] [Accepted: 07/09/2018] [Indexed: 12/19/2022] Open
Abstract
Elucidating the underlying genetic drivers of production traits in agricultural and aquaculture species is critical to efforts to maximize farming efficiency. "Omics" based methods (i.e., transcriptomics, genomics, proteomics, and metabolomics) are increasingly being applied to gain unprecedented insight into the biology of many aquaculture species. While the culture of penaeid shrimp has increased markedly, the industry continues to be impeded in many regards by disease, reproductive dysfunction, and a poor understanding of production traits. Extensive effort has been, and continues to be, applied to develop critical genomic resources for many commercially important penaeids. However, the industry application of these genomic resources, and the translation of the knowledge derived from "omics" studies has not yet been completely realized. Integration between the multiple "omics" resources now available (i.e., genome assemblies, transcriptomes, linkage maps, optical maps, and proteomes) will prove critical to unlocking the full utility of these otherwise independently developed and isolated resources. Furthermore, emerging "omics" based techniques are now available to address longstanding issues with completing keystone genome assemblies (e.g., through long-read sequencing), and can provide cost-effective industrial scale genotyping tools (e.g., through low density SNP chips and genotype-by-sequencing) to undertake advanced selective breeding programs (i.e., genomic selection) and powerful genome-wide association studies. In particular, this review highlights the status, utility and suggested path forward for continued development, and improved use of "omics" resources in penaeid aquaculture.
Affiliation(s)
- Jarrod L. Guppy
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
| | - David B. Jones
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
| | - Dean R. Jerry
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
| | - Nicholas M. Wade
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- Aquaculture Program, CSIRO Agriculture & Food, Queensland Bioscience Precinct, St Lucia, QLD, Australia
| | - Herman W. Raadsma
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- Faculty of Science, Sydney School of Veterinary Science, The University of Sydney, Camden, NSW, Australia
| | - Roger Huerlimann
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
| | - Kyall R. Zenger
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
| |
|
281
|
Beretta S, Patterson MD, Zaccaria S, Della Vedova G, Bonizzoni P. HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads. BMC Bioinformatics 2018; 19:252. [PMID: 29970002 PMCID: PMC6029272 DOI: 10.1186/s12859-018-2253-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/15/2017] [Accepted: 06/18/2018] [Indexed: 01/08/2023] Open
Abstract
Background Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Long reads, which are nowadays cheaper to produce and more widely available than ever before, have been used to reduce the fragmentation of the assembled haplotypes since their ability to span several variants along the genome. These long reads are also characterized by a high error rate, an issue which may be mitigated, however, with larger sets of reads, when this error rate is uniform across genome positions. Unfortunately, current state-of-the-art dynamic programming approaches designed for long reads deal only with limited coverages. Results Here, we propose a new method for assembling haplotypes which combines and extends the features of previous approaches to deal with long reads and higher coverages. In particular, our algorithm is able to dynamically adapt the estimated number of errors at each variant site, while minimizing the total number of error corrections necessary for finding a feasible solution. This allows our method to significantly reduce the required computational resources, allowing to consider datasets composed of higher coverages. The algorithm has been implemented in a freely available tool, HapCHAT: Haplotype Assembly Coverage Handling by Adapting Thresholds. An experimental analysis on sequencing reads with up to 60 × coverage reveals improvements in accuracy and recall achieved by considering a higher coverage with lower runtimes. Conclusions Our method leverages the long-range information of sequencing reads that allows to obtain assembled haplotypes fragmented in a lower number of unphased haplotype blocks. At the same time, our method is also able to deal with higher coverages to better correct the errors in the original reads and to obtain more accurate haplotypes as a result. 
Availability: HapCHAT is available at http://hapchat.algolab.eu under the GNU Public License (GPL).
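The objective mentioned in the abstract, minimizing the total number of error corrections, can be illustrated with a small sketch. This is not HapCHAT's algorithm (the abstract describes a dynamic programming approach with adaptive thresholds); it is a brute-force toy that assumes every site is heterozygous, so the two haplotypes are complements of each other. The function names `mec_cost` and `brute_force_mec` and the example reads are hypothetical, chosen only for illustration.

```python
from itertools import product


def mec_cost(reads, hap):
    """Number of allele corrections needed so that every read agrees
    with either `hap` or its complement (the second haplotype).

    Each read is a dict mapping a variant-site index to the observed
    allele (0 or 1); sites the read does not cover are simply absent.
    """
    comp = [1 - allele for allele in hap]
    total = 0
    for read in reads:
        mismatches_hap = sum(1 for i, a in read.items() if a != hap[i])
        mismatches_comp = sum(1 for i, a in read.items() if a != comp[i])
        # Assign the read to whichever haplotype needs fewer corrections.
        total += min(mismatches_hap, mismatches_comp)
    return total


def brute_force_mec(reads, n_sites):
    """Try every haplotype over n_sites and keep the cheapest one.

    Exponential in n_sites, so only usable on toy inputs. Haplotypes
    starting with allele 1 are skipped because a haplotype and its
    complement describe the same pair.
    """
    best_hap, best_cost = None, None
    for hap in product((0, 1), repeat=n_sites):
        if hap[0] == 1:
            continue
        cost = mec_cost(reads, hap)
        if best_cost is None or cost < best_cost:
            best_hap, best_cost = hap, cost
    return best_hap, best_cost


# Hypothetical toy input: five reads over four variant sites.
# The last read carries one sequencing error at site 1.
example_reads = [
    {0: 0, 1: 0, 2: 1},
    {1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {2: 0, 3: 0},
    {0: 0, 1: 1, 2: 1},
]

if __name__ == "__main__":
    print(brute_force_mec(example_reads, 4))
```

With more reads covering each site, a single erroneous allele is outvoted by the correct ones, which is the intuition behind using higher coverage to offset the error rate of long reads.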
Affiliation(s)
- Stefano Beretta
- Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
- Murray D Patterson
- Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
- Simone Zaccaria
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Gianluca Della Vedova
- Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
- Paola Bonizzoni
- Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
|
282
|
van Dijk EL, Jaszczyszyn Y, Naquin D, Thermes C. The Third Revolution in Sequencing Technology. Trends Genet 2018; 34:666-681. [PMID: 29941292 DOI: 10.1016/j.tig.2018.05.008] [Citation(s) in RCA: 561] [Impact Index Per Article: 93.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2018] [Revised: 05/18/2018] [Accepted: 05/29/2018] [Indexed: 12/16/2022]
Abstract
Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses and provide future perspectives.
Affiliation(s)
- Erwin L van Dijk
- Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA Université Paris-Sud, Université Paris-Saclay, 9198 Gif sur Yvette Cedex, France
- Yan Jaszczyszyn
- Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA Université Paris-Sud, Université Paris-Saclay, 9198 Gif sur Yvette Cedex, France
- Delphine Naquin
- Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA Université Paris-Sud, Université Paris-Saclay, 9198 Gif sur Yvette Cedex, France
- Claude Thermes
- Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA Université Paris-Sud, Université Paris-Saclay, 9198 Gif sur Yvette Cedex, France
|
283
|
Bourne SD, Hudson J, Holman LE, Rius M. Marine Invasion Genomics: Revealing Ecological and Evolutionary Consequences of Biological Invasions. ACTA ACUST UNITED AC 2018. [DOI: 10.1007/13836_2018_21] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
|
284
|
Kyriakidou M, Tai HH, Anglin NL, Ellis D, Strömvik MV. Current Strategies of Polyploid Plant Genome Sequence Assembly. FRONTIERS IN PLANT SCIENCE 2018; 9:1660. [PMID: 30519250 PMCID: PMC6258962 DOI: 10.3389/fpls.2018.01660] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/13/2018] [Accepted: 10/25/2018] [Indexed: 05/14/2023]
Abstract
Polyploidy, or duplication of an entire genome, occurs in the majority of angiosperms. Understanding polyploid genomes is important for improving the crops that humans rely on for sustenance and basic nutrition. As climate change continues to pose a potential threat to agricultural production, there will increasingly be a demand for plant cultivars that can resist biotic and abiotic stresses and also provide needed and improved nutrition. In the past decade, Next Generation Sequencing (NGS) has fundamentally changed the genomics landscape by providing tools for the exploration of polyploid genomes. Here, we review the challenges of the assembly of polyploid plant genomes, and also present recent advances in genomic resources and functional tools in molecular genetics and breeding. As genomes of diploid and less heterozygous progenitor species are increasingly available, we discuss the lack of complexity of these currently available reference genomes as they relate to polyploid crops. Finally, we review recent approaches of haplotyping by phasing and the impact of third generation technologies on polyploid plant genome assembly.
Affiliation(s)
- Maria Kyriakidou
- Department of Plant Science, McGill University, Montreal, QC, Canada
- Helen H. Tai
- Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, NB, Canada
- Martina V. Strömvik
- Department of Plant Science, McGill University, Montreal, QC, Canada
- *Correspondence: Martina V. Strömvik
|
285
|
Luikart G, Kardos M, Hand BK, Rajora OP, Aitken SN, Hohenlohe PA. Population Genomics: Advancing Understanding of Nature. POPULATION GENOMICS 2018. [DOI: 10.1007/13836_2018_60] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
|