1
|
Worthey EA. Analysis and Annotation of Whole-Genome or Whole-Exome Sequencing Derived Variants for Clinical Diagnosis. Current Protocols in Human Genetics 2017; 95:9.24.1-9.24.28. [PMID: 29044471 DOI: 10.1002/cphg.49] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
Abstract
Over the last 10 years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing or analysis (given access to appropriate tools), but rather clinical interpretation. Interpretation of genetic findings in a complex and ever changing clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires application of appropriate interpretation tools, as well as development and application of appropriate methodologies and standard procedures. This unit provides an overview of these items. Specific challenges related to implementation of genome-wide sequencing in a clinical setting are discussed. © 2017 by John Wiley & Sons, Inc.
|
2
|
Whole-Genome Restriction Mapping by "Subhaploid"-Based RAD Sequencing: An Efficient and Flexible Approach for Physical Mapping and Genome Scaffolding. Genetics 2017; 206:1237-1250. [PMID: 28468906 PMCID: PMC5500127 DOI: 10.1534/genetics.117.200303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/18/2017] [Accepted: 04/17/2017] [Indexed: 11/18/2022]
Abstract
Assembly of complex genomes using short reads remains a major challenge, which usually yields highly fragmented assemblies. Generation of ultradense linkage maps is promising for anchoring such assemblies, but traditional linkage mapping methods are hindered by the infrequency and unevenness of meiotic recombination that limit attainable map resolution. Here we develop a sequencing-based "in vitro" linkage mapping approach (called RadMap), where chromosome breakage and segregation are realized by generating hundreds of "subhaploid" fosmid/bacterial-artificial-chromosome clone pools, and by restriction site-associated DNA sequencing of these clone pools to produce an ultradense whole-genome restriction map to facilitate genome scaffolding. A bootstrap-based minimum spanning tree algorithm is developed for grouping and ordering of genome-wide markers and is implemented in a user-friendly, integrated software package (AMMO). We perform extensive analyses to validate the power and accuracy of our approach in the model plant Arabidopsis thaliana and human. We also demonstrate the utility of RadMap for enhancing the contiguity of a variety of whole-genome shotgun assemblies generated using either short Illumina reads (300 bp) or long PacBio reads (6-14 kb), with up to 15-fold improvement of N50 (∼816 kb-3.7 Mb) and high scaffolding accuracy (98.1-98.5%). RadMap outperforms BioNano and Hi-C when input assembly is highly fragmented (contig N50 = 54 kb). RadMap can capture wide-range contiguity information and provide an efficient and flexible tool for high-resolution physical mapping and scaffolding of highly fragmented assemblies.
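The N50 values quoted in this abstract follow the usual assembly convention: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation is given here; the function name `n50` and the toy length lists are hypothetical illustrations, not part of RadMap or AMMO.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the total length defines N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), before scaffolding.
contigs = [100, 80, 60, 40, 20]
# Hypothetical result of joining 100+80 and 60+40 into two scaffolds.
scaffolds = [180, 100, 20]

print(n50(contigs))    # 80
print(n50(scaffolds))  # 180
```

Joining contigs leaves the total length unchanged but concentrates it in fewer, longer sequences, which is why scaffolding raises N50.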
|
3
|
Deng T, Pang C, Lu X, Zhu P, Duan A, Tan Z, Huang J, Li H, Chen M, Liang X. De Novo Transcriptome Assembly of the Chinese Swamp Buffalo by RNA Sequencing and SSR Marker Discovery. PLoS One 2016; 11:e0147132. [PMID: 26766209 PMCID: PMC4713091 DOI: 10.1371/journal.pone.0147132] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 10/28/2015] [Accepted: 12/29/2015] [Indexed: 01/11/2023]
Abstract
The Chinese swamp buffalo (Bubalus bubalis) is vital to the lives of small farmers and has tremendous economic importance. However, a lack of genomic information has hampered research on augmenting marker-assisted breeding programs in this species. Thus, high-throughput transcriptomic sequencing of B. bubalis was conducted to generate a transcriptomic sequence dataset for gene discovery and molecular marker development. Illumina paired-end sequencing generated a total of 54,109,173 raw reads. After trimming, de novo assembly was performed, which yielded 86,017 unigenes, with an average length of 972.41 bp, an N50 of 1,505 bp, and an average GC content of 49.92%. A total of 62,337 unigenes were successfully annotated. Among the annotated unigenes, 27,025 (43.35%) and 23,232 (37.27%) unigenes showed significant similarity to known proteins in the NCBI non-redundant protein and Swiss-Prot databases (E-value < 1.0E-5), respectively. Of these annotated unigenes, 14,439 and 15,813 unigenes were assigned to Gene Ontology (GO) categories and euKaryotic Orthologous Groups (KOG) clusters, respectively. In addition, a total of 14,167 unigenes were assigned to 331 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Furthermore, 17,401 simple sequence repeats (SSRs) were identified as potential molecular markers. One hundred and fifteen primer pairs were randomly selected for amplification to detect polymorphisms. The results revealed that 110 primer pairs (95.65%) yielded PCR amplicons and 69 primer pairs (60.00%) presented polymorphisms in 35 individual buffaloes. A phylogenetic analysis showed that the five swamp buffalo populations were clustered together, whereas two river buffalo breeds clustered separately. In the present study, the Illumina RNA-seq technology was utilized to perform transcriptome analysis and SSR marker discovery in the swamp buffalo without using a reference genome. Our findings will enrich the current SSR marker resources and help spearhead molecular genetic research studies on the swamp buffalo.
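The abstract does not state which tool or thresholds were used to identify SSRs. As a rough illustration of what SSR mining involves, the sketch below scans a sequence for perfect tandem repeats of 2 to 6 bp motifs using regular-expression backreferences. The function names, the minimum repeat counts in `MIN_REPEATS`, and the test sequence are all assumptions made for this example and are not taken from the study.

```python
import re

# Assumed minimum repeat counts per motif length (hypothetical thresholds).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    # A string is a repetition of a shorter unit exactly when it occurs
    # inside its own doubling at an offset smaller than its length.
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # Group 2 captures a k-mer; \2{n-1,} requires at least n-1 more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if is_primitive(motif):  # skips e.g. "AA" or "ATAT"
                hits.append((match.start(), motif, len(match.group(1)) // k))
    return sorted(hits)


# Hypothetical sequence containing (AT)7 and (CAG)5.
example = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "TT"
print(find_ssrs(example))  # [(3, 'AT', 7), (20, 'CAG', 5)]
```

Primer pairs would then be designed in the flanking sequence of each hit; that step, and any handling of imperfect or compound repeats, is outside this sketch.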
Affiliation(s)
- Tingxian Deng
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Chunying Pang
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Xingrong Lu
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Peng Zhu
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Anqin Duan
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Zhengzhun Tan
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Jian Huang
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Hui Li
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Mingtan Chen
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
- Xianwei Liang
- Key Laboratory of Buffalo Genetics, Breeding and Reproduction technology, Ministry of Agriculture, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning, Guangxi, P. R. China
|
4
|
Jiang Z, Wang H, Michal JJ, Zhou X, Liu B, Woods LCS, Fuchs RA. Genome Wide Sampling Sequencing for SNP Genotyping: Methods, Challenges and Future Development. Int J Biol Sci 2016; 12:100-8. [PMID: 26722221 PMCID: PMC4679402 DOI: 10.7150/ijbs.13498] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 08/07/2015] [Accepted: 11/07/2015] [Indexed: 12/04/2022]
Abstract
Genetic polymorphisms, particularly single nucleotide polymorphisms (SNPs), have been widely used to advance quantitative, functional and evolutionary genomics. Ideally, all genetic variants among individuals should be discovered when next generation sequencing (NGS) technologies and platforms are used for whole genome sequencing or resequencing. In order to improve the cost-effectiveness of the process, however, the research community has mainly focused on developing genome-wide sampling sequencing (GWSS) methods, a collection of reduced genome complexity sequencing, reduced genome representation sequencing and selective genome target sequencing. Here we review the major steps involved in library preparation, the types of adapters used for ligation and the primers designed for amplification of ligated products for sequencing. Unfortunately, currently available GWSS methods have their drawbacks, such as inconsistency in the number of reads per sample library, the number of sites/targets per individual, and the number of reads per site/target, all of which result in missing data. Suggestions are proposed here to improve library construction, genotype calling accuracy, genome-wide marker density and read mapping rate. In brief, optimized GWSS library preparation should generate a unique set of target sites with dense distribution along chromosomes and even coverage per site across all individuals.
Affiliation(s)
- Zhihua Jiang
- 1. Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620, USA
- Hongyang Wang
- 1. Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620, USA; 2. Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, China
- Jennifer J Michal
- 1. Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620, USA
- Xiang Zhou
- 1. Department of Animal Sciences, Washington State University, Pullman, WA 99164-7620, USA
- Bang Liu
- 2. Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, China
- Leah C Solberg Woods
- 3. Department of Pediatrics, Human and Molecular Genetics Center and Children's Research Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Rita A Fuchs
- 4. Department of Integrative Physiology and Neuroscience, Washington State University College of Veterinary Medicine, Pullman, WA 99164-7620, USA
|
5
|
Whole transcriptome analysis with sequencing: methods, challenges and potential solutions. Cell Mol Life Sci 2015; 72:3425-39. [PMID: 26018601 DOI: 10.1007/s00018-015-1934-y] [Citation(s) in RCA: 142] [Impact Index Per Article: 14.2] [Received: 01/22/2015] [Revised: 04/25/2015] [Accepted: 05/21/2015] [Indexed: 10/23/2022]
Abstract
Whole transcriptome analysis plays an essential role in deciphering genome structure and function, identifying genetic networks underlying cellular, physiological, biochemical and biological systems and establishing molecular biomarkers that respond to diseases, pathogens and environmental challenges. Here, we review transcriptome analysis methods and technologies that have been used to conduct whole transcriptome shotgun sequencing or whole transcriptome tag/target sequencing analyses. We focus on how adaptors/linkers are added to both 5' and 3' ends of mRNA molecules for cloning or PCR amplification before sequencing. Challenges and potential solutions are also discussed. In brief, next generation sequencing platforms have accelerated the release of large amounts of gene expression data. It is now time for the genome research community to assemble whole transcriptomes of all species and collect signature targets for each gene/transcript, and thus use known genes/transcripts to determine known transcriptomes directly in the near future.
|
6
|
Abstract
We have come a long way in the 55 years since Edmond Fischer and the late Edwin Krebs discovered that the activity of glycogen phosphorylase is regulated by reversible protein phosphorylation. Many of the fundamental molecular mechanisms that operate in biological signaling have since been characterized and the vast web of interconnected pathways that make up the cellular signaling network has been mapped in considerable detail. Nonetheless, it is important to consider how fast this field is still moving and the issues at the current boundaries of our understanding. One must also appreciate what experimental strategies have allowed us to attain our present level of knowledge. We summarize here some key issues (both conceptual and methodological), raise unresolved questions, discuss potential pitfalls, and highlight areas in which our understanding is still rudimentary. We hope these wide-ranging ruminations will be useful to investigators who carry studies of signal transduction forward during the rest of the 21st century.
|
7
|
Worthey EA. Analysis and annotation of whole-genome or whole-exome sequencing-derived variants for clinical diagnosis. Current Protocols in Human Genetics 2013; 79:9.24.1-9.24.24. [PMID: 24510652 DOI: 10.1002/0471142905.hg0924s79] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/07/2023]
Abstract
Over the last several years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test under particular circumstances in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing but rather the analysis and interpretation. Interpretation of genetic findings in a clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires the development of novel or repositioned analysis tools, methodologies, and processes. This unit provides an overview of these items. Specific challenges related to implementation in a clinical setting are discussed.
Affiliation(s)
- Elizabeth A Worthey
- Department of Pediatrics, Medical College of Wisconsin, Milwaukee, Wisconsin
- The Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin
- Department of Computer Science, University of Wisconsin, Milwaukee, Wisconsin
|
8
|
Boulanger J, Muresan L, Tiemann-Boege I. Massively parallel haplotyping on microscopic beads for the high-throughput phase analysis of single molecules. PLoS One 2012; 7:e36064. [PMID: 22558329 PMCID: PMC3340404 DOI: 10.1371/journal.pone.0036064] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 02/03/2012] [Accepted: 03/30/2012] [Indexed: 12/12/2022]
Abstract
In spite of the many advances in haplotyping methods, it is still very difficult to characterize rare haplotypes in tissues and different environmental samples or to accurately assess the haplotype diversity in large mixtures. This would require a haplotyping method capable of analyzing the phase of single molecules with an unprecedented throughput. Here we describe such a haplotyping method, capable of analyzing hundreds of thousands of single molecules in parallel in one experiment. In this method, multiple PCR reactions amplify different polymorphic regions of a single DNA molecule on a magnetic bead compartmentalized in an emulsion drop. The allelic states of the amplified polymorphisms are identified with fluorescently labeled probes that are then decoded from images taken of the arrayed beads by a microscope. This method can evaluate the phase of up to 3 polymorphisms separated by up to 5 kilobases in hundreds of thousands of single molecules. We tested the sensitivity of the method by measuring the number of mutant haplotypes synthesized by four different commercially available enzymes: Phusion, Platinum Taq, Titanium Taq, and Phire. The digital nature of the method makes it sensitive enough to detect haplotype ratios of less than 1:10,000. We also accurately quantified chimera formation during the exponential phase of PCR by different DNA polymerases.
Affiliation(s)
- Jérôme Boulanger
- Cell and Tissue Imaging Core, Centre National de la Recherche Scientifique, Institut Curie, Paris, France
- Radon Institute for Computational and Applied Mathematics of the Austrian Academy of Sciences, Linz, Austria
- Leila Muresan
- Department of Knowledge-Based Mathematical Systems, Johannes Kepler University, Linz, Austria
|
9
|
Jiang Z, Michal JJ, Beckman KB, Lyons JB, Zhang M, Pan Z, Rokhsar DS, Harland RM. Development and initial characterization of a HAPPY panel for mapping the X. tropicalis genome. Int J Biol Sci 2011; 7:1037-44. [PMID: 21912511 PMCID: PMC3164153 DOI: 10.7150/ijbs.7.1037] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/20/2011] [Accepted: 08/13/2011] [Indexed: 01/22/2023]
Abstract
HAPPY mapping was designed to pursue the analysis of approximately random HAPloid DNA breakage samples using the PolYmerase chain reaction for mapping genomes. In the present study, we improved the method and integrated two other molecular techniques into the process, whole genome amplification and the Sequenom SNP (single nucleotide polymorphism) genotyping assay, in order to facilitate whole genome mapping of X. tropicalis. The former technique amplified enough DNA to genotype a large number of markers, while the latter allowed for relatively high-throughput marker genotyping with multiplex assays on the HAPPY lines. A total of 58 X. tropicalis genes were genotyped on an initial panel of 383 HAPPY lines, which contributed to formation of a working panel of 146 lines. Further genotyping of 29 markers on the working panel led to construction of a HAPPY map for the X. tropicalis genome. We believe that the improved HAPPY method described in the present study has paved the way for the community to map different genomes with a simple but powerful approach.
Affiliation(s)
- Zhihua Jiang
- Department of Animal Sciences, Washington State University, Pullman, WA 99164-6351, USA.
|
10
|
Michelizzi VN, Dodson MV, Pan Z, Amaral MEJ, Michal JJ, McLean DJ, Womack JE, Jiang Z. Water buffalo genome science comes of age. Int J Biol Sci 2010; 6:333-49. [PMID: 20582226 PMCID: PMC2892297 DOI: 10.7150/ijbs.6.333] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Received: 03/29/2010] [Accepted: 06/14/2010] [Indexed: 12/30/2022]
Abstract
The water buffalo is vital to the lives of small farmers and to the economy of many countries worldwide. Not only is it a draught animal, but it is also a source of meat, horns, skin and particularly the rich and precious milk that may be converted to creams, butter, yogurt and many cheeses. Genome analysis of water buffalo has advanced significantly in recent years. This review focuses on currently available genome resources in water buffalo in terms of cytogenetic characterization, whole genome mapping and next generation sequencing. These resources indicate that genome science is coming of age in the species and will provide knowledge and technologies to help optimize production potential, reproduction efficiency, product quality, nutritional value and resistance to diseases. As water buffalo and domestic cattle, both members of the Bovidae family, are closely related, the vast amount of cattle genetic/genomic resources might serve as shortcuts for the buffalo community to further advance genome science and biotechnologies in the species.
Affiliation(s)
- Vanessa N Michelizzi
- Department of Animal Sciences, Washington State University, Pullman, WA 99164-6351, USA
|
11
|
Mir KU. Sequencing genomes: from individuals to populations. Briefings in Functional Genomics and Proteomics 2010; 8:367-78. [PMID: 19808932 DOI: 10.1093/bfgp/elp040] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
The whole genome sequences of Jim Watson and Craig Venter are early examples of personalized genomics, which promises to change how we approach healthcare in the future. Before personal sequencing can have practical medical benefits, however, and before it should be advocated for implementation at the population-scale, there needs to be a better understanding of which genetic variants influence which traits and how their effects are modified by epigenetic factors. Nonetheless, for forging links between DNA sequence and phenotype, efforts to sequence the genomes of individuals need to continue; this includes sequencing sub-populations for association studies which analyse the difference in sequence between disease affected and unaffected individuals. Such studies can only be applied on a large enough scale to be effective if the massive strides in sequencing technology that have recently occurred also continue.
Affiliation(s)
- Kalim U Mir
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
|