951
Tweardy DJ, Belmont JW. "Personalizing" academic medicine: opportunities and challenges in implementing genomic profiling. Transl Res 2009; 154:288-94. [PMID: 19931194 PMCID: PMC2830892 DOI: 10.1016/j.trsl.2009.09.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 08/23/2009] [Revised: 09/19/2009] [Accepted: 09/22/2009] [Indexed: 10/20/2022]
Abstract
Baylor College of Medicine (BCM) faculty members spearheaded the development of a first-generation Personal Genome Profile (Baylor PGP) assay to help physicians diagnose and manage patients in this new era of medicine. The principles that guided the design and implementation of the Baylor PGP were high quality, robustness, low expense, flexibility, practical clinical utility, and the ability to facilitate broad areas of clinical research. The most distinctive feature of the approach is its emphasis on extensive screening for rare disease-causing mutations rather than common risk-increasing polymorphisms. Because these variants have large direct effects, the ability to screen for them inexpensively could have a major immediate clinical impact in disease diagnosis, carrier detection, presymptomatic detection of late-onset disease, and even prenatal diagnosis. In addition to serving as a counseling tool for individual "consumers," this system will fit into the established medical record and be used by physicians involved in direct patient care. This article describes an overall framework for clinical diagnostic array genotyping and the available technologies, and highlights the opportunities and challenges for implementation.
Affiliation(s)
- David J Tweardy
- Department of Medicine (Section of Infectious Diseases), Baylor College of Medicine, Houston, Tex. 77030, USA.
952
Wimmer E, Mueller S, Tumpey TM, Taubenberger JK. Synthetic viruses: a new opportunity to understand and prevent viral disease. Nat Biotechnol 2009; 27:1163-72. [PMID: 20010599 PMCID: PMC2819212 DOI: 10.1038/nbt.1593] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.5] [Indexed: 11/10/2022]
Abstract
Rapid progress in DNA synthesis and sequencing is spearheading the deliberate, large-scale genetic alteration of organisms. These advances in DNA manipulation have been extended to the level of whole-genome synthesis, as evident from the synthesis of poliovirus, the resurrection of the extinct 1918 strain of influenza virus and of human endogenous retroviruses, and the restructuring of the phage T7 genome. The largest DNA synthesized so far is the 582,970 base pair genome of Mycoplasma genitalium, although this synthetic DNA has not yet been 'booted' to life. Because genome synthesis is independent of a natural template, it allows the structure and function of a virus's genetic information to be modified to an extent that was previously impossible. The common goal of this new strategy is to further our understanding of an organism's properties, particularly its pathogenic armory if it causes disease in humans, and to use this information to prevent or treat human viral disease. Although only a few applications of virus synthesis have been described so far, key recent findings have been the resurrection of the 1918 influenza virus and the generation of codon- and codon pair-deoptimized polioviruses.
Affiliation(s)
- Eckard Wimmer
- Department of Molecular Genetics and Microbiology, Stony Brook University, Stony Brook, New York, USA.
953
Melum E, Franke A, Karlsen TH. Genome-wide association studies - A summary for the clinical gastroenterologist. World J Gastroenterol 2009; 15:5377-96. [PMID: 19916168 PMCID: PMC2778094 DOI: 10.3748/wjg.15.5377] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
Genome-wide association studies (GWAS) have been applied to various gastrointestinal and liver diseases in recent years, identifying a large number of susceptibility genes and key biological pathways in disease development. So far, studies of inflammatory bowel diseases, and of Crohn's disease in particular, have been especially successful in defining new susceptibility loci using the GWAS design. The identification of associations related to autophagy, as well as of several genes involved in the immune response, will be important to future research on Crohn's disease. This review summarizes key methodological aspects of GWAS: the importance of proper cohort collection, genotyping issues and statistical methods. Ways of addressing the shortcomings of the GWAS design with respect to rare variants are also discussed. For each of the relevant conditions, findings from the various GWAS are summarized with a focus on the affected biological systems.
954
Steuernagel B, Taudien S, Gundlach H, Seidel M, Ariyadasa R, Schulte D, Petzold A, Felder M, Graner A, Scholz U, Mayer KFX, Platzer M, Stein N. De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley. BMC Genomics 2009; 10:547. [PMID: 19930547 PMCID: PMC2784808 DOI: 10.1186/1471-2164-10-547] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Received: 07/15/2009] [Accepted: 11/20/2009] [Indexed: 01/18/2023]
Abstract
Background: De novo sequencing of a large, complex plant genome such as that of barley (Hordeum vulgare L.) is a major challenge in terms of both experimental feasibility and cost. The emergence and rapid progress of next-generation sequencing technologies have brought this goal into focus, and a clone-based strategy combined with the 454/Roche technology is conceivable. Results: To test feasibility, we sequenced 91 barcoded, pooled, gene-containing barley BACs on the GS FLX platform and assembled the sequences while iteratively varying the assembly parameters. The BAC assemblies had an N50 of ~50 kb (N80 ~31 kb, N90 ~21 kb) and a Q40 of 94%. For ~80% of the clones, the best assemblies consisted of fewer than 10 contigs at 24-fold mean sequence coverage. Moreover, gene-containing regions appear to assemble completely and without interruption, which makes the approach suitable for detecting complete, positionally anchored genes. By comparing the assemblies of four clones with their complete reference sequences generated by the Sanger method, we evaluated the distribution, quality and representativeness of the 454 sequences, as well as the consistency and reliability of the assemblies. Conclusion: Multiplex 454 sequencing of barcoded BACs yields consensus sequences that are highly representative of the clones, and the assemblies are correct for the majority of contigs. Although resolving complex repetitive structures requires additional experimental effort, our approach paves the way for a clone-based strategy for sequencing the barley genome.
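For readers less familiar with the contiguity statistics quoted above: an Nx value is the contig length at which contigs of that length or longer first account for x% of the total assembled length. Below is a minimal Python sketch that assumes a plain list of contig lengths. The example lengths are invented for illustration and are not data from the study.

```python
def nx(contig_lengths, x):
    """Return the Nx length: the largest length L such that contigs of
    length >= L together cover at least x% of the total assembly length."""
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    threshold = sum(lengths) * x / 100.0
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length
    return 0  # only reached for an empty input


# Hypothetical contig lengths (bp) for a single BAC assembly.
contigs = [52_000, 31_000, 21_000, 9_000, 4_000, 1_500]
for x in (50, 80, 90):
    print(f"N{x} = {nx(contigs, x)}")
```

Sorting in descending order and accumulating lengths is the standard definition. N80 and N90 are always less than or equal to N50, so reporting all three conveys how quickly contig sizes fall off.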
Affiliation(s)
- Burkhard Steuernagel
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany.
955
Imelfort M, Edwards D. De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform 2009; 10:609-18. [DOI: 10.1093/bib/bbp039] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Indexed: 12/16/2022]
956
Makrythanasis P, Kapranov P, Bartoloni L, Reymond A, Deutsch S, Guigó R, Denoeud F, Drenkow J, Rossier C, Ariani F, Capra V, Excoffier L, Renieri A, Gingeras TR, Antonarakis SE. Variation in novel exons (RACEfrags) of the MECP2 gene in Rett syndrome patients and controls. Hum Mutat 2009; 30:E866-79. [PMID: 19562714 DOI: 10.1002/humu.21073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/10/2023]
Abstract
The study of transcription using genomic tiling arrays has led to the identification of numerous additional exons. One example is the MECP2 gene on the X chromosome: using 5'RACE and RT-PCR in human tissues and cell lines, we found more than 70 novel exons (RACEfrags) connecting to at least one annotated exon. We sequenced all MECP2-connected exons and flanking sequences in three groups: 46 patients with Rett syndrome and no mutations in the currently annotated exons of the MECP2 and CDKL5 genes; 32 patients with Rett syndrome and identified mutations in the MECP2 gene; and 100 control individuals from the same geoethnic group. Approximately 13 kb was sequenced per sample (2.4 Mb of DNA resequencing). A total of 75 individuals carried novel rare variants (mostly private variants), but no statistically significant difference was found among the three groups. These results suggest that variants in the newly discovered exons may not contribute to Rett syndrome. Interestingly, however, the novel exons contain about twice as many variants as the flanking sequences (44 vs. 21 for approximately 1.3 Mb sequenced for each class of sequence, p = 0.0025). Thus, the evolutionary forces that shape these novel exons may differ from those acting on neighboring sequences.
Affiliation(s)
- Periklis Makrythanasis
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
957
Friedman J, Adam S, Arbour L, Armstrong L, Baross A, Birch P, Boerkoel C, Chan S, Chai D, Delaney AD, Flibotte S, Gibson WT, Langlois S, Lemyre E, Li HI, MacLeod P, Mathers J, Michaud JL, McGillivray BC, Patel MS, Qian H, Rouleau GA, Van Allen MI, Yong SL, Zahir FR, Eydoux P, Marra MA. Detection of pathogenic copy number variants in children with idiopathic intellectual disability using 500 K SNP array genomic hybridization. BMC Genomics 2009; 10:526. [PMID: 19917086 PMCID: PMC2781027 DOI: 10.1186/1471-2164-10-526] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 04/15/2009] [Accepted: 11/16/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Array genomic hybridization is being used clinically to detect pathogenic copy number variants in children with intellectual disability and other birth defects. However, there is no agreement on the kind of array, the distribution of probes across the genome, or the resolution that is most appropriate for clinical use. RESULTS: We performed 500 K Affymetrix GeneChip array genomic hybridization in 100 idiopathic intellectual disability trios, each consisting of a child with intellectual disability of unknown cause and both unaffected parents. We found pathogenic genomic imbalance in 16 of these 100 individuals with idiopathic intellectual disability. By comparison, we had found pathogenic genomic imbalance in 11 of 100 children with idiopathic intellectual disability in a previous cohort studied by 100 K GeneChip array genomic hybridization. Among 54 intellectual disability trios from the previous cohort who were re-tested with 500 K GeneChip array genomic hybridization, we identified all 10 previously detected pathogenic genomic alterations and at least one additional pathogenic copy number variant that had not been detected with 100 K GeneChip array genomic hybridization. Many benign copy number variants, including one that was de novo, were also detected with 500 K array genomic hybridization, but it was possible to distinguish benign from pathogenic copy number variants with confidence in all but 3 (1.9%) of the 154 intellectual disability trios studied. CONCLUSION: Affymetrix GeneChip 500 K array genomic hybridization detected pathogenic genomic imbalance in 10 of 10 patients with idiopathic developmental disability in whom 100 K GeneChip array genomic hybridization had found genomic imbalance, in 1 of 44 patients in whom 100 K GeneChip array genomic hybridization had found no abnormality, and in 16 of 100 patients who had not previously been tested. Effective clinical interpretation of these studies requires considerable skill and experience.
Affiliation(s)
- Jm Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, Canada.
958
Gao M, Skolnick J. A threading-based method for the prediction of DNA-binding proteins with application to the human genome. PLoS Comput Biol 2009; 5:e1000567. [PMID: 19911048 PMCID: PMC2770119 DOI: 10.1371/journal.pcbi.1000567] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Received: 05/26/2009] [Accepted: 10/16/2009] [Indexed: 11/18/2022]
Abstract
Diverse mechanisms of DNA-protein recognition have been elucidated from numerous atomic-level structures of complexes from various protein families. These structural data provide an invaluable knowledge base, not only for understanding DNA-protein interactions but also for developing specialized methods that predict DNA-binding function from protein structure. Although such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we developed a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for predicting DNA-binding domains and the associated DNA-binding residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein's sequence. In our approach, fold similarity and DNA-binding propensity serve as two discriminating functional properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates with less than 30% sequence identity to the target, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than that of the standard sequence comparison method PSI-BLAST and is comparable to that of DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone root-mean-square deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, and the associated DNA-binding sites are identified with satisfactory accuracy. Additionally, DBD-Threader correctly assigned the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large scale, we applied it to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. Comparison with existing Gene Ontology (GO) annotations suggests that approximately 30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, we estimate that approximately 20% of classic zinc finger domains play a functional role unrelated to direct DNA binding.
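The sensitivity/precision pair reported for DBD-Threader follows the usual confusion-matrix definitions: sensitivity is the share of true DNA-binding proteins that are recovered, and precision is the share of positive predictions that are correct. Below is a minimal sketch of both. The counts are hypothetical and are not the benchmark's actual counts.

```python
def sensitivity(tp, fn):
    """Share of actual positives that were predicted positive: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of positive predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical counts: 100 true DNA-binding proteins, 50 of them recovered,
# plus 10 non-binding proteins wrongly predicted as DNA-binding.
tp, fn, fp = 50, 50, 10
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"precision   = {precision(tp, fp):.2f}")
```

Because the benchmark contains far more negatives (3,797) than positives (179), precision is the more informative of the two for judging how many false annotations a genome-wide scan would produce.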
Affiliation(s)
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
959
Chung CC, Magalhaes WCS, Gonzalez-Bosquet J, Chanock SJ. Genome-wide association studies in cancer--current and future directions. Carcinogenesis 2009; 31:111-20. [PMID: 19906782 DOI: 10.1093/carcin/bgp273] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies (GWAS) have emerged as an important tool for discovering regions of the genome that harbor genetic variants conferring risk for different types of cancer. The success of GWAS over the last 3 years is due to the convergence of new technologies that can genotype hundreds of thousands of single-nucleotide polymorphism markers with comprehensive annotation of genetic variation. This approach has made it possible to scan the genome in a sufficiently large set of cases and controls, without prior hypotheses, in search of susceptibility alleles with small effect sizes. Generally, the susceptibility alleles discovered so far are common, with a frequency of >10% in one or more populations, and each allele makes a small contribution to the overall risk of disease. For nearly all regions conclusively identified by GWAS, the estimated per-allele effect sizes are <1.3. Consequently, the findings of GWAS underscore the complex nature of cancer and have focused attention on a subset of the genetic variants that make up the genomic architecture of each type of cancer; the number of associated regions already differs substantially between cancer types. For instance, in prostate cancer there could be >30 distinct regions harboring common susceptibility alleles identified by GWAS, whereas in lung cancer, a disease strongly driven by exposure to tobacco products, only three regions have been conclusively established so far. To date, >85 regions have been conclusively associated with over a dozen different cancers, yet no more than five regions have been associated with more than one distinct cancer type. GWAS are an important discovery tool that requires extensive follow-up to map each region, investigate the biological mechanism underlying the association, and eventually test the optimal markers for assessing risk of a disease or its outcome, for example in pharmacogenomics, the study of the effect of genetic variation on pharmacological interventions. The success of GWAS has opened new horizons for exploration and highlighted the complex genomic architecture of disease susceptibility.
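Case-control GWAS commonly report the per-allele effect sizes mentioned above as allelic odds ratios. The abstract does not name the exact measure, so treating them as odds ratios here is an assumption. Below is a minimal sketch that computes an allelic odds ratio and a Woolf (log-scale) 95% confidence interval from hypothetical allele counts.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Per-allele odds ratio from allele counts (two alleles per person),
    with a Woolf 95% confidence interval computed on the log scale."""
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lower, upper)


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each).
or_, ci = allelic_odds_ratio(case_risk=1_200, case_other=2_800,
                             ctrl_risk=1_000, ctrl_other=3_000)
print(f"OR = {or_:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

An odds ratio of about 1.3 corresponds to a modest shift in allele frequency between cases and controls. This is why, as the abstract notes, such loci are detectable only in very large case-control sets.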
Affiliation(s)
- Charles C Chung
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20892-4608, USA
960
Matsuzaki H, Wang PH, Hu J, Rava R, Fu GK. High resolution discovery and confirmation of copy number variants in 90 Yoruba Nigerians. Genome Biol 2009; 10:R125. [PMID: 19900272 PMCID: PMC3091319 DOI: 10.1186/gb-2009-10-11-r125] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 05/20/2009] [Revised: 09/04/2009] [Accepted: 11/09/2009] [Indexed: 01/20/2023]
Abstract
Background: Copy number variants (CNVs) account for a large proportion of genetic variation in the genome. The initial discoveries of long (> 100 kb) CNVs in normal healthy individuals were made on BAC arrays and low-resolution oligonucleotide arrays. Subsequent studies using higher-resolution microarrays and SNP genotyping arrays detected large numbers of CNVs < 100 kb, with median lengths of approximately 10 kb. More recently, whole-genome sequencing of individuals has revealed an abundance of shorter CNVs with lengths < 1 kb. Results: We used custom high-density oligonucleotide arrays in whole-genome scans at approximately 200-bp resolution. We then used a localized CNV typing array at resolutions as fine as 10 bp to confirm regions from the initial genome scans and to detect sample-level events at shorter CNV regions identified in recent whole-genome sequencing studies. We surveyed 90 Yoruba Nigerians from the HapMap Project and uncovered approximately 2,700 potentially novel CNVs, not previously reported in the literature, with a median length of approximately 3 kb. We generated sample-level event calls in the 90 Yoruba at nearly 9,000 regions, including approximately 2,500 regions with a median length of only about 200 bp that represent the union of CNVs independently discovered through whole-genome sequencing of two individuals of Western European descent. Event frequencies were noticeably higher at shorter regions (< 1 kb) than at longer CNVs (> 1 kb). Conclusions: As new, shorter CNVs are discovered through whole-genome sequencing, high-resolution microarrays offer a cost-effective means of detecting events at these regions in large numbers of individuals, yielding biological insights beyond the initial discovery.
Affiliation(s)
- Hajime Matsuzaki
- Affymetrix, Inc, 3420 Central Expressway, Santa Clara, CA 95051, USA.
961
Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P, Nazarenko I, Nilsen GB, Yeung G, Dahl F, Fernandez A, Staker B, Pant KP, Baccash J, Borcherding AP, Brownley A, Cedeno R, Chen L, Chernikoff D, Cheung A, Chirita R, Curson B, Ebert JC, Hacker CR, Hartlage R, Hauser B, Huang S, Jiang Y, Karpinchyk V, Koenig M, Kong C, Landers T, Le C, Liu J, McBride CE, Morenzoni M, Morey RE, Mutch K, Perazich H, Perry K, Peters BA, Peterson J, Pethiyagoda CL, Pothuraju K, Richter C, Rosenbaum AM, Roy S, Shafto J, Sharanhovich U, Shannon KW, Sheppy CG, Sun M, Thakuria JV, Tran A, Vu D, Zaranek AW, Wu X, Drmanac S, Oliphant AR, Banyai WC, Martin B, Ballinger DG, Church GM, Reid CA. Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays. Science 2009; 327:78-81. [DOI: 10.1126/science.1181498] [Citation(s) in RCA: 962] [Impact Index Per Article: 64.1] [Indexed: 12/14/2022]
962
Weise A, Timmermann B, Grabherr M, Werber M, Heyn P, Kosyakova N, Liehr T, Neitzel H, Konrat K, Bommer C, Dietrich C, Rajab A, Reinhardt R, Mundlos S, Lindner TH, Hoffmann K. High-throughput sequencing of microdissected chromosomal regions. Eur J Hum Genet 2009; 18:457-62. [PMID: 19888302 DOI: 10.1038/ejhg.2009.196] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 01/19/2023]
Abstract
Linking disease gene mapping with DNA sequencing is an essential strategy for defining the genetic basis of a disease. New massively parallel sequencing procedures will greatly facilitate this process, although the target region still has to be enriched before sequencing. Various DNA capture approaches that rely on sequence-defined probe sets have been described for this step. To avoid making assumptions about the sequences present in the targeted region, we accessed specific cytogenetic regions in preparation for next-generation sequencing. We directly microdissected the target region from metaphase chromosomes, amplified it by degenerate oligonucleotide-primed PCR, and obtained sufficient high-quality material for high-throughput sequencing. Sequence reads could be obtained from as few as six chromosomal fragments. The strength of cytogenetic enrichment followed by next-generation sequencing is that it does not depend on prior knowledge of the sequences in the region being studied. This method is therefore uniquely suited to situations in which a reference sequence for the region is not available, including population-specific or tumor rearrangements and previously unsequenced genomic regions such as centromeres.
Affiliation(s)
- Anja Weise
- Institute of Human Genetics and Anthropology, Jena, Germany
963
Abstract
While pharmacogenetics (the correlation of genotype with response to medicines) currently has a small but measurable impact on clinicians' prescribing practice, the advent of the 'personal genome' is likely to change this significantly. Advances in high-throughput technologies for characterizing human genetic variation, including chip-based genotyping and next-generation sequencing, are poised to provide a flood of information that will affect both pharmacogenetic discovery and the application of pharmacogenetics in clinical practice. To keep this flood of information from overwhelming researchers and clinicians alike, a variety of new and expanded information management tools will be needed. These include electronic medical records, bioinformatic algorithms for analyzing sequence data, information management systems for storing, retrieving and interpreting whole-genome sequence data, and pharmacogenetic decision tools for prescribers.
Affiliation(s)
- Michael J Wagner
- Institute for Pharmacogenomics and Individualized Therapy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-27361, USA
964
Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol 2009; 27:1025-31. [PMID: 19881494 PMCID: PMC2779736 DOI: 10.1038/nbt.1583] [Citation(s) in RCA: 313] [Impact Index Per Article: 20.9] [Received: 06/15/2009] [Accepted: 10/01/2009] [Indexed: 12/03/2022]
Abstract
Targeted sequencing of specific loci of the human genome is a promising approach for maximizing the efficiency of second-generation sequencing technologies in population-based studies of genetic variation. Here we describe microdroplet PCR, which performs 1.5 million separate amplifications in parallel, as an approach for enriching targeted sequences in the human genome. We initially designed primers to 435 exons of 47 genes selected for their broad spectrum of sequence characteristics. Using this primer set, we amplified the same six samples by both microdroplet and traditional singleplex PCR and sequenced the products on the Illumina GAII. Both methods generated similarly high-quality data: 84% of the uniquely mapping reads fell within the targeted sequences, coverage was uniform across ~90% of the targeted bases, sequence variant calls were more than 99% accurate, and reproducibility between different samples was high (r² = 0.9). We next scaled microdroplet PCR to 3,976 amplicons totaling 1.49 Mb of sequence, sequenced the resulting sample on both the Illumina GAII and Roche 454 platforms, and obtained data with equally high specificity and sensitivity. Our results demonstrate that microdroplet technology is well suited to processing DNA for massively parallel amplification of specific subsets of the human genome for targeted sequencing.
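The 84% figure is an on-target rate, the share of uniquely mapping reads that fall within the targeted sequence. Below is a minimal sketch of one way to compute such a rate. It assumes single-chromosome, half-open [start, end) coordinates and counts a read as on target only if it lies entirely inside a merged target interval. The study may have used a looser overlap criterion, and the function names and data are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
        else:
            merged.append([start, end])
    return merged


def on_target_fraction(reads, targets):
    """Fraction of reads whose [start, end) span lies entirely inside one
    merged target interval (all coordinates on the same chromosome)."""
    if not reads:
        return 0.0
    merged = merge_intervals(targets)
    starts = [s for s, _ in merged]
    hits = 0
    for r_start, r_end in reads:
        i = bisect_right(starts, r_start) - 1  # last target starting at or before the read
        if i >= 0 and r_end <= merged[i][1]:
            hits += 1
    return hits / len(reads)


# Hypothetical targets (two overlap and are merged) and four mapped reads.
targets = [(100, 300), (250, 400), (1000, 1200)]
reads = [(120, 220), (350, 450), (1050, 1150), (5000, 5100)]
print(on_target_fraction(reads, targets))
```

Merging the targets first keeps the merged intervals disjoint and sorted, so a single binary search per read is enough to find the only target that could contain it.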
965
Affiliation(s)
- Elaine R Mardis
- The Genome Center at Washington University, Washington University School of Medicine, 4444 Forest Park Blvd, St. Louis, MO 63108, USA.
- Jeantine E Lunshof
- European Centre for Public Health Genomics, Maastricht University, Maastricht, The Netherlands and Department of Molecular Cell Physiology, VU University, Amsterdam, The Netherlands
966
Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A 2009; 106:19096-101. [PMID: 19861545 DOI: 10.1073/pnas.0910672106] [Citation(s) in RCA: 915] [Impact Index Per Article: 61.0] [Indexed: 12/24/2022]
Abstract
Protein-coding genes constitute only approximately 1% of the human genome but harbor 85% of the mutations with large effects on disease-related traits. Efficient strategies for selectively sequencing complete coding regions (i.e., the "whole exome") therefore have the potential to contribute to the understanding of rare and common human diseases. Here we report a method for whole-exome sequencing that couples Roche/NimbleGen whole-exome arrays to the Illumina DNA sequencing platform. We demonstrate the ability to capture approximately 95% of the targeted coding sequences, with high sensitivity and specificity for detecting homozygous and heterozygous variants. We illustrate the utility of this approach by making an unanticipated genetic diagnosis of congenital chloride diarrhea in a patient referred with a suspected diagnosis of Bartter syndrome, a renal salt-wasting disease. The molecular diagnosis was based on the finding of a homozygous missense D652N mutation in SLC26A3 (the known congenital chloride diarrhea locus), at a position that is virtually completely conserved in orthologues and paralogues from invertebrates to humans, and clinical follow-up confirmed the diagnosis. To our knowledge, whole-exome (or genome) sequencing has not previously been used to make a genetic diagnosis. Five additional patients who were suspected to have Bartter syndrome but did not have mutations in known genes for this disease had homozygous deleterious mutations in SLC26A3. These results demonstrate the clinical utility of whole-exome sequencing and have implications for disease gene discovery and clinical diagnosis.
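Capture completeness of the kind quoted above (about 95% of targeted coding sequence) is typically measured as the fraction of target bases that reach some minimum read depth. The abstract does not give the depth threshold, so `min_depth` below is an assumed parameter. This is a minimal sketch over a single target interval that builds per-base depth with a difference array, using invented read coordinates and a hypothetical function name.

```python
def covered_fraction(reads, target_start, target_end, min_depth=1):
    """Fraction of bases in [target_start, target_end) covered by at least
    min_depth reads; reads are half-open [start, end) on the same chromosome."""
    size = target_end - target_start
    diff = [0] * (size + 1)  # diff[i] records depth changes at target offset i
    for r_start, r_end in reads:
        s = max(r_start, target_start) - target_start
        e = min(r_end, target_end) - target_start
        if s < e:  # the read overlaps the target
            diff[s] += 1
            diff[e] -= 1
    depth = covered = 0
    for i in range(size):
        depth += diff[i]  # running sum gives the depth at base i
        if depth >= min_depth:
            covered += 1
    return covered / size


# Hypothetical 100-bp target with three reads, one extending past its end.
reads = [(0, 40), (30, 70), (90, 150)]
print(covered_fraction(reads, 0, 100))               # at least 1x
print(covered_fraction(reads, 0, 100, min_depth=2))  # at least 2x
```

Raising `min_depth` makes the metric stricter. Heterozygous calls in particular need several reads per base, so a capture rate is only meaningful together with the depth threshold it was measured at.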
967
Philp AR, Jin M, Li S, Schindler EI, Iannaccone A, Lam BL, Weleber RG, Fishman GA, Jacobson SG, Mullins RF, Travis GH, Stone EM. Predicting the pathogenicity of RPE65 mutations. Hum Mutat 2009; 30:1183-8. [PMID: 19431183 DOI: 10.1002/humu.21033] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 01/13/2023]
Abstract
To help distinguish disease-causing mutations from nonpathogenic polymorphisms, we developed an objective algorithm that calculates an "estimate of pathogenic probability" (EPP) based on the prevalence of a specific variation, its segregation within families, and its predicted effects on protein structure. Eleven missense variations in the RPE65 gene were evaluated with the EPP algorithm in patients with Leber congenital amaurosis (LCA). The accuracy of the EPP algorithm was evaluated using a cell-culture assay of RPE65-isomerase activity: the variations were engineered into plasmids containing a human RPE65 cDNA, and the retinoid isomerase activity of each variant was determined in cultured cells. The EPP algorithm predicted eight substitution mutations to be disease-causing variants, and the isomerase catalytic activities of these RPE65 variants were all less than 6% of wild-type. In contrast, the EPP algorithm predicted the other three substitutions to be non-disease-causing; their isomerase activities were 68%, 127%, and 110% of wild-type, respectively. We observed complete concordance between the predicted pathogenicities of missense variations in the RPE65 gene and the retinoid isomerase activities measured in a functional assay. These results suggest that the EPP algorithm may be useful for evaluating the pathogenicity of missense variations in other disease genes for which functional assays are not available.
Affiliation(s)
- A R Philp
- Department of Ophthalmology and Visual Sciences, University of Iowa Hospitals and Clinics, Iowa City, Iowa
968
Chen CH, Chuang TJ, Liao BY, Chen FC. Scanning for the signatures of positive selection for human-specific insertions and deletions. Genome Biol Evol 2009; 1:415-9. [PMID: 20333210 PMCID: PMC2817433 DOI: 10.1093/gbe/evp041] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Accepted: 10/16/2009] [Indexed: 12/03/2022]
Abstract
Human-specific small insertions and deletions (HS indels, with lengths <100 bp) are reported to be ubiquitous in the human genome. However, whether these indels contribute to human-specific traits remains unclear. Here we employ a modified McDonald–Kreitman (MK) test and a combinatorial population genetics approach to infer, respectively, the occurrence of positive selection and recent selective sweep events associated with HS indels. We first extract 625,890 HS indels from the human–chimpanzee–macaque–mouse multiple alignments and classify them into nonpolymorphic (41%) and polymorphic (59%) indels with reference to the human indel polymorphism data. The modified MK test is then applied to partially overlapping 100-kb sliding windows across the human genome to scan for signs of positive selection. After excluding the possibility of biased gene conversion and controlling the false discovery rate, we show that HS indels are potentially positively selected in about 10 Mb of the human genome. Furthermore, the indel-associated positively selected regions overlap with genes more often than expected. However, our results suggest that the potential targets of positive selection are located in noncoding regions. Meanwhile, we also demonstrate that the genomic regions surrounding HS indels are more frequently involved in recent selective sweeps than other regions. In addition, HS indels are associated with distinct recent selective sweep events in different human subpopulations. Our results suggest that HS indels may have been associated with human adaptive changes at both the species level and the subpopulation level.
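The modified McDonald–Kreitman test mentioned above builds on the classic MK contingency table. The abstract does not say how the indel-based modification defines its categories, so the following is a minimal sketch of the classic form only. It compares fixed differences with polymorphisms in a putatively selected class versus a putatively neutral class. It then summarizes them with the neutrality index NI = (Pn/Ps)/(Dn/Ds) and assesses the 2×2 table with Fisher's exact test. The function names and the counts in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Probability that the top-left cell equals `a` in a 2x2 table whose
    first-row total is row1, first-column total is col1, grand total is n."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    total = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        if p <= p_obs * (1 + 1e-7):  # small tolerance so floating-point ties count
            total += p
    return min(1.0, total)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> tuple[float, float, float]:
    """Classic McDonald-Kreitman test (all counts must be nonzero).

    dn, ds: fixed differences in the selected and neutral classes
    pn, ps: polymorphisms in the selected and neutral classes
    Returns (neutrality index, alpha, Fisher p-value).
    """
    ni = (pn / ps) / (dn / ds)  # NI < 1: excess divergence in the selected class
    alpha = 1.0 - ni            # estimated fraction of that divergence due to positive selection
    p_value = fisher_exact_two_sided([[dn, ds], [pn, ps]])
    return ni, alpha, p_value


# Hypothetical counts for a single genomic window
ni, alpha, p = mk_test(dn=20, ds=5, pn=5, ps=20)
print(f"NI={ni:.4f} alpha={alpha:.4f} p={p:.2e}")
```

In a window scan like the one described, a test of this kind would be repeated per window and the resulting p-values corrected for multiple testing.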
969
Wendl MC, Wilson RK. The theory of discovering rare variants via DNA sequencing. BMC Genomics 2009; 10:485. [PMID: 19843339 PMCID: PMC2778663 DOI: 10.1186/1471-2164-10-485] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/20/2009] [Accepted: 10/20/2009] [Indexed: 11/18/2022]
Abstract
BACKGROUND Rare population variants are known to have important biomedical implications, but their systematic discovery has only recently been enabled by advances in DNA sequencing. The design process of a discovery project remains formidable, being limited to ad hoc mixtures of extensive computer simulation and pilot sequencing. Here, the task is examined from a general mathematical perspective. RESULTS We pose and solve the population sequencing design problem and subsequently apply standard optimization techniques that maximize the discovery probability. Emphasis is placed on cases whose discovery thresholds place them within reach of current technologies. We find that parameter values characteristic of rare-variant projects lead to a general, yet remarkably simple set of optimization rules. Specifically, optimal processing occurs at constant values of the per-sample redundancy, refuting current notions that sample size should be selected outright. Optimal project-wide redundancy and sample size are then shown to be inversely proportional to the desired variant frequency. A second family of constants governs these relationships, permitting one to immediately establish the most efficient settings for a given set of discovery conditions. Our results largely concur with the empirical design of the Thousand Genomes Project, though they furnish some additional refinement. CONCLUSION The optimization principles reported here dramatically simplify the design process and should be broadly useful as rare-variant projects become both more important and routine in the future.
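A deliberately simple toy model can reproduce two of the rules above in qualitative form. The first is an optimal per-sample redundancy that does not depend on the target frequency. The second is a total effort that scales as 1/frequency. The model is an assumption of this sketch and is not the formulation used in the paper. Each diploid sample carries the variant with Hardy-Weinberg probability 1 - (1 - f)^2. A heterozygous carrier's variant-supporting reads are Poisson with mean c/2, where c is the per-sample redundancy. A carrier counts as detected when at least k such reads are seen, and the project-wide redundancy R = n·c is held fixed. The function names and the choice k = 2 are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))


def discovery_probability(n_samples: float, c: float, freq: float, min_reads: int = 2) -> float:
    """Toy model: chance that a variant at population frequency `freq` is
    observed in >= min_reads reads in at least one of n diploid samples,
    each sequenced to per-sample redundancy c."""
    p_carrier = 1.0 - (1.0 - freq) ** 2       # Hardy-Weinberg: at least one copy
    p_seen = poisson_sf(min_reads, c / 2.0)   # heterozygote: about half the reads carry the allele
    q = p_carrier * p_seen                    # per-sample detection probability
    return 1.0 - (1.0 - q) ** n_samples


def best_redundancy(total_redundancy: float, freq: float, min_reads: int = 2) -> float:
    """Grid-search c in 0.1..20.0 for a fixed project-wide budget R = n * c,
    treating the sample count n = R / c as continuous for simplicity."""
    grid = [step / 10.0 for step in range(1, 201)]
    return max(grid, key=lambda c: discovery_probability(total_redundancy / c, c, freq, min_reads))


# Budgets scaled as 1/f so that both cases reach a similar discovery probability
for f, budget in [(0.01, 100.0), (0.001, 1000.0)]:
    c_opt = best_redundancy(budget, f)
    print(f"f={f}: best per-sample redundancy ~{c_opt}")
```

With k = 2, the best c in this toy setting sits near 3.6 for both frequencies, while the budget needed for comparable power grows tenfold as f drops tenfold. This matches the direction of the rules stated in the abstract, but it is not a substitute for the paper's derivation.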
Affiliation(s)
- Michael C Wendl
- The Genome Center and Department of Genetics, Washington University, St. Louis MO 63108, USA
- Richard K Wilson
- The Genome Center and Department of Genetics, Washington University, St. Louis MO 63108, USA
970
Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 2009; 461:809-13. [PMID: 19812674 DOI: 10.1038/nature08489] [Citation(s) in RCA: 832] [Impact Index Per Article: 55.5] [Received: 09/04/2009] [Accepted: 09/10/2009] [Indexed: 12/29/2022]
Abstract
Recent advances in next generation sequencing have made it possible to precisely characterize all somatic coding mutations that occur during the development and progression of individual cancers. Here we used these approaches to sequence the genomes (>43-fold coverage) and transcriptomes of an oestrogen-receptor-alpha-positive metastatic lobular breast cancer at depth. We found 32 somatic non-synonymous coding mutations present in the metastasis, and measured the frequency of these somatic mutations in DNA from the primary tumour of the same patient, which arose 9 years earlier. Five of the 32 mutations (in ABCB11, HAUS3, SLC24A4, SNX4 and PALB2) were prevalent in the DNA of the primary tumour removed at diagnosis 9 years earlier, six (in KIF1C, USP28, MYH8, MORC1, KIAA1468 and RNASEH2A) were present at lower frequencies (1-13%), 19 were not detected in the primary tumour, and two were undetermined. The combined analysis of genome and transcriptome data revealed two new RNA-editing events that recode the amino acid sequence of SRP9 and COG3. Taken together, our data show that single nucleotide mutational heterogeneity can be a property of low or intermediate grade primary breast cancers and that significant evolution can occur with disease progression.
971
Tewhey R, Nakano M, Wang X, Pabón-Peña C, Novak B, Giuffre A, Lin E, Happe S, Roberts DN, LeProust EM, Topol EJ, Harismendy O, Frazer KA. Enrichment of sequencing targets from the human genome by solution hybridization. Genome Biol 2009; 10:R116. [PMID: 19835619 PMCID: PMC2784331 DOI: 10.1186/gb-2009-10-10-r116] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Received: 06/17/2009] [Revised: 09/05/2009] [Accepted: 10/16/2009] [Indexed: 01/18/2023]
Abstract
A method for target sequence enrichment from the human genome is described. This hybridization-based approach using oligonucleotide probes in solution has excellent sensitivity and accuracy for calling SNPs. To fully exploit the potential of current sequencing technologies for population-based studies, one must enrich for loci from the human genome. Here we evaluate the hybridization-based approach by using oligonucleotide capture probes in solution to enrich for approximately 3.9 Mb of sequence target. We demonstrate that the tiling probe frequency is important for generating sequence data with high, uniform coverage of targets. We obtained 93% sensitivity to detect SNPs, with a calling accuracy greater than 99%.
Affiliation(s)
- Ryan Tewhey
- Scripps Genomic Medicine, Scripps Translational Science Institute, The Scripps Research Institute, 3344 N Torrey Pines Court, La Jolla, CA 92037, USA.
972
Abstract
The emergence of massively parallel DNA sequencing platforms has made resequencing an affordable approach to study genetic variation. However, the cost of whole genome resequencing remains too high to apply to large numbers of human samples. Genomic partitioning methods allow enrichment for regions of interest at a scale that is matched to the throughput of the new sequencing platforms. We review general categories of methods for genomic partitioning including multiplex PCR, capture-by-circularization, and capture-by-hybridization. Parameters that are relevant to the performance of any given method include multiplexity, specificity, uniformity, input requirements, scalability, and cost. The successful development of genomic partitioning strategies will be key to taking full advantage of massively parallel sequencing, at least until resequencing of complete mammalian genomes becomes widely affordable.
Affiliation(s)
- Emily H Turner
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, USA.
973
Wiseman RW, Karl JA, Bimber BN, O'Leary CE, Lank SM, Tuscher JJ, Detmer AM, Bouffard P, Levenkova N, Turcotte CL, Szekeres E, Wright C, Harkins T, O'Connor DH. Major histocompatibility complex genotyping with massively parallel pyrosequencing. Nat Med 2009; 15:1322-6. [PMID: 19820716 DOI: 10.1038/nm.2038] [Citation(s) in RCA: 118] [Impact Index Per Article: 7.9] [Received: 01/25/2009] [Accepted: 05/17/2009] [Indexed: 11/10/2022]
Abstract
Major histocompatibility complex (MHC) genetics dictate adaptive cellular immune responses, making robust MHC genotyping methods essential for studies of infectious disease, vaccine development and transplantation. Nonhuman primates provide essential preclinical models for these areas of biomedical research. Unfortunately, given the unparalleled complexity of macaque MHCs, existing methodologies are inadequate for MHC typing of these key model animals. Here we use pyrosequencing of complementary DNA-PCR amplicons as a general approach to determine comprehensive MHC class I genotypes in nonhuman primates. More than 500 unique MHC class I sequences were resolved by sequence-based typing of rhesus, cynomolgus and pig-tailed macaques, nearly half of which have not been reported previously. The remarkable sensitivity of this approach in macaques demonstrates that pyrosequencing is viable for ultra-high-throughput MHC genotyping of primates, including humans.
Affiliation(s)
- Roger W Wiseman
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, Wisconsin, USA
974
Origins and functional impact of copy number variation in the human genome. Nature 2009; 464:704-12. [PMID: 19812545 DOI: 10.1038/nature08516] [Citation(s) in RCA: 1390] [Impact Index Per Article: 92.7] [Received: 08/14/2009] [Accepted: 09/21/2009] [Indexed: 02/07/2023]
Abstract
Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
975
Gilad Y, Pritchard JK, Thornton K. Characterizing natural variation using next-generation sequencing technologies. Trends Genet 2009; 25:463-71. [PMID: 19801172 DOI: 10.1016/j.tig.2009.09.003] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Received: 08/16/2009] [Revised: 09/08/2009] [Accepted: 09/09/2009] [Indexed: 01/22/2023]
Abstract
Progress in evolutionary genomics is tightly coupled with the development of new technologies to collect high-throughput data. The availability of next-generation sequencing technologies has the potential to revolutionize genomic research and enable us to focus on a large number of outstanding questions that previously could not be addressed effectively. Indeed, we are now able to study genetic variation on a genome-wide scale, characterize gene regulatory processes at unprecedented resolution, and soon, we expect that individual laboratories might be able to rapidly sequence new genomes. However, at present, the analysis of next-generation sequencing data is challenging, in particular because most sequencing platforms provide short reads, which are difficult to align and assemble. In addition, little is known about the sources of variation that are associated with next-generation sequencing study designs. A better understanding of the sources of error and bias in sequencing data is essential, especially in the context of studies of variation at dynamic quantitative traits.
Affiliation(s)
- Yoav Gilad
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
976
Ali-Khan SE, Daar AS, Shuman C, Ray PN, Scherer SW. Whole genome scanning: resolving clinical diagnosis and management amidst complex data. Pediatr Res 2009; 66:357-63. [PMID: 19531980 DOI: 10.1203/pdr.0b013e3181b0cbd8] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 02/06/2023]
Abstract
Momentum around the era of genomic medicine is building, and with it, anticipation of the benefits that whole genome analysis (personalized or individualized genomics) will bring for the provision of health care. These technologies have the potential to revolutionize genetic diagnosis; however, the expansive data generated can lead to complex or unexpected findings, sometimes complicating clinical utility and patient benefit. Here, we use our experience with whole genome scanning microarrays, an early instance of whole genome analysis already in clinical use, to highlight fundamental challenges raised by these technologies and to discuss their medical, ethical, legal and social implications. We discuss issues that physicians and healthcare professionals will face, in particular, as the resolution of testing further increases toward full genome sequence determination. We emphasize that addressing these issues now, and starting to evolve our healthcare systems in response, will be pivotal in avoiding harms and realizing the promise of these new technologies.
Affiliation(s)
- Sarah E Ali-Khan
- McLaughlin-Rotman Centre for Global Health, University Health Network and University of Toronto, Toronto, Ontario M5G 1L7, Canada
977
Tang J, Li Y, Pan Z, Guo Y, Ma J, Ning S, Xiao P, Lu Z. Single nucleotide variation detection by ligation of universal probes on a 3D polyacrylamide gel DNA microarray. Hum Mutat 2009; 30:1460-8. [DOI: 10.1002/humu.21080] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
978
Liu P, Mathies RA. Integrated microfluidic systems for high-performance genetic analysis. Trends Biotechnol 2009; 27:572-81. [DOI: 10.1016/j.tibtech.2009.07.002] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Received: 04/24/2009] [Revised: 06/30/2009] [Accepted: 07/02/2009] [Indexed: 01/09/2023]
979
Experimental therapies in hypertrophic cardiomyopathy. J Cardiovasc Transl Res 2009; 2:483-92. [PMID: 20560006 DOI: 10.1007/s12265-009-9132-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 07/01/2009] [Accepted: 09/16/2009] [Indexed: 12/31/2022]
Abstract
The quintessential clinical diagnostic phenotype of human hypertrophic cardiomyopathy (HCM) is primary cardiac hypertrophy. Cardiac hypertrophy is also a major determinant of mortality and morbidity, including the risk of sudden cardiac death (SCD) in patients with HCM. Reversal and attenuation of cardiac hypertrophy and its accompanying fibrosis are expected to improve morbidity as well as decrease the risk of SCD in patients with HCM. The conventionally used pharmacological agents in treatment of patients with HCM have not been shown to reverse or attenuate established cardiac hypertrophy and fibrosis. An effective treatment of HCM has to target the molecular mechanisms that are involved in the pathogenesis of the phenotype. Mechanistic studies suggest that cardiac hypertrophy in HCM is secondary to activation of various hypertrophic signaling molecules and, hence, is potentially reversible. The hypothesis is supported by the results of genetic and pharmacological interventions in animal models. The results have shown potential beneficial effects of the angiotensin II receptor blocker losartan, the mineralocorticoid receptor blocker spironolactone, the 3-hydroxy-3-methylglutaryl-coenzyme A reductase inhibitors simvastatin and atorvastatin, and most recently, N-acetylcysteine (NAC) on reversal or prevention of hypertrophy and fibrosis in HCM. The most promising results have been obtained with NAC, which through multiple thiol-responsive mechanisms completely reversed established cardiac hypertrophy and fibrosis in three independent studies. Pilot studies with losartan and statins in humans have established the feasibility of such studies. The results in animal models have firmly established the reversibility of established cardiac hypertrophy and fibrosis in HCM and have set the stage for advancing the findings in the animal models to human patients with HCM through conducting large-scale efficacy studies.
980
Kim JI, Ju YS, Kim SH, Hong DW, Seo JS. Detection of hydin Gene Duplication in Personal Genome Sequence Data. Genomics Inform 2009. [DOI: 10.5808/gi.2009.7.3.159] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
981
Triplet repeat length bias and variation in the human transcriptome. Proc Natl Acad Sci U S A 2009; 106:17095-100. [PMID: 19805156 DOI: 10.1073/pnas.0907112106] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 02/01/2023]
Abstract
Length variation in short tandem repeats (STRs) is an important family of DNA polymorphisms with numerous applications in genetics, medicine, forensics, and evolutionary analysis. Several major diseases have been associated with length variation of trinucleotide (triplet) repeats, including Huntington's disease, hereditary ataxias and spinobulbar muscular atrophy. Using the reference human genome, we have catalogued all triplet repeats in genic regions. These data revealed a bias in noncoding DNA repeat lengths. They also enabled a survey of repeat-length polymorphisms (RLPs) in human genomes and a comparison of the rate of polymorphism in humans versus divergence from chimpanzee. For short repeats, this analysis of three human genomes reveals a relatively low RLP rate in exons and, somewhat surprisingly, in introns. All short RLPs observed in multiple genomes are biallelic (at least in this small sample). In contrast, long repeats are highly polymorphic and some long RLPs are multiallelic. For long repeats, the chimpanzee sequence frequently differs from all observed human alleles. This suggests a high expansion/contraction rate in all long repeats. However, our comparison of human-chimpanzee divergence with human RLPs reveals no discernible effect of natural selection on these expansions and contractions. Our catalog of human triplet repeats and their surrounding flanking regions can be used to produce a cost-effective whole-genome assay to test individuals. This repeat assay could someday complement SNP arrays for producing tests that assess the risk of an individual to develop a disease, or become part of a personalized genomic strategy that provides therapeutic guidance with respect to drug response.
982
Extracting evidence from forensic DNA analyses: future molecular biology directions. Biotechniques 2009; 46:339-40, 342-50. [PMID: 19480629 DOI: 10.2144/000113136] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 11/23/2022]
Abstract
Molecular biology tools have enhanced the capability of the forensic scientist to characterize biological evidence to the point where it is feasible to analyze minute samples and achieve high levels of individualization. Even with the forensic DNA field's maturity, there still are a number of areas where improvements can be made. These include: enabling the typing of samples of limited quantity and quality; using genetic information and novel markers to provide investigative leads; enhancing automation with robotics, different chemistries, and better software tools; employing alternate platforms for typing DNA samples; developing integrated microfluidic/microfabrication devices to process DNA samples with higher throughput, faster turnaround times, lower risk of contamination, reduced labor, and less consumption of evidentiary samples; and exploiting high-throughput sequencing, particularly for attribution in microbial forensics cases. Knowledge gaps and new directions have been identified where molecular biology will likely guide the field of forensics. This review aims to provide a roadmap to guide those interested in contributing to the further development of forensic genetics.
983
Qi W, Käser M, Röltgen K, Yeboah-Manu D, Pluschke G. Genomic diversity and evolution of Mycobacterium ulcerans revealed by next-generation sequencing. PLoS Pathog 2009; 5:e1000580. [PMID: 19806175 PMCID: PMC2736377 DOI: 10.1371/journal.ppat.1000580] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Received: 04/21/2009] [Accepted: 08/17/2009] [Indexed: 12/02/2022]
Abstract
Mycobacterium ulcerans is the causative agent of Buruli ulcer, the third most common mycobacterial disease after tuberculosis and leprosy. It is an emerging infectious disease that afflicts mainly children and youths in West Africa. Little is known about the evolution and transmission mode of M. ulcerans, partially due to the lack of known genetic polymorphisms among isolates, limiting the application of genetic epidemiology. To systematically profile single nucleotide polymorphisms (SNPs), we sequenced the genomes of three M. ulcerans strains using 454 and Solexa technologies. Comparison with the reference genome of the Ghanaian classical lineage isolate Agy99 revealed 26,564 SNPs in a Japanese strain representing the ancestral lineage. Only 173 SNPs were found when comparing Agy99 with two other Ghanaian isolates, which belong to the two other types previously distinguished in Ghana by variable number tandem repeat typing. We further analyzed a collection of Ghanaian strains using the SNPs discovered. With 68 SNP loci, we were able to differentiate 54 strains into 13 distinct SNP haplotypes. The average SNP nucleotide diversity was low (average 0.06–0.09 across 68 SNP loci), and 96% of the SNP locus pairs were in complete linkage disequilibrium. We estimated that the divergence of the M. ulcerans Ghanaian clade from the Japanese strain occurred 394 to 529 thousand years ago. The Ghanaian subtypes diverged about 1000 to 3000 years ago, or even much more recently, because we found evidence that they evolved significantly faster than average. Our results offer significant insight into the evolution of M. ulcerans and provide a comprehensive report on genetic diversity within a highly clonal M. ulcerans population from a Buruli ulcer endemic region, which can facilitate further epidemiological studies of this pathogen through the development of high-resolution tools. 
Mycobacterium ulcerans is the causative agent of Buruli ulcer (BU), a necrotizing skin disease and the third most common mycobacterial disease after tuberculosis and leprosy. It is an emerging infectious disease that afflicts mainly children and youths in West Africa. The disease is also found in tropical and subtropical regions of Asia, the Western Pacific, and Latin America. Limited knowledge of this neglected tropical disease is partially due to the lack of known genetic polymorphisms among isolates, which hinders the study of transmission, epidemiology, and evolution of M. ulcerans. Our aim is to systematically profile genetic diversity among M. ulcerans isolates by sequencing and comparing the genomes of selected strains. We identified single nucleotide polymorphisms (SNPs) within a highly clonal M. ulcerans population from a Buruli ulcer endemic region. Based on the SNPs discovered, we developed SNP typing assays and were able to differentiate a collection of M. ulcerans isolates from this Buruli ulcer endemic region into 13 SNP haplotypes. Our results lay the ground for developing a highly discriminatory and cost-effective tool to study M. ulcerans evolution and epidemiology at a population level.
Affiliation(s)
- Weihong Qi
- Department of Medical Parasitology and Infection Biology, Swiss Tropical Institute, Basel, Switzerland
- Michael Käser
- Department of Medical Parasitology and Infection Biology, Swiss Tropical Institute, Basel, Switzerland
- Katharina Röltgen
- Department of Medical Parasitology and Infection Biology, Swiss Tropical Institute, Basel, Switzerland
- Dorothy Yeboah-Manu
- Department of Bacteriology, Noguchi Memorial Institute for Medical Research, University of Ghana, Legon, Ghana
- Gerd Pluschke
- Department of Medical Parasitology and Infection Biology, Swiss Tropical Institute, Basel, Switzerland
984
Hodoglugil U, Williamson DW, Mahley RW. Polymorphisms in the hepatic lipase gene affect plasma HDL-cholesterol levels in a Turkish population. J Lipid Res 2009; 51:422-30. [PMID: 19734193 DOI: 10.1194/jlr.p001578] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/20/2022]
Abstract
We investigated the effects of single nucleotide polymorphisms (SNPs) of the hepatic lipase gene (LIPC) on plasma HDL-cholesterol (HDL-C) levels in Turks, a population with low levels of HDL-C. All exons and six evolutionarily conserved regions from 28 Turkish subjects were sequenced. We found 51 SNPs, nine of which were novel. Those 51 SNPs and SNPs from the National Center for Biotechnology Information dbSNP were evaluated by bioinformatics approaches. The population frequencies and linkage disequilibrium among SNPs from HapMap were combined with results from transcriptional factor prediction tools and the literature to select SNPs for genotyping. We found that five tagging LIPC SNPs, two reported here for the first time, were significantly associated with plasma HDL-C levels in both men and women (n = 2,612). These results were replicated in a separate Turkish cohort (n = 1,164). Plasma HDL-C levels were higher in subjects homozygous for the minor alleles of rs4775041, rs1800588 (-514C>T), and rs11858164 and lower in subjects homozygous for the minor alleles of rs11856322 and rs2242061. These SNPs seemed to have independent and additive effects on plasma HDL-C levels (1.5-5.2 mg/dl). Hepatic lipase activity in a subset (n = 260) of the main cohort was also significantly associated with all five SNPs. Thus, five LIPC SNPs, two novel, are associated with plasma HDL-C levels and hepatic lipase activity in two cohorts of Turkish subjects.
Affiliation(s)
- Ugur Hodoglugil
- Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, CA, USA
985
Abstract
The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence with chimp. Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
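The closing estimate is a direct proportion. Using the abstract's own figures, the arithmetic is:

```latex
\[
  0.075\,\% \times 24{,}000 = 0.00075 \times 24{,}000 = 18
\]
```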
986
Yngvadottir B, Macarthur DG, Jin H, Tyler-Smith C. The promise and reality of personal genomics. Genome Biol 2009; 10:237. [PMID: 19723346 PMCID: PMC2768970 DOI: 10.1186/gb-2009-10-9-237] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
The publication of the highest-quality and best-annotated personal genome yet tells us much about sequencing technology, something about genetic ancestry, but still little of medical relevance.
Affiliation(s)
- Bryndis Yngvadottir
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
987
Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS, McGrath SD, Wendl MC, Zhang Q, Locke DP, Shi X, Fulton RS, Ley TJ, Wilson RK, Ding L, Mardis ER. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 2009; 6:677-81. [PMID: 19668202 PMCID: PMC3661775 DOI: 10.1038/nmeth.1363] [Citation(s) in RCA: 1017] [Impact Index Per Article: 67.8] [Received: 04/06/2009] [Accepted: 07/13/2009] [Indexed: 11/09/2022]
Abstract
Detection and characterization of genomic structural variation are important for understanding the landscape of genetic variation in human populations and in complex diseases such as cancer. Recent studies demonstrate the feasibility of detecting structural variation using next-generation, short-insert, paired-end sequencing reads. However, the utility of these reads is not entirely clear, nor are the analysis methods with which accurate detection can be achieved. The algorithm BreakDancer predicts a wide variety of structural variants including insertion-deletions (indels), inversions and translocations. We examined BreakDancer's performance in simulation, in comparison with other methods and in analyses of a sample from an individual with acute myeloid leukemia and of samples from the 1,000 Genomes trio individuals. BreakDancer sensitively and accurately detected indels ranging from 10 base pairs to 1 megabase pair that are difficult to detect via a single conventional approach.
Affiliation(s)
- Ken Chen
- The Genome Center, Washington University School of Medicine, St. Louis, Missouri, USA.
988
Barr CS. Strategies for performing genotype-phenotype association studies in nonhuman primates. Methods 2009; 49:56-62. [PMID: 19505576 PMCID: PMC2739376 DOI: 10.1016/j.ymeth.2009.05.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/17/2009] [Revised: 05/20/2009] [Accepted: 05/22/2009] [Indexed: 01/21/2023]
Abstract
Anthropoid primate models offer opportunities to study genetic influence on alcohol consumption and alcohol-related intermediate phenotypes in socially and behaviorally complex animal models that are closely related to humans, and in which functionally equivalent or orthologous genetic variants exist. This review will discuss the methods commonly used for performing candidate gene-based studies in rhesus macaques in order to model how functional genetic variation moderates risk for human psychiatric disorders. Various in silico and in vitro approaches to identifying functional genetic variants for performance of these studies will be discussed. Next, I will provide examples of how this approach can be used for performing candidate gene-based studies and for examining gene by environment interactions. Finally, these approaches will be placed in the context of how function-guided studies can inform us of genetic variants that may be under selection across species, demonstrating how functional genetic variants that may have conferred selective advantage at some point in the evolutionary history of humans could increase risk for addictive disorders in modern society.
Affiliation(s)
- Christina S Barr
- Laboratories of Neurogenetics and Clinical and Translational Studies, NIH/NIAAA, 5625 Fishers Lane, Rm. 3S-32, Rockville, MD 20852, USA.
989
Leser TD, Mølbak L. Better living through microbial action: the benefits of the mammalian gastrointestinal microbiota on the host. Environ Microbiol 2009; 11:2194-206. [DOI: 10.1111/j.1462-2920.2009.01941.x] [Citation(s) in RCA: 207] [Impact Index Per Article: 13.8] [Indexed: 12/17/2022]
990
Honey K. Tales from the gene pool: a genomic view of infectious disease. J Clin Invest 2009; 119:2452-4. [PMID: 19729842 DOI: 10.1172/jci40662] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/30/2022]
Abstract
Research into the pathogenesis, prevention, and control of infectious and parasitic diseases remains a global priority, as these scourges continue to be a substantial cause of mortality and morbidity. The plethora of molecular tools that are now readily available has facilitated a genome-wide approach to studying the pathogenesis of such diseases, with direct implications for disease prevention and treatment. The articles in this Review Series describe how genome-wide approaches have provided insight into a range of human pathogens, leading to greater understanding of the human diseases that they cause, and highlight some of the challenges that must be overcome if we are to maximize what we learn from the wealth of genomic information now available.
Affiliation(s)
- Karen Honey
- The Journal of Clinical Investigation, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
991
Morozova O, Hirst M, Marra MA. Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 2009; 10:135-51. [PMID: 19715439 DOI: 10.1146/annurev-genom-082908-145957] [Citation(s) in RCA: 340] [Impact Index Per Article: 22.7] [Indexed: 11/09/2022]
Abstract
Transcriptome analysis has been a key area of biological inquiry for decades. Over the years, research in the field has progressed from candidate gene-based detection of RNAs using Northern blotting to high-throughput expression profiling driven by the advent of microarrays. Next-generation sequencing technologies have revolutionized transcriptomics by providing opportunities for multidimensional examinations of cellular transcriptomes in which high-throughput expression data are obtained at a single-base resolution.
Affiliation(s)
- Olena Morozova
- BC Cancer Agency, Genome Sciences Center, Vancouver, BC V5Z 4S6, Canada.
992
Vandiedonck C, Knight JC. The human Major Histocompatibility Complex as a paradigm in genomics research. Brief Funct Genomic Proteomic 2009; 8:379-94. [PMID: 19468039 PMCID: PMC2987720 DOI: 10.1093/bfgp/elp010] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Indexed: 01/20/2023]
Abstract
Since its discovery more than 50 years ago, the human Major Histocompatibility Complex (MHC) on chromosome 6p21.3 has been at the forefront of human genetic research. Here, we review from a historical perspective the major advances in our understanding of the nature and consequences of genetic variation which have involved the MHC, as well as highlighting likely future directions. As a consequence of its particular genomic structure, its remarkable polymorphism and its early implication in numerous diseases, the MHC has been considered a model region for genomics, being the first substantial region to be sequenced and establishing fundamental concepts of linkage disequilibrium, haplotypic structure and meiotic recombination. Recently, the MHC became the first genomic region to be entirely re-sequenced for common haplotypes, while studies mapping gene expression phenotypes across the genome have strongly implicated variation in the MHC. This review shows how the MHC continues to provide new insights and remains in the vanguard of contemporary research in human genomics.
Affiliation(s)
- Claire Vandiedonck
- Wellcome Trust Centre for Human Genetics (WTCHG), University of Oxford, Oxford, UK.
993
994
Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 2009; 41:1061-7. [PMID: 19718026 PMCID: PMC2875196 DOI: 10.1038/ng.437] [Citation(s) in RCA: 486] [Impact Index Per Article: 32.4] [Received: 04/16/2009] [Accepted: 07/23/2009] [Indexed: 12/18/2022]
Abstract
Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable due to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads allowing for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy-number differences. We estimate that 73–87 genes will be on average copy-number variable between two human genomes and find that these genic differences overwhelmingly correspond to segmental duplications (OR=135; p<2.2e-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate census of gene content and insight into functional constraint without the limitations of array-based technology.
995
Boyle B, Dallaire N, MacKay J. Evaluation of the impact of single nucleotide polymorphisms and primer mismatches on quantitative PCR. BMC Biotechnol 2009; 9:75. [PMID: 19715565 PMCID: PMC2741440 DOI: 10.1186/1472-6750-9-75] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Received: 12/17/2008] [Accepted: 08/28/2009] [Indexed: 11/25/2022]
Abstract
Background: Robust designs of PCR-based molecular diagnostic assays rely on the discrimination potential of sequence variants affecting primer-to-template annealing. However, for accurate quantitative PCR (qPCR) assessment of gene expression in populations with gene polymorphisms, the effects of sequence variants within primer binding sites must be minimized. This dichotomy in PCR applications prompted us to design experiments to specifically address the quantitative nature of PCR amplifications with oligonucleotides containing mismatches.
Results: We performed qPCR reactions with several primer-target combinations and calculated ratios of molecules obtained with mismatch oligonucleotides to the average obtained with perfect match primer pairs. Amplifications were performed with genomic DNA and complementary DNA samples from different genotypes to validate the findings obtained with plasmid DNA. Our results demonstrate that PCR amplifications are driven by probabilities of oligonucleotides annealing to target sequences. Empiric probabilities can be measured for any primer pair. Alternatively, for primers containing mismatches, probabilities can be measured for individual primers and calculated for primer pairs.
Conclusion: The ability to evaluate priming (and mispriming) rates and to predict their impacts provided a precise and quantitative description of assay performance. Priming probabilities were also found to be a good measure of analytical specificity.
Affiliation(s)
- Brian Boyle
- Centre d'Etude de la Forêt, Institut de biologie intégrative et des systèmes, Pavillon Charles-Eugène-Marchand, Université Laval, Quebec City, QC G1V 0A6, Canada.
996
Franzén O, Jerlström-Hultqvist J, Castro E, Sherwood E, Ankarklev J, Reiner DS, Palm D, Andersson JO, Andersson B, Svärd SG. Draft genome sequencing of Giardia intestinalis assemblage B isolate GS: is human giardiasis caused by two different species? PLoS Pathog 2009; 5:e1000560. [PMID: 19696920 PMCID: PMC2723961 DOI: 10.1371/journal.ppat.1000560] [Citation(s) in RCA: 192] [Impact Index Per Article: 12.8] [Received: 04/03/2009] [Accepted: 07/27/2009] [Indexed: 01/05/2023]
Abstract
Giardia intestinalis is a major cause of diarrheal disease worldwide, and two major Giardia genotypes, assemblages A and B, infect humans. The genome of the assemblage A parasite WB was recently sequenced, and the structurally compact 11.7 Mbp genome contains simplified basic cellular machineries and metabolism. Here we performed 454 sequencing to 16× coverage of the assemblage B isolate GS, the only Giardia isolate successfully used to experimentally infect animals and humans. The two genomes show 77% nucleotide and 78% amino-acid identity in protein coding regions. Comparative analysis identified 28 unique GS and 3 unique WB protein coding genes, and the variable surface protein (VSP) repertoires of the two isolates are completely different. The promoters of several enzymes involved in the synthesis of the cyst wall lack binding sites for encystation-specific transcription factors in GS. Several synteny breaks were detected and verified. The tetraploid GS genome shows higher levels of overall allelic sequence polymorphism (0.5% versus <0.01% in WB). The genomic differences between WB and GS may explain some of the observed biological and clinical differences between the two isolates, and they suggest that Giardia assemblages A and B may be two different species.
Giardia intestinalis is a major contributor to the enormous burden of diarrheal diseases, with 250 million symptomatic infections per year, and it is part of the WHO neglected disease initiative. Nonetheless, there is poor insight into how Giardia causes disease: it is not invasive, secretes no known toxin, and both the duration and symptoms of giardiasis are highly variable. Currently, there are seven defined variants (assemblages) of G. intestinalis, of which only assemblages A and B are known to infect humans. Although assemblage B is the most prevalent worldwide, it remains unclear whether the various genotypes are associated with different disease outcomes. We have used the 454 sequencing technology to sequence the first assemblage B isolate, and the genome was compared to the earlier sequenced assemblage A isolate. Large genetic differences were detected in genes involved in survival of the parasite during infections. The genomic differences between assemblages A and B can explain some of the observed biological and clinical differences between the two assemblages. Our data suggest that Giardia assemblages A and B may be two different species. Identifying genomic differences between assemblages is important for further studies of the disease and for the development of new methods for the diagnosis and treatment of giardiasis.
Affiliation(s)
- Oscar Franzén
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Elsie Castro
- Centre for Microbiological Preparedness, Swedish Institute for Infectious Disease Control, Solna, Sweden
- Ellen Sherwood
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Johan Ankarklev
- Department of Cell and Molecular Biology, BMC, Uppsala University, Uppsala, Sweden
- David S. Reiner
- The Burnham Institute for Medical Research, La Jolla, California, United States of America
- Daniel Palm
- Centre for Microbiological Preparedness, Swedish Institute for Infectious Disease Control, Solna, Sweden
- Jan O. Andersson
- Department of Evolution, Genomics and Systematics, EBC, Uppsala University, Uppsala, Sweden
- Björn Andersson
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Staffan G. Svärd
- Department of Cell and Molecular Biology, BMC, Uppsala University, Uppsala, Sweden
997
Ultra-Structure database design methodology for managing systems biology data and analyses. BMC Bioinformatics 2009; 10:254. [PMID: 19691849 PMCID: PMC2748085 DOI: 10.1186/1471-2105-10-254] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/19/2009] [Accepted: 08/19/2009] [Indexed: 11/22/2022]
Abstract
Background: Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogeneous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping).
Results: We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research.
Conclusion: We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era.
998
Detection of genomic variation by selection of a 9 Mb DNA region and high throughput sequencing. PLoS One 2009; 4:e6659. [PMID: 19684856 PMCID: PMC2722027 DOI: 10.1371/journal.pone.0006659] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/06/2009] [Accepted: 07/26/2009] [Indexed: 11/19/2022]
Abstract
Detection of rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥4-fold, and 97.9% concordant in regions with coverage ≥15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15- to 20-fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.
999
Dickins B, Nekrutenko A. High-resolution mapping of evolutionary trajectories in a phage. Genome Biol Evol 2009; 1:294-307. [PMID: 20333199 PMCID: PMC2817424 DOI: 10.1093/gbe/evp029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Accepted: 07/29/2009] [Indexed: 12/11/2022]
Abstract
Experimental evolution in rapidly reproducing viruses offers a robust means to infer substitution trajectories during evolution. But with conventional approaches, this inference is limited by how many individual genotypes can be sampled from the population at a time. Low-frequency changes are difficult to detect, potentially rendering early stages of adaptation unobservable. Here we circumvent this using short-read sequencing technology in a fine-grained analysis of polymorphism dynamics in the sentinel organism: a single-stranded DNA phage PhiX174. Nucleotide differences were educed from noise with binomial filtering methods that harnessed quality scores and separate data from brief phage amplifications. Remarkably, a significant degree of variation was observed in all samples including those grown in brief 2-h cultures. Sites previously reported as subject to high-frequency polymorphisms over a course of weeks exhibited monotonic increases in polymorphism frequency within hours in this study. Additionally, even with limitations imposed by the short length of sequencing reads, we were able to observe statistically significant linkage among polymorphic sites in evolved lineages. Additional parallels between replicate lineages were apparent in the sharing of polymorphic sites and in correlated polymorphism frequencies. Missense mutations were more likely to occur than silent mutations. This study offers the first glimpse into "real-time" substitution dynamics and offers a robust conceptual framework for future viral resequencing studies.
Affiliation(s)
- Benjamin Dickins
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, USA.
1000
Varshney RK, Nayak SN, May GD, Jackson SA. Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol 2009; 27:522-30. [PMID: 19679362 DOI: 10.1016/j.tibtech.2009.05.006] [Citation(s) in RCA: 401] [Impact Index Per Article: 26.7] [Received: 03/05/2009] [Revised: 05/21/2009] [Accepted: 05/27/2009] [Indexed: 10/20/2022]
Abstract
Using next-generation sequencing technologies it is possible to resequence entire plant genomes or sample entire transcriptomes more efficiently and economically and in greater depth than ever before. Rather than sequencing individual genomes, we envision the sequencing of hundreds or even thousands of related genomes to sample genetic diversity within and between germplasm pools. Identification and tracking of genetic variation are now so efficient and precise that thousands of variants can be tracked within large populations. In this review, we outline some important areas such as the large-scale development of molecular markers for linkage mapping, association mapping, wide crosses and alien introgression, epigenetic modifications, transcript profiling, population genetics and de novo genome/organellar genome assembly for which these technologies are expected to advance crop genetics and breeding, leading to crop improvement.
Affiliation(s)
- Rajeev K Varshney
- Centre of Excellence in Genomics (CEG), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru 502324, A.P., India.