51
Teer JK, Johnston JJ, Anzick SL, Pineda M, Stone G, Meltzer PS, Mullikin JC, Biesecker LG. Massively-parallel sequencing of genes on a single chromosome: a comparison of solution hybrid selection and flow sorting. BMC Genomics 2013; 14:253. [PMID: 23586822 PMCID: PMC3637801 DOI: 10.1186/1471-2164-14-253] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/05/2012] [Accepted: 03/20/2013] [Indexed: 11/10/2022]
Abstract
Background Targeted capture, combined with massively-parallel sequencing, is a powerful technique that allows investigation of specific portions of the genome for less cost than whole genome sequencing. Several methods have been developed, and improvements have resulted in commercial products targeting the human or mouse exonic regions (the exome). In some cases it is desirable to custom-target other regions of the genome, either to reduce the amount of sequence that is targeted or to capture regions that are not targeted by commercial kits. It is important to understand the advantages, limitations, and complexity of a given capture method before embarking on a targeted sequencing experiment. Results We compared two custom targeted capture methods suitable for single chromosome analysis: Solution Hybrid Selection (SHS) and Flow Sorting (FS) of single chromosomes. Both methods can capture targeted material and result in high percentages of genotype identifications across these regions: 59-92% for SHS and 70-79% for FS. FS is amenable to current structural variation detection methods, and variants were detected. Structural variation was also assessed for SHS samples with paired end sequencing, resulting in variant identification. Conclusions While both methods can effectively target genomic regions for genotype determination, several considerations make each method appropriate in different circumstances. SHS is well suited for experiments targeting smaller regions in a larger number of samples. FS is well suited when regions of interest cover large regions of a single chromosome. Although whole genome sequencing is becoming less expensive, the sequencing, data storage, and analysis costs make targeted sequencing using SHS or FS a compelling option.
Affiliation(s)
- Jamie K Teer
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
52
Lee S, Chugh PE, Shen H, Eberle R, Dittmer DP. Poisson factor models with applications to non-normalized microRNA profiling. Bioinformatics 2013; 29:1105-11. [PMID: 23428639 DOI: 10.1093/bioinformatics/btt091] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION Next-generation (NextGen) sequencing is becoming increasingly popular as an alternative for transcriptional profiling, as is the case for microRNA (miRNA) profiling and classification. miRNAs are a new class of molecules that are regulated in response to differentiation, tumorigenesis or infection. Our primary motivating application is to identify different viral infections based on the induced change in the host miRNA profile. Statistical challenges are encountered because of special features of NextGen sequencing data: the data are read counts that are extremely skewed and non-negative, and the total number of reads varies dramatically across samples, which requires appropriate normalization. Statistical tools developed for microarray expression data, such as principal component analysis, are sub-optimal for analyzing NextGen sequencing data. RESULTS We propose a family of Poisson factor models that explicitly takes into account the count nature of sequencing data and automatically incorporates sample normalization through the use of offsets. We develop an efficient algorithm for estimating the Poisson factor model, entitled Poisson Singular Value Decomposition with Offset (PSVDOS). The method is shown to outperform several other normalization and dimension reduction methods in a simulation study. Through analysis of an miRNA profiling experiment, we further illustrate that our model achieves insightful dimension reduction of the miRNA profiles of 18 samples: the extracted factors lead to more accurate and meaningful clustering of the cell lines. AVAILABILITY The PSVDOS software is available on request.
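The central modelling idea in this abstract, absorbing sequencing depth as a log-offset in a Poisson model instead of normalizing the counts beforehand, can be illustrated with a much simpler stand-in. The sketch below is not PSVDOS (the abstract states that software is available only on request): it fits only the offset-only Poisson model, whose maximum-likelihood fit has a closed form, and then takes the leading singular vector of the Pearson residuals by power iteration as a one-step approximation to a rank-1 factor. The function name `offset_residual_scores` and the toy counts are hypothetical.

```python
import math
import random


def offset_residual_scores(counts, iters=100, seed=0):
    """Sample scores from a rank-1 summary of Poisson residuals with offsets.

    counts: list of rows (features, e.g. miRNAs) by columns (samples).
    Hypothetical helper; a one-step approximation, not the PSVDOS algorithm.
    """
    n_feat, n_samp = len(counts), len(counts[0])
    totals = [sum(counts[i][j] for i in range(n_feat)) for j in range(n_samp)]
    grand = sum(totals)
    row_sums = [sum(row) for row in counts]

    # Offset-only model: log mu_ij = log(totals[j]) + a_i. Its maximum-likelihood
    # fit is closed form: mu_ij = row_sums[i] * totals[j] / grand.
    resid = []
    for i in range(n_feat):
        row = []
        for j in range(n_samp):
            mu = row_sums[i] * totals[j] / grand
            # Pearson residual; all-zero features contribute nothing.
            row.append((counts[i][j] - mu) / math.sqrt(mu) if mu > 0 else 0.0)
        resid.append(row)

    # Power iteration on R^T R gives the leading right singular vector of R,
    # i.e. one score per sample (its overall sign is arbitrary).
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_samp)]
    for _ in range(iters):
        rv = [sum(resid[i][j] * v[j] for j in range(n_samp)) for i in range(n_feat)]
        w = [sum(resid[i][j] * rv[i] for i in range(n_feat)) for j in range(n_samp)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # Residuals are all zero: no structure beyond sequencing depth.
            return [0.0] * n_samp
        v = [x / norm for x in w]
    return v


# Toy counts: 3 miRNAs x 4 samples. Samples 0-1 and 2-3 differ in composition,
# while samples 1 and 3 were sequenced 10x deeper than samples 0 and 2.
counts = [
    [60, 600, 20, 200],
    [30, 300, 30, 300],
    [10, 100, 50, 500],
]
scores = offset_residual_scores(counts)
```

On these toy counts the scores separate samples 0-1 from samples 2-3 by sign despite the tenfold depth difference within each group. Score magnitudes are still larger for the deeper libraries, because Pearson residuals grow with the square root of depth.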
Affiliation(s)
- Seonjoo Lee
- Center for Neuroscience and Regenerative Medicine, The Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD 20892, USA
53
Zhou JB, Zhang T, Wang BF, Gao HZ, Xu X. Identification of a novel gene fusion RNF213-SLC26A11 in chronic myeloid leukemia by RNA-Seq. Mol Med Rep 2012; 7:591-7. [PMID: 23151810 DOI: 10.3892/mmr.2012.1183] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/29/2012] [Accepted: 10/25/2012] [Indexed: 11/05/2022]
Abstract
Chronic myeloid leukemia (CML) was the first hematological malignancy to be associated with a specific genetic lesion. The Philadelphia translocation, producing a BCR-ABL hybrid oncogene, is the most common mechanism of CML development. However, in the present study, b3a2, b2a2 and e1a2 fusion junctions of the breakpoint cluster region (BCR)-v-abl Abelson murine leukemia viral oncogene homolog 1 (ABL) gene were not detected in patients diagnosed with CML three and four years previously. RNA-Seq technology, with an average coverage of ~30-fold, was used to detect gene fusion events in a patient with a 6-year history of CML, identified to be in the chronic phase of the disease. Using deFuse and TopHat-fusion programs with improved filtering methods, we identified two reliable gene fusions in a blood sample obtained from the CML patient, including extremely low expression levels of the classic BCR-ABL1 gene fusion. In addition, a novel gene fusion involving the ring finger protein 213 (RNF213)-solute carrier family 26, member 11 (SLC26A11) was identified and validated by reverse transcription polymerase chain reaction. Further bioinformatic analysis revealed that specific domains of SLC26A11 were damaged, which may affect the sulfate transport function of the normal gene. The present study demonstrated that, in specific cases, alternative gene fusions, besides BCR-ABL, may be associated with the development of CML.
Affiliation(s)
- Jian-Bo Zhou
- Department of Clinical Laboratories, Jiang Yin People's Hospital, Jiang Yin, Jiangsu 214400, P.R. China.
54
Doležel J, Vrána J, Safář J, Bartoš J, Kubaláková M, Simková H. Chromosomes in the flow to simplify genome analysis. Funct Integr Genomics 2012; 12:397-416. [PMID: 22895700 PMCID: PMC3431466 DOI: 10.1007/s10142-012-0293-0] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Received: 07/22/2012] [Accepted: 07/30/2012] [Indexed: 11/25/2022]
Abstract
Nuclear genomes of human, animals, and plants are organized into subunits called chromosomes. When isolated into aqueous suspension, mitotic chromosomes can be classified using flow cytometry according to light scatter and fluorescence parameters. Chromosomes of interest can be purified by flow sorting if they can be resolved from other chromosomes in a karyotype. The analysis and sorting are carried out at rates of 10²-10⁴ chromosomes per second, and for complex genomes such as wheat the flow sorting technology has been ground-breaking in reducing genome complexity for genome sequencing. The high sample rate provides an attractive approach for karyotype analysis (flow karyotyping) and the purification of chromosomes in large numbers. In characterizing the chromosome complement of an organism, the high number that can be studied using flow cytometry allows for a statistically accurate analysis. Chromosome sorting plays a particularly important role in the analysis of nuclear genome structure and the analysis of particular and aberrant chromosomes. Other attractive but not well-explored features include the analysis of chromosomal proteins, chromosome ultrastructure, and high-resolution mapping using FISH. Recent results demonstrate that chromosome flow sorting can be coupled seamlessly with DNA array and next-generation sequencing technologies for high-throughput analyses. The main advantages are targeting the analysis to a genome region of interest and a significant reduction in sample complexity. As flow sorters can also sort single copies of chromosomes, shotgun sequencing DNA amplified from them enables the production of haplotype-resolved genome sequences. This review explains the principles of flow cytometric chromosome analysis and sorting (flow cytogenetics), discusses the major uses of this technology in genome analysis, and outlines future directions.
Affiliation(s)
- Jaroslav Doležel
- Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany, Sokolovská 6, Olomouc, Czech Republic.
55
Kim HG, Kim HT, Leach NT, Lan F, Ullmann R, Silahtaroglu A, Kurth I, Nowka A, Seong IS, Shen Y, Talkowski ME, Ruderfer D, Lee JH, Glotzbach C, Ha K, Kjaergaard S, Levin AV, Romeike BF, Kleefstra T, Bartsch O, Elsea SH, Jabs EW, MacDonald ME, Harris DJ, Quade BJ, Ropers HH, Shaffer LG, Kutsche K, Layman LC, Tommerup N, Kalscheuer VM, Shi Y, Morton CC, Kim CH, Gusella JF. Translocations disrupting PHF21A in the Potocki-Shaffer-syndrome region are associated with intellectual disability and craniofacial anomalies. Am J Hum Genet 2012; 91:56-72. [PMID: 22770980 PMCID: PMC3397276 DOI: 10.1016/j.ajhg.2012.05.005] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 12/27/2011] [Revised: 03/18/2012] [Accepted: 05/10/2012] [Indexed: 12/30/2022]
Abstract
Potocki-Shaffer syndrome (PSS) is a contiguous gene disorder due to the interstitial deletion of band p11.2 of chromosome 11 and is characterized by multiple exostoses, parietal foramina, intellectual disability (ID), and craniofacial anomalies (CFAs). Despite the identification of individual genes responsible for multiple exostoses and parietal foramina in PSS, the identity of the gene(s) associated with the ID and CFA phenotypes has remained elusive. Through characterization of independent subjects with balanced translocations and supportive comparative deletion mapping of PSS subjects, we have uncovered evidence that the ID and CFA phenotypes are both caused by haploinsufficiency of a single gene, PHF21A, at 11p11.2. PHF21A encodes a plant homeodomain finger protein whose murine and zebrafish orthologs are both expressed in a manner consistent with a function in neurofacial and craniofacial development, and suppression of the latter led to both craniofacial abnormalities and neuronal apoptosis. Along with lysine-specific demethylase 1 (LSD1), PHF21A, also known as BHC80, is a component of the BRAF-histone deacetylase complex that represses target-gene transcription. In lymphoblastoid cell lines from two translocation subjects in whom PHF21A was directly disrupted by the respective breakpoints, we observed derepression of the neuronal gene SCN3A and reduced LSD1 occupancy at the SCN3A promoter, supporting a direct functional consequence of PHF21A haploinsufficiency on transcriptional regulation. Our finding that disruption of PHF21A by translocations in the PSS region is associated with ID adds to the growing list of ID-associated genes that emphasize the critical role of transcriptional regulation and chromatin remodeling in normal brain development and cognitive function.
Affiliation(s)
- Hyung-Goo Kim
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, 02114, USA.
56
Vrána J, Simková H, Kubaláková M, Cíhalíková J, Doležel J. Flow cytometric chromosome sorting in plants: the next generation. Methods 2012; 57:331-7. [PMID: 22440520 DOI: 10.1016/j.ymeth.2012.03.006] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 12/03/2011] [Revised: 03/01/2012] [Accepted: 03/05/2012] [Indexed: 10/28/2022]
Abstract
Genome analysis in many plant species is hampered by large genome size and by sequence redundancy due to the presence of repetitive DNA and polyploidy. One solution is to reduce the sample complexity by dissecting the genomes to single chromosomes. This can be realized by flow cytometric sorting, which enables purification of chromosomes in large numbers. Coupling the chromosome sorting technology with next generation sequencing provides a targeted and cost effective way to tackle complex genomes. The methods outlined in this article describe a procedure for preparation of chromosomal DNA suitable for next-generation sequencing.
Affiliation(s)
- Jan Vrána
- Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany, Sokolovská 6, CZ-77200 Olomouc, Czech Republic
57
Hermetz KE, Surti U, Cody JD, Rudd MK. A recurrent translocation is mediated by homologous recombination between HERV-H elements. Mol Cytogenet 2012; 5:6. [PMID: 22260357 PMCID: PMC3292815 DOI: 10.1186/1755-8166-5-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 10/02/2011] [Accepted: 01/19/2012] [Indexed: 12/15/2022]
Abstract
BACKGROUND Chromosome rearrangements are caused by many mutational mechanisms; of these, recurrent rearrangements can be particularly informative for teasing apart DNA sequence-specific factors. Some recurrent translocations are mediated by homologous recombination between large blocks of segmental duplications on different chromosomes. Here we describe a recurrent unbalanced translocation caused by recombination between shorter homologous regions on chromosomes 4 and 18 in two unrelated children with intellectual disability. RESULTS Array CGH resolved the breakpoints of the 6.97-Megabase (Mb) loss of 18q and the 7.30-Mb gain of 4q. Sequencing across the translocation breakpoints revealed that both translocations occurred between 92%-identical human endogenous retrovirus (HERV) elements in the same orientation on chromosomes 4 and 18. In addition, we find sequence variation in the chromosome 4 HERV that makes one allele more like the chromosome 18 HERV. CONCLUSIONS Homologous recombination between HERVs on the same chromosome is known to cause chromosome deletions, but this is the first report of interchromosomal HERV-HERV recombination leading to a translocation. It is possible that normal sequence variation in substrates of non-allelic homologous recombination (NAHR) affects the alignment of recombining segments and influences the propensity to chromosome rearrangement.
Affiliation(s)
- Karen E Hermetz
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA.
58
Emde AK, Schulz MH, Weese D, Sun R, Vingron M, Kalscheuer VM, Haas SA, Reinert K. Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS. Bioinformatics 2012; 28:619-27. [PMID: 22238266 DOI: 10.1093/bioinformatics/bts019] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Indexed: 12/26/2022]
Abstract
MOTIVATION The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map. RESULTS Here we present a method for 'split' read mapping, where prefix and suffix match of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared with alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, application of SplazerS to targeted resequencing data led to the interesting discovery of a complete, possibly functional gene retrocopy variant. AVAILABILITY SplazerS is available from http://www.seqan.de/projects/splazers. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
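The split-read idea described in this abstract (a read's prefix and suffix aligning to the reference with a longer gap between them) can be sketched for the simplest case, a deletion, using exact string matching. This is an illustration under strong assumptions, not SplazerS: it allows no mismatches within the anchors, scans the reference with `str.find` instead of using an index, and reports only the shortest deletion. The helper name `split_map_deletion` and its `min_anchor` parameter are hypothetical.

```python
def split_map_deletion(read, ref, min_anchor=5):
    """Place a read on ref as prefix + gap + suffix to infer a deletion.

    Returns (prefix_start, deletion_start, deletion_length) for the split with
    the shortest deletion, or None. Exact matching only; hypothetical helper.
    """
    best = None
    # Try every split point that leaves at least min_anchor bases on each side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        p = ref.find(prefix)
        while p != -1:
            del_start = p + split
            # The suffix must resume at least 1 bp past the prefix end (a gap);
            # find() returns the nearest such hit, i.e. the shortest gap here.
            q = ref.find(suffix, del_start + 1)
            if q != -1:
                cand = (p, del_start, q - del_start)
                if best is None or cand[2] < best[2]:
                    best = cand
            p = ref.find(prefix, p + 1)  # next placement of the prefix
    return best


# Toy reference with a 10-bp block (positions 10-19) absent from the read.
ref = "GATTACAGCT" + "A" * 10 + "CCGGAATGCA"
read = "GATTACAGCTCCGGAATGCA"
print(split_map_deletion(read, ref))  # (0, 10, 10)
```

In repetitive or homopolymer contexts several split points can explain the same read equally well, so the breakpoint is ambiguous; this sketch simply keeps the first minimal-length candidate it encounters.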
Affiliation(s)
- Anne-Katrin Emde
- Department of Computer Science, Freie Universität Berlin, Takustrasse 9, Max-Planck-Institute for Molecular Genetics, Berlin, Germany.
59
Bayés M, Heath S, Gut IG. Applications of second generation sequencing technologies in complex disorders. Curr Top Behav Neurosci 2012; 12:321-343. [PMID: 22331695 DOI: 10.1007/7854_2011_196] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
Second generation sequencing (2ndGS) technologies generate unprecedented amounts of sequence data very rapidly and at relatively limited costs, allowing the sequence of a human genome to be completed in a few weeks. The principle is based on generating millions of relatively short reads from amplified single DNA fragments using iterative cycles of nucleotide extensions. However, the data generated on this scale present new challenges in interpretation, data analysis and data management. 2ndGS technologies are becoming widespread and are profoundly impacting biomedical research. Common applications include whole-genome sequencing, targeted resequencing, characterization of structural and copy number variation, profiling epigenetic modifications, transcriptome sequencing and identification of infectious agents. New methodologies and instruments that will enable sequencing of the complete human genome in less than a day at a cost of less than $1,000 are currently in development.
Affiliation(s)
- Mònica Bayés
- Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain.
60
Characterising chromosome rearrangements: recent technical advances in molecular cytogenetics. Heredity (Edinb) 2011; 108:75-85. [PMID: 22086080 PMCID: PMC3238113 DOI: 10.1038/hdy.2011.100] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Indexed: 01/20/2023]
Abstract
Genomic rearrangements can result in losses, amplifications, translocations and inversions of DNA fragments thereby modifying genome architecture, and potentially having clinical consequences. Many genomic disorders caused by structural variation have initially been uncovered by early cytogenetic methods. The last decade has seen significant progression in molecular cytogenetic techniques, allowing rapid and precise detection of structural rearrangements on a whole-genome scale. The high resolution attainable with these recently developed techniques has also uncovered the role of structural variants in normal genetic variation alongside single-nucleotide polymorphisms (SNPs). We describe how array-based comparative genomic hybridisation, SNP arrays, array painting and next-generation sequencing analytical methods (read depth, read pair and split read) allow the extensive characterisation of chromosome rearrangements in human genomes.
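Of the three sequencing-based approaches named in this abstract, read depth is the easiest to sketch: count reads in fixed windows for a test and a reference sample, normalize each by its library size, and inspect per-window log2 ratios. For a diploid region, a single-copy gain is expected near log2(3/2) ≈ 0.58 and a single-copy loss near log2(1/2) = -1. The function names, the pseudocount, and the ±0.4 thresholds below are illustrative assumptions; practical read-depth callers typically also correct biases such as GC content and segment neighbouring windows rather than calling each window in isolation.

```python
import math


def read_depth_log2_ratios(test_counts, ref_counts, pseudo=0.5):
    """Per-window log2 ratio of library-size-normalized read counts.

    test_counts / ref_counts: reads per fixed genomic window, same windows in
    the same order. The pseudocount avoids log(0) in empty windows.
    """
    t_total, r_total = sum(test_counts), sum(ref_counts)
    return [
        math.log2(((t + pseudo) / t_total) / ((r + pseudo) / r_total))
        for t, r in zip(test_counts, ref_counts)
    ]


def call_windows(ratios, gain=0.4, loss=-0.4):
    """Label each window; the +/-0.4 thresholds are illustrative only."""
    return ["gain" if x >= gain else "loss" if x <= loss else "neutral" for x in ratios]


ratios = read_depth_log2_ratios([100, 100, 150, 50, 100], [100, 100, 100, 100, 100])
print(call_windows(ratios))  # ['neutral', 'neutral', 'gain', 'loss', 'neutral']
```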
61
Hochstenbach R, Buizer-Voskamp JE, Vorstman JAS, Ophoff RA. Genome arrays for the detection of copy number variations in idiopathic mental retardation, idiopathic generalized epilepsy and neuropsychiatric disorders: lessons for diagnostic workflow and research. Cytogenet Genome Res 2011; 135:174-202. [PMID: 22056632 DOI: 10.1159/000332928] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Indexed: 12/08/2022]
Abstract
We review the contributions and limitations of genome-wide array-based identification of copy number variants (CNVs) in the clinical diagnostic evaluation of patients with mental retardation (MR) and other brain-related disorders. In unselected MR referrals a causative genomic gain or loss is detected in 14-18% of cases. Usually, such CNVs arise de novo, are not found in healthy subjects, and have a major impact on the phenotype by altering the dosage of multiple genes. This high diagnostic yield justifies array-based segmental aneuploidy screening as the initial genetic test in these patients. This also pertains to patients with autism (expected yield about 5-10% in nonsyndromic and 10-20% in syndromic patients) and schizophrenia (at least 5% yield). CNV studies in idiopathic generalized epilepsy, attention-deficit hyperactivity disorder, major depressive disorder and Tourette syndrome indicate that patients have, on average, a larger CNV burden as compared to controls. Collectively, the CNV studies suggest that a wide spectrum of disease-susceptibility variants exists, most of which are rare (<0.1%) and of variable and usually small effect. Notwithstanding, a rare CNV can have a major impact on the phenotype. Exome sequencing in MR and autism patients revealed de novo mutations in protein coding genes in 60 and 20% of cases, respectively. Therefore, it is likely that arrays will be supplanted by next-generation sequencing methods as the initial and perhaps ultimate diagnostic tool in patients with brain-related disorders, revealing both CNVs and mutations in a single test.
Affiliation(s)
- R Hochstenbach
- Division of Biomedical Genetics, Department of Medical Genetics, University Medical Centre Utrecht, Utrecht, The Netherlands.
62
Sobreira NLM, Gnanakkan V, Walsh M, Marosy B, Wohler E, Thomas G, Hoover-Fong JE, Hamosh A, Wheelan SJ, Valle D. Characterization of complex chromosomal rearrangements by targeted capture and next-generation sequencing. Genome Res 2011; 21:1720-7. [PMID: 21890680 DOI: 10.1101/gr.122986.111] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 12/16/2022]
Abstract
Translocations are a common class of chromosomal aberrations and can cause disease by physically disrupting genes or altering their regulatory environment. Some translocations, apparently balanced at the microscopic level, include deletions, duplications, insertions, or inversions at the molecular level. Traditionally, chromosomal rearrangements have been investigated with a conventional banded karyotype followed by arduous positional cloning projects. More recently, molecular cytogenetic approaches using fluorescence in situ hybridization (FISH), array comparative genomic hybridization (aCGH), or whole-genome SNP genotyping together with molecular methods such as inverse PCR and quantitative PCR have allowed more precise evaluation of the breakpoints. These methods suffer, however, from being experimentally intensive and time-consuming and of less than single base pair resolution. Here we describe targeted breakpoint capture followed by next-generation sequencing (TBCS) as a new approach to the general problem of determining the precise structural characterization of translocation breakpoints and related chromosomal aberrations. We tested this approach in three patients with complex chromosomal translocations: The first had craniofacial abnormalities and an apparently balanced t(2;3)(p15;q12) translocation; the second has cleidocranial dysplasia (OMIM 119600) associated with a t(2;6)(q22;p12.3) translocation and a breakpoint in RUNX2 on chromosome 6p; and the third has acampomelic campomelic dysplasia (OMIM 114290) associated with a t(5;17)(q23.2;q24) translocation, with a breakpoint upstream of SOX9 on chromosome 17q. Preliminary studies indicated complex rearrangements in patients 1 and 3 with a total of 10 predicted breakpoints in the three patients. By using TBCS, we quickly and precisely defined eight of the 10 breakpoints.
Affiliation(s)
- Nara L M Sobreira
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
63
Yu P, Wang C, Xu Q, Feng Y, Yuan X, Yu H, Wang Y, Tang S, Wei X. Detection of copy number variations in rice using array-based comparative genomic hybridization. BMC Genomics 2011; 12:372. [PMID: 21771342 PMCID: PMC3156786 DOI: 10.1186/1471-2164-12-372] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Received: 04/14/2011] [Accepted: 07/20/2011] [Indexed: 01/02/2023]
Abstract
Background Copy number variations (CNVs) can create new genes, change gene dosage, reshape gene structures, and modify elements regulating gene expression. As with all types of genetic variation, CNVs may influence phenotypic variation and gene expression. CNVs are thus considered major sources of genetic variation. Little is known, however, about their contribution to genetic variation in rice. Results To detect CNVs, we used a set of NimbleGen whole-genome comparative genomic hybridization arrays containing 718,256 oligonucleotide probes with a median probe spacing of 500 bp. We compiled a high-resolution map of CNVs in the rice genome, showing 641 CNVs between the genomes of the rice cultivars 'Nipponbare' (from O. sativa ssp. japonica) and 'Guang-lu-ai 4' (from O. sativa ssp. indica). The CNVs identified vary in size from 1.1 kb to 180.7 kb, and encompass approximately 7.6 Mb of the rice genome. The largest regions showing copy gain and loss are of 37.4 kb on chromosome 4, and 180.7 kb on chromosome 8. In addition, 85 DNA segments were identified, including some genic sequences. Contracted genes greatly outnumbered duplicated ones. Many of the contracted genes corresponded to either the same genes or genes involved in the same biological processes; this was also the case for genes involved in disease and defense. Conclusion We detected CNVs in rice by array-based comparative genomic hybridization. These CNVs contain known genes. Further discussion of CNVs is important, as they are linked to variation among rice varieties, and are likely to contribute to subspecific characteristics.
Affiliation(s)
- Ping Yu
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
64
Ajay SS, Parker SCJ, Abaan HO, Fajardo KVF, Margulies EH. Accurate and comprehensive sequencing of personal genomes. Genome Res 2011; 21:1498-505. [PMID: 21771779 DOI: 10.1101/gr.123638.111] [Citation(s) in RCA: 139] [Impact Index Per Article: 10.7] [Indexed: 01/02/2023]
Abstract
As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ∼30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses of a clinical sample sequenced on two related Illumina platforms, GAII(x) and HiSeq 2000, to a very high depth (126×). We used these data to establish genotype-calling filters that dramatically increase accuracy. We also empirically determined how the callable portion of the genome varies as a function of the amount of sequence data used. These results help provide a "sequencing guide" for future whole-genome sequencing decisions and metrics by which coverage statistics should be reported.
Affiliation(s)
- Subramanian S Ajay
- Genome Informatics Section, Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
65
Next-generation sequencing strategies enable routine detection of balanced chromosome rearrangements for clinical diagnostics and genetic research. Am J Hum Genet 2011; 88:469-81. [PMID: 21473983 DOI: 10.1016/j.ajhg.2011.03.013] [Citation(s) in RCA: 135] [Impact Index Per Article: 10.4] [Received: 01/12/2011] [Revised: 03/14/2011] [Accepted: 03/17/2011] [Indexed: 02/03/2023]
Abstract
The contribution of balanced chromosomal rearrangements to complex disorders remains unclear because they are not detected routinely by genome-wide microarrays and clinical localization is imprecise. Failure to consider these events bypasses a potentially powerful complement to single nucleotide polymorphism and copy-number association approaches to complex disorders, where much of the heritability remains unexplained. To capitalize on this genetic resource, we have applied optimized sequencing and analysis strategies to test whether these potentially high-impact variants can be mapped at reasonable cost and throughput. By using a whole-genome multiplexing strategy, rearrangement breakpoints could be delineated at a fraction of the cost of standard sequencing. For rearrangements already mapped regionally by karyotyping and fluorescence in situ hybridization, a targeted approach enabled capture and sequencing of multiple breakpoints simultaneously. Importantly, this strategy permitted capture and unique alignment of up to 97% of repeat-masked sequences in the targeted regions. Genome-wide analyses estimate that only 3.7% of bases should be routinely omitted from genomic DNA capture experiments. Illustrating the power of these approaches, the rearrangement breakpoints were rapidly defined to base pair resolution and revealed unexpected sequence complexity, such as co-occurrence of inversion and translocation as an underlying feature of karyotypically balanced alterations. These findings have implications ranging from genome annotation to de novo assemblies and could enable sequencing screens for structural variations at a cost comparable to that of microarrays in standard clinical practice.
66
Obenauf AC, Schwarzbraun T, Auer M, Hoffmann EM, Waldispuehl-Geigl J, Ulz P, Günther B, Duba HC, Speicher MR, Geigl JB. Mapping of balanced chromosome translocation breakpoints to the basepair level from microdissected chromosomes. J Cell Mol Med 2011; 14:2078-84. [PMID: 20597996 PMCID: PMC3822999 DOI: 10.1111/j.1582-4934.2010.01116.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]
Abstract
The analysis of structural variants associated with specific phenotypic features is promising for the elucidation of the function of involved genes. There is, however, at present no approach allowing the rapid mapping of chromosomal translocation breakpoints to the basepair level from a single chromosome. Here we demonstrate that we have advanced both the microdissection and the subsequent unbiased amplification to an extent that breakpoint mapping to the basepair level has become possible. As a case in point, we analysed the two breakpoints of a t(7;13) translocation observed in a patient with split hand/foot malformation (SHFM1). The amplification products of the der(7) and of the der(13) were hybridized to custom-made arrays, enabling us to define primers at flanking breakpoint regions and thus to fine-map the breakpoints to the basepair level. Consequently, our results will also contribute to a further delineation of the causative mechanisms underlying SHFM1, which are currently unknown.
Affiliation(s)
- Anna C Obenauf
- Institute of Human Genetics, Medical University of Graz, Graz, Austria
67
Kloosterman WP, Guryev V, van Roosmalen M, Duran KJ, de Bruijn E, Bakker SCM, Letteboer T, van Nesselrooij B, Hochstenbach R, Poot M, Cuppen E. Chromothripsis as a mechanism driving complex de novo structural rearrangements in the germline. Hum Mol Genet 2011; 20:1916-24. [PMID: 21349919 DOI: 10.1093/hmg/ddr073] [Citation(s) in RCA: 236] [Impact Index Per Article: 18.2] [Indexed: 01/17/2023]
Abstract
A variety of mutational mechanisms shape the dynamic architecture of human genomes and occasionally result in congenital defects and disease. Here, we used genome-wide long mate-pair sequencing to systematically screen for inherited and de novo structural variation in a trio including a child with severe congenital abnormalities. We identified 4321 inherited structural variants and 17 de novo rearrangements. We characterized the de novo structural changes to the base-pair level revealing a complex series of balanced inter- and intra-chromosomal rearrangements consisting of 12 breakpoints involving chromosomes 1, 4 and 10. Detailed inspection of breakpoint regions indicated that a series of simultaneous double-stranded DNA breaks caused local shattering of chromosomes. Fusion of the resulting chromosomal fragments involved non-homologous end joining, since junction points displayed limited or no homology and small insertions and deletions. The pattern of random joining of chromosomal fragments that we observe here strongly resembles the somatic rearrangement patterns--termed chromothripsis--that have recently been described in deranged cancer cells. We conclude that a similar mechanism may also drive the formation of de novo structural variation in the germline.
Affiliation(s)
- Wigard P Kloosterman
- Department of Medical Genetics, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands
68
Løvf M, Thomassen GOS, Bakken AC, Celestino R, Fioretos T, Lind GE, Lothe RA, Skotheim RI. Fusion gene microarray reveals cancer type-specificity among fusion genes. Genes Chromosomes Cancer 2011; 50:348-57. [PMID: 21305644 DOI: 10.1002/gcc.20860] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 11/12/2010] [Accepted: 01/17/2011] [Indexed: 01/19/2023]
Abstract
Detection of fusion genes for diagnostic purposes and as a guide to treatment is well-established in hematological malignancies, and the prevalence of fusion genes in epithelial cancers is also increasingly appreciated. To study whether established fusion genes are present within additional cancer types, we have used an updated version of our fusion gene microarray in a systematic survey of reported fusion genes in multiple cancer types. We assembled a comprehensive database of published fusion genes, including those reported only in individual studies and samples, and fusion genes resulting from deep sequencing of cancer genomes and transcriptomes. From the total set of 548 fusion genes, we designed 599,839 oligonucleotides, targeting both chimeric transcript junctions and sequences internal to each of the fusion gene partners. We investigated the presence of fusion genes in a series of 67 cell lines representing 15 different cancer types. Data from ten leukemia cell lines with known fusion gene status were used to develop an automated scoring algorithm; in five cell lines the correct fusion gene was the top-scoring hit, and in one it came second. Two additional fusion genes, BCAS4-BCAS3 in the MCF-7 breast cancer cell line and CCDC6-RET in the TPC-1 thyroid cancer cell line, were validated as true positive fusion transcripts. However, these fusion genes were not new to these cancer types, and none of the 548 fusion genes was identified in a novel cancer type. We therefore find it unlikely that the assayed fusion genes are commonly present across multiple cancer types.
Affiliation(s)
- Marthe Løvf
- Department of Cancer Prevention, Institute for Cancer Research, Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
69
Wong DWS, Leung ELH, Wong SKM, Tin VPC, Sihoe ADL, Cheng LC, Au JSK, Chung LP, Wong MP. A novel KIF5B-ALK variant in nonsmall cell lung cancer. Cancer 2011; 117:2709-18. [PMID: 21656749 DOI: 10.1002/cncr.25843] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 07/22/2010] [Revised: 10/01/2010] [Accepted: 11/12/2010] [Indexed: 11/07/2022]
Abstract
BACKGROUND The anaplastic lymphoma kinase (ALK) gene is involved frequently in chromosomal translocations, resulting in fusion genes with different partners found in various lymphoproliferative conditions. It was recently reported in nonsmall cell lung cancer (NSCLC) that the fusion protein encoded by echinoderm microtubule-associated protein-like 4-ALK (EML4-ALK) fusion gene conferred oncogenic properties. The objective of the current study was to identify other possible ALK fusion genes in NSCLC. METHODS Immunohistochemical analysis was used to screen for aberrant ALK expression in primary NSCLC. The authors used 5' rapid amplification of complementary DNA ends to screen for potential, novel 5' fusion partners of ALK other than EML4-ALK. Reverse transcriptase-polymerase chain reaction and fluorescence in situ hybridization analyses were used to confirm the identity of 5' fusion partners. The genomic breakpoint was verified using genomic sequencing. Overexpression of the novel ALK fusion gene and variants 3a and 3b of EML4-ALK was performed to assess downstream signaling and functional effects. RESULTS The authors identified a novel gene resulting from the fusion of kinesin family member 5B (KIF5B) exon 15 to ALK exon 20 in a primary lung adenocarcinoma. Western blot analysis of clinical tumor tissues revealed the expression of a protein whose size correlated with that of the predicted KIF5B-ALK. Overexpression of KIF5B-ALK in mammalian cells led to the activation of signal transducer and activator of transcription 3 and protein kinase B and to enhanced cell proliferation, migration, and invasion. CONCLUSIONS The discovery of the novel KIF5B-ALK variant further consolidated the role of aberrant ALK signaling in lung carcinogenesis.
Affiliation(s)
- Daisy Wing-Sze Wong
- Department of Pathology, The University of Hong Kong, Queen Mary Hospital, Hong Kong SAR, China
70
Kitada K, Taima A, Ogasawara K, Metsugi S, Aikawa S. Chromosome-specific segmentation revealed by structural analysis of individually isolated chromosomes. Genes Chromosomes Cancer 2011; 50:217-27. [PMID: 21319258 DOI: 10.1002/gcc.20847] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/19/2010] [Revised: 11/19/2010] [Accepted: 11/22/2010] [Indexed: 11/09/2022]
Abstract
Analysis of structural rearrangements at the individual chromosomal level is still technologically challenging. Here we optimized a chromosome isolation method using fluorescent marker-assisted laser-capture and laser-beam microdissection and applied it to structural analysis of two aberrant chromosomes found in a lung cancer cell line. A high-density array-comparative genomic hybridization (array-CGH) analysis of DNA samples prepared from each of the chromosomes revealed that these two chromosomes contained 296 and 263 segments, respectively, ranging from 1.5 kb to 784.3 kb in size, derived from different portions of chromosome 8. Among these segments, 242 were common to both aberrant chromosomes, but 75 were found to be chromosome-specific. Sequences of 263 junction sites connecting the ends of segments were determined using a PCR/Sanger-sequencing procedure. Overlapping microhomologies were found at 169 junction sites. Junction partners came from various portions of chromosome 8, and no biased pattern in the positional distribution of junction partners was detected. These structural characteristics suggested the occurrence of random fragmentation of the entire chromosome 8 followed by random rejoining of these fragments. Based on that, we proposed a model to explain how these aberrant chromosomes are formed. Through these structural analyses, it was demonstrated that the optimized chromosome isolation method described here can provide high-quality chromosomal DNA for high-resolution array-CGH analysis and probably for massively parallel sequencing analysis.
Affiliation(s)
- Kunio Kitada
- Kamakura Research Laboratories, Chugai Pharmaceutical Co. Ltd., 200-Kajiwara, Kamakura, Kanagawa 247-8530, Japan.
71
Doležel J, Kubaláková M, Cíhalíková J, Suchánková P, Simková H. Chromosome analysis and sorting using flow cytometry. Methods Mol Biol 2011; 701:221-38. [PMID: 21181533 DOI: 10.1007/978-1-61737-957-4_12] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 01/31/2023]
Abstract
Chromosome analysis and sorting using flow cytometry (flow cytogenetics) is an attractive tool for fractionating plant genomes into small parts. The reduction of complexity greatly simplifies genetics and genomics in plant species with large genomes. However, as flow cytometry requires liquid suspensions of particles, the lack of suitable protocols for preparation of solutions of intact chromosomes delayed the application of flow cytogenetics in plants. This chapter outlines a high-yielding procedure for preparation of solutions of intact mitotic chromosomes from root tips of young seedlings and for their analysis using flow cytometry and sorting. Root tips accumulated at metaphase are mildly fixed with formaldehyde, and solutions of intact chromosomes are prepared by mechanical homogenization. The advantages of the present approach include the use of seedlings, which are easy to handle, and the karyological stability of root meristems, which can be induced to a high degree of metaphase synchrony. Chromosomes isolated according to this protocol have well-preserved morphology, withstand shearing forces during sorting, and their DNA is intact and suitable for a range of applications.
Affiliation(s)
- Jaroslav Doležel
- Laboratory of Molecular Cytogenetics and Cytometry, Institute of Experimental Botany, Olomouc, Czech Republic.
72
Genome organization influences partner selection for chromosomal rearrangements. Trends Genet 2010; 27:63-71. [PMID: 21144612 DOI: 10.1016/j.tig.2010.11.001] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 09/20/2010] [Revised: 11/02/2010] [Accepted: 11/03/2010] [Indexed: 11/22/2022]
Abstract
Chromosomal rearrangements occur as a consequence of the erroneous repair of DNA double-stranded breaks, and often underlie disease. The recurrent detection of specific tumorigenic rearrangements suggests that there is a mechanism behind chromosomal partner selection involving the shape of the genome. With the advent of novel high-throughput approaches, detailed genome integrity and folding maps are becoming available. Integrating these data with knowledge of experimentally induced DNA recombination strongly suggests that partner choice in chromosomal rearrangement primarily follows the three-dimensional conformation of the genome. Local rearrangements are favored over distal and interchromosomal rearrangements. This is seen for neutral rearrangements, but not necessarily for rearrangements that drive oncogenesis. The recurrent detection of tumorigenic rearrangements probably reflects their exceptional capacity to confer growth advantage to the rare cells that contain them. The abundant presence of neutral rearrangements suggests that somatic genome variation is also common in healthy tissue.
73
Kong F, Zhu J, Wu J, Peng J, Wang Y, Wang Q, Fu S, Yuan LL, Li T. dbCRID: a database of chromosomal rearrangements in human diseases. Nucleic Acids Res 2010; 39:D895-900. [PMID: 21051346 PMCID: PMC3013658 DOI: 10.1093/nar/gkq1038] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 01/03/2023]
Abstract
Chromosomal rearrangement (CR) events result from abnormal breaking and rejoining of the DNA molecules, or from crossing-over between repetitive DNA sequences, and they are involved in many tumor and non-tumor diseases. Investigations of disease-associated CR events can not only lead to important discoveries about DNA breakage and repair mechanisms, but also offer important clues about the pathologic causes and the diagnostic/therapeutic targets of these diseases. We have developed a database of Chromosomal Rearrangements In Diseases (dbCRID, http://dbCRID.biolead.org), a comprehensive database of human CR events and their associated diseases. For each reported CR event, dbCRID documents the type of the event, the disease or symptoms associated, and--when possible--detailed information about the CR event including precise breakpoint positions, junction sequences, genes and gene regions disrupted and experimental techniques applied to discover/analyze the CR event. With 2643 records of disease-associated CR events curated from 1172 original studies, dbCRID is a comprehensive and dynamic resource useful for studying DNA breakage and repair mechanisms, and for analyzing the genetic basis of human tumor and non-tumor diseases.
Affiliation(s)
- Fanlou Kong
- Biolead.org Research Group, LC Sciences, Houston, TX 77054, USA
74
Bueno R, De Rienzo A, Dong L, Gordon GJ, Hercus CF, Richards WG, Jensen RV, Anwar A, Maulik G, Chirieac LR, Ho KF, Taillon BE, Turcotte CL, Hercus RG, Gullans SR, Sugarbaker DJ. Second generation sequencing of the mesothelioma tumor genome. PLoS One 2010; 5:e10612. [PMID: 20485525 PMCID: PMC2869344 DOI: 10.1371/journal.pone.0010612] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 10/30/2009] [Accepted: 04/01/2010] [Indexed: 12/29/2022]
Abstract
The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types. Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.
Affiliation(s)
- Raphael Bueno
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Assunta De Rienzo
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Lingsheng Dong
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Gavin J. Gordon
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- William G. Richards
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Roderick V. Jensen
- Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Gautam Maulik
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Lucian R. Chirieac
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Bruce E. Taillon
- 454 Life Sciences, Inc., Branford, Connecticut, United States of America
- Steven R. Gullans
- Excel Medical Ventures, Boston, Massachusetts, United States of America
- David J. Sugarbaker
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
75
Beló A, Beatty MK, Hondred D, Fengler KA, Li B, Rafalski A. Allelic genome structural variations in maize detected by array comparative genome hybridization. Theor Appl Genet 2010; 120:355-67. [PMID: 19756477 DOI: 10.1007/s00122-009-1128-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 05/04/2009] [Accepted: 07/28/2009] [Indexed: 05/04/2023]
Abstract
DNA polymorphisms such as insertion/deletions and duplications affecting genome segments larger than 1 kb are known as copy-number variations (CNVs) or structural variations (SVs). They have been recently studied in animals and humans by using array-comparative genome hybridization (aCGH), and have been associated with several human diseases. Their presence and phenotypic effects in plants have not been investigated on a genomic scale, although individual structural variations affecting traits have been described. We used aCGH to investigate the presence of CNVs in maize by comparing the genome of 13 maize inbred lines to B73. Analysis of hybridization signal ratios of 60,472 60-mer oligonucleotide probes between inbreds in relation to their location in the reference genome (B73) allowed us to identify clusters of probes that deviated from the ratio expected for equal copy-numbers. We found CNVs distributed along the maize genome in all chromosome arms. They occur with appreciable frequency in different germplasm subgroups, suggesting ancient origin. Validation of several CNV regions showed both insertion/deletions and copy-number differences. The nature of CNVs detected suggests CNVs might have a considerable impact on plant phenotypes, including disease response and heterosis.
Affiliation(s)
- André Beló
- DuPont Crop Genetics, Route 141, Henry Clay Road, Wilmington, DE 19803, USA.
76
Ashton F, O'Connor R, Love J, Doherty E, Aftimos S, George A, Love D. Case report: Molecular characterisation of a der(Y)t(Xp;Yp) with Xp functional disomy and sex reversal. Genet Mol Res 2010; 9:1815-23. [DOI: 10.4238/vol9-3gmr896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/03/2022]
77
Ropers HH. Single gene disorders come into focus--again. Dialogues Clin Neurosci 2010; 12:95-102. [PMID: 20373671 PMCID: PMC3181948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
In the early 1990s, when the second 5-year plan for the Human Genome Project (which requested more money than any previous research project in biology) was written, common disorders were presented as the future target of genome research. This was a clever move to ensure continued public support for this endeavor, which had been justified previously by the prospect that it would lead to the diagnosis, prevention, and therapy of severe, but mostly rare, Mendelian disorders. Today, more than 15 years later, after billions of dollars have been spent on genome-wide association studies (GWAS), very few major genetic risk factors for common diseases have been identified, and the enthusiasm for large GWAS is dwindling. At the same time, there is renewed interest in studying single gene disorders, which are now considered by some as a better clue to the understanding of common diseases. While this is probably true, Mendelian disorders are also important in their own right, since they must be far more common than generally thought. As discussed here, various efficient strategies exist for the elucidation of single gene defects, and their systematic application, in combination with novel genome partitioning and massively parallel sequencing techniques, will have far-reaching implications for health care.
78
Breakpoint analysis of balanced chromosome rearrangements by next-generation paired-end sequencing. Eur J Hum Genet 2009; 18:539-43. [PMID: 19953122 DOI: 10.1038/ejhg.2009.211] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/09/2022]
Abstract
Characterisation of breakpoints in disease-associated balanced chromosome rearrangements (DBCRs), which disrupt or inactivate specific genes, has facilitated the molecular elucidation of a wide variety of genetic disorders. However, conventional methods for mapping chromosome breakpoints, such as in situ hybridisation with fluorescent dye-labelled bacterial artificial chromosome clones (BAC-FISH), are laborious and time-consuming, and often have insufficient resolution to unequivocally identify the disrupted gene. By combining DNA array hybridisation with chromosome sorting, the efficiency of breakpoint mapping has dramatically improved. However, this can only be applied when the physical properties of the derivative chromosomes allow them to be flow sorted. To characterise the breakpoints in all types of balanced chromosome rearrangements more efficiently and more accurately, we performed massively parallel sequencing using Illumina 1G analyser and ABI SOLiD systems to generate short sequencing reads from both ends of DNA fragments. We applied this method to four different DBCRs, including two reciprocal translocations and two inversions. By identifying read pairs spanning the breakpoints, we were able to map the breakpoints to a region of a few hundred base pairs that could be confirmed by subsequent PCR amplification and Sanger sequencing of the junction fragments. Our results show the feasibility of paired-end sequencing for systematic breakpoint mapping and gene finding in patients with disease-associated chromosome rearrangements.
79
Clement NL, Snell Q, Clement MJ, Hollenhorst PC, Purwar J, Graves BJ, Cairns BR, Johnson WE. The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Bioinformatics 2009; 26:38-45. [PMID: 19861355 DOI: 10.1093/bioinformatics/btp614] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 11/14/2022]
Abstract
MOTIVATION The advent of next-generation sequencing technologies has increased the accuracy and quantity of sequence data, opening the door to greater opportunities in genomic research. RESULTS In this article, we present GNUMAP (Genomic Next-generation Universal MAPper), a program capable of overcoming two major obstacles in the mapping of reads from next-generation sequencing runs. First, we have created an algorithm that probabilistically maps reads to repeat regions in the genome on a quantitative basis. Second, we have developed a probabilistic Needleman-Wunsch algorithm which utilizes _prb.txt and _int.txt files produced in the Solexa/Illumina pipeline to improve the mapping accuracy for lower quality reads and increase the amount of usable data produced in a given experiment. AVAILABILITY The source code for the software can be downloaded from http://dna.cs.byu.edu/gnumap.
Affiliation(s)
- Nathan L Clement
- Department of Computer Science, Department of Statistics, Brigham Young University, Provo, UT 84602, USA.
80
Chromosome aberrations involving 10q22: report of three overlapping interstitial deletions and a balanced translocation disrupting C10orf11. Eur J Hum Genet 2009; 18:291-5. [PMID: 19844253 DOI: 10.1038/ejhg.2009.163] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
Abstract
Interstitial deletions of chromosome band 10q22 are rare. We report on the characterization of three overlapping de novo 10q22 deletions by high-resolution array comparative genomic hybridization in three unrelated patients. Patient 1 had a 7.9 Mb deletion in 10q21.3-q22.2 and suffered from severe feeding problems, facial dysmorphisms and profound mental retardation. Patients 2 and 3 had nearly identical deletions of 3.2 and 3.6 Mb, the proximal breakpoints of which were located at an identical low-copy repeat. Both patients were mentally retarded; patient 3 also suffered from growth retardation and hypotonia. We also report on the results of breakpoint analysis by array painting in a mentally retarded patient with a balanced chromosome translocation 46,XY,t(10;13)(q22;p13)dn. The breakpoint in 10q22 was found to disrupt C10orf11, a brain-expressed gene in the common deleted interval of patients 1-3. This finding suggests that haploinsufficiency of C10orf11 contributes to the cognitive defects in 10q22 deletion patients.
81
High-resolution identification of balanced and complex chromosomal rearrangements by 4C technology. Nat Methods 2009; 6:837-42. [PMID: 19820713 DOI: 10.1038/nmeth.1391] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Received: 07/21/2009] [Accepted: 09/16/2009] [Indexed: 01/13/2023]
Abstract
Balanced chromosomal rearrangements can cause disease, but techniques for their rapid and accurate identification are missing. Here we demonstrate that chromatin conformation capture on chip (4C) technology can be used to screen large genomic regions for balanced and complex inversions and translocations at high resolution. The 4C technique can be used to detect breakpoints also in repetitive DNA sequences as it uniquely relies on capturing genomic fragments across the breakpoint. Using 4C, we uncovered LMO3 as a potentially leukemogenic translocation partner of TRB@. We developed multiplex 4C to simultaneously screen for translocation partners of multiple selected loci. We identified unsuspected translocations and complex rearrangements. Furthermore, using 4C we detected translocations even in small subpopulations of cells. This strategy opens avenues for the rapid fine-mapping of cytogenetically identified translocations and inversions, and the efficient screening for balanced rearrangements near candidate loci, even when rearrangements exist only in subpopulations of cells.
82
Córdova-Fletes C, Rademacher N, Müller I, Mundo-Ayala JN, Morales-Jeanhs EA, García-Ortiz JE, León-Gil A, Rivera H, Domínguez MG, Kalscheuer VM. CDKL5 truncation due to a t(X;2)(p22.1;p25.3) in a girl with X-linked infantile spasm syndrome. Clin Genet 2009; 77:92-6. [PMID: 19807736 DOI: 10.1111/j.1399-0004.2009.01286.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
83
Morozova O, Hirst M, Marra MA. Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 2009; 10:135-51. [PMID: 19715439 DOI: 10.1146/annurev-genom-082908-145957] [Citation(s) in RCA: 339] [Impact Index Per Article: 22.6] [Indexed: 11/09/2022]
Abstract
Transcriptome analysis has been a key area of biological inquiry for decades. Over the years, research in the field has progressed from candidate gene-based detection of RNAs using Northern blotting to high-throughput expression profiling driven by the advent of microarrays. Next-generation sequencing technologies have revolutionized transcriptomics by providing opportunities for multidimensional examinations of cellular transcriptomes in which high-throughput expression data are obtained at a single-base resolution.
Affiliation(s)
- Olena Morozova
- BC Cancer Agency, Genome Sciences Center, Vancouver, BC V5Z 4S6, Canada.
84
Tucker T, Marra M, Friedman JM. Massively parallel sequencing: the next big thing in genetic medicine. Am J Hum Genet 2009; 85:142-54. [PMID: 19679224 DOI: 10.1016/j.ajhg.2009.06.022] [Citation(s) in RCA: 214] [Impact Index Per Article: 14.3] [Received: 04/19/2009] [Revised: 06/24/2009] [Accepted: 06/29/2009] [Indexed: 01/24/2023]
Abstract
Massively parallel sequencing has reduced the cost and increased the throughput of genomic sequencing by more than three orders of magnitude, and it seems likely that costs will fall and throughput improve even more in the next few years. Clinical use of massively parallel sequencing will provide a way to identify the cause of many diseases of unknown etiology through simultaneous screening of thousands of loci for pathogenic mutations and by sequencing biological specimens for the genomic signatures of novel infectious agents. In addition to providing these entirely new diagnostic capabilities, massively parallel sequencing may also replace arrays and Sanger sequencing in clinical applications where they are currently being used. Routine clinical use of massively parallel sequencing will require higher accuracy, better ways to select genomic subsets of interest, and improvements in the functionality, speed, and ease of use of data analysis software. In addition, substantial enhancements in laboratory computer infrastructure, data storage, and data transfer capacity will be needed to handle the extremely large data sets produced. Clinicians and laboratory personnel will require training to use the sequence data effectively, and appropriate methods will need to be developed to deal with the incidental discovery of pathogenic mutations and variants of uncertain clinical significance. Massively parallel sequencing has the potential to transform the practice of medical genetics and related fields, but the vast amount of personal genomic data produced will increase the responsibility of geneticists to ensure that the information obtained is used in a medically and socially responsible manner.
85
Weese D, Emde AK, Rausch T, Döring A, Reinert K. RazerS--fast read mapping with sensitivity control. Genome Res 2009; 19:1646-54. [PMID: 19592482 DOI: 10.1101/gr.088823.108] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.3] [Indexed: 11/24/2022]
Abstract
Second-generation sequencing technologies deliver DNA sequence data at unprecedentedly high throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. Due to the large amounts of data, efficient algorithms and implementations are crucial for this task. We present an efficient read mapping tool called RazerS. It allows the user to align sequencing reads of arbitrary length using either the Hamming distance or the edit distance. Our tool can work either losslessly or, at higher speeds, with a user-defined loss rate. Given the loss rate, we present an approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time.
Affiliation(s)
- David Weese
- Department of Computer Science, Free University of Berlin, 14195 Berlin, Germany.
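The RazerS abstract distinguishes two scoring models for read alignment: the Hamming distance and the edit distance. As a rough illustration of the difference (this is not RazerS's own code, which is engineered for speed and sensitivity control at genome scale), the Python sketch below computes both measures for a read against a reference window; the function names are hypothetical.

```python
def hamming_distance(read: str, ref: str) -> int:
    """Count mismatching positions; only defined for equal-length sequences."""
    if len(read) != len(ref):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(a != b for a, b in zip(read, ref))


def edit_distance(read: str, ref: str) -> int:
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(ref) + 1))  # distances from an empty read prefix
    for i, a in enumerate(read, start=1):
        curr = [i] + [0] * len(ref)
        for j, b in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # skip a read base (indel)
                curr[j - 1] + 1,         # skip a reference base (indel)
                prev[j - 1] + (a != b),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]
```

For equal-length sequences the edit distance is never larger than the Hamming distance, and it can be much smaller when a read carries an indel: `ACGTACGT` against `CGTACGTA` (the same sequence shifted by one base) has 8 mismatches but an edit distance of 2.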
86
Hurd PJ, Nelson CJ. Advantages of next-generation sequencing versus the microarray in epigenetic research. Brief Funct Genomic Proteomic 2009; 8:174-83. [PMID: 19535508 DOI: 10.1093/bfgp/elp013] [Citation(s) in RCA: 157] [Impact Index Per Article: 10.5] [Indexed: 12/20/2022]
Abstract
Several recent studies from the field of epigenetics have combined chromatin immunoprecipitation (ChIP) with next-generation high-throughput sequencing technologies to describe the locations of histone post-translational modifications (PTM) and DNA methylation genome-wide. While these reports begin to quench the chromatin biologist's thirst for visualizing where in the genome epigenetic marks are placed, they also illustrate several advantages of sequencing-based genomics compared to microarray analysis. Accordingly, next-generation sequencing (NGS) technologies are now challenging microarrays as the tool of choice for genome analysis. The increased affordability of comprehensive sequence-based genomic analysis will enable new questions to be addressed in many areas of biology. It is inevitable that massively-parallel sequencing platforms will supersede the microarray for many applications; however, there are niches for microarrays to fill, and interestingly we may very well witness a symbiotic relationship between microarrays and high-throughput sequencing in the future.
Affiliation(s)
- Paul J Hurd
- School of Biological and Chemical Sciences, Queen Mary University of London, London, E1 4NS, UK.
87
Guffanti A, Iacono M, Pelucchi P, Kim N, Soldà G, Croft LJ, Taft RJ, Rizzi E, Askarian-Amiri M, Bonnal RJ, Callari M, Mignone F, Pesole G, Bertalot G, Bernardi LR, Albertini A, Lee C, Mattick JS, Zucchi I, De Bellis G. A transcriptional sketch of a primary human breast cancer by 454 deep sequencing. BMC Genomics 2009; 10:163. [PMID: 19379481 PMCID: PMC2678161 DOI: 10.1186/1471-2164-10-163] [Citation(s) in RCA: 195] [Impact Index Per Article: 13.0] [Received: 08/28/2008] [Accepted: 04/20/2009] [Indexed: 02/07/2023]
Abstract
Background The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions, have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with a limited laboratory and financial effort. We developed a deep sequencing and bioinformatics analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method utilizes a cDNA library normalization step to diminish the representation of highly expressed transcripts and biology-oriented bioinformatic analyses to facilitate detection of rare and novel transcripts. Results We analyzed over 132,000 Roche 454 high-confidence deep sequencing reads from a primary human lobular breast cancer tissue specimen, and detected a range of unusual transcriptional events that were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.
Conclusion Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs, and can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown transcripts, splice isoforms, gene fusion events and ncRNAs, even at a relatively low sequence sampling.
Affiliation(s)
- Alessandro Guffanti
- Institute of Biomedical Technologies, National Research Council, Milan, Italy.
88
Chu T, Bunce K, Hogge WA, Peters DG. Statistical model for whole genome sequencing and its application to minimally invasive diagnosis of fetal genetic disease. Bioinformatics 2009; 25:1244-50. [DOI: 10.1093/bioinformatics/btp156] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]
89
Xie C, Tammi MT. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 2009; 10:80. [PMID: 19267900 PMCID: PMC2667514 DOI: 10.1186/1471-2105-10-80] [Citation(s) in RCA: 395] [Impact Index Per Article: 26.3] [Received: 08/21/2008] [Accepted: 03/06/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. RESULTS Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amounts of short reads. CONCLUSION Simulations of various sequencing methods with coverage between 0.1x and 8x show overall specificity between 91.7% and 99.9%, and sensitivity between 72.2% and 96.5%. We also show the results for assessment of CNV between two individual human genomes.
Affiliation(s)
- Chao Xie
- Department of Biological Sciences, National University of Singapore, Singapore.
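The CNV-seq abstract states that the number of reads, rather than their length, determines detection resolution. The basic quantity behind that idea can be sketched as a per-window ratio of read counts between two genomes, each normalized by its sample's total read number. This is a simplified illustration, assuming non-overlapping windows and a hypothetical `pseudocount` parameter; it does not reproduce the statistical model or the confidence values that CNV-seq itself computes.

```python
import math
from collections import Counter


def window_log2_ratios(test_positions, ref_positions, window_size, genome_length, pseudocount=0.5):
    """Log2 ratio of normalized read counts per non-overlapping window.

    test_positions / ref_positions: 0-based mapped read start positions.
    Each count is divided by its sample's total read number, so a sample
    sequenced twice as deeply does not look like a genome-wide duplication.
    """
    if not test_positions or not ref_positions:
        raise ValueError("both samples need at least one mapped read")
    n_windows = math.ceil(genome_length / window_size)
    test_counts = Counter(pos // window_size for pos in test_positions)
    ref_counts = Counter(pos // window_size for pos in ref_positions)
    test_total, ref_total = len(test_positions), len(ref_positions)

    ratios = []
    for w in range(n_windows):
        # The pseudocount keeps empty windows from producing log(0) or division by zero.
        test_frac = (test_counts[w] + pseudocount) / test_total
        ref_frac = (ref_counts[w] + pseudocount) / ref_total
        ratios.append(math.log2(test_frac / ref_frac))
    return ratios
```

Because each window's estimate depends only on how many reads fall into it, more reads allow smaller windows at the same counting noise, which is consistent with the abstract's conclusion. Setting `pseudocount=0` is only safe when no window is empty.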
90
Prensner JR, Chinnaiyan AM. Oncogenic gene fusions in epithelial carcinomas. Curr Opin Genet Dev 2009; 19:82-91. [PMID: 19233641 DOI: 10.1016/j.gde.2008.11.008] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 09/26/2008] [Accepted: 11/21/2008] [Indexed: 12/12/2022]
Abstract
New discoveries regarding recurrent chromosomal aberrations in epithelial tumors have challenged the view that gene fusions play a minor role in these cancers. It is now known that recurrent fusions characterize significant subsets of prostate, breast, lung and renal-cell carcinomas, among others. This work has generated new insights into the molecular subtypes of tumors and highlighted important advances in bioinformatics, sequencing, and microarray technology as tools for gene fusion discovery. Given the ubiquity of tyrosine kinases and transcription factors in gene fusions, further interest in the potential 'druggability' of gene fusions with targeted therapeutics has also flourished. Nevertheless, the majority of chromosomal abnormalities in epithelial cancers remain uncharacterized, underscoring the limitations of our knowledge of carcinogenesis and the requirement for further research.
Affiliation(s)
- John R Prensner
- Michigan Center for Translational Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
91
Skotheim RI, Thomassen GOS, Eken M, Lind GE, Micci F, Ribeiro FR, Cerveira N, Teixeira MR, Heim S, Rognes T, Lothe RA. A universal assay for detection of oncogenic fusion transcripts by oligo microarray analysis. Mol Cancer 2009; 8:5. [PMID: 19152679 PMCID: PMC2633275 DOI: 10.1186/1476-4598-8-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 12/20/2008] [Accepted: 01/19/2009] [Indexed: 11/21/2022]
Abstract
BACKGROUND The ability to detect neoplasia-specific fusion genes is important not only in cancer research, but also increasingly in clinical settings to ensure that a correct diagnosis is made and the optimal treatment is chosen. However, the available methodologies to detect such fusions all have their distinct shortcomings. RESULTS We describe a novel oligonucleotide microarray strategy whereby one can screen for all known oncogenic fusion transcripts in a single experiment. To accomplish this, we combine measurements of chimeric transcript junctions with exon-wise measurements of individual fusion partners. To demonstrate the usefulness of the approach, we designed a DNA microarray containing 68,861 oligonucleotide probes that includes oligos covering all combinations of chimeric exon-exon junctions from 275 pairs of fusion genes, as well as sets of oligos internal to all the exons of the fusion partners. Using this array, proof of principle was demonstrated by detection of known fusion genes (such as TCF3:PBX1, ETV6:RUNX1, and TMPRSS2:ERG) from all six positive controls consisting of leukemia cell lines and prostate cancer biopsies. CONCLUSION This new method promises to be an important complement to currently used diagnostic and research tools for the detection of fusion genes in neoplastic diseases.
Affiliation(s)
- Rolf I Skotheim
- Department of Cancer Prevention, Institute for Cancer Research, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway
- Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
- Gard OS Thomassen
- Department of Cancer Prevention, Institute for Cancer Research, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway
- Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Rikshospitalet University Hospital, Oslo, Norway
- Marthe Eken
- Department of Cancer Prevention, Institute for Cancer Research, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway
- Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
- Department of Molecular Biosciences, University of Oslo, Oslo, Norway
- Guro E Lind
- Department of Cancer Prevention, Institute for Cancer Research, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway
- Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
- Francesca Micci
- Department of Cancer Genetics, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway
- Franclim R Ribeiro
- Department of Cancer Prevention, Institute for Cancer Research, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway
- Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
- Department of Genetics, Portuguese Oncology Institute, Porto, Portugal
- Nuno Cerveira
- Department of Genetics, Portuguese Oncology Institute, Porto, Portugal
- Manuel R Teixeira
- Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
- Department of Genetics, Portuguese Oncology Institute, Porto, Portugal
- Sverre Heim
- Department of Cancer Genetics, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway
- Medical Faculty, University of Oslo, Oslo, Norway
- Torbjørn Rognes
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Rikshospitalet University Hospital, Oslo, Norway
- Department of Informatics, University of Oslo, Oslo, Norway
- Ragnhild A Lothe
- Department of Cancer Prevention, Institute for Cancer Research, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway
- Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
- Department of Molecular Biosciences, University of Oslo, Oslo, Norway
92
Abstract
DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude, and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.
Affiliation(s)
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, USA.
93
Denoeud F, Aury JM, Da Silva C, Noel B, Rogier O, Delledonne M, Morgante M, Valle G, Wincker P, Scarpelli C, Jaillon O, Artiguenave F. Annotating genomes with massive-scale RNA sequencing. Genome Biol 2008; 9:R175. [PMID: 19087247 PMCID: PMC2646279 DOI: 10.1186/gb-2008-9-12-r175] [Citation(s) in RCA: 171] [Impact Index Per Article: 10.7] [Received: 09/09/2008] [Revised: 10/30/2008] [Accepted: 12/16/2008] [Indexed: 01/13/2023]
Abstract
A method for de novo genome annotation using high-throughput cDNA sequencing data. Next generation technologies enable massive-scale cDNA sequencing (so-called RNA-Seq). Mainly because of the difficulty of aligning short reads across exon-exon junctions, no attempts have been made so far to use RNA-Seq for building gene models de novo, that is, in the absence of a set of known genes and/or splicing events. We present G-Mo.R-Se (Gene Modelling using RNA-Seq), an approach aimed at building gene models directly from RNA-Seq and demonstrate its utility on the grapevine genome.
Affiliation(s)
- France Denoeud
- CEA, DSV, Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, CP5706, 91057 Evry, France.
94
Scheibye-Alsing K, Hoffmann S, Frankel A, Jensen P, Stadler PF, Mang Y, Tommerup N, Gilchrist MJ, Nygård AB, Cirera S, Jørgensen CB, Fredholm M, Gorodkin J. Sequence assembly. Comput Biol Chem 2008; 33:121-36. [PMID: 19152793 DOI: 10.1016/j.compbiolchem.2008.11.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 10/24/2008] [Revised: 11/28/2008] [Accepted: 11/28/2008] [Indexed: 01/20/2023]
Abstract
Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies, and plays an important role in processing the information generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly programs. We describe the basic principles of computational assembly along with the main concerns, such as repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html.
Affiliation(s)
- K Scheibye-Alsing
- Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark
95
Rougemont J, Amzallag A, Iseli C, Farinelli L, Xenarios I, Naef F. Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics 2008; 9:431. [PMID: 18851737 PMCID: PMC2575221 DOI: 10.1186/1471-2105-9-431] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.0] [Received: 06/04/2008] [Accepted: 10/13/2008] [Indexed: 12/02/2022]
Abstract
Background Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. Results We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. Conclusion We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
Affiliation(s)
- Jacques Rougemont
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
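Rougemont et al. describe coding ambiguous bases with IUPAC symbols. Their model-based clustering is not reproduced here; the Python sketch below only illustrates a final coding step of that kind, assuming per-base probabilities are already available and using a hypothetical threshold of 0.9. A base is called outright if it alone reaches the threshold, the two most likely bases are merged into their IUPAC two-base code if together they reach it, and `N` is emitted otherwise. The function names and the threshold rule are illustrative assumptions, not the paper's algorithm.

```python
from __future__ import annotations

# IUPAC codes for two-base ambiguities
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def call_base(probs: dict[str, float], threshold: float = 0.9) -> str:
    """Call one position from a {base: probability} mapping over A, C, G, T."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (best, p_best), (second, p_second) = ranked[0], ranked[1]
    if p_best >= threshold:
        return best
    if p_best + p_second >= threshold:
        # Neither base is confident alone, but the pair is: emit the ambiguity code.
        return IUPAC_PAIRS[frozenset((best, second))]
    return "N"


def call_read(columns: list[dict[str, float]], threshold: float = 0.9) -> str:
    """Call every position of a read and join the result into a string."""
    return "".join(call_base(p, threshold) for p in columns)
```

For example, a position with probabilities A 0.5, G 0.45, C 0.03, T 0.02 is coded `R` rather than being forced to `A`, which keeps the uncertainty visible to downstream alignment.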
96
Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques 2008; 45:81-94. [PMID: 18611170 DOI: 10.2144/000112900] [Citation(s) in RCA: 327] [Impact Index Per Article: 20.4] [Indexed: 12/17/2022]
Abstract
Sequence-based methods for transcriptome characterization have typically relied on generation of either serial analysis of gene expression tags or expressed sequence tags. Although such approaches have the potential to enumerate transcripts by counting sequence tags derived from them, they typically do not robustly survey the majority of transcripts along their entire length. Here we show that massively parallel sequencing of randomly primed cDNAs, using a next-generation sequencing-by-synthesis technology, offers the potential to generate relative measures of mRNA and individual exon abundance while simultaneously profiling the prevalence of both annotated and novel exons and exon-splicing events. This technique identifies known single nucleotide polymorphisms (SNPs) as well as novel single-base variants. Analysis of these variants, and previously unannotated splicing events in the HeLa S3 cell line, reveals an overrepresentation of gene categories including those previously implicated in cancer.
97
Ropers HH. Genetics of intellectual disability. Curr Opin Genet Dev 2008; 18:241-50. [DOI: 10.1016/j.gde.2008.07.008] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.9] [Received: 02/21/2008] [Accepted: 07/15/2008] [Indexed: 11/16/2022]