1
Liu TJ, Zhou JJ, Chen FY, Gan ZM, Li YP, Zhang JZ, Hu CG. Identification of the Genetic Variation and Gene Exchange between Citrus Trifoliata and Citrus Clementina. Biomolecules 2018; 8:E182. [PMID: 30572650 PMCID: PMC6315893 DOI: 10.3390/biom8040182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/02/2018] [Revised: 12/13/2018] [Accepted: 12/17/2018] [Indexed: 11/17/2022] Open
Abstract
To identify the genetic variation between Citrus trifoliata and Citrus clementina, we performed genome resequencing on the two citrus species. Compared with the citrus reference genome, a total of 9,449,204 single-nucleotide polymorphisms (SNPs) and 846,615 insertion/deletion polymorphisms (InDels) were identified in the two citrus species, of which 1,868,115 (19.77%) of the SNPs and 190,199 (22.47%) of the InDels were located in genic regions. Meanwhile, a total of 8,091,407 specific SNPs and 692,654 specific InDels were identified in the two citrus genotypes, yielding an average of 27.32 SNPs/kb and 2.34 InDels/kb. We identified and characterized the patterns of gene exchange in the grafted citrus plants by using specific genetic variation from genome resequencing. A total of 4396 genes transported across graft junctions were identified. Some of the specific genetic variations and mobile genes were also confirmed by Sanger sequencing. Furthermore, these mobile genes could move directionally or bidirectionally between the scions and the rootstocks. In addition, a total of 1581 and 2577 differentially expressed genes were found in the scions and the rootstocks, respectively, after grafting compared with the control. These genetic variations provide fundamental information on the genetic basis of important traits that differ between C. trifoliata and C. clementina, and the transport of genes would be applicable to horticultural crops.
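The proportions and densities quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the genome span it derives is back-calculated from the reported densities and is an inference made for illustration, not a figure given by the authors.

```python
# Counts and densities as reported in the abstract.
total_snps, genic_snps = 9_449_204, 1_868_115
total_indels, genic_indels = 846_615, 190_199
specific_snps, specific_indels = 8_091_407, 692_654
snps_per_kb, indels_per_kb = 27.32, 2.34


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


genic_snp_pct = percent(genic_snps, total_snps)
genic_indel_pct = percent(genic_indels, total_indels)

# Span (in Mb) implied by each density: count / (variants per kb) / 1000.
# This is an inferred value, not one reported in the abstract.
span_from_snps_mb = specific_snps / snps_per_kb / 1000
span_from_indels_mb = specific_indels / indels_per_kb / 1000

print(genic_snp_pct, genic_indel_pct)
print(round(span_from_snps_mb), round(span_from_indels_mb))
```

Both densities imply roughly the same span, which suggests the two per-kb averages were computed over the same reference length.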
Affiliation(s)
- Tian-Jia Liu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China.
- Jing-Jing Zhou
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China.
- Fa-Yi Chen
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China.
- Zhi-Meng Gan
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China.
- Yong-Ping Li
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China.
- Jin-Zhi Zhang
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China.
- Chun-Gen Hu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China.

2
Liu TJ, Li YP, Zhou JJ, Hu CG, Zhang JZ. Genome-wide genetic variation and comparison of fruit-associated traits between kumquat (Citrus japonica) and Clementine mandarin (Citrus clementina). Plant Molecular Biology 2018; 96:493-507. [PMID: 29480424 DOI: 10.1007/s11103-018-0712-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/25/2017] [Accepted: 02/21/2018] [Indexed: 06/08/2023]
Abstract
The comprehensive genetic variation of two citrus species was analyzed at the genome and transcriptome levels. A total of 1090 differentially expressed genes were found during fruit development by RNA-sequencing. Fruit size (fruit equatorial diameter) and weight (fresh weight) are the two most important components determining yield and consumer acceptability for many horticultural crops. However, little is known about the genetic control of these traits. Here, we performed whole-genome resequencing to reveal the comprehensive genetic variation of fruit development between kumquat (Citrus japonica) and Clementine mandarin (Citrus clementina). In total, 5,865,235 single-nucleotide polymorphisms (SNPs) and 414,447 insertions/deletions (InDels) were identified in the two citrus species. Based on integrative analysis of the genome and transcriptome of fruit, 640,801 SNPs and 20,733 InDels were identified. The features, genomic distribution, functional effect, and other characteristics of these genetic variations were explored. RNA-sequencing identified 1090 differentially expressed genes (DEGs) during fruit development of kumquat and Clementine mandarin. Gene Ontology revealed that these genes were involved in various molecular functions and biological processes. In addition, the genetic variation of 939 DEGs and 74 multiple fruit development pathway genes from previous reports was also identified. A global survey identified 24,237 specific alternative splicing events in the two citrus species and showed that intron retention is the most prevalent pattern of alternative splicing. These genome variation data provide a foundation for further exploration of citrus diversity and gene-phenotype relationships and for future research on molecular breeding to improve kumquat, Clementine mandarin and related species.
Affiliation(s)
- Tian-Jia Liu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China.
- Yong-Ping Li
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China.
- Jing-Jing Zhou
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China.
- Chun-Gen Hu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China.
- Jin-Zhi Zhang
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China.

3
Zhang JZ, Liu SR, Hu CG. Identifying the genome-wide genetic variation between precocious trifoliate orange and its wild type and developing new markers for genetics research. DNA Res 2016; 23:403-14. [PMID: 27106267 PMCID: PMC4991830 DOI: 10.1093/dnares/dsw017] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 12/13/2015] [Accepted: 03/21/2016] [Indexed: 01/01/2023] Open
Abstract
To increase our understanding of the genes involved in flowering in citrus, we performed genome resequencing of an early flowering trifoliate orange mutant (Poncirus trifoliata L. Raf.) and its wild type. At the genome level, 3,932,628 single nucleotide polymorphisms (SNPs), 1,293,383 insertion/deletion polymorphisms (InDels), and 52,135 structural variations were identified between the mutant and its wild type based on the citrus reference genome. Based on integrative analysis of resequencing and transcriptome data, 233,998 SNPs and 75,836 InDels were also identified between the mutant and its wild type at the transcriptional level. In addition, 272 citrus homologous flowering-time transcripts containing genetic variation were identified. Gene Ontology and Kyoto Encyclopaedia of Genes and Genomes annotation revealed that the transcripts containing mutant- and wild-type-specific InDels were involved in diverse biological processes and molecular functions. Among these transcripts, 131 were differentially expressed between the two genotypes. When 268 selected InDels were tested on 32 genotypes of the three genera of Rutaceae for genetic diversity assessment, these InDel-based markers showed high transferability. This work provides important information that will allow a better understanding of the citrus genome and that will be helpful for dissecting the genetic basis of important traits in citrus.
Affiliation(s)
- Jin-Zhi Zhang
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China
- Sheng-Rui Liu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China
- Chun-Gen Hu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China

4
Modolo L, Lerat E. UrQt: an efficient software for the Unsupervised Quality trimming of NGS data. BMC Bioinformatics 2015; 16:137. [PMID: 25924884 PMCID: PMC4450468 DOI: 10.1186/s12859-015-0546-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 09/11/2014] [Accepted: 03/20/2015] [Indexed: 11/25/2022] Open
Abstract
Background: Quality control is a necessary step of any Next Generation Sequencing analysis. Although customary, this step still requires manual interventions to empirically choose tuning parameters according to various quality statistics. Moreover, current quality control procedures that provide a “good quality” data set are not optimal and discard many informative nucleotides. To address these drawbacks, we present a new quality control method, implemented in UrQt software, for Unsupervised Quality trimming of Next Generation Sequencing reads. Results: Our trimming procedure relies on a well-defined probabilistic framework to detect the best segmentation between two segments of unreliable nucleotides, framing a segment of informative nucleotides. Our software only requires one user-friendly parameter to define the minimal quality threshold (phred score) to consider a nucleotide to be informative, which is independent of both the experiment and the quality of the data. This procedure is implemented in C++ in an efficient and parallelized software with a low memory footprint. We tested the performance of UrQt compared to the best-known trimming programs on seven RNA and DNA sequencing experiments and demonstrated its optimality in the resulting tradeoff between the number of trimmed nucleotides and the quality objective. Conclusions: By finding the best segmentation to delimit a segment of good quality nucleotides, UrQt greatly increases the number of reads and of nucleotides that can be retained for a given quality objective. UrQt source files, binary executables for different operating systems and documentation are freely available (under the GPLv3) at the following address: https://lbbe.univ-lyon1.fr/-UrQt-.html.
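To make the idea of "one informative segment framed by two unreliable segments" concrete, here is a minimal sketch. It is not UrQt's probabilistic model: it swaps in a simpler maximum-scoring-segment heuristic (Kadane's algorithm over `quality - threshold`), and the function name `trim_read` and the default threshold of 20 are assumptions chosen for illustration.

```python
def trim_read(quals, threshold=20):
    """Return (start, end) of the contiguous segment that maximises
    sum(q - threshold); end is exclusive. Returns (0, 0) if no base
    exceeds the threshold. A simplified stand-in, not UrQt's method."""
    best_sum, best = 0, (0, 0)
    cur_sum, cur_start = 0, 0
    for i, q in enumerate(quals):
        if cur_sum <= 0:
            # A non-positive running score can never help; restart here.
            cur_sum, cur_start = 0, i
        cur_sum += q - threshold
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i + 1)
    return best


# Hypothetical phred scores: low-quality ends framing a good middle.
example = [5, 8, 30, 35, 32, 31, 10, 4]
start, end = trim_read(example, threshold=20)
print(start, end, example[start:end])
```

As in the abstract, the only tuning parameter is the phred threshold; everything else follows from the per-base scores.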
Affiliation(s)
- Laurent Modolo
- Université de Lyon; Université Lyon 1; CNRS; UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 bd du 11 novembre 1918, Villeurbanne cedex, 69622, France.
- Emmanuelle Lerat
- Université de Lyon; Université Lyon 1; CNRS; UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 bd du 11 novembre 1918, Villeurbanne cedex, 69622, France.

5
Xu K, Sun F, Chai G, Wang Y, Shi L, Liu S, Xi Y. De novo assembly and transcriptome analysis of two contrary tillering mutants to learn the mechanisms of tillers outgrowth in switchgrass (Panicum virgatum L.). Frontiers in Plant Science 2015; 6:749. [PMID: 26442062 PMCID: PMC4584987 DOI: 10.3389/fpls.2015.00749] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/21/2015] [Accepted: 09/02/2015] [Indexed: 05/20/2023]
Abstract
Tillering is an important trait in monocotyledon plants. Switchgrass (Panicum virgatum), usually studied as a source of biomass for energy production, can produce hundreds of tillers in its lifetime. Studying the tillering of switchgrass also provides information for other monocot crops. High-tillering and low-tillering mutants were produced by ethyl methanesulfonate mutagenesis. The altered tillering ability resulted from differences in tiller bud outgrowth between the two mutants. We sequenced the tiller bud transcriptomes of high-tillering and low-tillering plants using next-generation sequencing technology, and generated 34 G of data in total. In the de novo assembly results, 133,828 unigenes were detected with an average length of 1,238 bp, and 5,290 unigenes were differentially expressed between the two mutants, including 3,225 up-regulated genes and 2,065 down-regulated genes. Differentially expressed gene analysis with functional annotations was performed to identify candidate genes involved in tiller bud outgrowth processes using Gene Ontology classification, Cluster of Orthologous Groups of proteins, and Kyoto Encyclopedia of Genes and Genomes pathway analysis. This is the first study to explore the tillering transcriptome in two types of tillering mutants by de novo sequencing.
Affiliation(s)
- Kaijie Xu
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling, China
- Institute of Cotton Research of CAAS, Anyang, China
- Fengli Sun
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling, China
- *Correspondence: Yajun Xi and Fengli Sun, State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, No. 3, Taicheng Road, Yangling, Shaanxi 712100, China
- Guaiqiang Chai
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling, China
- Yongfeng Wang
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling, China
- Lili Shi
- HanDanShi Agriculture Academy of Sciences, Handan, China
- Shudong Liu
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling, China
- Yajun Xi
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling, China

6
Bacher U, Kohlmann A, Haferlach T. Mutational profiling in patients with MDS: ready for every-day use in the clinic? Best Pract Res Clin Haematol 2014; 28:32-42. [PMID: 25659728 DOI: 10.1016/j.beha.2014.11.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 04/01/2014] [Accepted: 11/04/2014] [Indexed: 12/18/2022]
Abstract
Multiple recurrent somatic mutations were identified in the majority of patients with myelodysplastic syndromes (MDS), but investigating the broad spectrum of molecular markers in MDS exceeds many laboratories' capacity when traditional molecular techniques are used. High-throughput second-generation sequencing (next-generation sequencing, NGS) has proven applicable for comprehensive biomarker mutation analyses, which can increase diagnostic sensitivity and accuracy and improve risk stratification and prognostication in addition to cytomorphology and cytogenetic analysis in patients with MDS. Amplicon deep-sequencing enables comprehensive biomarker analysis in a multitude of patients per investigation in an acceptable turn-around time and at affordable costs. Comprehensive myeloid marker panels were successfully introduced into diagnostic practice. Therefore, molecular mutation analysis is ready for use in all patients with suspected MDS, may contribute to risk stratification in possible candidates for allogeneic stem cell transplantation, and should become an integral part of clinical research studies in MDS patients.

7
Tytgat B, Verleyen E, Obbels D, Peeters K, De Wever A, D’hondt S, De Meyer T, Van Criekinge W, Vyverman W, Willems A. Bacterial diversity assessment in Antarctic terrestrial and aquatic microbial mats: a comparison between bidirectional pyrosequencing and cultivation. PLoS One 2014; 9:e97564. [PMID: 24887330 PMCID: PMC4041716 DOI: 10.1371/journal.pone.0097564] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 12/23/2013] [Accepted: 04/21/2014] [Indexed: 12/26/2022] Open
Abstract
The application of high-throughput sequencing of the 16S rRNA gene has increased the size of microbial diversity datasets by several orders of magnitude, providing improved access to the rare biosphere compared with cultivation-based approaches and more established cultivation-independent techniques. By contrast, cultivation-based approaches allow the retrieval of both common and uncommon bacteria that can grow in the conditions used and provide access to strains for biotechnological applications. We performed bidirectional pyrosequencing of the bacterial 16S rRNA gene diversity in two terrestrial and seven aquatic Antarctic microbial mat samples previously studied by heterotrophic cultivation. While, not unexpectedly, 77.5% of genera recovered by pyrosequencing were not among the isolates, 25.6% of the genera picked up by cultivation were not detected by pyrosequencing. To allow comparison between both techniques, we focused on the five phyla (Proteobacteria, Actinobacteria, Bacteroidetes, Firmicutes and Deinococcus-Thermus) recovered by heterotrophic cultivation. Four of these phyla were among the most abundantly recovered by pyrosequencing. Strikingly, there was relatively little overlap between cultivation and the forward and reverse pyrosequencing-based datasets at the genus (17.1–22.2%) and OTU (3.5–3.6%) level (defined on a 97% similarity cut-off level). Comparison of the V1–V2 and V3–V2 datasets of the 16S rRNA gene revealed remarkable differences in number of OTUs and genera recovered. The forward dataset missed 33% of the genera from the reverse dataset despite comprising 50% more OTUs, while the reverse dataset did not contain 40% of the genera of the forward dataset. Similar observations were evident when comparing the forward and reverse cultivation datasets. Our results indicate that the region under consideration can have a large impact on perceived diversity, and should be considered when comparing different datasets. 
Finally, a high number of OTUs could not be classified using the RDP reference database, suggesting the presence of a large amount of novel diversity.
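The overlap figures in this abstract are asymmetric: the share of pyrosequencing genera missing from the isolates is not the same quantity as the share of cultivated genera missing from the pyrosequencing data. A minimal sketch of that calculation follows; the helper name `not_shared_pct` and both genus lists are hypothetical and invented purely for illustration, not taken from the study.

```python
def not_shared_pct(a, b):
    """Percentage of items in `a` that are absent from `b`, to one decimal."""
    a, b = set(a), set(b)
    return round(100 * len(a - b) / len(a), 1)


# Hypothetical genus lists (illustrative only, not data from the paper).
pyroseq = {"Pseudomonas", "Flavobacterium", "Sphingomonas",
           "Polaromonas", "Hymenobacter"}
cultivated = {"Pseudomonas", "Arthrobacter", "Sphingomonas", "Bacillus"}

missed_by_cultivation = not_shared_pct(pyroseq, cultivated)
missed_by_pyroseq = not_shared_pct(cultivated, pyroseq)
print(missed_by_cultivation, missed_by_pyroseq)
```

Because each percentage uses a different denominator, the two values need not sum to 100 and should be reported separately, as the abstract does.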
Affiliation(s)
- Bjorn Tytgat
- Laboratory for Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Elie Verleyen
- Laboratory of Protistology and Aquatic Ecology, Department of Biology, Ghent University, Ghent, Belgium
- Dagmar Obbels
- Laboratory of Protistology and Aquatic Ecology, Department of Biology, Ghent University, Ghent, Belgium
- Karolien Peeters
- Laboratory for Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Aaike De Wever
- Laboratory of Protistology and Aquatic Ecology, Department of Biology, Ghent University, Ghent, Belgium
- Sofie D’hondt
- Laboratory of Protistology and Aquatic Ecology, Department of Biology, Ghent University, Ghent, Belgium
- Tim De Meyer
- Laboratory of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium
- Wim Van Criekinge
- Laboratory of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium
- Wim Vyverman
- Laboratory of Protistology and Aquatic Ecology, Department of Biology, Ghent University, Ghent, Belgium
- Anne Willems
- Laboratory for Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium

8
Kohlmann A, Bacher U, Schnittger S, Haferlach T. Perspective on how to approach molecular diagnostics in acute myeloid leukemia and myelodysplastic syndromes in the era of next-generation sequencing. Leuk Lymphoma 2014; 55:1725-34. [PMID: 24144312 DOI: 10.3109/10428194.2013.856427] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
Abstract
Molecular mutation information became essential for biological subclassification, risk stratification and therapeutic decisions in patients with acute myeloid leukemia (AML). In myelodysplastic syndromes (MDS), a broad spectrum of molecular biomarkers such as the spliceosome mutations has been identified in recent years. The currently established combination of various polymerase chain reaction (PCR) methods with capillary Sanger sequencing for mutation analysis in AML is time-consuming and labor-intensive. The constantly increasing spectrum of molecular mutations is a tremendous challenge for hematological laboratories. The introduction of high-throughput sequencing technology, which allows the massive parallel analysis of hundreds of thousands of alleles in the shortest time, provides new options for molecular mutation analyses and for follow-up diagnostics in myeloid neoplasms. In contrast to whole-genome or exome analyses, amplicon deep-sequencing focuses on distinct genomic loci and their mutation patterns and enables a comprehensive biomarker analysis in a multitude of patients per analysis. This review summarizes thus far established common molecular diagnostic strategies and intends to outline the perspective of distinct novel amplicon deep-sequencing panels for patients with AML and MDS. It is foreseeable that clearly defined algorithms for molecular investigations will revolutionize diagnosis in patients with AML and MDS in the near future.

9
Valdés A, Ibáñez C, Simó C, García-Cañas V. Recent transcriptomics advances and emerging applications in food science. Trends Analyt Chem 2013. [DOI: 10.1016/j.trac.2013.06.014] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/14/2023]

10
Besnard T, García-García G, Baux D, Vaché C, Faugère V, Larrieu L, Léonard S, Millan JM, Malcolm S, Claustres M, Roux AF. Experience of targeted Usher exome sequencing as a clinical test. Mol Genet Genomic Med 2013; 2:30-43. [PMID: 24498627 PMCID: PMC3907913 DOI: 10.1002/mgg3.25] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 04/17/2013] [Accepted: 06/06/2013] [Indexed: 12/15/2022] Open
Abstract
We show that massively parallel targeted sequencing of 19 genes provides a new and reliable strategy for molecular diagnosis of Usher syndrome (USH) and nonsyndromic deafness, particularly appropriate for these disorders characterized by a high clinical and genetic heterogeneity and a complex structure of several of the genes involved. A series of 71 patients including Usher patients previously screened by Sanger sequencing plus newly referred patients was studied. Ninety-eight percent of the variants previously identified by Sanger sequencing were found by next-generation sequencing (NGS). NGS proved to be efficient as it offers analysis of all relevant genes, which is laborious to achieve with Sanger sequencing. Among the 13 newly referred Usher patients, both mutations in the same gene were identified in 77% of cases (10 patients) and one candidate pathogenic variant was identified in two additional patients. This work can be considered a pilot for implementing NGS for genetically heterogeneous diseases in clinical service.
Affiliation(s)
- Thomas Besnard
- U827, Inserm, Montpellier, F-34000, France; Univ, Montpellier I, Montpellier, F-34000, France
- Gema García-García
- U827, Inserm, Montpellier, F-34000, France; Grupo de Investigación en Enfermedades Neurosensoriales, Instituto de Investigación Sanitaria IIS-La Fe and CIBERER, Valencia, Spain
- David Baux
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Christel Vaché
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Valérie Faugère
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Lise Larrieu
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Susana Léonard
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Jose M Millan
- Grupo de Investigación en Enfermedades Neurosensoriales, Instituto de Investigación Sanitaria IIS-La Fe and CIBERER, Valencia, Spain
- Sue Malcolm
- Clinical and Molecular Genetics, Institute of Child Health, University College London, London, United Kingdom
- Mireille Claustres
- U827, Inserm, Montpellier, F-34000, France; Univ, Montpellier I, Montpellier, F-34000, France; Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Anne-Françoise Roux
- U827, Inserm, Montpellier, F-34000, France; Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France

11
Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform 2013; 15:256-78. [PMID: 23341494 PMCID: PMC3956068 DOI: 10.1093/bib/bbs086] [Citation(s) in RCA: 335] [Impact Index Per Article: 30.5] [Indexed: 01/11/2023] Open
Abstract
Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers.
Affiliation(s)
- Stephan Pabinger
- Division for Bioinformatics, Innsbruck Medical University, Innrain 80, 6020 Innsbruck, Austria. Tel.: +43-512-9003-71401; Fax: +43-512-9003-73100.

12
Kohlmann A, Grossmann V, Nadarajah N, Haferlach T. Next-generation sequencing - feasibility and practicality in haematology. Br J Haematol 2013; 160:736-53. [PMID: 23294427 DOI: 10.1111/bjh.12194] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 10/15/2012] [Accepted: 11/26/2012] [Indexed: 11/27/2022]
Abstract
Next-generation sequencing platforms have evolved to provide an accurate and comprehensive means for the detection of molecular mutations in heterogeneous tumour specimens. Here, we review the feasibility and practicality of this novel laboratory technology. In particular, we focus on the utility of next-generation sequencing technology in characterizing haematological neoplasms and the landmark findings in key haematological malignancies. We also discuss deep-sequencing strategies to analyse the constantly increasing number of molecular markers applied for disease classification, patient stratification and individualized monitoring of minimal residual disease. Although many facets of this assay need to be taken into account, amplicon deep-sequencing has already demonstrated a promising technical performance and is being continuously developed towards routine application in diagnostic laboratories so that an impact on clinical practice can be achieved.

13
Liu B, Wang Y, Zhai W, Deng J, Wang H, Cui Y, Cheng F, Wang X, Wu J. Development of InDel markers for Brassica rapa based on whole-genome re-sequencing. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2013; 126:231-9. [PMID: 22972202 DOI: 10.1007/s00122-012-1976-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 03/22/2012] [Accepted: 08/24/2012] [Indexed: 05/04/2023]
Abstract
Genome-wide detection of short insertion/deletion length polymorphisms (InDels, <5 bp) in Brassica rapa (named the A genome) was performed by comparing whole-genome re-sequencing data from two B. rapa accessions, L144 and Z16, to the reference genome sequence of Chiifu-401-42. In total, we identified 108,558 InDel polymorphisms between Chiifu-401-42 and L144, 26,795 InDels between Z16 and Chiifu-401-42, and 26,693 InDels between L144 and Z16. From these, 639 InDel polymorphisms of 3-5 bp in length between L144 and Z16 were selected for experimental validation; 491 (77%) yielded single PCR fragments and showed polymorphisms, 7 (1%) did not amplify a product, and 141 (22%) showed no polymorphism. For further validation of these intra-specific InDel polymorphisms, 503 candidates, randomly selected from the 639 InDels, were screened across seven accessions representing different B. rapa cultivar groups. Of these assayed markers, 387 (77%) were polymorphic, 111 (22%) were not polymorphic and 5 (1%) did not amplify a PCR product. Furthermore, we randomly selected 518 InDel markers to validate their polymorphism in B. napus (the AC genome) and B. juncea (the AB genome), of which more than 90% amplified a PCR product; 132 (25%) showed polymorphism between the two B. napus accessions and 41 (8%) between the two B. juncea accessions. This set of novel PCR-based InDel markers will be a valuable resource for genetic studies and breeding programs in B. rapa.
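The validation percentages in this abstract follow directly from the stated counts. The sketch below recomputes them as a consistency check; the dictionary keys and the helper name `shares` are labels chosen here for illustration, while all numbers come from the abstract.

```python
def shares(counts):
    """Return (total, {category: whole-number percentage}) for a count table."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total) for k, v in counts.items()}


# 639 InDels tested between L144 and Z16.
first_screen = {"polymorphic": 491, "no_product": 7, "monomorphic": 141}
# 503 candidates screened across seven B. rapa accessions.
seven_accessions = {"polymorphic": 387, "monomorphic": 111, "no_product": 5}

total_first, pct_first = shares(first_screen)
total_seven, pct_seven = shares(seven_accessions)

# 518 markers tested in B. napus and B. juncea.
napus_pct = round(100 * 132 / 518)
juncea_pct = round(100 * 41 / 518)

print(total_first, pct_first)
print(total_seven, pct_seven)
print(napus_pct, juncea_pct)
```

The category counts sum to the stated totals (639 and 503), and the rounded shares match the percentages given in the text.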
Affiliation(s)
- Bo Liu
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, People's Republic of China.
14
Jorge NAN, Ferreira CG, Passetti F. Bioinformatics of Cancer ncRNA in High Throughput Sequencing: Present State and Challenges. Front Genet 2012; 3:287. [PMID: 23251139 PMCID: PMC3523245 DOI: 10.3389/fgene.2012.00287] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/04/2012] [Accepted: 11/22/2012] [Indexed: 12/24/2022] Open
Abstract
Numerous genome sequencing projects have produced an unprecedented amount of data, providing significant information for the discovery of novel non-coding RNAs (ncRNAs). Several ncRNAs have been shown to control gene expression and to play important roles during cell differentiation and homeostasis. In the last decade, high throughput methods in conjunction with bioinformatics approaches have been used to identify, classify, and evaluate the expression of hundreds of ncRNAs in normal and pathological states, such as cancer. Patient outcomes have already been associated with differential expression of ncRNAs in normal and tumoral tissues, providing new insights for the development of innovative therapeutic strategies in oncology. In this review, we present and discuss bioinformatics advances in the development of computational approaches to analyze and discover ncRNA data in oncology using high throughput sequencing technologies.
15
Oberg AL, Bot BM, Grill DE, Poland GA, Therneau TM. Technical and biological variance structure in mRNA-Seq data: life in the real world. BMC Genomics 2012; 13:304. [PMID: 22769017 PMCID: PMC3505161 DOI: 10.1186/1471-2164-13-304] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 01/20/2012] [Accepted: 07/07/2012] [Indexed: 01/14/2023] Open
Abstract
Background: mRNA expression data from next generation sequencing platforms is obtained in the form of counts per gene or exon. Counts have classically been assumed to follow a Poisson distribution in which the variance is equal to the mean. The Negative Binomial distribution, which allows for over-dispersion, i.e., for the variance to be greater than the mean, is commonly used to model count data as well. Results: In mRNA-Seq data from 25 subjects, we found technical variation to generally follow a Poisson distribution, as has been reported previously, and biological variability was over-dispersed relative to the Poisson model. The mean-variance relationship across all genes was quadratic, in keeping with a Negative Binomial (NB) distribution. Over-dispersed Poisson and NB distributional assumptions demonstrated marked improvements in goodness-of-fit (GOF) over the standard Poisson model assumptions, but with evidence of over-fitting in some genes. Modeling of experimental effects improved GOF for high variance genes but increased the over-fitting problem. Conclusions: These conclusions will guide development of analytical strategies for accurate modeling of variance structure in these data and sample size determination, which in turn will aid in the identification of true biological signals that inform our understanding of biological systems.
Affiliation(s)
- Ann L Oberg
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN 55905, USA.
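The mean-variance contrast described in the abstract above can be illustrated with a minimal Python sketch. It assumes the common NB parameterization in which the variance equals the mean plus a dispersion term times the squared mean; the function names and the method-of-moments estimator are illustrative choices and are not taken from the paper.

```python
def poisson_variance(mean: float) -> float:
    """Under a Poisson model the variance equals the mean."""
    return mean


def negative_binomial_variance(mean: float, dispersion: float) -> float:
    """Assumed NB parameterization: variance = mean + dispersion * mean**2.

    The quadratic term is what allows over-dispersion; dispersion = 0
    reduces to the Poisson case.
    """
    return mean + dispersion * mean ** 2


def dispersion_from_moments(mean: float, variance: float) -> float:
    """Method-of-moments dispersion estimate (illustrative, not the paper's estimator).

    Solves variance = mean + dispersion * mean**2 for the dispersion and
    clips at zero when the observed variance does not exceed the mean.
    """
    if mean <= 0:
        raise ValueError("mean must be positive")
    return max(0.0, (variance - mean) / mean ** 2)


if __name__ == "__main__":
    mean_count = 100.0
    print(poisson_variance(mean_count))
    print(negative_binomial_variance(mean_count, 0.25))
    print(dispersion_from_moments(mean_count, 2600.0))
```

With a dispersion of zero the two variance functions agree, which matches the abstract's observation that technical variation is roughly Poisson while biological variability needs the extra quadratic term.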
16
Kohlmann A, Grossmann V, Haferlach T. Integration of next-generation sequencing into clinical practice: are we there yet? Semin Oncol 2012; 39:26-36. [PMID: 22289489 DOI: 10.1053/j.seminoncol.2011.11.008] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 12/22/2022]
Abstract
Next-generation sequencing (NGS) platforms have evolved to provide an accurate and comprehensive means for the detection of molecular mutations in heterogeneous tumor specimens. Here, we review potential applications of this novel laboratory technology. In particular, we focus on the utility of amplicon deep-sequencing assays in characterizing myeloid neoplasms, where the number of molecular markers applied for disease classification, patient stratification, and individualized monitoring of minimal residual disease is constantly increasing. We highlight the potential of this technology by discussing data from a recent study on chronic myelomonocytic leukemia (CMML). Many facets of this assay need to be taken into account, e.g., the preparation of sequencing libraries with molecular barcodes, specific experimental design options when considering sequencing coverage to calculate diagnostic sensitivity, or the use of suitable software and data processing solutions to obtain accurate results. Nevertheless, amplicon deep-sequencing has already demonstrated a promising technical performance that warrants further development towards routine application of this technology in diagnostic laboratories, so that an impact on clinical practice can be achieved.
17
Abstract
DNA methylation is an epigenetic form of gene regulation that is universally important throughout the life course, especially during in utero and postnatal development. DNA methylation aids in cell cycle regulation and cellular differentiation processes. Previous studies have demonstrated that DNA methylation profiles may be altered by diet and the environment, and that these profiles are especially vulnerable during development. Thus, it is important to understand the role of DNA methylation in developmental governance and subsequent disease progression. A variety of molecular methods exist to assay for global, gene-specific, and epigenome-wide methylation. Here we describe these methods and discuss their relative strengths and limitations.
Affiliation(s)
- Karilyn E Sant
- Department of Environmental Health Sciences, University of Michigan, Ann Arbor, MI, USA
18
Tripathy S, Jiang RHY. Massively parallel sequencing technology in pathogenic microbes. Methods Mol Biol 2012; 835:271-94. [PMID: 22183660 DOI: 10.1007/978-1-61779-501-5_17] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/19/2022]
Abstract
Next-Generation Sequencing (NGS) methods have revolutionized various aspects of genomics, including transcriptome analysis. Digital expression analysis is set to replace microarray-based analog expression analysis owing to its cost-effectiveness, reproducibility, accuracy, and speed. The last 2 years have seen a surge in the development of statistical methods and software tools for analysis and visualization of NGS data. Large amounts of NGS data are available for pathogenic fungi and oomycetes. As the analysis results start pouring in, they bring about a paradigm shift in the understanding of host-pathogen interactions, with the discovery of new transcripts, splice variants, mutations, regulatory elements, and epigenetic controls. Here we describe the core technology of the new sequencing platforms, the methodology of data analysis, and different aspects of applications.
Affiliation(s)
- Sucheta Tripathy
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA.
19
Łabaj PP, Leparc GG, Linggi BE, Markillie LM, Wiley HS, Kreil DP. Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics 2011; 27:i383-91. [PMID: 21685096 PMCID: PMC3117338 DOI: 10.1093/bioinformatics/btr247] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Indexed: 12/03/2022]
Abstract
Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error <20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In the discussion, we outline possible experimental and computational strategies for further improvements in quantification precision. Contact: rnaseq10@boku.ac.at Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Paweł P Łabaj
- Boku University Vienna, 1190 Muthgasse 18, Vienna, Austria
20
Abstract
Personalized medicine in the treatment of cancer is based on the recognition that molecularly targeted therapies are most effective in patients whose tumors carry specific genetic or genomic alterations. These alterations, which often activate oncogenes that encode the components of signal transduction pathways, serve as predictive markers for sensitivity or resistance of individual tumors to drugs that target such pathways. In the recent past, individual mutations and other changes within tumors have been assayed to determine the likelihood of response or nonresponse to specific targeted therapies. However, with the development of increasing numbers of molecularly targeted drugs, attention has shifted to high-throughput testing of tumors for dozens of predictive markers. This approach to predictive testing has been termed molecular tumor profiling. This review describes the background to this field, the principal markers analyzed, and the methodologies that are being utilized or are under development for tumor profiling.
21
Kim SY, Lohmueller KE, Albrechtsen A, Li Y, Korneliussen T, Tian G, Grarup N, Jiang T, Andersen G, Witte D, Jorgensen T, Hansen T, Pedersen O, Wang J, Nielsen R. Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics 2011; 12:231. [PMID: 21663684 PMCID: PMC3212839 DOI: 10.1186/1471-2105-12-231] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.8] [Received: 02/24/2011] [Accepted: 06/11/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates. RESULTS: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. CONCLUSIONS: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
Affiliation(s)
- Su Yeon Kim
- Departments of Integrative Biology and Statistics, UC Berkeley, Berkeley, CA 94720, USA.
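The idea in the abstract above of integrating over genotype uncertainty, rather than calling genotypes first, can be sketched in Python as follows. This is a generic expectation-maximization estimate under stated assumptions, not necessarily the authors' exact algorithm: it assumes a biallelic site, Hardy-Weinberg genotype proportions, and per-individual genotype likelihoods supplied as triples for 0, 1, and 2 copies of the alternate allele. All names are hypothetical.

```python
from typing import List, Sequence


def hwe_prior(freq: float) -> List[float]:
    """Genotype probabilities for 0, 1, 2 alternate alleles (assumes Hardy-Weinberg)."""
    return [(1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2]


def estimate_allele_frequency(
    genotype_likelihoods: Sequence[Sequence[float]],
    start: float = 0.2,
    tol: float = 1e-8,
    max_iter: int = 1000,
) -> float:
    """EM estimate of the alternate-allele frequency from genotype likelihoods.

    Each element of genotype_likelihoods is (L0, L1, L2): the likelihood of one
    individual's read data given 0, 1, or 2 copies of the alternate allele.
    No genotype is ever called; each individual contributes its posterior
    expected allele count.
    """
    if not genotype_likelihoods:
        raise ValueError("need at least one individual")
    freq = start
    for _ in range(max_iter):
        prior = hwe_prior(freq)
        expected_alt_alleles = 0.0
        for likelihoods in genotype_likelihoods:
            # Unnormalized posterior over the three genotypes.
            weights = [lik * p for lik, p in zip(likelihoods, prior)]
            total = sum(weights)
            if total == 0.0:
                raise ValueError("data have zero probability at current frequency")
            expected_alt_alleles += (weights[1] + 2.0 * weights[2]) / total
        new_freq = expected_alt_alleles / (2.0 * len(genotype_likelihoods))
        if abs(new_freq - freq) < tol:
            return new_freq
        freq = new_freq
    return freq


if __name__ == "__main__":
    # Four individuals with unambiguous genotypes 0, 1, 2, 1 (illustrative data).
    certain = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
    print(estimate_allele_frequency(certain))
```

When the likelihoods are unambiguous, the estimate reduces to simple allele counting; with low-coverage data the likelihood triples are flatter and each individual contributes a fractional expected count instead of a hard genotype call.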
22
Abstract
Next-generation sequencing platforms are dramatically reducing the cost of DNA sequencing. With these technologies, bases are inferred from light intensity signals, a process commonly referred to as base-calling. Thus, understanding and improving the quality of sequence data generated using these approaches is of high interest. Recently, a number of papers have characterized the biases associated with base-calling and proposed methodological improvements. In this review, we summarize recent developments in base-calling approaches for the Illumina and Roche 454 sequencing platforms.