1
Kalayinia S, Goodarzynejad H, Maleki M, Mahdieh N. Next generation sequencing applications for cardiovascular disease. Ann Med 2018; 50:91-109. [PMID: 29027470 DOI: 10.1080/07853890.2017.1392595] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 10/18/2022]
Abstract
The Human Genome Project (HGP), the first sequencing of the human genome, took more than a decade to complete using the traditional Sanger method. Next-generation sequencing (NGS) technology can now provide genome sequence data in hours. NGS has also reduced the cost of sequencing, so it is now feasible to carry out both whole-genome sequencing (WGS) and whole-exome sequencing (WES) to detect variants in patients with rare genetic diseases as well as complex disorders such as common cardiovascular diseases (CVDs). Finding new variants may contribute to establishing a risk profile for the pathological process of a disease. Here, recent applications of NGS in cardiovascular medicine are discussed; both Mendelian disorders of the cardiovascular system and complex genetic CVDs, including inherited cardiomyopathy, channelopathies, stroke and coronary artery disease (CAD), are considered. We also outline some future uses of NGS in clinical practice for improving our knowledge of CVD genetics, as well as the limitations of this new technology.
Key messages
- The traditional Sanger method was the mainstay of the Human Genome Project (HGP); Sanger sequencing has high fidelity but is slow and costly compared with next-generation methods.
- Within cardiovascular medicine, NGS has been successful in identifying novel causative mutations and in diagnosing Mendelian diseases, which are caused by a single variant in a single gene.
- NGS makes it possible to analyse a large number of genes in parallel in an unbiased way (i.e. without knowing the underlying biological mechanism), which is likely to advance our knowledge of the pathology of complex diseases such as CVD.
Affiliation(s)
- Samira Kalayinia
- Cardiogenetic Research Laboratory, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Majid Maleki
- Cardiogenetic Research Laboratory, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Nejat Mahdieh
- Cardiogenetic Research Laboratory, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
2
Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis. BMC Bioinformatics 2016; 17:374. [PMID: 27628041 PMCID: PMC5024470 DOI: 10.1186/s12859-016-1234-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/01/2016] [Accepted: 09/01/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gene Expression Data (GED) analysis poses a great challenge to the scientific community that can be framed within the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice for this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA).
RESULTS: We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process, where we provide support for continuous human interaction with the data, aiming to improve the step of hypothesis abduction and assessment. We focus on adapting data interpretation and the visualization of EDA output to human cognition. First, we give a proper theoretical background to biclustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression, we obtain different sequences of hierarchical biclusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of persistence and robustness with which to assess biclusters. Second, the resulting biclusters are used to index external omics databases (for instance, Gene Ontology (GO)), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example, confirming previously published results.
CONCLUSIONS: The GED analysis problem is transformed into the exploration of a sequence of lattices, enabling visualization of the hierarchical structure of the biclusters at a chosen degree of granularity. The ability of FCA-based biclustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene throughout the different biclusters it appears in, and to look for relevant biclusters (by observing their genes and their persistence) in order to infer, for instance, hypotheses on their function.
3
van der Weide RH, Simonis M, Hermsen R, Toonen P, Cuppen E, de Ligt J. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats. PLoS One 2016; 11:e0160036. [PMID: 27501045 PMCID: PMC4976967 DOI: 10.1371/journal.pone.0160036] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/26/2016] [Accepted: 07/12/2016] [Indexed: 01/17/2023]
Abstract
Unmapped next-generation sequencing reads are typically ignored, although they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High-quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that, on average, 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods, and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.
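As an illustration of the first step described in this abstract (selecting high-quality unmapped reads), the following is a minimal Python sketch, not the authors' pipeline. It assumes alignments are available as SAM text lines, relies on the SAM convention that FLAG bit 0x4 marks an unmapped segment, and assumes Phred+33 quality encoding. The function names, the example records and the mean-quality cut-off of 20 are hypothetical choices made for this example.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a SAM QUAL string; 0.0 when quality is absent ('*')."""
    if not qual or qual == "*":
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def unmapped_reads(sam_lines, min_mean_quality=20.0):
    """Yield (name, sequence) for unmapped reads that pass the quality cut-off.

    The cut-off is an illustrative assumption, not a value from the abstract.
    """
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        # SAM FLAG bit 0x4 means "segment unmapped".
        if flag & 0x4 and mean_phred(qual) >= min_mean_quality:
            yield name, seq


# Hypothetical three-read example: one mapped, one good unmapped, one poor unmapped.
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTAAAA\t!!!!!!!!",
]

if __name__ == "__main__":
    for name, seq in unmapped_reads(example_sam):
        print(name, seq)
```

In practice a dedicated SAM/BAM library would normally be used; the plain-text parser here only keeps the sketch self-contained.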
Affiliation(s)
- Robin H. van der Weide
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
- Division of Gene Regulation, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Marieke Simonis
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
- Roel Hermsen
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
- Pim Toonen
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
- Edwin Cuppen
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
- Joep de Ligt
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
4
Petric RC, Pop LA, Jurj A, Raduly L, Dumitrascu D, Dragos N, Neagoe IB. Next generation sequencing applications for breast cancer research. Clujul Med 2015; 88:278-87. [PMID: 26609257 PMCID: PMC4632883 DOI: 10.15386/cjmed-486] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/02/2015] [Revised: 06/26/2015] [Accepted: 06/30/2015] [Indexed: 12/19/2022]
Abstract
For some time, cancer has been thought of not as a single disease, but as a multifaceted, heterogeneous complex of genotypic and phenotypic manifestations leading to tumorigenesis. Thanks to recent technological progress, the outcome of cancer patients can be greatly improved by introducing into clinical practice the advantages brought about by next generation sequencing techniques. Biomedical suppliers have developed various applications that medical researchers can use to characterize a patient’s disease from a molecular and genetic point of view, providing caregivers with rapid and relevant information to guide them in choosing the most appropriate course of treatment, with maximum efficiency and minimal side effects. Breast cancer, whose incidence has risen dramatically, is a good candidate for these novel diagnostic and therapeutic approaches, particularly for specific sequencing panels designed to detect germline or somatic mutations in genes involved in breast cancer tumorigenesis and progression. Benchtop next generation sequencing machines are becoming more common in the clinical setting, empowering physicians to better treat their patients by offering early diagnosis alternatives and targeted remedies, and bringing medicine a step closer to its ultimate goal, personalized therapy.
Affiliation(s)
- Roxana Cojocneanu Petric
- Functional Genomics, Proteomics and Experimental Pathology Department, Prof. Dr. I. Chiricuta Oncology Institute, Cluj-Napoca, Romania; Research Center for Functional Genomics, Biomedicine and Translational Medicine, Iuliu Hatieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania; Faculty of Biology and Geology, Babes Bolyai University, Cluj-Napoca, Romania
- Laura-Ancuta Pop
- Functional Genomics, Proteomics and Experimental Pathology Department, Prof. Dr. I. Chiricuta Oncology Institute, Cluj-Napoca, Romania; Research Center for Functional Genomics, Biomedicine and Translational Medicine, Iuliu Hatieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Ancuta Jurj
- Functional Genomics, Proteomics and Experimental Pathology Department, Prof. Dr. I. Chiricuta Oncology Institute, Cluj-Napoca, Romania
- Lajos Raduly
- Functional Genomics, Proteomics and Experimental Pathology Department, Prof. Dr. I. Chiricuta Oncology Institute, Cluj-Napoca, Romania; University of Agricultural Sciences and Veterinary Medicine, Cluj-Napoca, Romania
- Dan Dumitrascu
- 2nd Department of Internal Medicine, Iuliu Hatieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Nicolae Dragos
- Taxonomy and Ecology Department, NIRDBS - Institute of Biological Research, Cluj-Napoca, Romania
- Ioana Berindan Neagoe
- Functional Genomics, Proteomics and Experimental Pathology Department, Prof. Dr. I. Chiricuta Oncology Institute, Cluj-Napoca, Romania; Research Center for Functional Genomics, Biomedicine and Translational Medicine, Iuliu Hatieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania; Department of Experimental Therapeutics, MD Anderson Cancer Center, Houston, Texas, USA; Department of Immunology, Iuliu Hatieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
5
Abstract
High-throughput sequencing (HTS) methods for analyzing RNA populations (RNA-Seq) are gaining rapid application to many experimental situations. The steps in an RNA-Seq experiment require thought and planning, especially because the expense in time and materials is currently higher and the protocols are far less routine than those used for other high-throughput methods, such as microarrays. As always, good experimental design will make analysis and interpretation easier. Having a clear biological question, an idea about the best way to do the experiment, and an understanding of the number of replicates needed will make the entire process more satisfying. Whether the goal is capturing transcriptome complexity from a tissue or identifying small fragments of RNA cross-linked to a protein of interest, conversion of the RNA to cDNA followed by direct sequencing using the latest methods is a developing practice, with new technical modifications and applications appearing every day. Even more rapid are the development and improvement of methods for analysis of the very large amounts of data that arrive at the end of an RNA-Seq experiment, making considerations regarding reproducibility, validation, visualization, and interpretation increasingly important. This introduction is designed to review and emphasize a pathway of analysis from experimental design through data presentation that is likely to be successful, with the recognition that better methods are right around the corner.
6
Penchovsky R. Engineering Gene Control Circuits with Allosteric Ribozymes in Human Cells as a Medicine of the Future. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Systems and synthetic biology promise to develop new approaches for the analysis and design of complex gene expression regulatory networks in living cells, with many practical applications in the pharmaceutical and biotech industries. In this chapter, the development of novel universal strategies for exogenous control of gene expression is discussed. They are based on designer allosteric ribozymes that can function in the cell. The synthetic riboswitches are obtained by a patented computational procedure that provides fast and accurate modular designs with various Boolean logic functions. The riboswitches can be designed to sense either the presence or the absence of disease-indicative RNA(s) or small molecules in the cell, and to switch the expression of any exogenous protein on or off. In addition, the riboswitches can be engineered to induce RNA interference or microRNA pathways that conditionally down-regulate the expression of key proteins in the cell, which can prevent a disease from developing. The presented synthetic riboswitches can therefore be used as truly universal cellular biosensors. Disease-indicative RNA(s) can now be identified with high accuracy using next-generation sequencing technologies. The methods can be employed not only for exogenous control of gene expression but also for re-programming cell fate and for anticancer and antiviral gene therapies. Such approaches may be employed as potent molecular medicines of the future.
7
Rowe SJ, Tenesa A. Human complex trait genetics: lifting the lid of the genomics toolbox - from pathways to prediction. Curr Genomics 2012; 13:213-24. [PMID: 23115523 PMCID: PMC3382276 DOI: 10.2174/138920212800543101] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/05/2011] [Revised: 09/09/2011] [Accepted: 10/05/2011] [Indexed: 01/09/2023]
Abstract
During the initial stages of the genome revolution, human genetics was hugely successful in discovering the underlying genes for monogenic diseases. Over 3,000 monogenic diseases with simple patterns of inheritance have been discovered. The unravelling and identification of the genetic variants underlying complex or multifactorial traits, however, is proving much more elusive. Over 1,000 significant variants have been found for many quantitative and binary traits, yet they explain very little of the estimated genetic variance or heritability evident from family analysis. There are many hypotheses as to why this might be the case. This apparent lack of information is holding back the clinical application of genetics and casting doubt on whether more of the same will reveal where the remainder of the variation lies. Here we explore the current state of play, the types of variants we can detect and how they are currently exploited. Finally, we look at the future challenges we must face to persuade the human genome to yield its secrets.
Affiliation(s)
- Suzanne J Rowe
- The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, Scotland, UK
8
High throughput sequencing approaches to mutation discovery in the mouse. Mamm Genome 2012; 23:499-513. [PMID: 22991087 DOI: 10.1007/s00335-012-9424-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/27/2012] [Accepted: 07/19/2012] [Indexed: 12/19/2022]
Abstract
Phenotype-driven approaches in mice are powerful strategies for the discovery of genes and gene functions and for unravelling complex biological mechanisms. Traditional methods for mutation discovery are reliable and robust, but they can also be laborious and time consuming. Recently, high-throughput sequencing (HTS) technologies have revolutionised the process of forward genetics in mice by paving the way to rapid mutation discovery. However, successful application of HTS for mutation discovery relies heavily on the sequencing approach employed and strategies for data analysis. Here we review current HTS applications and resources for mutation discovery and provide an overview of the practical considerations for HTS implementation and data analysis.
9
Wang Q, Xia J, Jia P, Pao W, Zhao Z. Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives. Brief Bioinform 2012; 14:506-19. [PMID: 22877769 DOI: 10.1093/bib/bbs044] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Indexed: 01/22/2023]
Abstract
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
10
Zhang T, Luo Y, Liu K, Pan L, Zhang B, Yu J, Hu S. BIGpre: a quality assessment package for next-generation sequencing data. GENOMICS PROTEOMICS & BIOINFORMATICS 2012; 9:238-44. [PMID: 22289480 PMCID: PMC5054156 DOI: 10.1016/s1672-0229(11)60027-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 09/04/2011] [Accepted: 11/23/2011] [Indexed: 11/25/2022]
Abstract
The emergence of next-generation sequencing (NGS) technologies has significantly improved sequencing throughput and reduced costs. However, the short read length, duplicate reads and massive volume of data make data processing much more difficult and complicated than with first-generation sequencing technology. Although some software packages have been developed to assess data quality, those packages either are not easily available to users or require bioinformatics skills and computer resources. Moreover, almost none of the quality assessment software currently available takes sequencing errors into account when assessing duplicates in NGS data. Here, we present a new user-friendly quality assessment software package called BIGpre, which works for both Illumina and 454 platforms. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account, and to trim low-quality reads from raw data. BIGpre is primarily written in Perl and integrates graphical capability from the statistics package R. The package produces both tabular and graphical summaries of data quality for sequencing datasets from Illumina and 454 platforms. Processing hundreds of millions of reads within minutes, it provides immediate diagnostic information that users can act on when preparing sequencing data for downstream analyses. BIGpre is freely available at http://bigpre.sourceforge.net/.
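To make two of the quality checks named in this abstract concrete (read GC content and duplicate detection that tolerates sequencing errors), here is a small Python sketch. It is not BIGpre's algorithm, which the abstract does not describe, and BIGpre itself is written in Perl. The greedy mismatch-tolerant comparison, the function names and the default of one allowed mismatch are all assumptions made for illustration.

```python
def gc_content(read):
    """Fraction of G/C among the unambiguous bases (A, C, G, T) of a read."""
    bases = [b for b in read.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def collapse_near_duplicates(reads, max_mismatches=1):
    """Keep a read only if no already-kept read of the same length is within
    max_mismatches of it.

    Greedy and quadratic in the number of reads: suitable for a toy example,
    not for real datasets.
    """
    kept = []
    for read in reads:
        is_duplicate = any(
            len(read) == len(other) and hamming(read, other) <= max_mismatches
            for other in kept
        )
        if not is_duplicate:
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "ACGTACGT"]
    print(collapse_near_duplicates(reads))
    print([round(gc_content(r), 2) for r in reads])
```

With `max_mismatches=0` the function reduces to exact-duplicate removal, which is the behaviour the abstract contrasts against.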
Affiliation(s)
- Tongwu Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 31007, China
- Yingfeng Luo
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Kan Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Linlin Pan
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Bing Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Songnian Hu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Corresponding author.
11
Egan AN, Schlueter J, Spooner DM. Applications of next-generation sequencing in plant biology. AMERICAN JOURNAL OF BOTANY 2012; 99:175-85. [PMID: 22312116 DOI: 10.3732/ajb.1200020] [Citation(s) in RCA: 140] [Impact Index Per Article: 11.7] [Indexed: 05/02/2023]
Abstract
The last several years have seen revolutionary advances in DNA sequencing technologies with the advent of next-generation sequencing (NGS) techniques. NGS methods now allow millions of bases to be sequenced in one round, at a fraction of the cost relative to traditional Sanger sequencing. As costs and capabilities of these technologies continue to improve, we are only beginning to see the possibilities of NGS platforms, which are developing in parallel with online availability of a wide range of biological data sets and scientific publications and allowing us to address a variety of questions not possible before. As techniques and data sets continue to improve and grow, we are rapidly moving to the point where every organism, not just select "model organisms", is open to the power of NGS. This volume presents a brief synopsis of NGS technologies and the development of exemplary applications of such methods in the fields of molecular marker development, hybridization and introgression, transcriptome investigations, phylogenetic and ecological studies, polyploid genetics, and applications for large genebank collections.
Affiliation(s)
- Ashley N Egan
- East Carolina University, Department of Biology, Howell Science Complex N303a, Mailstop 551, Greenville, North Carolina 27858, USA.
12
Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ. WITHDRAWN: Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet 2011:jhg201162. [PMID: 21677664 DOI: 10.1038/jhg.2011.62] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications, with the promise of sequencing DNA at unprecedented speed. Compared with traditional sequencing methods, these new non-Sanger-based technologies offer several advantages: higher sequencing speed, lower per-run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, which restricts the applications of NGS platforms in genome assembly and annotation. We present an overview of the challenges that these novel technologies face and illustrate various bioinformatics approaches to mapping and assembly. We then compare the performance of several programs in these two fields and provide advice on selecting suitable tools for specific biological applications. Journal of Human Genetics advance online publication, 16 June 2011; doi:10.1038/jhg.2011.62.
Affiliation(s)
- Suying Bao
- Department of Biochemistry, Center for Reproduction, Development and Growth, The University of Hong Kong, Hong Kong
13
Abstract
Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications, with the promise of sequencing DNA at unprecedented speed. Compared with traditional sequencing methods, these new non-Sanger-based technologies offer several advantages: higher sequencing speed, lower per-run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, which restricts the applications of NGS platforms in genome assembly and annotation. We present an overview of the challenges that these novel technologies face and illustrate various bioinformatics approaches to mapping and assembly. We then compare the performance of several programs in these two fields and provide advice on selecting suitable tools for specific biological applications.
14
Fontanillas P, Landry CR, Wittkopp PJ, Russ C, Gruber JD, Nusbaum C, Hartl DL. Key considerations for measuring allelic expression on a genomic scale using high-throughput sequencing. Mol Ecol 2010; 19 Suppl 1:212-27. [PMID: 20331781 DOI: 10.1111/j.1365-294x.2010.04472.x] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Abstract
Differences in gene expression are thought to be an important source of phenotypic diversity, so dissecting the genetic components of natural variation in gene expression is important for understanding the evolutionary mechanisms that lead to adaptation. Gene expression is a complex trait that, in diploid organisms, results from transcription of both maternal and paternal alleles. Directly measuring allelic expression rather than total gene expression offers greater insight into regulatory variation. The recent emergence of high-throughput sequencing offers an unprecedented opportunity to study allelic transcription at a genomic scale for virtually any species. By sequencing transcript pools derived from heterozygous individuals, estimates of allelic expression can be obtained directly. The statistical power of this approach is influenced by the number of transcripts sequenced and the ability to unambiguously assign individual sequence fragments to specific alleles on the basis of transcribed nucleotide polymorphisms. Here, using mathematical modelling and computer simulations, we determine the minimum sequencing depth required to accurately measure relative allelic expression and detect allelic imbalance via high-throughput sequencing under a variety of conditions. We conclude that, within a species, a minimum of 500-1000 sequencing reads per gene is needed to test for allelic imbalance and, consequently, at least 5 to 10 million reads are required for studying a genome expressing 10 000 genes. Finally, using 454 sequencing, we illustrate an application of allelic expression by testing for cis-regulatory divergence between closely related Drosophila species.
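The depth figures in this abstract can be explored with a short calculation. The Python sketch below is not the authors' model: it assumes every read is assigned unambiguously to one allele, treats allele counts as binomial, and uses an exact symmetric two-sided test against a 50:50 ratio. The function names, the 0.55 allele fraction and the 5% significance level are illustrative assumptions. The `total_reads` helper reproduces the abstract's arithmetic (reads per gene times number of expressed genes), which implicitly treats all genes as equally covered.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for 0 < p < 1, computed in log space."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def rejection_region(n, alpha=0.05):
    """Allele counts k at which a symmetric two-sided test of p = 0.5 rejects."""
    null = [binom_pmf(k, n, 0.5) for k in range(n + 1)]
    upper_tail = [0.0] * (n + 2)  # upper_tail[k] = P(X >= k) under the null
    for k in range(n, -1, -1):
        upper_tail[k] = upper_tail[k + 1] + null[k]
    region = []
    for k in range(n + 1):
        extreme = max(k, n - k)  # mirror k onto the upper side of n / 2
        p_value = min(1.0, 2 * upper_tail[extreme])
        if p_value <= alpha:
            region.append(k)
    return region


def power(n, allele_fraction, alpha=0.05):
    """Probability of detecting imbalance with n reads at the given true fraction."""
    return sum(binom_pmf(k, n, allele_fraction) for k in rejection_region(n, alpha))


def total_reads(reads_per_gene, n_genes):
    """Total reads if every expressed gene received the same depth."""
    return reads_per_gene * n_genes


if __name__ == "__main__":
    for depth in (100, 500, 1000):
        print(depth, round(power(depth, 0.55), 3))
    print(total_reads(500, 10_000), total_reads(1000, 10_000))
```

Under these assumptions, power to detect a modest imbalance rises steeply between 100 and 1000 reads per gene, which is consistent in direction with the abstract's recommendation; the exact values depend on the assumed effect size.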
Affiliation(s)
- Pierre Fontanillas
- Department of Ecology and Evolution, University of Lausanne, Switzerland.
15
Horner DS, Pavesi G, Castrignano T, De Meo PD, Liuni S, Sammeth M, Picardi E, Pesole G. Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing. Brief Bioinform 2009; 11:181-97. [DOI: 10.1093/bib/bbp046] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Indexed: 12/13/2022]