101
Guo MH, Dauber A, Lippincott MF, Chan YM, Salem RM, Hirschhorn JN. Determinants of Power in Gene-Based Burden Testing for Monogenic Disorders. Am J Hum Genet 2016; 99:527-539. [PMID: 27545677 DOI: 10.1016/j.ajhg.2016.06.031] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 03/06/2016] [Accepted: 06/28/2016] [Indexed: 12/11/2022]
Abstract
Whole-exome sequencing has enabled new approaches for discovering genes associated with monogenic disorders. One such approach is gene-based burden testing, in which the aggregate frequency of "qualifying variants" is compared between case and control subjects for each gene. Despite substantial successes of this approach, the genetic causes for many monogenic disorders remain unknown or only partially known. It is possible that particular genetic architectures lower rates of discovery, but the influence of these factors on power has not been rigorously evaluated. Here, we leverage large-scale exome-sequencing data to create an empirically based simulation framework to evaluate the impact of key parameters (background variation rates, locus heterogeneity, mode of inheritance, penetrance) on power in gene-based burden tests in the context of monogenic disorders. Our results demonstrate that across genes, there is a wide range in sample sizes needed to achieve power due to differences in the background rate of rare variants in each gene. Increasing locus heterogeneity results in rapid increases in sample sizes needed to achieve adequate power, particularly when individual genes contribute to less than 5% of cases under a dominant model. Interestingly, incomplete penetrance as low as 10% had little effect on power due to the low prevalence of monogenic disorders. Our results suggest that moderate incomplete penetrance is not an obstacle in this gene-based burden testing approach but that dominant disorders with high locus heterogeneity will require large sample sizes. Our simulations also provide guidance on sample size needs and inform study design under various genetic architectures.
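As an illustration of the gene-based burden comparison described in this abstract, the following minimal Python sketch compares the number of carriers of qualifying variants between cases and controls for a single gene with a one-sided Fisher's exact test. The choice of test, the function name `burden_p_value`, and the example counts are assumptions made for illustration; the abstract does not state which test statistic the authors used.

```python
from math import comb


def burden_p_value(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for enrichment of carriers among cases.

    Returns P(X >= case_carriers), where X follows the hypergeometric
    distribution obtained by fixing the table margins.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denominator = comb(total, n_cases)
    upper = min(carriers, n_cases)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts for one gene: 5 of 100 cases and 5 of 10,000 controls
# carry a qualifying variant.
p = burden_p_value(5, 100, 5, 10_000)
print(f"one-sided P = {p:.3g}")
```

In an exome-wide analysis this test would be repeated for every gene, so the resulting P values would need a multiple-testing correction; a gene with a high background rate of rare variants in controls needs more case carriers to reach the same significance, which is consistent with the abstract's point about background variation rates.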
Affiliation(s)
- Michael H Guo
- Division of Endocrinology, Department of Medicine, Boston Children's Hospital, Boston, MA 02115, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA; Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA 02115, USA
- Andrew Dauber
- Division of Endocrinology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Margaret F Lippincott
- Harvard Reproductive Sciences Center and Reproductive Endocrine Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Yee-Ming Chan
- Division of Endocrinology, Department of Medicine, Boston Children's Hospital, Boston, MA 02115, USA; Harvard Reproductive Sciences Center and Reproductive Endocrine Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Rany M Salem
- Division of Endocrinology, Department of Medicine, Boston Children's Hospital, Boston, MA 02115, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA; Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA 02115, USA
- Joel N Hirschhorn
- Division of Endocrinology, Department of Medicine, Boston Children's Hospital, Boston, MA 02115, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA; Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA 02115, USA.
102
Representing genetic variation with synthetic DNA standards. Nat Methods 2016; 13:784-91. [PMID: 27502217 DOI: 10.1038/nmeth.3957] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 03/23/2016] [Accepted: 06/28/2016] [Indexed: 12/16/2022]
Abstract
The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and by biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed 'sequins', that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than the human reference genome, which allows them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy-number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome, and we demonstrate their utility during the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.
103
Petrovski S, Goldstein DB. Unequal representation of genetic variation across ancestry groups creates healthcare inequality in the application of precision medicine. Genome Biol 2016; 17:157. [PMID: 27418169 PMCID: PMC4944427 DOI: 10.1186/s13059-016-1016-y] [Citation(s) in RCA: 135] [Impact Index Per Article: 15.0] [Indexed: 12/18/2022]
Abstract
An important application of modern genomics is diagnosing genetic disorders. We use the largest publicly available exome sequence database to show that this key clinical service can currently be performed much more effectively in individuals of European genetic ancestry.
Affiliation(s)
- Slavé Petrovski
- Institute for Genomic Medicine, Columbia University, New York, New York, USA; Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria, Australia.
- David B Goldstein
- Institute for Genomic Medicine, Columbia University, New York, New York, USA.
104
Akle S, Chun S, Jordan DM, Cassa CA. Mitigating false-positive associations in rare disease gene discovery. Hum Mutat 2016; 36:998-1003. [PMID: 26378430 DOI: 10.1002/humu.22847] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/23/2015] [Accepted: 07/19/2015] [Indexed: 11/09/2022]
Abstract
Clinical sequencing is expanding, but causal variants are still not identified in the majority of cases. These unsolved cases can aid in gene discovery when individuals with similar phenotypes are identified in systems such as the Matchmaker Exchange. We describe risks for gene discovery in this growing set of unsolved cases. In a set of rare disease cases with the same phenotype, it is not difficult to find two individuals with the same phenotype that carry variants in the same gene. We quantify the risk of false-positive association in a cohort of individuals with the same phenotype, using the prior probability of observing a variant in each gene from over 60,000 individuals (Exome Aggregation Consortium). Based on the number of individuals with a genic variant, cohort size, specific gene, and mode of inheritance, we calculate a P value that the match represents a true association. A match in two of 10 patients in MECP2 is statistically significant (P = 0.0014), whereas a match in TTN would not reach significance, as expected (P > 0.999). Finally, we analyze the probability of matching in clinical exome cases to estimate the number of cases needed to identify genes related to different disorders. We offer Rare Disease Match, an online tool to mitigate the uncertainty of false-positive associations.
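The kind of calculation this abstract describes can be sketched as a binomial tail probability: given a per-individual prior probability `q` of carrying a variant in a gene, how likely is it that at least `k` of `n` unrelated patients carry one by chance alone? The Python sketch below assumes independent individuals and a single fixed `q`; the function name and the values of `q` are hypothetical, and the published method (which also accounts for mode of inheritance) may differ.

```python
from math import comb


def match_p_value(k, n, q):
    """Probability that at least k of n individuals carry a variant in a
    gene by chance, given a per-individual carrier probability q."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


# Hypothetical carrier probabilities, for illustration only: a gene that
# rarely carries variants versus one that almost always does.
for label, q in [("rarely variant gene", 0.005), ("highly variable gene", 0.9)]:
    print(label, match_p_value(2, 10, q))
```

Under these assumptions, a match in two of 10 patients is unlikely by chance when `q` is small and nearly certain when `q` is large, which mirrors the contrast the abstract draws between MECP2 and TTN.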
Affiliation(s)
- Sebastian Akle
- Department of Organismic and Evolutionary Biology, Harvard University, Boston, MA; Division of Genetics, Brigham and Women's Hospital, Boston, MA
- Sung Chun
- Division of Genetics, Brigham and Women's Hospital, Boston, MA; Department of Medicine, Harvard Medical School, Boston, MA
- Daniel M Jordan
- Division of Genetics, Brigham and Women's Hospital, Boston, MA; Department of Medicine, Harvard Medical School, Boston, MA
- Christopher A Cassa
- Division of Genetics, Brigham and Women's Hospital, Boston, MA; Department of Medicine, Harvard Medical School, Boston, MA
105
Zhou K, Pedersen HK, Dawed AY, Pearson ER. Pharmacogenomics in diabetes mellitus: insights into drug action and drug discovery. Nat Rev Endocrinol 2016; 12:337-46. [PMID: 27062931 DOI: 10.1038/nrendo.2016.51] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/04/2023]
Abstract
Genomic studies have greatly advanced our understanding of the multifactorial aetiology of type 2 diabetes mellitus (T2DM) as well as the multiple subtypes of monogenic diabetes mellitus. In this Review, we discuss the existing pharmacogenetic evidence in both monogenic diabetes mellitus and T2DM. We highlight mechanistic insights from the study of adverse effects and the efficacy of antidiabetic drugs. The identification of extreme sulfonylurea sensitivity in patients with diabetes mellitus owing to heterozygous mutations in HNF1A represents a clear example of how pharmacogenetics can direct patient care. However, pharmacogenomic studies of response to antidiabetic drugs in T2DM have yet to be translated into clinical practice, although some moderate genetic effects have now been described that merit follow-up in trials in which patients are selected according to genotype. We also discuss how future pharmacogenomic findings could provide insights into treatment response in diabetes mellitus that, in addition to other areas of human genetics, facilitate drug discovery and drug development for T2DM.
Affiliation(s)
- Kaixin Zhou
- School of Medicine, University of Dundee, Dundee, DD1 9SY, UK
- Helle Krogh Pedersen
- Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Adem Y Dawed
- School of Medicine, University of Dundee, Dundee, DD1 9SY, UK
- Ewan R Pearson
- School of Medicine, University of Dundee, Dundee, DD1 9SY, UK
106
The Impact on Genetic Testing of Mutational Patterns of CFTR Gene in Different Clinical Macrocategories of Cystic Fibrosis. J Mol Diagn 2016; 18:554-65. [PMID: 27157324 DOI: 10.1016/j.jmoldx.2016.02.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/27/2015] [Revised: 01/29/2016] [Accepted: 02/22/2016] [Indexed: 12/19/2022]
Abstract
More than 2000 sequence variations of the cystic fibrosis transmembrane conductance regulator gene are known. The marked genetic heterogeneity, poor functional characterization of the vast majority of sequence variations, and an uncertain genotype-phenotype relationship complicate the definition of mutational search strategies. We studied the effect of the marked genetic heterogeneity detected in a case series comprising 610 patients with cystic fibrosis (CF), grouped in different clinical macrocategories, on the operative characteristics of the genetic test designed to fully characterize CF patients. The detection rate in each clinical macrocategory and at each mutational step was found to be influenced by genetic heterogeneity. The definition of a single mutational panel that is suitable for all clinical macrocategories proved impossible. Only for classic CF with pancreas insufficiency did a reduced number of mutations yield a detection rate of diagnostic value. All other clinical macrocategories required an extensive genetic search. The search for specific mutational classes appears to be useful only in specific CF clinical forms. A flowchart defining a mutational search that may be adopted for different CF clinical forms, optimized with respect to those already available, is proposed. The findings also have consequences for carrier screening strategies.
107
Dopazo J, Amadoz A, Bleda M, Garcia-Alonso L, Alemán A, García-García F, Rodriguez JA, Daub JT, Muntané G, Rueda A, Vela-Boza A, López-Domingo FJ, Florido JP, Arce P, Ruiz-Ferrer M, Méndez-Vidal C, Arnold TE, Spleiss O, Alvarez-Tejado M, Navarro A, Bhattacharya SS, Borrego S, Santoyo-López J, Antiñolo G. 267 Spanish Exomes Reveal Population-Specific Differences in Disease-Related Genetic Variation. Mol Biol Evol 2016; 33:1205-18. [PMID: 26764160 PMCID: PMC4839216 DOI: 10.1093/molbev/msw005] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Indexed: 12/17/2022]
Abstract
Recent results from large-scale genomic projects suggest that allele frequencies, which are highly relevant for medical purposes, differ considerably across different populations. The need for a detailed catalog of local variability motivated the whole-exome sequencing of 267 unrelated individuals, representative of the healthy Spanish population. As in other studies, a considerable number of rare variants were found (almost one-third of the described variants). There were also relevant differences in allelic frequencies in polymorphic variants, including ∼10,000 polymorphisms private to the Spanish population. The allelic frequencies of variants conferring susceptibility to complex diseases (including cancer, schizophrenia, Alzheimer disease, type 2 diabetes, and other pathologies) were overall similar to those of other populations. However, the trend is the opposite for variants linked to Mendelian and rare diseases (including several retinal degenerative dystrophies and cardiomyopathies) that show marked frequency differences between populations. Interestingly, a correspondence between differences in allelic frequencies and disease prevalence was found, highlighting the relevance of frequency differences in disease risk. These differences are also observed in variants that disrupt known drug binding sites, suggesting an important role for local variability in population-specific drug resistances or adverse effects. We have made publicly available the Spanish population variant server web page, which contains population frequency information for the complete list of 170,888 variant positions we found (http://spv.babelomics.org/). We show that it is fundamental to determine population-specific variant frequencies to distinguish real disease associations from population-specific polymorphisms.
Affiliation(s)
- Joaquín Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain; Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain; Bioinformatics in Rare Diseases (BIER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Valencia, Spain; Functional Genomics Node, National Institute of Bioinformatics (INB), Valencia, Spain
- Alicia Amadoz
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Marta Bleda
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain; Bioinformatics in Rare Diseases (BIER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Valencia, Spain
- Luz Garcia-Alonso
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Alejandro Alemán
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain; Bioinformatics in Rare Diseases (BIER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Valencia, Spain
- Francisco García-García
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Juan A Rodriguez
- Institut De Biologia Evolutiva, Consejo Superior de Investigaciones Científicas - Universitat Pompeu Fabra, Barcelona, Spain
- Josephine T Daub
- Institut De Biologia Evolutiva, Consejo Superior de Investigaciones Científicas - Universitat Pompeu Fabra, Barcelona, Spain
- Gerard Muntané
- Institut De Biologia Evolutiva, Consejo Superior de Investigaciones Científicas - Universitat Pompeu Fabra, Barcelona, Spain
- Antonio Rueda
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Alicia Vela-Boza
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Javier P Florido
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Pablo Arce
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Macarena Ruiz-Ferrer
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain; Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/Consejo Superior de Investigaciones Científicas/University of Seville, Sevilla, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
- Cristina Méndez-Vidal
- Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/Consejo Superior de Investigaciones Científicas/University of Seville, Sevilla, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
- Todd E Arnold
- Research and Development, 454 Life Sciences, a Roche Company, Branford, CT, USA
- Olivia Spleiss
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Basel, Switzerland
- Arcadi Navarro
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca I Estudis Avançats (ICREA), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain; Center for Genomic Regulation (CRG), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Shomi S Bhattacharya
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain; Andalusian Molecular Biology and Regenerative Medicine Centre (CABIMER), Sevilla, Spain
- Salud Borrego
- Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/Consejo Superior de Investigaciones Científicas/University of Seville, Sevilla, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
- Javier Santoyo-López
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Guillermo Antiñolo
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain; Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/Consejo Superior de Investigaciones Científicas/University of Seville, Sevilla, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
108
Itan Y. Evolutionary Genomics. Evol Bioinform Online 2016; 11:53-5. [PMID: 27127402 PMCID: PMC4841156 DOI: 10.4137/ebo.s39729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
This supplement is intended to focus on evolutionary genomics. Evolutionary Bioinformatics aims to provide researchers working in this complex, quickly developing field with online, open access to highly relevant scholarly articles by leading international researchers. In a field where the literature is ever-expanding, researchers increasingly need access to up-to-date, high quality scholarly articles on areas of specific contemporary interest. This supplement aims to address this by presenting high-quality articles that allow readers to distinguish the signal from the noise. The editor in chief hopes that through this effort, practitioners and researchers will be aided in finding answers to some of the most complex and pressing issues of our time.
Affiliation(s)
- Yuval Itan
- Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY, USA
109
Abstract
In recent years, genome and exome sequencing studies have implicated a plethora of new disease genes with rare causal variants. Here, I review 150 exome sequencing studies that claim to have discovered that a disease can be caused by different rare variants in the same gene, and I determine whether their methods followed the current best-practice guidelines in the interpretation of their data. Specifically, I assess whether studies appropriately assess controls for rare variants throughout the entire gene or implicated region as opposed to only investigating the specific rare variants identified in the cases, and I assess whether studies present sufficient co-segregation data for statistically significant linkage. I find that the proportion of studies performing gene-based analyses has increased with time, but that even in 2015 fewer than 40% of the reviewed studies used this method, and only 10% presented statistically significant co-segregation data. Furthermore, I find that the genes reported in these papers are explaining a decreasing proportion of cases as the field moves past most of the low-hanging fruit, with 50% of the genes from studies in 2014 and 2015 having variants in fewer than 5% of cases. As more studies focus on genes explaining relatively few cases, the importance of performing appropriate gene-based analyses is increasing. It is becoming increasingly important for journal editors and reviewers to require stringent gene-based evidence to avoid an avalanche of misleading disease gene discovery papers.
Affiliation(s)
- Elizabeth T Cirulli
- Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, North Carolina, United States of America
110
Darnell AJ, Austin H, Bluemke DA, Cannon RO, Fischbeck K, Gahl W, Goldman D, Grady C, Greene MH, Holland SM, Hull SC, Porter FD, Resnik D, Rubinstein WS, Biesecker LG. A Clinical Service to Support the Return of Secondary Genomic Findings in Human Research. Am J Hum Genet 2016; 98:435-441. [PMID: 26942283 PMCID: PMC4800041 DOI: 10.1016/j.ajhg.2016.01.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 10/20/2015] [Indexed: 11/28/2022]
Abstract
Human genome and exome sequencing are powerful research tools that can generate secondary findings beyond the scope of the research. Most secondary genomic findings are of low importance, but some (for a current estimate of 1%-3% of individuals) confer high risk of a serious disease that could be mitigated by timely medical intervention. The impact and scope of secondary findings in genome and exome sequencing will only increase in the future. There is considerable agreement that high-impact findings should be returned to participants, but many researchers performing genomic research studies do not have the background, skills, or resources to identify, verify, interpret, and return such variants. Here, we introduce a proposal for the formation of a secondary-genomic-findings service (SGFS) that would support researchers by enabling the return of clinically actionable sequencing results to research participants in a standardized manner. We describe a proposed structure for such a centralized service and evaluate the advantages and challenges of the approach. We suggest that such a service would be of greater benefit to all parties involved than present practice, which is highly variable. We encourage research centers to consider the adoption of a centralized SGFS.
Affiliation(s)
- Andrew J Darnell
- Program in Science and Society, Duke University, Durham, NC 27710, USA
- Howard Austin
- Kidney Disease Section, National Institute of Diabetes, Digestive, and Kidney Diseases, NIH, Bethesda, MD 20892, USA
- David A Bluemke
- Radiology and Imaging Sciences, NIH Clinical Center, Bethesda, MD 20892, USA
- Richard O Cannon
- Cardiovascular and Pulmonary Branch, National Heart, Lung, and Blood Institute, NIH, Bethesda, MD 20892, USA
- Kenneth Fischbeck
- Neurogenetics Branch, National Institute of Neurological Disorders and Stroke, NIH, Bethesda, MD 20892, USA
- William Gahl
- Office of the Clinical Director, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- David Goldman
- Laboratory of Neurogenetics and Office of the Clinical Director, National Institute of Alcohol Abuse and Alcoholism, NIH, Bethesda, MD 20892, USA
- Christine Grady
- Department of Bioethics, Clinical Research Center, NIH, Bethesda, MD 20892, USA
- Mark H Greene
- Clinical Genetics Branch, National Cancer Institute, NIH, Bethesda, MD 20892, USA
- Steven M Holland
- Laboratory of Clinical Infectious Diseases, National Institute of Allergy and Infectious Disease, NIH, Bethesda, MD 20892, USA
- Sara Chandros Hull
- Department of Bioethics, Clinical Research Center, NIH, Bethesda, MD 20892, USA; Bioethics Core, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Forbes D Porter
- Section on Molecular Dysmorphology, National Institute of Child Health and Human Development, NIH, Bethesda, MD 20892, USA
- David Resnik
- Office of the Director, National Institute of Environmental Health Sciences, NIH, Bethesda, MD 20892, USA
- Wendy S Rubinstein
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20892, USA
- Leslie G Biesecker
- Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
111
Fallin MD, Duggal P, Beaty TH. Genetic Epidemiology and Public Health: The Evolution From Theory to Technology. Am J Epidemiol 2016; 183:387-93. [PMID: 26905340 DOI: 10.1093/aje/kww001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/23/2015] [Accepted: 01/04/2016] [Indexed: 12/28/2022]
Abstract
Genetic epidemiology represents a hybrid of epidemiologic designs and statistical models that explicitly consider both genetic and environmental risk factors for disease. It is a relatively new field in public health; the term was first coined only 35 years ago. In this short time, the field has been through a major evolution, changing from a field driven by theory, without the technology for genetic measurement or the computational capacity to apply many of the designs and methods developed, to a field driven by rapidly expanding technology in genomic measurement and computational analyses while epidemiologic theory struggles to keep up. In this commentary, we describe 4 different eras of genetic epidemiology, spanning this evolution from theory to technology, what we have learned, what we have added to the broader field of public health, and what remains to be done.
112
Harris T, Papadopoulos S, Goldstein DB. Academic-industrial partnerships in drug discovery in the age of genomics. Trends Biotechnol 2016; 33:320-2. [PMID: 25987446 DOI: 10.1016/j.tibtech.2015.02.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/22/2014] [Revised: 02/09/2015] [Accepted: 02/27/2015] [Indexed: 12/14/2022]
Abstract
Many US FDA-approved drugs have been developed through productive interactions between the biotechnology industry and academia. Technological breakthroughs in genomics, in particular large-scale sequencing of human genomes, are creating new opportunities to understand the biology of disease and to identify high-value targets relevant to a broad range of disorders. However, the scale of the work required to appropriately analyze large genomic and clinical data sets is challenging industry to develop a broader view of what areas of work constitute precompetitive research.
Affiliation(s)
- Tim Harris
- Biogen, 225 Binney Street, Cambridge, MA 02142, USA.
- David B Goldstein
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
113
Li J, Cai T, Jiang Y, Chen H, He X, Chen C, Li X, Shao Q, Ran X, Li Z, Xia K, Liu C, Sun ZS, Wu J. Genes with de novo mutations are shared by four neuropsychiatric disorders discovered from NPdenovo database. Mol Psychiatry 2016; 21:290-7. [PMID: 25849321 PMCID: PMC4837654 DOI: 10.1038/mp.2015.40] [Citation(s) in RCA: 115] [Impact Index Per Article: 12.8] [Received: 10/22/2014] [Revised: 02/26/2015] [Accepted: 03/02/2015] [Indexed: 12/16/2022]
Abstract
Currently, many studies on neuropsychiatric disorders have utilized massive trio-based whole-exome sequencing (WES) and whole-genome sequencing (WGS) to identify numerous de novo mutations (DNMs). Here, we retrieved 17,104 DNMs from 3555 trios across four neuropsychiatric disorders: autism spectrum disorder, epileptic encephalopathy, intellectual disability and schizophrenia, in addition to unaffected siblings (control), from 36 studies by WES/WGS. After eliminating non-exonic variants, we focused on 3334 exonic DNMs for evaluation of their association with these diseases. Our results revealed a higher prevalence of DNMs in the probands of all four disorders compared with the one in the controls (P<1.3 × 10(-7)). The elevated DNM frequency is dominated by loss-of-function/deleterious single-nucleotide variants and frameshift indels (that is, extreme mutations, P<4.5 × 10(-5)). With extensive annotation of these 'extreme' mutations, we prioritized 764 candidate genes in these four disorders. A combined analysis of Gene Ontology, microRNA targets and transcription factor targets revealed shared biological process and non-coding regulatory elements of candidate genes in the pathology of neuropsychiatric disorders. In addition, weighted gene co-expression network analysis of human laminar-specific neocortical expression data showed that candidate genes are convergent on eight shared modules with specific layer enrichment and biological process features. Furthermore, we identified that 53 candidate genes are associated with more than one disorder (P<0.000001), suggesting a possibly shared genetic etiology underlying these disorders. Particularly, DNMs of the SCN2A gene are frequently occurred across all four disorders. Finally, we constructed a freely available NPdenovo database, which provides a comprehensive catalog of the DNMs identified in neuropsychiatric disorders.
Affiliation(s)
- Jinchen Li
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; State Key Laboratory of Medical Genetics, Central South University, Changsha, China
| | - Tao Cai
- Experimental Medicine Section, NIDCR/NIH, Bethesda, Maryland, USA
| | - Yi Jiang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Huiqian Chen
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Xin He
- Department of Human Genetics, University of Chicago, Chicago, USA
| | - Chao Chen
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China; Department of Psychiatry, University of Illinois at Chicago, Chicago, USA
| | - Xianfeng Li
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China
| | - Qianzhi Shao
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Xia Ran
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Zhongshan Li
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Kun Xia
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China
| | - Chunyu Liu
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China; Department of Psychiatry, University of Illinois at Chicago, Chicago, USA. Correspondence should be addressed to Jinyu Wu, Zhong Sheng Sun, or Chunyu Liu.
| | - Zhong Sheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
| | - Jinyu Wu
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
|
114
|
Abstract
For the first time in the history of human genetics research, it is now both technically feasible and economically affordable to screen individual genomes for novel disease-causing mutations at base-pair resolution using "next-generation sequencing" (NGS). One popular aim in many of today's NGS studies is genome resequencing (in part or whole) to identify DNA variants potentially accounting for the "missing heritability" problem observed in many genetically complex traits. Thus far, only relatively few projects have applied these powerful new technologies to search for novel Alzheimer's disease (AD) related sequence variants. In this review, I summarize the findings from the first NGS-based resequencing studies in AD and discuss their potential implications and limitations. Notable recent discoveries using NGS include the identification of rare susceptibility modifying alleles in APP, TREM2, and PLD3. Several other large-scale NGS projects are currently underway so that additional discoveries can be expected over the coming years.
|
115
|
Abstract
Inborn errors of metabolism are single-gene disorders resulting from defects in the biochemical pathways of the body. Although these disorders are individually rare, collectively they account for a significant portion of childhood disability and deaths. Most of the disorders are inherited in an autosomal recessive manner, although autosomal dominant and X-linked disorders also occur. The clinical signs and symptoms arise from the accumulation of the toxic substrate, deficiency of the product, or both. Depending on the residual activity of the deficient enzyme, the clinical picture may begin at any time from the newborn period to adulthood. Hundreds of disorders have been described to date, and there is considerable clinical overlap between certain inborn errors. Consequently, the definitive diagnosis of inborn errors depends on enzyme assays or genetic tests. In recent years, significant advances have been made in the biochemical and genetic diagnosis of inborn errors. Techniques such as tandem mass spectrometry and gas chromatography for biochemical diagnosis, and microarrays and next-generation sequencing for genetic diagnosis, have enabled rapid and accurate diagnosis. These diagnostic advances have also enabled newborn screening and prenatal diagnosis. In parallel with the development of diagnostic methods, significant progress has been made in treatment. Treatment approaches such as special diets, enzyme replacement therapy, substrate inhibition, and organ transplantation have been widely used. With the help of preclinical and clinical research on inborn errors, better diagnostic methods and better treatment approaches will very likely become available.
|
116
|
Castrillo JI, Oliver SG. Alzheimer's as a Systems-Level Disease Involving the Interplay of Multiple Cellular Networks. Methods Mol Biol 2016; 1303:3-48. [PMID: 26235058 DOI: 10.1007/978-1-4939-2627-5_1] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 12/24/2022]
Abstract
Alzheimer's disease (AD) and many other neurodegenerative disorders are multifactorial in nature. They involve a combination of genomic, epigenomic, interactomic and environmental factors. Progress is being made, and these complex diseases are beginning to be understood as having their origin in altered states of biological networks at the cellular level. In the case of AD, genomic susceptibility and mechanisms leading to (or accompanying) the impairment of the central Amyloid Precursor Protein (APP) processing and tau networks are widely accepted as major contributors to the diseased state. The derangement of these networks may result in both the gain and loss of functions, increased generation of toxic species (e.g., toxic soluble oligomers and aggregates) and imbalances, whose effects can propagate to supra-cellular levels. Although well supported by empirical data and widely accepted, this global perspective often overlooks the essential roles played by the main counteracting homeostatic networks (e.g., protein quality control/proteostasis, unfolded protein response, protein folding chaperone networks, disaggregases, ER-associated degradation/ubiquitin proteasome system, endolysosomal network, autophagy, and other stress-protective and clearance networks), whose relevance to AD is just beginning to be fully realized. In this chapter, an integrative perspective is presented. Alzheimer's disease is characterized as the result of: (a) intrinsic genomic/epigenomic susceptibility and (b) a continued dynamic interplay between the deranged networks and the central homeostatic networks of nerve cells. This interplay of networks will underlie both the onset and rate of progression of the disease in each individual. Integrative Systems Biology approaches are required to elucidate it.
Comprehensive Systems Biology experiments at different 'omics levels in simple model organisms engineered to recapitulate the basic features of AD may illuminate the onset and sequence of events underlying AD. Indeed, studies of models of AD in simple organisms, differentiated cells in culture and rodents are beginning to offer hope that the onset and progression of AD, if detected at an early stage, may be stopped, delayed, or even reversed by activating or modulating networks involved in proteostasis and the clearance of toxic species. In practice, the incorporation of next-generation neuroimaging, high-throughput and computational approaches is opening the way towards early diagnosis well before irreversible cell death. Thus, the presence or co-occurrence of: (a) accumulation of toxic Aβ oligomers and tau species; (b) altered splicing and transcriptome patterns; (c) impaired redox, proteostatic, and metabolic networks; together with (d) compromised homeostatic capacities may constitute relevant 'AD hallmarks at the cellular level' towards reliable and early diagnosis. From here, preventive lifestyle changes and tailored therapies may be investigated, such as combined strategies aimed at both lowering the production of toxic species and potentiating homeostatic responses, in order to prevent or delay the onset, and arrest, alleviate, or even reverse the progression of the disease.
Affiliation(s)
- Juan I Castrillo
- Department of Biochemistry & Cambridge Systems Biology Centre, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
|
117
|
Li G, Cui Y, Zhao H. An Empirical Bayes risk prediction model using multiple traits for sequencing data. Stat Appl Genet Mol Biol 2015; 14:551-73. [PMID: 26641974 DOI: 10.1515/sagmb-2015-0060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Rapidly developing sequencing technologies have led to improved disease risk prediction through the identification of many novel genes. Many prediction methods have been proposed that use rich genomic information to predict binary disease outcomes. It is intuitive that these methods can be further improved by making efficient use of the rich information in measured quantitative traits that are correlated with binary outcomes. In this article, we propose a novel Empirical Bayes prediction model that uses information from both quantitative traits and binary disease status to improve risk prediction. Our method is built on a new statistic that better infers the gene effect on multiple traits, and it also enjoys good theoretical properties. We then consider using sequencing data by combining information from multiple rare variants in individual genes to strengthen the signals of causal genetic effects. In a simulation study, we find that our proposed Empirical Bayes approach is superior to other existing methods in terms of feature selection and risk prediction. We further evaluate the effectiveness of our proposed method through its application to the sequencing data provided by the Genetic Analysis Workshop 18.
|
118
|
Mensah-Ablorh A, Lindstrom S, Haiman CA, Henderson BE, Marchand LL, Lee S, Stram DO, Eliassen AH, Price A, Kraft P. Meta-Analysis of Rare Variant Association Tests in Multiethnic Populations. Genet Epidemiol 2015; 40:57-65. [PMID: 26639010 DOI: 10.1002/gepi.21939] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 02/22/2015] [Revised: 09/15/2015] [Accepted: 09/19/2015] [Indexed: 12/30/2022]
Abstract
Several methods have been proposed to increase power in rare variant association testing by aggregating information from individual rare variants (MAF < 0.005). However, how best to combine rare variants across multiple ethnicities and the relative performance of designs using different ethnic sampling fractions remain unknown. In this study, we compare the performance of several statistical approaches for assessing rare variant associations across multiple ethnicities. We also explore how different ethnic sampling fractions perform, including single-ethnicity studies and studies that sample up to four ethnicities. We conducted simulations based on targeted sequencing data from 4,611 women in four ethnicities (African, European, Japanese American, and Latina). As with single-ethnicity studies, burden tests had greater power when all causal rare variants were deleterious, and variance component-based tests had greater power when some causal rare variants were deleterious and some were protective. Multiethnic studies had greater power than single-ethnicity studies at many loci, with inclusion of African Americans providing the largest impact. On average, studies including African Americans had as much as 20% greater power than equivalently sized studies without African Americans. This suggests that association studies between rare variants and complex disease should consider including subjects from multiple ethnicities, with preference given to genetically diverse groups.
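As a rough illustration of the aggregation idea this abstract refers to, the sketch below collapses rare variants (MAF < 0.005, the threshold quoted in the abstract) into a per-person carrier flag and compares carrier counts between cases and controls with a one-sided Fisher exact test. This is a generic collapsing burden test, not necessarily any of the specific methods the study compared; the function names, the data layout, and the choice of Fisher's test are assumptions made for illustration.

```python
from math import comb

def carrier_flags(genotypes, mafs, maf_cutoff=0.005):
    """Flag each individual who carries at least one rare minor allele.

    genotypes: one list per individual of minor-allele counts (0/1/2) per variant.
    mafs: minor allele frequency of each variant, in the same order.
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    return [any(g[j] > 0 for j in rare) for g in genotypes]

def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with all margins fixed."""
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    total = comb(n_cases + n_controls, n_carriers)
    upper = min(n_cases, n_carriers)
    tail = sum(comb(n_cases, x) * comb(n_controls, n_carriers - x)
               for x in range(a, upper + 1))
    return tail / total

def burden_test(case_genotypes, control_genotypes, mafs, maf_cutoff=0.005):
    """One-sided p-value for an excess of rare-variant carriers among cases."""
    a = sum(carrier_flags(case_genotypes, mafs, maf_cutoff))
    c = sum(carrier_flags(control_genotypes, mafs, maf_cutoff))
    return fisher_one_sided(a, len(case_genotypes) - a,
                            c, len(control_genotypes) - c)
```

Because every rare allele counts in the same direction, a test of this shape loses power when deleterious and protective variants are mixed, which is the situation in which the abstract reports that variance component-based tests did better.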
Affiliation(s)
- Akweley Mensah-Ablorh
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America
| | - Sara Lindstrom
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America
| | - Christopher A Haiman
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
| | - Brian E Henderson
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
| | - Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Research Center, Honolulu, Hawaii, United States of America
| | - Seunggeun Lee
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Daniel O Stram
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
| | - A Heather Eliassen
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Channing Division of Network Medicine, Brigham & Women's Hospital, Boston, Massachusetts, United States of America
| | - Alkes Price
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
| | - Peter Kraft
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
|
119
|
Ingles J, Burns C, Barratt A, Semsarian C. Application of Genetic Testing in Hypertrophic Cardiomyopathy for Preclinical Disease Detection. Circ Cardiovasc Genet 2015; 8:852-9. [DOI: 10.1161/circgenetics.115.001093] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 12/12/2022]
Affiliation(s)
- Jodie Ingles
- From the Agnes Ginges Centre for Molecular Cardiology, Centenary Institute, Sydney NSW, Australia (J.I., C.B., C.S.); Central Clinical School, Sydney Medical School, University of Sydney, Sydney NSW, Australia (J.I., C.B., C.S.); School of Population Health, Sydney Medical School, University of Sydney, Sydney NSW, Australia (A.B.); and Department of Cardiology, Royal Prince Alfred Hospital, Sydney NSW, Australia (J.I., C.B., C.S.)
| | - Charlotte Burns
- From the Agnes Ginges Centre for Molecular Cardiology, Centenary Institute, Sydney NSW, Australia (J.I., C.B., C.S.); Central Clinical School, Sydney Medical School, University of Sydney, Sydney NSW, Australia (J.I., C.B., C.S.); School of Population Health, Sydney Medical School, University of Sydney, Sydney NSW, Australia (A.B.); and Department of Cardiology, Royal Prince Alfred Hospital, Sydney NSW, Australia (J.I., C.B., C.S.)
| | - Alexandra Barratt
- From the Agnes Ginges Centre for Molecular Cardiology, Centenary Institute, Sydney NSW, Australia (J.I., C.B., C.S.); Central Clinical School, Sydney Medical School, University of Sydney, Sydney NSW, Australia (J.I., C.B., C.S.); School of Population Health, Sydney Medical School, University of Sydney, Sydney NSW, Australia (A.B.); and Department of Cardiology, Royal Prince Alfred Hospital, Sydney NSW, Australia (J.I., C.B., C.S.)
| | - Christopher Semsarian
- From the Agnes Ginges Centre for Molecular Cardiology, Centenary Institute, Sydney NSW, Australia (J.I., C.B., C.S.); Central Clinical School, Sydney Medical School, University of Sydney, Sydney NSW, Australia (J.I., C.B., C.S.); School of Population Health, Sydney Medical School, University of Sydney, Sydney NSW, Australia (A.B.); and Department of Cardiology, Royal Prince Alfred Hospital, Sydney NSW, Australia (J.I., C.B., C.S.)
|
120
|
Linkage and whole genome sequencing identify a locus on 6q25-26 for formal thought disorder and implicate MEF2A regulation. Schizophr Res 2015; 169:441-446. [PMID: 26421691 DOI: 10.1016/j.schres.2015.08.037] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/13/2015] [Revised: 08/27/2015] [Accepted: 08/27/2015] [Indexed: 11/24/2022]
Abstract
Formal thought disorder is a major feature of schizophrenia and other psychotic disorders. It is heritable and is found in healthy relatives of patients with schizophrenia and other mental disorders, but knowledge of specific genetic factors is lacking. The aim of this study was to search for biologically relevant high-risk variants. Formal thought disorder was assessed in participants in the Copenhagen Schizophrenia Linkage Study (N=236), a unique high-risk family study comprising six large pedigrees. Microsatellite linkage analysis of formal thought disorder was performed, followed by haplotype analysis of the implicated region using phased microsatellite and SNP genotypes. Whole genome sequencing (N=3) was used in an attempt to identify causative variants in the linkage region. Linkage analysis of formal thought disorder resulted in a single peak at chromosome 6(q26-q27) centred on marker D6S1277, with a maximum LOD score of 4.0. Phasing and fine mapping of the linkage peak identified a 5.5Mb haplotype (chr6:162242322-167753547, hg18) in 31 individuals, all belonging to the same pedigree and sharing the haplotype from a common ancestor. The haplotype segregated with increased total thought disorder index score (P = 4.9 × 10⁻⁵) and qualitatively severe forms of thought disturbances. Whole genome sequencing identified a novel nucleotide deletion (chr6:164377205 AG>A, hg18) predicted to disrupt the potential binding of the transcription factor MEF2A. The MEF2A binding site is located between two genes previously reported to associate with schizophrenia, QKI (HGNC:21100) and PDE10A (HGNC:8772). The findings are consistent with MEF2A deregulation conferring risk of formal thought disorder.
|
121
|
Xiong J, Dittmer DP, Marron JS. “Virus hunting” using radial distance weighted discrimination. Ann Appl Stat 2015. [DOI: 10.1214/15-aoas869] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
|
122
|
Abstract
In 2000 the United States launched the National Nanotechnology Initiative and, along with it, a well-defined set of goals for nanomedicine. This Perspective looks back at the progress made toward those goals, within the context of the changing landscape in biomedicine that has occurred over the past 15 years, and considers advances that are likely to occur during the next decade. In particular, nanotechnologies for health-related genomics and single-cell biology, inorganic and organic nanoparticles for biomedicine, and wearable nanotechnologies for wellness monitoring are briefly covered.
Affiliation(s)
- James R Heath
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125
|
123
|
Ni G, Strom TM, Pausch H, Reimer C, Preisinger R, Simianer H, Erbe M. Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken. BMC Genomics 2015; 16:824. [PMID: 26486989 PMCID: PMC4618161 DOI: 10.1186/s12864-015-2059-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 06/09/2015] [Accepted: 10/09/2015] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Technical progress in the last decade has made it possible to sequence millions of DNA reads in a relatively short time frame. Several variant callers based on different algorithms have emerged and have made it possible to extract single nucleotide polymorphisms (SNPs) from the whole-genome sequence. Often, only a few individuals of a population are sequenced completely, and imputation is used to obtain genotypes for all sequence-based SNP loci for other individuals, which have been genotyped for a subset of SNPs using a genotyping array. METHODS: First, we compared the sets of variants detected with different variant callers, namely GATK, freebayes and SAMtools, and checked the quality of genotypes of the called variants in a set of 50 fully sequenced white and brown layers. Second, we assessed the imputation accuracy (measured as the correlation between imputed and true genotype per SNP and per individual, and genotype conflict between father-progeny pairs) when imputing from high density SNP array data to whole-genome sequence using data from around 1000 individuals from six different generations. Three different imputation programs (Minimac, FImpute and IMPUTE2) were checked in different validation scenarios. RESULTS: There were 1,741,573 SNPs detected by all three callers on the studied chromosomes 3, 6, and 28, which was 71.6% (81.6%, 88.0%) of SNPs detected by GATK (SAMtools, freebayes) in total. Genotype concordance (GC), defined as the proportion of individuals whose array-derived genotypes are the same as the sequence-derived genotypes over all non-missing SNPs on the array, was 0.98 (GATK), 0.97 (freebayes) and 0.98 (SAMtools). Furthermore, the percentage of variants that had high values (>0.9) for another three measures (non-reference sensitivity, non-reference genotype concordance and precision) was 90% (88%, 75%) for GATK (SAMtools, freebayes).
With all imputation programs, correlation between original and imputed genotypes was >0.95 on average with 1000 randomly masked SNPs from the SNP array and >0.85 for a leave-one-out cross-validation within sequenced individuals. CONCLUSIONS: Performance of all variant callers studied was very good in general, particularly for GATK and SAMtools. FImpute performed slightly worse than Minimac and IMPUTE2 in terms of genotype correlation, especially for SNPs with low minor allele frequency, while it had the lowest number of Mendelian conflicts in available father-progeny pairs. Correlations of real and imputed genotypes remained consistently high even if individuals to be imputed were several generations away from the sequenced individuals.
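The two accuracy measures named in this abstract can be sketched in a few lines. The code below is a simplified illustration and not the authors' pipeline: genotypes are assumed to be coded as minor-allele counts (0/1/2) with None for missing calls, concordance is computed as the share of matching non-missing genotype pairs, and imputation accuracy is taken as the Pearson correlation between true and imputed genotypes. The function names are hypothetical.

```python
from math import sqrt

def genotype_concordance(array_gt, seq_gt):
    """Share of non-missing genotype pairs on which array and sequence calls agree."""
    pairs = [(a, s) for a, s in zip(array_gt, seq_gt)
             if a is not None and s is not None]
    if not pairs:
        raise ValueError("no non-missing genotype pairs to compare")
    return sum(a == s for a, s in pairs) / len(pairs)

def genotype_correlation(true_gt, imputed_gt):
    """Pearson correlation between true and imputed genotypes (or dosages)."""
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / sqrt(var_t * var_i)
```

The same correlation function can be applied per SNP (across individuals) or per individual (across SNPs), which are the two views of imputation accuracy the abstract mentions. Monomorphic SNPs give an undefined correlation, so a real analysis would have to decide how to handle them.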
Affiliation(s)
- Guiyan Ni
- Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany.
| | - Tim M Strom
- Institute of Human Genetics, Helmholtz Zentrum München, Neuherberg, Germany.
| | - Hubert Pausch
- Chair of Animal Breeding, Technische Universität München, Freising, Germany.
| | - Christian Reimer
- Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany.
| | | | - Henner Simianer
- Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany.
| | - Malena Erbe
- Animal Breeding and Genetics Group, Georg-August-Universität, Göttingen, Germany; Institute for Animal Breeding, Bavarian State Research Centre for Agriculture, Grub, Germany.
|
124
|
The human gene damage index as a gene-level approach to prioritizing exome variants. Proc Natl Acad Sci U S A 2015; 112:13615-20. [PMID: 26483451 DOI: 10.1073/pnas.1518646112] [Citation(s) in RCA: 185] [Impact Index Per Article: 18.5] [Indexed: 12/18/2022] Open
Abstract
The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches, genic intolerance, and de novo excess, and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.
|
125
|
Gu W, Gurguis CI, Zhou JJ, Zhu Y, Ko EA, Ko JH, Wang T, Zhou T. Functional and Structural Consequence of Rare Exonic Single Nucleotide Polymorphisms: One Story, Two Tales. Genome Biol Evol 2015; 7:2929-40. [PMID: 26454016 PMCID: PMC4684694 DOI: 10.1093/gbe/evv191] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Accepted: 10/05/2015] [Indexed: 01/01/2023] Open
Abstract
Genetic variation arising from single nucleotide polymorphisms (SNPs) is ubiquitously found among human populations. While disease-causing variants are known in some cases, identifying functional or causative variants for most human diseases remains a challenging task. Rare SNPs, rather than common ones, are thought to be more important in the pathology of most human diseases. We propose that rare SNPs should be divided into two categories dependent on whether the minor alleles are derived or ancestral. Derived alleles are less likely to have been purified by evolutionary processes and may be more likely to induce deleterious effects. We therefore hypothesized that the rare SNPs with derived minor alleles would be more important for human diseases and predicted that these variants would have larger functional or structural consequences relative to the rare variants for which the minor alleles are ancestral. We systematically investigated the consequences of the exonic SNPs on protein function, mRNA structure, and translation. We found that the functional and structural consequences are more significant for the rare exonic variants for which the minor alleles are derived. However, this pattern is reversed when the minor alleles are ancestral. Thus, the rare exonic SNPs with derived minor alleles are more likely to be deleterious. Age estimation of rare SNPs confirms that these potentially deleterious SNPs are recently evolved in the human population. These results have important implications for understanding the function of genetic variations in human exonic regions and for prioritizing functional SNPs in genome-wide association studies of human diseases.
Affiliation(s)
- Wanjun Gu
- Research Center for Learning Sciences, Southeast University, Nanjing, Jiangsu, China
| | | | - Jin J Zhou
- Department of Epidemiology and Biostatistics, The University of Arizona
| | - Yihua Zhu
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China; College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu, China
| | - Eun-A Ko
- Department of Pharmacology, The University of Nevada School of Medicine, Reno
| | - Jae-Hong Ko
- Department of Physiology, College of Medicine, Chung-Ang University, Seoul, South Korea
| | - Ting Wang
- Department of Medicine, The University of Arizona
| | - Tong Zhou
- Department of Medicine, The University of Arizona
|
126
|
Yan S, Yuan S, Xu Z, Zhang B, Zhang B, Kang G, Byrnes A, Li Y. Likelihood-based complex trait association testing for arbitrary depth sequencing data. Bioinformatics 2015; 31:2955-62. [PMID: 25979475 PMCID: PMC4668777 DOI: 10.1093/bioinformatics/btv307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/27/2014] [Revised: 05/06/2015] [Accepted: 05/11/2015] [Indexed: 11/14/2022] Open
Abstract
In next generation sequencing (NGS)-based genetic studies, researchers typically perform genotype calling first and then apply standard genotype-based methods for association testing. However, such a two-step approach ignores genotype calling uncertainty in the association testing step and may incur power loss and/or inflated type-I error. In the recent literature, a few robust and efficient likelihood-based methods, including both likelihood ratio tests (LRT) and score tests, have been proposed to carry out association testing without intermediate genotype calling. These methods take genotype calling uncertainty into account by directly incorporating the genotype likelihood function (GLF) of NGS data into association analysis. However, existing LRT methods are computationally demanding or do not allow covariate adjustment, while existing score tests are not applicable to markers with low minor allele frequency (MAF). We provide an LRT allowing flexible covariate adjustment, develop a statistically more powerful score test and propose a combination strategy (UNC combo) to leverage the advantages of both tests. We have carried out extensive simulations to evaluate the performance of our proposed LRT and score test. Simulations and real data analysis demonstrate the advantages of our proposed combination strategy: it offers a satisfactory trade-off in terms of computational efficiency, applicability (accommodating both common variants and variants with low MAF) and statistical power, particularly for the analysis of quantitative traits, where the power gain can be up to ~60% when the causal variant is of low frequency (MAF < 0.01). AVAILABILITY AND IMPLEMENTATION: UNC combo and the associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/UNCcombo/. CONTACT: yunli@med.unc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Song Yan
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Shuai Yuan
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Zheng Xu
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Baqun Zhang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Bo Zhang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Guolian Kang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Andrea Byrnes
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Yun Li
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| |
|
127
|
Abstract
The high heritability, early age at onset, and reproductive disadvantages of autism spectrum disorders (ASDs) are consistent with an etiology composed of dominant-acting de novo (spontaneous) mutations. Mutation detection by microarray analysis and DNA sequencing has confirmed that de novo copy-number variants or point mutations in protein-coding regions of genes contribute to risk, and some of the underlying causal variants and genes have been identified. As our understanding of autism genes develops, the spectrum of autism is breaking up into quanta of many different genetic disorders. Given the diversity of etiologies and underlying biochemical pathways, personalized therapy for ASDs is logical, and clinical genetic testing is a prerequisite.
|
128
|
|
129
|
Datta AS, Biswas S. Comparison of haplotype-based statistical tests for disease association with rare and common variants. Brief Bioinform 2015; 17:657-71. [PMID: 26338417 DOI: 10.1093/bib/bbv072] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/26/2015] [Indexed: 01/26/2023] Open
Abstract
Recent literature has highlighted the advantages of haplotype association methods for detecting rare variants associated with common diseases. As several new haplotype association methods have been proposed in the past few years, a comparison of new and standard methods is important and timely for guidance to practitioners. We consider nine methods: Haplo.score, Haplo.glm, Hapassoc, Bayesian hierarchical Generalized Linear Model (BhGLM), Logistic Bayesian LASSO (LBL), regularized GLM (rGLM), Haplotype Kernel Association Test, wei-SIMc-matching and Weighted Haplotype and Imputation-based Tests. These can be divided into two types, individual haplotype-specific tests and global tests, depending on whether there is just one overall test for a haplotype region (global) or there is an individual test for each haplotype in the region. Haplo.score is the only method that tests for both; Haplo.glm, Hapassoc, BhGLM and LBL are individual haplotype-specific, while the rest are global tests. For comparison, we also apply a popular collapsing method, the Sequence Kernel Association Test (SKAT), and its two variants, SKAT-O (Optimal) and SKAT-C (Combined). We carry out an extensive comparison on our simulated data sets as well as on the Genetic Analysis Workshop (GAW) 18 simulated data. Further, we apply the methods to GAW18 real hypertension data and Dallas Heart Study sequence data. We find that LBL, Haplo.score (global test) and rGLM perform well over the scenarios considered here. Also, haplotype methods are more powerful (albeit more computationally intensive) than SKAT and its variants in scenarios where multiple causal variants act interactively to produce haplotype effects.
|
130
|
Ramos YFM, Bos SD, van der Breggen R, Kloppenburg M, Ye K, Lameijer EWEMW, Nelissen RGHH, Slagboom PE, Meulenbelt I. A gain of function mutation in TNFRSF11B encoding osteoprotegerin causes osteoarthritis with chondrocalcinosis. Ann Rheum Dis 2015; 74:1756-62. [PMID: 24743232 DOI: 10.1136/annrheumdis-2013-205149] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 12/24/2013] [Accepted: 03/23/2014] [Indexed: 01/16/2023]
Abstract
OBJECTIVE To identify pathogenic mutations that reveal underlying biological mechanisms driving osteoarthritis (OA). METHODS Exome sequencing was applied to two distant family members with dominantly inherited early onset primary OA at multiple joint sites with chondrocalcinosis (familial generalised osteoarthritis, FOA). Confirmation of mutations occurred by genotyping and linkage analyses across the extended family. The functional effect of the mutation was investigated by means of a cell-based assay. To explore generalisability, mRNA expression analysis of the relevant genes in the discovered pathway was performed in preserved and osteoarthritic articular cartilage of independent patients undergoing joint replacement surgery. RESULTS We identified a heterozygous, probably damaging, read-through mutation (c.1205A>T; p.Stop402Leu) in TNFRSF11B encoding osteoprotegerin that is likely causal to the OA phenotype in the extended family. In a bone resorption assay, the mutant form of osteoprotegerin showed enhanced capacity to inhibit osteoclastogenesis and bone resorption. Expression analyses in preserved and affected articular cartilage of independent OA patients showed that upregulation of TNFRSF11B is a general phenomenon in the pathophysiological process. CONCLUSIONS Although the role of the molecular pathway of osteoprotegerin has been studied in OA, we are the first to demonstrate that enhanced osteoprotegerin function could be a directly underlying cause. We advocate that agents counteracting the function of osteoprotegerin could serve as new therapeutic interventions for OA.
Affiliation(s)
- Yolande F M Ramos
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands
| | - Steffan D Bos
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands
| | - Ruud van der Breggen
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Margreet Kloppenburg
- Department of Rheumatology & Department of Clinical Epidemiology, Leiden, The Netherlands
| | - Kai Ye
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Eric-Wubbo E M W Lameijer
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Rob G H H Nelissen
- Department of Orthopaedics, Leiden University Medical Center, Leiden, The Netherlands
| | - P Eline Slagboom
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands
| | - Ingrid Meulenbelt
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands
| |
|
131
|
Ronowicz A, Janaszak-Jasiecka A, Skokowski J, Madanecki P, Bartoszewski R, Bałut M, Seroczyńska B, Kochan K, Bogdan A, Butkus M, Pęksa R, Ratajska M, Kuźniacka A, Wasąg B, Gucwa M, Krzyżanowski M, Jaśkiewicz J, Jankowski Z, Forsberg L, Ochocka JR, Limon J, Crowley MR, Buckley PG, Messiaen L, Dumanski JP, Piotrowski A. Concurrent DNA Copy-Number Alterations and Mutations in Genes Related to Maintenance of Genome Stability in Uninvolved Mammary Glandular Tissue from Breast Cancer Patients. Hum Mutat 2015. [PMID: 26219265 DOI: 10.1002/humu.22845] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/26/2022]
Abstract
Somatic mosaicism for DNA copy-number alterations (SMC-CNAs) is defined as gain or loss of chromosomal segments in somatic cells within a single organism. As cells harboring SMC-CNAs can undergo clonal expansion, it has been proposed that SMC-CNAs may contribute to the predisposition of these cells to genetic disease including cancer. Herein, the gross genomic alterations (>500 kbp) were characterized in uninvolved mammary glandular tissue from 59 breast cancer patients and matched samples of primary tumors and lymph node metastases. Array-based comparative genomic hybridization showed 10% (6/59) of patients harbored one to 359 large SMC-CNAs (mean: 1,328 kbp; median: 961 kbp) in a substantial portion of glandular tissue cells, distal from the primary tumor site. SMC-CNAs were partially recurrent in tumors, albeit with considerable contribution of stochastic SMC-CNAs indicating genomic destabilization. Targeted resequencing of 301 known predisposition and somatic driver loci revealed mutations and rare variants in genes related to maintenance of genomic integrity: BRCA1 (p.Gln1756Profs*74, p.Arg504Cys), BRCA2 (p.Asn3124Ile), NCOR1 (p.Pro1570Glnfs*45), PALB2 (p.Ser500Pro), and TP53 (p.Arg306*). Co-occurrence of gross SMC-CNAs along with point mutations or rare variants in genes responsible for safeguarding genomic integrity highlights the temporal and spatial neoplastic potential of uninvolved glandular tissue in breast cancer patients.
Affiliation(s)
- Anna Ronowicz
- Faculty of Pharmacy, Medical University of Gdansk, Gdansk, Poland
| | | | - Jarosław Skokowski
- The Central Bank of Tissues and Genetic Specimens, Medical University of Gdansk, Gdansk, Poland; Department of Surgical Oncology, Medical University of Gdansk, Gdansk, Poland
| | - Piotr Madanecki
- Faculty of Pharmacy, Medical University of Gdansk, Gdansk, Poland
| | | | - Magdalena Bałut
- Faculty of Pharmacy, Medical University of Gdansk, Gdansk, Poland
| | - Barbara Seroczyńska
- The Central Bank of Tissues and Genetic Specimens, Medical University of Gdansk, Gdansk, Poland
| | - Kinga Kochan
- Faculty of Pharmacy, Medical University of Gdansk, Gdansk, Poland
| | - Adam Bogdan
- Faculty of Pharmacy, Medical University of Gdansk, Gdansk, Poland
| | | | - Rafał Pęksa
- Department of Pathomorphology, Medical University of Gdansk, Gdansk, Poland
| | - Magdalena Ratajska
- Department of Biology and Genetics, Medical University of Gdansk, Gdansk, Poland
| | - Alina Kuźniacka
- Department of Biology and Genetics, Medical University of Gdansk, Gdansk, Poland
| | - Bartosz Wasąg
- Department of Biology and Genetics, Medical University of Gdansk, Gdansk, Poland
| | - Magdalena Gucwa
- Faculty of Pharmacy, Medical University of Gdansk, Gdansk, Poland
| | - Maciej Krzyżanowski
- Department of Forensic Medicine, Medical University of Gdansk, Gdansk, Poland
| | - Janusz Jaśkiewicz
- Department of Surgical Oncology, Medical University of Gdansk, Gdansk, Poland
| | - Zbigniew Jankowski
- Department of Forensic Medicine, Medical University of Gdansk, Gdansk, Poland
| | - Lars Forsberg
- Department of Immunology, Genetics and Pathology and SciLifeLab, Uppsala University, Uppsala, Sweden
| | - J Renata Ochocka
- Faculty of Pharmacy, Medical University of Gdansk, Gdansk, Poland
| | - Janusz Limon
- Department of Biology and Genetics, Medical University of Gdansk, Gdansk, Poland
| | - Michael R Crowley
- Heflin Center for Genomic Sciences, University of Alabama at Birmingham, Birmingham, Alabama
| | | | - Ludwine Messiaen
- Medical Genomics Laboratory, Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama
| | - Jan P Dumanski
- Department of Immunology, Genetics and Pathology and SciLifeLab, Uppsala University, Uppsala, Sweden
| | | |
|
132
|
Abstract
Consensus practice guidelines and the implementation of clinical therapeutic advances are usually based on the results of large, randomized clinical trials (RCTs). However, RCTs generally inform us on an average treatment effect for a presumably homogeneous population, but therapeutic interventions rarely benefit the entire population targeted. Indeed, multiple RCTs have demonstrated that interindividual variability exists both in drug response and in the development of adverse effects. The field of pharmacogenomics promises to deliver the right drug to the right patient. Substantial progress has been made in this field, with advances in technology, statistical and computational methods, and the use of cell and animal model systems. However, clinical implementation of pharmacogenetic principles has been difficult because RCTs demonstrating benefit are lacking. For patients, the potential benefits of performing such trials include the individualization of therapy to maximize efficacy and minimize adverse effects. These trials would also enable investigators to reduce sample size and hence contain costs for trial sponsors. Multiple ethical, legal, and practical issues need to be considered for the conduct of genotype-based RCTs. Whether pre-emptive genotyping embedded in electronic health records will preclude the need for performing genotype-based RCTs remains to be seen.
Affiliation(s)
- Naveen L Pereira
- Division of Cardiovascular Diseases, Department of Internal Medicine, 200 First Street SW, Rochester, MN 55905, USA
| | - Daniel J Sargent
- Department of Biomedical Statistics and Informatics, Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
| | - Michael E Farkouh
- Peter Munk Cardiac Centre and Heart and Stroke Richard Lewer Centre, University of Toronto, 585 University Avenue, Toronto, ON M5G 2N2, Canada
| | - Charanjit S Rihal
- Division of Cardiovascular Diseases, Department of Internal Medicine, 200 First Street SW, Rochester, MN 55905, USA
| |
|
133
|
Khurana JK, Reeder JE, Shrimpton AE, Thakar J. GESPA: classifying nsSNPs to predict disease association. BMC Bioinformatics 2015. [PMID: 26206375 PMCID: PMC4513380 DOI: 10.1186/s12859-015-0673-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Non-synonymous single nucleotide polymorphisms (nsSNPs) are the most common DNA sequence variation associated with disease in humans. Thus determining the clinical significance of each nsSNP is of great importance. Potential detrimental nsSNPs may be identified by genetic association studies or by functional analysis in the laboratory, both of which are expensive and time consuming. Existing computational methods lack accuracy and features to facilitate nsSNP classification for clinical use. We developed the GESPA (GEnomic Single nucleotide Polymorphism Analyzer) program to predict the pathogenicity and disease phenotype of nsSNPs. RESULTS GESPA is a user-friendly software package for classifying disease association of nsSNPs. It allows flexibility in acceptable input formats and predicts the pathogenicity of a given nsSNP by assessing the conservation of amino acids in orthologs and paralogs and supplementing this information with data from medical literature. The development and testing of GESPA was performed using the humsavar, ClinVar and humvar datasets. Additionally, GESPA also predicts the disease phenotype associated with a nsSNP with high accuracy, a feature unavailable in existing software. GESPA's overall accuracy exceeds existing computational methods for predicting nsSNP pathogenicity. The usability of GESPA is enhanced by fast SQL-based cloud storage and retrieval of data. CONCLUSIONS GESPA is a novel bioinformatics tool to determine the pathogenicity and phenotypes of nsSNPs. We anticipate that GESPA will become a useful clinical framework for predicting the disease association of nsSNPs. The program, executable jar file, source code, GPL 3.0 license, user guide, and test data with instructions are available at http://sourceforge.net/projects/gespa.
Affiliation(s)
- Jay K Khurana
- Department of Urology, SUNY Upstate Medical University, Syracuse, NY, USA.
| | - Jay E Reeder
- Department of Obstetrics and Gynecology, University of Rochester, Rochester, NY, USA.
| | - Antony E Shrimpton
- Department of Pathology, SUNY Upstate Medical University, Syracuse, NY, USA.
| | - Juilee Thakar
- Department of Microbiology and Immunology, University of Rochester, Rochester, NY, USA; Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA.
| |
|
134
|
Boora GK, Kulkarni AA, Kanwar R, Beyerlein P, Qin R, Banck MS, Ruddy KJ, Pleticha J, Lynch CA, Behrens RJ, Züchner S, Loprinzi CL, Beutler AS. Association of the Charcot-Marie-Tooth disease gene ARHGEF10 with paclitaxel induced peripheral neuropathy in NCCTG N08CA (Alliance). J Neurol Sci 2015; 357:35-40. [PMID: 26143528 DOI: 10.1016/j.jns.2015.06.056] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 03/11/2015] [Revised: 05/23/2015] [Accepted: 06/25/2015] [Indexed: 11/26/2022]
Abstract
The predisposition of patients to develop polyneuropathy in response to toxic exposure may have a genetic basis. The previous study Alliance N08C1 found an association of the Charcot-Marie-Tooth disease (CMT) gene ARHGEF10 with paclitaxel chemotherapy induced peripheral neuropathy (CIPN) related to the three non-synonymous, recurrent single nucleotide variants (SNV), whereby rs9657362 had the strongest effect, and rs2294039 and rs17683288 contributed only weakly. In the present report, Alliance N08CA was chosen to attempt to replicate the above finding. N08CA was chosen because it is the methodologically most similar study (to N08C1) performed in the CIPN field to date. N08CA enrolled patients receiving the neurotoxic chemotherapy agent paclitaxel. Polyneuropathy was assessed by serial repeat administration of the previously validated patient reported outcome instrument CIPN20. A study-wide, Rasch type model was used to perform extreme phenotyping in n=138 eligible patients from which "cases" and "controls" were selected for genetic analysis of SNV performed by TaqMan PCR. A significant association of ARHGEF10 with CIPN was found under the pre-specified primary endpoint, with a significance level of p=0.024. As in the original study, the strongest association of a single SNV was seen for rs9657362 (odds ratio=3.56, p=0.018). To further compare results across the new and the previous study, a statistical "classifier" was tested, which achieved a ROC area under the curve of 0.60 for N08CA and 0.66 for N08C1, demonstrating good agreement. Retesting of the primary endpoint of N08C1 in the replication study N08CA validated the association of ARHGEF10 with CIPN.
Affiliation(s)
| | | | - Rahul Kanwar
- Department of Oncology, Mayo Clinic, Rochester, MN, USA
| | - Peter Beyerlein
- Department of Diagnostic Bioinformatics, Technische Hochschule Wildau, Wildau, Germany
| | - Rui Qin
- Alliance Statistics and Data Center, Mayo Clinic, Rochester, MN, USA
| | | | | | | | | | | | - Stephan Züchner
- Department of Human Genetics and Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
| | | | | |
|
135
|
Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation. Am J Hum Genet 2015; 96:926-37. [PMID: 26027497 DOI: 10.1016/j.ajhg.2015.04.018] [Citation(s) in RCA: 102] [Impact Index Per Article: 10.2] [Received: 11/15/2014] [Accepted: 04/29/2015] [Indexed: 11/20/2022] Open
Abstract
Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
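The alignment step behind placing a sample in a reference ancestry map can be illustrated with ordinary (same-dimension) Procrustes analysis. This is a simplified stand-in under stated assumptions, not the projection Procrustes procedure of LASER 2.0, which maps high-dimensional components into a lower-dimensional space. The function name `procrustes_fit` and the input layout are hypothetical.

```python
import numpy as np

def procrustes_fit(Y, X):
    """Find scale s, orthogonal matrix R and shift t minimizing
    the Frobenius norm of (s * Y @ R + t - X).

    Y : (n, k) coordinates of reference individuals in a sample-specific PCA.
    X : (n, k) coordinates of the same individuals in the reference map.
    """
    mu_y, mu_x = Y.mean(axis=0), X.mean(axis=0)
    Yc, Xc = Y - mu_y, X - mu_x          # center both configurations
    # SVD of the cross-product matrix gives the optimal orthogonal map.
    U, S, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt                            # may include a reflection
    s = S.sum() / (Yc ** 2).sum()         # optimal isotropic scale
    t = mu_x - s * mu_y @ R               # shift that matches the centroids
    return s, R, t
```

Once `s`, `R` and `t` are fitted on the shared reference individuals, a study sample with coordinates `y` in the same sample-specific space would be placed at `s * y @ R + t` in the reference map.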
|
136
|
Suzuki K, Yu C, Qu J, Li M, Yao X, Yuan T, Goebl A, Tang S, Ren R, Aizawa E, Zhang F, Xu X, Soligalla RD, Chen F, Kim J, Kim NY, Liao HK, Benner C, Esteban CR, Jin Y, Liu GH, Li Y, Izpisua Belmonte JC. Targeted gene correction minimally impacts whole-genome mutational load in human-disease-specific induced pluripotent stem cell clones. Cell Stem Cell 2015; 15:31-6. [PMID: 24996168 DOI: 10.1016/j.stem.2014.06.016] [Citation(s) in RCA: 137] [Impact Index Per Article: 13.7] [Received: 02/04/2014] [Revised: 05/19/2014] [Accepted: 06/19/2014] [Indexed: 12/16/2022]
Abstract
The utility of genome editing technologies for disease modeling and developing cellular therapies has been extensively documented, but the impact of these technologies on mutational load at the whole-genome level remains unclear. We performed whole-genome sequencing to evaluate the mutational load at single-base resolution in individual gene-corrected human induced pluripotent stem cell (hiPSC) clones in three different disease models. In single-cell clones, gene correction by helper-dependent adenoviral vector (HDAdV) or Transcription Activator-Like Effector Nuclease (TALEN) exhibited few off-target effects and a low level of sequence variation, comparable to that accumulated in routine hiPSC culture. The sequence variants were randomly distributed and unique to individual clones. We also combined both technologies and developed a TALEN-HDAdV hybrid vector, which significantly increased gene-correction efficiency in hiPSCs. Therefore, with careful monitoring via whole-genome sequencing it is possible to apply genome editing to human pluripotent cells with minimal impact on genomic mutational load.
Affiliation(s)
- Keiichiro Suzuki
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Chang Yu
- BGI, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Jing Qu
- Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Mo Li
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Xiaotian Yao
- BGI, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Tingting Yuan
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - April Goebl
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Senwei Tang
- BGI, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China; Institute of Digestive Disease and the Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong
| | - Ruotong Ren
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Emi Aizawa
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Fan Zhang
- BGI, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China; Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, 500 South State Street, Ann Arbor, MI 48109, USA
| | - Xiuling Xu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Rupa Devi Soligalla
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Feng Chen
- BGI, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Jessica Kim
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Na Young Kim
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Hsin-Kai Liao
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Chris Benner
- Integrative Genomics and Bioinformatics Core, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Concepcion Rodriguez Esteban
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Yabin Jin
- BGI, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Guang-Hui Liu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Center for Age-related Diseases (CAD), Beijing, China; Beijing Institute for Brain Disorders, Beijing 100069, China.
| | - Yingrui Li
- BGI, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China.
| | - Juan Carlos Izpisua Belmonte
- Gene Expression Laboratory, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA.
| |
|
137
|
Cui H, Dhroso A, Johnson N, Korkin D. The variation game: Cracking complex genetic disorders with NGS and omics data. Methods 2015; 79-80:18-31. [PMID: 25944472 DOI: 10.1016/j.ymeth.2015.04.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 12/13/2014] [Revised: 03/27/2015] [Accepted: 04/17/2015] [Indexed: 12/14/2022] Open
Abstract
Tremendous advances in Next Generation Sequencing (NGS) and high-throughput omics methods have brought us one step closer to a mechanistic understanding of complex disease at the molecular level. In this review, we discuss four basic regulatory mechanisms implicated in complex genetic diseases, such as cancer, neurological disorders, heart disease, diabetes, and many others. The mechanisms, including genetic variations, copy-number variations, posttranscriptional variations, and epigenetic variations, can be detected using a variety of NGS methods. We propose that malfunctions detected in these mechanisms are not necessarily independent, since these malfunctions are often found associated with the same disease and targeting the same gene, group of genes, or functional pathway. As an example, we discuss possible rewiring effects of the cancer-associated genetic, structural, and posttranscriptional variations on the protein-protein interaction (PPI) network centered around the P53 protein. The review highlights the multi-layered complexity of common genetic disorders and suggests that integration of NGS and omics data is a critical step in developing new computational methods capable of deciphering this complexity.
Affiliation(s)
- Hongzhu Cui
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
| | - Andi Dhroso
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
| | - Nathan Johnson
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
| | - Dmitry Korkin
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States; Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
| |
|
138
|
Targeted mutation screening panels expose systematic population bias in detection of cystic fibrosis risk. Genet Med 2015; 18:174-9. [PMID: 25880441 DOI: 10.1038/gim.2015.52] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 11/24/2014] [Accepted: 03/04/2015] [Indexed: 12/23/2022] Open
Abstract
PURPOSE: Carrier screening for mutations contributing to cystic fibrosis (CF) is typically accomplished with panels composed of variants that are clinically validated primarily in patients of European descent. This approach has created a static genetic and phenotypic profile for CF. An opportunity now exists to reevaluate the disease profile of CFTR at a global population level. METHODS: CFTR allele and genotype frequencies were obtained from a nonpatient cohort with more than 60,000 unrelated personal genomes collected by the Exome Aggregation Consortium. Likely disease-contributing mutations were identified with the use of public database annotations and computational tools. RESULTS: We identified 131 previously described and likely pathogenic variants and another 210 untested variants with a high probability of causing protein damage. None of the current genetic screening panels or existing CFTR mutation databases covered a majority of deleterious variants in any geographical population outside of Europe. CONCLUSIONS: Both clinical annotation and mutation coverage by commercially available targeted screening panels for CF are strongly biased toward detection of reproductive risk in persons of European descent. South and East Asian populations are severely underrepresented, in part because of a definition of disease that preferences the phenotype associated with European-typical CFTR alleles.
|
139
|
Hernansaiz-Ballesteros RD, Salavert F, Sebastián-León P, Alemán A, Medina I, Dopazo J. Assessing the impact of mutations found in next generation sequencing data over human signaling pathways. Nucleic Acids Res 2015; 43:W270-5. [PMID: 25883139 PMCID: PMC4489259 DOI: 10.1093/nar/gkv349] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/07/2015] [Accepted: 04/02/2015] [Indexed: 01/20/2023] Open
Abstract
Modern sequencing technologies produce increasingly detailed data on genomic variation. However, conventional methods for relating either individual variants or mutated genes to phenotypes present known limitations given the complex, multigenic nature of many diseases or traits. Here we present PATHiVar, a web-based tool that integrates genomic variation data with gene expression tissue information. PATHiVar constitutes a new generation of genomic data analysis methods that allow studying variants found in next generation sequencing experiments in the context of signaling pathways. Simple Boolean models of pathways provide detailed descriptions of the impact of mutations on cell functionality, so that recurrent failures in functionality can easily be related to diseases, even if they are produced by mutations in different genes. Patterns of changes in signal transmission circuits, often unpredictable from the individual genes mutated, correspond to patterns of affected functionalities that can be related to complex traits such as disease progression, drug response, etc. PATHiVar is available at: http://pathivar.babelomics.org.
Affiliation(s)
| | - Francisco Salavert
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, 46012, Spain; Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, 46012, Spain
| | - Patricia Sebastián-León
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, 46012, Spain
| | - Alejandro Alemán
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, 46012, Spain; Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, 46012, Spain
| | - Ignacio Medina
- HPC Services, University of Cambridge, Cambridge, CB3 0RB, UK
| | - Joaquín Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, 46012, Spain; Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, 46012, Spain; Functional Genomics Node, (INB) at CIPF, Valencia, 45012, Spain
| |
|
140
|
The genome as pharmacopeia: Association of genetic dose with phenotypic response. Biochem Pharmacol 2015; 94:229-40. [DOI: 10.1016/j.bcp.2015.02.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/17/2014] [Revised: 02/12/2015] [Accepted: 02/12/2015] [Indexed: 11/21/2022]
|
141
|
Itan Y, Casanova JL. Novel primary immunodeficiency candidate genes predicted by the human gene connectome. Front Immunol 2015; 6:142. [PMID: 25883595 PMCID: PMC4381650 DOI: 10.3389/fimmu.2015.00142] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 12/16/2014] [Accepted: 03/15/2015] [Indexed: 01/06/2023] Open
Abstract
Germline genetic mutations underlie various primary immunodeficiency (PID) diseases. Patients with rare PID diseases (like most non-PID patients and healthy individuals) carry, on average, 20,000 rare and common coding variants detected by high-throughput sequencing. It is thus a major challenge to select only a few candidate disease-causing variants for experimental testing. One of the tools commonly used in the pipeline for estimating a potential PID-candidate gene is to test whether the specific gene is included in the list of genes that were already experimentally validated as PID-causing in previous studies. However, this approach is limited because it cannot detect the PID-causing mutation(s) in the many PID patients carrying causal mutations of as yet unidentified PID-causing genes. In this study, we expanded in silico the list of potential PID-causing candidate genes from 229 to 3,110. We first identified the top 1% of human genes predicted by the human gene connectome to be biologically close to the 229 known PID genes. We then further narrowed down the list of genes by retaining only the most biologically relevant genes, with functionally enriched gene ontology biological categories similar to those for the known PID genes. We validated this prediction by showing that 17 of the 21 novel PID genes published since the last IUIS classification fall into this group of 3,110 genes (p < 10^-7). The resulting new extended list of 3,110 predicted PID genes should be useful for the discovery of novel PID genes in patients.
Affiliation(s)
- Yuval Itan
- Rockefeller Branch, St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY, USA
| | - Jean-Laurent Casanova
- Rockefeller Branch, St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY, USA; Necker Branch, Laboratory of Human Genetics of Infectious Diseases, INSERM U1163, Paris, France; Imagine Institute, University Paris Descartes, Paris, France; Howard Hughes Medical Institute, New York, NY, USA; Pediatric Hematology-Immunology Unit, Necker Hospital for Sick Children, Paris, France
| |
|
142
|
O'Geen H, Henry IM, Bhakta MS, Meckler JF, Segal DJ. A genome-wide analysis of Cas9 binding specificity using ChIP-seq and targeted sequence capture. Nucleic Acids Res 2015; 43:3389-404. [PMID: 25712100 PMCID: PMC4381059 DOI: 10.1093/nar/gkv137] [Citation(s) in RCA: 169] [Impact Index Per Article: 16.9] [Received: 11/12/2014] [Revised: 02/07/2015] [Accepted: 02/09/2015] [Indexed: 12/26/2022] Open
Abstract
Clustered regularly interspaced short palindromic repeat (CRISPR) RNA-guided nucleases have generated considerable excitement as a tool for genome engineering. However, questions remain about the specificity of target site recognition. Cleavage specificity is typically evaluated by low-throughput assays (T7 endonuclease I assay, target amplification followed by high-throughput sequencing), which are limited to a subset of potential off-target sites. Here, we used ChIP-seq to examine genome-wide CRISPR binding specificity at gRNA-specific and gRNA-independent sites for two guide RNAs. RNA-guided Cas9 binding was highly specific to the target site while off-target binding occurred at much lower intensities. Cas9-bound regions were highly enriched in NGG sites, a sequence required for target site recognition by Streptococcus pyogenes Cas9. To determine the relationship between Cas9 binding and endonuclease activity, we applied targeted sequence capture, which allowed us to survey 1200 genomic loci simultaneously including potential off-target sites identified by ChIP-seq and by computational prediction. A high frequency of indels was observed at both target sites and one off-target site, while no cleavage activity could be detected at other ChIP-bound regions. Our results confirm the high specificity of CRISPR endonucleases and demonstrate that sequence capture can be used as a high-throughput genome-wide approach to identify off-target activity.
Affiliation(s)
- Henriette O'Geen
- Genome Center and Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA
| | - Isabelle M Henry
- Department of Plant Biology and Genome Center, University of California, Davis, CA 95616, USA
| | - Mital S Bhakta
- Genome Center and Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA
| | - Joshua F Meckler
- Genome Center and Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA
| | - David J Segal
- Genome Center and Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA
| |
|
143
|
Kerzendorfer C, Konopka T, Nijman SMB. A thesaurus of genetic variation for interrogation of repetitive genomic regions. Nucleic Acids Res 2015; 43:e68. [PMID: 25820428 PMCID: PMC4446415 DOI: 10.1093/nar/gkv178] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/10/2014] [Accepted: 02/22/2015] [Indexed: 01/11/2023] Open
Abstract
Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hitherto hidden and under-studied parts of the genome.
Affiliation(s)
- Claudia Kerzendorfer
- Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria
| | - Tomasz Konopka
- Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria
| | - Sebastian M B Nijman
- Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria
| |
|
144
|
Shulman JM. Drosophila and experimental neurology in the post-genomic era. Exp Neurol 2015; 274:4-13. [PMID: 25814441 DOI: 10.1016/j.expneurol.2015.03.016] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/15/2014] [Revised: 03/03/2015] [Accepted: 03/18/2015] [Indexed: 12/31/2022]
Abstract
For decades, the fruit fly, Drosophila melanogaster, has been among the premier genetic model systems for probing fundamental neurobiology, including elucidation of mechanisms responsible for human neurologic disorders. Flies continue to offer virtually unparalleled versatility and speed for genetic manipulation, strong genomic conservation, and a nervous system that recapitulates a range of cellular and network properties relevant to human disease. I focus here on four critical challenges emerging from recent advances in our understanding of the genomic basis of human neurologic disorders where innovative experimental strategies are urgently needed: (1) pinpointing causal genes from associated genomic loci; (2) confirming the functional impact of allelic variants; (3) elucidating nervous system roles for novel or poorly studied genes; and (4) probing network interactions within implicated regulatory pathways. Drosophila genetic approaches are ideally suited to address each of these potential translational roadblocks, and will therefore contribute to mechanistic insights and potential breakthrough therapies for complex genetic disorders in the coming years. Strategic collaboration between neurologists, human geneticists, and the Drosophila research community holds great promise to accelerate progress in the post-genomic era.
Affiliation(s)
- Joshua M Shulman
- Departments of Neurology, Molecular and Human Genetics, and Neuroscience, and Program in Developmental Biology, Baylor College of Medicine, Houston, TX, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, USA.
| |
|
145
|
Wang X, Zhang S, Li Y, Li M, Sha Q. A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 2015; 39:294-305. [PMID: 25758547 DOI: 10.1002/gepi.21894] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/04/2014] [Revised: 01/09/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Population stratification has long been recognized as an issue in genetic association studies because unrecognized population stratification can lead to both false-positive and false-negative findings and can obscure true association signals if not appropriately corrected. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we propose a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA), in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA up-weights rare variants and those variants that have strong associations with the phenotype. Additionally, it can adjust for the direction of the association, and allows for local ancestry differences among study subjects. Extensive simulations show that the type I error rate of TOWA is under control in the presence of population stratification and that it is more powerful than existing methods. We have also applied TOWA to a real sequencing dataset. Our simulation studies as well as real data analysis results indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.
Affiliation(s)
- Xuexia Wang
- Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America
| |
|
146
|
Mendelian and polygenic inheritance of intelligence: A common set of causal genes? Using next-generation sequencing to examine the effects of 168 intellectual disability genes on normal-range intelligence. Intelligence 2015. [DOI: 10.1016/j.intell.2014.12.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
|
147
|
Thormaehlen AS, Schuberth C, Won HH, Blattmann P, Joggerst-Thomalla B, Theiss S, Asselta R, Duga S, Merlini PA, Ardissino D, Lander ES, Gabriel S, Rader DJ, Peloso GM, Pepperkok R, Kathiresan S, Runz H. Systematic cell-based phenotyping of missense alleles empowers rare variant association studies: a case for LDLR and myocardial infarction. PLoS Genet 2015; 11:e1004855. [PMID: 25647241 PMCID: PMC4409815 DOI: 10.1371/journal.pgen.1004855] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 07/01/2014] [Accepted: 10/27/2014] [Indexed: 01/08/2023] Open
Abstract
A fundamental challenge to contemporary genetics is to distinguish rare missense alleles that disrupt protein functions from the majority of alleles neutral on protein activities. High-throughput experimental tools to securely discriminate between disruptive and non-disruptive missense alleles are currently missing. Here we establish a scalable cell-based strategy to profile the biological effects and likely disease relevance of rare missense variants in vitro. We apply this strategy to systematically characterize missense alleles in the low-density lipoprotein receptor (LDLR) gene identified through exome sequencing of 3,235 individuals and exome-chip profiling of 39,186 individuals. Our strategy reliably identifies disruptive missense alleles, and disruptive-allele carriers have higher plasma LDL-cholesterol (LDL-C). Importantly, considering experimental data refined the risk of rare LDLR allele carriers from 4.5- to 25.3-fold for high LDL-C, and from 2.1- to 20-fold for early-onset myocardial infarction. Our study generates proof-of-concept that systematic functional variant profiling may empower rare variant-association studies by orders of magnitude.
Exome sequencing has proven powerful to identify protein-coding variation across the human genome, unravel the basis of monogenic diseases and discover rare alleles that confer risk for complex disease. Nevertheless, two key challenges limit its application to complex phenotypes: first, most alleles identified in a population are extremely rare; and second, most alleles are neutral on protein activities. Consequently, association tests that rely on enumerating rare alleles in cases and controls (termed rare variant association studies, RVAS) are typically underpowered, as the many neutral alleles dampen signals that arise from the few alleles that disrupt protein functions. Strategies to securely discriminate disruptive from neutral variants are immature, in particular for missense variants. Here we show that the statistical power of RVAS improves dramatically if variants are stratified according to their in vitro ascertained functions. We establish scalable technology to objectively profile the biological effects of exome-identified missense variants in the low-density lipoprotein receptor (LDLR) through systematic overexpression and complementation experiments in cells. We demonstrate that carriers of LDLR alleles, which our experiments identify as “disruptive-missense”, have higher plasma LDL-C, and that considering in vitro data may make it possible to reduce RVAS sample sizes by more than 2-fold.
Affiliation(s)
- Aenne S. Thormaehlen
- Institute of Human Genetics, University of Heidelberg, Heidelberg, Germany
- Molecular Medicine Partnership Unit (MMPU), University of Heidelberg/ EMBL, Heidelberg, Germany
| | - Christian Schuberth
- Institute of Human Genetics, University of Heidelberg, Heidelberg, Germany
- Molecular Medicine Partnership Unit (MMPU), University of Heidelberg/ EMBL, Heidelberg, Germany
| | - Hong-Hee Won
- Center of Human Genetic Research (CHGR), Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Peter Blattmann
- Molecular Medicine Partnership Unit (MMPU), University of Heidelberg/ EMBL, Heidelberg, Germany
- Cell Biology/Biophysics Unit, European Molecular Biological Laboratory, Heidelberg, Germany
| | - Brigitte Joggerst-Thomalla
- Institute of Human Genetics, University of Heidelberg, Heidelberg, Germany
- Molecular Medicine Partnership Unit (MMPU), University of Heidelberg/ EMBL, Heidelberg, Germany
| | - Susanne Theiss
- Institute of Human Genetics, University of Heidelberg, Heidelberg, Germany
| | - Eric S. Lander
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Stacey Gabriel
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Daniel J. Rader
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Gina M. Peloso
- Center of Human Genetic Research (CHGR), Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Rainer Pepperkok
- Molecular Medicine Partnership Unit (MMPU), University of Heidelberg/ EMBL, Heidelberg, Germany
- Cell Biology/Biophysics Unit, European Molecular Biological Laboratory, Heidelberg, Germany
| | - Sekar Kathiresan
- Center of Human Genetic Research (CHGR), Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Division of Cardiology, Ospedale Niguarda, Milan, Italy
| | - Heiko Runz
- Institute of Human Genetics, University of Heidelberg, Heidelberg, Germany
- Molecular Medicine Partnership Unit (MMPU), University of Heidelberg/ EMBL, Heidelberg, Germany
- Center of Human Genetic Research (CHGR), Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| |
|
148
|
Wang M, Lin S. Detecting associations of rare variants with common diseases: collapsing or haplotyping? Brief Bioinform 2015; 16:759-68. [PMID: 25596401 DOI: 10.1093/bib/bbu050] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 09/26/2014] [Indexed: 01/11/2023] Open
Abstract
In recent years, a myriad of new statistical methods have been proposed for detecting associations of rare single-nucleotide variants (SNVs) with common diseases. These methods can be generally classified as 'collapsing' or 'haplotyping' based. The former is the predominant class, composed of most of the rare variant association methods proposed to date. However, recent works have suggested that haplotyping-based methods may offer advantages and can even be more powerful than collapsing methods in certain situations. In this article, we review and compare collapsing- versus haplotyping-based methods/software in terms of both power and type I error. For collapsing methods, we consider three approaches: Combined Multivariate and Collapsing, Sequence Kernel Association Test and Family-Based Association Test (FBAT): the first two are population based and are among the most popular; the last test is family based, a modification from the popular FBAT to accommodate rare SNVs. For haplotyping-based methods, we include Logistic Bayesian Lasso (LBL) for population data and family-based LBL (famLBL) for family (trio) data. These two methods are selected, as they can be used to test association for specific rare and common haplotypes. Our results show that haplotype methods can be more powerful than collapsing methods if there are interacting SNVs leading to larger haplotype effects. Even if only common SNVs are genotyped, haplotype methods can still detect specific rare haplotypes that tag rare causal SNVs. As expected, family-based methods are robust, whereas population-based methods are susceptible, to population substructure. However, the population-based haplotype approach appears to have smaller inflation of type I error than its collapsing counterparts.
|
149
|
Bishop JP, Halburnt SB, Akkari PA, Sundseth S, Grossman I. Roadmap to Drug Development Enabled by Pharmacogenetics. Advances in Predictive, Preventive and Personalised Medicine 2015. [DOI: 10.1007/978-3-319-15344-5_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/18/2022]
|
150
|
Jones LH, Narayanan A, Hett EC. Understanding and applying tyrosine biochemical diversity. Molecular BioSystems 2014; 10:952-69. [PMID: 24623162 DOI: 10.1039/c4mb00018h] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 12/15/2022]
Abstract
This review highlights some of the recent advances made in our understanding of the diversity of tyrosine biochemistry and shows how this has inspired novel applications in numerous areas of molecular design and synthesis, including chemical biology and bioconjugation. The pathophysiological implications of tyrosine biochemistry will be presented from a molecular perspective and the opportunities for therapeutic intervention explored.
Affiliation(s)
- Lyn H Jones
- Pfizer R&D, Chemical Biology Group, BioTherapeutics Chemistry, WorldWide Medicinal Chemistry, 200 Cambridge Park Drive, Cambridge, MA 02140, USA.
| |
|