1. Alasiri A, Karczewski KJ, Cole B, Loza BL, Moore JH, van der Laan SW, Asselbergs FW, Keating BJ, van Setten J. LoFTK: a framework for fully automated calculation of predicted Loss-of-Function variants and genes. BioData Min 2023; 16:3. [PMID: 36732776 PMCID: PMC9893534 DOI: 10.1186/s13040-023-00321-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 01/04/2023] [Indexed: 02/04/2023]
Abstract
BACKGROUND Loss-of-Function (LoF) variants in human genes are important due to their impact on clinical phenotypes and frequent occurrence in the genomes of healthy individuals. The association of LoF variants with complex diseases and traits may lead to the discovery and validation of novel therapeutic targets. Current approaches predict high-confidence LoF variants without identifying the specific genes or the number of copies they affect. Moreover, there is a lack of methods for detecting knockout genes caused by compound heterozygous (CH) LoF variants. RESULTS We have developed the Loss-of-Function ToolKit (LoFTK), which allows efficient and automated prediction of LoF variants from genotyped, imputed and sequenced genomes. LoFTK enables the identification of genes that are inactive in one or two copies and provides summary statistics for downstream analyses. LoFTK can identify CH LoF variants, which result in LoF genes with two copies lost. Using data from parents and offspring, we show that 96% of CH LoF genes predicted by LoFTK in the offspring have the respective alleles donated by each parent. CONCLUSIONS LoFTK is a command-line based tool that provides a reliable computational workflow for predicting LoF variants from genotyped and sequenced genomes, identifying genes that are inactive in 1 or 2 copies. LoFTK is open software and is freely available to non-commercial users at https://github.com/CirculatoryHealth/LoFTK.
Affiliation(s)
- Abdulrahman Alasiri
  - Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, University of Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Netherlands
  - Medical Genomics Research Department, King Abdullah International Medical Research Center, King Saud Bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
- Konrad J Karczewski
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Brian Cole
  - Bioinformatics Core, Harvard Medical School, Boston, MA, USA
- Bao-Li Loza
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Sander W van der Laan
  - Central Diagnostic Laboratory, Division Laboratories, Pharmacy, and Biomedical Genetics, University Medical Center Utrecht, University of Utrecht, Utrecht, Netherlands
- Folkert W Asselbergs
  - Department of Cardiology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
  - Health Data Research UK and Institute of Health Informatics, University College London, London, UK
- Brendan J Keating
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jessica van Setten
  - Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, University of Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Netherlands
2. Kaminow B, Ballouz S, Gillis J, Dobin A. Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses. Genome Res 2022; 32:738-749. [PMID: 35256454 PMCID: PMC8997357 DOI: 10.1101/gr.275613.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Accepted: 03/02/2022] [Indexed: 11/25/2022]
Abstract
The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. In order to find the best haploid genome representation, we constructed consensus genomes at the pan-human, super-population, and population levels, utilizing variant information from the 1000 Genomes Project. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of ~2-3 when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no increase over using the pan-human consensus, suggesting a limit in the utility of incorporating more specific genomic variation. Replacing reference with consensus genomes impacts functional analyses, such as differential expressions of isoforms, genes, and splice junctions.
Affiliation(s)
- Benjamin Kaminow
  - Cold Spring Harbor Laboratory
  - Weill Cornell Graduate School of Medical Sciences
- Sara Ballouz
  - Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research
  - School of Medical Sciences, University of New South Wales
  - Cold Spring Harbor Laboratory
3. Xu YC, Guo YL. Less Is More, Natural Loss-of-Function Mutation Is a Strategy for Adaptation. Plant Communications 2020; 1:100103. [PMID: 33367264 PMCID: PMC7743898 DOI: 10.1016/j.xplc.2020.100103] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/05/2020] [Revised: 07/08/2020] [Accepted: 08/12/2020] [Indexed: 05/12/2023]
Abstract
Gene gain and loss are crucial factors that shape the evolutionary success of diverse organisms. In the past two decades, more attention has been paid to the significance of gene gain through gene duplication or de novo genes. However, gene loss through natural loss-of-function (LoF) mutations, which is prevalent in the genomes of diverse organisms, has been largely ignored. With the development of sequencing techniques, many genomes have been sequenced across diverse species and can be used to study the evolutionary patterns of gene loss. In this review, we summarize recent advances in research on various aspects of LoF mutations, including their identification, evolutionary dynamics in natural populations, and functional effects. In particular, we discuss how LoF mutations can provide insights into the minimum gene set (or the essential gene set) of an organism. Furthermore, we emphasize their potential impact on adaptation. At the genome level, although most LoF mutations are neutral or deleterious, at least some of them are under positive selection and may contribute to biodiversity and adaptation. Overall, we highlight the importance of natural LoF mutations as a robust framework for understanding biological questions in general.
Affiliation(s)
- Yong-Chao Xu
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Ya-Long Guo
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
4.
Abstract
The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a 'type specimen'. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference with few disadvantages.
Affiliation(s)
- Sara Ballouz
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
- Alexander Dobin
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
- Jesse A Gillis
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
5. Bustamante Rivera YY, Brütting C, Schmidt C, Volkmer I, Staege MS. Endogenous Retrovirus 3 - History, Physiology, and Pathology. Front Microbiol 2018; 8:2691. [PMID: 29379485 PMCID: PMC5775217 DOI: 10.3389/fmicb.2017.02691] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/29/2017] [Accepted: 12/26/2017] [Indexed: 01/05/2023]
Abstract
Endogenous viral elements (EVE) seem to be present in all eukaryotic genomes. The composition of EVE varies between different species. The endogenous retrovirus 3 (ERV3) is one of these elements that is present only in humans and other Catarrhini. Conservation of ERV3 in most of the investigated Catarrhini and the expression pattern in normal tissues suggest a putative physiological role of ERV3. On the other hand, ERV3 has been implicated in the pathogenesis of auto-immunity and cancer. In the present review we summarize knowledge about this interesting EVE. We propose the model that expression of ERV3 (and probably other EVE loci) under pathological conditions might be part of a metazoan SOS response.
Affiliation(s)
- Christine Brütting
  - Department of Paediatrics I, Martin Luther University Halle-Wittenberg, Halle, Germany
  - Department of Neurology, Martin Luther University Halle-Wittenberg, Halle, Germany
- Caroline Schmidt
  - Department of Paediatrics I, Martin Luther University Halle-Wittenberg, Halle, Germany
- Ines Volkmer
  - Department of Paediatrics I, Martin Luther University Halle-Wittenberg, Halle, Germany
- Martin S Staege
  - Department of Paediatrics I, Martin Luther University Halle-Wittenberg, Halle, Germany
6. Balasubramanian S, Fu Y, Pawashe M, McGillivray P, Jin M, Liu J, Karczewski KJ, MacArthur DG, Gerstein M. Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes. Nat Commun 2017; 8:382. [PMID: 28851873 PMCID: PMC5575292 DOI: 10.1038/s41467-017-00443-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 05/20/2016] [Accepted: 06/29/2017] [Indexed: 11/09/2022]
Abstract
Variants predicted to result in the loss of function of human genes have attracted interest because of their clinical impact and surprising prevalence in healthy individuals. Here, we present ALoFT (annotation of loss-of-function transcripts), a method to annotate and predict the disease-causing potential of loss-of-function variants. Using data from Mendelian disease-gene discovery projects, we show that ALoFT can distinguish between loss-of-function variants that are deleterious as heterozygotes and those causing disease only in the homozygous state. Investigation of variants discovered in healthy populations suggests that each individual carries at least two heterozygous premature stop alleles that could potentially lead to disease if present as homozygotes. When applied to de novo putative loss-of-function variants in autism-affected families, ALoFT distinguishes between deleterious variants in patients and benign variants in unaffected siblings. Finally, analysis of somatic variants in >6500 cancer exomes shows that putative loss-of-function variants predicted to be deleterious by ALoFT are enriched in known driver genes.
Variants causing loss of function (LoF) of human genes have clinical implications. Here, the authors present a method to predict disease-causing potential of LoF variants, ALoFT (annotation of Loss-of-Function Transcripts) and show its application to interpreting LoF variants in different contexts.
Affiliation(s)
- Suganthi Balasubramanian
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
  - Regeneron Genetics Center, Tarrytown, NY, 10591, USA
- Yao Fu
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Bina Technologies, Part of Roche Sequencing, Belmont, CA, 94002, USA
- Mayur Pawashe
  - Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
- Patrick McGillivray
  - Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
- Mike Jin
  - Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
- Jeremy Liu
  - Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
- Konrad J Karczewski
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02142, USA
- Daniel G MacArthur
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02142, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
  - Department of Computer Science, Yale University, New Haven, CT, 06520, USA
7. Whole-genome sequencing identifies rare genotypes in COMP and CHADL associated with high risk of hip osteoarthritis. Nat Genet 2017; 49:801-805. [DOI: 10.1038/ng.3816] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 11/01/2016] [Accepted: 02/23/2017] [Indexed: 12/13/2022]
8. Evaluating the Calling Performance of a Rare Disease NGS Panel for Single Nucleotide and Copy Number Variants. Mol Diagn Ther 2017; 21:303-313. [DOI: 10.1007/s40291-017-0268-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022]
9. Narasimhan VM, Xue Y, Tyler-Smith C. Human Knockout Carriers: Dead, Diseased, Healthy, or Improved? Trends Mol Med 2016; 22:341-351. [PMID: 26988438 PMCID: PMC4826344 DOI: 10.1016/j.molmed.2016.02.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 01/22/2016] [Revised: 02/18/2016] [Accepted: 02/19/2016] [Indexed: 01/11/2023]
Abstract
Whole-genome and whole-exome sequence data from large numbers of individuals reveal that we all carry many variants predicted to inactivate genes (knockouts). This discovery raises questions about the phenotypic consequences of these knockouts and potentially allows us to study human gene function through the investigation of homozygous loss-of-function carriers. Here, we discuss strategies, recent results, and future prospects for large-scale human knockout studies. We examine their relevance to studying gene function, population genetics, and importantly, the implications for accurate clinical interpretations.
Affiliation(s)
- Yali Xue
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
10. Human knockout research: new horizons and opportunities. Trends Genet 2014; 31:108-15. [PMID: 25497971 DOI: 10.1016/j.tig.2014.11.003] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 09/01/2014] [Revised: 11/15/2014] [Accepted: 11/17/2014] [Indexed: 12/12/2022]
Abstract
Although numerous approaches have been pursued to understand the function of human genes, Mendelian genetics has by far provided the most compelling and medically actionable dataset. Biallelic loss-of-function (LOF) mutations are observed in the majority of autosomal recessive Mendelian disorders, representing natural human knockouts and offering a unique opportunity to study the physiological and developmental context of these genes. The restriction of such context to 'disease' states is artificial, however, and the recent ability to survey entire human genomes for biallelic LOF mutations has revealed a surprising landscape of knockout events in 'healthy' individuals, sparking interest in their role in phenotypic diversity beyond disease causation. As I discuss in this review, the potentially wide implications of human knockout research warrant increased investment and multidisciplinary collaborations to overcome existing challenges and reap its benefits.
11. van der Burgt A, Karimi Jashni M, Bahkali AH, de Wit PJGM. Pseudogenization in pathogenic fungi with different host plants and lifestyles might reflect their evolutionary past. Molecular Plant Pathology 2014; 15:133-44. [PMID: 24393451 PMCID: PMC6638865 DOI: 10.1111/mpp.12072] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 05/18/2023]
Abstract
Pseudogenes are genes with significant homology to functional genes, but contain disruptive mutations (DMs) leading to the production of non- or partially functional proteins. Little is known about pseudogenization in pathogenic fungi with different lifestyles. Here, we report the identification of DMs causing pseudogenes in the genomes of the fungal plant pathogens Botrytis cinerea, Cladosporium fulvum, Dothistroma septosporum, Mycosphaerella fijiensis, Verticillium dahliae and Zymoseptoria tritici. In these fungi, we identified 1740 gene models containing 2795 DMs obtained by an alignment-based gene prediction method. The contribution of sequencing errors to DMs was minimized by analyses of resequenced genomes to obtain a refined dataset of 924 gene models containing 1666 true DMs. The frequency of pseudogenes varied from 1% to 5% in the gene catalogues of these fungi, being the highest in the asexually reproducing fungus C. fulvum (4.9%), followed by D. septosporum (2.4%) and V. dahliae (2.1%). The majority of pseudogenes do not represent recent gene duplications, but members of multi-gene families and unitary genes. In general, there was no bias for pseudogenization of specific genes in the six fungi. Single exceptions were those encoding secreted proteins, including proteases, which appeared more frequently pseudogenized in C. fulvum than in D. septosporum. Most pseudogenes present in these two phylogenetically closely related fungi are not shared, suggesting that they are related to adaptation to a different host (tomato versus pine) and lifestyle (biotroph versus hemibiotroph).
Affiliation(s)
- Ate van der Burgt
  - Laboratory of Phytopathology, Wageningen University and Research Centre, PO Box 16, 6700 AA, Wageningen, the Netherlands
  - Applied Bioinformatics, Plant Research International, Wageningen University and Research Centre, PO Box 16, 6700 AA, Wageningen, the Netherlands
12. Gala MK, Mizukami Y, Le LP, Moriichi K, Austin T, Yamamoto M, Lauwers GY, Bardeesy N, Chung DC. Germline mutations in oncogene-induced senescence pathways are associated with multiple sessile serrated adenomas. Gastroenterology 2014; 146:520-9. [PMID: 24512911 PMCID: PMC3978775 DOI: 10.1053/j.gastro.2013.10.045] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 03/16/2013] [Revised: 10/17/2013] [Accepted: 10/21/2013] [Indexed: 12/20/2022]
Abstract
BACKGROUND & AIMS Little is known about the genetic factors that contribute to the development of sessile serrated adenomas (SSAs). SSAs contain somatic mutations in BRAF or KRAS early in development. However, evidence from humans and mouse models indicates that these mutations result in oncogene-induced senescence (OIS) of intestinal crypt cells. Progression to serrated neoplasia requires cells to escape OIS via inactivation of tumor suppressor pathways. We investigated whether subjects with multiple SSAs carry germline loss-of-function mutations (nonsense and splice site) in genes that regulate OIS: the p16-Rb and ATM-ATR DNA damage response pathways. METHODS Through a bioinformatic analysis of the literature, we identified a set of genes that function at the main nodes of the p16-Rb and ATM-ATR DNA damage response pathways. We performed whole-exome sequencing of 20 unrelated subjects with multiple SSAs; most had features of serrated polyposis. We compared sequences with those from 4300 subjects matched for ethnicity (controls). We also used an integrative genomics approach to identify additional genes involved in senescence mechanisms. RESULTS We identified mutations in genes that regulate senescence (ATM, PIF1, TELO2, XAF1, and RBL1) in 5 of 20 subjects with multiple SSAs (odds ratio, 3.0; 95% confidence interval, 0.9–8.9; P = .04). In 2 subjects, we found nonsense mutations in RNF43, indicating that it is also associated with multiple serrated polyps (odds ratio, 460; 95% confidence interval, 23.1–16,384; P = 6.8 × 10⁻⁵). In knockdown experiments with pancreatic duct cells exposed to UV light, RNF43 appeared to function as a regulator of the ATM-ATR DNA damage response. CONCLUSIONS We associated germline loss-of-function variants in genes that regulate senescence pathways with the development of multiple SSAs. We identified RNF43 as a regulator of the DNA damage response and associated nonsense variants in this gene with a high risk of developing SSAs.
Affiliation(s)
- Manish K. Gala
  - Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
- Yusuke Mizukami
  - Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
  - Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, MA
  - Center for Clinical and Biomedical Research, Sapporo Higashi Tokushukai Hospital, Sapporo, Japan
- Long P. Le
  - Massachusetts General Hospital Department of Pathology and Harvard Medical School, Boston, MA
- Kentaro Moriichi
  - Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
- Thomas Austin
  - Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
- Masayoshi Yamamoto
  - Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
- Gregory Y. Lauwers
  - Massachusetts General Hospital Department of Pathology and Harvard Medical School, Boston, MA
- Nabeel Bardeesy
  - Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, MA
- Daniel C. Chung
  - Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
  - Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, MA
13. Grarup N, Sulem P, Sandholt CH, Thorleifsson G, Ahluwalia TS, Steinthorsdottir V, Bjarnason H, Gudbjartsson DF, Magnusson OT, Sparsø T, Albrechtsen A, Kong A, Masson G, Tian G, Cao H, Nie C, Kristiansen K, Husemoen LL, Thuesen B, Li Y, Nielsen R, Linneberg A, Olafsson I, Eyjolfsson GI, Jørgensen T, Wang J, Hansen T, Thorsteinsdottir U, Stefánsson K, Pedersen O. Genetic architecture of vitamin B12 and folate levels uncovered applying deeply sequenced large datasets. PLoS Genet 2013; 9:e1003530. [PMID: 23754956 PMCID: PMC3674994 DOI: 10.1371/journal.pgen.1003530] [Citation(s) in RCA: 104] [Impact Index Per Article: 9.5] [Received: 12/20/2012] [Accepted: 04/11/2013] [Indexed: 11/26/2022]
Abstract
Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B12 and folate measurements, respectively. We found six novel loci associating with serum B12 (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. Contrary to epidemiological studies we did not find consistent association of the variants with cardiovascular diseases, cancers or Alzheimer's disease although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations.
Genome-wide association studies have in recent years revealed a wealth of common variants associated with common diseases and phenotypes. We took advantage of the advances in sequencing technologies to study the association of low frequency and rare variants in conjunction with common variants with serum levels of vitamin B12 (B12) and folate in Icelanders and Danes. We found 18 independent signals in 13 loci associated with serum B12 or folate levels. Interestingly, 13 of the 18 identified variants are coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. These data indicate that the target genes at all of the loci have been identified. Epidemiological studies have shown a relationship between serum B12 and folate levels and the risk of cardiovascular diseases, cancers, and Alzheimer's disease. We investigated association between the identified variants and these diseases but did not find consistent association.
Affiliation(s)
- Niels Grarup
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Camilla H. Sandholt
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Tarunveer S. Ahluwalia
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Thomas Sparsø
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Anders Albrechtsen
  - Centre of Bioinformatics, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Karsten Kristiansen
  - Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Lise Lotte Husemoen
  - Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark
- Betina Thuesen
  - Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark
- Rasmus Nielsen
  - Centre of Bioinformatics, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
  - Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
- Allan Linneberg
  - Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark
- Isleifur Olafsson
  - Landspitali, The National University Hospital of Iceland, Department of Clinical Biochemistry, Reykjavik, Iceland
- Torben Jørgensen
  - Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark
  - Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Faculty of Medicine, University of Aalborg, Aalborg, Denmark
- Jun Wang
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - BGI-Shenzhen, Shenzhen, China
  - Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Torben Hansen
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
- Unnur Thorsteinsdottir
  - deCODE Genetics, Reykjavik, Iceland
  - University of Iceland Faculty of Medicine, Reykjavik, Iceland
- Kari Stefánsson
  - deCODE Genetics, Reykjavik, Iceland
  - University of Iceland Faculty of Medicine, Reykjavik, Iceland
  - * E-mail: (K. Stefánsson); (O. Pedersen)
- Oluf Pedersen
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Faculty of Health Sciences, Aarhus University, Aarhus, Denmark
  - Hagedorn Research Institute, Gentofte, Denmark
  - Institute of Biomedical Science, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - * E-mail: (K. Stefánsson); (O. Pedersen)
14. Styrkarsdottir U, Thorleifsson G, Sulem P, Gudbjartsson DF, Sigurdsson A, Jonasdottir A, Jonasdottir A, Oddsson A, Helgason A, Magnusson OT, Walters GB, Frigge ML, Helgadottir HT, Johannsdottir H, Bergsteinsdottir K, Ogmundsdottir MH, Center JR, Nguyen TV, Eisman JA, Christiansen C, Steingrimsson E, Jonasson JG, Tryggvadottir L, Eyjolfsson GI, Theodors A, Jonsson T, Ingvarsson T, Olafsson I, Rafnar T, Kong A, Sigurdsson G, Masson G, Thorsteinsdottir U, Stefansson K. Nonsense mutation in the LGR4 gene is associated with several human diseases and other traits. Nature 2013; 497:517-20. [DOI: 10.1038/nature12124] [Citation(s) in RCA: 204] [Impact Index Per Article: 18.5] [Received: 09/12/2012] [Accepted: 03/27/2013] [Indexed: 12/12/2022]
|
15
|
Nishishita K, Sakai E, Okamoto K, Tsukuba T. Structural and phylogenetic comparison of napsin genes: The duplication, loss of function and human-specific pseudogenization of napsin B. Gene 2013; 517:147-57. [DOI: 10.1016/j.gene.2013.01.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/07/2012] [Revised: 12/27/2012] [Accepted: 01/04/2013] [Indexed: 01/28/2023]
|
16
|
Torkamani A, Pham P, Libiger O, Bansal V, Zhang G, Scott-Van Zeeland AA, Tewhey R, Topol EJ, Schork NJ. Clinical implications of human population differences in genome-wide rates of functional genotypes. Front Genet 2012; 3:211. [PMID: 23125845 PMCID: PMC3485509 DOI: 10.3389/fgene.2012.00211] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 06/26/2012] [Accepted: 09/26/2012] [Indexed: 12/21/2022] [Open Access]
Abstract
There have been a number of recent successes in the use of whole genome sequencing and sophisticated bioinformatics techniques to identify pathogenic DNA sequence variants responsible for individual idiopathic congenital conditions. However, the success of this identification process is heavily influenced by the ancestry or genetic background of a patient with an idiopathic condition. This is so because potential pathogenic variants in a patient’s genome must be contrasted with variants in a reference set of genomes made up of other individuals’ genomes of the same ancestry as the patient. We explored the effect of ignoring the ancestries of both an individual patient and the individuals used to construct reference genomes. We pursued this exploration in two major steps. We first considered variation in the per-genome number and rates of likely functional derived (i.e., non-ancestral, based on the chimp genome) single nucleotide variants and small indels in 52 individual whole human genomes sampled from 10 different global populations. We took advantage of a suite of computational and bioinformatics techniques to predict the functional effect of over 24 million genomic variants, both coding and non-coding, across these genomes. We found that the typical human genome harbors ∼5.5–6.1 million total derived variants, of which ∼12,000 are likely to have a functional effect (∼5000 coding and ∼7000 non-coding). We also found that the rates of functional genotypes per the total number of genotypes in individual whole genomes differ dramatically between human populations. We then created tables showing how the use of comparator or reference genome panels comprised of genomes from individuals that do not have the same ancestral background as a patient can negatively impact pathogenic variant identification. Our results have important implications for clinical sequencing initiatives.
Affiliation(s)
- Ali Torkamani
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
|
17
|
Rosenfeld JA, Mason CE, Smith TM. Limitations of the human reference genome for personalized genomics. PLoS One 2012; 7:e40294. [PMID: 22811759 PMCID: PMC3394790 DOI: 10.1371/journal.pone.0040294] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Received: 02/23/2012] [Accepted: 06/07/2012] [Indexed: 11/19/2022] [Open Access]
Abstract
Data from the 1000 genomes project (1KGP) and Complete Genomics (CG) have dramatically increased the numbers of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS studies for a variety of diseases utilize probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the numbers of markers used, and the mean LD block size decreases from 16 kb to 7 kb when HapMap-based calculations are compared to blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset, likely a result of differences in data collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population and significant efforts are needed to create resources that can accurately assess personal genomics for health, disease, and predict treatment outcomes.
Affiliation(s)
- Jeffrey A. Rosenfeld
- Division of High Performance and Research Computing, University of Medicine & Dentistry of New Jersey, Newark, New Jersey, United States of America
- American Museum of Natural History, Sackler Institute for Comparative Genomics, New York, New York, United States of America
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, United States of America
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, United States of America
- Todd M. Smith
- PerkinElmer, Seattle, Washington, United States of America
|
18
|
Vidal M, Chan DW, Gerstein M, Mann M, Omenn GS, Tagle D, Sechi S. The human proteome - a scientific opportunity for transforming diagnostics, therapeutics, and healthcare. Clin Proteomics 2012; 9:6. [PMID: 22583803 PMCID: PMC3388576 DOI: 10.1186/1559-0275-9-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 03/21/2012] [Accepted: 05/14/2012] [Indexed: 11/16/2022] [Open Access]
Abstract
A National Institutes of Health (NIH) workshop was convened in Bethesda, MD on September 26–27, 2011, with representative scientific leaders in the field of proteomics and its applications to clinical settings. The main purpose of this workshop was to articulate ways in which the biomedical research community can capitalize on recent technology advances and synergize with ongoing efforts to advance the field of human proteomics. This executive summary and the following full report describe the main discussions and outcomes of the workshop.
Affiliation(s)
- Marc Vidal
- University of Michigan, Ann Arbor, MI, USA.
|
19
|
Habegger L, Balasubramanian S, Chen DZ, Khurana E, Sboner A, Harmanci A, Rozowsky J, Clarke D, Snyder M, Gerstein M. VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment. Bioinformatics 2012; 28:2267-9. [PMID: 22743228 DOI: 10.1093/bioinformatics/bts368] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 01/08/2023]
Abstract
The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. Availability and implementation: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.
Affiliation(s)
- Lukas Habegger
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
|
20
|
Fuentes Fajardo KV, Adams D, Mason CE, Sincan M, Tifft C, Toro C, Boerkoel CF, Gahl W, Markello T. Detecting false-positive signals in exome sequencing. Hum Mutat 2012; 33:609-13. [PMID: 22294350 PMCID: PMC3302978 DOI: 10.1002/humu.22033] [Citation(s) in RCA: 121] [Impact Index Per Article: 10.1] [Received: 08/11/2011] [Accepted: 12/02/2011] [Indexed: 11/11/2022]
Abstract
Disease gene discovery has been transformed by affordable sequencing of exomes and genomes. Identification of disease-causing mutations requires sifting through a large number of sequence variants. A subset of the variants are unlikely to be good candidates for disease causation based on one or more of the following criteria: (1) being located in genomic regions known to be highly polymorphic, (2) having characteristics suggesting assembly misalignment, and/or (3) being labeled as variants based on misleading reference genome information. We analyzed exome sequence data from 118 individuals in 29 families seen in the NIH Undiagnosed Diseases Program (UDP) to create lists of variants and genes with these characteristics. Specifically, we identified several groups of genes that are candidates for provisional exclusion during exome analysis: 23,389 positions with excess heterozygosity suggestive of alignment errors and 1,009 positions in which the hg18 human genome reference sequence appeared to contain a minor allele. Exclusion of such variants, which we provide in supplemental lists, will likely enhance identification of disease-causing mutations using exome sequence data.
Affiliation(s)
- Karin V Fuentes Fajardo
- NIH Undiagnosed Diseases Program, NIH Office of Rare Diseases Research and NHGRI, Bethesda, Maryland, USA
|
21
|
MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IHA, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB, Tyler-Smith C. A systematic survey of loss-of-function variants in human protein-coding genes. Science 2012; 335:823-8. [PMID: 22344438 PMCID: PMC3299548 DOI: 10.1126/science.1215040] [Citation(s) in RCA: 873] [Impact Index Per Article: 72.8] [Indexed: 01/17/2023]
Abstract
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
|
22
|
Animal models of human genetic diseases: do they need to be faithful to be useful? Mol Genet Genomics 2011; 286:1-20. [DOI: 10.1007/s00438-011-0627-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 02/28/2011] [Accepted: 04/21/2011] [Indexed: 12/18/2022]
|