1
Shojaei M, Mohammadvand N, Doğan T, Alkan C, Çetin Atalay R, Acar AC. An integrative framework for clinical diagnosis and knowledge discovery from exome sequencing data. Comput Biol Med 2024; 169:107810. [PMID: 38134749 DOI: 10.1016/j.compbiomed.2023.107810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 11/06/2023] [Accepted: 12/03/2023] [Indexed: 12/24/2023]
Abstract
Non-silent single nucleotide genetic variants, like nonsense changes and insertion-deletion variants, that affect protein function and length substantially are prevalent and are frequently misclassified. The low sensitivity and specificity of existing variant effect predictors for nonsense and indel variations restrict their use in clinical applications. We propose the Pathogenic Mutation Prediction (PMPred) method to predict the pathogenicity of single nucleotide variations, which impair protein function by prematurely terminating a protein's elongation during its synthesis. The prediction starts by monitoring functional effects (Gene Ontology annotation changes) of the change in sequence, using an existing ensemble machine learning model (UniGOPred). This, in turn, reveals the mutations that significantly deviate functionally from the wild-type sequence. We have identified novel harmful mutations in patient data and present them as motivating case studies. We also show that our method has increased sensitivity and specificity compared to state-of-the-art, especially in single nucleotide variations that produce large functional changes in the final protein. As further validation, we have done a comparative docking study on such a variation that is misclassified by existing methods and, using the altered binding affinities, show how PMPred can correctly predict the pathogenicity when other tools miss it. PMPred is freely accessible as a web service at https://pmpred.kansil.org/, and the related code is available at https://github.com/kansil/PMPred.
Affiliation(s)
- Mona Shojaei
- Cancer Systems Biology Laboratory, Graduate School of Informatics, Middle East Technical University, Ankara 06800 Turkey
- Navid Mohammadvand
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, Ankara 06800 Turkey
- Tunca Doğan
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, Ankara 06800 Turkey; Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara 06800 Turkey
- Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara 06800 Turkey
- Rengül Çetin Atalay
- Department of Medicine, University of Chicago, Chicago, IL, USA; Section of Pulmonary and Critical Care Medicine, University of Chicago, 5841 S. Maryland Avenue, MC6026, Chicago, IL, 60637, USA
- Aybar C Acar
- Cancer Systems Biology Laboratory, Graduate School of Informatics, Middle East Technical University, Ankara 06800 Turkey
2
Dantsev IS, Parfenenko MA, Radzhabova GM, Nikolaeva EA. An FGFR2 mutation as the potential cause of a new phenotype including early-onset osteoporosis and bone fractures: a case report. BMC Med Genomics 2023; 16:329. [PMID: 38098042 PMCID: PMC10722747 DOI: 10.1186/s12920-023-01750-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/22/2023] [Indexed: 12/17/2023] Open
Abstract
Osteoporosis is a systemic, multifactorial disorder of bone mineralization. Many factors contributing to the development of osteoporosis have been identified so far, including gender, age, nutrition, lifestyle, exercise, drug use, as well as a range of comorbidities. In addition to environmental and lifestyle factors, molecular genetic factors account for 50-85% of osteoporosis cases. For example, the vitamin D receptor (VDR), collagen type I (COL1), estrogen receptor (ER), apolipoprotein E (ApoE), bone morphogenetic protein (BMP), and low-density lipoprotein receptor-related protein 5 (LRP5) are all involved in the pathogenesis of osteoporosis. FGFR2 is among the candidate genes in which pathogenic variants are involved in the pathogenesis of osteoporosis. Additionally, FGFs/FGFRs-dependent signaling has been shown to regulate skeletal development and has been linked to a plethora of heritable disorders of the musculoskeletal system. In this study we present the clinical, biochemical and radiological findings, as well as results of molecular genetic testing of a 13-year-old male proband with heritable osteoporosis, arthralgia and multiple fractures and a family history of abnormal bone mineralization and fractures. Whole exome sequencing found a heterozygous previously undescribed variant in the FGFR2 gene (NM_000141.5) (GRCh37.p13 ENSG00000066468.16: g.123298133dup; ENST00000358487.5:c.722dup; ENSP00000351276.5:p.Asn241LysfsTer43). The same variant was found in two affected relatives. These data lead us to believe that the variant in FGFR2 found in our proband and his relatives could be related to their phenotype. Therefore, modern methods of molecular genetic testing can allow us to differentiate between osteogenesis imperfecta and other bone mineralization disorders.
Affiliation(s)
- Ilya S Dantsev
- Veltischev Research and Clinical Institute for Pediatrics and Pediatric Surgery of the Pirogov, Russian National Research Medical University of the Ministry of Health of the Russian Federation, 2 Taldomskaya St, Moscow, 125412, Russia
- Mariia A Parfenenko
- Veltischev Research and Clinical Institute for Pediatrics and Pediatric Surgery of the Pirogov, Russian National Research Medical University of the Ministry of Health of the Russian Federation, 2 Taldomskaya St, Moscow, 125412, Russia
- Gulnara M Radzhabova
- Veltischev Research and Clinical Institute for Pediatrics and Pediatric Surgery of the Pirogov, Russian National Research Medical University of the Ministry of Health of the Russian Federation, 2 Taldomskaya St, Moscow, 125412, Russia
- Ekaterina A Nikolaeva
- Veltischev Research and Clinical Institute for Pediatrics and Pediatric Surgery of the Pirogov, Russian National Research Medical University of the Ministry of Health of the Russian Federation, 2 Taldomskaya St, Moscow, 125412, Russia
3
Han H, McGivney BA, Allen L, Bai D, Corduff LR, Davaakhuu G, Davaasambuu J, Dorjgotov D, Hall TJ, Hemmings AJ, Holtby AR, Jambal T, Jargalsaikhan B, Jargalsaikhan U, Kadri NK, MacHugh DE, Pausch H, Readhead C, Warburton D, Dugarjaviin M, Hill EW. Common protein-coding variants influence the racing phenotype in galloping racehorse breeds. Commun Biol 2022; 5:1320. [PMID: 36513809 PMCID: PMC9748125 DOI: 10.1038/s42003-022-04206-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/21/2022] [Accepted: 11/01/2022] [Indexed: 12/14/2022] Open
Abstract
Selection for system-wide morphological, physiological, and metabolic adaptations has led to extreme athletic phenotypes among geographically diverse horse breeds. Here, we identify genes contributing to exercise adaptation in racehorses by applying genomics approaches for racing performance, an end-point athletic phenotype. Using an integrative genomics strategy to first combine population genomics results with skeletal muscle exercise and training transcriptomic data, followed by whole-genome resequencing of Asian horses, we identify protein-coding variants in genes of interest in galloping racehorse breeds (Arabian, Mongolian and Thoroughbred). A core set of genes, G6PC2, HDAC9, KTN1, MYLK2, NTM, SLC16A1 and SYNDIG1, with central roles in muscle, metabolism, and neurobiology, are key drivers of the racing phenotype. Although racing potential is a multifactorial trait, the genomic architecture shaping the common athletic phenotype in horse populations bred for racing provides evidence for the influence of protein-coding variants in fundamental exercise-relevant genes. Variation in these genes may therefore be exploited for genetic improvement of horse populations towards specific types of racing.
Affiliation(s)
- Haige Han
- Inner Mongolia Key Laboratory of Equine Genetics, Breeding and Reproduction, College of Animal Science, Equine Research Center, Inner Mongolia Agricultural University, Hohhot, 010018 China
- Beatrice A. McGivney
- Plusvital Ltd, The Highline, Dun Laoghaire Business Park, Dublin, A96 W5T3 Ireland
- Lucy Allen
- Royal Agricultural University, Cirencester, Gloucestershire GL7 6JS UK
- Dongyi Bai
- Inner Mongolia Key Laboratory of Equine Genetics, Breeding and Reproduction, College of Animal Science, Equine Research Center, Inner Mongolia Agricultural University, Hohhot, 010018 China
- Leanne R. Corduff
- Plusvital Ltd, The Highline, Dun Laoghaire Business Park, Dublin, A96 W5T3 Ireland
- Gantulga Davaakhuu
- Institute of Biology, Mongolian Academy of Sciences, Peace Avenue 54B, Ulaanbaatar, 13330 Mongolia
- Jargalsaikhan Davaasambuu
- Ajnai Sharga Horse Racing Team, Encanto Town 210-11, Ikh Mongol State Street, 26th Khoroo, Bayanzurkh district Ulaanbaatar, 13312 Mongolia
- Dulguun Dorjgotov
- School of Industrial Technology, Mongolian University of Science and Technology, Ulaanbaatar, 661 Mongolia
- Thomas J. Hall
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Dublin D04 V1W8 Ireland
- Andrew J. Hemmings
- Royal Agricultural University, Cirencester, Gloucestershire GL7 6JS UK
- Amy R. Holtby
- Plusvital Ltd, The Highline, Dun Laoghaire Business Park, Dublin, A96 W5T3 Ireland
- Tuyatsetseg Jambal
- School of Industrial Technology, Mongolian University of Science and Technology, Ulaanbaatar, 661 Mongolia
- Badarch Jargalsaikhan
- Department of Obstetrics and Gynecology, Mongolian National University of Medical Sciences, Ulaanbaatar, 14210 Mongolia
- Uyasakh Jargalsaikhan
- Ajnai Sharga Horse Racing Team, Encanto Town 210-11, Ikh Mongol State Street, 26th Khoroo, Bayanzurkh district Ulaanbaatar, 13312 Mongolia
- Naveen K. Kadri
- Animal Genomics, Institute of Agricultural Sciences, ETH Zürich, Universitätstrasse 2, 8092 Zürich, Switzerland
- David E. MacHugh
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Dublin D04 V1W8 Ireland; UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin D04 V1W8 Ireland
- Hubert Pausch
- Animal Genomics, Institute of Agricultural Sciences, ETH Zürich, Universitätstrasse 2, 8092 Zürich, Switzerland
- Carol Readhead
- Biology and Bioengineering, California Institute of Technology, Pasadena, CA 91125 USA
- David Warburton
- The Saban Research Institute, Children’s Hospital Los Angeles, Keck School of Medicine, University of Southern California, Los Angeles, CA 90027 USA
- Manglai Dugarjaviin
- Inner Mongolia Key Laboratory of Equine Genetics, Breeding and Reproduction, College of Animal Science, Equine Research Center, Inner Mongolia Agricultural University, Hohhot, 010018 China
- Emmeline W. Hill
- Plusvital Ltd, The Highline, Dun Laoghaire Business Park, Dublin, A96 W5T3 Ireland; UCD School of Agriculture and Food Science, University College Dublin, Belfield, Dublin D04 V1W8 Ireland
4
Rastogi R, Stenson PD, Cooper DN, Bejerano G. X-CAP improves pathogenicity prediction of stopgain variants. Genome Med 2022; 14:81. [PMID: 35906703 PMCID: PMC9338606 DOI: 10.1186/s13073-022-01078-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Accepted: 06/23/2022] [Indexed: 12/02/2022] Open
Abstract
Stopgain substitutions are the third-largest class of monogenic human disease mutations and often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, exhibit poor performance at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel training methodology and unique feature set to improve the AUROC by 18% and decrease the false-positive rate 4-fold on large variant databases. In patient exomes, X-CAP prioritizes causal stopgains better than existing methods do, further illustrating its clinical utility. X-CAP is available at https://github.com/bejerano-lab/X-CAP.
Affiliation(s)
- Ruchir Rastogi
- Department of Computer Science, Stanford University, Stanford, USA
- Peter D. Stenson
- Institute of Medical Genetics, Cardiff University, Cardiff, UK
- David N. Cooper
- Institute of Medical Genetics, Cardiff University, Cardiff, UK
- Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, USA; Department of Developmental Biology, Stanford University, Stanford, USA; Department of Pediatrics, Stanford University, Stanford, USA; Department of Biomedical Data Science, Stanford University, Stanford, USA
5
Cui R, Chen D, Li N, Cai M, Wan T, Zhang X, Zhang M, Du S, Ou H, Jiao J, Jiang N, Zhao S, Song H, Song X, Ma D, Zhang J, Li S. PARD3 gene variation as candidate cause of nonsyndromic cleft palate only. J Cell Mol Med 2022; 26:4292-4304. [PMID: 35789100 PMCID: PMC9344820 DOI: 10.1111/jcmm.17452] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/25/2022] [Revised: 06/01/2022] [Accepted: 06/06/2022] [Indexed: 12/16/2022] Open
Abstract
Nonsyndromic cleft palate only (NSCP) is a common congenital malformation worldwide. In this study, we report a three‐generation pedigree with NSCP following the autosomal‐dominant pattern. Whole‐exome sequencing and Sanger sequencing revealed that only the frameshift variant c.1012dupG [p.E338Gfs*26] in PARD3 cosegregated with the disease. In zebrafish embryos, ethmoid plate patterning defects were observed with PARD3 ortholog disruption or expression of patient‐derived N‐terminal truncating PARD3 (c.1012dupG), which implicated PARD3 in ethmoid plate morphogenesis. PARD3 plays vital roles in determining cellular polarity. Compared with the apical distribution of wild‐type PARD3, PARD3‐p.E338Gfs*26 mainly localized to the basal membrane in 3D‐cultured MCF‐10A epithelial cells. The interaction between PARD3‐p.E338Gfs*26 and endogenous PARD3 was identified by LC–MS/MS and validated by co‐IP. Immunofluorescence analysis showed that PARD3‐p.E338Gfs*26 substantially altered the localization of endogenous PARD3 to the basement membrane in 3D‐cultured MCF‐10A cells. Furthermore, seven variants, including one nonsense variant and six missense variants, were identified in the coding region of PARD3 in sporadic cases with NSCP. Subsequent analysis showed that PARD3‐p.R133*, like the insertion variant of c.1012dupG, also changed the localization of endogenous full‐length PARD3 and that its expression induced abnormal ethmoid plate morphogenesis in zebrafish. Based on these data, we reveal PARD3 gene variation as a novel candidate cause of nonsyndromic cleft palate only.
Affiliation(s)
- Renjie Cui
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, Collaborative Innovation Center of Genetics and Development, Institutes of Biomedical Sciences, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China
- Dingli Chen
- Department of Clinical Laboratory, Central Hospital of Handan, Hebei, China
- Na Li
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, Collaborative Innovation Center of Genetics and Development, Institutes of Biomedical Sciences, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Ming Cai
- Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China
- Teng Wan
- Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China
- Xueqiang Zhang
- Department of Clinical Laboratory, Central Hospital of Handan, Hebei, China; Oral and Maxillofacial Surgery, Central Hospital of Handan, Hebei, China
- Meiqin Zhang
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, Collaborative Innovation Center of Genetics and Development, Institutes of Biomedical Sciences, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Sichen Du
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, Collaborative Innovation Center of Genetics and Development, Institutes of Biomedical Sciences, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Huayuan Ou
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, Collaborative Innovation Center of Genetics and Development, Institutes of Biomedical Sciences, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Jianjun Jiao
- Oral and Maxillofacial Surgery, Central Hospital of Handan, Hebei, China
- Nan Jiang
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, Collaborative Innovation Center of Genetics and Development, Institutes of Biomedical Sciences, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Shuangxia Zhao
- Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China
- Huaidong Song
- Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China
- Xuedong Song
- Department of Clinical Laboratory, Central Hospital of Handan, Hebei, China
- Duan Ma
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, Collaborative Innovation Center of Genetics and Development, Institutes of Biomedical Sciences, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China; Children's Hospital of Fudan University, Shanghai, China
- Jin Zhang
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, Collaborative Innovation Center of Genetics and Development, Institutes of Biomedical Sciences, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Shouxia Li
- Department of Clinical Laboratory, Central Hospital of Handan, Hebei, China
6
Xu YC, Guo YL. Less Is More, Natural Loss-of-Function Mutation Is a Strategy for Adaptation. Plant Communications 2020; 1:100103. [PMID: 33367264 PMCID: PMC7743898 DOI: 10.1016/j.xplc.2020.100103] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/05/2020] [Revised: 07/08/2020] [Accepted: 08/12/2020] [Indexed: 05/12/2023]
Abstract
Gene gain and loss are crucial factors that shape the evolutionary success of diverse organisms. In the past two decades, more attention has been paid to the significance of gene gain through gene duplication or de novo genes. However, gene loss through natural loss-of-function (LoF) mutations, which is prevalent in the genomes of diverse organisms, has been largely ignored. With the development of sequencing techniques, many genomes have been sequenced across diverse species and can be used to study the evolutionary patterns of gene loss. In this review, we summarize recent advances in research on various aspects of LoF mutations, including their identification, evolutionary dynamics in natural populations, and functional effects. In particular, we discuss how LoF mutations can provide insights into the minimum gene set (or the essential gene set) of an organism. Furthermore, we emphasize their potential impact on adaptation. At the genome level, although most LoF mutations are neutral or deleterious, at least some of them are under positive selection and may contribute to biodiversity and adaptation. Overall, we highlight the importance of natural LoF mutations as a robust framework for understanding biological questions in general.
Affiliation(s)
- Yong-Chao Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Ya-Long Guo
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
7
Rausell A, Luo Y, Lopez M, Seeleuthner Y, Rapaport F, Favier A, Stenson PD, Cooper DN, Patin E, Casanova JL, Quintana-Murci L, Abel L. Common homozygosity for predicted loss-of-function variants reveals both redundant and advantageous effects of dispensable human genes. Proc Natl Acad Sci U S A 2020; 117:13626-13636. [PMID: 32487729 PMCID: PMC7306792 DOI: 10.1073/pnas.1917993117] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/15/2022] Open
Abstract
Humans homozygous or hemizygous for variants predicted to cause a loss of function (LoF) of the corresponding protein do not necessarily present with overt clinical phenotypes. We report here 190 autosomal genes with 207 predicted LoF variants, for which the frequency of homozygous individuals exceeds 1% in at least one human population from five major ancestry groups. No such genes were identified on the X and Y chromosomes. Manual curation revealed that 28 variants (15%) had been misannotated as LoF. Of the 179 remaining variants in 166 genes, only 11 alleles in 11 genes had previously been confirmed experimentally to be LoF. The set of 166 dispensable genes was enriched in olfactory receptor genes (41 genes). The 41 dispensable olfactory receptor genes displayed a relaxation of selective constraints similar to that observed for other olfactory receptor genes. The 125 dispensable nonolfactory receptor genes also displayed a relaxation of selective constraints consistent with greater redundancy. Sixty-two of these 125 genes were found to be dispensable in at least three human populations, suggesting possible evolution toward pseudogenes. Of the 179 LoF variants, 68 could be tested for two neutrality statistics, and 8 displayed robust signals of positive selection. These latter variants included a known FUT2 variant that confers resistance to intestinal viruses, and an APOL3 variant involved in resistance to parasitic infections. Overall, the identification of 166 genes for which a sizeable proportion of humans are homozygous for predicted LoF alleles reveals both redundancies and advantages of such deficiencies for human survival.
Affiliation(s)
- Antonio Rausell
- Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- University of Paris, Imagine Institute, 75015 Paris, France
- Yufei Luo
- Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- University of Paris, Imagine Institute, 75015 Paris, France
- Marie Lopez
- Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France
- Yoann Seeleuthner
- University of Paris, Imagine Institute, 75015 Paris, France
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- Franck Rapaport
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Antoine Favier
- Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- University of Paris, Imagine Institute, 75015 Paris, France
- Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, CF14 4XN Cardiff, United Kingdom
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, CF14 4XN Cardiff, United Kingdom
- Etienne Patin
- Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France
- Jean-Laurent Casanova
- University of Paris, Imagine Institute, 75015 Paris, France
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Howard Hughes Medical Institute, New York, NY 10065
- Pediatric Hematology and Immunology Unit, Necker Hospital for Sick Children, 75015 Paris, France
- Lluis Quintana-Murci
- Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France
- Human Genomics and Evolution, Collège de France, Paris 75005, France
- Laurent Abel
- University of Paris, Imagine Institute, 75015 Paris, France
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
8
Mármol-Sánchez E, Luigi-Sierra MG, Quintanilla R, Amills M. Detection of homozygous genotypes for a putatively lethal recessive mutation in the porcine argininosuccinate synthase 1 (ASS1) gene. Anim Genet 2019; 51:106-110. [PMID: 31729055 DOI: 10.1111/age.12877] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Accepted: 10/17/2019] [Indexed: 12/18/2022]
Abstract
The sequencing of the pig genome revealed the existence of homozygous individuals for a nonsense mutation in the argininosuccinate synthase 1 (ASS1) gene (rs81212146, c.944T>A, L315X). Paradoxically, an AA homozygous genotype for this polymorphism is expected to abolish the function of the ASS1 enzyme that participates in the urea cycle, leading to citrullinemia, hyperammonemia, coma and death. Sequencing of five Duroc boars that sired a population of 350 Duroc barrows revealed the segregation of the c.944T>A polymorphism, so we aimed to investigate its phenotypic consequences. Genotyping of this mutation in the 350 Duroc barrows revealed the existence of seven individuals homozygous (AA) for the nonsense mutation. These AA pigs had a normal weight despite the fact that mild citrullinemia often involves impaired growth. Sequencing of the region surrounding the mutation in TT, TA and AA individuals revealed that the A substitution in the second position of the codon (c.944T>A) is in complete linkage disequilibrium with a C replacement (c.943T>C) in the first position of the codon. This second mutation would compensate for the potentially damaging effect of the c.944T>A replacement. In fact, this is the most probable reason why pigs with homozygous AA genotypes at the 944 site of the ASS1 coding region are alive. Our results illustrate the complexities of predicting the consequences of nonsense mutations on gene function and phenotypes, not only because of annotation issues but also owing to the existence of genetic mechanisms that sometimes limit the penetrance of highly harmful mutations.
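The compensation described in this abstract can be checked against the standard genetic code. The sketch below is illustrative and not taken from the paper. It relies on two facts: c.943 and c.944 are the first and second bases of codon 315, and a leucine codon with T at both positions must be TTA or TTG. The abstract does not state the third base, so both candidates are tried. All function and variable names are hypothetical.

```python
# Only the standard-genetic-code entries needed for this example.
CODON_TABLE = {
    "TTA": "Leu", "TTG": "Leu",   # candidate reference codons for L315
    "TAA": "Stop", "TAG": "Stop",  # result of c.944T>A alone
    "CAA": "Gln", "CAG": "Gln",   # result of c.943T>C together with c.944T>A
}

# Substitutions as (0-based position within codon 315, reference base, alternate base).
C943_T_TO_C = (0, "T", "C")  # c.943 is the first base of codon 315
C944_T_TO_A = (1, "T", "A")  # c.944 is the second base of codon 315


def apply_substitutions(codon, substitutions):
    """Return the codon after applying each (position, ref, alt) substitution."""
    bases = list(codon)
    for position, ref, alt in substitutions:
        if bases[position] != ref:
            raise ValueError(f"expected {ref} at position {position} of {codon}")
        bases[position] = alt
    return "".join(bases)


def summarize(reference_codon):
    """Translate the reference codon, the single mutant, and the double mutant."""
    nonsense_only = apply_substitutions(reference_codon, [C944_T_TO_A])
    both = apply_substitutions(reference_codon, [C943_T_TO_C, C944_T_TO_A])
    return {
        "reference": CODON_TABLE[reference_codon],
        "c.944T>A alone": CODON_TABLE[nonsense_only],
        "c.943T>C + c.944T>A": CODON_TABLE[both],
    }


if __name__ == "__main__":
    # Third base is an assumption: try both leucine codons that start with TT.
    for reference in ("TTA", "TTG"):
        print(reference, summarize(reference))
```

For either candidate reference codon, c.944T>A alone yields a stop codon, while the two linked substitutions together yield a glutamine codon, which is consistent with the abstract's statement that the second mutation compensates for the nonsense change.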
Affiliation(s)
- E Mármol-Sánchez
- Department of Animal Genetics, Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, Bellaterra, 08193, Spain
- M G Luigi-Sierra
- Department of Animal Genetics, Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, Bellaterra, 08193, Spain
- R Quintanilla
- Animal Breeding and Genetics Programme, Institute for Research and Technology in Food and Agriculture (IRTA), Caldes de Montbui, 08140, Spain
- M Amills
- Department of Animal Genetics, Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, Bellaterra, 08193, Spain; Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, Bellaterra, 08193, Spain
9
NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans. Genome Biol 2019; 20:32. [PMID: 30744685 PMCID: PMC6371618 DOI: 10.1186/s13059-019-1634-2] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 07/30/2018] [Accepted: 01/17/2019] [Indexed: 02/07/2023] Open
Abstract
State-of-the-art methods assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, yet with modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogenic non-coding variants associated with monogenic Mendelian diseases. In addition to interspecies conservation, a comprehensive set of recent and ongoing purifying selection signals in humans is explored, accounting for lineage-specific regulatory elements. Supervised learning using gradient tree boosting on such features achieves a high predictive performance and overcomes positional bias. NCBoost performs consistently across diverse learning and independent testing data sets and outperforms other existing reference methods.
10
Pingel J, Andersen JD, Christiansen SL, Børsting C, Morling N, Lorentzen J, Kirk H, Doessing S, Wong C, Nielsen JB. Sequence variants in muscle tissue-related genes may determine the severity of muscle contractures in cerebral palsy. Am J Med Genet B Neuropsychiatr Genet 2019; 180:12-24. [PMID: 30467950 DOI: 10.1002/ajmg.b.32693] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/09/2018] [Revised: 07/20/2018] [Accepted: 09/20/2018] [Indexed: 12/30/2022]
Abstract
Muscle contractures are a common complication of cerebral palsy (CP). The purpose of this study was to evaluate whether individuals with CP carry specific gene variants of important structural genes that might explain the severity of muscle contractures. Next-generation sequencing (NGS) data for 96 candidate genes associated with muscle structure and metabolism were analyzed in 43 individuals with CP (Gross Motor Function Classification System [GMFCS] I, n=10; GMFCS II, n=14; GMFCS III, n=19) and four control participants. In silico analysis of the identified variants was performed. The variants were classified into four categories ranging from likely benign (VUS0) to highly likely functional effect (VUS3). All individuals with CP were classified and grouped according to their GMFCS level, and statistical comparisons were made between GMFCS groups. Kruskal-Wallis tests showed significantly more VUS2 variants in the genes COL4 (GMFCS I-III; 1, 1, 5, respectively [p < .04]), COL5 (GMFCS I-III; 1, 1, 5 [p < .04]), COL6 (GMFCS I-III; 0, 4, 7 [p < .003]), and COL9 (GMFCS I-III; 1, 1, 5 [p < .04]), in individuals with CP within GMFCS Level III when compared to the other GMFCS levels. Furthermore, significantly more VUS3 variants in COL6 (GMFCS I-III; 0, 5, 2 [p < .01]) and COL7 (GMFCS I-III; 0, 3, 0 [p < .04]) were identified in the GMFCS II level when compared to the other GMFCS levels. The present results highlight several candidate gene variants in different collagen types with likely functional effects in individuals with CP.
Affiliation(s)
- Jessica Pingel
- Department of Neuroscience and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Jeppe Dyrberg Andersen
- Department of Forensic Medicine, Section of Forensic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Sofie Lindgren Christiansen
- Department of Forensic Medicine, Section of Forensic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Claus Børsting
- Department of Forensic Medicine, Section of Forensic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Niels Morling
- Department of Forensic Medicine, Section of Forensic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Jakob Lorentzen
- Department of Neuroscience and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Helene Elsass Center, Charlottenlund, Denmark
| | - Henrik Kirk
- Department of Neuroscience and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Helene Elsass Center, Charlottenlund, Denmark
| | - Simon Doessing
- Department of Orthopedic Surgery, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
| | - Christian Wong
- Department of Orthopedic Surgery, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
| | - Jens Bo Nielsen
- Department of Neuroscience and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Helene Elsass Center, Charlottenlund, Denmark
| |
|
11
|
Zhou Y, Fujikura K, Mkrtchian S, Lauschke VM. Computational Methods for the Pharmacogenetic Interpretation of Next Generation Sequencing Data. Front Pharmacol 2018; 9:1437. [PMID: 30564131 PMCID: PMC6288784 DOI: 10.3389/fphar.2018.01437] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 08/03/2018] [Accepted: 11/20/2018] [Indexed: 12/21/2022] Open
Abstract
Up to half of all patients do not respond to pharmacological treatment as intended. A substantial fraction of these inter-individual differences is due to heritable factors, and a growing number of associations between genetic variations and drug response phenotypes have been identified. Importantly, the rapid progress in Next Generation Sequencing technologies in recent years unveiled the true complexity of the genetic landscape in pharmacogenes, with tens of thousands of rare genetic variants. As each individual was found to harbor numerous such rare variants, they are anticipated to be important contributors to the genetically encoded inter-individual variability in drug effects. The fundamental challenge, however, is their functional interpretation, because the sheer scale of the problem currently renders systematic experimental characterization of these variants unfeasible. Here, we review concepts and important progress in the development of computational prediction methods for evaluating the effect of amino acid sequence alterations in drug metabolizing enzymes and transporters. In addition, we discuss recent advances in the interpretation of functional effects of non-coding variants, such as variations in splice sites, regulatory regions and miRNA binding sites. We anticipate that these methodologies will provide a useful toolkit to facilitate the integration of the vast extent of rare genetic variability into drug response predictions in a precision medicine framework.
Affiliation(s)
- Yitian Zhou
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| | - Kohei Fujikura
- Department of Diagnostic Pathology, Kobe University Graduate School of Medicine, Kobe, Japan
| | - Souren Mkrtchian
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| | - Volker M. Lauschke
- Section of Pharmacogenetics, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| |
|
12
|
Ryu CS, Sakong JH, Ahn EH, Kim JO, Ko D, Kim JH, Lee WS, Kim NK. Association study of the three functional polymorphisms (TAS2R46G>A, OR4C16G>A, and OR4X1A>T) with recurrent pregnancy loss. Genes Genomics 2018; 41:61-70. [PMID: 30203366 DOI: 10.1007/s13258-018-0738-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/16/2018] [Accepted: 08/31/2018] [Indexed: 01/07/2023]
Abstract
This study aimed to investigate whether stop-gain genetic polymorphisms associated with fetal or placental development play a role in the development of idiopathic recurrent pregnancy loss (RPL) in Korean women. Three stop-gain polymorphisms were selected using next-generation sequencing screening, which allows for the rigorous examination and discovery of previously uncharacterized stop-gain genes and stop-gain expression profiles. Accordingly, we investigated the association of stop-gain polymorphisms in Korean women with RPL. Three functional polymorphisms in the TAS2R46G>A (rs2708381), OR4C16G>A (rs1459101), and OR4X1A>T (rs10838851) genes were genotyped using polymerase chain reaction (PCR)-restriction fragment length polymorphism assays and real-time PCR analysis. We determined that the OR4C16G>A polymorphism was associated with idiopathic RPL in Korean women (adjusted odds ratio [AOR] 1.782; 95% confidence interval [CI] 1.004-3.163; P = 0.048, and AOR 1.766; 95% CI 1.020-3.059; P = 0.042). In addition, the prevalence of RPL was increased in women with the OR4C16GA + AA genotype and blood coagulation measures of prothrombin time (PT) > 10.4 s (AOR 8.292; 95% CI 2.744-25.054). We suggest that the OR4C16G>A polymorphism might serve as a clinically useful biomarker for the development, prevention, and prognosis of RPL.
Affiliation(s)
- Chang Soo Ryu
- Department of Biomedical Science, College of Life Science, CHA University, Seongnam, 13488, South Korea
| | - Jung Hyun Sakong
- Department of Biomedical Science, College of Life Science, CHA University, Seongnam, 13488, South Korea
| | - Eun Hee Ahn
- Department of Obstetrics and Gynecology, CHA Bundang Medical Center, CHA University, Seongnam, 13496, South Korea
| | - Jung Oh Kim
- Department of Biomedical Science, College of Life Science, CHA University, Seongnam, 13488, South Korea
| | - Daeun Ko
- Department of Anesthesiology and Pain Medicine, CHA Bundang Medical Center, CHA University, Seongnam, 13496, South Korea
| | - Ji Hyang Kim
- Department of Obstetrics and Gynecology, CHA Bundang Medical Center, CHA University, Seongnam, 13496, South Korea
| | - Woo Sik Lee
- Fertility Center of CHA Gangnam Medical Center, CHA University, Gangnam, 06135, South Korea
| | - Nam Keun Kim
- Department of Biomedical Science, College of Life Science, CHA University, Seongnam, 13488, South Korea.
| |
|
13
|
Wang Z, Ng KS, Chen T, Kim TB, Wang F, Shaw K, Scott KL, Meric-Bernstam F, Mills GB, Chen K. Cancer driver mutation prediction through Bayesian integration of multi-omic data. PLoS One 2018; 13:e0196939. [PMID: 29738578 PMCID: PMC5940219 DOI: 10.1371/journal.pone.0196939] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/14/2017] [Accepted: 04/23/2018] [Indexed: 01/23/2023] Open
Abstract
Identification of cancer driver mutations is critical for advancing cancer research and personalized medicine. Due to inter-tumor genetic heterogeneity, many driver mutations occur at low frequencies, which makes it challenging to distinguish them from passenger mutations. Here, we show that a novel Bayesian hierarchical modeling approach, named rDriver, can achieve enhanced prediction accuracy by identifying mutations that not only have high functional impact scores but also are associated with systemic variation in gene expression levels. In examining 3,080 tumor samples from 8 cancer types in The Cancer Genome Atlas, rDriver predicted 1,389 driver mutations. Compared with existing tools, rDriver identified more low frequency mutations associated with lineage specific functional properties, timing of occurrence and patient survival. Evaluation of rDriver predictions using engineered cell-line models resulted in a positive predictive value of 0.94 in PIK3CA genes. Our study highlights the importance of integrating multi-omic data in predicting cancer driver mutations and provides a statistically rigorous solution for cancer target discovery and development.
Affiliation(s)
- Zixing Wang
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Institute for Personalized Cancer Therapy, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| | - Kwok-Shing Ng
- Institute for Personalized Cancer Therapy, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| | - Tenghui Chen
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| | - Tae-Beom Kim
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| | - Fang Wang
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| | - Kenna Shaw
- Institute for Personalized Cancer Therapy, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| | - Kenneth L. Scott
- Department of Human and Molecular Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Funda Meric-Bernstam
- Institute for Personalized Cancer Therapy, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Department of Investigational Cancer Therapy, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| | - Gordon B. Mills
- Institute for Personalized Cancer Therapy, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Department of Systems Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| | - Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Institute for Personalized Cancer Therapy, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
| |
|
14
|
|
15
|
Balasubramanian S, Fu Y, Pawashe M, McGillivray P, Jin M, Liu J, Karczewski KJ, MacArthur DG, Gerstein M. Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes. Nat Commun 2017; 8:382. [PMID: 28851873 PMCID: PMC5575292 DOI: 10.1038/s41467-017-00443-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 05/20/2016] [Accepted: 06/29/2017] [Indexed: 11/09/2022] Open
Abstract
Variants predicted to result in the loss of function of human genes have attracted interest because of their clinical impact and surprising prevalence in healthy individuals. Here, we present ALoFT (annotation of loss-of-function transcripts), a method to annotate and predict the disease-causing potential of loss-of-function variants. Using data from Mendelian disease-gene discovery projects, we show that ALoFT can distinguish between loss-of-function variants that are deleterious as heterozygotes and those causing disease only in the homozygous state. Investigation of variants discovered in healthy populations suggests that each individual carries at least two heterozygous premature stop alleles that could potentially lead to disease if present as homozygotes. When applied to de novo putative loss-of-function variants in autism-affected families, ALoFT distinguishes between deleterious variants in patients and benign variants in unaffected siblings. Finally, analysis of somatic variants in >6500 cancer exomes shows that putative loss-of-function variants predicted to be deleterious by ALoFT are enriched in known driver genes.
Variants causing loss of function (LoF) of human genes have clinical implications. Here, the authors present a method to predict disease-causing potential of LoF variants, ALoFT (annotation of Loss-of-Function Transcripts) and show its application to interpreting LoF variants in different contexts.
Affiliation(s)
- Suganthi Balasubramanian
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA.
- Regeneron Genetics Center, Tarrytown, NY, 10591, USA.
| | - Yao Fu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Bina Technologies, Part of Roche Sequencing, Belmont, CA, 94002, USA
| | - Mayur Pawashe
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
| | - Patrick McGillivray
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
| | - Mike Jin
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
| | - Jeremy Liu
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
| | - Konrad J Karczewski
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02142, USA
| | - Daniel G MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02142, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA.
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA.
| |
|
16
|
Pagel KA, Pejaver V, Lin GN, Nam HJ, Mort M, Cooper DN, Sebat J, Iakoucheva LM, Mooney SD, Radivojac P. When loss-of-function is loss of function: assessing mutational signatures and impact of loss-of-function genetic variants. Bioinformatics 2017; 33:i389-i398. [PMID: 28882004 PMCID: PMC5870554 DOI: 10.1093/bioinformatics/btx272] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION Loss-of-function genetic variants are frequently associated with severe clinical phenotypes, yet many are present in the genomes of healthy individuals. The available methods to assess the impact of these variants rely primarily upon evolutionary conservation with little to no consideration of the structural and functional implications for the protein. They further do not provide information to the user regarding specific molecular alterations potentially causative of disease. RESULTS To address this, we investigate protein features underlying loss-of-function genetic variation and develop a machine learning method, MutPred-LOF, for the discrimination of pathogenic and tolerated variants that can also generate hypotheses on specific molecular events disrupted by the variant. We investigate a large set of human variants derived from the Human Gene Mutation Database, ClinVar and the Exome Aggregation Consortium. Our prediction method shows an area under the Receiver Operating Characteristic curve of 0.85 for all loss-of-function variants and 0.75 for proteins in which both pathogenic and neutral variants have been observed. We applied MutPred-LOF to a set of 1142 de novo variants from neurodevelopmental disorders and find enrichment of pathogenic variants in affected individuals. Overall, our results highlight the potential of computational tools to elucidate causal mechanisms underlying loss of protein function in loss-of-function variants. AVAILABILITY AND IMPLEMENTATION http://mutpred.mutdb.org. CONTACT predrag@indiana.edu.
Affiliation(s)
- Kymberleigh A Pagel
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA
| | - Vikas Pejaver
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA
| | - Guan Ning Lin
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - Hyun-Jun Nam
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - Matthew Mort
- Institute of Medical Genetics, Cardiff University, Cardiff, UK
| | - David N Cooper
- Institute of Medical Genetics, Cardiff University, Cardiff, UK
| | - Jonathan Sebat
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Beyster Center for Psychiatric Genomics, Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - Lilia M Iakoucheva
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Predrag Radivojac
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA
| |
|
17
|
Fischer A, Rausell A. Primary immunodeficiencies suggest redundancy within the human immune system. Sci Immunol 2016; 1:1/6/eaah5861. [PMID: 28783693 DOI: 10.1126/sciimmunol.aah5861] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 07/18/2016] [Revised: 10/03/2016] [Accepted: 12/01/2016] [Indexed: 12/31/2022]
Abstract
Pathogen-driven evolution has shaped the complexity of the human immune system. Our genome contains at least 1854 gene products involved in immune responses. However, the redundancy and robustness of the immune system need further characterization. One way to examine this redundancy is through the study of monogenic primary immunodeficiencies (PIDs) associated with infections. Causal mutations affecting innate immunity genes are, in relative terms, close to seven times less frequent than those affecting adaptive immunity genes in PIDs. Loss-of-function mutations of innate immunity genes encoding pattern-recognition receptors (PRRs) and associated pathways rarely cause susceptibility to infections, which suggests that PRR pathways are partially redundant in the immune responses to infection. This dispensability has also been observed for constitutive products of the immune system, such as secretory immunoglobulin A, and for innate immune cells, such as natural killer and innate lymphoid cell subsets, which are not essential for viability. This Review discusses these findings in the context of their implications for the identification of previously unknown classes of PIDs and assessment of the susceptibility to infection associated with various targeted immunotherapies.
Affiliation(s)
- Alain Fischer
- Paris Descartes-Sorbonne Paris Cité University, Imagine Institute, Paris, France.,Immunology and Pediatric Hematology Department, Assistance Publique-Hôpitaux de Paris, Paris, France.,INSERM UMR 1163, Paris, France.,Collège de France, Paris, France
| | - Antonio Rausell
- Paris Descartes-Sorbonne Paris Cité University, Imagine Institute, Paris, France
| |
|
18
|
eMERGE Phenome-Wide Association Study (PheWAS) identifies clinical associations and pleiotropy for stop-gain variants. BMC Med Genomics 2016; 9 Suppl 1:32. [PMID: 27535653 PMCID: PMC4989894 DOI: 10.1186/s12920-016-0191-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND We explored premature stop-gain variants to test the hypothesis that variants, which are likely to have a consequence on protein structure and function, will reveal important insights with respect to the phenotypes associated with them. We performed a phenome-wide association study (PheWAS) exploring the association between a selected list of functional stop-gain genetic variants (variation resulting in truncated proteins or in nonsense-mediated decay) and an extensive group of diagnoses to identify novel associations and uncover potential pleiotropy. RESULTS In this study, we selected 25 stop-gain variants: 5 stop-gain variants with previously reported phenotypic associations, and a set of 20 putative stop-gain variants identified using dbSNP. For the PheWAS, we used data from the electronic MEdical Records and GEnomics (eMERGE) Network across 9 sites with a total of 41,057 unrelated patients. We divided all these samples into two datasets by equal proportion of eMERGE site, sex, race, and genotyping platform. We calculated single effect associations between these 25 stop-gain variants and ICD-9 defined case-control diagnoses. We also performed stratified analyses for samples of European and African ancestry. Associations were adjusted for sex, site, genotyping platform and the first three principal components to account for global ancestry. We identified previously known associations, such as variants in LPL associated with hyperglyceridemia, indicating that our approach was robust. We also found a total of three significant associations with p < 0.01 in both datasets, with the most significant replicating result being LPL SNP rs328 and ICD-9 code 272.1 "Disorder of Lipoid metabolism" (p_discovery = 2.59 × 10^-6, p_replicating = 2.7 × 10^-4). The other two significant replicated associations identified by this study are: variant rs1137617 in KCNH2 gene associated with ICD-9 code category 244 "Acquired Hypothyroidism" (p_discovery = 5.31 × 10^-3, p_replicating = 1.15 × 10^-3) and variant rs12060879 in DPT gene associated with ICD-9 code category 996 "Complications peculiar to certain specified procedures" (p_discovery = 8.65 × 10^-3, p_replicating = 4.16 × 10^-3). CONCLUSION In conclusion, this PheWAS revealed novel associations of stop-gained variants with interesting phenotypes (ICD-9 codes) along with pleiotropic effects.
|
19
|
Martinez-Picado J, McLaren PJ, Erkizia I, Martin MP, Benet S, Rotger M, Dalmau J, Ouchi D, Wolinsky SM, Penugonda S, Günthard HF, Fellay J, Carrington M, Izquierdo-Useros N, Telenti A. Identification of Siglec-1 null individuals infected with HIV-1. Nat Commun 2016; 7:12412. [PMID: 27510803 PMCID: PMC4987525 DOI: 10.1038/ncomms12412] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 02/24/2016] [Accepted: 06/30/2016] [Indexed: 12/16/2022] Open
Abstract
Siglec-1/CD169 is a myeloid-cell surface receptor critical for HIV-1 capture and infection of bystander target cells. To dissect the role of SIGLEC1 in natura, we scan a large population genetic database and identify a loss-of-function variant (Glu88Ter) that is found in ∼1% of healthy people. Exome analysis and direct genotyping of 4,233 HIV-1-infected individuals reveals two Glu88Ter homozygous and 97 heterozygous subjects, allowing the analysis of ex vivo and in vivo consequences of SIGLEC1 loss-of-function. Cells from these individuals are functionally null or haploinsufficient for Siglec-1 activity in HIV-1 capture and trans-infection ex vivo. However, Siglec-1 protein truncation does not have a measurable impact on HIV-1 acquisition or AIDS outcomes in vivo. This result contrasts with the known in vitro functional role of Siglec-1 in HIV-1 trans-infection. Thus, it provides evidence that the classical HIV-1 infectious routes may compensate for the lack of Siglec-1 in fuelling HIV-1 dissemination within infected individuals.
Affiliation(s)
- Javier Martinez-Picado
- AIDS Research Institute IrsiCaixa, Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol (IGTP), Universitat Autònoma de Barcelona, 08916 Badalona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
- University of Vic-Central University of Catalonia (UVic-UCC), 08500 Vic, Barcelona, Spain
| | - Paul J. McLaren
- National HIV and Retrovirology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada R3E 0W3
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada R3E 0J9
| | - Itziar Erkizia
- AIDS Research Institute IrsiCaixa, Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol (IGTP), Universitat Autònoma de Barcelona, 08916 Badalona, Spain
| | - Maureen P. Martin
- Cancer and Inflammation Program, Laboratory of Experimental Immunology, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, USA
| | - Susana Benet
- AIDS Research Institute IrsiCaixa, Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol (IGTP), Universitat Autònoma de Barcelona, 08916 Badalona, Spain
| | - Margalida Rotger
- Institute of Microbiology, University Hospital Center and University of Lausanne, 1011 Lausanne, Switzerland
| | - Judith Dalmau
- AIDS Research Institute IrsiCaixa, Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol (IGTP), Universitat Autònoma de Barcelona, 08916 Badalona, Spain
| | - Dan Ouchi
- AIDS Research Institute IrsiCaixa, Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol (IGTP), Universitat Autònoma de Barcelona, 08916 Badalona, Spain
| | - Steven M. Wolinsky
- Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA
| | - Sudhir Penugonda
- Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA
| | - Huldrych F. Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
| | - Jacques Fellay
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Mary Carrington
- Cancer and Inflammation Program, Laboratory of Experimental Immunology, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, USA
- Ragon Institute for MGH, MIT and Harvard, Cambridge, Massachusetts 02139, USA
| | - Nuria Izquierdo-Useros
- AIDS Research Institute IrsiCaixa, Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol (IGTP), Universitat Autònoma de Barcelona, 08916 Badalona, Spain
| | - Amalio Telenti
- Genomic Medicine, J. Craig Venter Institute, La Jolla, California 92037, USA
| |
|
20
|
Whole-exome sequencing to analyze population structure, parental inbreeding, and familial linkage. Proc Natl Acad Sci U S A 2016; 113:6713-8. [PMID: 27247391 DOI: 10.1073/pnas.1606460113] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 12/21/2022] Open
Abstract
Principal component analysis (PCA), homozygosity rate estimations, and linkage studies in humans are classically conducted through genome-wide single-nucleotide variant arrays (GWSA). We compared whole-exome sequencing (WES) and GWSA for this purpose. We analyzed 110 subjects originating from different regions of the world, including North Africa and the Middle East, which are poorly covered by public databases and have high consanguinity rates. We tested and applied a number of quality control (QC) filters. Compared with GWSA, we found that WES provided an accurate prediction of population substructure using variants with a minor allele frequency > 2% (correlation = 0.89 with the PCA coordinates obtained by GWSA). WES also yielded highly reliable estimates of homozygosity rates using runs of homozygosity with a 1,000-kb window (correlation = 0.94 with the estimates provided by GWSA). Finally, homozygosity mapping analyses in 15 families including a single offspring with high homozygosity rates showed that WES provided 51% less genome-wide linkage information than GWSA overall but 97% more information for the coding regions. At the genome-wide scale, 76.3% of linked regions were found by both GWSA and WES, 17.7% were found by GWSA only, and 6.0% were found by WES only. For coding regions, the corresponding percentages were 83.5%, 7.4%, and 9.1%, respectively. With appropriate QC filters, WES can be used for PCA and adjustment for population substructure, estimating homozygosity rates in individuals, and powerful linkage analyses, particularly in coding regions.
|
21
|
Schrimpf R, Gottschalk M, Metzger J, Martinsson G, Sieme H, Distl O. Screening of whole genome sequences identified high-impact variants for stallion fertility. BMC Genomics 2016; 17:288. [PMID: 27079378 PMCID: PMC4832559 DOI: 10.1186/s12864-016-2608-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 06/18/2015] [Accepted: 03/30/2016] [Indexed: 02/07/2023] Open
Abstract
Background: Stallion fertility is an economically important trait due to the increase of artificial insemination in horses. The availability of whole genome sequence data facilitates identification of rare high-impact variants contributing to stallion fertility. The aim of our study was to genotype rare high-impact variants retrieved from next-generation sequencing (NGS) data of 11 horses in order to unravel harmful genetic variants in large samples of stallions. Methods: Gene ontology (GO) terms and search results from public databases were used to obtain a comprehensive list of human and mouse genes predicted to participate in the regulation of male reproduction. The corresponding equine orthologous genes were searched in whole genome sequence data of seven stallions and four mares and filtered for high-impact genetic variants using SnpEff, SIFT and PolyPhen-2 software. All genetic variants with the missing homozygous mutant genotype were genotyped on 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. Mixed linear model analysis was employed for an association analysis with de-regressed estimated breeding values of the paternal component of the pregnancy rate per estrus (EBV-PAT). Results: We screened next-generation sequencing data of whole genomes from 11 horses for equine genetic variants in 1194 human and mouse genes involved in male fertility and linked through common gene ontology (GO) with male reproductive processes. Variants were filtered for high impact on protein structure and validated through SIFT and PolyPhen-2. Genetic variants were followed up only when the homozygous mutant genotype was missing in the detection sample of 11 horses. After this filtering process, 17 single nucleotide polymorphisms (SNPs) were left. These SNPs were genotyped in 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP.
An association analysis in 216 Hanoverian stallions revealed a significant association of the splice-site disruption variant g.37455302G>A in NOTCH1 with the de-regressed estimated breeding values of the paternal component of the pregnancy rate per estrus (EBV-PAT). For 9 high-impact variants within the genes CFTR, OVGP1, FBXO43, TSSK6, PKD1, FOXP1, TCP11, SPATA31E1 and NOTCH1 (g.37453246G>C), the homozygous mutant genotype was absent from the validation sample of all 337 fertile stallions. Therefore, these variants were considered potentially deleterious for stallion fertility. Conclusions: This study revealed 17 genetic variants with a predicted highly damaging effect on protein structure and a missing homozygous mutant genotype. The g.37455302G>A NOTCH1 variant was identified as a significant stallion fertility locus in Hanoverian stallions, and a further 9 candidate fertility loci with missing homozygous mutant genotypes were validated in a panel including 19 horse breeds. To our knowledge, this is the first study in horses using next-generation sequencing data to uncover strong candidate factors for stallion fertility.
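The filtering step in this abstract (keep high-impact variants for which no horse in the detection sample is homozygous mutant) can be sketched as below. The `Variant` record, the `"HIGH"` impact label and the VCF-style genotype strings are illustrative assumptions, not the authors' data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    name: str             # hypothetical identifier, e.g. a genomic position
    impact: str           # predicted impact label, assumed "HIGH" for high-impact calls
    genotypes: List[str]  # one of "0/0", "0/1", "1/1" per animal


def missing_homozygous_mutant(variants: List[Variant]) -> List[Variant]:
    """Keep high-impact variants seen at least once but never as "1/1"."""
    kept = []
    for v in variants:
        seen = any(g in ("0/1", "1/1") for g in v.genotypes)
        hom_mutant = "1/1" in v.genotypes
        if v.impact == "HIGH" and seen and not hom_mutant:
            kept.append(v)
    return kept
```

Variants passing such a filter would be the ones carried forward to genotyping in the larger validation panel.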
Affiliation(s)
- Rahel Schrimpf
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
- Maren Gottschalk
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
- Julia Metzger
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
- Gunilla Martinsson
  - State Stud Celle of Lower Saxony, Spörckenstraße 10, 29221, Celle, Germany
- Harald Sieme
  - Clinic for Horses, Unit for Reproduction Medicine, University of Veterinary Medicine Hannover, Bünteweg 15, 30559, Hannover, Germany
- Ottmar Distl
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
|
22
|
Chen F, Zhu Z, Zhou X, Yan Y, Dong Z, Cui D. High-Throughput Sequencing Reveals Single Nucleotide Variants in Longer-Kernel Bread Wheat. Front Plant Sci 2016; 7:1193. [PMID: 27551288 PMCID: PMC4976665 DOI: 10.3389/fpls.2016.01193] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/24/2016] [Accepted: 07/25/2016] [Indexed: 05/09/2023]
Abstract
The transcriptomes of bread wheat Yunong 201 and its ethyl methanesulfonate (EMS)-derived mutant Yunong 3114 were obtained by next-generation sequencing. Single nucleotide variants (SNVs) in the two wheat lines were explored and compared. A total of 5907 and 6287 non-synonymous SNVs were identified in Yunong 201 and Yunong 3114, respectively, affecting a total of 4021 genes. The genes carrying non-synonymous SNVs were significantly involved in ATP binding, protein phosphorylation, and cellular protein metabolic process. Heat map analysis also indicated that most of these mutant genes were significantly differentially expressed at different developmental stages. The SNVs in these genes possibly contribute to the longer kernel length of Yunong 3114. Our data provide useful information on the wheat transcriptome for future studies of wheat functional genomics. This study could also help to elucidate the functions of the genes carrying non-synonymous SNVs in Yunong 201 and Yunong 3114.
|
23
|
Bartha I, Rausell A, McLaren PJ, Mohammadi P, Tardaguila M, Chaturvedi N, Fellay J, Telenti A. The Characteristics of Heterozygous Protein Truncating Variants in the Human Genome. PLoS Comput Biol 2015; 11:e1004647. [PMID: 26642228 PMCID: PMC4671652 DOI: 10.1371/journal.pcbi.1004647] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 07/08/2015] [Accepted: 11/06/2015] [Indexed: 11/18/2022] Open
Abstract
Sequencing projects have identified large numbers of rare stop-gain and frameshift variants in the human genome. As most of these are observed in the heterozygous state, they test a gene’s tolerance to haploinsufficiency and dominant loss of function. We analyzed the distribution of truncating variants across 16,260 autosomal protein coding genes in 11,546 individuals. We observed 39,893 truncating variants affecting 12,062 genes, which significantly differed from an expectation of 12,916 genes under a model of neutral de novo mutation (p < 10⁻⁴). Extrapolating this to increasing numbers of sequenced individuals, we estimate that 10.8% of human genes do not tolerate heterozygous truncating variants. An additional 10 to 15% of truncated genes may be rescued by incomplete penetrance or compensatory mutations, or because the truncating variants are of limited functional impact. The study of protein truncating variants delineates the essential genome and, more generally, identifies rare heterozygous variants as an unexplored source of diversity of phenotypic traits and diseases.
Genome sequencing provides evidence for large numbers of putative protein truncating variants in humans. Most truncating variants are observed in only a few individuals but are collectively prevalent and widely distributed across the coding genome. Most of the truncating variants are so rare that they are observed only in the heterozygous state. The current study identifies 10% of genes where heterozygous truncations are not observed and describes their biological characteristics. In addition, for genes where rare truncations are observed, we argue that these are an unexplored source of diversity of phenotypic traits and diseases.
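The comparison between observed and expected numbers of affected genes can be illustrated with a deliberately simple null model. The sketch assumes each variant lands in a gene independently, with probability proportional to a per-gene weight (for example, a mutability estimate); this is an illustrative stand-in and not the neutral de novo mutation model used in the study. The function name and weights are hypothetical.

```python
from typing import Sequence


def expected_genes_hit(weights: Sequence[float], n_variants: int) -> float:
    """Expected number of genes receiving at least one of n_variants.

    Under independent placement, gene g is missed by every variant with
    probability (1 - p_g) ** n_variants, where p_g = weight_g / sum(weights).
    """
    total = sum(weights)
    return sum(1.0 - (1.0 - w / total) ** n_variants for w in weights)
```

An observed count of affected genes well below such an expectation would hint that some genes are depleted of truncating variants, which is the direction of the deficit the abstract reports.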
Affiliation(s)
- István Bartha
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Antonio Rausell
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Paul J. McLaren
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Pejman Mohammadi
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - Computational Biology Group, ETH Zurich, Zurich, Switzerland
- Manuel Tardaguila
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nimisha Chaturvedi
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Jacques Fellay
  - SIB Swiss Institute of Bioinformatics, Lausanne and Basel, Switzerland
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Amalio Telenti
  - J. Craig Venter Institute, La Jolla, California, United States of America
|
24
|
Gambin T, Jhangiani SN, Below JE, Campbell IM, Wiszniewski W, Muzny DM, Staples J, Morrison AC, Bainbridge MN, Penney S, McGuire AL, Gibbs RA, Lupski JR, Boerwinkle E. Secondary findings and carrier test frequencies in a large multiethnic sample. Genome Med 2015; 7:54. [PMID: 26195989 PMCID: PMC4507324 DOI: 10.1186/s13073-015-0171-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 12/19/2014] [Accepted: 05/06/2015] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Besides its growing importance in clinical diagnostics and understanding the genetic basis of Mendelian and complex diseases, whole exome sequencing (WES) is a rich source of additional information of potential clinical utility for physicians, patients and their families. We analyzed the frequency and nature of single nucleotide variants (SNVs) considered secondary findings and recessive disease allele carrier status in the exomes of 8554 individuals from a large, randomly sampled cohort study and 2514 patients from a study of presumed Mendelian disease having undergone WES. METHODS: We used the same sequencing platform and data processing pipeline to analyze all samples and characterized the distributions of reported pathogenic (ClinVar, Human Gene Mutation Database (HGMD)) and predicted deleterious variants in the pre-specified American College of Medical Genetics and Genomics (ACMG) secondary findings and recessive disease genes in different ethnic groups. RESULTS: In the 56 ACMG secondary findings genes, the average number of predicted deleterious variants per individual was 0.74, and the mean number of ClinVar reported pathogenic variants was 0.06. We observed an average of 10 deleterious and 0.78 ClinVar reported pathogenic variants per individual in 1423 autosomal recessive disease genes. By repeatedly sampling pairs of exomes, 0.5% of the randomly generated couples were at 25% risk of having an affected offspring for an autosomal recessive disorder based on the ClinVar variants. CONCLUSIONS: By investigating reported pathogenic and novel, predicted deleterious variants we estimated the lower and upper limits of the population fraction for which exome sequencing may reveal additional medically relevant information. We suggest that the observed wide range for the lower and upper limits of these frequency numbers will be gradually reduced due to improvement in classification databases and prediction algorithms.
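The couple-sampling estimate in the RESULTS can be sketched as a small Monte Carlo procedure. The sketch assumes each individual is summarized by the set of autosomal recessive genes in which they carry a reported pathogenic variant, and that a couple is at 25% risk when both partners are carriers for the same gene; the function name, input layout and fixed seed are hypothetical and do not reproduce the authors' pipeline.

```python
import random
from typing import List, Set


def at_risk_couple_fraction(carrier_genes: List[Set[str]],
                            n_pairs: int,
                            seed: int = 0) -> float:
    """Fraction of randomly drawn couples sharing carrier status for a gene.

    carrier_genes[i] is the set of recessive disease genes in which
    individual i carries a pathogenic variant (assumed input layout).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    at_risk = 0
    for _ in range(n_pairs):
        # Draw two distinct individuals to form one random couple.
        a, b = rng.sample(range(len(carrier_genes)), 2)
        if carrier_genes[a] & carrier_genes[b]:
            at_risk += 1
    return at_risk / n_pairs
```

With real carrier sets, the returned fraction plays the role of the 0.5% figure quoted in the abstract; the value obtained depends entirely on which variants are treated as pathogenic.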
Affiliation(s)
- Tomasz Gambin
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - Institute of Computer Science, Warsaw University of Technology, Warsaw, 00-665 Poland
- Shalini N. Jhangiani
  - The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Jennifer E. Below
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Ian M. Campbell
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Wojciech Wiszniewski
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Donna M. Muzny
  - The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Jeffrey Staples
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Alanna C. Morrison
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Matthew N. Bainbridge
  - The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Samantha Penney
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030 USA
  - Texas Children’s Hospital, Houston, TX 77030 USA
- Amy L. McGuire
  - The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, TX 77030 USA
- Richard A. Gibbs
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- James R. Lupski
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030 USA
  - Texas Children’s Hospital, Houston, TX 77030 USA
- Eric Boerwinkle
  - The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030 USA
|
25
|
Abascal F, Ezkurdia I, Rodriguez-Rivas J, Rodriguez JM, del Pozo A, Vázquez J, Valencia A, Tress ML. Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level. PLoS Comput Biol 2015; 11:e1004325. [PMID: 26061177 PMCID: PMC4465641 DOI: 10.1371/journal.pcbi.1004325] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 12/16/2014] [Accepted: 05/08/2015] [Indexed: 11/19/2022] Open
Abstract
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectrometry experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectrometry. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. These data suggest that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved (all the homologous exons we identified evolved over 460 million years ago), and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles.
Alternative splicing is thought to be one means for generating the protein diversity necessary for the whole range of cellular functions. While the presence of alternatively spliced transcripts in the cell has been amply demonstrated, the same cannot be said for alternatively spliced proteins. The quest for alternative protein isoforms has focused primarily on the analysis of peptides from large-scale mass spectrometry experiments, but evidence for alternative isoforms has been patchy and contradictory. A careful analysis of the peptide evidence is needed to fully understand the scale of alternative splicing detectable at the protein level. Here we analysed peptides from eight large-scale data sets, identifying just 282 splice events among 12,716 genes. This suggests that most genes have a single dominant isoform. Many of the alternative isoforms that we identified were only subtly different from the main splice variant, and one in five was generated by swapping one homologous exon for another. Remarkably, the alternative isoforms generated from homologous exons were highly conserved, first appearing 460 million years ago, and several appear to have tissue-specific roles in the brain and heart. Our results suggest that these particular isoforms are likely to have important cellular roles.
Affiliation(s)
- Federico Abascal
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Iakes Ezkurdia
  - Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Juan Rodriguez-Rivas
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Jose Manuel Rodriguez
  - National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Angela del Pozo
  - Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
- Jesús Vázquez
  - Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares (CNIC) Madrid, Spain
- Alfonso Valencia
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
  - National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Michael L. Tress
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|