Pérez-Palma E, May P, Iqbal S, Niestroj LM, Du J, Heyne HO, Castrillon JA, O'Donnell-Luria A, Nürnberg P, Palotie A, Daly M, Lal D. Identification of pathogenic variant enriched regions across genes and gene families. Genome Res 2020; 30:62-71. [PMID: 31871067 PMCID: PMC6961572 DOI: 10.1101/gr.252601.119] [Received: 05/16/2019] [Accepted: 12/19/2019] [Indexed: 12/11/2022]
Abstract
Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 76,153 missense variants from patients. With this gene-family approach, we identified 465 regions enriched for patient variants spanning 41,463 amino acids in 1252 genes. As a comparison, by testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2639 amino acids and 215 genes. Next, we selected de novo variants from 6753 patients with neurodevelopmental disorders and 1911 unaffected siblings and observed an 8.33-fold enrichment of patient variants in our identified regions (95% C.I. = 3.90-Inf, P-value = 2.72 × 10^-11). By using the complete ClinVar variant set, we found that missense variants inside the identified regions are 106-fold more likely to be classified as pathogenic in comparison to benign classification (OR = 106.15, 95% C.I. = 70.66-Inf, P-value < 2.2 × 10^-16). All pathogenic variant enriched regions (PERs) identified are available online through "PER viewer," a user-friendly online platform for interactive data mining, visualization, and download. In summary, our gene-family burden analysis approach identified novel PERs in protein sequences. This annotation can empower variant interpretation.
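The enrichment figures in the abstract are odds ratios reported with a one-sided confidence interval (upper bound Inf). As a rough illustration of that kind of statistic, and not a reconstruction of the authors' pipeline, the sketch below computes a sample odds ratio and a one-sided Fisher's exact P-value for a single 2x2 table. The choice of Fisher's exact test, the function names, and all counts are assumptions made for illustration; none of the numbers come from the paper.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P-value, P(X >= a), for [[a, b], [c, d]].

    X follows a hypergeometric distribution with the table margins fixed.
    """
    row1 = a + b          # variants inside the candidate region
    col1 = a + c          # patient variants overall
    n = a + b + c + d     # all variants
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / total


# Hypothetical counts (NOT from the paper):
# rows    = inside / outside a candidate region
# columns = patient variants / general-population variants
inside_patient, inside_population = 12, 3
outside_patient, outside_population = 20, 65

table = (inside_patient, inside_population, outside_patient, outside_population)
print("odds ratio:", odds_ratio(*table))
print("one-sided P-value:", fisher_greater(*table))
```

An odds ratio well above 1 together with a small one-sided P-value would flag the region as carrying more patient variants than expected from the population background.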
Affiliation(s)
- Eduardo Pérez-Palma
  - Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, University Luxembourg, L-4367 Esch-sur-Alzette, Luxembourg
- Sumaiya Iqbal
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Lisa-Marie Niestroj
  - Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
- Juanjiangmeng Du
  - Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
- Henrike O Heyne
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00014 Helsinki, Finland
- Anne O'Donnell-Luria
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
- Peter Nürnberg
  - Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
- Aarno Palotie
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00014 Helsinki, Finland
- Mark Daly
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00014 Helsinki, Finland
- Dennis Lal
  - Cologne Center for Genomics, University of Cologne, Cologne, 50931 NRW, Germany
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA