1
|
Magraner-Pardo L, Laskowski RA, Pons T, Thornton JM. A computational and structural analysis of germline and somatic variants affecting the DDR mechanism, and their impact on human diseases. Sci Rep 2021; 11:14268. [PMID: 34253785 PMCID: PMC8275599 DOI: 10.1038/s41598-021-93715-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2021] [Accepted: 06/22/2021] [Indexed: 12/02/2022] Open
Abstract
DNA-Damage Response (DDR) proteins are crucial for maintaining the integrity of the genome by identifying and repairing errors in DNA. Variants affecting their function can have severe consequences since failure to repair damaged DNA can result in cells turning cancerous. Here, we compare germline and somatic variants in DDR genes, specifically looking at their locations in the corresponding three-dimensional (3D) structures, Pfam domains, and protein–protein interaction interfaces. We show that somatic variants in metastatic cases are more likely to be found in Pfam domains and protein interaction interfaces than are pathogenic germline variants or variants of unknown significance (VUS). We also show that there are hotspots in the structures of ATM and BRCA2 proteins where pathogenic germline, and recurrent somatic variants from primary and metastatic tumours, cluster together in 3D. Moreover, in the ATM, BRCA1 and BRCA2 genes from prostate cancer patients, the distributions of germline benign, pathogenic, VUS, and recurrent somatic variants differ across Pfam domains. Together, these results provide a better characterisation of the most recurrent affected regions in DDRs and could help in the understanding of individual susceptibility to tumour development.
Collapse
Affiliation(s)
- Lorena Magraner-Pardo
- Prostate Cancer Clinical Unit, Spanish National Cancer Research Center (CNIO), Madrid, Spain.,European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Roman A Laskowski
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Tirso Pons
- Department of Immunology and Oncology, National Center for Biotechnology, Spanish National Research Council (CNB-CSIC), Madrid, Spain
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.
| |
Collapse
|
2
|
Evans P, Wu C, Lindy A, McKnight DA, Lebo M, Sarmady M, Abou Tayoun AN. Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing data sets. Genome Res 2019; 29:1144-1151. [PMID: 31235655 PMCID: PMC6633260 DOI: 10.1101/gr.240994.118] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2018] [Accepted: 05/24/2019] [Indexed: 01/23/2023]
Abstract
Recent advances in DNA sequencing have expanded our understanding of the molecular basis of genetic disorders and increased the utilization of clinical genomic tests. Given the paucity of evidence to accurately classify each variant and the difficulty of experimentally evaluating its clinical significance, a large number of variants generated by clinical tests are reported as variants of unknown clinical significance. Population-scale variant databases can improve clinical interpretation. Specifically, pathogenicity prediction for novel missense variants can use features describing regional variant constraint. Constrained genomic regions are those that have an unusually low variant count in the general population. Computational methods have been introduced to capture these regions and incorporate them into pathogenicity classifiers, but these methods have yet to be compared on an independent clinical variant data set. Here, we introduce one variant data set derived from clinical sequencing panels and use it to compare the ability of different genomic constraint metrics to determine missense variant pathogenicity. This data set is compiled from 17,071 patients surveyed with clinical genomic sequencing for cardiomyopathy, epilepsy, or RASopathies. We further use this data set to demonstrate the necessity of disease-specific classifiers and to train PathoPredictor, a disease-specific ensemble classifier of pathogenicity based on regional constraint and variant-level features. PathoPredictor achieves an average precision >90% for variants from all 99 tested disease genes while approaching 100% accuracy for some genes. The accumulation of larger clinical variant training data sets can significantly enhance their performance in a disease- and gene-specific manner.
Collapse
Affiliation(s)
- Perry Evans
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
| | - Chao Wu
- Division of Genomic Diagnostics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
| | | | | | - Matthew Lebo
- Laboratory for Molecular Medicine, Partners HealthCare Personalized Medicine, Cambridge, Massachusetts 02139, USA.,Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
| | - Mahdi Sarmady
- Division of Genomic Diagnostics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA.,Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
| | - Ahmad N Abou Tayoun
- Division of Genomic Diagnostics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA.,Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA.,Al Jalila Children's Specialty Hospital, Dubai, United Arab Emirates
| |
Collapse
|