1
Desai S, Ahmad S, Bawaskar B, Rashmi S, Mishra R, Lakhwani D, Dutt A. Singleton mutations in large-scale cancer genome studies: uncovering the tail of cancer genome. NAR Cancer 2024; 6:zcae010. [PMID: 38487301 PMCID: PMC10939354 DOI: 10.1093/narcan/zcae010] [Received: 10/22/2023] [Accepted: 02/23/2024] [Indexed: 03/17/2024] [Open access]
Abstract
Singleton or low-frequency driver mutations are challenging to identify. We present a domain driver mutation estimator (DOME) to identify rare candidate driver mutations. DOME analyzes positions analogous to known statistical hotspots and resistant mutations in combination with their functional and biochemical residue context as determined by protein structures and somatic mutation propensity within conserved PFAM domains, integrating the CADD scoring scheme. Benchmarked against seven other tools, DOME exhibited superior or comparable accuracy compared to all evaluated tools in the prediction of functional cancer drivers, with the exception of one tool. DOME identified a unique set of 32 917 high-confidence predicted driver mutations from the analysis of whole proteome missense variants within domain boundaries across 1331 genes, including 1192 noncancer gene census genes, emphasizing its unique place in cancer genome analysis. Additionally, analysis of 8799 TCGA (The Cancer Genome Atlas) and in-house tumor samples revealed 847 potential driver mutations, with mutations in tyrosine kinase members forming the dominant burden, underscoring its higher significance in cancer. Overall, DOME complements current approaches for identifying novel, low-frequency drivers and resistant mutations in personalized therapy.
Affiliation(s)
- Sanket Desai
- Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai 400094, Maharashtra, India
- Suhail Ahmad
- Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai 400094, Maharashtra, India
- Bhargavi Bawaskar
- Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Sonal Rashmi
- Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Rohit Mishra
- Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Deepika Lakhwani
- Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Amit Dutt
- Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai 400094, Maharashtra, India
- Department of Genetics, University of Delhi, South Campus, New Delhi 110021, India
2
Vitting-Seerup K. Most protein domains exist as variants with distinct functions across cells, tissues and diseases. NAR Genom Bioinform 2023; 5:lqad084. [PMID: 37745975 PMCID: PMC10516350 DOI: 10.1093/nargab/lqad084] [Received: 06/14/2023] [Revised: 08/09/2023] [Accepted: 09/05/2023] [Indexed: 09/26/2023] [Open access]
Abstract
Protein domains are the active subunits that provide proteins with specific functions through precise three-dimensional structures. Such domains facilitate most protein functions, including molecular interactions and signal transduction. Currently, these protein domains are described and analyzed as invariable molecular building blocks with fixed functions. Here, I show that most human protein domains exist as multiple distinct variants termed 'domain isotypes'. Domain isotypes are used in a cell, tissue and disease-specific manner and have surprisingly different 3D structures. Accordingly, domain isotypes, compared to each other, modulate or abolish the functionality of protein domains. These results challenge the current view of protein domains as invariable building blocks and have significant implications for both wet- and dry-lab workflows. The extensive use of protein domain isotypes within protein isoforms adds to the literature indicating we need to transition to an isoform-centric research paradigm.
Affiliation(s)
- Kristoffer Vitting-Seerup
- The Bioinformatics Section, Department of Health Technology, The Technical University of Denmark (DTU), Denmark
3
Corcuff M, Garibal M, Desvignes JP, Guien C, Grattepanche C, Collod-Béroud G, Ménoret E, Salgado D, Béroud C. Protein domains provide a new layer of information for classifying human variations in rare diseases. Frontiers in Bioinformatics 2023; 3:1127341. [PMID: 36896423 PMCID: PMC9990413 DOI: 10.3389/fbinf.2023.1127341] [Received: 12/19/2022] [Accepted: 02/01/2023] [Indexed: 02/23/2023] [Open access]
Abstract
Introduction: Using the ACMG-AMP guidelines for the interpretation of sequence variants, it remains difficult to meet the criterion associated with the protein domain, PM1, which is assigned in only about 10% of cases, whereas the criteria related to variant frequency, PM2/BA1/BS1, are reported in 50% of cases. To improve the classification of human missense variants using protein domain information, we developed the DOLPHIN system (https://dolphin.mmg-gbit.eu). Methods: We used Pfam alignments of eukaryotes to define DOLPHIN scores to identify protein domain residues and variants that have a significant impact. In parallel, we enriched gnomAD variant frequencies for each domain residue. These were validated using ClinVar data. Results: We applied this method to all potential variants of human transcripts, resulting in 30.0% being assigned a PM1 label, whereas 33.2% were eligible for a new benign support criterion, BP8. We also showed that DOLPHIN provides an extrapolated frequency for 31.8% of the variants, compared to the original frequency available in gnomAD for 7.6% of them. Discussion: Overall, DOLPHIN allows a simplified use of the PM1 criterion, an expanded application of the PM2/BS1 criteria and the creation of a new BP8 criterion. DOLPHIN could facilitate the classification of amino acid substitutions in protein domains that cover nearly 40% of proteins and represent the sites of most pathogenic variants.
Affiliation(s)
- Mélanie Corcuff
- Aix Marseille University, INSERM, MMG, Bioinformatics & Genetics, Marseille, France
- Marc Garibal
- Aix Marseille University, INSERM, MMG, Bioinformatics & Genetics, Marseille, France
- Céline Guien
- Aix Marseille University, INSERM, MMG, Bioinformatics & Genetics, Marseille, France
- Coralie Grattepanche
- Aix Marseille University, INSERM, MMG, Bioinformatics & Genetics, Marseille, France
- Estelle Ménoret
- Aix Marseille University, INSERM, MMG, Bioinformatics & Genetics, Marseille, France
- David Salgado
- Aix Marseille University, INSERM, MMG, Bioinformatics & Genetics, Marseille, France
- Christophe Béroud
- Aix Marseille University, INSERM, MMG, Bioinformatics & Genetics, Marseille, France; Laboratoire de Génétique Médicale, APHM Hôpital d'Enfants de la Timone, Marseille, France
4
Gauran IIM, Park J, Rattsev I, Peterson TA, Kann MG, Park D. Bayesian local false discovery rate for sparse count data with application to the discovery of hotspots in protein domains. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1551] [Indexed: 11/19/2022]
Affiliation(s)
- Iris Ivy M. Gauran
- Biostatistics Group, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology
- Junyong Park
- Department of Statistics, Seoul National University
- Ilia Rattsev
- Department of Biological Sciences, University of Maryland
- Thomas A. Peterson
- Institute for Computational Health Science, University of California, San Francisco
- DoHwan Park
- Department of Mathematics and Statistics, University of Maryland
5
Park H, Park J. Poisson mean vector estimation with nonparametric maximum likelihood estimation and application to protein domain data. Electron J Stat 2022. [DOI: 10.1214/22-ejs2029] [Indexed: 11/19/2022]
Affiliation(s)
- Hoyoung Park
- Department of Statistics, Sookmyung Women’s University, Seoul, Korea
- Junyong Park
- Department of Statistics, Sookmyung Women’s University, Seoul, Korea
6
Recurrent high-impact mutations at cognate structural positions in class A G protein-coupled receptors expressed in tumors. Proc Natl Acad Sci U S A 2021; 118:2113373118. [PMID: 34916293 PMCID: PMC8713800 DOI: 10.1073/pnas.2113373118] [Accepted: 11/01/2021] [Indexed: 12/23/2022] [Open access]
Abstract
GPCRs and GPCR pathways are increasingly being implicated in human malignancies, placing them among the most promising cancer drug candidates. Our results reveal enrichment of highly impactful, recurrent GPCR mutations within cancers. We found that cognate mutations in selected class A GPCRs have deleterious effects on signaling function. The results also suggest that olfactory receptors, often considered inconsequential, display a nonrandom mutation pattern in tumors in which they are expressed. These findings support the idea that protein paralogs can act in parallel as members of an onco-group. G protein-coupled receptors (GPCRs) are the largest family of human proteins. They have a common structure and, signaling through a much smaller set of G proteins, arrestins, and effectors, activate downstream pathways that often modulate hallmark mechanisms of cancer. Because there are many more GPCRs than effectors, mutations in different receptors could perturb signaling similarly so as to favor a tumor. We hypothesized that somatic mutations in tumor samples may not be enriched within a single gene but rather that cognate mutations with similar effects on GPCR function are distributed across many receptors. To test this possibility, we systematically aggregated somatic cancer mutations across class A GPCRs and found a nonrandom distribution of positions with variant amino acid residues. Individual cancer types were enriched for highly impactful, recurrent mutations at selected cognate positions of known functional motifs. We also discovered that no single receptor drives this pattern, but rather multiple receptors contain amino acid substitutions at a few cognate positions. Phenotypic characterization suggests these mutations induce perturbation of G protein activation and/or β-arrestin recruitment. These data suggest that recurrent impactful oncogenic mutations perturb different GPCRs to subvert signaling and promote tumor growth or survival. 
The possibility that multiple different GPCRs could moonlight as drivers or enablers of a given cancer through mutations located at cognate positions across GPCR paralogs opens a window into cancer mechanisms and potential approaches to therapeutics.
7
Chen HC, Wang J, Liu Q, Shyr Y. A domain damage index to prioritizing the pathogenicity of missense variants. Hum Mutat 2021; 42:1503-1517. [PMID: 34350656 PMCID: PMC8511099 DOI: 10.1002/humu.24269] [Received: 12/08/2020] [Revised: 07/08/2021] [Accepted: 07/30/2021] [Indexed: 11/09/2022]
Abstract
Prioritizing causal variants is one major challenge for the clinical application of sequencing data. Prompted by the observation that 74.3% of pathogenic missense variants are located in protein domains, we developed an approach named domain damage index (DDI). DDI identifies protein domains depleted of rare missense variation in the general population, which can then be used as a metric to prioritize variants. DDI is significantly correlated with phylogenetic conservation, variant-level metrics, and reported pathogenicity. DDI performed well at distinguishing pathogenic variants from benign ones in three benchmark datasets. Combining DDI with the two other best-performing approaches considerably improved the performance of each individual method, suggesting that DDI provides a powerful and complementary way of prioritizing variants.
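The abstract does not spell out how depletion is measured, so the following is only a hedged illustration of the general idea of flagging domains that carry fewer rare missense variants than expected. It assumes a uniform background rate per residue and a one-sided Poisson test; the function names, domain labels and rate value are hypothetical and are not taken from the DDI paper.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term.

    Adequate for moderate lam; exp(-lam) underflows to 0.0 once lam exceeds ~745.
    """
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def depletion_scores(domains: dict, rate_per_residue: float) -> list:
    """Rank domains by depletion of rare missense variants (hypothetical model).

    domains: name -> (length_in_residues, observed_rare_missense_count)
    rate_per_residue: assumed uniform background rate of rare missense
        variants per residue (an assumption, not a DDI parameter).
    Returns (name, observed/expected ratio, lower-tail Poisson p) tuples,
    most depleted (smallest p) first.
    """
    scored = []
    for name, (length, observed) in domains.items():
        expected = rate_per_residue * length
        ratio = observed / expected
        p_low = poisson_cdf(observed, expected)  # small p: fewer variants than expected
        scored.append((name, ratio, p_low))
    return sorted(scored, key=lambda row: row[2])
```

For example, `depletion_scores({"PF_A": (200, 5), "PF_B": (200, 40)}, 0.2)` ranks the hypothetical domain `PF_A` first, because 5 observed variants against 40 expected is a strong depletion, while `PF_B` matches its expectation.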
Affiliation(s)
- Hua-Chang Chen
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jing Wang
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Qi Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Yu Shyr
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
8
Chen J, Guo JT. Structural and functional analysis of somatic coding and UTR indels in breast and lung cancer genomes. Sci Rep 2021; 11:21178. [PMID: 34707120 PMCID: PMC8551294 DOI: 10.1038/s41598-021-00583-1] [Received: 07/15/2021] [Accepted: 10/14/2021] [Indexed: 11/24/2022] [Open access]
Abstract
Insertions and deletions (indels) represent one of the major variation types in the human genome and have been implicated in diseases including cancer. To study the features of somatic indels in different cancer genomes, we investigated indels from large sample sets of two cancer types: invasive breast carcinoma (BRCA) and lung adenocarcinoma (LUAD). Besides mapping somatic indels in both coding and untranslated regions (UTRs) from the cancer whole-exome sequences, we investigated the overlap between these indels and transcription factor binding sites (TFBSs), the key elements for regulation of gene expression that have been found in both coding and non-coding sequences. Compared with germline indels in healthy genomes, somatic indels in cancer genomes include more coding indels, with more frameshift (FS) indels than expected. LUAD has a higher ratio of deletions and higher coding and FS indel rates than BRCA. More importantly, these somatic indels tend to be located in functionally important sequences, where they can affect the core secondary structures of proteins, and they overlap more with predicted TFBSs in coding regions than germline indels do. The somatic CDS indels are also enriched in highly conserved nucleotides when compared with germline CDS indels.
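The frameshift (FS) category rests on a standard rule: a coding indel shifts the reading frame when its net length change is not a multiple of three. A minimal sketch, assuming VCF-style REF/ALT alleles and indels lying entirely inside a coding exon (splice-site and UTR effects are ignored); the `Indel` class and helper names are hypothetical and do not describe the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Indel:
    """A hypothetical record for one coding indel (VCF-style REF/ALT alleles)."""
    ref: str
    alt: str

    @property
    def length_change(self) -> int:
        return len(self.alt) - len(self.ref)

    @property
    def kind(self) -> str:
        return "insertion" if self.length_change > 0 else "deletion"


def coding_consequence(indel: Indel) -> str:
    """Classify an indel lying fully inside a coding exon by its effect on frame."""
    if indel.length_change == 0:
        raise ValueError("not an indel: REF and ALT have equal length")
    # Python's % returns a non-negative result, so deletions (negative changes) work too.
    return "frameshift" if indel.length_change % 3 != 0 else "in-frame"


def fs_fraction(indels: list) -> float:
    """Fraction of coding indels classified as frameshift."""
    calls = [coding_consequence(i) for i in indels]
    return calls.count("frameshift") / len(calls)
```

A 1-bp insertion such as `Indel("A", "AT")` is a frameshift, while the 3-bp deletion `Indel("ATCG", "A")` removes one codon and keeps the frame.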
Affiliation(s)
- Jing Chen
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
9
Grillo E, Ravelli C, Corsini M, Zammataro L, Mitola S. Protein domain-based approaches for the identification and prioritization of therapeutically actionable cancer variants. Biochim Biophys Acta Rev Cancer 2021; 1876:188614. [PMID: 34403770 DOI: 10.1016/j.bbcan.2021.188614] [Received: 07/22/2021] [Revised: 08/11/2021] [Accepted: 08/11/2021] [Indexed: 01/04/2023]
Abstract
The tremendous number of cancer variants that can be detected by NGS analyses has required the development of computational approaches to prioritize mutations on the basis of their biological and clinical significance. Standard strategies take a gene-centric approach to the problem, which allows only highly frequent variants to be identified. In contrast, protein domain (PD)-based approaches make it possible to identify functionally relevant low-frequency variants by searching for mutations that recur on analogous residues across homologous proteins (i.e. proteins containing the same PD). Such approaches enable information about effects and druggability to be transferred from known mutations to unknown ones. Here we describe how PD-based strategies work, and discuss how they could be exploited for mutation prioritization. The principle that mutations clustered on specific residues of PDs have the same functional consequences and are therapeutically actionable in a similar manner could guide the choice of patient-specific targeted drugs, eventually improving the management of cancer patients.
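Pooling mutations on analogous residues across homologous proteins requires mapping each protein's own residue numbers onto shared alignment columns. The sketch below assumes a gapped multiple sequence alignment of the domain (for example, one derived from a Pfam model) and counts mutations per column; the toy sequences, protein names and start coordinates are hypothetical, and this is not the implementation of any specific tool cited in this list.

```python
from collections import Counter


def residue_to_column(aligned_seq: str, domain_start: int) -> dict:
    """Map 1-based protein residue numbers to 0-based alignment columns.

    aligned_seq: this protein's domain segment in the MSA, '-' marking gaps.
    domain_start: protein residue number of the first aligned residue.
    """
    mapping = {}
    residue = domain_start
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            mapping[residue] = col
            residue += 1
    return mapping


def column_recurrence(alignment: dict, starts: dict, mutations) -> Counter:
    """Count mutations per alignment column, pooled across homologous proteins.

    alignment: protein name -> aligned domain sequence.
    starts: protein name -> residue number where the aligned domain begins.
    mutations: iterable of (protein_name, residue_number); mutations that
        fall outside the aligned domain are skipped.
    """
    maps = {name: residue_to_column(seq, starts[name]) for name, seq in alignment.items()}
    counts = Counter()
    for protein, pos in mutations:
        col = maps[protein].get(pos)
        if col is not None:
            counts[col] += 1
    return counts
```

With the toy alignment `{"P1": "AC-DE", "P2": "ACFDE"}`, residue 12 of `P1` and residue 103 of `P2` both fall in column 3, so two mutations that each look like singletons in their own gene become a recurrent position at the domain level.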
Affiliation(s)
- Elisabetta Grillo
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Cosetta Ravelli
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Michela Corsini
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Luca Zammataro
- Division of Artificial Intelligence Systems for Immunoinformatics, Kiromic BioPharma, Inc., Houston, USA
- Stefania Mitola
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
10
Etzion-Fuchs A, Todd DA, Singh M. dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains. Nucleic Acids Res 2021; 49:e78. [PMID: 33999210 PMCID: PMC8287948 DOI: 10.1093/nar/gkab356] [Received: 11/12/2020] [Revised: 03/30/2021] [Accepted: 04/22/2021] [Indexed: 01/08/2023] [Open access]
Abstract
Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT performs excellently at uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.
Affiliation(s)
- Anat Etzion-Fuchs
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA
- David A Todd
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA; Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
11
Spouge JL, Ziegelbauer JM, Gonzalez M. A linear-time algorithm that avoids inverses and computes Jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups. Algorithms Mol Biol 2020; 15:17. [PMID: 32968428 PMCID: PMC7502207 DOI: 10.1186/s13015-020-00178-x] [Received: 03/20/2020] [Accepted: 09/08/2020] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND: Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $n$ elements $g_0, g_1, \ldots, g_{n-1}$ in a set $G$ with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products $\bar{g}_j = g_0 g_1 \cdots g_{j-1} g_{j+1} \cdots g_{n-1}$ ($0 \le j < n$). RESULTS: This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like $g_{i,j} = g_i g_{i+1} \cdots g_{j-1}$; its novel downward phase mirrors the upward phase while exploiting the symmetry of $g_j$ and its complement $\bar{g}_j$. The algorithm requires storage for $2n$ elements of $G$ and only about $3n$ products. In contrast, the standard segment tree algorithms require about $n$ products for construction and $\log_2 n$ products for calculating each $\bar{g}_j$, i.e., about $n \log_2 n$ products in total; and a naïve quadratic algorithm using $n-2$ element-by-element products to compute each $\bar{g}_j$ requires $n(n-2)$ products. CONCLUSIONS: In the herpesvirus application, the Jackknife Product algorithm required 15 min; standard segment tree algorithms would have taken an estimated 3 h; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
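The leave-one-out problem stated in this abstract can be illustrated with a simpler prefix/suffix scheme, which also never needs an inverse and uses 3n - 4 products and storage for 2n intermediate elements. This is a swapped-in, well-known technique, not the paper's segment-tree Jackknife Product algorithm; the function names are hypothetical. Polynomial convolution, the kind of operator named in the title, serves as one example product.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def jackknife_products(items: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Leave-one-out products: out[j] combines every item except items[j].

    op must be associative; no inverse is ever needed. This prefix/suffix
    scheme (not the paper's segment-tree algorithm) makes 3n - 4 calls to op
    and stores 2n intermediate elements. It needs n >= 2 because an empty
    product would require an identity element.
    """
    n = len(items)
    if n < 2:
        raise ValueError("need at least two elements")
    prefix = list(items)  # prefix[i] = items[0] op ... op items[i]
    for i in range(1, n):
        prefix[i] = op(prefix[i - 1], items[i])
    suffix = list(items)  # suffix[i] = items[i] op ... op items[n - 1]
    for i in range(n - 2, -1, -1):
        suffix[i] = op(items[i], suffix[i + 1])
    out = [suffix[1]]  # leave out items[0]
    for j in range(1, n - 1):
        out.append(op(prefix[j - 1], suffix[j + 1]))
    out.append(prefix[n - 2])  # leave out items[n - 1]
    return out


def convolve(a: List[int], b: List[int]) -> List[int]:
    """Polynomial product: associative and commutative, with no inverses in general."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out
```

For example, `jackknife_products([[1, 1], [1, 2], [1, 3]], convolve)` returns the three pairwise convolutions, one per left-out factor. Passing `max` as the operator also works, which shows that the scheme never divides out an element.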
12
Kobren SN, Chazelle B, Singh M. PertInInt: An Integrative, Analytical Approach to Rapidly Uncover Cancer Driver Genes with Perturbed Interactions and Functionalities. Cell Syst 2020; 11:63-74.e7. [PMID: 32711844 PMCID: PMC7493809 DOI: 10.1016/j.cels.2020.06.005] [Received: 10/02/2019] [Revised: 02/23/2020] [Accepted: 06/05/2020] [Indexed: 12/12/2022]
Abstract
A major challenge in cancer genomics is to identify genes with functional roles in cancer and uncover their mechanisms of action. We introduce an integrative framework that identifies cancer-relevant genes by pinpointing those whose interaction or other functional sites are enriched in somatic mutations across tumors. We derive analytical calculations that enable us to avoid time-prohibitive permutation-based significance tests, making it computationally feasible to simultaneously consider multiple measures of protein site functionality. Our accompanying software, PertInInt, combines knowledge about sites participating in interactions with DNA, RNA, peptides, ions, or small molecules with domain, evolutionary conservation, and gene-level mutation data. When applied to 10,037 tumor samples, PertInInt uncovers both known and newly predicted cancer genes, while additionally revealing what types of interactions or other functionalities are disrupted. PertInInt’s analysis demonstrates that somatic mutations are frequently enriched in interaction sites and domains and implicates interaction perturbation as a pervasive cancer-driving event. A fast, analytical framework called PertInInt enables efficient integration of multiple measures of protein site functionality—including interaction, domain, and evolutionary conservation—with gene-level mutation data in order to rapidly detect cancer driver genes along with their disrupted functionalities.
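The idea of replacing permutation tests with analytical calculations can be sketched for a single weighted score. Assuming that each of n mutations independently lands on position j with null probability p_j, the score S = sum_j w_j c_j has a closed-form mean and variance, which yields a Z-score without any shuffling. This is a generic illustration under that multinomial assumption, not PertInInt's published derivation; the inputs and function name are hypothetical.

```python
import math


def analytic_zscore(weights, probs, counts) -> float:
    """Z-score of S = sum_j w_j * c_j under a multinomial null (no permutations).

    weights: per-position functional weights w_j (for example, 1 for a
             hypothetical annotated interaction site and 0 elsewhere).
    probs:   null probability p_j that a mutation hits position j (sums to 1).
    counts:  observed mutation counts c_j per position.
    With n = sum(c_j) mutations placed independently according to probs:
        E[S]   = n * sum_j p_j w_j
        Var[S] = n * (sum_j p_j w_j**2 - (sum_j p_j w_j)**2)
    """
    n = sum(counts)
    mean_w = sum(p * w for p, w in zip(probs, weights))
    mean_w2 = sum(p * w * w for p, w in zip(probs, weights))
    expected = n * mean_w
    variance = n * (mean_w2 - mean_w ** 2)
    if variance <= 0:
        raise ValueError("zero variance: weights are constant under the null")
    observed = sum(w * c for w, c in zip(weights, counts))
    return (observed - expected) / math.sqrt(variance)


if __name__ == "__main__":
    weights = [1, 1] + [0] * 8  # positions 0-1: hypothetical interaction site
    probs = [0.1] * 10  # uniform null over 10 positions
    counts = [6, 0, 0, 0, 0, 2, 0, 0, 0, 2]  # 6 of 10 mutations hit the site
    print(f"{analytic_zscore(weights, probs, counts):.2f}")  # prints 3.16
```

Because each Z-score costs one pass over the positions, many weight tracks (interaction, domain, conservation) can be scored cheaply, which is the motivation the abstract gives for avoiding permutations.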
Affiliation(s)
- Shilpa Nadimpalli Kobren
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Bernard Chazelle
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
13
Nussinov R, Tsai CJ, Jang H. Why Are Some Driver Mutations Rare? Trends Pharmacol Sci 2019; 40:919-929. [PMID: 31699406 DOI: 10.1016/j.tips.2019.10.003] [Received: 07/16/2019] [Revised: 10/09/2019] [Accepted: 10/10/2019] [Indexed: 12/13/2022]
Abstract
Understanding why driver mutations that promote cancer are sometimes rare is important for precision medicine since it would help in their identification. Driver mutations are largely discovered through their frequencies. Thus, rare mutations often escape detection. Unlike high-frequency drivers, low-frequency drivers can be tissue specific; rare drivers have extremely low frequencies. Here, we discuss rare drivers and strategies to discover them. We suggest that allosteric driver mutations shift the protein ensemble from the inactive to the active state. Rare allosteric drivers are statistically rare since, to switch the protein functional state, they cooperate with additional mutations, and these are not considered in the patient cancer-specific protein sequence analysis. A complete landscape of mutations that drive cancer will reveal tumor-specific therapeutic vulnerabilities.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Chung-Jung Tsai
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Hyunbum Jang
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
14
Leveraging protein dynamics to identify cancer mutational hotspots using 3D structures. Proc Natl Acad Sci U S A 2019; 116:18962-18970. [PMID: 31462496 PMCID: PMC6754584 DOI: 10.1073/pnas.1901156116] [Indexed: 12/12/2022] [Open access]
Abstract
Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue-residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, while applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict 1 or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.
15
Brown AL, Li M, Goncearenco A, Panchenko AR. Finding driver mutations in cancer: Elucidating the role of background mutational processes. PLoS Comput Biol 2019; 15:e1006981. [PMID: 31034466 PMCID: PMC6508748 DOI: 10.1371/journal.pcbi.1006981] [Received: 11/26/2018] [Revised: 05/09/2019] [Accepted: 03/28/2019] [Indexed: 01/22/2023] [Open access]
Abstract
Identifying driver mutations in cancer is notoriously difficult. To date, recurrence of a mutation in patients remains one of the most reliable markers of mutation driver status. However, some mutations are more likely to occur than others due to differences in background mutation rates arising from various forms of infidelity of DNA replication and repair machinery, endogenous, and exogenous mutagens. We calculated nucleotide and codon mutability to study the contribution of background processes in shaping the observed mutational spectrum in cancer. We developed and tested probabilistic pan-cancer and cancer-specific models that adjust the number of mutation recurrences in patients by background mutability in order to find mutations which may be under selection in cancer. We showed that mutations with higher mutability values had higher observed recurrence frequency, especially in tumor suppressor genes. This trend was prominent for nonsense and silent mutations or mutations with neutral functional impact. In oncogenes, however, highly recurring mutations were characterized by relatively low mutability, resulting in an inverted U-shaped trend. Mutations not yet observed in any tumor had relatively low mutability values, indicating that background mutability might limit mutation occurrence. We compiled a dataset of missense mutations from 58 genes with experimentally validated functional and transforming impacts from various studies. We found that mutability of driver mutations was lower than that of passengers and consequently adjusting mutation recurrence frequency by mutability significantly improved ranking of mutations and driver mutation prediction. Even though no training on existing data was involved, our approach performed similarly to or better than state-of-the-art methods.
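One simple way to adjust recurrence for background mutability, assumed here rather than taken from the paper, treats mutability as a per-patient probability that the mutation arises from background processes alone and asks how surprising the observed recurrence is under a binomial model. The function names and numbers below are hypothetical; the authors' pan-cancer and cancer-specific models may be formulated differently.

```python
import math


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing terms in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def recurrence_pvalue(recurrence: int, n_patients: int, mutability: float) -> float:
    """Tail probability of seeing the mutation in >= `recurrence` of `n_patients`
    if it arose only from background processes with per-patient probability
    `mutability` (a hypothetical model, not the paper's exact formulation)."""
    return binomial_sf(recurrence, n_patients, mutability)
```

Two mutations that each recur in 5 of 1000 patients are then ranked very differently: with a mutability of 0.001 the tail probability is about 0.004, whereas with a mutability of 0.01 it is about 0.97, so only the low-mutability mutation stands out, consistent with the abstract's observation that drivers tend to have lower mutability.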
Affiliation(s)
- Anna-Leigh Brown
- National Center for Biotechnology Information, NLM, NIH, Bethesda, MD, United States of America
- Minghui Li
- School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Alexander Goncearenco
- National Center for Biotechnology Information, NLM, NIH, Bethesda, MD, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, NLM, NIH, Bethesda, MD, United States of America
16
Compagnone M, Cifaldi L, Fruci D. Regulation of ERAP1 and ERAP2 genes and their disfunction in human cancer. Hum Immunol 2019; 80:318-324. [PMID: 30825518 DOI: 10.1016/j.humimm.2019.02.014] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 11/16/2018] [Revised: 02/01/2019] [Accepted: 02/26/2019] [Indexed: 12/18/2022]
Abstract
The endoplasmic reticulum (ER) aminopeptidases ERAP1 and ERAP2 are two multifunctional enzymes that play an important role in biological processes requiring substrate trimming, including the generation of major histocompatibility complex (MHC) class I binding peptides. In the absence of ERAP enzymes, cells display a different pool of peptides on their surface, which can promote both NK and CD8+ T cell-mediated immune responses. The expression of ERAP1 and ERAP2 is frequently altered in tumors compared with their normal counterparts, but how this affects tumor growth and anti-tumor immune responses has been little investigated. This review provides an overview of current knowledge on the transcriptional and post-transcriptional regulation of ERAP enzymes and discusses the contribution of recent studies to our understanding of the role of ERAP1 and ERAP2 in cancer immunity.
Affiliation(s)
- Mirco Compagnone
- Paediatric Haematology/Oncology Department, Ospedale Pediatrico Bambino Gesù, 00146 Rome, Italy
- Loredana Cifaldi
- Academic Department of Pediatrics (DPUO), Ospedale Pediatrico Bambino Gesù, 00146 Rome, Italy
- Doriana Fruci
- Paediatric Haematology/Oncology Department, Ospedale Pediatrico Bambino Gesù, 00146 Rome, Italy
17
Ashford P, Pang CSM, Moya-García AA, Adeyelu T, Orengo CA. A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations. Sci Rep 2019; 9:263. [PMID: 30670742 PMCID: PMC6343001 DOI: 10.1038/s41598-018-36401-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/03/2018] [Accepted: 11/13/2018] [Indexed: 12/31/2022]
Abstract
Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams), structurally and functionally coherent sets of evolutionarily related domains. Using structural representatives from CATH-FunFams, we then seek enrichment of mutations in 3D and show that these mutation clusters have a highly significant tendency to lie close to known functional sites or to conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules that literature analysis confirms to be cancer associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers, we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources; these include the recently validated cancer-associated genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
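The second filter, enrichment of mutations in 3D, can be illustrated with a generic permutation test: count pairs of mutated residues that lie within a distance cutoff in a structural representative, then compare that count with random residue sets of the same size. This is a hedged sketch, not the published CATH-FunFam procedure; the 6 Å cutoff, the permutation count, the function names and the toy straight-chain coordinates are all assumptions for illustration.

```python
from __future__ import annotations

import math
import random
from itertools import combinations


def close_pairs(residues, coords: dict[int, tuple[float, float, float]], cutoff: float) -> int:
    """Count pairs of residues whose coordinates lie within `cutoff` (in Å)."""
    return sum(
        1
        for a, b in combinations(residues, 2)
        if math.dist(coords[a], coords[b]) <= cutoff
    )


def clustering_p_value(mutated, coords, cutoff=6.0, n_perm=999, seed=0) -> float:
    """Empirical one-sided p-value: are the mutated residues more clustered than
    random residue sets of the same size drawn from the same structure?"""
    rng = random.Random(seed)
    observed = close_pairs(mutated, coords, cutoff)
    residues = list(coords)  # every residue with known coordinates
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(residues, len(mutated))
        if close_pairs(random_set, coords, cutoff) >= observed:
            hits += 1
    # The +1 terms keep the empirical estimate from ever reaching p = 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy "structure": 100 residues on a straight line, 3.8 Å apart.
    toy_coords = {i: (3.8 * i, 0.0, 0.0) for i in range(100)}
    print(clustering_p_value([10, 11, 12, 13], toy_coords))  # tight cluster
    print(clustering_p_value([0, 30, 60, 90], toy_coords))   # spread out
```

In a real analysis, `coords` would hold, for example, Cα coordinates taken from the structural representative, and the resulting p-values would need multiple-testing correction across families.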
Affiliation(s)
- Paul Ashford
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Camilla S M Pang
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Aurelio A Moya-García
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK; Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, Málaga, Spain
- Tolulope Adeyelu
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
18
Satya P, Chakraborty A, Sarkar D, Karan M, Das D, Mandal NA, Saha D, Datta S, Ray S, Kar CS, Karmakar PG, Mitra J, Singh NK. Transcriptome profiling uncovers β-galactosidases of diverse domain classes influencing hypocotyl development in jute (Corchorus capsularis L.). Phytochemistry 2018; 156:20-32. [PMID: 30172937 DOI: 10.1016/j.phytochem.2018.08.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/15/2018] [Revised: 07/21/2018] [Accepted: 08/21/2018] [Indexed: 05/25/2023]
Abstract
The enzyme β-galactosidase (EC 3.2.1.23) is known to influence vascular differentiation during early vegetative growth of plants, but its role in hypocotyl development is not yet fully understood. We generated hypocotyl transcriptome data for a hypocotyl-defective jute (Corchorus capsularis L.) mutant (52,393 unigenes) and its wild-type (WT) cv. JRC-212 (44,720 unigenes) by paired-end RNA-seq and identified 11 isoforms of β-galactosidase using a combination of sequence annotation, domain identification and structural-homology modeling. Phylogenetic analysis classified the jute β-galactosidases into six subfamilies of the glycoside hydrolase-35 family, which are closely related to homologs from Malvaceous species. We also report the expression of a β-galactosidase of the glycoside hydrolase-2 family, which was previously considered absent in higher plants. Comparative analysis of domain structure allowed us to propose a domain-centric evolution of the five classes of plant β-galactosidases. Further, we observed 1.8- to 12.2-fold higher expression of nine β-galactosidase isoforms in the mutant hypocotyl, which was characterized by slower growth, an undulated shape and deformed cell walls. In vitro and in vivo β-galactosidase activities were also higher in the mutant hypocotyl. Phenotypic analysis supported a significant (P ≤ 0.01) positive correlation between enzyme activity and undulated hypocotyl. Taken together, our study identifies the complete set of β-galactosidases expressed in the jute hypocotyl and provides compelling evidence that they may be involved in cell wall degradation during hypocotyl development.
Affiliation(s)
- Pratik Satya
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Avrajit Chakraborty
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Debabrata Sarkar
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Maya Karan
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Debajeet Das
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Nur Alam Mandal
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Dipnarayan Saha
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Subhojit Datta
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Soham Ray
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Chandan Sourav Kar
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Pran Gobinda Karmakar
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Jiban Mitra
- ICAR-Central Research Institute for Jute and Allied Fibres, Nilganj, Barrackpore, Kolkata, 700 120, West Bengal, India
- Nagendra Kumar Singh
- ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110 012, India
19
Piñeiro-Yáñez E, Reboiro-Jato M, Gómez-López G, Perales-Patón J, Troulé K, Rodríguez JM, Tejero H, Shimamura T, López-Casas PP, Carretero J, Valencia A, Hidalgo M, Glez-Peña D, Al-Shahrour F. PanDrugs: a novel method to prioritize anticancer drug treatments according to individual genomic data. Genome Med 2018; 10:41. [PMID: 29848362 PMCID: PMC5977747 DOI: 10.1186/s13073-018-0546-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 12/28/2017] [Accepted: 05/04/2018] [Indexed: 02/07/2023]
Abstract
BACKGROUND Large-scale cancer genome sequencing projects have shown that tumors carry thousands of molecular alterations whose frequency is highly heterogeneous. In such scenarios, physicians and oncologists routinely face lists of cancer genomic alterations in which only a minority are relevant biomarkers for clinical decision-making. For this reason, the medical community agrees on the urgent need for methodologies that establish the relevance of tumor alterations, assist in genomic profile interpretation and, more importantly, prioritize those that could be clinically actionable for cancer therapy. RESULTS We present PanDrugs, a new computational methodology to guide the selection of personalized treatments in cancer patients using the variant lists provided by genome-wide sequencing analyses. PanDrugs offers the largest database of drug-target associations available, ranging from well-known targeted therapies to preclinical drugs. By scoring data-driven gene cancer relevance and drug feasibility, PanDrugs interprets genomic alterations and provides a prioritized, evidence-based list of anticancer therapies. Our tool represents the first drug prescription strategy applying a rationale based on pathway context, multi-gene marker impact and information provided by functional experiments. Our approach has been systematically applied to TCGA patients and successfully validated in a cancer case study with a xenograft mouse model, demonstrating its utility. CONCLUSIONS PanDrugs is a feasible method to identify potentially druggable molecular alterations and prioritize drugs, facilitating the interpretation of the genomic landscape and clinical decision-making in cancer patients. Our approach expands the search for druggable genomic alterations from the concept of cancer driver genes to the druggable pathway context, extending anticancer therapeutic options beyond already known cancer genes. The methodology is public and easily integrated into custom pipelines through its programmatic API or its Docker image. The PanDrugs webtool is freely accessible at http://www.pandrugs.org.
Affiliation(s)
- Elena Piñeiro-Yáñez
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
- Miguel Reboiro-Jato
- Computer Science Department - University of Vigo, Vigo, Spain
- Biomedical Research Centre (CINBIO), Vigo, Spain
- Gonzalo Gómez-López
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
- Javier Perales-Patón
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
- Kevin Troulé
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
- Héctor Tejero
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
- Takeshi Shimamura
- Loyola University Chicago Stritch School of Medicine, Maywood, IL, USA
- Pedro Pablo López-Casas
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
- Julián Carretero
- Department of Physiology - University of Valencia, Valencia, Spain
- Alfonso Valencia
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
- Manuel Hidalgo
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
- Beth Israel Deaconess Medical Center, Boston, USA
- Daniel Glez-Peña
- Computer Science Department - University of Vigo, Vigo, Spain
- Biomedical Research Centre (CINBIO), Vigo, Spain
- Fátima Al-Shahrour
- Spanish National Cancer Research Centre (CNIO), 3rd Melchor Fernandez Almagro st., E-28029, Madrid, Spain
20
González-Sánchez JC, Raimondi F, Russell RB. Cancer genetics meets biomolecular mechanism-bridging an age-old gulf. FEBS Lett 2018; 592:463-474. [PMID: 29364530 DOI: 10.1002/1873-3468.12988] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/13/2017] [Revised: 01/15/2018] [Accepted: 01/19/2018] [Indexed: 12/21/2022]
Abstract
Increasingly available genomic sequencing data are exploited to identify genes and variants contributing to diseases, particularly cancer. Traditionally, methods to find such variants have relied heavily on allele frequency and/or familial history, often neglecting to consider any mechanistic understanding of their functional consequences. Thus, while the set of known cancer-related genes has increased, for many, their mechanistic role in the disease is not completely understood. This issue highlights a wide gap between the disciplines of genetics, which largely aims to correlate genetic events with phenotype, and molecular biology, which ultimately aims at a mechanistic understanding of biological processes. Fortunately, new methods and several systematic studies have proved illuminating for many disease genes and variants by integrating sequencing with mechanistic data, including biomolecular structures and interactions. These have provided new interpretations for known mutations and suggested new disease-relevant variants and genes. Here, we review these approaches and discuss particular examples where these have had a profound impact on the understanding of human cancers.
Affiliation(s)
- Francesco Raimondi
- Bioquant, Heidelberg University, Germany; Heidelberg University Biochemistry Center (BZH), Germany
- Robert B Russell
- Bioquant, Heidelberg University, Germany; Heidelberg University Biochemistry Center (BZH), Germany
21
Matityahu A, Onn I. A new twist in the coil: functions of the coiled-coil domain of structural maintenance of chromosome (SMC) proteins. Curr Genet 2017; 64:109-116. [PMID: 28835994 DOI: 10.1007/s00294-017-0735-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 08/04/2017] [Revised: 08/15/2017] [Accepted: 08/17/2017] [Indexed: 02/07/2023]
Abstract
The higher-order organization of chromosomes ensures their stability and functionality. However, the molecular mechanism by which higher-order structure is established is poorly understood. Dissecting the activity of the relevant proteins provides information essential for a comprehensive understanding of chromosome structure. Proteins of the structural maintenance of chromosomes (SMC) family of ATPases form the core of evolutionarily conserved complexes. SMC complexes are involved in regulating genome dynamics and in maintaining genome stability. All SMC proteins resemble an elongated rod that contains a central coiled-coil domain, a common protein structural motif in which two α-helices twist together. In recent years, the essential role of the coiled-coil domain in SMC protein activity and regulation has become evident. Here, we discuss recent advances in understanding the function of the SMC coiled coils. We describe the structure of the coiled-coil domain of SMC proteins and the modifications and interactions it mediates. Furthermore, we assess the role of the coiled-coil domain in conformational switches of SMC proteins and in determining the architecture of the SMC dimer. Finally, we review the interplay between mutations in the coiled-coil domain and human disorders. We suggest that distinctive properties of the coiled coils of different SMC proteins contribute to their distinct functions. The discussion clarifies the mechanisms underlying the activity of SMC proteins and advocates future studies to elucidate the function of the SMC coiled-coil domain.
Affiliation(s)
- Avi Matityahu
- Faculty of Medicine in the Galilee, Bar-Ilan University, 8 Henrietta Szold St., P.O. Box 1589, 1311502, Safed, Israel
- Itay Onn
- Faculty of Medicine in the Galilee, Bar-Ilan University, 8 Henrietta Szold St., P.O. Box 1589, 1311502, Safed, Israel