1
Quan C, Liu F, Qi L, Tie Y. LRT-CLUSTER: A New Clustering Algorithm Based on Likelihood Ratio Test to Identify Driving Genes. Interdiscip Sci 2023; 15:217-230. [PMID: 36848004 DOI: 10.1007/s12539-023-00554-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 01/31/2023] [Accepted: 02/01/2023] [Indexed: 03/01/2023]
Abstract
Somatic mutations often occur at highly recurrent sites in protein sequences, which indicates that the positional clustering of somatic missense mutations can be used to identify driver genes. However, traditional clustering algorithms over-fit the background signal, are not well suited to mutation data, and leave room for improvement in identifying genes with low-frequency mutations. In this paper, we propose a linear clustering algorithm based on the likelihood ratio test to identify driver genes. First, the polynucleotide mutation rate is calculated based on prior knowledge from the likelihood ratio test. Then, a simulated data set is obtained through the background mutation rate model. Finally, an unsupervised peak clustering algorithm is used to evaluate the somatic mutation data and the simulated data separately to identify the driver genes. The experimental results show that our method achieves a better balance of precision and sensitivity. It can also identify driver genes missed by other methods, making it an effective supplement to them. We also discover some potential linkages between genes and between genes and mutation sites, which is of great value to targeted drug therapy research.
Method framework: Our proposed model framework is as follows.
a. The mutation sites and the number of mutations in tumor gene elements are counted.
b. The nucleotide context mutation frequency is counted based on the likelihood ratio test, and the background mutation rate model is obtained.
c. Using the Monte Carlo simulation method, data sets with the same number of mutations as the gene elements are randomly sampled to obtain simulated mutation data; the sampling frequency of each mutation site is related to the polynucleotide mutation rate.
d. The original mutation data and the randomly reconstructed simulated mutation data are each clustered by peak density, and the corresponding clustering scores are obtained.
e. Through step d, the clustering statistics and the score of each gene segment are obtained from the original single nucleotide mutation data.
f. From the observed score and the simulated clustering scores, the p-value of the corresponding gene fragment is calculated.
g. Through step d, the clustering statistics and the score of each gene segment are obtained from the simulated single nucleotide mutation data.
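The comparison of observed and simulated clustering scores in the method framework amounts to a Monte Carlo significance test. The Python sketch below illustrates that idea only, under stated assumptions: `cluster_score` is a hypothetical stand-in (the largest number of mutations falling inside any fixed-width window) rather than the peak-density clustering score of the paper, and `site_weights` is a hypothetical stand-in for the per-site background mutation rates.

```python
import random

def cluster_score(positions, window=5):
    """Hypothetical stand-in score: largest mutation count inside any window."""
    pts = sorted(positions)
    best, left = 0, 0
    for right, pos in enumerate(pts):
        # Shrink the window from the left until it spans fewer than `window` sites.
        while pos - pts[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best

def empirical_p_value(observed, site_weights, n_sim=1000, seed=0):
    """Compare the observed score with scores of simulated mutation sets.

    observed     -- list of mutated positions (repeats allowed)
    site_weights -- assumed background weight of each position 0..L-1
    """
    rng = random.Random(seed)
    sites = range(len(site_weights))
    obs = cluster_score(observed)
    hits = 0
    for _ in range(n_sim):
        # Draw the same number of mutations as observed, weighted by background rate.
        sim = rng.choices(sites, weights=site_weights, k=len(observed))
        if cluster_score(sim) >= obs:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

For example, ten mutations stacked on one site of a 100-residue element with uniform background weights should yield a small p-value, while ten evenly spread mutations should not.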
Affiliation(s)
- Chenxu Quan
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
  - Department of Respiratory and Sleep Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Fenghui Liu
  - Department of Respiratory and Sleep Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Lin Qi
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
- Yun Tie
  - School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
2
Iqbal S, Brünger T, Pérez-Palma E, Macnee M, Brunklaus A, Daly MJ, Campbell AJ, Hoksza D, May P, Lal D. Delineation of functionally essential protein regions for 242 neurodevelopmental genes. Brain 2023; 146:519-533. [PMID: 36256779 PMCID: PMC9924913 DOI: 10.1093/brain/awac381] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/29/2022] [Revised: 08/12/2022] [Accepted: 09/04/2022] [Indexed: 01/25/2023] Open
Abstract
Neurodevelopmental disorders (NDDs), including severe paediatric epilepsy, autism and intellectual disabilities, are heterogeneous conditions in which clinical genetic testing can often identify a pathogenic variant. For many of them, genetic therapies will be tested in this or the coming years in clinical trials. In contrast to first-generation symptomatic treatments, the new disease-modifying precision medicines require a genetic test-informed diagnosis before a patient can be enrolled in a clinical trial. However, even in 2022, most identified genetic variants in NDD genes are 'variants of uncertain significance'. To safely enrol patients in precision medicine clinical trials, it is important to increase our knowledge about which regions in NDD-associated proteins can 'tolerate' missense variants and which ones are 'essential' and will cause an NDD when mutated. In addition, knowledge about functionally indispensable regions in the 3D structure context of proteins can also provide insights into the molecular mechanisms of disease variants. We developed a novel consensus approach that overlays evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures. After extensive benchmarking of AlphaFold-predicted and experimentally solved protein structures, we generated the currently largest expert-curated protein structure set for 242 NDDs and identified 14 377 Essential3D sites across 189 gene-disorder-associated proteins. We demonstrate that the consensus annotation of Essential3D sites improves prioritization of disease mutations over single annotations. The identified Essential3D sites were enriched for functional features such as intermembrane regions or active sites and revealed key inter-molecule interactions in protein complexes that were otherwise not annotated. Using the currently largest autism, developmental disorders, and epilepsies exome sequencing studies, including >360 000 NDD patients and population controls, we found that missense variants at Essential3D sites are 8-fold enriched in patients. In summary, we developed a comprehensive protein structure set for 242 NDDs and identified 14 377 Essential3D sites in these. All data are available at https://es-ndd.broadinstitute.org for interactive visual inspection to enhance variant interpretation and development of mechanistic hypotheses for 242 NDD genes. The provided resources will enhance clinical variant interpretation and in silico drug target development for NDD-associated genes and encoded proteins.
Affiliation(s)
- Sumaiya Iqbal
  - The Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Tobias Brünger
  - Cologne Center for Genomics, University of Cologne, 50923 Köln, Germany
- Eduardo Pérez-Palma
  - Universidad del Desarrollo, Centro de Genética y Genómica, Facultad de Medicina Clínica Alemana, 7610658 Las Condes, Santiago de Chile, Chile
- Marie Macnee
  - Cologne Center for Genomics, University of Cologne, 50923 Köln, Germany
- Andreas Brunklaus
  - The Paediatric Neurosciences Research Group, Royal Hospital for Children, Glasgow G12 8QQ, UK
  - School of Health and Wellbeing, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Mark J Daly
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Institute for Molecular Medicine Finland (FIMM), Centre of Excellence in Complex Disease Genetics, University of Helsinki, 00100 Helsinki, Finland
- Arthur J Campbell
  - The Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- David Hoksza
  - Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, 110 00 Staré Město, Czechia, Czech Republic
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4365 Esch-sur-Alzette, Luxembourg
- Dennis Lal
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Cologne Center for Genomics, University of Cologne, 50923 Köln, Germany
  - Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Genomic Medicine Institute, Lerner Research Institute Cleveland Clinic, Cleveland, OH 44106, USA
3
English Speech Recognition System Model Based on Computer-Aided Function and Neural Network Algorithm. Computational Intelligence and Neuroscience 2022; 2022:7846877. [PMID: 35498214 PMCID: PMC9054419 DOI: 10.1155/2022/7846877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 03/12/2022] [Accepted: 03/18/2022] [Indexed: 11/24/2022]
Abstract
With economic globalization and the continued growth and internationalization of China's economy, the attention paid to English in China has increased significantly. However, the level of domestic English teaching is limited, so students' English pronunciation cannot always be corrected and reasonably evaluated, which puts oral training at a disadvantage. Moreover, computer-aided language learning systems at home and abroad focus on practicing words and grammar, and their evaluation indicators are few and not comprehensive. Because English pronunciation changes are complex, traditional speech recognition has difficulty handling variations in speech speed and improving its accuracy. To strengthen the English pronunciation of domestic students, a nonlinear network structure that simulates the human brain is studied in depth, and a speech recognition model is established based on Mel-frequency cepstral characteristic parameters of the human ear model and a deep belief network. In this paper, the traditional computer pronunciation evaluation method is comprehensively improved, and a high-quality speech recognition system is constructed. Taking students as the research subjects, the study shows that the method adopted in this paper can give learners an accurate pronunciation quality analysis report and guidance, correct their intonation, and improve the learning effect. The experimental data verify that the recognition ability of the improved speech recognition system model is higher than that of the traditional model.
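Mel-frequency cepstral features rest on a perceptual frequency warping. The abstract does not state which parameterization the authors used, so the Python sketch below assumes one widely used convention, mel = 2595 · log10(1 + f / 700); the function names are illustrative only.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (assumed 2595/700 convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centers(low_hz, high_hz, n_filters):
    """Center frequencies (Hz) of filters evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
```

Because the spacing is uniform in mels, the centers returned by `mel_filter_centers` grow further apart in Hz as frequency increases, which mimics the coarser resolution of the ear at high frequencies.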
4
Abstract
Three-dimensional protein structural data at the molecular level are pivotal for successful precision medicine. Such data are crucial not only for discovering drugs that act to block the active site of the target mutant protein but also for clarifying to the patient and the clinician how the mutations harbored by the patient work. The relative paucity of structural data reflects their cost, challenges in their interpretation, and lack of clinical guidelines for their utilization. Rapid technological advancements in experimental high-resolution structural determination increasingly generate structures. Computationally, modeling algorithms, including molecular dynamics simulations, are becoming more powerful, as are compute-intensive hardware, particularly graphics processing units, overlapping with the inception of the exascale era. Accessible, freely available, and detailed structural and dynamical data can be merged with big data to powerfully transform personalized pharmacology. Here we review protein and emerging genome high-resolution data, along with means, applications, and examples underscoring their usefulness in precision medicine. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Guy Nir
  - Department of Biochemistry and Molecular Biology, Department of Neuroscience, Cell Biology and Anatomy, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
- Chung-Jung Tsai
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
5
Killinger BJ, Petyuk VA, Wright AT. Detecting differential protein abundance by combining peptide level P-values. Mol Omics 2020; 16:554-562. [PMID: 32924053 DOI: 10.1039/d0mo00045k] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
The majority of methods for detecting differentially abundant proteins between samples in label-free LC-MS bottom-up proteomics experiments rely on statistically testing inferred protein abundances derived from peptide ionization intensities or averaging peptide level statistics. Here, we statistically test peptide ionization intensities directly and combine the resulting dependent P-values using the Empirical Brown's Method (EBM), avoiding error introduced through the estimation of protein abundances or summarizing test statistics. We show that on a spike-in proteomics dataset, a peptide level approach using EBM outperforms differential abundance detection using a protein level approach and several analysis workflows, including MSstats. Additionally, we demonstrate the effectiveness of this approach by detecting enriched proteins from an activity-based protein profiling dataset.
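Brown's method, of which EBM is an empirical variant, extends Fisher's combined probability test to dependent P-values by rescaling the Fisher statistic. The Python sketch below shows only the independent baseline (Fisher's method) to make the combining step concrete; it is not the EBM procedure of the paper, and `fisher_combine` is an illustrative name.

```python
import math

def fisher_combine(p_values):
    """Fisher's combined P-value; valid only when the tests are independent.

    Each p must lie in (0, 1]. The statistic X = -2 * sum(ln p) follows a
    chi-square distribution with 2k degrees of freedom under the null.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

When peptide-level P-values for the same protein are correlated, this independent version tends to be anti-conservative, which is the situation a dependence-aware method such as EBM is meant to address.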
Affiliation(s)
- Bryan J Killinger
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
6
Martinez-Ledesma E, Flores D, Trevino V. Computational methods for detecting cancer hotspots. Comput Struct Biotechnol J 2020; 18:3567-3576. [PMID: 33304455 PMCID: PMC7711189 DOI: 10.1016/j.csbj.2020.11.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/13/2020] [Revised: 11/12/2020] [Accepted: 11/13/2020] [Indexed: 12/14/2022] Open
Abstract
Cancer mutations that are recurrently observed among patients are known as hotspots. Hotspots are highly relevant because they are presumably functional. Known hotspots in BRAF, PIK3CA, TP53, KRAS, and IDH1 support this idea. However, hundreds of hotspots have never been validated experimentally. The detection of hotspots is nevertheless challenging because background mutations obscure their statistical and computational identification. Although several algorithms have been applied to identify hotspots, they have not been reviewed before. Thus, in this mini-review, we summarize more than 40 computational methods applied to detect cancer hotspots in coding and non-coding DNA. We first organize the methods into cluster-based, 3D, position-specific, and miscellaneous categories to provide a general overview. Then, we describe their embedded procedures, implementations, variations, and differences. Finally, we discuss some advantages, provide some ideas for future developments, and mention opportunities such as application to viral integrations, translocations, and epigenetics.
Affiliation(s)
- Emmanuel Martinez-Ledesma
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
- David Flores
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
  - Universidad del Caribe, Departamento de Ciencias Básicas e Ingenierías, Cancún, Quintana Roo, Mexico
- Victor Trevino
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
7
Lu X, Qian X, Li X, Miao Q, Peng S. DMCM: a Data-adaptive Mutation Clustering Method to identify cancer-related mutation clusters. Bioinformatics 2019; 35:389-397. [PMID: 30010784 DOI: 10.1093/bioinformatics/bty624] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/07/2018] [Accepted: 07/12/2018] [Indexed: 12/11/2022] Open
Abstract
Motivation: Functional somatic mutations within coding amino acid sequences confer a growth advantage in the pathogenic process. Most existing methods for identifying cancer-related mutations focus on the single amino acid or the entire gene level. However, gain-of-function mutations often cluster in specific protein regions instead of existing independently in the amino acid sequences. Some approaches for identifying mutation clusters from the mutation density along the amino acid chain have been proposed recently, but their performance in identifying mutation clusters remains to be improved.
Results: Here we present a Data-adaptive Mutation Clustering Method (DMCM), in which a kernel density estimate (KDE) with a data-adaptive bandwidth is applied to estimate the mutation density and to find variable clusters with different lengths on amino acid sequences. We apply this approach to the mutation data of 571 genes in over twenty cancer types from The Cancer Genome Atlas (TCGA). We compare DMCM with M2C, OncodriveCLUST and Pfam Domain and find that DMCM tends to identify more significant clusters. The cross-validation analysis shows that DMCM is robust, and cluster cancer type enrichment analysis shows that specific cancer types are enriched for specific mutation clusters.
Availability and implementation: DMCM is written in Python and the analysis methods of DMCM are written in R. They are all released online, available through https://github.com/XinguoLu/DMCM.
Supplementary information: Supplementary data are available at Bioinformatics online.
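The density-based idea described in this abstract can be sketched in a few lines of Python. This is not the DMCM implementation (see the repository linked in the abstract for that): the bandwidth below uses the normal-reference rule of thumb 1.06 · σ · n^(-1/5) with an assumed floor of one residue as a simple data-derived choice, and all function names and the thresholding step are illustrative assumptions.

```python
import math
import statistics

def rule_of_thumb_bandwidth(positions):
    """Normal-reference bandwidth, floored at 1 residue (assumption)."""
    n = len(positions)
    sd = statistics.pstdev(positions)
    return max(1.0, 1.06 * sd * n ** (-1 / 5))

def kde(positions, length, bandwidth):
    """Gaussian kernel density of mutation positions at residues 0..length-1."""
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        for x in range(length)
    ]

def dense_regions(density, threshold):
    """Maximal runs (start, end) of residues whose density exceeds threshold."""
    regions, start = [], None
    for i, d in enumerate(density):
        if d > threshold and start is None:
            start = i
        elif d <= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(density) - 1))
    return regions
```

A single global bandwidth smooths every region equally; letting the bandwidth depend on the data is what allows clusters of different lengths to be resolved.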
Affiliation(s)
- Xinguo Lu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xin Qian
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xing Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Qiumai Miao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
  - School of Computer Science, National University of Defense Technology, Changsha, China
8
JNK1/2 represses Lkb1-deficiency-induced lung squamous cell carcinoma progression. Nat Commun 2019; 10:2148. [PMID: 31089135 PMCID: PMC6517592 DOI: 10.1038/s41467-019-09843-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/11/2018] [Accepted: 03/22/2019] [Indexed: 12/13/2022] Open
Abstract
Mechanisms of lung squamous cell carcinoma (LSCC) development are poorly understood. Here, we report that JNK1/2 activity attenuates Lkb1-deficiency-driven LSCC initiation and progression by repressing ΔNp63 signaling. In vivo Lkb1 ablation alone is sufficient to induce LSCC development by reducing MKK7 levels and JNK1/2 activity, independent of the AMPKα and mTOR pathways. JNK1/2 activity is positively regulated by MKK7 during LSCC development. Pharmaceutically elevated JNK1/2 activity abates Lkb1-dependent LSCC formation, while compound mutations of Jnk1/2 and Lkb1 further accelerate LSCC progression. JNK1/2 is inactivated in a substantial proportion of human LSCC, and JNK1/2 activity positively correlates with survival rates of lung, cervical and head and neck squamous cell carcinoma patients. These findings not only establish a suppressive role of the stress response regulators JNK1/2 in LSCC development, acting downstream of the key LSCC suppressor Lkb1, but also demonstrate activation of JNK1/2 as a therapeutic approach against LSCC.
LKB1 is frequently mutated in lung squamous cell carcinomas. Here, the authors show that LKB1 depletion alone is sufficient to drive the development of this cancer, where defective downstream MKK7-JNK1/2 signalling activates the ∆Np63/p63 pathway to induce subsequent epithelial cell transformation and tumour progression.
9
Capriotti E, Ozturk K, Carter H. Integrating molecular networks with genetic variant interpretation for precision medicine. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 2018; 11:e1443. [PMID: 30548534 PMCID: PMC6450710 DOI: 10.1002/wsbm.1443] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 08/15/2018] [Revised: 10/23/2018] [Accepted: 10/30/2018] [Indexed: 02/01/2023]
Abstract
More reliable and cheaper sequencing technologies have revealed the vast mutational landscapes characteristic of many phenotypes. The analysis of such genetic variants has led to successful identification of altered proteins underlying many Mendelian disorders. Nevertheless the simple one‐variant one‐phenotype model valid for many monogenic diseases does not capture the complexity of polygenic traits and disorders. Although experimental and computational approaches have improved detection of functionally deleterious variants and important interactions between gene products, the development of comprehensive models relating genotype and phenotypes remains a challenge in the field of genomic medicine. In this context, a new view of the pathologic state as significant perturbation of the network of interactions between biomolecules is crucial for the identification of biochemical pathways associated with complex phenotypes. Seminal studies in systems biology combined the analysis of genetic variation with protein–protein interaction networks to demonstrate that even as biological systems evolve to be robust to genetic variation, their topologies create disease vulnerabilities. More recent analyses model the impact of genetic variants as changes to the “wiring” of the interactome to better capture heterogeneity in genotype–phenotype relationships. These studies lay the foundation for using networks to predict variant effects at scale using machine‐learning or algorithmic approaches. A wealth of databases and resources for the annotation of genotype–phenotype relationships have been developed to support developments in this area. This overview describes how study of the molecular interactome has generated insights linking the organization of biological systems to disease mechanism, and how this information can enable precision medicine. This article is categorized under:
- Translational, Genomic, and Systems Medicine > Translational Medicine
- Biological Mechanisms > Cell Signaling
- Models of Systems Properties and Processes > Mechanistic Models
- Analytical and Computational Methods > Computational Methods
Affiliation(s)
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Kivilcim Ozturk
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California
- Hannah Carter
  - Department of Medicine and Institute for Genomic Medicine, University of California, San Diego, La Jolla, California
10
K T, N KV, S S. Distribution based Fuzzy Estimate Spectral Clustering for Cancer Detection with Protein Sequence and Structural Motifs. Asian Pac J Cancer Prev 2018; 19:1935-1940. [PMID: 30051675 PMCID: PMC6165630 DOI: 10.22034/apjcp.2018.19.7.1935] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
Abstract
Objective: In biological data analysis, protein sequence and structural motifs are widespread amino-acid sequence patterns that are used as tools for detecting cancer at an earlier stage. To improve cancer detection with minimum space and time complexity, the Distribution based Fuzzy Estimate Spectral Clustering (DFESC) technique is developed. Methods: Initially, the protein sequence motifs are taken from the dataset to form clusters. Distribution based spectral clustering is applied to group the protein sequences by measuring the generalized Jaccard similarity between each pair of protein sequences. To improve the clustering accuracy, a soft computing technique, namely fuzzy logic, is applied to calculate the membership value of each sequence motif. Results: The outcome showed that the presented DFESC technique effectively identifies cancer in terms of clustering accuracy, false positive rate, cancer detection time and space complexity. Conclusion: Based on the observations, evaluation of the DFESC technique shows improved results for early detection of cancer using protein sequence and structural motifs.
Affiliation(s)
- Thenmozhi K
  - Department of Computer Applications, Selvam College of Technology, Namakkal, TamilNadu, India
- Shanthi S
  - Department of Computer Applications, Kongu Engineering College, Erode, TamilNadu, India
11
Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat Methods 2017; 14:782-788. [PMID: 28714987 DOI: 10.1038/nmeth.4364] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 01/18/2017] [Accepted: 06/16/2017] [Indexed: 12/19/2022]
Abstract
Understanding genetic events that lead to cancer initiation and progression remains one of the biggest challenges in cancer biology. Traditionally, most algorithms for cancer-driver identification look for genes that have more mutations than expected from the average background mutation rate. However, there is now a wide variety of methods that look for nonrandom distribution of mutations within proteins as a signal for the driving role of mutations in cancer. Here we classify and review such subgene-resolution algorithms, compare their findings on four distinct cancer data sets from The Cancer Genome Atlas and discuss how predictions from these algorithms can be interpreted in the emerging paradigms that challenge the simple dichotomy between driver and passenger genes.
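The traditional gene-level approach mentioned in this abstract, testing whether a gene carries more mutations than the background rate predicts, can be illustrated with a simple Poisson tail test. The Python sketch below is a deliberately simplified assumption (one uniform background rate per base per sample, no covariates) and does not reproduce any specific published tool; all names are illustrative.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def excess_mutation_p(observed, gene_length, n_samples, background_rate):
    """P-value for seeing at least `observed` mutations under the background."""
    expected = background_rate * gene_length * n_samples
    return poisson_sf(observed, expected)
```

Such a test looks only at the mutation count of a gene and ignores where the mutations fall within the protein, which is the gap that subgene-resolution methods aim to fill.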
12
Pan-Cancer Mutational and Transcriptional Analysis of the Integrator Complex. Int J Mol Sci 2017; 18:ijms18050936. [PMID: 28468258 PMCID: PMC5454849 DOI: 10.3390/ijms18050936] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 04/10/2017] [Revised: 04/20/2017] [Accepted: 04/23/2017] [Indexed: 12/28/2022] Open
Abstract
The integrator complex has been recently identified as a key regulator of RNA Polymerase II-mediated transcription, with many functions including the processing of small nuclear RNAs, the pause-release and elongation of polymerase during the transcription of protein coding genes, and the biogenesis of enhancer derived transcripts. Moreover, some of its components also play a role in genome maintenance. Thus, it is reasonable to hypothesize that their functional impairment or altered expression can contribute to malignancies. Indeed, several studies have described the mutations or transcriptional alteration of some Integrator genes in different cancers. Here, to draw a comprehensive pan-cancer picture of the genomic and transcriptomic alterations for the members of the complex, we reanalyzed public data from The Cancer Genome Atlas. Somatic mutations affecting Integrator subunit genes and their transcriptional profiles have been investigated in about 11,000 patients and 31 tumor types. A general heterogeneity in the mutation frequencies was observed, mostly depending on tumor type. Despite the fact that we could not establish them as cancer drivers, INTS7 and INTS8 genes were highly mutated in specific cancers. A transcriptome analysis of paired (normal and tumor) samples revealed that the transcription of INTS7, INTS8, and INTS13 is significantly altered in several cancers. Experimental validation performed on primary tumors confirmed these findings.
13
Correction: Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression. PLoS Comput Biol 2017; 13:e1005472. [PMID: 28384155 PMCID: PMC5383014 DOI: 10.1371/journal.pcbi.1005472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022] Open