1
Khan MS, Hanif W, Alsakhen N, Jabbar B, Shamkh IM, Alsaiari AA, Almehmadi M, Alghamdi S, Shakoori A, Al Farraj DA, Almutairi SM, Hussein Issa Mohammed Y, Abouzied AS, Rehman AU, Huwaimel B. Isoform switching leads to downregulation of cytokine producing genes in estrogen receptor positive breast cancer. Front Genet 2023; 14:1230998. [PMID: 37900178 PMCID: PMC10611502 DOI: 10.3389/fgene.2023.1230998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 09/18/2023] [Indexed: 10/31/2023] Open
Abstract
Objective: Estrogen receptor breast cancer (BC) is characterized by the expression of estrogen receptors. It is the most common cancer among women, with an incidence rate of 2.26 million cases worldwide. The aim of this study was to identify differentially expressed genes and isoform switching between estrogen receptor positive and triple negative BC samples. Methods: The data were collected from ArrayExpress, followed by preprocessing and subsequent mapping from HISAT2. Read quantification was performed by StringTie, and then R package ballgown was used to perform differential expression analysis. Functional enrichment analysis was conducted using Enrichr, and then immune genes were shortlisted based on the ScType marker database. Isoform switch analysis was also performed using the IsoformSwitchAnalyzeR package. Results: A total of 9,771 differentially expressed genes were identified, of which 86 were upregulated and 117 were downregulated. Six genes were identified as mainly associated with estrogen receptor positive BC, while a novel set of ten genes were found which have not previously been reported in estrogen receptor positive BC. Furthermore, alternative splicing and subsequent isoform usage in the immune system related genes were determined. Conclusion: This study identified the differential usage of isoforms in the immune system related genes in cancer cells that suggest immunosuppression due to the dysregulation of CXCR chemokine receptor binding, iron ion binding, and cytokine activity.
Affiliation(s)
- Waqar Hanif
  - Department of Bioinformatics, Department of Sciences, School of Interdisciplinary Engineering & Science (SINES), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Nada Alsakhen
  - Department of Chemistry, Faculty of Science, The Hashemite University, Zarqa, Jordan
- Basit Jabbar
  - Centre of Excellence in Molecular Biology, University of the Punjab, Lahore, Pakistan
- Israa M. Shamkh
  - Chemo and Bioinformatics Lab, Bio Search Research Institution, Giza, Egypt
- Ahad Amer Alsaiari
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Taif University, Taif, Saudi Arabia
- Mazen Almehmadi
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Taif University, Taif, Saudi Arabia
- Saad Alghamdi
  - Laboratory Medicine Department, Faculty of Applied Medical Sciences, Umm Al-Qura University, Makkah, Saudi Arabia
- Afnan Shakoori
  - Laboratory Medicine Department, Faculty of Applied Medical Sciences, Umm Al-Qura University, Makkah, Saudi Arabia
- Dunia A. Al Farraj
  - Department of Botany and Microbiology, College of Science, King Saud University, Riyadh, Saudi Arabia
- Saeedah Musaed Almutairi
  - Department of Botany and Microbiology, College of Science, King Saud University, Riyadh, Saudi Arabia
- Amr S. Abouzied
  - Department of Pharmaceutical Chemistry, College of Pharmacy, University of Hail, Hail, Saudi Arabia
  - Department of Pharmaceutical Chemistry, National Organization for Drug Control and Research (NODCAR), Giza, Egypt
- Aziz-Ur Rehman
  - Keystone Pharmacogenomics LLC, Bensalem, PA, United States
- Bader Huwaimel
  - Department of Pharmaceutical Chemistry, College of Pharmacy, University of Hail, Hail, Saudi Arabia
  - Medical and Diagnostic Research Center, University of Hail, Hail, Saudi Arabia
2
McMullen JRW, Soto U. Newly identified breast luminal progenitor and gestational stem cell populations likely give rise to HER2-overexpressing and basal-like breast cancers. Discov Oncol 2022; 13:38. [PMID: 35633393 PMCID: PMC9148339 DOI: 10.1007/s12672-022-00500-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/14/2022] [Accepted: 05/19/2022] [Indexed: 12/24/2022] Open
Abstract
Breast Cancer (BrC) is a common malignancy with genetically diverse subtypes. There is evidence that specific BrC subtypes originate from particular normal mammary cell populations. However, the cell populations that give rise to most BrC subtypes are unidentified. Several human breast scRNAseq datasets are available. In this research, we utilized a robust human scRNAseq dataset to identify population-specific marker genes and then identified the expression of these marker genes in specific BrC subtypes. In humans, several BrC subtypes, HER2-enriched, basal-like, and triple-negative (TN), are more common in women who have had children. This observation suggests that cell populations that originate during pregnancy give rise to these BrCs. The current human datasets have few normal parous samples, so we supplemented this research with mouse datasets, which contain mammary cells from various developmental stages. This research identified two novel normal breast cell populations that may be the origin of the basal-like and HER2-overexpressing subtypes, respectively. A stem cell-like population, SC, that expresses gestation-specific genes has similar gene expression patterns to basal-like BrCs. A novel luminal progenitor cell population and HER2-overexpressing BrCs are marked by S100A7, S100A8, and S100A9 expression. We bolstered our findings by examining SC gene expression in TN BrC scRNAseq datasets and S100A7-A9 gene expression in BrC cell lines. We discovered that several potential cancer stem cell populations highly express most of the SC genes in TN BrCs and confirmed S100A8 and A9 overexpression in a HER2-overexpressing BrC cell line. In summary, normal SC and the novel luminal progenitor cell population likely give rise to basal-like and HER2-overexpressing BrCs, respectively. Characterizing these normal cell populations may facilitate a better understanding of specific BrCs subtypes.
Affiliation(s)
- James R W McMullen
  - Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA, 92350, USA
- Ubaldo Soto
  - Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA, 92350, USA
3
Ren M, Zhang S, Ma S, Zhang Q. Gene-environment interaction identification via penalized robust divergence. Biom J 2022; 64:461-480. [PMID: 34725857 PMCID: PMC9386692 DOI: 10.1002/bimj.202000157] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/24/2020] [Revised: 06/01/2021] [Accepted: 08/23/2021] [Indexed: 12/11/2022]
Abstract
In high-throughput cancer studies, gene-environment interactions associated with outcomes have important implications. Some commonly adopted identification methods do not respect the "main effect, interaction" hierarchical structure. In addition, they can be challenged by data contamination and/or long-tailed distributions, which are not uncommon. In this article, robust methods based on γ-divergence and density power divergence are proposed to accommodate contaminated data/long-tailed distributions. A hierarchical sparse group penalty is adopted for regularized estimation and selection and can identify important gene-environment interactions and respect the "main effect, interaction" hierarchical structure. The proposed methods are implemented using an effective group coordinate descent algorithm. Simulation shows that when contamination occurs, the proposed methods can significantly outperform the existing alternatives with more accurate identification. The proposed approach is applied to the analysis of The Cancer Genome Atlas (TCGA) triple-negative breast cancer data and Gene Environment Association Studies (GENEVA) Type 2 Diabetes data.
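To illustrate why a density power divergence (DPD) criterion resists contamination, the following is a minimal sketch for the simplest possible case: estimating the mean of a normal model with the variance fixed at 1. It is not the penalized regression procedure of the article; the function name `dpd_mean`, the tuning value `alpha=0.5`, and the toy data are assumptions made purely for illustration.

```python
import math
from statistics import mean, median


def dpd_mean(xs, alpha=0.5, iters=100, tol=1e-10):
    """Minimum-DPD estimate of a normal mean with the variance fixed at 1.

    For this model the estimating equation reduces to a weighted mean with
    weights w_i = exp(-alpha * (x_i - mu)^2 / 2), so points far from the
    current estimate are down-weighted. It is solved here by fixed-point
    iteration started at the sample median (an illustrative choice).
    """
    mu = median(xs)
    for _ in range(iters):
        weights = [math.exp(-alpha * (x - mu) ** 2 / 2.0) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical sample: five values near 0 and one gross outlier.
data = [0.1, -0.2, 0.05, 0.3, -0.1, 10.0]
print("sample mean:", mean(data))
print("DPD estimate:", dpd_mean(data))
```

The outlier pulls the ordinary sample mean well away from zero, while its weight in the DPD iteration is effectively nil, so the robust estimate stays close to the bulk of the data.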
Affiliation(s)
- Mingyang Ren
  - School of Mathematics Sciences, University of Chinese Academy of Sciences, Beijing, P. R. China
  - Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, P. R. China
- Sanguo Zhang
  - School of Mathematics Sciences, University of Chinese Academy of Sciences, Beijing, P. R. China
  - Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, P. R. China
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Qingzhao Zhang
  - Department of Statistics and Data Science, School of Economics, Wang Yanan Institute for Studies in Economics, Fujian Key Lab of Statistics, Xiamen University, Fujian, P. R. China
4
Chen T, Hua W, Xu B, Chen H, Xie M, Sun X, Ge X. Robust rank aggregation and cibersort algorithm applied to the identification of key genes in head and neck squamous cell cancer. Mathematical Biosciences and Engineering 2021; 18:4491-4507. [PMID: 34198450 DOI: 10.3934/mbe.2021228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
OBJECTIVE Although multiple hub genes have been identified in head and neck squamous cell cancer (HNSCC) in recent years, because of the limited sample size and inconsistent bioinformatics analysis methods, the results are not reliable. Therefore, it is urgent to use reliable algorithms to find new prognostic markers of HNSCC. METHOD The Robust Rank Aggregation (RRA) method was used to integrate 8 microarray datasets of HNSCC downloaded from the Gene Expression Omnibus (GEO) database to screen differentially expressed genes (DEGs). Later, Gene Ontology (GO) functional annotation together with Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis was carried out to discover functions of those discovered DEGs. According to the KEGG results, those discovered DEGs showed tight association with the occurrence and development of HNSCC. Then cibersort algorithm was used to analyze the infiltration of immune cells of HNSCC and we found that the main infiltrated immune cells were B cells, dendritic cells and macrophages. A protein-protein interaction (PPI) network was established; moreover, key modules were also constructed to select 5 hub genes from the whole network using cytoHubba. 3 hub genes showed significant relationship with prognosis for TCGA-derived HNSCC patients. RESULT The potent DEGs along with hub genes were selected by the combined bioinformatic approach. AURKA, BIRC5 and UBE2C genes may be the potential prognostic biomarker and therapeutic targets of HNSCC. CONCLUSIONS The Robust Rank Aggregation method and cibersort algorithm method can accurately predict the potential prognostic biomarker and therapeutic targets of HNSCC through multiple GEO datasets.
Affiliation(s)
- Tingting Chen
  - Department of Radiation Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
  - Department of Oncology, Northern Jiangsu People's Hospital, Yangzhou, Jiangsu 225000, China
- Wei Hua
  - Department of Oncology, Northern Jiangsu People's Hospital, Yangzhou, Jiangsu 225000, China
- Bing Xu
  - Department of Radiation Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Hui Chen
  - Department of Radiation Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Minhao Xie
  - The First School of Clinical Medicine, Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Xinchen Sun
  - Department of Radiation Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Xiaolin Ge
  - Department of Radiation Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
5
Elian FA, Are U, Ghosh S, Nuin P, Footz T, McMullen TPW, Brindley DN, Walter MA. FOXQ1 is Differentially Expressed Across Breast Cancer Subtypes with Low Expression Associated with Poor Overall Survival. Breast Cancer: Targets and Therapy 2021; 13:171-188. [PMID: 33688250 PMCID: PMC7935334 DOI: 10.2147/bctt.s282860] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/31/2020] [Accepted: 12/16/2020] [Indexed: 12/17/2022]
Abstract
Purpose Forkhead box Q1 (FOXQ1) has been shown to contribute to the development and progression of cancers, including ovarian and breast cancer (BC). However, research exploring FOXQ1 expression, copy number variation (CNV), and prognostic value across different BC subtypes is limited. Our purpose was to evaluate FOXQ1 mRNA expression, CNV, and prognostic value across BC subtypes. Materials and Methods We determined FOXQ1 expression and CNV in BC patient tumors using RT-qPCR and qPCR, respectively. We also analyzed FOXQ1 expression and CNV in BC cell lines in the CCLE database using K-means clustering. The prognostic value of FOXQ1 expression in the TCGA-BRCA database was assessed using univariate and multivariate Cox's regression analysis as well as using the online tools OncoLnc, GEPIA, and UALCAN. Results Our analyses reveal that FOXQ1 mRNA is differentially expressed between different subtypes of BC and is significantly decreased in luminal BC and HER2 patients when compared to normal breast tissue samples. Furthermore, analysis of BC cell lines showed that FOXQ1 mRNA expression was independent of CNV. Moreover, patients with low FOXQ1 mRNA expression had significantly poorer overall survival compared to those with high FOXQ1 mRNA expression. Finally, low FOXQ1 expression had a critical impact on the prognostic values of BC patients and was an independent predictor of overall survival when it was adjusted for BC subtypes and to two other FOX genes, FOXF2 and FOXM1. Conclusion Our study reveals for the first time that FOXQ1 is differentially expressed across BC subtypes and that low expression of FOXQ1 is indicative of poor prognosis in patients with BC.
Affiliation(s)
- Fahed A Elian
  - Department of Medical Genetics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Ubah Are
  - Department of Medical Genetics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Sunita Ghosh
  - Department of Medical Oncology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
  - Department of Mathematical and Statistical Sciences, Faculty of Science, University of Alberta, Edmonton, AB, Canada
- Paulo Nuin
  - Department of Medical Genetics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Tim Footz
  - Department of Medical Genetics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Todd P W McMullen
  - Department of Surgery, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- David N Brindley
  - Department of Biochemistry, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
  - Cancer Research Institute of Northern Alberta, University of Alberta, Edmonton, AB, Canada
- Michael A Walter
  - Department of Medical Genetics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
6
Wang S, Jeong HH, Sohn KA. ClearF: a supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction. BMC Med Genomics 2019; 12:95. [PMID: 31296201 PMCID: PMC6624178 DOI: 10.1186/s12920-019-0512-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Feature selection or scoring methods for the detection of biomarkers are essential in bioinformatics. Various feature selection methods have been developed for the detection of biomarkers, and several studies have employed information-theoretic approaches. However, most of these methods generally require a long processing time. In addition, information-theoretic methods discretize continuous features, which is a drawback that can lead to the loss of information. RESULTS In this paper, a novel supervised feature scoring method named ClearF is proposed. The proposed method is suitable for continuous-valued data, which is similar to the principle of feature selection using mutual information, with the added advantage of a reduced computation time. The proposed score calculation is motivated by the association between the reconstruction error and the information-theoretic measurement. Our method is based on class-wise low-dimensional embedding and the resulting reconstruction error. Given multi-class datasets such as a case-control study dataset, low-dimensional embedding is first applied to each class to obtain a compressed representation of the class, and also for the entire dataset. Reconstruction is then performed to calculate the error of each feature and the final score for each feature is defined in terms of the reconstruction errors. The correlation between the information theoretic measurement and the proposed method is demonstrated using a simulation. For performance validation, we compared the classification performance of the proposed method with those of various algorithms on benchmark datasets. CONCLUSIONS The proposed method showed higher accuracy and lower execution time than the other established methods. Moreover, an experiment was conducted on the TCGA breast cancer dataset, and it was confirmed that the genes with the highest scores were highly associated with subtypes of breast cancer.
Affiliation(s)
- Sehee Wang
  - Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Hyun-Hwan Jeong
  - Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Kyung-Ah Sohn
  - Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
7
Fan X, Wang Y, Tang XQ. Extracting predictors for lung adenocarcinoma based on Granger causality test and stepwise character selection. BMC Bioinformatics 2019; 20:197. [PMID: 31074380 PMCID: PMC6509866 DOI: 10.1186/s12859-019-2739-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022] Open
Abstract
Background Lung adenocarcinoma is the most common type of lung cancer, with high mortality worldwide. Its occurrence and development have been thoroughly studied by high-throughput expression microarray, which has produced abundant data on gene expression, DNA methylation, and miRNA quantification. However, the hub genes, which can serve as biomarkers for discriminating between cancer patients and healthy individuals, have not been well screened. Result Here we present a new method for extracting gene predictors, aiming to obtain the fewest predictors without losing efficiency. We first analyzed three different expression microarrays and constructed a multi-interaction network, since an individual expression dataset is not enough to describe biological behaviors dynamically and systematically. Then, we transformed the undirected interaction network into a directed network by employing the Granger causality test, and screened the predictors with the stepwise character selection algorithm. Six predictors, including TOP2A, GRK5, SIRT7, MCM7, EGFR, and COL1A2, were ultimately identified. All the predictors are cancer-related, and their small number facilitates diagnosis. Finally, the approach was validated by robustness analyses applied to six independent datasets; the precision is 95.3% ∼ 100%. Conclusion Although there are complicated differences between cancer and normal cells in gene functions, cancer cells can be differentiated when a group of special genes is expressed abnormally. Here we presented a new, robust, and effective method for extracting gene predictors. We identified as few as 6 genes which can be taken as predictors for diagnosing lung adenocarcinoma.
Affiliation(s)
- Xuemeng Fan
  - School of Science, Jiangnan University, Wuxi, 214122, China
- Yaolai Wang
  - School of Science, Jiangnan University, Wuxi, 214122, China
- Xu-Qing Tang
  - School of Science, Jiangnan University, Wuxi, 214122, China
  - Wuxi Engineering Research Center for Biocomputing, Wuxi, 214122, China
8
Classifying Breast Cancer Subtypes Using Multiple Kernel Learning Based on Omics Data. Genes (Basel) 2019; 10:genes10030200. [PMID: 30866472 PMCID: PMC6471546 DOI: 10.3390/genes10030200] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/15/2019] [Revised: 02/25/2019] [Accepted: 03/02/2019] [Indexed: 12/31/2022] Open
Abstract
It is important to explore the intrinsic differences among breast cancer subtypes. These intrinsic differences are closely related to clinical diagnosis and the design of treatment plans. With the accumulation of biological and medical datasets, there are many different omics data that can be viewed from different aspects. Combining these multiple omics data can improve the accuracy of prediction. Meanwhile, there are also many different databases from which different types of omics data can be downloaded. In this article, we use estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) to define breast cancer subtypes and classify any two breast cancer subtypes using the SMO-MKL algorithm. We collected mRNA data, methylation data and copy number variation (CNV) data from TCGA to classify breast cancer subtypes. Multiple Kernel Learning (MKL) is employed to use these omics data distinctly. The result of using three omics data with multiple kernels is better than that of using single omics data with multiple kernels. Furthermore, the significant genes and pathways discovered in the feature selection process are also analyzed. In experiments, the proposed method outperforms other state-of-the-art methods and has abundant biological interpretations.
9
Sun M, Ding T, Tang XQ, Keming Y. An Efficient Mixed-Model for Screening Differentially Expressed Genes of Breast Cancer Based on LR-RF. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:124-130. [PMID: 29993693 DOI: 10.1109/tcbb.2018.2829519] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
To screen differentially expressed genes quickly and efficiently in breast cancer, two gene microarray datasets of breast cancer, GSE15852 and GSE45255, were downloaded from GEO. By combining the Logistic Regression and Random Forest algorithms, this paper proposed a novel method named LR-RF to select differentially expressed genes of breast cancer on microarray data by the Bonferroni test of the FWER error measure. Compared with Logistic Regression and Random Forest, our study shows that LR-RF has a great facility in selecting differentially expressed genes. The average prediction accuracy of the proposed LR-RF over 10 replicated random tests reaches 93.11 percent with variance as low as 0.00045. The prediction accuracy rate reaches a maximum of 95.57 percent when the threshold value α = 0.2 in the random forest algorithm process of ranking genes' importance scores, and the differentially expressed genes are relatively few in number. In addition, through analyzing the gene interaction networks, most of the top 20 genes we selected were found to be involved in the development of breast cancer. All of these results demonstrate the reliability and efficiency of LR-RF. It is anticipated that LR-RF would provide new knowledge and methods for biologists, medical scientists, and cognitive computing researchers to identify disease-related genes of breast cancer.
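The Bonferroni control of the family-wise error rate (FWER) mentioned in this abstract can be sketched in a few lines. This is only the generic correction, not the LR-RF pipeline itself; the function name `bonferroni_select`, the gene labels, and the p-values are hypothetical, and the FWER level used here is unrelated to the random forest threshold α = 0.2 quoted above.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return the names whose p-value passes the Bonferroni threshold.

    With m tests, each p-value is compared against alpha / m, which keeps
    the probability of at least one false positive at or below alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return sorted(name for name, p in p_values.items() if p <= threshold)


# Hypothetical per-gene p-values (not taken from the article).
p_values = {"geneA": 0.0001, "geneB": 0.004, "geneC": 0.02, "geneD": 0.6}
selected = bonferroni_select(p_values, alpha=0.05)
print(selected)
```

With four tests the per-gene threshold is 0.05 / 4 = 0.0125, so `geneC` would pass an uncorrected 0.05 cut-off but is rejected after correction.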
10
Yang J, Wei X, Tufan T, Kuscu C, Unlu H, Farooq S, Demirtas E, Paschal BM, Adli M. Recurrent mutations at estrogen receptor binding sites alter chromatin topology and distal gene expression in breast cancer. Genome Biol 2018; 19:190. [PMID: 30404658 PMCID: PMC6223090 DOI: 10.1186/s13059-018-1572-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/30/2018] [Accepted: 10/22/2018] [Indexed: 02/16/2023] Open
Abstract
BACKGROUND The mutational processes underlying non-coding cancer mutations and their biological significance in tumor evolution are poorly understood. To get better insights into the biological mechanisms of mutational processes in breast cancer, we integrate whole-genome level somatic mutations from breast cancer patients with chromatin states and transcription factor binding events. RESULTS We discover that a large fraction of non-coding somatic mutations in estrogen receptor (ER)-positive breast cancers are confined to ER binding sites. Notably, the highly mutated estrogen receptor binding sites are associated with more frequent chromatin loop contacts and the associated distal genes are expressed at higher level. To elucidate the functional significance of these non-coding mutations, we focus on two of the recurrently mutated estrogen receptor binding sites. Our bioinformatics and biochemical analysis suggest loss of DNA-protein interactions due to the recurrent mutations. Through CRISPR interference, we find that the recurrently mutated regulatory element at the LRRC3C-GSDMA locus impacts the expression of multiple distal genes. Using a CRISPR base editor, we show that the recurrent C→T conversion at the ZNF143 locus results in decreased TF binding, increased chromatin loop formation, and increased expression of multiple distal genes. This single point mutation mediates reduced response to estradiol-induced cell proliferation but increased resistance to tamoxifen-induced growth inhibition. CONCLUSIONS Our data suggest that ER binding is associated with localized accumulation of somatic mutations, some of which affect chromatin architecture, distal gene expression, and cellular phenotypes in ER-positive breast cancer.
Affiliation(s)
- Jiekun Yang
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
- Xiaolong Wei
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
- Turan Tufan
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
- Cem Kuscu
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
- Hayrunnisa Unlu
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
- Saadia Farooq
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
- Elif Demirtas
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
- Bryce M Paschal
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
  - Center for Cell Signalling, University of Virginia School of Medicine, Charlottesville, VA, USA
- Mazhar Adli
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, 1340 Jefferson Park Ave, Pinn Hall, Room: 6228, Charlottesville, VA, 22903, USA
11
Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction. BioData Min 2018; 11:22. [PMID: 30386434 PMCID: PMC6203208 DOI: 10.1186/s13040-018-0184-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/27/2017] [Accepted: 10/11/2018] [Indexed: 11/26/2022] Open
Abstract
Background In the Next Generation Sequencing (NGS) era a large amount of biological data is being sequenced, analyzed, and stored in many public databases, whose interoperability is often required to allow an enhanced accessibility. The combination of heterogeneous NGS genomic data is an open challenge: the analysis of data from different experiments is a fundamental practice for the study of diseases. In this work, we propose to combine DNA methylation and RNA sequencing NGS experiments at gene level for supervised knowledge extraction in cancer. Methods We retrieve DNA methylation and RNA sequencing datasets from The Cancer Genome Atlas (TCGA), focusing on the Breast Invasive Carcinoma (BRCA), the Thyroid Carcinoma (THCA), and the Kidney Renal Papillary Cell Carcinoma (KIRP). We combine the RNA sequencing gene expression values with the gene methylation quantity, as a new measure that we define for representing the methylation quantity associated to a gene. Additionally, we propose to analyze the combined data through tree- and rule-based classification algorithms (C4.5, Random Forest, RIPPER, and CAMUR). Results We extract more than 15,000 classification models (composed of gene sets), which allow to distinguish the tumoral samples from the normal ones with an average accuracy of 95%. From the integrated experiments we obtain about 5000 classification models that consider both the gene measures related to the RNA sequencing and the DNA methylation experiments. Conclusions We compare the sets of genes obtained from the classifications on RNA sequencing and DNA methylation data with the genes obtained from the integration of the two experiments. The comparison results in several genes that are in common among the single experiments and the integrated ones (733 for BRCA, 35 for KIRP, and 861 for THCA) and 509 genes that are in common among the different experiments. 
Finally, we investigate the possible relationships among the different analyzed tumors by extracting a core set of 13 genes that appear in all tumors. A preliminary functional analysis confirms the relation of part of those genes (5 out of 13 and 279 out of 509) with cancer, suggesting to focus further studies on the new individuated ones.
|
12
|
Global Similarity Method Based on a Two-tier Random Walk for the Prediction of microRNA-Disease Association. Sci Rep 2018; 8:6481. [PMID: 29691434 PMCID: PMC5915491 DOI: 10.1038/s41598-018-24532-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2018] [Accepted: 04/03/2018] [Indexed: 12/15/2022] Open
Abstract
Mutation and dysregulation of microRNAs (miRNAs) are related to the occurrence and development of human diseases. Studies on disease-associated miRNAs have contributed to disease diagnosis and treatment. To address problems such as low prediction accuracy and the failure to predict relationships between new miRNAs and diseases, we design a Laplacian score of graphs to calculate the global similarity of networks and propose a Global Similarity method based on a Two-tier Random Walk for the prediction of miRNA-disease association (GSTRW) to reveal the correlation between miRNAs and diseases. This method is a global approach that can simultaneously predict the correlation between all diseases and miRNAs in the absence of negative samples. Experimental results reveal that this method is better than existing approaches in terms of overall prediction accuracy and the ability to predict orphan diseases and novel miRNAs. A case study applying GSTRW to breast cancer and colon cancer is also conducted, and the majority of the predicted miRNA-disease associations can be verified by our experiment. This study indicates that this method is feasible and effective.
|
13
|
Chen M, Peng Y, Li A, Li Z, Deng Y, Liu W, Liao B, Dai C. A novel information diffusion method based on network consistency for identifying disease related microRNAs. RSC Adv 2018; 8:36675-36690. [PMID: 35558942 PMCID: PMC9088870 DOI: 10.1039/c8ra07519k] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2018] [Accepted: 10/17/2018] [Indexed: 12/27/2022] Open
Abstract
The abnormal expression of miRNAs is directly related to the development of human diseases. Predicting the potential candidate miRNAs associated with diseases can contribute to the detection, diagnosis, treatment and prevention of complex human diseases. Computational inference of the relationships between miRNAs and diseases is an effective supplement to biological experiments and is of great help in the prevention, treatment and prognosis of complex diseases. This paper proposes a novel information diffusion method based on network consistency (IDNC) for identifying disease related microRNAs. The model first combines the miRNA family information and the miRNA functional similarity to reconstruct the miRNA network, and reconstructs the disease network by using the known disease and miRNA-related information and the semantic score between diseases. Then the global similarity of the two networks is obtained by using the Laplacian score of graphs. The global similarity score is a measure of the similarity between diseases and miRNAs. The disease–miRNA relation network is reconstructed by integrating the global similarity relation. The network consistency diffusion seed is then obtained by combining the global similarity network with the reconstructed disease–miRNA association network. Thereafter, the stable diffusion spectrum is generated as the prediction score by using the random walk with restart algorithm. The AUC value obtained by performing LOOCV on the gold benchmark dataset is 0.8814, and the AUC value obtained by performing LOOCV on the predictive dataset is 0.9512. Compared with other state-of-the-art methods, our method has higher accuracy, and case studies of breast neoplasms and colon neoplasms further illustrate that IDNC is valuable.
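The abstract names a random walk with restart as the step that produces the final prediction scores but does not state the update rule. A minimal sketch of the generic algorithm, not the authors' IDNC implementation, is given here under stated assumptions: the walk uses the common iteration p_next = (1 - r) * W * p + r * p0 on a column-normalized adjacency matrix W, and the function names, the restart probability of 0.5, and the four-node toy graph are all hypothetical.

```python
def column_normalize(adj):
    """Scale each column of a square adjacency matrix to sum to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing.

    adj     : square adjacency matrix as a list of lists (assumed symmetric here)
    seeds   : indices of the seed nodes that the walker restarts from
    restart : probability of jumping back to a seed at each step (assumed value)
    """
    n = len(adj)
    w = column_normalize(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a path 0 - 1 - 2 - 3 with node 0 as the only seed.
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_adj, seeds=[0])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

In a miRNA-disease setting the seeds would be the miRNAs already known to be associated with a disease, and the stationary scores would rank the remaining candidates; how IDNC constructs the network and the diffusion seed is described only at the level of the abstract above.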
Affiliation(s)
- Min Chen
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
  - College of Information Science and Engineering
- Yan Peng
  - College of International Communication, Hunan Institute of Technology, 421002 Hengyang, China
- Ang Li
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
- Zejun Li
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
  - College of Information Science and Engineering
- Yingwei Deng
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
- Wenhua Liu
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
- Bo Liao
  - College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Chengqiu Dai
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
|