1
Md Zaki FA, Mohamad Hanif EA. Identifying miRNA as biomarker for breast cancer subtyping using association rule. Comput Biol Med 2024; 178:108696. [PMID: 38850957 DOI: 10.1016/j.compbiomed.2024.108696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 05/03/2024] [Accepted: 06/01/2024] [Indexed: 06/10/2024]
Abstract
This paper presents a comprehensive study focused on breast cancer subtyping, utilizing a multifaceted approach that integrates feature selection, machine learning classifiers, and miRNA regulatory networks. The feature selection process begins with the CFS algorithm, followed by the Apriori algorithm for association rule generation, resulting in the identification of significant features tailored to Luminal A, Luminal B, HER-2 enriched, and Basal-like subtypes. The subsequent application of Random Forest (RF) and Support Vector Machine (SVM) classifiers yielded promising results, with the SVM model achieving an overall accuracy of 76.60% and the RF model demonstrating robust performance at 80.85%. Detailed accuracy metrics revealed strengths and areas for refinement, emphasizing the potential for optimizing subtype-specific recall. To explore the regulatory landscape in depth, an analysis of selected miRNAs was conducted using MIENTURNET, a tool for visualizing miRNA-target interactions. While FDR analysis raised concerns for HER-2 and Basal-like subtypes, Luminal A and Luminal B subtypes showcased significant miRNA-gene interactions. Functional enrichment analysis for Luminal A highlighted the role of Ovarian steroidogenesis, implicating specific miRNAs such as hsa-let-7c-5p and hsa-miR-125b-5p as potential diagnostic biomarkers and regulators of Luminal A breast cancer. Luminal B analysis uncovered associations with the MAPK signaling pathway, with miRNAs like hsa-miR-203a-3p and hsa-miR-19a-3p exhibiting potential diagnostic and therapeutic significance. In conclusion, this integrative approach combines machine learning techniques with miRNA analysis to provide a holistic understanding of breast cancer subtypes. The identified miRNAs and associated pathways offer insights into potential diagnostic biomarkers and therapeutic targets, contributing to the ongoing efforts to improve breast cancer diagnostics and personalized treatment strategies.
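As a rough illustration of the association-rule step mentioned in this abstract, the following minimal sketch runs an Apriori-style level-wise search on toy data and derives rules that point to a subtype. The item labels (`miR-A=high` and so on), the thresholds, and the function names are hypothetical and are not taken from the paper; the sketch checks candidate support directly rather than applying the full Apriori subset-pruning step.

```python
from itertools import combinations

# Toy transactions: each sample is a set of discretized items (hypothetical labels).
samples = [
    {"miR-A=high", "miR-B=low", "subtype=LumA"},
    {"miR-A=high", "miR-B=low", "subtype=LumA"},
    {"miR-A=high", "miR-B=high", "subtype=LumB"},
    {"miR-A=low", "miR-B=high", "subtype=LumB"},
    {"miR-A=low", "miR-B=low", "subtype=Basal"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: larger itemsets are built only from frequent smaller ones."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    frequent = {}
    while current:
        for s in current:
            frequent[s] = support(s, transactions)
        k = len(current[0]) + 1
        # Join pairs of frequent k-1 itemsets into candidates of size k.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        current = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent

def rules_for(consequent, frequent, transactions, min_conf):
    """Rules of the form antecedent -> consequent with confidence >= min_conf."""
    rules = []
    for itemset, supp in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - {consequent}
            conf = supp / support(antecedent, transactions)
            if conf >= min_conf:
                rules.append((tuple(sorted(antecedent)), conf, supp))
    return sorted(rules)

freq = frequent_itemsets(samples, min_support=0.4)
for antecedent, conf, supp in rules_for("subtype=LumA", freq, samples, min_conf=0.6):
    print(antecedent, "-> subtype=LumA", f"confidence={conf:.2f}", f"support={supp:.2f}")
```

Items that appear in high-confidence rules for a given subtype would then be candidate features for that subtype; how the authors actually discretized expression values and ranked rules is not stated in the abstract.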
Affiliation(s)
- Fatimah Audah Md Zaki
- Department of Internet Engineering & Computer Science, Universiti Tunku Abdul Rahman (UTAR), Selangor, Malaysia.
- Ezanee Azlina Mohamad Hanif
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia Medical Centre, Kuala Lumpur, Malaysia.
2
Li J, Xiang S, Song X. Screening Nonlinear miRNA Features of Breast Cancer by Using Ensemble Regularized Polynomial Logistic Regression. J Comput Biol 2024; 31:670-690. [PMID: 39017171 DOI: 10.1089/cmb.2023.0289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024] Open
Abstract
Differentiating breast cancer subtypes based on miRNA data helps doctors provide more personalized treatment plans for patients. This paper explored the interaction between miRNA pairs and developed a novel ensemble regularized polynomial logistic regression method for screening nonlinear features of breast cancer. Three different types of second-order polynomial logistic regression with elastic net penalty (SOPLR-EN), each type containing 10 identical models, were integrated to determine the most suitable sample set for feature screening by using a bootstrap sampling strategy. A single feature and 39 nonlinear features were obtained by screening features that appeared at least 15 times in 30 integrations and were involved in the classification of at least 4 subtypes. The second-order polynomial logistic regression with ridge penalty (SOPLR-R) built on the screened feature set achieved 82.30% classification accuracy for distinguishing breast cancer subtypes, surpassing the performance of six other methods. Further, 11 nonlinear miRNA biomarkers were identified, and their significant relevance to breast cancer was illustrated through six types of biological analysis.
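The screening rule quoted in this abstract (a feature is kept if it appears at least 15 times in 30 integrations and is involved in classifying at least 4 subtypes) can be sketched as a simple counting step. The input format, the feature names, the subtype labels, and the choice to pool subtypes across runs are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter, defaultdict

def screen_features(runs, min_runs=15, min_subtypes=4):
    """Keep features selected in enough runs and across enough subtypes.

    runs: list of dicts mapping a feature name to the set of subtypes for
    which that feature had a nonzero coefficient in one fitted model
    (hypothetical format). Subtypes are pooled across runs, an assumption.
    """
    run_counts = Counter()
    subtypes_seen = defaultdict(set)
    for selected in runs:
        for feature, subtypes in selected.items():
            if subtypes:
                run_counts[feature] += 1
                subtypes_seen[feature] |= set(subtypes)
    return sorted(
        f for f, n in run_counts.items()
        if n >= min_runs and len(subtypes_seen[f]) >= min_subtypes
    )

# Toy selection records for 30 fitted models (made-up features and subtypes).
all_subtypes = {"LumA", "LumB", "HER2", "Basal", "Normal-like"}
runs = []
for i in range(30):
    selected = {"miR-1*miR-2": set(all_subtypes)}               # selected in all 30 runs
    if i % 3 == 0:
        selected["miR-3"] = {"LumA", "LumB", "HER2", "Basal"}   # only 10 runs
    if i < 20:
        selected["miR-4*miR-5"] = {"LumA", "LumB"}              # 20 runs, 2 subtypes
    runs.append(selected)

print(screen_features(runs))
```

Only the pairwise feature that passes both thresholds survives; the other two fail on run count and on subtype coverage respectively.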
Affiliation(s)
- Juntao Li
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, China
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, Xinxiang, China
- Shan Xiang
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, China
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, Xinxiang, China
- Xuekun Song
- College of Information Technology, Henan University of Chinese Medicine, Zhengzhou, China
3
Trasierras AM, Luna JM, Ventura S. A contrast set mining based approach for cancer subtype analysis. Artif Intell Med 2023; 143:102590. [PMID: 37673572 DOI: 10.1016/j.artmed.2023.102590] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/10/2022] [Revised: 05/24/2023] [Accepted: 05/30/2023] [Indexed: 09/08/2023]
Abstract
The task of detecting common and unique characteristics among different cancer subtypes is an important focus of research that aims to improve personalized therapies. Unlike current approaches mainly based on predictive techniques, our study aims to improve the knowledge about the molecular mechanisms that descriptively led to cancer, thus not requiring previous knowledge to be validated. Here, we propose an approach based on contrast set mining to capture high-order relationships in cancer transcriptomic data. In this way, we were able to extract valuable insights from several cancer subtypes in the form of highly specific genetic relationships related to functional pathways affected by the disease. To this end, we have divided several cancer gene expression databases by the subtype associated with each sample to detect which gene groups are related to each cancer subtype. To demonstrate the potential and usefulness of the proposed approach, we have extensively analysed RNA-Seq gene expression data from breast, kidney, and colon cancer subtypes. The possible role of the obtained genetic relationships was further evaluated through extensive literature research, while their prognostic value was assessed via survival analysis, finding gene expression patterns related to survival in various cancer subtypes. Some gene associations were described in the literature as potential cancer biomarkers, while other results have not yet been described and could be a starting point for future research.
Affiliation(s)
- A M Trasierras
- Department of Computer Science and Numerical Analysis, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), Spain; Maimonides Biomedical Research Institute of Cordoba, IMIBIC, University of Cordoba, Córdoba, 14071, Spain; Phytoplant Research S.L.U, Departamento Tecnología y Control, Rabanales 21-Parque Científico Tecnológico de Córdoba, Calle Astrónoma Cecilia Payne, Córdoba, Spain
- J M Luna
- Department of Computer Science and Numerical Analysis, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), Spain; Maimonides Biomedical Research Institute of Cordoba, IMIBIC, University of Cordoba, Córdoba, 14071, Spain
- S Ventura
- Department of Computer Science and Numerical Analysis, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), Spain; Maimonides Biomedical Research Institute of Cordoba, IMIBIC, University of Cordoba, Córdoba, 14071, Spain.
4
Fu Q, Li Q, Li X. An improved multi-objective marine predator algorithm for gene selection in classification of cancer microarray data. Comput Biol Med 2023; 160:107020. [PMID: 37196457 DOI: 10.1016/j.compbiomed.2023.107020] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/15/2023] [Revised: 04/09/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Gene selection (GS) is an important branch of interest within the field of feature selection, which is widely used in cancer classification. It provides essential insights into the pathogenesis of cancer and enables a deeper understanding of cancer data. In cancer classification, GS is essentially a multi-objective optimization problem, which aims to simultaneously optimize the two objectives of classification accuracy and the size of the gene subset. The marine predator algorithm (MPA) has been successfully employed in practical applications; however, its random initialization can lead to blindness, which may adversely affect the convergence of the algorithm. Furthermore, the elite individuals guiding evolution are randomly chosen from the Pareto solutions, which may degrade the good exploration performance of the population. To overcome these limitations, a multi-objective improved MPA with continuous mapping initialization and leader selection strategies is proposed. In this work, a new continuous mapping initialization with ReliefF overcomes the defects caused by limited information in late evolution. Moreover, an improved elite selection mechanism with Gaussian distribution guides the population to evolve towards a better Pareto front. Finally, an efficient mutation method is adopted to prevent evolutionary stagnation. To evaluate its effectiveness, the proposed algorithm was compared with 9 well-known algorithms. The experimental results on 16 datasets demonstrate that the proposed algorithm can significantly reduce the data dimension and obtain the highest classification accuracy on most high-dimensional cancer microarray datasets.
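The two objectives named in this abstract, maximizing classification accuracy and minimizing gene-subset size, define a Pareto front over candidate subsets. The sketch below shows only the generic non-dominated filter, under the assumption that each candidate has already been scored as an `(accuracy, n_genes)` pair; it does not implement the marine predator algorithm or the proposed initialization, elite selection, or mutation strategies, and the numbers are made up.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on both objectives
    (higher accuracy, fewer genes) and strictly better on at least one."""
    acc_a, n_a = a
    acc_b, n_b = b
    return acc_a >= acc_b and n_a <= n_b and (acc_a > acc_b or n_a < n_b)

def pareto_front(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical (accuracy, number of selected genes) pairs.
candidates = [(0.90, 50), (0.92, 80), (0.88, 20), (0.85, 30), (0.92, 120), (0.80, 10)]
front = sorted(pareto_front(candidates), key=lambda s: s[1])
print(front)
```

Here `(0.85, 30)` is dropped because `(0.88, 20)` is both more accurate and smaller, and `(0.92, 120)` is dropped because `(0.92, 80)` matches its accuracy with fewer genes.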
Affiliation(s)
- Qiyong Fu
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Qi Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Xiaobo Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China.
5
Daneshvar NHN, Masoudi-Sobhanzadeh Y, Omidi Y. A voting-based machine learning approach for classifying biological and clinical datasets. BMC Bioinformatics 2023; 24:140. [PMID: 37041456 PMCID: PMC10088226 DOI: 10.1186/s12859-023-05274-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 04/05/2023] [Indexed: 04/13/2023] Open
Abstract
BACKGROUND Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicality of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations such as overfitting on a specific dataset, ignoring the feature selection concept in the preprocessing step, and losing their performance on large-size datasets. To tackle the mentioned restrictions, in this study, we introduced a machine learning framework consisting of two main steps. First, our previously suggested optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with the prior methods. RESULTS The results demonstrated that the Trader algorithm could select a near-optimal subset of features with a significant level of p-value < 0.01 relative to the compared algorithms. Additionally, on the large-size datasets, the proposed machine learning framework improved prior studies by ~10% in terms of the mean values associated with fivefold cross-validation of accuracy, precision, recall, specificity, and F-measure. CONCLUSION Based on the obtained results, it can be concluded that a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers in designing practical diagnosis health care systems and offering effective treatment plans.
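The abstract does not say how the voting-based framework combines its base classifiers, so the following is only a generic hard (majority) voting sketch. The three prediction lists, the labels, and the tie-breaking rule are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote.

    Ties are broken in favour of the earliest classifier whose label is among
    the tied ones, an arbitrary choice made for this sketch.
    """
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted

# Hypothetical predictions from three base classifiers on four samples.
clf_a = ["tumor", "normal", "tumor", "normal"]
clf_b = ["tumor", "tumor", "normal", "normal"]
clf_c = ["normal", "tumor", "tumor", "normal"]

print(majority_vote([clf_a, clf_b, clf_c]))
```

Weighted or probability-based (soft) voting would replace the `Counter` tally with a weighted sum of scores.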
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
- Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Florida, 33328, USA.
6
Manzo G, Pannatier Y, Duflot P, Kolh P, Chavez M, Bleret V, Calvaresi D, Jimenez-Del-Toro O, Schumacher M, Calbimonte JP. Breast cancer survival analysis agents for clinical decision support. Computer Methods and Programs in Biomedicine 2023; 231:107373. [PMID: 36720187 DOI: 10.1016/j.cmpb.2023.107373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 12/31/2022] [Accepted: 01/23/2023] [Indexed: 06/18/2023]
Abstract
Personalized support and assistance are essential for cancer survivors, given the physical and psychological consequences they have to suffer after all the treatments and conditions associated with this illness. Digital assistive technologies have proved to be effective in enhancing the quality of life of cancer survivors, for instance, through physical exercise monitoring and recommendation or emotional support and prediction. To maximize the efficacy of these techniques, it is challenging to develop accurate models of patient trajectories, which are typically fed with information acquired from retrospective datasets. This paper presents a Machine Learning-based survival model embedded in a clinical decision system architecture for predicting cancer survivors' trajectories. The proposed architecture of the system, named PERSIST, integrates the enrichment and pre-processing of clinical datasets coming from different sources and the development of clinical decision support modules. Moreover, the model includes detecting high-risk markers, which have been evaluated in terms of performance using both a third-party dataset of breast cancer patients and a retrospective dataset collected in the context of the PERSIST clinical study.
Affiliation(s)
- Gaetano Manzo
- University of Applied Sciences and Arts Western Switzerland (HES-SO), Switzerland; National Institutes of Health (NIH), Bethesda, MD, USA.
- Yvan Pannatier
- University of Applied Sciences and Arts Western Switzerland (HES-SO), Switzerland
- Patrick Duflot
- CHU of Liege, Department of Information System Management, Belgium
- Philippe Kolh
- CHU of Liege, Department of Information System Management, Belgium
- Marcela Chavez
- CHU of Liege, Department of Information System Management, Belgium
- Davide Calvaresi
- University of Applied Sciences and Arts Western Switzerland (HES-SO), Switzerland
- Michael Schumacher
- University of Applied Sciences and Arts Western Switzerland (HES-SO), Switzerland
- Jean-Paul Calbimonte
- University of Applied Sciences and Arts Western Switzerland (HES-SO), Switzerland; The Sense Innovation and Research Center, Lausanne and Sion, Switzerland
7
Braik M. Enhanced Ali Baba and the forty thieves algorithm for feature selection. Neural Comput Appl 2023; 35:6153-6184. [PMID: 36408290 PMCID: PMC9666985 DOI: 10.1007/s00521-022-08015-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 10/26/2022] [Indexed: 11/16/2022]
Abstract
Feature Selection (FS) aims to improve the classification rate of dataset models by selecting only a small set of appropriate features from the initial range of features. Consequently, a reliable optimization method is needed to deal with the matters involved in this problem. Often, traditional methods fail to optimally reduce the high dimensionality of the feature space of complex datasets, which leads to weak classification models. Meta-heuristics can offer a favorable classification rate for high-dimensional datasets. Here, a binary version of a new human-based algorithm named Ali Baba and the Forty Thieves (AFT) was applied to tackle a pool of FS problems. Although AFT is an efficient meta-heuristic for optimizing many problems, it sometimes exhibits premature convergence and low search performance. These issues were mitigated by proposing three enhanced versions of AFT, namely: (1) a Binary Multi-layered AFT (BMAFT), which uses hierarchical and distributed frameworks, (2) a Binary Elitist AFT (BEAFT), which uses an elitist learning strategy, and (3) a Binary Self-adaptive AFT (BSAFT), which uses an adapted tracking distance parameter. These versions, along with the basic Binary AFT (BAFT), were extensively assessed on twenty-four problems gathered from different repositories. The results showed that the proposed algorithms substantially enhance the performance of BAFT in terms of convergence speed and solution accuracy. In addition, the overall results showed that BMAFT is the most competitive, providing the best results with excellent performance scores compared to other competing algorithms.
Affiliation(s)
- Malik Braik
- Department of Computer Science, Al-Balqa Applied University, Salt, Jordan
8
Pandiyan S, Wang L. A comprehensive review on recent approaches for cancer drug discovery associated with artificial intelligence. Comput Biol Med 2022; 150:106140. [PMID: 36179510 DOI: 10.1016/j.compbiomed.2022.106140] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/23/2022] [Revised: 07/20/2022] [Accepted: 09/18/2022] [Indexed: 11/03/2022]
Abstract
Through the revolution in artificial intelligence (AI) technologies in clinical research, significant improvement has been observed in the diagnosis of cancer. Utilization of these AI technologies, such as machine and deep learning, is imperative for the discovery of novel anticancer drugs and improves existing/ongoing cancer therapeutics. However, building a model for complicated cancers and their types remains a challenge due to a lack of effective therapeutics, which hinders the establishment of effective computational tools. In this review, we examine recent approaches and the state of the art in implementing AI methods for anticancer drug discovery, and discuss how advances in these applications need to be considered in current cancer therapeutics. Considering the immense potential of AI, we explore molecular docking and the associated interactions to recognize metabolic activities that support drug design. Finally, we highlight corresponding strategies in applying machine and deep learning methods to various types of cancer with their pros and cons.
Affiliation(s)
- Sanjeevi Pandiyan
- Research Center for Intelligent Information Technology, Nantong University, Nantong, China; School of Information Science and Technology, Nantong University, Nantong, China; Nantong Research Institute for Advanced Communication Technologies, Nantong, China
- Li Wang
- Research Center for Intelligent Information Technology, Nantong University, Nantong, China; School of Information Science and Technology, Nantong University, Nantong, China; Nantong Research Institute for Advanced Communication Technologies, Nantong, China.
9
Yu L, Ju B, Ren S. HLGNN-MDA: Heuristic Learning Based on Graph Neural Networks for miRNA-Disease Association Prediction. Int J Mol Sci 2022; 23:13155. [PMID: 36361945 PMCID: PMC9657597 DOI: 10.3390/ijms232113155] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/16/2022] [Revised: 10/23/2022] [Accepted: 10/26/2022] [Indexed: 01/12/2024] Open
Abstract
Identifying disease-related miRNAs can improve the understanding of complex diseases. However, experimentally finding the association between miRNAs and diseases is expensive in terms of time and resources. The computational screening of reliable miRNA-disease associations has thus become a necessary tool to guide biological experiments. "Similar miRNAs will be associated with the same disease" is the assumption on which most current miRNA-disease association prediction methods rely; however, biased prior knowledge, and incomplete and inaccurate miRNA similarity data and disease similarity data limit the performance of the model. Here, we propose heuristic learning based on graph neural networks to predict microRNA-disease associations (HLGNN-MDA). We learn the local graph topology features of the predicted miRNA-disease node pairs using graph neural networks. In particular, our improvements to the graph convolution layer of the graph neural network enable it to learn information among homogeneous nodes and among heterogeneous nodes. We illustrate the performance of HLGNN-MDA by performing tenfold cross-validation against excellent baseline models. The results show that we have promising performance in multiple metrics. We also focus on the role of the improvements to the graph convolution layer in the model. The case studies are supported by evidence on breast cancer, hepatocellular carcinoma and renal cell carcinoma. Given the above, the experiments demonstrate that HLGNN-MDA can serve as a reliable method to identify novel miRNA-disease associations.
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China
10
Li J, Zhang H, Gao F. Identification of miRNA biomarkers for breast cancer by combining ensemble regularized multinomial logistic regression and Cox regression. BMC Bioinformatics 2022; 23:434. [PMID: 36258162 PMCID: PMC9580207 DOI: 10.1186/s12859-022-04982-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/24/2022] [Accepted: 10/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Breast cancer is one of the most common cancers in women. It is necessary to classify breast cancer subtypes because different subtypes need specific treatment. Identifying biomarkers and classifying breast cancer subtypes is essential for developing appropriate treatment methods for patients. MiRNAs can be easily detected in tumor biopsy and play an inhibitory or promoting role in breast cancer, and are therefore considered promising biomarkers for distinguishing subtypes. RESULTS A new method combining ensemble regularized multinomial logistic regression and Cox regression was proposed for identifying miRNA biomarkers in breast cancer. After adopting stratified sampling and bootstrap sampling, the most suitable sample subset for miRNA feature screening was determined via an ensemble of 100 regularized multinomial logistic regression models. 124 miRNAs that participated in the classification of at least 3 subtypes and appeared at least 50 times in 100 integrations were screened as features. 22 miRNAs from the proposed feature set were further identified as the biomarkers for breast cancer by using Cox regression based on survival analysis. The accuracy of 5 methods on the proposed feature set was significantly higher than on the other two feature sets. The results of 7 biological analyses illustrated the rationality of the identified biomarkers. CONCLUSIONS The screened features can better distinguish breast cancer subtypes. Notably, the genes and proteins related to the proposed 22 miRNAs were considered oncogenes or inhibitors of breast cancer. 9 of the 22 miRNAs have been proved to be markers of breast cancer. Therefore, our results can be considered in future related research.
Affiliation(s)
- Juntao Li
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, China
- Hongmei Zhang
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, China.
- Fugen Gao
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, China
11
Wong JYY, Imani P, Grigoryan H, Bassig BA, Dai Y, Hu W, Blechter B, Rahman ML, Ji BT, Duan H, Niu Y, Ye M, Jia X, Meng T, Bin P, Downward G, Meliefste K, Leng S, Fu W, Yang J, Ren D, Xu J, Zhou B, Hosgood HD, Vermeulen R, Zheng Y, Silverman DT, Rothman N, Rappaport SM, Lan Q. Exposure to diesel engine exhaust and alterations to the Cys34/Lys525 adductome of human serum albumin. Environmental Toxicology and Pharmacology 2022; 95:103966. [PMID: 36067935 PMCID: PMC9757949 DOI: 10.1016/j.etap.2022.103966] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/23/2022] [Revised: 08/28/2022] [Accepted: 08/29/2022] [Indexed: 06/15/2023]
Abstract
We investigated whether exposure to carcinogenic diesel engine exhaust (DEE) was associated with altered adduct levels in human serum albumin (HSA) residues. Nano-liquid chromatography-high resolution mass spectrometry (nLC-HRMS) was used to measure adducts of Cys34 and Lys525 residues in plasma samples from 54 diesel engine factory workers and 55 unexposed controls. An untargeted adductomics and bioinformatics pipeline was used to find signatures of Cys34/Lys525 adductome modifications. To identify adducts that were altered between DEE-exposed and unexposed participants, we used an ensemble feature selection approach that ranks and combines findings from linear regression and penalized logistic regression, then aggregates the important findings with those determined by random forest. We detected 40 Cys34 and 9 Lys525 adducts. Among these findings, we found evidence that 6 Cys34 adducts were altered between DEE-exposed and unexposed participants (i.e., 841.75, 851.76, 856.10, 860.77, 870.43, and 913.45). These adducts were biologically related to antioxidant activity.
Affiliation(s)
- Jason Y Y Wong
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA.
- Partow Imani
- School of Public Health, University of California, Berkeley, CA, USA
- Hasmik Grigoryan
- School of Public Health, University of California, Berkeley, CA, USA
- Bryan A Bassig
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Yufei Dai
- National Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China
- Wei Hu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Batel Blechter
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Mohammad L Rahman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Bu-Tian Ji
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Huawei Duan
- National Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China
- Yong Niu
- National Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China
- Meng Ye
- National Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China
- Xiaowei Jia
- National Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China
- Tao Meng
- National Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China
- Ping Bin
- National Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China
- George Downward
- Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- Kees Meliefste
- Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- Shuguang Leng
- Cancer Control and Population Sciences, University of New Mexico Comprehensive Cancer Center, Albuquerque, NM, USA; Division of Epidemiology, Biostatistics, and Preventive Medicine, Department of Internal Medicine, University of New Mexico School of Medicine, University of New Mexico Comprehensive Cancer Center, Albuquerque, NM, USA
- Wei Fu
- Chaoyang Center for Disease Control and Prevention, Chaoyang, Liaoning, China
- Jufang Yang
- Chaoyang Center for Disease Control and Prevention, Chaoyang, Liaoning, China
- Dianzhi Ren
- Chaoyang Center for Disease Control and Prevention, Chaoyang, Liaoning, China
- Jun Xu
- School of Public Health, The University of Hong Kong, Hong Kong Special Administrative Region
- Baosen Zhou
- China Medical University, Shenyang, Liaoning, China
- H Dean Hosgood
- Division of Epidemiology, Albert Einstein College of Medicine, New York, NY, USA
- Roel Vermeulen
- Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- Yuxin Zheng
- School of Public Health, Qingdao University, Qingdao, China
- Debra T Silverman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Nathaniel Rothman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Qing Lan
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
12
P D, C G. A systematic review on machine learning and deep learning techniques in cancer survival prediction. Progress in Biophysics and Molecular Biology 2022; 174:62-71. [PMID: 35933043 DOI: 10.1016/j.pbiomolbio.2022.07.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/07/2022] [Revised: 07/13/2022] [Accepted: 07/19/2022] [Indexed: 06/15/2023]
Abstract
Cancer is a disease characterised by the unusual and uncontrollable growth of body cells. This usually happens asymptomatically and spreads to other parts of the body. The major problem in treating cancer is that its progress is not monitored once it is diagnosed. The progress, or prognosis, can be assessed through survival analysis. Survival analysis is the branch of statistics that deals with predicting the time until an event occurs. In the case of cancer prognosis, the event is the survival time of the patient from the onset of the disease, or it can be the recurrence of the disease after undergoing a treatment. This study aims to bring out the machine learning and deep learning models involved in providing a prognosis for cancer patients.
Affiliation(s)
- Deepa P
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
| | - Gunavathi C
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
| |
|
13
|
Sheidaei A, Foroushani AR, Gohari K, Zeraati H. A novel dynamic Bayesian network approach for data mining and survival data analysis. BMC Med Inform Decis Mak 2022; 22:251. [PMID: 36138394 PMCID: PMC9503243 DOI: 10.1186/s12911-022-02000-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 09/19/2022] [Indexed: 11/30/2022] Open
Abstract
Background Censorship is the primary challenge in survival modeling, especially in human health studies. Classical methods are limited either in their applications, like Kaplan–Meier, or by restrictive assumptions, like the Cox regression model. On the other hand, machine learning algorithms commonly rely on the high dimensionality of data and ignore the censorship attribute. In addition, these algorithms are more difficult to understand and use. We propose a novel approach based on the Bayesian network to address these issues. Methods We proposed a two-slice temporal Bayesian network model for survival data, introducing the survival and censorship status at each observed time as the dynamic states. A score-based algorithm learned the structure of the directed acyclic graph, and the parameters were learned by the likelihood approach. We conducted a simulation study to assess the performance of our model in comparison with Kaplan–Meier and Cox proportional hazards regression. We defined various scenarios according to the sample size, censoring rate, and shapes of the survival and censoring distributions across time. Finally, we fit the model to a real-world dataset of 760 patients who underwent gastrectomy for gastric cancer. The model was validated using the hold-out technique based on the posterior classification error. The performance of our survival model was compared with that of the Kaplan–Meier and Cox proportional hazards models. Results The simulation study shows the superiority of the dynamic Bayesian network (DBN) in bias reduction for many scenarios compared with Cox regression and Kaplan–Meier, especially at late survival times. In the real-world data, the structure of the dynamic Bayesian network model agreed with the findings from the classical Kaplan–Meier and Cox regression approaches. The posterior classification error from the validation did not exceed 0.04, indicating that our network predicted the state variables with more than 96% accuracy.
Conclusions Our proposed dynamic Bayesian network model could be used as a data mining technique in the context of survival data analysis. The advantages of this approach are feature selection ability, straightforward interpretation, handling of high-dimensional data, and few assumptions. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-022-02000-7.
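The Kaplan–Meier estimator used as a baseline in the abstract above can be sketched in a few lines. This is a minimal illustration of the standard product-limit formula, not the authors' code; the function name `kaplan_meier` and the toy data are assumptions made for the example.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- observed follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was right-censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # subjects leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: five subjects, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects never trigger a drop in the curve, but they do shrink the risk set for later event times, which is how the estimator accounts for censoring.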
Affiliation(s)
- Ali Sheidaei
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Pour Sina St., Keshavarz Blvd., Tehran, 14176-13151, Iran
| | - Abbas Rahimi Foroushani
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Pour Sina St., Keshavarz Blvd., Tehran, 14176-13151, Iran
| | - Kimiya Gohari
- Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Hojjat Zeraati
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Pour Sina St., Keshavarz Blvd., Tehran, 14176-13151, Iran.
| |
|
14
|
Sarkar S, Mali K. Breast Cancer Subtypes Classification with Hybrid Machine Learning Model. Methods Inf Med 2022; 61:68-83. [PMID: 36096144 DOI: 10.1055/s-0042-1751043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Abstract
BACKGROUND Breast cancer is the most prevalent heterogeneous disease among females, characterized by distinct molecular subtypes and varied clinicopathological features. With the emergence of various artificial intelligence techniques, especially machine learning, breast cancer research has attained new heights in cancer detection and prognosis. OBJECTIVE Recent developments in computer-driven diagnostic systems have enabled clinicians to improve the accuracy of detecting various types of breast tumors. Our study aims to develop such a computer-driven diagnostic system. METHODS In this article, we propose a breast cancer classification model based on the hybridization of machine learning approaches for classifying triple-negative breast cancer and non-triple-negative breast cancer patients with clinicopathological features collected from multiple tertiary care hospitals/centers. RESULTS The results of the genetic algorithm and support vector machine (GA-SVM) hybrid model were compared with those of classic feature-selection SVM hybrid models such as support vector machine-recursive feature elimination (SVM-RFE), LASSO-SVM, Grid-SVM, and linear SVM. The GA-SVM hybrid model outperformed the other compared models when applied to two distinct hospital-based datasets of patients investigated for breast cancer in the North West of the African subcontinent. To validate the predictive model accuracy, the 10-fold cross-validation method was applied to all models with the same multicentered datasets. The model performance was evaluated with well-known metrics such as mean squared error, logarithmic loss, F1-score, area under the ROC curve, and the precision-recall curve. CONCLUSION The hybrid machine learning model can be employed for breast cancer subtype classification, which could help medical practitioners in treatment planning and improve disease outcome.
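The 10-fold cross-validation mentioned in the abstract above can be illustrated with a small index splitter. This is a generic sketch under stated assumptions, not the authors' pipeline: the function name `k_fold_indices`, the shuffling seed, and the sample count are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once, then dealt round-robin into k folds so that
    every sample appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for idx in fold
        ]
        yield train_idx, test_idx


# Hypothetical usage: 25 samples, 10 folds.
for train_idx, test_idx in k_fold_indices(25, k=10):
    pass  # fit a model on train_idx, evaluate it on test_idx
```

Each model being compared would be trained and scored on the same folds, so that differences in the averaged metric reflect the models rather than the split.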
Affiliation(s)
- Suvobrata Sarkar
- Department of Computer Science and Engineering, Dr. B.C. Roy Engineering College, Durgapur, West Bengal, India
| | - Kalyani Mali
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
| |
|
15
|
A Comparative Analysis of Swarm Intelligence and Evolutionary Algorithms for Feature Selection in SVM-Based Hyperspectral Image Classification. REMOTE SENSING 2022. [DOI: 10.3390/rs14133019] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
Feature selection (FS) is vital in hyperspectral image (HSI) classification. It is an NP-hard problem, and Swarm Intelligence and Evolutionary Algorithms (SIEAs) have proved effective in solving it. However, the high dimensionality of HSIs still leads to the inefficient operation of SIEAs. In addition, many SIEAs exist, but few studies have conducted a comparative analysis of them for HSI FS. Thus, our study has two goals: (1) to propose a new filter–wrapper (F–W) framework that can improve the SIEAs’ performance; and (2) to apply ten SIEAs under the F–W framework (F–W–SIEAs) to optimize the support vector machine (SVM) and compare their performance concerning five aspects, namely the accuracy, the number of selected bands, the convergence rate, and the relative runtime. Based on three HSIs (i.e., Indian Pines, Salinas, and Kennedy Space Center (KSC)), we demonstrate how the proposed framework helps improve these SIEAs’ performance. The ten algorithms differ across the five aspects, but some have similar optimization capacities. On average, the F–W–Genetic Algorithm (F–W–GA) and F–W–Grey Wolf Optimizer (F–W–GWO) have the strongest optimization abilities, while the F–W–GWO requires the least runtime among the ten. The F–W–Marine Predators Algorithm (F–W–MPA) is second only to these two and slightly better than F–W–Differential Evolution (F–W–DE). The F–W–Ant Lion Optimizer (F–W–ALO), F–W–I-Ching Divination Evolutionary Algorithm (F–W–IDEA), and F–W–Whale Optimization Algorithm (F–W–WOA) have intermediate optimization abilities, and F–W–IDEA takes the most runtime. Moreover, the F–W–SIEAs outperform other commonly used FS techniques in accuracy overall, especially in complex scenes.
|
16
|
Wang Q, Duan M, Fan Y, Liu S, Ren Y, Huang L, Zhou F. Transforming OMIC features for classification using Siamese convolutional networks. J Bioinform Comput Biol 2022; 20:2250013. [DOI: 10.1142/s0219720022500135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
17
|
Feature Subset Selection with Optimal Adaptive Neuro-Fuzzy Systems for Bioinformatics Gene Expression Classification. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1698137. [PMID: 35607459 PMCID: PMC9124108 DOI: 10.1155/2022/1698137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 04/20/2022] [Accepted: 04/27/2022] [Indexed: 01/28/2023]
Abstract
Recently, bioinformatics and computational biology-enabled applications such as gene expression analysis, cellular restoration, medical image processing, protein structure examination, and medical data classification have used fuzzy systems to offer effective solutions and decisions. The latest developments in fuzzy systems combined with artificial intelligence techniques make it possible to design effective microarray gene expression classification models. In this context, this study introduces a novel feature subset selection with optimal adaptive neuro-fuzzy inference system (FSS-OANFIS) for gene expression classification. The major aim of the FSS-OANFIS model is to detect and classify gene expression data. To accomplish this, the FSS-OANFIS model uses an improved grey wolf optimizer-based feature selection (IGWO-FS) model to derive an optimal subset of features. In addition, the OANFIS model is employed for gene classification, and the parameters of the ANFIS model are tuned using the coyote optimization algorithm (COA). The application of the IGWO-FS and COA techniques helps to achieve enhanced microarray gene expression classification outcomes. The experimental validation of the FSS-OANFIS model was performed using the Leukemia, Prostate, DLBCL Stanford, and Colon Cancer datasets. The proposed FSS-OANFIS model achieved a maximum classification accuracy of 89.47%.
|
18
|
Cancer MiRNA biomarker classification based on Improved Generative Adversarial Network optimized with Mayfly Optimization Algorithm. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103545] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
|
19
|
da Costa NL, de Sá Alves M, de Sá Rodrigues N, Bandeira CM, Oliveira Alves MG, Mendes MA, Cesar Alves LA, Almeida JD, Barbosa R. Finding the combination of multiple biomarkers to diagnose oral squamous cell carcinoma - A data mining approach. Comput Biol Med 2022; 143:105296. [PMID: 35149458 DOI: 10.1016/j.compbiomed.2022.105296] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/22/2021] [Revised: 01/03/2022] [Accepted: 01/20/2022] [Indexed: 12/13/2022]
Abstract
Data mining has proven to be a reliable method to analyze and discover useful knowledge about various diseases, including cancer. In particular, the use of data mining and machine learning algorithms to study oral squamous cell carcinoma (OSCC), the most common form of oral cancer, is a new area of research. This malignant neoplasm can be studied using saliva samples. Saliva is an important biofluid that must be used to verify potential biomarkers associated with oral cancer. In this study, we first provide an overview of OSCC diagnoses based on machine learning and salivary metabolites. To our knowledge, this is the first study to apply advanced data mining techniques to diagnose OSCC. We then present new results from classification and feature selection algorithms used to identify potential salivary biomarkers of OSCC. To accomplish this task, we used the random forest importance filter feature selection algorithm and a wrapper methodology to evaluate the importance of metabolites obtained from gas chromatography-mass spectrometry (GC-MS) in differentiating the OSCC group from the control group. Salivary samples (n = 68) were collected for the control group, and the OSCC group were from patients matched for gender, age, and smoking habit. Classification was performed with the Random Forest (RF) algorithm along with 10-fold cross-validation. The results showed that glucuronic acid, maleic acid, and batyl alcohol can classify the samples with an area under the curve (AUC) of 0.91, versus an AUC of 0.76 using all 51 metabolites analyzed. The methodology used in this study can assist healthcare professionals and be adopted to discover diagnostic biomarkers for other diseases.
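The area under the ROC curve (AUC) reported above has a simple pairwise interpretation: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way; the function name `roc_auc` and the toy labels and scores are assumptions for illustration and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels -- 1 for the positive class (e.g. OSCC), 0 for controls
    scores -- classifier scores, higher meaning "more likely positive"
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 3 of the 4 positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The quadratic pair loop is fine for small cohorts such as the one described; larger datasets would normally use a rank-based formula instead.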
Affiliation(s)
- Nattane Luíza da Costa
- Informatics Nucleo, Goiano Federal Institute of Education, Science and Technology, Campus Urutaí, Urutaí-GO, Brazil.
| | - Mariana de Sá Alves
- Department of Biosciences and Oral Diagnosis, Institute of Science and Technology, São Paulo State University (Unesp), São José dos Campos, Brazil.
| | - Nayara de Sá Rodrigues
- Department of Biosciences and Oral Diagnosis, Institute of Science and Technology, São Paulo State University (Unesp), São José dos Campos, Brazil.
| | - Celso Muller Bandeira
- Department of Biosciences and Oral Diagnosis, Institute of Science and Technology, São Paulo State University (Unesp), São José dos Campos, Brazil.
| | - Mônica Ghislaine Oliveira Alves
- Technology Research Center (NPT), Universidade Mogi das Cruzes, Mogi das Cruzes, Brazil; School of Medicine, Anhembi Morumbi University, São José dos Campos, Brazil.
| | | | - Levy Anderson Cesar Alves
- School of Dentistry, Universidade Paulista, São Paulo, Brazil; School of Dentistry, Universidade Municipal de São Caetano do Sul, São Caetano do Sul, Brazil.
| | - Janete Dias Almeida
- Department of Biosciences and Oral Diagnosis, Institute of Science and Technology, São Paulo State University (Unesp), São José dos Campos, Brazil.
| | - Rommel Barbosa
- Instituto de Informática, Universidade Federal de Goiás, Goiânia-GO, Brazil.
| |
|
20
|
Zhou L, Wang H. A Combined Feature Screening Approach of Random Forest and Filter-based Methods for Ultra-high Dimensional Data. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220221120618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Various feature (variable) screening approaches have been proposed in the past decade to mitigate the impact of ultra-high dimensionality in classification and regression problems, including filter-based methods such as sure independence screening and wrapper-based methods such as random forest. However, the former type of method relies heavily on strong modelling assumptions, while the latter requires an adequate sample size to make the data speak for themselves. These requirements can seldom be met in biochemical studies, where we often have access only to ultra-high dimensional data with a complex structure and a small number of observations.
Objective:
In this research, we investigate the possibility of combining filter-based screening methods and random forest-based screening methods in the regression context.
Method:
We combined four state-of-the-art filter approaches from the statistical community, namely sure independence screening (SIS), robust rank correlation based screening (RRCS), high dimensional ordinary least squares projection (HOLP), and a model-free sure independence screening procedure based on the distance correlation (DCSIS), with a random forest-based Boruta screening method from the machine learning community for regression problems.
Result:
Among all combined methods, RF-DCSIS performs better than the other methods in terms of screening accuracy and prediction capability on the simulated scenarios and real benchmark datasets.
Conclusion:
Through empirical study using both extensive simulation and real data, we have shown that both filter-based screening and random forest-based screening have their pros and cons, while a combination of both may lead to a better feature screening result and prediction capability.
Keywords:
feature screening, filter-based method, ultra-high dimensional data, variable selection, random forest, RF-DCSIS
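Of the filter methods named above, sure independence screening (SIS) is the simplest to illustrate: each feature is ranked by the absolute value of its marginal correlation with the response, and only the top-ranked features are kept. The sketch below shows that basic idea only, not the combined RF-DCSIS procedure; the function names, the tie-breaking rule, and the toy data are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sis_screen(columns, response, keep):
    """Return the indices of the `keep` features with the largest |correlation|.

    columns  -- list of feature columns, each a list of per-sample values
    response -- list of response values, one per sample
    Ties are broken by the lower feature index (an arbitrary choice).
    """
    scored = [(abs(pearson(col, response)), j) for j, col in enumerate(columns)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:keep]]


# Hypothetical toy data: three features, five samples.
y = [1, 2, 3, 4, 5]
features = [[2, 4, 6, 8, 10], [5, 3, 4, 1, 2], [1, 1, 1, 1, 1]]
selected = sis_screen(features, y, keep=2)
```

Because each feature is scored on its own, this kind of filter is cheap even with many thousands of features, which is why it is commonly used as a first step before a costlier model-based screen.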
Affiliation(s)
- Lifeng Zhou
- School of Economics and Management, Changsha University, China
| | - Hong Wang
- School of Mathematics and Statistics, Central South University, China
| |
|
21
|
Li L, Liu ZP. Detecting prognostic biomarkers of breast cancer by regularized Cox proportional hazards models. J Transl Med 2021; 19:514. [PMID: 34930307 PMCID: PMC8686664 DOI: 10.1186/s12967-021-03180-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/16/2021] [Accepted: 12/03/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The successful identification of breast cancer (BRCA) prognostic biomarkers is essential for the strategic interference of BRCA patients. Recently, various methods have been proposed for exploring a small prognostic gene set that can distinguish the high-risk group from the low-risk group. METHODS Regularized Cox proportional hazards (RCPH) models were proposed to discover prognostic biomarkers of BRCA from gene expression data. First, the maximum connected network with 1142 genes was constructed by mapping 956 differentially expressed genes (DEGs) and 677 previously BRCA-related genes into the gene regulatory network (GRN). Then, the 72 union genes of the four feature gene sets identified by the Lasso-RCPH, Enet-RCPH, [Formula: see text]-RCPH and SCAD-RCPH models were recognized as the robust prognostic biomarkers. These biomarkers were validated by literature checks, BRCA-specific GRN and functional enrichment analysis. Finally, an index of prognostic risk score (PRS) for BRCA was established based on univariate and multivariate Cox regression analysis. Survival analysis was performed to investigate the PRS on 1080 BRCA patients from the internal validation. In particular, a nomogram was constructed to express the relationship between the PRS and other clinical information on the discovery dataset. The PRS was also verified on 1848 BRCA patients from ten external validation datasets or collected cohorts. RESULTS The nomogram highlighted the importance of the PRS in guiding the prognosis of BRCA patients. In addition, the PRS of 301 normal samples and 306 tumor samples from five independent datasets showed that it is significantly higher in tumors than in normal tissues ([Formula: see text]). The protein expression profiles of the three genes involved in the PRS model, i.e., ADRB1, SAV1 and TSPAN14, demonstrated that the latter two genes are more strongly stained in tumor specimens.
More importantly, validation illustrated that the high-risk group has worse survival than the low-risk group ([Formula: see text]) in both the internal and external validations. CONCLUSIONS The proposed pipelines for detecting and validating prognostic biomarker genes for BRCA are effective and efficient. Moreover, the proposed PRS is very promising as an important indicator for judging the prognosis of BRCA patients.
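A risk score built from a Cox model is commonly computed as a weighted sum of gene expression values, with the fitted regression coefficients as weights, after which patients are divided into high-risk and low-risk groups. The sketch below illustrates that general pattern only. The abstract gives neither the coefficients nor the cutoff, so the coefficient values, the median split, and the function names here are hypothetical; only the three gene names come from the abstract.

```python
from statistics import median


def risk_scores(samples, coefficients):
    """Linear risk score per sample: sum of coefficient * expression over genes.

    samples      -- list of dicts mapping gene name to expression value
    coefficients -- dict mapping gene name to a Cox regression coefficient
    """
    return [
        sum(coef * sample[gene] for gene, coef in coefficients.items())
        for sample in samples
    ]


def stratify(scores):
    """Label each score 'high' or 'low' relative to the cohort median (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"ADRB1": -0.5, "SAV1": 0.25, "TSPAN14": 0.5}
samples = [
    {"ADRB1": 1.0, "SAV1": 2.0, "TSPAN14": 2.0},
    {"ADRB1": 2.0, "SAV1": 0.0, "TSPAN14": 0.0},
    {"ADRB1": 0.0, "SAV1": 4.0, "TSPAN14": 2.0},
    {"ADRB1": 0.0, "SAV1": 0.0, "TSPAN14": 1.0},
]
groups = stratify(risk_scores(samples, coefficients))
```

The two groups produced this way are what a survival comparison, such as a Kaplan–Meier plot with a log-rank test, would then be run on.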
Affiliation(s)
- Lingyu Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, China.
| |
|
22
|
Drug Repositioning and Subgroup Discovery for Precision Medicine Implementation in Triple Negative Breast Cancer. Cancers (Basel) 2021; 13:cancers13246278. [PMID: 34944904 PMCID: PMC8699385 DOI: 10.3390/cancers13246278] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/11/2021] [Revised: 11/30/2021] [Accepted: 12/02/2021] [Indexed: 12/29/2022] Open
Abstract
Simple Summary The heterogeneity of complicated diseases like cancer negatively affects patients’ responses to treatment. Finding homogeneous subgroups of patients within the cancer population and finding the appropriate treatment for each subgroup will improve patients’ survival. In this study, we focus on triple-negative breast cancer (TNBC), where approximately 80% of patients do not entirely respond to chemotherapy. Our aim is to find subgroups of TNBC patients and identify drugs that have the potential to tailor treatments for each group through drug repositioning. After applying our method to TNBC, we found that different targeted mechanisms were suggested for different groups of patients. Our findings could help the research community to gain a better understanding of different subgroups within the TNBC population and can help the drugs to be repurposed with explainable results regarding the targeted mechanism. Abstract Breast cancer (BC) is the leading cause of death among female patients with cancer. Patients with triple-negative breast cancer (TNBC) have the lowest survival rate. TNBC has substantial heterogeneity within the BC population. This study utilized our novel patient stratification and drug repositioning method to find subgroups of BC patients that share common genetic profiles and that may respond similarly to the recommended drugs. After further examination of the discovered patient subgroups, we identified five homogeneous druggable TNBC subgroups. A drug repositioning algorithm was then applied to find the drugs with a high potential for each subgroup. Most of the top drugs for these subgroups were chemotherapy used for various types of cancer, including BC. After analyzing the biological mechanisms targeted by these drugs, ferroptosis was the common cell death mechanism induced by the top drugs in the subgroups with neoplasm subdivision and race as clinical variables. 
In contrast, the antioxidative effect on cancer cells was the common targeted mechanism in the subgroup of patients with an age less than 50. Literature reviews were used to validate our findings, which could provide invaluable insights to streamline the drug repositioning process and could be further studied in a wet lab setting and in clinical trials.
|
23
|
Abstract
High-throughput technologies such as next-generation sequencing allow biologists to observe cell function with unprecedented resolution, but the resulting datasets are too large and complicated for humans to understand without the aid of advanced statistical methods. Machine learning (ML) algorithms, which are designed to automatically find patterns in data, are well suited to this task. Yet these models are often so complex as to be opaque, leaving researchers with few clues about underlying mechanisms. Interpretable machine learning (iML) is a burgeoning subdiscipline of computational statistics devoted to making the predictions of ML models more intelligible to end users. This article is a gentle and critical introduction to iML, with an emphasis on genomic applications. I define relevant concepts, motivate leading methodologies, and provide a simple typology of existing approaches. I survey recent examples of iML in genomics, demonstrating how such techniques are increasingly integrated into research workflows. I argue that iML solutions are required to realize the promise of precision medicine. However, several open challenges remain. I examine the limitations of current state-of-the-art tools and propose a number of directions for future research. While the horizon for iML in genomics is wide and bright, continued progress requires close collaboration across disciplines.
Affiliation(s)
- David S Watson
- Department of Statistical Science, University College London, London, UK.
| |
|
24
|
Chen Z, Liang Y, Lu Q, Nazar M, Mao Y, Aboragah A, Yang Z, Loor JJ. Cadmium promotes apoptosis and inflammation via the circ08409/miR-133a/TGFB2 axis in bovine mammary epithelial cells and mouse mammary gland. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2021; 222:112477. [PMID: 34237642 DOI: 10.1016/j.ecoenv.2021.112477] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/05/2021] [Revised: 06/23/2021] [Accepted: 06/27/2021] [Indexed: 06/13/2023]
Abstract
Cadmium is a common environmental heavy metal pollutant that can accumulate over long periods of time and cause disease. Thus, analysis of the molecular mechanisms affected by cadmium in the body could be of great significance for the prevention and treatment of cadmium-related diseases. In this study, flow cytometry, immunofluorescence, transmission electron microscopy, H&E (Hematoxylin Eosin) staining and TUNEL (TdT-mediated dUTP Nick-End Labeling) assays were used to verify that cadmium induced apoptosis and immune responses in bovine mammary epithelial cells (BMECs) and in mouse mammary gland. Isolated BMECs cultured with or without cadmium were collected to screen miRNA (microRNA) using high-throughput sequencing. There were 42 differentially expressed miRNAs, of which 27 were upregulated and 15 downregulated, including bta-miR-133a, bta-miR-23b-5p, bta-miR-29e, bta-miR-365-5p, bta-miR-615, bta-miR-7, bta-miR-11975, bta-miR-127, and bta-miR-411a. Among those, miR-133a, which can specifically target TGFB2 (Recombinant Transforming Growth Factor Beta 2), was the most significantly downregulated, with a fold-change of 5.27 in BMECs cultured with cadmium. Application of the double luciferase reporter system, western blotting, and qRT-PCR (Quantitative Real-time PCR) revealed that circ08409 can directly bind to miR-133a. Experiments demonstrated that circRNA-08409 could adsorb bta-miR-133a. Both circ08409 and TGFB2 significantly increased apoptosis and altered the expression levels of a series of inflammatory factors in BMECs. In contrast, miR-133a significantly decreased apoptosis and inflammation in the cells. Compared with cultures receiving only cadmium, the miR-133a+cadmium cultures exhibited significant reductions in the occurrence of late apoptosis. Overall, results indicated that circ08409 could relieve the inhibitory effect of miR-133a on TGFB2 expression by combining with miR-133a and subsequently modulating cell proliferation, apoptosis and inflammation.
Overall, the data suggested that the circ08409/miR-133a/TGFB2 axis might play a role in mediating the effect of cadmium on BMECs. As such, data provide novel insights into controlling hazards that cadmium could induce in the mammary gland.
Affiliation(s)
- Zhi Chen
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, PR China; Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, PR China
| | - Yan Liang
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, PR China; Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, PR China
| | - QinYue Lu
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, PR China; Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, PR China
| | - Mudasir Nazar
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, PR China; Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, PR China
| | - Yongjiang Mao
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, PR China; Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, PR China
| | - Ahmad Aboragah
- Mammalian Nutrition Physiology Genomics, Department of Animal Sciences and Division of Nutritional Sciences, University of Illinois, Urbana, IL 61801, USA
| | - Zhangping Yang
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, PR China; Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, PR China.
| | - Juan J Loor
- Mammalian Nutrition Physiology Genomics, Department of Animal Sciences and Division of Nutritional Sciences, University of Illinois, Urbana, IL 61801, USA
| |
|
25
|
Yones C, Raad J, Bugnon LA, Milone DH, Stegmayer G. High precision in microRNA prediction: A novel genome-wide approach with convolutional deep residual networks. Comput Biol Med 2021; 134:104448. [PMID: 33979731 DOI: 10.1016/j.compbiomed.2021.104448] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/04/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/30/2022]
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that have a key role in the regulation of gene expression. The importance of miRNAs is now widely acknowledged by the community, and computational methods are needed for the precise prediction of novel miRNA candidates. This task can be done by searching for homologs with sequence alignment tools, but results are restricted to sequences that are very similar to the known miRNA precursors (pre-miRNAs). Moreover, these methods do not take into account a very important property of pre-miRNAs, their secondary structure. To fill this gap, many machine learning approaches have been proposed in recent years. However, these methods are generally tested under very controlled conditions. When they are used under real conditions, the false positives increase and the precision falls well below the published values. This work provides a novel approach to the computational prediction of pre-miRNAs: a convolutional deep residual neural network (mirDNN). This model was tested on the full genomes of several animals and plants, achieving a precision up to 5 times larger than other approaches at the same recall rates. Furthermore, a novel validation methodology was used to ensure that the performance reported in this study can be effectively achieved when using mirDNN in novel species. To provide fast and easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. The demo can process FASTA files with multiple sequences to calculate the prediction scores and generate the nucleotide importance plots. FULL SOURCE CODE: http://sourceforge.net/projects/sourcesinc/files/mirdnn and https://github.com/cyones/mirDNN. CONTACT: gstegmayer@sinc.unl.edu.ar.
Affiliation(s)
- C Yones
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - J Raad
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - L A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - D H Milone
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - G Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina.
| |
|
26
|
Dahri M, Abolmaali SS, Abedanzadeh M, Salmanpour M, Maleki R. Composition and surface chemistry engineering of graphene grafting chitosan for stimuli-responsive cancer therapy: An in-silico study. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100627] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022] Open
|