1
Soyer SM, Ozbek P, Kasavi C. Lung Adenocarcinoma Systems Biomarker and Drug Candidates Identified by Machine Learning, Gene Expression Data, and Integrative Bioinformatics Pipeline. OMICS 2024. [PMID: 38979602 DOI: 10.1089/omi.2024.0121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Lung adenocarcinoma (LUAD) is a significant planetary health challenge owing to its high morbidity and mortality, not to mention the marked interindividual variability in treatment outcomes and side effects. There is an urgent need for robust systems biomarkers that can help with early cancer diagnosis, prediction of treatment outcomes, and the design of precision/personalized medicines for LUAD. The present study aimed to identify systems biomarkers of LUAD and deployed integrative bioinformatics and machine learning tools to harness gene expression data. Predictive models were developed to stratify patients based on prognostic outcomes. Importantly, we report here several potential key genes, for example, PMEL and BRIP1, and pathways implicated in the progression and prognosis of LUAD that could be targeted for precision/personalized medicine in the future. Our drug repurposing analysis and molecular docking simulations suggested eight drug candidates for LUAD, such as heat shock protein 90 inhibitors, cardiac glycosides, an antipsychotic agent (trifluoperazine), and a calcium ionophore (ionomycin). In summary, this study identifies several promising leads on systems biomarkers and drug candidates for LUAD. The findings also attest to the importance of integrative bioinformatics, structural biology, and machine learning techniques in biomarker discovery and in precision oncology research and development.
Affiliation(s)
- Semra Melis Soyer
- Department of Bioengineering, Faculty of Engineering, Marmara University, İstanbul, Türkiye
- Pemra Ozbek
- Department of Bioengineering, Faculty of Engineering, Marmara University, İstanbul, Türkiye
- Ceyda Kasavi
- Department of Bioengineering, Faculty of Engineering, Marmara University, İstanbul, Türkiye
2
Zhou Y, Yang Z, Zeng H. An Aging-Related lncRNA Signature Establishing for Breast Cancer Prognosis and Immunotherapy Responsiveness Prediction. Pharmgenomics Pers Med 2024; 17:251-270. [PMID: 38803444 PMCID: PMC11129764 DOI: 10.2147/pgpm.s450960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 05/18/2024] [Indexed: 05/29/2024]
Abstract
Purpose Emerging evidence demonstrates the vital role of aging and long non-coding RNAs (lncRNAs) in breast cancer (BC) progression. Our study aimed to develop a prognostic risk model based on aging-related lncRNAs (AG-lncs) to predict BC patients' outcomes. Patients and Methods 307 aging-related genes (AGs) were obtained from the TCGA project. Then, 697 AG-lncs were identified by co-expression analysis with the AGs. Using univariate and multivariate Cox regression analysis and LASSO, 6 AG-lncs, namely al136531.1, mapt-as1, al451085.2, otud6b-as1, tnfrsf14-as1, and linc01871, were selected to compute the risk score and establish a risk signature. Expression levels of al136531.1, mapt-as1, al451085.2, tnfrsf14-as1, and linc01871 were higher in low-risk BC patients, whereas otud6b-as1 expression was higher in high-risk BC patients. In both the training and testing sets, high-risk patients had shorter progression-free interval (PFI), overall survival (OS), and disease-free survival (DFS) than low-risk patients. Results Our risk signature had the highest concordance index among the established prognostic signatures compared and displayed strong predictive ability for 1-, 3-, and 5-year patient OS in the nomogram. Additionally, GSEA, ssGSEA, ESTIMATE, and TIDE analyses showed that BC patients with different risk scores had different immune statuses and responses to immunotherapy. Of note, qRT-PCR analysis confirmed that these 6 AG-lncs were differentially expressed in BC tissues at various clinical stages. Conclusion The 6-AG-lnc risk signature might offer a novel prognostic biomarker and could help improve the effectiveness of BC immunotherapy.
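The abstract ranks the signature by concordance index (C-index) without defining it. A minimal sketch of Harrell's C-index for right-censored survival data follows; it is not the authors' code, the function name and toy data are hypothetical, and tied follow-up times are simply skipped rather than handled as full implementations do.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair of patients is comparable when the one with the shorter follow-up
    time had an observed event (events[k] == 1). The pair is concordant when
    that patient also has the higher risk score; tied scores count as 0.5.
    Simplification (assumption): pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are ignored in this sketch
        # `a` is the patient with the shorter follow-up time, `b` the longer
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # `a` was censored first, so the true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable patient pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up times, event indicators, model risk scores
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.1]))  # prints 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a higher value is read as better prognostic discrimination.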
Affiliation(s)
- Yanshijing Zhou
- Department of Plastic and Cosmetic Surgery, Maternal and Child Health Hospital of Hubei Province, Huazhong University of Science and Technology, Wuhan, Hubei, People’s Republic of China
- Zihui Yang
- Department of Plastic and Cosmetic Surgery, Tongji Hospital, Huazhong University of Science and Technology, Wuhan, Hubei, People’s Republic of China
- Hong Zeng
- Department of Plastic and Cosmetic Surgery, Tongji Hospital, Huazhong University of Science and Technology, Wuhan, Hubei, People’s Republic of China
3
Song J, Liao H, Li H, Chen H, Si H, Wang J, Bai X. Identification of a novel cancer-associated fibroblasts gene signature based on bioinformatics analysis to predict prognosis and therapeutic responses in breast cancer. Heliyon 2024; 10:e29216. [PMID: 38601538 PMCID: PMC11004657 DOI: 10.1016/j.heliyon.2024.e29216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 04/02/2024] [Accepted: 04/02/2024] [Indexed: 04/12/2024]
Abstract
Cancer-associated fibroblasts (CAFs) provide suitable conditions for tumor cell growth and facilitate tumor progression. Hence, we aimed to identify a CAF-related gene signature associated with the prognosis of patients with breast cancer (BRCA). We downloaded datasets from the Gene Expression Omnibus (GEO) and confirmed the correlation between CAF infiltration scores and prognosis. By performing weighted gene co-expression network analysis (WGCNA) and LASSO Cox regression analysis, we constructed a four-gene (COL5A3, FN1, POSTN, and RARRES2) prognostic CAF signature model. Based on the median CAF risk score, patients with BRCA were divided into high- and low-risk groups. Compared with the low-risk group, patients in the high-risk group had a poor prognosis and limited response to immunotherapy. Furthermore, patients with high CAF risk scores were found to have a poor prognosis owing to the induction of immunosuppressive cell infiltration, resulting in an immunosuppressive tumor microenvironment. Importantly, in vitro validation showed that CAFs overexpressing FN1 and POSTN significantly promoted the wound healing and invasion ability of tumor cells. Taken together, we identified a four-gene prognostic CAF signature, which proved to be a reliable indicator of prognosis and therapeutic efficacy in patients with BRCA. This study provides evidence for novel CAF-based stromal therapy.
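A median split of this kind is usually applied to a Cox-model risk score computed as a weighted sum of signature-gene expression. The sketch below illustrates that convention under stated assumptions: the coefficient values are hypothetical placeholders (the abstract does not report them), the function names are illustrative, and patients exactly at the median are assigned to the low-risk group, which is only one possible tie rule.

```python
from statistics import median

# Hypothetical LASSO-Cox coefficients for the four signature genes named in
# the abstract; the published values are not given there.
COEFFICIENTS = {"COL5A3": 0.12, "FN1": 0.08, "POSTN": 0.05, "RARRES2": -0.10}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Risk score = sum over signature genes of coefficient * expression."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(patients, coefficients=COEFFICIENTS):
    """Label each patient 'high' or 'low' risk relative to the cohort median.

    `patients` maps a patient ID to a {gene: expression} dict. Scores equal
    to the median are labelled 'low' (an assumed tie rule).
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, cutoff, groups


# Toy expression values (hypothetical)
patients = {
    "P1": {"COL5A3": 10, "FN1": 10, "POSTN": 10, "RARRES2": 0},
    "P2": {"COL5A3": 0, "FN1": 0, "POSTN": 0, "RARRES2": 10},
    "P3": {"COL5A3": 5, "FN1": 5, "POSTN": 5, "RARRES2": 0},
    "P4": {"COL5A3": 1, "FN1": 1, "POSTN": 1, "RARRES2": 5},
}
scores, cutoff, groups = split_by_median(patients)
print(groups)  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

Because the cutoff is the cohort's own median, the two groups are roughly equal in size, but the cutoff value itself shifts from cohort to cohort.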
Affiliation(s)
- Jin Song
- Department of General Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, 100853, China
- Huifeng Liao
- Department of General Surgery, The Seventh Medical Center of Chinese PLA General Hospital, Beijing, 100700, China
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, 510515, China
- Huayan Li
- Department of Gynecology, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China
- Hongye Chen
- Department of General Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, 100853, China
- Huiyan Si
- Department of General Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, 100853, China
- Jiandong Wang
- Department of General Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, 100853, China
- Xue Bai
- Department of General Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, 100853, China
4
Banerjee S, Sengupta A, Ghosh SK, Banerjee R. CDH1 gene as biomarker towards breast cancer prediction. J Biomol Struct Dyn 2024:1-14. [PMID: 38373072 DOI: 10.1080/07391102.2024.2316770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 02/03/2024] [Indexed: 02/21/2024]
Abstract
Breast cancer is considered to arise from genetic aberrations. Among the many genes expressed, cadherin 1, type 1 (CDH1) has been found to be involved in several ways in controlling metabolic order in humans. Dysregulation of the function of E-cadherin, the protein encoded by CDH1, plays an important role in lobular breast cancer. To understand the root cause of this recent claim, we focus on the CDH1 gene: whether deviations, alterations, or modifications in its sequence are related to the occurrence of different types of breast cancer. To this end, quantitative analysis of different biophysical and biochemical properties of the CDH1 gene at the genomic and proteomic levels, using the available genomic (cDNA) sequences of CDH1 (obtained from the COSMIC database for 78 patients with various types of breast cancer), clearly shows that alteration/modification in the CDH1 sequence can be detrimental. Furthermore, random forest, k-nearest neighbour, and stochastic gradient descent (SGD) algorithms are applied to the derived dataset to classify the types of breast cancer and to validate our hypothesis regarding the critical role of CDH1 as a potential biomarker for breast cancer. Analysis of the mutated CDH1 gene sequences and their related parameters using these machine learning techniques clearly establishes that the CDH1 gene can play a decisive role in predicting the occurrence of different types of breast cancer, with an accuracy of >90%. Such an observation opens a new paradigm in the diagnostic approach to breast cancer. Communicated by Ramaswamy H. Sarma.
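The abstract names k-nearest neighbour among its classifiers but does not list the sequence-derived features. The sketch below is a minimal, hypothetical illustration only: GC content is shown as one example of a feature that can be computed from a cDNA sequence, the two-feature training vectors and subtype labels are invented toy values, and the k-NN uses plain Euclidean distance with a majority vote. It is not the authors' feature set or pipeline.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (example feature)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k nearest training points."""
    # Pair every training point's Euclidean distance to x with its label
    distances = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical two-feature vectors (e.g. GC content, a scaled property score)
train_X = [(0.40, 0.10), (0.42, 0.12), (0.41, 0.15),
           (0.60, 0.80), (0.62, 0.85), (0.58, 0.90)]
train_y = ["lobular", "lobular", "lobular", "ductal", "ductal", "ductal"]
test_X = [(0.43, 0.11), (0.61, 0.82)]
test_y = ["lobular", "ductal"]

preds = [knn_predict(train_X, train_y, x) for x in test_X]
print(preds, accuracy(test_y, preds))  # ['lobular', 'ductal'] 1.0
```

An odd k avoids voting ties between two classes; with more classes, ties would need an explicit rule.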
Affiliation(s)
- Srijan Banerjee
- Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, Nadia, West Bengal, India
- Antara Sengupta
- Department of Computer Science and Engineering, University of Calcutta, Kolkata, West Bengal, India
- Shankar Kumar Ghosh
- Department of Computer Science and Engineering, Shiv Nadar Institution of Eminence, Delhi, India
- Raja Banerjee
- Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, Nadia, West Bengal, India
5
Rakhshaninejad M, Fathian M, Shirkoohi R, Barzinpour F, Gandomi AH. Refining breast cancer biomarker discovery and drug targeting through an advanced data-driven approach. BMC Bioinformatics 2024; 25:33. [PMID: 38253993 PMCID: PMC10810249 DOI: 10.1186/s12859-024-05657-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024]
Abstract
Breast cancer remains a major public health challenge worldwide. The identification of accurate biomarkers is critical for the early detection and effective treatment of breast cancer. This study utilizes an integrative machine learning approach to analyze breast cancer gene expression data for superior biomarker and drug target discovery. Gene expression datasets, obtained from the GEO database, were merged post-preprocessing. From the merged dataset, differential expression analysis between breast cancer and normal samples revealed 164 differentially expressed genes. Meanwhile, a separate gene expression dataset revealed 350 differentially expressed genes. Additionally, the BGWO_SA_Ens algorithm, integrating binary grey wolf optimization and simulated annealing with an ensemble classifier, was employed on gene expression datasets to identify predictive genes including TOP2A, AKR1C3, EZH2, MMP1, EDNRB, S100B, and SPP1. From over 10,000 genes, BGWO_SA_Ens identified 1404 in the merged dataset (F1 score: 0.981, PR-AUC: 0.998, ROC-AUC: 0.995) and 1710 in the GSE45827 dataset (F1 score: 0.965, PR-AUC: 0.986, ROC-AUC: 0.972). The intersection of DEGs and BGWO_SA_Ens selected genes revealed 35 superior genes that were consistently significant across methods. Enrichment analyses uncovered the involvement of these superior genes in key pathways such as AMPK, Adipocytokine, and PPAR signaling. Protein-protein interaction network analysis highlighted subnetworks and central nodes. Finally, a drug-gene interaction investigation revealed connections between superior genes and anticancer drugs. Collectively, the machine learning workflow identified a robust gene signature for breast cancer, illuminated their biological roles, interactions and therapeutic associations, and underscored the potential of computational approaches in biomarker discovery and precision oncology.
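Two steps in this abstract are simple to state precisely: the F1 score used to report classifier performance, and the set intersection that yields the consensus ("superior") genes. The sketch below shows both under stated assumptions; it does not reproduce BGWO_SA_Ens, the function names are illustrative, and the gene lists in the example are hypothetical subsets of gene names mentioned in the abstract, not the study's actual DEG or selected-gene lists.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def consensus_genes(deg_genes, selected_genes):
    """Genes flagged both by differential expression and by feature selection."""
    return sorted(set(deg_genes) & set(selected_genes))


# Toy labels: 1 = tumour, 0 = normal (hypothetical)
print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))  # about 0.667 (tp=2, fp=1, fn=1)
# Hypothetical gene lists
print(consensus_genes(["TOP2A", "EZH2", "MMP1", "SPP1"],
                      ["EZH2", "TOP2A", "S100B", "AKR1C3"]))  # ['EZH2', 'TOP2A']
```

Intersecting the two lists keeps only genes supported by both criteria, which is why the consensus set (35 genes in the study) is much smaller than either input list.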
Affiliation(s)
- Morteza Rakhshaninejad
- Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Mohammad Fathian
- Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Reza Shirkoohi
- Cancer Biology Research Center, Cancer Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Keshavarz Boulevard, Tehran, 1419733141, Tehran, Iran
- Farnaz Barzinpour
- Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Amir H Gandomi
- Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, 2007, NSW, Australia
- University Research and Innovation Center (EKIK), Óbuda University, Budapest, 1034, Hungary
6
Tinterri C, Fernandes B, Zambelli A, Sagona A, Barbieri E, Di Maria Grimaldi S, Darwish SS, Jacobs F, De Carlo C, Iuzzolino M, Gentile D. The Impact of Different Patterns of Residual Disease on Long-Term Oncological Outcomes in Breast Cancer Patients Treated with Neo-Adjuvant Chemotherapy. Cancers (Basel) 2024; 16:376. [PMID: 38254865 PMCID: PMC10814808 DOI: 10.3390/cancers16020376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 01/10/2024] [Accepted: 01/14/2024] [Indexed: 01/24/2024]
Abstract
BACKGROUND The majority of breast cancer (BC) patients treated with neo-adjuvant chemotherapy (NAC) achieve a pathologic partial response with different patterns of residual disease. No clear correlation between these patterns and oncological results has been described. Our aims were to define the predictive factors for different patterns of residual disease and to compare outcomes between the scattered and the circumscribed patterns. METHODS We reviewed 219 postoperative surgical specimens. Patients were divided into two groups: scattered versus circumscribed. Disease-free survival (DFS), distant DFS (DDFS), and overall survival (OS) were analyzed. RESULTS The scattered and circumscribed patterns were observed in 111 (50.7%) and 108 (49.3%) patients, respectively. Two independent predictive factors for the circumscribed pattern were identified: discontinuation of NAC cycles (p = 0.011) and tumor size post-NAC >18 mm (p = 0.022). No difference was observed in terms of DFS and DDFS. Patients with the scattered pattern had significantly better OS. Discontinuation of NAC cycles, tumor size >18 mm, triple-negative BC, and ypN+ were associated with increased recurrence and poorer survival. CONCLUSIONS Discontinuation of NAC cycles and tumor size are independent factors associated with patterns of residual disease. The scattered pattern is associated with better survival. Understanding the relationship between NAC, the residual pattern, and differences in survival outcomes could help optimize therapeutic approaches.
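The DFS, DDFS, and OS comparisons in this abstract rest on survival curves. As a hedged illustration (the abstract does not name the estimator, although Kaplan-Meier curves are the usual choice for such comparisons), a minimal Kaplan-Meier estimator is sketched below; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death or recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t
        at_risk -= removed
    return curve


# Hypothetical months of follow-up for one residual-disease pattern
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
# survival steps to about 0.8, 0.6 and 0.3 at times 1, 2 and 3
```

Censored patients never lower the curve directly; they only shrink the risk set, which makes later events weigh more heavily.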
Affiliation(s)
- Corrado Tinterri
- Breast Unit, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20090 Pieve Emanuele, Milan, Italy
- Bethania Fernandes
- Department of Pathology, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Alberto Zambelli
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20090 Pieve Emanuele, Milan, Italy
- Medical Oncology and Hematology Unit, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Andrea Sagona
- Breast Unit, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Erika Barbieri
- Breast Unit, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Simone Di Maria Grimaldi
- Breast Unit, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Shadya Sara Darwish
- Breast Unit, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Flavia Jacobs
- Medical Oncology and Hematology Unit, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Camilla De Carlo
- Department of Pathology, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Martina Iuzzolino
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20090 Pieve Emanuele, Milan, Italy
- Department of Pathology, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Damiano Gentile
- Breast Unit, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano, Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20090 Pieve Emanuele, Milan, Italy
7
Mondello A, Dal Bo M, Toffoli G, Polano M. Machine learning in onco-pharmacogenomics: a path to precision medicine with many challenges. Front Pharmacol 2024; 14:1260276. [PMID: 38264526 PMCID: PMC10803549 DOI: 10.3389/fphar.2023.1260276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 12/26/2023] [Indexed: 01/25/2024]
Abstract
Over the past two decades, next-generation sequencing (NGS) has revolutionized the approach to cancer research. Applications of NGS include the identification of tumor-specific alterations that can influence tumor pathobiology and also affect diagnosis, prognosis, and therapeutic options. Pharmacogenomics (PGx) studies the role of inherited individual genetic patterns in drug response and has taken advantage of NGS technology, as it provides access to high-throughput data that can, however, be difficult to manage. Machine learning (ML) has recently been used in the life sciences to discover hidden patterns in complex NGS data and to solve various PGx problems. In this review, we provide a comprehensive overview of the NGS approaches that can be employed and of the different PGx studies involving NGS data. We also survey the ML algorithms that can serve as fundamental strategies in the PGx field to improve personalized medicine in cancer.
Affiliation(s)
- Maurizio Polano
- Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano (CRO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Aviano, Italy
8
Alexander H, Hu SK, Krinos AI, Pachiadaki M, Tully BJ, Neely CJ, Reiter T. Eukaryotic genomes from a global metagenomic data set illuminate trophic modes and biogeography of ocean plankton. mBio 2023; 14:e0167623. [PMID: 37947402 PMCID: PMC10746220 DOI: 10.1128/mbio.01676-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 09/27/2023] [Indexed: 11/12/2023]
Abstract
IMPORTANCE Single-celled eukaryotes play ecologically significant roles in the marine environment, yet fundamental questions about their biodiversity, ecological function, and interactions remain. Environmental sequencing enables researchers to document naturally occurring protistan communities without culturing bias, yet metagenomic and metatranscriptomic sequencing approaches cannot separate individual species from communities. To more completely capture the genomic content of mixed protistan populations, we can create bins of sequences that represent the same organism (metagenome-assembled genomes [MAGs]). We developed the EukHeist pipeline, which automates the binning of population-level eukaryotic and prokaryotic genomes from metagenomic reads. We provide new insight into which protistan communities are present in the ocean and what trophic roles they play. Scalable computational tools, like EukHeist, may accelerate the identification of meaningful genetic signatures from large data sets and complement researchers' efforts to leverage MAG databases for addressing ecological questions, resolving evolutionary relationships, and discovering potentially novel biodiversity.
Affiliation(s)
- Harriet Alexander
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Sarah K. Hu
- Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Arianna I. Krinos
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- MIT-WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Cambridge and Woods Hole, Massachusetts, USA
- Maria Pachiadaki
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Benjamin J. Tully
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Christopher J. Neely
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
- Taylor Reiter
- Population Health and Reproduction, University of California, Davis, Davis, California, USA
9
Wu Y, Xiao Q, Wang S, Xu H, Fang Y. Establishment and Analysis of an Artificial Neural Network Model for Early Detection of Polycystic Ovary Syndrome Using Machine Learning Techniques. J Inflamm Res 2023; 16:5667-5676. [PMID: 38050562 PMCID: PMC10693771 DOI: 10.2147/jir.s438838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/10/2023] [Indexed: 12/06/2023]
Abstract
Background This study aimed to identify novel gene combinations and to develop an early diagnostic model for polycystic ovary syndrome (PCOS) by integrating artificial neural networks (ANN) and random forest (RF) methods. Methods We retrieved and processed gene expression datasets for PCOS from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in the training set were identified using the "limma" R package. Enrichment analyses of the DEGs were performed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), and immune cell infiltration was assessed. Critical genes were then identified from the DEGs using random forests, followed by the development of new diagnostic models for PCOS using artificial neural networks. Results We identified 130 up-regulated and 132 down-regulated genes in PCOS compared with normal samples. GO analysis revealed significant enrichment in myofibrils and highlighted crucial biological functions related to myofilament sliding, myofibril, and actin binding. Compared with normal tissues, PCOS samples showed a different profile of infiltrating immune cells. A random forest algorithm identified 10 significant genes proposed as potential PCOS-specific biomarkers. Using these genes, an artificial neural network diagnostic model accurately distinguished PCOS from normal samples. The diagnostic model was validated on an independent validation set, and the resulting area under the receiver operating characteristic curve (AUC) values were consistent with the anticipated outcomes. Conclusion Using unique gene combinations, this research created a diagnostic model by combining random forest techniques with artificial neural networks. The AUC indicated notably superior performance of the diagnostic model.
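The validation step reports the area under the ROC curve. One standard way to compute it, shown here as a minimal sketch rather than the authors' code, is the rank (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name, labels, and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels : 1 for positive (e.g. PCOS), 0 for negative (normal)
    scores : model output, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # correctly ordered pair
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))  # 0.75
```

This pairwise loop is quadratic in sample size, which is fine for the small cohorts typical of GEO validation sets; sorting-based versions scale better.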
Affiliation(s)
- Yumi Wu
- Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- QiWei Xiao
- Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- ShouDong Wang
- The Out-Patient Department of TCM of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- Huanfang Xu
- Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- Acupuncture and Moxibustion Hospital of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- YiGong Fang
- Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- Acupuncture and Moxibustion Hospital of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
10
Turkyilmazoglu M. Hyperthermia therapy of cancerous tumor sitting in breast via analytical fractional model. Comput Biol Med 2023; 164:107271. [PMID: 37494822 DOI: 10.1016/j.compbiomed.2023.107271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 06/06/2023] [Accepted: 07/16/2023] [Indexed: 07/28/2023]
Abstract
The heat transfer in a bi-layer spherical composite region representing a cancerous tumor embedded in homogeneous muscle tissue (Andra et al., 1999; Yu and Jiang, 2019) is modeled by means of fractional energy equations with additional interface boundary constraints. This hyperthermia problem was previously explored in the literature with appropriate experimental hyperthermia parameters, and numerical simulations were later devised by replacing the integer-order energy model with a fractional-order one. To match the experimental data to the fractional model, the order of the fractional derivative was determined through a laborious inverse solution scheme. Here, we obtain exact analytical solutions to the fractional hyperthermia problem, which is shown to be controlled by four thermal parameters corresponding to each fractional-order derivative. The spatio-temporal distribution of temperature within the tumor-tissue medium is then studied via the closed-form solutions. From the solutions, anomalous heat diffusion at early and late exposure times is detected. The best-fitting fractional order is finally determined by matching the experimental temperature to the analytically derived one. Excellent agreement with the numerically fitted fractional value is observed. The approach is further extended to a more realistic situation in which blood perfusion in the tumor and skin zones is taken into account. The analytical expressions can also help in tuning the operational thermal parameters to optimal values during hyperthermia treatment of different kinds of tumors.
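The abstract does not state which fractional derivative is used. For orientation only, a common choice in time-fractional heat conduction is the Caputo derivative of order α (0 < α < 1). The generic spherical heat equation below is an assumed illustrative form with constant density ρ, specific heat c, thermal conductivity k, and source term Q, with any dimensional scaling factor omitted; it is not the paper's bi-layer model with interface conditions and perfusion.

```latex
% Caputo time-fractional derivative of order 0 < \alpha < 1
{}^{C}\!D_t^{\alpha} T(r,t) = \frac{1}{\Gamma(1-\alpha)} \int_0^{t} \frac{\partial T(r,\tau)/\partial \tau}{(t-\tau)^{\alpha}} \, d\tau

% Illustrative time-fractional heat equation in spherical coordinates
\rho c \, {}^{C}\!D_t^{\alpha} T = \frac{k}{r^{2}} \frac{\partial}{\partial r}\!\left( r^{2} \frac{\partial T}{\partial r} \right) + Q
```

As α → 1 the Caputo derivative reduces to the ordinary first-order time derivative, recovering the classical integer-order energy equation that the fractional model generalizes.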
Affiliation(s)
- Mustafa Turkyilmazoglu
- Department of Mathematics, Hacettepe University, 06532 Beytepe, Ankara, Türkiye
- Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan
11
Ruiz-Fresneda MA, Gijón A, Morales-Álvarez P. Bibliometric analysis of the global scientific production on machine learning applied to different cancer types. Environ Sci Pollut Res Int 2023; 30:96125-96137. [PMID: 37566331 PMCID: PMC10482761 DOI: 10.1007/s11356-023-28576-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/27/2023] [Accepted: 06/29/2023] [Indexed: 08/12/2023]
Abstract
Cancer is one of the main causes of death in the world, with millions of new cases annually in recent decades. The need to find a cure has stimulated the search for efficient treatments and diagnostic procedures. One of the most promising tools to have emerged against cancer in recent years is machine learning (ML), which has generated a huge number of scientific papers in a relatively short period of time. The present study analyzes global scientific production on ML applied to the most relevant cancer types through various bibliometric indicators. We find that over 30,000 studies have been published so far and observe that the cancers with the highest number of published ML studies (breast, lung, and colon cancer) are those with the highest incidence, with the USA and China being the main scientific producers on the subject. Interestingly, the role of China and Japan in stomach cancer research is correlated with the number of cases of this cancer type in Asia (78% of worldwide cases). Knowing the countries and institutions that most actively study each area can greatly help improve international collaborations between research groups and countries. Our analysis shows that medical and computer science journals lead in the number of publications on the subject, information that could be useful for researchers in the field. Finally, keyword co-occurrence analysis suggests that ML-cancer research trends focus not only on the use of ML as an effective diagnostic method but also on the improvement of radiotherapy- and chemotherapy-based treatments.
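Keyword co-occurrence analysis, the last bibliometric step mentioned, amounts to counting how often two keywords are listed on the same paper. A minimal sketch under simple assumptions (keywords are lower-cased and de-duplicated per paper; the function name and the three keyword lists are hypothetical) is:

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count how many papers list each unordered pair of keywords together.

    `papers` is a list of keyword lists. Keywords are normalised to lower case
    and de-duplicated per paper, and each pair is stored in sorted order so
    that ("a", "b") and ("b", "a") are counted as the same pair.
    """
    counts = Counter()
    for keywords in papers:
        unique = sorted({k.strip().lower() for k in keywords})
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical author-keyword lists for three papers
papers = [
    ["machine learning", "breast cancer", "diagnosis"],
    ["Machine Learning", "radiotherapy"],
    ["machine learning", "breast cancer"],
]
print(keyword_cooccurrence(papers).most_common(1))
# [(('breast cancer', 'machine learning'), 2)]
```

Bibliometric tools typically turn these pair counts into a weighted network and cluster it to reveal research themes; that step is not shown here.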
Affiliation(s)
- Alfonso Gijón
- Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Research Centre for Information and Communication Technologies (CITIC-UGR), University of Granada, Granada, Spain
- Pablo Morales-Álvarez
- Research Centre for Information and Communication Technologies (CITIC-UGR), University of Granada, Granada, Spain
- Department of Statistics and Operations Research, University of Granada, Granada, Spain
12
Huang X, Li S, Gao W, Shi J, Cheng M, Mi Y, Liu Y, Sang M, Li Z, Geng C. KIF20A is a Prognostic Marker for Female Patients with Estrogen Receptor-Positive Breast Cancer and Receiving Tamoxifen as Adjuvant Endocrine Therapy. Int J Gen Med 2023; 16:3623-3635. [PMID: 37637711 PMCID: PMC10455948 DOI: 10.2147/ijgm.s425918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 07/31/2023] [Indexed: 08/29/2023]
Abstract
Purpose Our aim was to verify whether KIF20A has the potential to serve as a prognostic marker for female patients with estrogen receptor (ER)-positive breast cancer (BC) treated with tamoxifen (TAM). Patients and Methods Online tools were used to investigate the potential correlation between KIF20A gene expression and survival of patients with ER-positive BC receiving TAM. Furthermore, immunohistochemistry (IHC) was conducted to assess KIF20A expression levels in patients from our center. The prognostic value of KIF20A for disease-free survival (DFS) and overall survival (OS) was further evaluated using Cox regression analysis. Results The online-tool analyses showed that patients with low KIF20A expression had significantly better relapse-free survival (RFS), distant metastasis-free survival (DMFS), and OS than those with high KIF20A expression (P < 0.001, P < 0.001, and P = 0.008, respectively). Additionally, KIF20A gene expression was significantly lower in patients who responded to TAM than in those who did not (P < 0.001). We further included 203 patients receiving adjuvant TAM therapy, and IHC for KIF20A was performed on sections from paraffin-embedded blocks. Patients with low KIF20A expression had significantly better DFS and OS (P = 0.001 and 0.002, respectively, log-rank test), and KIF20A expression was identified as an independent predictor of both DFS and OS (P = 0.001 and 0.008, respectively). Conclusion KIF20A expression is an independent prognostic factor for survival in patients with ER-positive BC who received adjuvant TAM therapy. In clinical practice, IHC evaluation of KIF20A expression in surgical samples before administering tamoxifen may help predict treatment outcomes in these patients.
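The survival comparisons above (low versus high KIF20A expression) rest on Kaplan-Meier estimates of the survival function. A minimal, standard-library-only Python sketch of the estimator follows; the function name and the seven-patient toy cohort are hypothetical, and the log-rank test and Cox regression reported in the abstract are not reproduced.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t), reported at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. relapse or death) was observed,
              0 if the patient was censored at that time
    Returns a list of (time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Everyone whose time equals t is still at risk just before t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve


# Hypothetical toy cohort: 7 patients, months of follow-up.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 0, 1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

In a marker study, one curve would be computed for the low-expression group and one for the high-expression group; in practice a dedicated survival-analysis library would normally be used instead of hand-written code.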
Affiliation(s)
- Xuchen Huang
- Department of Breast Center, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Key Laboratory in Hebei Province for Molecular Medicine of Breast Cancer, Shijiazhuang, Hebei, People’s Republic of China
- Sainan Li
- Department of Breast Center, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Key Laboratory in Hebei Province for Molecular Medicine of Breast Cancer, Shijiazhuang, Hebei, People’s Republic of China
- Wei Gao
- Department of Breast Center, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Key Laboratory in Hebei Province for Molecular Medicine of Breast Cancer, Shijiazhuang, Hebei, People’s Republic of China
- Jiajie Shi
- Department of Breast Center, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Key Laboratory in Hebei Province for Molecular Medicine of Breast Cancer, Shijiazhuang, Hebei, People’s Republic of China
- Meng Cheng
- Department of Breast Center, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Key Laboratory in Hebei Province for Molecular Medicine of Breast Cancer, Shijiazhuang, Hebei, People’s Republic of China
- Yunzhe Mi
- Department of Breast Center, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Key Laboratory in Hebei Province for Molecular Medicine of Breast Cancer, Shijiazhuang, Hebei, People’s Republic of China
- Yueping Liu
- Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Meixiang Sang
- Research Center and Tumor Research Institute, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Ziyi Li
- Research Center and Tumor Research Institute, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Cuizhi Geng
- Department of Breast Center, The Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China
- Key Laboratory in Hebei Province for Molecular Medicine of Breast Cancer, Shijiazhuang, Hebei, People’s Republic of China
13
Mirza Z, Ansari MS, Iqbal MS, Ahmad N, Alganmi N, Banjar H, Al-Qahtani MH, Karim S. Identification of Novel Diagnostic and Prognostic Gene Signature Biomarkers for Breast Cancer Using Artificial Intelligence and Machine Learning Assisted Transcriptomics Analysis. Cancers (Basel) 2023; 15:3237. [PMID: 37370847 DOI: 10.3390/cancers15123237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/10/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023]
Abstract
BACKGROUND Breast cancer (BC) is one of the most common cancers in women. Clinical and histopathological information is used together for diagnosis but is often imprecise. We applied machine learning (ML) methods to identify a gene signature model based on differentially expressed genes (DEGs) for BC diagnosis and prognosis. METHODS A cohort of 701 samples from 11 GEO BC microarray datasets was used to identify significant DEGs. Seven ML methods, including RFECV-LR, RFECV-SVM, LR-L1, SVC-L1, RF, and Extra-Trees, were applied for gene reduction and construction of a diagnostic model for cancer classification. Kaplan-Meier survival analysis was performed to construct a prognostic signature. The potential biomarkers were confirmed via qRT-PCR and validated by another set of ML methods, including GBDT, XGBoost, AdaBoost, KNN, and MLP. RESULTS We identified 355 DEGs and predicted BC-associated pathways, including kinetochore metaphase signaling, PTEN, senescence, and phagosome-formation pathways. A hub of 28 DEGs and a novel diagnostic nine-gene signature (COL10A, S100P, ADAMTS5, WISP1, COMP, CXCL10, LYVE1, COL11A1, and INHBA) were identified using stringent filter conditions. Similarly, a novel prognostic eight-gene signature (CCNE2, NUSAP1, TPX2, S100P, ITM2A, LIFR, TNXA, and ZBTB16) was identified using disease-free survival and overall survival analysis. The gene signatures were validated by another set of ML methods. Finally, qRT-PCR confirmed the expression of the identified gene signatures in BC. CONCLUSION The ML approach helped construct novel diagnostic and prognostic models based on the expression profiling of BC. The identified nine-gene and eight-gene signatures showed excellent potential in BC diagnosis and prognosis, respectively.
Affiliation(s)
- Zeenat Mirza
- King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Department of Medical Laboratory Science, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Md Shahid Ansari
- Department of Clinical Data Analytics, Max Super Speciality Hospital, Saket, New Delhi 110017, India
- Md Shahid Iqbal
- Department of Statistics and Computer Applications, Tilka Manjhi Bhagalpur University, Bhagalpur 812007, India
- Nesar Ahmad
- Department of Statistics and Computer Applications, Tilka Manjhi Bhagalpur University, Bhagalpur 812007, India
- Nofe Alganmi
- Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Centre of Artificial Intelligence in Precision Medicines, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Haneen Banjar
- Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Centre of Artificial Intelligence in Precision Medicines, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Mohammed H Al-Qahtani
- Department of Medical Laboratory Science, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Sajjad Karim
- Department of Medical Laboratory Science, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
14
Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023]
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Proper target identification is therefore needed, and computational approaches may make it possible: effective targets have disease-relevant biological functions, and omics data reveal the proteins involved in these functions. In addition, properties that favor drug-target binding can be deduced from a protein's amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes (omics features, BERT embeddings of the proteins' amino acid sequences, and the integrated features) and used each set separately to train and test the DL classifiers. The models achieved high prediction performance in terms of area under the curve (AUC), with AUC greater than 0.88 for all cancer types and a maximum of 0.95 for leukemia. OncoRTT also outperformed the state-of-the-art method on that method's own data in five of the seven cancer types assessed by both. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence from the Open Targets Platform and a case study of the top-10 predicted therapeutic targets for lung cancer.
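OncoRTT's performance is summarized as AUC. For a binary classifier, AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting one half), which gives a short way to compute it. The sketch below uses that pairwise definition; it is quadratic in the number of samples, it is not OncoRTT's evaluation code, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative).

    Ties count one half. This equals the Mann-Whitney U statistic
    divided by n_pos * n_neg. O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]              # 1 = known target, 0 = non-target (made up)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # classifier outputs (made up)
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs are ranked correctly
```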
Affiliation(s)
- Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
- Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
15
A Clinical Prediction Model for Breast Cancer in Women Having Their First Mammogram. Healthcare (Basel) 2023; 11:healthcare11060856. [PMID: 36981513 PMCID: PMC10048653 DOI: 10.3390/healthcare11060856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 03/06/2023] [Accepted: 03/09/2023] [Indexed: 03/17/2023]
Abstract
Background: Digital mammography is the most efficient screening and diagnostic modality for breast cancer (BC). However, the technology is not widely available in rural areas. This study aimed to construct a prediction model for BC in women scheduled for their first mammography at a breast center, in order to prioritize patients on waiting lists. Methods: This retrospective cohort study analyzed breast clinic data from January 2013 to December 2017. Clinical parameters significantly associated with a BC diagnosis were used to construct predictive models using stepwise multiple logistic regression. The models’ discriminative capabilities were compared using the area under the receiver operating characteristic curve (AUC). Results: Data from 822 women were selected for analysis using an inverse probability weighting method. Significant risk factors were age, body mass index (BMI), family history of BC, and presenting symptoms (mass and/or nipple discharge). A model built on these factors had an Akaike information criterion (AIC) of 1387.9 and an AUC of 0.82 (95% confidence interval: 0.76–0.87). Conclusion: In a resource-limited setting, the priority for a first mammogram should be patients with a mass and/or nipple discharge, asymptomatic patients who are older or have a high BMI, and women with a family history of BC.
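The model above is reported with an Akaike information criterion value. AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood; lower values indicate a better trade-off between fit and complexity, and values are only comparable between models fitted to the same data. The sketch below assumes the predicted probabilities already come from a fitted logistic regression (not shown) and ignores the inverse probability weighting used in the study; the outcomes, probabilities and parameter count are invented.

```python
import math


def bernoulli_log_likelihood(y_true, p_pred):
    """ln L for 0/1 outcomes given predicted probabilities strictly in (0, 1)."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def aic(y_true, p_pred, n_params):
    """Akaike information criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * bernoulli_log_likelihood(y_true, p_pred)


# Invented outcomes (1 = cancer diagnosed) and model probabilities.
y = [1, 0, 1, 0]
p = [0.8, 0.3, 0.6, 0.1]
print(round(aic(y, p, n_params=3), 3))
```

Adding a predictor raises the 2k penalty by 2, so it only lowers AIC if it improves ln L by more than 1.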
16
Alromema N, Syed AH, Khan T. A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data. Diagnostics (Basel) 2023; 13:diagnostics13040708. [PMID: 36832196 PMCID: PMC9955903 DOI: 10.3390/diagnostics13040708] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/05/2023] [Revised: 01/30/2023] [Accepted: 02/07/2023] [Indexed: 02/16/2023]
Abstract
The high dimensionality and sparsity of microarray gene expression data make it challenging to analyze them and to screen the optimal subset of genes as predictors of breast cancer (BC). This study proposes a novel hybrid sequential Feature Selection (FS) framework involving minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen the optimal set of gene biomarkers as predictors for BC. The proposed framework identified an optimal set of three gene biomarkers, namely MAPK1, APOBEC3B, and ENAH. In addition, state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naïve Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR), were used to test the predictive capability of the selected gene biomarkers and to select the most effective breast cancer diagnostic model based on performance metrics. The XGBoost-based model performed best, with an accuracy of 0.976 ± 0.027, an F1-score of 0.974 ± 0.030, and an AUC of 0.961 ± 0.035 on an independent test dataset. The classification system based on the screened gene biomarkers efficiently distinguishes primary breast tumors from normal breast samples.
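The framework above starts with minimum Redundancy-Maximum Relevance (mRMR) ranking. The greedy idea is to add, one at a time, the gene whose relevance to the class label minus its average redundancy with the genes already chosen is largest. The sketch below swaps in absolute Pearson correlation for mutual information to stay dependency-free, so it is a simplified variant rather than the authors' implementation; the t-test and meta-heuristic stages are omitted, and the gene names and values are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant vector."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr(features, target, k):
    """Greedy mRMR (difference form) with |Pearson r| standing in for mutual information.

    features -- dict: gene name -> expression values across samples
    target   -- class label per sample (e.g. 1 = tumour, 0 = normal)
    """
    relevance = {g: abs(pearson(v, target)) for g, v in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    def score(g):
        if not selected:
            return relevance[g]
        redundancy = statistics.fmean(
            [abs(pearson(features[g], features[s])) for s in selected]
        )
        return relevance[g] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: geneB nearly duplicates geneA; geneC carries different information.
target = [0, 0, 0, 1, 1, 1]
features = {
    "geneA": [1, 2, 1, 5, 6, 5],
    "geneB": [1, 2, 1, 5, 6, 4],
    "geneC": [3, 1, 2, 4, 3, 5],
}
print(mrmr(features, target, 2))  # ['geneA', 'geneC']
```

geneB is individually more relevant than geneC, but it is passed over in the second step because it is highly correlated with geneA, which is the behaviour mRMR is designed to produce.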
Affiliation(s)
- Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Correspondence:
- Asif Hassan Syed
- Department of Computer Science, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Tabrej Khan
- Department of Information Systems, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia
17
Shin SY, Centenera MM, Hodgson JT, Nguyen EV, Butler LM, Daly RJ, Nguyen LK. A Boolean-based machine learning framework identifies predictive biomarkers of HSP90-targeted therapy response in prostate cancer. Front Mol Biosci 2023; 10:1094321. [PMID: 36743211 PMCID: PMC9892654 DOI: 10.3389/fmolb.2023.1094321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 01/06/2023] [Indexed: 01/20/2023]
Abstract
Precision medicine has emerged as an important paradigm in oncology, driven by the significant heterogeneity of individual patients' tumours. A key prerequisite for effective implementation of precision oncology is the development of companion biomarkers that can predict response to anti-cancer therapies and guide patient selection for clinical trials and/or treatment. However, reliable predictive biomarkers are currently lacking for many anti-cancer therapies, hampering their clinical application. Here, we developed a novel machine learning-based framework to derive predictive multi-gene biomarker panels and associated expression signatures that accurately predict cancer drug sensitivity. We demonstrated the power of the approach by applying it to identify response biomarker panels for an Hsp90-based therapy in prostate cancer, using proteomic data profiled from prostate cancer patient-derived explants. Our approach employs a rational feature selection strategy to maximise model performance and uses Boolean algebra methods to derive specific expression signatures of the marker proteins. Given suitable training data, the approach is also applicable to other anti-cancer agents in different tumour settings.
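The abstract does not spell out how the Boolean signatures are derived, but once derived, such a signature can be read as a logical formula over binarized marker levels. The sketch below only evaluates a signature; it assumes a signature in disjunctive normal form (an OR of AND-clauses) and per-protein high/low cut-offs, and the protein names P1 to P3, the thresholds and the formula are all hypothetical.

```python
def binarize(levels, thresholds):
    """Map each protein to True (high) or False (low) using its cut-off."""
    return {p: levels[p] >= cut for p, cut in thresholds.items()}


def matches(signature, state):
    """Evaluate a signature in disjunctive normal form.

    signature -- list of clauses; each clause maps protein -> required state.
    A sample matches if every literal of at least one clause holds.
    """
    return any(
        all(state[p] == wanted for p, wanted in clause.items())
        for clause in signature
    )


# Hypothetical signature: (P1 high AND P2 low) OR (P3 high).
signature = [{"P1": True, "P2": False}, {"P3": True}]
thresholds = {"P1": 1.0, "P2": 0.5, "P3": 2.0}

sample = {"P1": 1.4, "P2": 0.2, "P3": 0.3}
print(matches(signature, binarize(sample, thresholds)))  # True
```

Representing the signature as data rather than hard-coded `and`/`or` expressions lets different candidate signatures be scored against the same samples.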
Affiliation(s)
- Sung-Young Shin
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
- Cancer Program, Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia
- *Correspondence: Sung-Young Shin; Lan K. Nguyen
- Margaret M. Centenera
- South Australian Immunogenomics Cancer Institute and Freemasons Foundation Centre for Men’s Health, University of Adelaide, Adelaide, SA, Australia
- South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Joshua T. Hodgson
- South Australian Immunogenomics Cancer Institute and Freemasons Foundation Centre for Men’s Health, University of Adelaide, Adelaide, SA, Australia
- South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Elizabeth V. Nguyen
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
- Cancer Program, Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia
- Lisa M. Butler
- South Australian Immunogenomics Cancer Institute and Freemasons Foundation Centre for Men’s Health, University of Adelaide, Adelaide, SA, Australia
- South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Roger J. Daly
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
- Cancer Program, Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia
- Lan K. Nguyen
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
- Cancer Program, Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia
- *Correspondence: Sung-Young Shin; Lan K. Nguyen
18
Li Y, Chen G, Zhang K, Cao J, Zhao H, Cong Y, Qiao G. Integrated transcriptome and network analysis identifies EZH2/CCNB1/PPARG as prognostic factors in breast cancer. Front Genet 2023; 13:1117081. [PMID: 36712863 PMCID: PMC9873965 DOI: 10.3389/fgene.2022.1117081] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/06/2022] [Accepted: 12/27/2022] [Indexed: 01/12/2023]
Abstract
Breast cancer (BC) has high morbidity, with significant relapse and mortality rates in women worldwide. Further exploration of its pathogenesis is therefore of great significance. This study used bioinformatic methods to select candidate therapeutic genes and possible biomarkers for BC. To this end, the study examined 21 healthy breast tissues and 457 BC tissues from two Gene Expression Omnibus (GEO) datasets and identified differentially expressed genes (DEGs). Survival-associated DEGs were screened using Kaplan-Meier curves. Based on Gene Ontology (GO) annotation, survival-associated DEGs were mostly associated with cell division and cellular response to hormone stimulus. The enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were mostly related to the cell cycle and tyrosine metabolism. Using the overlapping survival-associated DEGs, a survival-associated protein-protein interaction (PPI) network was constructed. PPI analysis revealed three hub genes (EZH2, CCNB1, and PPARG) by their degree of connection. These hub genes were confirmed using The Cancer Genome Atlas (TCGA)-BRCA dataset and BC tissue samples. The molecular mechanisms of the potential therapeutic and prognostic genes were evaluated through Gene Set Enrichment Analysis (GSEA), which showed that the hub genes were associated with the KEGG_CELL_CYCLE and VANTVEER_BREAST_CANCER_POOR_PROGNOSIS gene sets. Finally, based on integrated bioinformatics analysis, this study identified three hub genes as possible prognostic biomarkers and therapeutic targets for BC. These results further our understanding of the molecular mechanisms underlying BC occurrence and prognostic outcomes.
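Selecting hub genes by their degree of connection amounts to ranking the nodes of the PPI network by how many distinct interaction partners each one has. A minimal sketch, assuming the network is given as a list of undirected edges (in practice these would be exported from a PPI database); the gene identifiers G1 to G5 and the edges are hypothetical.

```python
from collections import defaultdict


def hub_genes(edges, top_n):
    """Rank genes by degree (number of distinct partners) in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Highest degree first; ties broken alphabetically for reproducibility.
    ranked = sorted(neighbours, key=lambda g: (-len(neighbours[g]), g))
    return [(g, len(neighbours[g])) for g in ranked[:top_n]]


# Hypothetical edges; the duplicate G2-G1 interaction is counted once.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G2", "G3"), ("G4", "G5"), ("G2", "G1")]
print(hub_genes(edges, 2))  # [('G1', 3), ('G2', 2)]
```

Storing neighbours as sets, rather than counting edges, keeps duplicate or reversed interaction records from inflating a gene's degree.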
Affiliation(s)
- Yalun Li
- Department of Breast Surgery, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, Shandong, China
- Gang Chen
- Department of Breast Surgery, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, Shandong, China
- Kun Zhang
- Department of Breast Surgery, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, Shandong, China
- Jianqiao Cao
- Department of Breast Surgery, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, Shandong, China
- Huishan Zhao
- Reproductive Medicine Centre, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, Shandong, China
- Yizi Cong
- Department of Breast Surgery, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, Shandong, China
- *Correspondence: Yizi Cong; Guangdong Qiao
- Guangdong Qiao
- Department of Breast Surgery, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, Shandong, China
- *Correspondence: Yizi Cong; Guangdong Qiao
19
Zhou J, Jiang Z, Fu L, Qu F, Dai M, Xie N, Zhang S, Wang F. Contribution of labor related gene subtype classification on heterogeneity of polycystic ovary syndrome. PLoS One 2023; 18:e0282292. [PMID: 36857354 PMCID: PMC9977056 DOI: 10.1371/journal.pone.0282292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 02/11/2023] [Indexed: 03/02/2023]
Abstract
OBJECTIVE As one of the most common endocrine disorders in women of reproductive age, polycystic ovary syndrome (PCOS) is highly heterogeneous, with varied clinical features and diverse gestational complications among individuals. Patients with PCOS have a 2-fold higher risk of preterm labor, which is associated with substantial infant morbidity and mortality and great socioeconomic cost. The study was designed to identify molecular subtypes and related hub genes to facilitate assessment of susceptibility to preterm labor in women with PCOS. METHODS Four mRNA datasets (GSE84958, GSE5090, GSE43264, and GSE98421) were obtained from the Gene Expression Omnibus database. Twenty-eight candidate genes related to preterm labor or labor were compiled from published studies and our unpublished data. We then used unsupervised clustering to identify molecular subtypes in PCOS based on the expression of these candidate genes. Key modules were generated with the weighted gene co-expression network analysis (WGCNA) R package, and their hub genes were identified with CytoHubba. The probable biological functions and mechanisms were explored through Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. In addition, STRING and Cytoscape software were used to construct the protein-protein interaction (PPI) network, and molecular complex detection (MCODE) was used to identify its hub genes. The overlapping hub genes were then determined. RESULTS Two molecular subtypes were found in women with PCOS based on the expression similarity of preterm labor or labor-related genes, and two modules were highlighted. The key modules and the PPI network shared five hub genes, two of which, GTF2F2 and MYO6, were further confirmed by comparing the clustering subgroups according to hub gene expression.
CONCLUSIONS Distinct PCOS molecular subtypes were identified with preterm labor or labor-related genes, which might uncover the potential mechanism underlying heterogeneity of clinical pregnancy complications in women with PCOS.
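WGCNA builds its co-expression network by raising gene-gene correlations to a soft-thresholding power, a_ij = |cor(x_i, x_j)|^β, and summarizes each gene by its connectivity k_i, the sum of a_ij over all other genes j; highly connected genes within a module are hub candidates. The sketch below computes whole-network connectivity for an unsigned network. β = 6 is used only as a commonly cited default (WGCNA analyses normally pick β from a scale-free topology fit), module detection is omitted, and the four toy expression profiles are invented.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def connectivity(expr, beta=6):
    """Whole-network connectivity k_i = sum over j != i of |cor(i, j)| ** beta.

    expr -- dict: gene -> expression values across the same samples.
    Uses the unsigned soft-threshold adjacency a_ij = |cor(i, j)| ** beta.
    """
    genes = sorted(expr)
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g)
        for g in genes
    }


# Invented expression profiles over four samples.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
k = connectivity(expr)
for gene, kg in k.items():
    print(gene, round(kg, 3))
```

The power β shrinks weak correlations much faster than strong ones (0.8 becomes about 0.26 at β = 6, while 1.0 stays 1.0), which is why g4 ends up far less connected than the perfectly correlated g1 to g3.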
Affiliation(s)
- Jue Zhou
- School of Food Science and Biotechnology, Zhejiang Gongshang University, Hangzhou, China
- Zhou Jiang
- Department of Obstetrics and Gynecology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Leyi Fu
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Fan Qu
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Minchen Dai
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Ningning Xie
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Songying Zhang
- Department of Obstetrics and Gynecology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- *Correspondence: Fangfang Wang (FW); Songying Zhang (SZ)
- Fangfang Wang
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- *Correspondence: Fangfang Wang (FW); Songying Zhang (SZ)
20
Glypican-3 Differentiates Intraductal Carcinoma and Paget's Disease from Other Types of Breast Cancer. MEDICINA (KAUNAS, LITHUANIA) 2022; 59:medicina59010086. [PMID: 36676710 PMCID: PMC9862536 DOI: 10.3390/medicina59010086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 12/28/2022] [Accepted: 12/29/2022] [Indexed: 01/03/2023]
Abstract
Background and Objectives: Breast cancer remains the most common health burden affecting women worldwide. Despite developments in breast cancer diagnostic approaches and treatment strategies, the clinical management of metastatic breast cancer remains challenging. Thus, there is a need to identify new biomarkers and novel drug targets for breast cancer diagnosis and therapy. Recently, aberrant glypican-3 (GPC3) expression in cancers has gained considerable interest in cancer research. However, studies have yielded contradictory results about GPC3 expression in breast cancer. Therefore, the current study aims to analyse GPC3 expression across a large panel of different breast cancer subtypes. Materials and Methods: GPC3 expression was evaluated immunohistochemically in 230 breast cancer patients and eight normal tissues, and its associations with clinical and demographic characteristics, as well as with immunohistochemical biomarkers for breast cancer, were assessed. Moreover, a public database of breast cancer patients' survival data and GPC3 gene expression was used to assess the prognostic value of GPC3 for the survival of breast cancer patients. Results: GPC3 expression was detected in only 7.5% of cases across the different histological breast cancer subtypes. None of the normal breast tissues displayed GPC3 expression. Interestingly, all cases of Paget's disease, as well as 42.9% of intraductal and 16.7% of mucinous carcinomas, showed GPC3 expression, which significantly discriminated Paget's disease and intraductal carcinoma from other breast cancer subtypes. Importantly, GPC3 expression was found more often in tumours that tested positive for hormone receptors and human epidermal growth factor receptor 2 (HER2), indicating more favourable histological subtypes of breast cancer. Consistent with this, higher GPC3 mRNA expression was significantly correlated with longer relapse-free survival (RFS).
Conclusions: Our study proposes that GPC3 is a promising breast cancer subtype-specific biomarker. Moreover, GPC3 may have the potential to be a molecular target for the development of new therapeutics for specific subtypes of breast cancer.
21
Wang S, Liu W, Ye Z, Xia X, Guo M. Development of a joint diagnostic model of thyroid papillary carcinoma with artificial neural network and random forest. Front Genet 2022; 13:957718. [PMCID: PMC9585230 DOI: 10.3389/fgene.2022.957718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/31/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
Objective: Papillary thyroid carcinoma (PTC) accounts for 80% of thyroid malignancies, and its incidence is increasing rapidly. The present study aimed to identify novel and important gene panels and to develop an early diagnostic model for PTC by combining an artificial neural network (ANN) and random forest (RF). Methods and results: Samples were searched in the Gene Expression Omnibus (GEO) database, and gene expression datasets (GSE27155, GSE60542, and GSE33630) were collected and processed. GSE27155 and GSE60542 were merged into the training set, and GSE33630 was used as the validation set. Differentially expressed genes (DEGs) in the training set were obtained with the “limma” package in R. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, as well as immune cell infiltration analysis, were then conducted based on the DEGs. Important genes were identified from the DEGs by random forest. Finally, an artificial neural network was used to develop a diagnostic model, which was validated on the validation set with a satisfactory area under the receiver operating characteristic curve (AUC). Conclusion: A diagnostic model was established by combining random forest and an artificial neural network based on a novel gene panel. The AUC indicated that the diagnostic model performed well.
22
Taghizadeh E, Heydarheydari S, Saberi A, JafarpoorNesheli S, Rezaeijo SM. Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods. BMC Bioinformatics 2022; 23:410. [PMID: 36183055 PMCID: PMC9526906 DOI: 10.1186/s12859-022-04965-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/22/2022] [Accepted: 09/27/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND We used a hybrid machine learning system (HMLS) strategy that included an extensive search for the optimal HMLSs, combining feature selection algorithms, a feature extraction algorithm, and classifiers to diagnose breast cancer. The study aimed to obtain a high-importance transcriptome profile, linked with classification procedures, that can facilitate the early detection of breast cancer. METHODS The study included 762 breast cancer patients and 138 solid tissue normal subjects. Three groups of machine learning (ML) algorithms were employed: (i) four feature selection procedures, compared to select the most valuable features: (1) ANOVA; (2) Mutual Information; (3) Extra Trees Classifier; and (4) Logistic Regression (LGR); (ii) a feature extraction algorithm (Principal Component Analysis); and (iii) 13 classification algorithms with automated ML hyperparameter tuning: (1) LGR; (2) Support Vector Machine; (3) Bagging; (4) Gaussian Naive Bayes; (5) Decision Tree; (6) Gradient Boosting Decision Tree; (7) K-Nearest Neighbors; (8) Bernoulli Naive Bayes; (9) Random Forest; (10) AdaBoost; (11) ExtraTrees; (12) Linear Discriminant Analysis; and (13) Multilayer Perceptron (MLP). Model performance was evaluated with balanced accuracy and the area under the curve (AUC). RESULTS The LGR feature selection procedure combined with the MLP classifier achieved the highest balanced accuracy and AUC (balanced accuracy: 0.86, AUC = 0.94), followed by LGR feature selection combined with the LGR classifier (balanced accuracy: 0.84, AUC = 0.94). The AUC of the LGR + LGR combination was obtained with the following 20 biomarkers: TMEM212, SNORD115-13, ATP1A4, FRG2, CFHR4, ZCCHC13, FLJ46361, LY6G6E, ZNF323, KRT28, KRT25, LPPR5, C10orf99, PRKACG, SULT2A1, GRIN2C, EN2, GBA2, CUX2, and SNORA66.
CONCLUSIONS The best performance was achieved by combining the LGR feature selection procedure with the MLP classifier. The 20 biomarkers received the highest scores or rankings for breast cancer detection.
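The two metrics used above to rank the pipelines, balanced accuracy and AUC, can be computed as in the following minimal, standard-library-only sketch. It assumes binary labels (1 = tumor, 0 = normal) and computes AUC by its pairwise (Mann–Whitney) definition rather than by tracing an ROC curve; the function names are illustrative and are not taken from the study's code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return (tp / pos + tn / neg) / 2


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and classifier scores (not data from the study)
    y = [1, 1, 1, 0, 0, 0, 0]
    s = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4]
    preds = [1 if v >= 0.5 else 0 for v in s]
    print(balanced_accuracy(y, preds), roc_auc(y, s))
```

Balanced accuracy suits this cohort because it is imbalanced: a classifier that always predicts "tumor" would reach about 85% plain accuracy (762 of 900 samples) but a balanced accuracy of only 0.5.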
Affiliation(s)
- Eskandar Taghizadeh
- Department of Medical Genetic, Faculty of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Sahel Heydarheydari
- Department of Radiology Technology, Shoushtar Faculty of Medical Sciences, Shoushtar, Iran
- Alihossein Saberi
- Department of Medical Genetic, Faculty of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Seyed Masoud Rezaeijo
- Department of Medical Physics, Faculty of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
23
Couckuyt A, Seurinck R, Emmaneel A, Quintelier K, Novak D, Van Gassen S, Saeys Y. Challenges in translational machine learning. Hum Genet 2022; 141:1451-1466. [PMID: 35246744 PMCID: PMC8896412 DOI: 10.1007/s00439-022-02439-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/09/2021] [Accepted: 02/08/2022] [Indexed: 11/25/2022]
Abstract
Machine learning (ML) algorithms are increasingly being used to help implement clinical decision support systems. In this new field, which we define as "translational machine learning", joint efforts and strong communication between data scientists and clinicians help to bridge the gap between ML and its adoption in the clinic. These collaborations also improve the interpretability of and trust in translational ML methods, and ultimately aim to produce generalizable and reproducible models. To help clinicians and bioinformaticians refine their translational ML pipelines, we review the steps from model building to the use of ML in the clinic. We discuss experimental setup, computational analysis, interpretability, and reproducibility, and emphasize the challenges involved. We strongly advise collaboration and data sharing between consortia and institutes to build multicentric cohorts, which facilitate ML methods that generalize across centers. Ultimately, we hope that this review helps streamline translational ML and tackle the challenges that come with it.
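One common way to check whether a model generalizes across centers, in the spirit of the multicentric cohorts recommended above, is leave-one-center-out validation: each center is held out once as the test set while the model is trained on the remaining centers. The abstract does not prescribe this particular scheme; the sketch below only generates the splits, and the center labels are hypothetical.

```python
from collections import defaultdict


def leave_one_center_out(centers):
    """Yield (held_out_center, train_indices, test_indices) for each center.

    `centers` lists the (hypothetical) acquisition center of each sample.
    """
    by_center = defaultdict(list)
    for i, c in enumerate(centers):
        by_center[c].append(i)
    for held_out in sorted(by_center):
        test = by_center[held_out]
        train = [i for i, c in enumerate(centers) if c != held_out]
        yield held_out, train, test


if __name__ == "__main__":
    centers = ["A", "A", "B", "C", "B"]  # hypothetical center labels
    for name, train, test in leave_one_center_out(centers):
        print(name, "train:", train, "test:", test)
```

Unlike random k-fold splits, no center contributes samples to both the training and test sets, so center-specific batch effects cannot inflate the test score.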
Affiliation(s)
- Artuur Couckuyt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Ruth Seurinck
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Annelies Emmaneel
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Katrien Quintelier
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Department of Pulmonary Diseases, Erasmus MC, Rotterdam, The Netherlands
- David Novak
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Sofie Van Gassen
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Yvan Saeys
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
24
Villagrana-Bañuelos KE, Galván-Tejada CE, Galván-Tejada JI, Gamboa-Rosales H, Celaya-Padilla JM, Soto-Murillo MA, Solís-Robles R. Machine Learning Model Based on Lipidomic Profile Information to Predict Sudden Infant Death Syndrome. Healthcare (Basel) 2022; 10:healthcare10071303. [PMID: 35885829 PMCID: PMC9317003 DOI: 10.3390/healthcare10071303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 07/03/2022] [Accepted: 07/09/2022] [Indexed: 11/16/2022]
Abstract
Sudden infant death syndrome (SIDS) is the leading cause of death among infants under one year of age in developing countries. Its etiology remains unclear, and no biomarker is discriminative enough to predict the risk of SIDS. In this work, using a public dataset of lipidomic profiles from infants who died of SIDS and from a control group, we performed a univariate analysis with the Mann–Whitney U test to identify the features that discriminate between the two groups. Features with a p-value less than or equal to 0.05 were retained and used to build classification models: random forests (RF), logistic regression (LR), support vector machine (SVM), and naive Bayes (NB). Seventy percent of the data were used for model training with 5-fold cross-validation, and the remaining 30% were reserved for a blind test, which simulates a real-life scenario in which the model faces an unknown population. RF performed best, achieving an AUC of 0.9, a specificity of 1, and a sensitivity of 0.8 in the blind test. The proposed model provides the basis for a computer tool to predict SIDS risk, which will contribute to prevention, and suggests lines of research to address this pathology.
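The feature-filtering and data-splitting steps described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code: it assumes binary labels (1 = SIDS, 0 = control), uses the large-sample normal approximation to the two-sided Mann–Whitney U test without tie or continuity correction (an exact or tie-corrected test, such as SciPy's `scipy.stats.mannwhitneyu`, is preferable for small samples), and all helper names are hypothetical. The abstract does not state whether filtering happened before or after the split; filtering on the training portion only, as in the usage example, keeps test information out of feature selection.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U, p), where U counts pairs with x > y (ties count 0.5).
    No tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


def select_features(rows, labels, alpha=0.05):
    """Return indices of features whose Mann-Whitney p-value is <= alpha."""
    keep = []
    for j in range(len(rows[0])):
        cases = [r[j] for r, lab in zip(rows, labels) if lab == 1]
        controls = [r[j] for r, lab in zip(rows, labels) if lab == 0]
        _, p = mann_whitney_p(cases, controls)
        if p <= alpha:
            keep.append(j)
    return keep


def split_70_30(n, seed=0):
    """Shuffle sample indices and split them 70% train / 30% blind test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


def k_folds(indices, k=5):
    """Partition training indices into k interleaved cross-validation folds."""
    return [indices[i::k] for i in range(k)]


if __name__ == "__main__":
    # Toy data: feature 0 separates the groups, feature 1 does not
    rows = [[5, 1], [6, 2], [7, 3], [8, 4], [9, 5],
            [0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]
    labels = [1] * 5 + [0] * 5
    train, test = split_70_30(len(rows))
    selected = select_features([rows[i] for i in train], [labels[i] for i in train])
    print("train:", train, "test:", test)
    print("folds:", k_folds(train), "selected:", selected)
```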
25
Chu SS, Nguyen HA, Zhang J, Tabassum S, Cao H. Towards Multiplexed and Multimodal Biosensor Platforms in Real-Time Monitoring of Metabolic Disorders. Sensors (Basel) 2022; 22:5200. [PMID: 35890880 PMCID: PMC9323394 DOI: 10.3390/s22145200] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/08/2022] [Revised: 07/05/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Metabolic syndrome (MS) is a cluster of conditions that increases the probability of heart disease, stroke, and diabetes, and is very common worldwide. Although the exact cause of MS is not yet understood, evidence indicates a relationship between MS and dysregulation of the immune system. Biomarkers expressed during this process are gaining relevance for the early detection of MS. However, sensing a single analyte has limitations, because one analyte can be involved in various conditions. Because MS generally results from the coexistence of multiple complications, a multi-analyte sensing platform is necessary for precise diagnosis. In this review, we summarize the various types of biomarkers related to MS and the non-invasively accessible biofluids available for sensing. We then discuss two widely used types of sensing platforms, electrochemical and optical, in terms of multimodal biosensing, figure of merit (FOM), sensitivity, and specificity for the early diagnosis of MS. The review provides thorough insight into the current status of available platforms and into how the electrochemical and optical modalities can complement each other to create more reliable sensing platforms for MS.
Affiliation(s)
- Sung Sik Chu
- Department of Biomedical Engineering, Henry Samueli School of Engineering, University of California Irvine, Irvine, CA 92697, USA
- Hung Anh Nguyen
- Department of Electrical Engineering and Computer Science, Henry Samueli School of Engineering, University of California Irvine, Irvine, CA 92697, USA
- Jimmy Zhang
- Department of Biomedical Engineering, Henry Samueli School of Engineering, University of California Irvine, Irvine, CA 92697, USA
- Shawana Tabassum
- Department of Electrical Engineering, College of Engineering, The University of Texas at Tyler, Tyler, TX 75799, USA
- Hung Cao
- Department of Biomedical Engineering, Henry Samueli School of Engineering, University of California Irvine, Irvine, CA 92697, USA
- Department of Electrical Engineering and Computer Science, Henry Samueli School of Engineering, University of California Irvine, Irvine, CA 92697, USA
26
Diagnostic Accuracy of Machine Learning Models on Mammography in Breast Cancer Classification: A Meta-Analysis. Diagnostics (Basel) 2022; 12:diagnostics12071643. [PMID: 35885548 PMCID: PMC9320089 DOI: 10.3390/diagnostics12071643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 06/29/2022] [Accepted: 06/29/2022] [Indexed: 11/16/2022]
Abstract
In this meta-analysis, we aimed to estimate the diagnostic accuracy of machine learning models applied to digital mammograms and tomosynthesis for breast cancer classification, and to assess the factors affecting their diagnostic accuracy. We searched for related studies in Web of Science, Scopus, PubMed, Google Scholar, and Embase. Studies were screened in two stages to exclude unrelated studies and duplicates. In total, 36 studies containing 68 machine learning models were included. The area under the curve (AUC), hierarchical summary receiver operating characteristic (HSROC) curve, pooled sensitivity, and pooled specificity were estimated using a bivariate Reitsma model. The overall AUC, pooled sensitivity, and pooled specificity were 0.90 (95% CI: 0.85–0.90), 0.83 (95% CI: 0.78–0.87), and 0.84 (95% CI: 0.81–0.87), respectively. Three significant covariates were identified: country (p = 0.003), source (p = 0.002), and classifier (p = 0.016). The type-of-data covariate was not statistically significant (p = 0.121). Deeks’ linear regression test indicated publication bias among the included studies (p = 0.002); the results should therefore be interpreted with caution.
27
Padula WV, Kreif N, Vanness DJ, Adamson B, Rueda JD, Felizzi F, Jonsson P, IJzerman MJ, Butte A, Crown W. Machine Learning Methods in Health Economics and Outcomes Research-The PALISADE Checklist: A Good Practices Report of an ISPOR Task Force. Value Health 2022; 25:1063-1080. [PMID: 35779937 DOI: 10.1016/j.jval.2022.03.022] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 03/22/2022] [Accepted: 03/25/2022] [Indexed: 06/15/2023]
Abstract
Advances in machine learning (ML) and artificial intelligence offer tremendous potential benefits to patients. Predictive analytics using ML is already widely used in healthcare operations and care delivery, but how can ML be used for health economics and outcomes research (HEOR)? To answer this question, ISPOR established an emerging good practices task force for the application of ML in HEOR. The task force identified 5 methodological areas where ML could enhance HEOR: (1) cohort selection, identifying samples with greater specificity with respect to inclusion criteria; (2) identification of independent predictors and covariates of health outcomes; (3) predictive analytics of health outcomes, including those that are high cost or life threatening; (4) causal inference through methods such as targeted maximum likelihood estimation or double-debiased estimation, helping to produce reliable evidence more quickly; and (5) application of ML to the development of economic models to reduce structural, parameter, and sampling uncertainty in cost-effectiveness analysis. Overall, ML facilitates HEOR through meaningful and efficient analysis of big data. Nevertheless, a lack of transparency about how ML methods deliver solutions to feature selection and predictive analytics, especially in unsupervised settings, increases the risk to providers and other decision makers who use ML results. To examine whether ML offers a useful and transparent solution to healthcare analytics, the task force developed the PALISADE Checklist, a guide for balancing the many potential applications of ML against the need for transparency in methods development and findings.
Affiliation(s)
- William V Padula
- Department of Pharmaceutical and Health Economics, School of Pharmacy, University of Southern California, Los Angeles, CA, USA; The Leonard D. Schaeffer Center for Health Policy & Economics, University of Southern California, Los Angeles, CA, USA.
- Noemi Kreif
- Centre for Health Economics, University of York, York, England, UK
- David J Vanness
- Department of Health Policy and Administration, College of Health and Human Development, Pennsylvania State University, Hershey, PA, USA
- Pall Jonsson
- National Institute for Health and Care Excellence, Manchester, England, UK
- Maarten J IJzerman
- Centre for Health Policy, School of Population and Global Health, University of Melbourne, Melbourne, Australia
- Atul Butte
- School of Medicine, University of California, San Francisco, San Francisco, CA, USA
- William Crown
- The Heller School for Social Policy and Management, Brandeis University, Waltham, MA, USA
28
Charles S, Sreekumar J, Natarajan J. Transcriptomic meta-analysis reveals biomarker pairs and key pathways in Tetralogy of Fallot. J Bioinform Comput Biol 2022; 20:2240004. [DOI: 10.1142/s0219720022400042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
29
Mondol RK, Truong ND, Reza M, Ippolito S, Ebrahimie E, Kavehei O. AFExNet: An Adversarial Autoencoder for Differentiating Breast Cancer Sub-Types and Extracting Biologically Relevant Genes. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2060-2070. [PMID: 33720833 DOI: 10.1109/tcbb.2021.3066086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Technological advances in high-throughput genomics enable the generation of large, complex datasets that can be used for classification, clustering, and biomarker identification. Modern deep learning algorithms offer the opportunity to find the most significant features in such large datasets to characterize diseases (e.g., cancer) and their subtypes. Developing a deep learning method that can successfully extract meaningful features from various breast cancer subtypes is therefore of current research interest. In this paper, we develop a dual-stage (unsupervised pre-training and supervised fine-tuning) neural network architecture, termed AFExNet, based on an adversarial autoencoder (AAE) to extract features from high-dimensional genetic data. We evaluated our model with twelve different supervised classifiers on a public breast cancer RNA-Seq dataset to verify the usefulness of the new features. AFExNet provides consistent results in all performance metrics across the twelve classifiers, which makes our model classifier-independent. We also develop a method named 'TopGene' to find highly weighted genes from the latent space, which could be useful for finding cancer biomarkers. Taken together, AFExNet has great potential to extract features from biological data accurately and effectively. Our work is fully reproducible, and the source code can be downloaded from GitHub: https://github.com/NeuroSyd/breast-cancer-sub-types.
30
Zou R, Zhao W, Xiao S, Lu Y. A Signature of Three Apoptosis-Related Genes Predicts Overall Survival in Breast Cancer. Front Surg 2022; 9:863035. [PMID: 35769153 PMCID: PMC9235836 DOI: 10.3389/fsurg.2022.863035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/26/2022] [Accepted: 04/25/2022] [Indexed: 12/17/2022]
Abstract
Background Breast cancer (BC) is the most common malignancy in women. Numerous studies have demonstrated that apoptosis appears to be critical to the management and clinical outcome of BC patients. The purpose of this study was to explore the potential connection between apoptosis and BC and to establish an apoptosis-associated gene signature in BC. Methods Transcriptome data and related clinical information for BC patients were obtained from The Cancer Genome Atlas (TCGA) database, and apoptosis-related genes were obtained from the Molecular Signatures Database (MSigDB). We identified the abnormally expressed apoptosis-related genes in BC samples. The optimal apoptosis-related genes, screened by Cox regression analysis, were used to construct a prognostic model for BC patients. A nomogram was used to predict 1-year, 3-year, and 5-year overall survival for BC patients. Functional pathways related to the gene signature were explored by gene set enrichment analysis (GSEA). Results Three apoptosis-related genes [alpha subunit of the interleukin 3 receptor (IL3RA), apoptosis-inducing factor mitochondrial-associated 1 (AIFM1), and phosphatidylinositol-3 kinase catalytic alpha (PIK3CA)] were strongly linked to the overall survival of BC patients. Survival analysis showed that a higher risk score was associated with a poorer prognosis. The risk score, age, pathological N stage, and pathological M stage were independent predictors of prognosis in patients with BC. The nomogram was most suitable for assessing the long-term survival of BC patients. GSEA showed that numerous cell cycle-related pathways were enriched in the high-risk group. Conclusion We constructed an apoptosis-associated gene signature in BC that may have clinical application for BC patients.
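Prognostic gene signatures of this kind are commonly applied by computing a linear risk score, the sum of each gene's Cox regression coefficient multiplied by its expression, and then splitting patients into high- and low-risk groups. The sketch below assumes a median cutoff, which the abstract does not specify; the coefficient values and patient data are hypothetical placeholders, not the published model.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of Cox coefficient x expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(patients, coefficients):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical coefficients and normalized expression values, for illustration only
    coefficients = {"IL3RA": 0.5, "AIFM1": -0.2, "PIK3CA": 0.3}
    patients = {
        "P1": {"IL3RA": 1.0, "AIFM1": 1.0, "PIK3CA": 1.0},
        "P2": {"IL3RA": 2.0, "AIFM1": 0.0, "PIK3CA": 1.0},
        "P3": {"IL3RA": 0.0, "AIFM1": 2.0, "PIK3CA": 0.0},
        "P4": {"IL3RA": 1.0, "AIFM1": 0.0, "PIK3CA": 0.0},
    }
    print(stratify(patients, coefficients))
```

A positive coefficient means that higher expression raises the score (and the modeled hazard), while a negative coefficient lowers it; survival of the two resulting groups would then be compared, for example with Kaplan–Meier curves.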
31
Chaudhury S, Krishna AN, Gupta S, Sankaran KS, Khan S, Sau K, Raghuvanshi A, Sammy F. Effective Image Processing and Segmentation-Based Machine Learning Techniques for Diagnosis of Breast Cancer. Comput Math Methods Med 2022; 2022:6841334. [PMID: 35432588 PMCID: PMC9012610 DOI: 10.1155/2022/6841334] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/13/2022] [Revised: 03/06/2022] [Accepted: 03/21/2022] [Indexed: 01/21/2023]
Abstract
Breast cancer is the second leading cause of death among women, behind only heart disease. Despite the high incidence and mortality rates associated with breast cancer, what causes it to develop in the first place remains unclear, and none of the currently available methods can prevent it. Patients who are diagnosed and treated at an early stage have a better chance of successful treatment and recovery. Digital mammography is widely acknowledged to be a highly effective method for detecting breast cancer early, and image processing techniques may further improve early detection, thereby increasing the chances of survival and treatment success. This article describes a framework for breast cancer image processing and machine learning. The framework takes a sequence of mammography images as input. Contrast limited adaptive histogram equalization (CLAHE), an improvement on the original histogram equalization technique, is then used to improve overall image quality; it enhances contrast while limiting the amplification of noise. The next step in the framework is image segmentation, in which pixels are labeled so that the image is divided into distinct regions; this helps identify objects and delineate boundaries. Finally, techniques such as fuzzy SVM, a Bayesian classifier, and random forest, among others, are used to classify the preprocessed images.
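The contrast-limiting step that distinguishes CLAHE from ordinary histogram equalization can be sketched for a single tile as follows. This is a simplified, standard-library-only illustration rather than the framework's implementation: full CLAHE divides the image into tiles, builds one such mapping per tile, and bilinearly interpolates between neighboring tiles, all of which is omitted here. Expressing `clip_limit` as a multiple of the mean bin count is an assumption of this sketch; libraries such as OpenCV (`cv2.createCLAHE`) provide complete implementations.

```python
def clipped_equalization_lut(pixels, levels=256, clip_limit=4.0):
    """Build an intensity lookup table by contrast-limited histogram equalization.

    pixels: flat list of integer intensities in [0, levels).
    clip_limit: maximum bin height as a multiple of the mean bin count.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    limit = max(1, int(clip_limit * n / levels))

    # Clip tall bins so that no single intensity dominates the mapping
    excess = 0
    for i in range(levels):
        if hist[i] > limit:
            excess += hist[i] - limit
            hist[i] = limit

    # Redistribute the clipped counts evenly over all bins (total stays n)
    share, remainder = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < remainder else 0)

    # Map each intensity through the normalized cumulative histogram
    lut, cumulative = [], 0
    for count in hist:
        cumulative += count
        lut.append(round((levels - 1) * cumulative / n))
    return lut


def equalize(pixels, lut):
    """Apply a lookup table to every pixel."""
    return [lut[p] for p in pixels]


if __name__ == "__main__":
    # A low-contrast toy tile: intensities crowded between 100 and 103
    tile = [100] * 400 + [101] * 300 + [102] * 200 + [103] * 100
    lut = clipped_equalization_lut(tile)
    print(sorted(set(equalize(tile, lut))))
```

For a flat tile in which every pixel has the same value, plain histogram equalization maps that value to the maximum intensity (255), exaggerating any small variation; with the clip limit applied, the mapped value stays far below 255, which is how CLAHE restrains noise amplification.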
Affiliation(s)
- Alla Naveen Krishna
- Mechanical Engineering Department, Institute of Aeronautical Engineering, Hyderabad, India
- Suneet Gupta
- Department of CSE, School of Engineering and Technology, Mody University, Lakshmangarh, Rajasthan, India
- Samiullah Khan
- Department of Maths, Stat & Computer Science, The University of Agriculture, Pakistan
- Kartik Sau
- University of Engineering and Management, Kolkata, West Bengal, India
- F. Sammy
- Department of Information Technology, Dambi Dollo University, Dembi Dolo, Welega, Ethiopia
32
Steiner CA, Berinstein JA, Louissaint J, Higgins PDR, Spence JR, Shannon C, Lu C, Stidham RW, Fletcher JG, Bruining DH, Feagan BG, Jairath V, Baker ME, Bettenworth D, Rieder F. Biomarkers for the Prediction and Diagnosis of Fibrostenosing Crohn's Disease: A Systematic Review. Clin Gastroenterol Hepatol 2022; 20:817-846.e10. [PMID: 34089850 PMCID: PMC8636551 DOI: 10.1016/j.cgh.2021.05.054] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 04/13/2021] [Revised: 05/20/2021] [Accepted: 05/23/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND AND AIMS Intestinal strictures are a common complication of Crohn's disease (CD). Biomarkers of intestinal strictures would assist in their prediction, diagnosis, and monitoring. Herein we provide a comprehensive systematic review of studies assessing biomarkers that may predict or diagnose CD-associated strictures. METHODS We performed a systematic review of PubMed, EMBASE, ISI Web of Science, Cochrane Library, and Scopus to identify citations pertaining to biomarkers of intestinal fibrosis through July 6, 2020, that used a reference standard of full-thickness histopathology, cross-sectional imaging, or endoscopy. Studies were categorized by the type of biomarker evaluated (serum, genetic, histopathologic, or fecal). RESULTS Thirty-five distinct biomarkers from 3 major groups were identified: serum (20 markers), genetic (9 markers), and histopathology (6 markers). Promising markers include cartilage oligomeric matrix protein, hepatocyte growth factor activator, and lower levels of microRNA-19-3p (areas under the curve, 0.805, 0.738, and 0.67, respectively), and multiple anti-flagellin antibodies (A4-Fla2 [odds ratio, 3.41], anti Fla-X [odds ratio, 2.95], and anti-CBir1 [multiple]). Substantial heterogeneity was observed, and none of the markers had undergone formal validation. Specific limitations to acceptance of these markers included failure to use a standardized definition of stricturing disease, lack of specificity, insufficient relevance to the pathogenesis of intestinal strictures, and incomplete knowledge of their operating properties. CONCLUSIONS There is a lack of well-defined studies on biomarkers of intestinal stricture. Development of reliable and accurate biomarkers of stricture is a research priority. Biomarkers can support the clinical management of CD patients and aid in the stratification and monitoring of patients during clinical trials of future antifibrotic drug candidates.
Affiliation(s)
- Calen A Steiner
- Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan.
- Jeffrey A Berinstein
- Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Jeremy Louissaint
- Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Peter D R Higgins
- Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Jason R Spence
- Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan; Department of Cell and Developmental Biology, University of Michigan, Ann Arbor, Michigan
- Carol Shannon
- Taubman Health Sciences Library, University of Michigan, Ann Arbor, Michigan
- Cathy Lu
- Division of Gastroenterology, Department of Medicine, University of Calgary, Calgary, Alberta, Canada
- Ryan W Stidham
- Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- David H Bruining
- Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, Minnesota
- Brian G Feagan
- Alimentiv Inc, London, Ontario, Canada; Department of Medicine, Western University, London, Ontario, Canada; Department of Biostatistics and Epidemiology, Western University, London, Ontario, Canada
- Vipul Jairath
- Alimentiv Inc, London, Ontario, Canada; Department of Medicine, Western University, London, Ontario, Canada; Department of Biostatistics and Epidemiology, Western University, London, Ontario, Canada
- Mark E Baker
- Section of Abdominal Imaging, Imaging Institute, Digestive Diseases and Surgery Institute and Taussig Cancer Institute, Cleveland Clinic, Cleveland, Ohio
- Dominik Bettenworth
- Department of Medicine B, Gastroenterology and Hepatology, University of Münster, Münster, Germany
- Florian Rieder
- Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, Ohio; Department of Gastroenterology, Hepatology, and Nutrition, Digestive Diseases and Surgery Institute, Cleveland Clinic Foundation, Cleveland, Ohio
33
Recent Applications of Artificial Intelligence in Radiotherapy: Where We Are and Beyond. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12073223] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 12/12/2022]
Abstract
In recent decades, artificial intelligence (AI) tools have been applied in many medical fields, opening the possibility of finding novel solutions for managing very complex and multifactorial problems, such as those commonly encountered in radiotherapy (RT). We conducted a PubMed and Scopus search, limited to the last four years, to identify the fields in which AI has been applied in RT. In total, 1824 original papers were identified, and 921 were analyzed according to the phase of the RT workflow and the AI approach applied. AI permits the processing of large quantities of information, data, and images stored in RT oncology information systems, a task that is not manageable for individuals or groups. AI allows complex tasks to be applied iteratively to large datasets (e.g., delineating normal tissues or finding optimal planning solutions) and might support the entire community working in the various sectors of RT, as summarized in this overview. AI-based tools are now on the roadmap for RT and have been applied across the entire workflow, mainly for segmentation, the generation of synthetic images, and outcome prediction. Several concerns were raised, including the need for harmonization and for overcoming ethical, legal, and skill barriers.
34
Al-Harazi O, Kaya IH, El Allali A, Colak D. A Network-Based Methodology to Identify Subnetwork Markers for Diagnosis and Prognosis of Colorectal Cancer. Front Genet 2021; 12:721949. [PMID: 34790220 PMCID: PMC8591094 DOI: 10.3389/fgene.2021.721949] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/07/2021] [Accepted: 09/28/2021] [Indexed: 12/30/2022]
Abstract
The development of reliable methods for identifying robust biomarkers of complex diseases is critical for disease diagnosis and prognosis. Integrating multi-omics data with protein-protein interaction (PPI) networks may help us better understand disease characteristics at the molecular level. In this study, we developed and tested a novel network-based method to detect subnetwork markers for patients with colorectal cancer (CRC). We performed an integrated omics analysis of whole-genome gene expression profiling and copy number alteration (CNA) datasets, followed by building a gene interaction network for the significantly altered genes. We then clustered the constructed gene network into subnetworks and assigned a score to each significant subnetwork. We developed a support vector machine (SVM) classifier using these scores as feature values and tested the methodology on independent CRC transcriptomic datasets. The network analysis yielded 15 subnetwork markers that revealed several hub genes that may play a significant role in colorectal cancer, including PTP4A3, FGFR2, PTX3, AURKA, FEN1, INHBA, and YES1. The 15-subnetwork classifier achieved over 98 percent accuracy in detecting patients with CRC. Compared with individual gene biomarkers, subnetwork markers based on integrated multi-omics and network analyses may lead to better disease classification, diagnosis, and prognosis.
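The abstract does not specify how each subnetwork is scored. One common choice, assumed in this sketch, is the mean z-scored expression of a subnetwork's member genes in each sample; the resulting per-sample matrix would then be the feature input to a classifier such as an SVM, which is not implemented here. Gene identifiers, subnetwork memberships, and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_genes(expr):
    """Standardize each gene across samples (population standard deviation)."""
    out = {}
    for gene, values in expr.items():
        m, s = mean(values), pstdev(values)
        out[gene] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return out


def subnetwork_features(expr, subnetworks):
    """Return one row per sample holding the mean z-score of each subnetwork."""
    z = zscore_genes(expr)
    n_samples = len(next(iter(expr.values())))
    return [
        [mean([z[g][i] for g in genes]) for genes in subnetworks]
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Hypothetical expression values: gene -> values across samples
    expr = {"A": [1, 3, 2], "B": [2, 4, 3], "C": [5, 5, 6]}
    subnetworks = [["A", "B"], ["A", "C"]]  # hypothetical memberships
    for row in subnetwork_features(expr, subnetworks):
        print(row)
```

Averaging over member genes can dampen noise in individual genes, which is one motivation for using subnetwork-level rather than single-gene features.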
Affiliation(s)
- Olfat Al-Harazi
- Biostatistics, Epidemiology and Scientific Computing Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Ibrahim H Kaya
- College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
- Achraf El Allali
- African Genome Center, Mohammed VI Polytechnic University, Benguerir, Morocco
- Dilek Colak
- Biostatistics, Epidemiology and Scientific Computing Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia

35
Berlow NE. Probabilistic Boolean Modeling of Pre-clinical Tumor Models for Biomarker Identification in Cancer Drug Development. Curr Protoc 2021; 1:e269. [PMID: 34661991 DOI: 10.1002/cpz1.269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
As high-throughput sequencing experiments become more widely used in pre-clinical and clinical settings, pharmacogenetic and pharmacogenomic biomarker development plays an increasingly important role in oncology drug development pipelines and programs. Consequently, computer-based learning approaches have entered into use at multiple stages in pre-clinical and clinical pipelines. However, few approaches are available to identify interpretable and implementable biomarkers of response early in the drug development process when only small pre-clinical data packages are available. To address the need for early-stage biomarker development using pre-clinical tumor models, we have adapted the previously published Probabilistic Target Inhibitor Map (PTIM) platform to the challenge of biomarker hypothesis development, and denoted this approach the Probabilistic Target Map-Biomarker (PTM-Biomarker). In this article, we detail the history and design philosophy of PTM-Biomarker, and present two case studies using the biomarker discovery tool to illustrate its utility in guiding cancer drug development. © 2021 Wiley Periodicals LLC.

36
Makond B, Wang KJ, Wang KM. Benchmarking prognosis methods for survivability - A case study for patients with contingent primary cancers. Comput Biol Med 2021; 138:104888. [PMID: 34610552 DOI: 10.1016/j.compbiomed.2021.104888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Accepted: 09/17/2021] [Indexed: 11/18/2022]
Abstract
BACKGROUND There is an increasing number of patients with a first primary cancer who are diagnosed with a second primary cancer, but prognosis methods to predict the survivability of patients with multiple primary cancers have not been fully benchmarked. METHODS This study investigated the five-year survivability prognosis performance of six machine learning approaches: artificial neural network, decision tree (DT), logistic regression, support vector machine, naïve Bayes (NB), and Bayesian network (BN). The synthetic minority over-sampling technique (SMOTE) was used to address class imbalance, and a nationwide cancer patient database in Taiwan containing 7,845 subjects was used as the sample source. Ten primary and secondary cancers and the key variables affecting patient survivability were identified. RESULTS All models using SMOTE showed significantly improved sensitivity and specificity. NB had the highest accuracy and specificity, whereas BN had the highest sensitivity. Further, NB, BN, and DT outperformed the other approaches in computational time and in the power of knowledge representation. CONCLUSIONS Selecting an appropriate prognosis model to predict the survivability of patients with two contingent primary cancers can aid precise prediction and support appropriate treatment advice.
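SMOTE creates synthetic minority-class records by interpolating between a minority sample and one of its nearest minority-class neighbours. The abstract does not give the authors' SMOTE settings, so the following is only a minimal pure-Python sketch of the general technique under our own assumptions (numeric features, Euclidean distance, a default of `k=5` neighbours); the function name `smote` is hypothetical and this is not the study's code.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors from the minority class.
    n_synthetic: number of synthetic samples to create.
    k: number of nearest minority neighbours to choose from (assumed default).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1)
        x, nn = minority[i], minority[j]
        # The new point lies on the line segment between x and its neighbour nn.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

For example, `smote([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]], n_synthetic=4, k=2)` returns four new points inside the region spanned by the three minority records. Oversampling should be applied to training folds only, so that synthetic points never leak into the evaluation data.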
Affiliation(s)
- Bunjira Makond
- Faculty of Commerce and Management, Prince of Songkla University, Trang, Thailand
- Kung-Jeng Wang
- Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, ROC, Taiwan
- Kung-Min Wang
- Department of Surgery, Shin-Kong Wu Ho-Su Memorial Hospital, Taipei, ROC, Taiwan

37
Rahman M, Ghasemi Y, Suley E, Zhou Y, Wang S, Rogers J. Machine Learning Based Computer Aided Diagnosis of Breast Cancer Utilizing Anthropometric and Clinical Features. Ing Rech Biomed 2021. [DOI: 10.1016/j.irbm.2020.05.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/09/2022]

38
Identification of Dipeptidyl Peptidase (DPP) Family Genes in Clinical Breast Cancer Patients via an Integrated Bioinformatics Approach. Diagnostics (Basel) 2021; 11:diagnostics11071204. [PMID: 34359286 PMCID: PMC8304478 DOI: 10.3390/diagnostics11071204] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/27/2021] [Revised: 06/29/2021] [Accepted: 06/29/2021] [Indexed: 12/17/2022]
Abstract
Breast cancer is a heterogeneous disease involving complex interactions of biological processes; thus, it is important to develop therapeutic biomarkers for treatment. Members of the dipeptidyl peptidase (DPP) family are metalloproteases that specifically cleave dipeptides. This family comprises seven members, namely DPP3, DPP4, DPP6, DPP7, DPP8, DPP9, and DPP10; however, information on the involvement of DPPs in breast cancer is lacking in the literature. We therefore aimed to study their roles in this disease using publicly available databases such as cBioPortal, Oncomine, and Kaplan–Meier Plotter, which contain comprehensive high-throughput transcriptomic profiles of breast cancer across multiple datasets. In addition to investigating the messenger RNA expression levels of these genes, we aimed to correlate these expression levels with breast cancer patient survival. The results showed that DPP3 and DPP9 had significantly higher expression in breast cancer tissues than in normal breast tissues. High expression levels of DPP3 and DPP4 were associated with poor survival of breast cancer patients, whereas high expression levels of DPP6, DPP7, DPP8, and DPP9 were associated with good prognoses. Additionally, DPP family genes were positively correlated with the cell cycle, transforming growth factor (TGF)-beta, kappa-type opioid receptor, and immune response signaling, such as interleukin (IL)-4, IL-6, IL-17, tumor necrosis factor (TNF), and interferon (IFN)-alpha/beta. Collectively, DPP family members, especially DPP3, may serve as essential prognostic biomarkers in breast cancer.

39
Hsiao YW, Tao CL, Chuang EY, Lu TP. A risk prediction model of gene signatures in ovarian cancer through bagging of GA-XGBoost models. J Adv Res 2021; 30:113-122. [PMID: 34026291 PMCID: PMC8132202 DOI: 10.1016/j.jare.2020.11.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/08/2020] [Revised: 10/10/2020] [Accepted: 11/05/2020] [Indexed: 12/12/2022]
Abstract
Introduction Ovarian cancer (OC) is one of the most frequent gynecologic cancers among women, and high-accuracy risk prediction techniques are essential for selecting the best intervention strategies and clinical management for OC patients at different risk levels. Current risk prediction models used in OC have low sensitivity, and few of them can identify OC patients at high risk of mortality; such identification would both optimize the treatment of high-risk patients and prevent unnecessary medical intervention in those at low risk. Objectives To this end, we developed a bagging-based algorithm with GA-XGBoost models that predicts the risk of death from OC using gene expression profiles. Methods Four gene expression datasets from public sources were used as training (n = 1) or validation (n = 3) sets. The performance of our proposed algorithm was compared with fine-tuning and other existing methods. Moreover, the biological function of the selected genetic features was further interpreted, and the response to a panel of approved drugs was predicted for different risk levels. Results The proposed algorithm showed good sensitivity (74-100%) in the validation sets, compared with two simple models whose sensitivity reached only 47% and 60%. The prognostic gene signature used in this study was highly connected to AKT, a key component of the PI3K/AKT/mTOR signaling pathway, which influences the tumorigenesis, proliferation, and progression of OC. Conclusion These findings demonstrate that our risk prediction models improve the sensitivity of risk classification of OC patients compared with other methods. Ongoing effort is needed to validate the outcomes of this approach for precise clinical treatment.
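Bagging trains several copies of a base model on bootstrap resamples of the training set and averages their predictions, which mainly reduces variance. The sketch below shows only that generic bagging step in pure Python; it does not include the genetic-algorithm tuning or XGBoost models used in the paper, and the names `bagged_risk` and `base_rate_learner` are hypothetical. In practice the toy base learner would be replaced by a real classifier that returns a risk score.

```python
import random
from statistics import mean
from typing import Callable, Sequence

# A "learner" takes training rows and labels and returns a predict function.
Learner = Callable[[Sequence[Sequence[float]], Sequence[int]], Callable[[Sequence[float]], float]]


def bagged_risk(learner: Learner, X, y, n_models=25, seed=0):
    """Train n_models copies of `learner` on bootstrap resamples and average their scores."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(learner([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        # Ensemble risk score = mean of the individual models' scores.
        return mean(m(x) for m in models)

    return predict


def base_rate_learner(X, y):
    """Toy base learner (hypothetical): predicts the event rate in its bootstrap sample."""
    rate = sum(y) / len(y)
    return lambda x: rate
```

A usage example: `bagged_risk(base_rate_learner, [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])([5.0])` returns a risk score between 0 and 1 that averages 25 bootstrap estimates of the event rate.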
Affiliation(s)
- Yi-Wen Hsiao
- Department of Public Health, Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Chun-Liang Tao
- Department of Public Health, Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Eric Y. Chuang
- Bioinformatics and Biostatistics Core, Center of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan
- Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Electrical Engineering, National Taiwan University, Taiwan
- Tzu-Pin Lu
- Department of Public Health, Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Bioinformatics and Biostatistics Core, Center of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan

40
Del Giudice M, Peirone S, Perrone S, Priante F, Varese F, Tirtei E, Fagioli F, Cereda M. Artificial Intelligence in Bulk and Single-Cell RNA-Sequencing Data to Foster Precision Oncology. Int J Mol Sci 2021; 22:ijms22094563. [PMID: 33925407 PMCID: PMC8123853 DOI: 10.3390/ijms22094563] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/20/2021] [Revised: 04/21/2021] [Accepted: 04/23/2021] [Indexed: 02/01/2023]
Abstract
Artificial intelligence, the discipline of developing computational algorithms able to perform tasks that require human intelligence, offers the opportunity to improve both how we conceive and how we deliver precision medicine. Here, we provide an overview of artificial intelligence approaches for the analysis of large-scale RNA-sequencing datasets in cancer. We present the major solutions for disentangling inter- and intra-tumor heterogeneity of transcriptome profiles to improve patient management. We outline the contributions of learning algorithms to the needs of cancer genomics, from identifying rare cancer subtypes to personalizing therapeutic treatments.
Affiliation(s)
- Marco Del Giudice
- Cancer Genomics and Bioinformatics Unit, IIGM—Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Candiolo Cancer Institute, FPO—IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Serena Peirone
- Cancer Genomics and Bioinformatics Unit, IIGM—Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics and INFN, Università degli Studi di Torino, via P.Giuria 1, 10125 Turin, Italy
- Sarah Perrone
- Cancer Genomics and Bioinformatics Unit, IIGM—Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics, Università degli Studi di Torino, via P.Giuria 1, 10125 Turin, Italy
- Francesca Priante
- Cancer Genomics and Bioinformatics Unit, IIGM—Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics, Università degli Studi di Torino, via P.Giuria 1, 10125 Turin, Italy
- Fabiola Varese
- Cancer Genomics and Bioinformatics Unit, IIGM—Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Life Science and System Biology, Università degli Studi di Torino, via Accademia Albertina 13, 10123 Turin, Italy
- Elisa Tirtei
- Paediatric Onco-Haematology Division, Regina Margherita Children’s Hospital, City of Health and Science of Turin, 10126 Turin, Italy
- Franca Fagioli
- Paediatric Onco-Haematology Division, Regina Margherita Children’s Hospital, City of Health and Science of Turin, 10126 Turin, Italy
- Department of Public Health and Paediatric Sciences, University of Torino, 10124 Turin, Italy
- Matteo Cereda
- Cancer Genomics and Bioinformatics Unit, IIGM—Italian Institute for Genomic Medicine, c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Candiolo Cancer Institute, FPO—IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Correspondence: ; Tel.: +39-011-993-3969

41
Huang Y, Smith W, Harwood C, Wipat A, Bacardit J. Computational Strategies for the Identification of a Transcriptional Biomarker Panel to Sense Cellular Growth States in Bacillus subtilis. SENSORS (BASEL, SWITZERLAND) 2021; 21:2436. [PMID: 33916259 PMCID: PMC8036383 DOI: 10.3390/s21072436] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/18/2020] [Revised: 03/29/2021] [Accepted: 03/30/2021] [Indexed: 01/08/2023]
Abstract
A goal of the biotechnology industry is to be able to recognise detrimental cellular states that may lead to suboptimal or anomalous growth in a bacterial population. Our current knowledge of how different environmental treatments modulate gene regulation and bring about physiological adaptations is limited, and hence it is difficult to determine the mechanisms that lead to their effects. Patterns of gene expression, revealed using technologies such as microarrays or RNA-seq, can provide useful biomarkers of different gene regulatory states indicative of a bacterium's physiological status. It is desirable to use only a few key genes as biomarkers, because this reduces the cost of determining the transcriptional state and opens the way for methods such as quantitative RT-PCR and amplicon panels. In this paper, we used unsupervised machine learning to construct a transcriptional landscape model from condition-dependent transcriptome data, from which we identified 10 clusters of samples that have differentiated gene expression profiles and are linked to different cellular growth states. Using an iterative feature elimination strategy, we identified a minimal panel of 10 biomarker genes that achieved 100% cross-validation accuracy in predicting the cluster assignment. Moreover, we designed and evaluated a variety of data processing strategies to ensure that our methods generate meaningful transcriptional landscape models that capture relevant biological processes. Overall, the computational strategies introduced in this study facilitate the identification of a detailed set of relevant cellular growth states and show how to sense them using a reduced biomarker panel.
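One common form of iterative feature elimination is greedy backward elimination: start from all genes, repeatedly drop the gene whose removal leaves the best score, and stop at the desired panel size. The sketch below assumes that form and a caller-supplied `score` function (in the paper's setting this would be cross-validation accuracy of a cluster classifier); the authors may instead rank features by model importance, and the function name `backward_eliminate` is hypothetical.

```python
from typing import Callable, List, Sequence


def backward_eliminate(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    target_size: int,
) -> List[str]:
    """Greedy backward feature elimination (feature names assumed unique).

    At each round, drop the single feature whose removal leaves the highest
    score for the remaining subset, until only `target_size` features remain.
    """
    if target_size < 1 or target_size > len(features):
        raise ValueError("target_size must be between 1 and len(features)")
    current = list(features)
    while len(current) > target_size:
        best_subset, best_score = None, float("-inf")
        for f in current:
            subset = [g for g in current if g != f]
            s = score(subset)  # e.g. cross-validation accuracy on this gene subset
            if s > best_score:
                best_subset, best_score = subset, s
        current = best_subset
    return current
```

Each round costs one `score` call per remaining gene, so with an expensive cross-validated score it is usual to first pre-filter to a few hundred candidate genes. A toy usage with additive, made-up gene weights: `backward_eliminate(["geneA", "geneB", "geneC", "geneD"], lambda s: sum(w[g] for g in s), 2)`.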
Affiliation(s)
- Yiming Huang
- Interdisciplinary Computing and Complex BioSystems (ICOS) Group, School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Wendy Smith
- Interdisciplinary Computing and Complex BioSystems (ICOS) Group, School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Colin Harwood
- Centre for Bacterial Cell Biology, Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Anil Wipat
- Interdisciplinary Computing and Complex BioSystems (ICOS) Group, School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Jaume Bacardit
- Interdisciplinary Computing and Complex BioSystems (ICOS) Group, School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, UK

42
A two-stage modeling approach for breast cancer survivability prediction. Int J Med Inform 2021; 149:104438. [PMID: 33730681 DOI: 10.1016/j.ijmedinf.2021.104438] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/21/2020] [Revised: 01/24/2021] [Accepted: 03/08/2021] [Indexed: 01/14/2023]
Abstract
BACKGROUND Despite the increasing number of studies on breast cancer survival prediction, little attention has been paid to deceased patients and their survival lengths. Moreover, developing a model that is both accurate and interpretable remains a challenge. OBJECTIVE This paper proposes a two-stage data analytic framework, where Stage I classifies the survival and deceased statuses and Stage II predicts the number of survival months for deceased females with cancer. Since medical data are neither entirely clean nor prepared for model development, we aim to show that data preparation can strengthen a simple Generalized Linear Model (GLM) to predict as accurately as complex models such as Extreme Gradient Boosting (XGB) and Multilayer Perceptron based on Artificial Neural Networks (MLP-ANNs) in both stages. METHODS In Stage I, we use recent Surveillance, Epidemiology, and End Results (SEER) data from 2004 to 2016 to predict short-term survival statuses from 6 months to 3 years in 6-month increments. The Synthetic Minority Over-sampling Technique (SMOTE), Relocating Safe-Level SMOTE (RSLS), and Adaptive Synthetic (ADASYN) re-sampling techniques, the Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF) feature selection methods, and integer and one-hot encoding are combined with three popular data mining methods: GLM, XGB, and MLP. In Stage II, we predict the number of survival months for patients who are correctly predicted as deceased within 3 years. Again, we employ GLM, XGB, and MLP for regression, along with LASSO and RF for feature selection and one-hot encoding for the categorical features. RESULTS We obtain Area Under the Receiver Operating Characteristic Curve (AUC) values of 0.900, 0.898, 0.877, 0.852, 0.852, and 0.858 for the 6-month, 1-, 1.5-, 2-, 2.5-, and 3-year survival time points, respectively, using OneHotEncoding-GLM-LASSO-ADASYN. We use the change in the odds ratio values in GLM to show the impact of individual categorical levels and numerical features on the odds of death. In Stage II, we obtain a Mean Absolute Error (MAE) of 7.960 months using OneHotEncoding-GLM-LASSO when predicting the number of survival months for deceased patients. We present the top contributing features and their coefficient values to illustrate how the presence of each feature alters the predicted number of survival months. CONCLUSION To the best of our knowledge, this is the first study that implements both breast cancer survival classification and regression in a two-stage approach. All data-driven findings are presented to assist clinicians in making better care decisions using GLM, an interpretable and computationally efficient method that predicts survival status and survival length for deceased patients, and to help foster human-machine interaction.
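The two stages above are evaluated with ROC AUC (classification) and mean absolute error (regression). As a reading aid, the following minimal pure-Python sketch spells out the standard definitions of both metrics; it is not the authors' evaluation code. AUC is computed in its pairwise (Mann-Whitney) form, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, which costs O(n_pos × n_neg) and is meant for illustration rather than SEER-sized data.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    labels: sequence of 0/1 class labels (1 = event, e.g. deceased).
    scores: predicted risk scores; a positive/negative tie counts as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, e.g. between true and predicted survival months."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For instance, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.1, 0.4, 0.5]`, three of the four positive/negative pairs are ranked correctly, so `roc_auc` returns 0.75.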

43
Bhandari N, Khare S, Walambe R, Kotecha K. Comparison of machine learning and deep learning techniques in promoter prediction across diverse species. PeerJ Comput Sci 2021; 7:e365. [PMID: 33817015 PMCID: PMC7959599 DOI: 10.7717/peerj-cs.365] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/22/2020] [Accepted: 12/30/2020] [Indexed: 06/12/2023]
Abstract
Gene promoters are the key DNA regulatory elements positioned around transcription start sites and are responsible for regulating gene transcription. Various alignment-based, signal-based and content-based approaches have been reported for promoter prediction. However, since not all promoter sequences show explicit features, the prediction performance of these techniques is poor. Therefore, many machine learning and deep learning models have been proposed for promoter prediction. In this work, we studied methods for vector encoding and promoter classification using genome sequences of three distinct eukaryotes, viz. yeast (Saccharomyces cerevisiae), A. thaliana (plant) and human (Homo sapiens). We compared one-hot vector encoding with frequency-based tokenization (FBT) for data pre-processing on a 1-D Convolutional Neural Network (CNN) model. We found that FBT gives a shorter input dimension, reducing training time without affecting the sensitivity and specificity of classification. We employed deep learning techniques, mainly a CNN and a recurrent neural network with Long Short-Term Memory (LSTM), as well as a random forest (RF) classifier, for promoter classification at k-mer sizes of 2, 4 and 8. We found the CNN to be superior both in distinguishing promoters from non-promoter sequences (binary classification) and in species-specific classification of promoter sequences (multiclass classification). In summary, the contribution of this work lies in the use of a synthetic shuffled negative dataset and frequency-based tokenization for pre-processing. This study provides a comprehensive and generic framework for classification tasks in genomic applications and can be extended to various classification problems.
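To make the input-dimension contrast concrete, the sketch below encodes DNA two ways: one-hot (one 4-element row per nucleotide, so a sequence of length L becomes L × 4 values) and k-mer tokenization (about L / k integer IDs). The abstract does not specify the exact FBT rules, so, as a labelled assumption, the sketch treats FBT as splitting sequences into non-overlapping k-mers and assigning integer IDs in order of corpus frequency; the helper names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA sequence: one 4-element row per nucleotide (N -> all zeros)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmer_tokens(seq, k):
    """Split a sequence into non-overlapping k-mers; a trailing remainder is dropped."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def build_vocab(sequences, k):
    """Map each k-mer to an integer ID, most frequent first (ID 0 reserved for unknown/padding)."""
    counts = Counter(t for s in sequences for t in kmer_tokens(s, k))
    ranked = sorted(counts, key=lambda t: (-counts[t], t))  # frequency, then alphabetical
    return {t: i + 1 for i, t in enumerate(ranked)}


def encode(seq, vocab, k):
    """Replace each k-mer by its ID; k-mers not in the vocabulary map to 0."""
    return [vocab.get(t, 0) for t in kmer_tokens(seq, k)]
```

With `vocab = build_vocab(["ACGTACGT", "ACGTTTTT"], 2)`, the 8-base sequence `"ACGTACGT"` becomes 32 one-hot values but only the four token IDs `[1, 2, 1, 2]`, which illustrates why a token-based input is shorter.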
Affiliation(s)
- Nikita Bhandari
- Computer Science, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, MH, India
- Satyajeet Khare
- Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, MH, India
- Rahee Walambe
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, Maharashtra, India
- Electronics and Telecommunication Dept, Symbiosis Institute of Technology, Pune, Maharashtra, India
- Ketan Kotecha
- Computer Science, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, MH, India
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, Maharashtra, India

44
Chen L, Li J, Chang M. Cancer Diagnosis and Disease Gene Identification via Statistical Machine Learning. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200207094947] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/22/2022]
Abstract
Diagnosing cancer and identifying disease genes using DNA microarray gene expression data are active topics in current bioinformatics. This paper reviews the latest developments in cancer diagnosis and gene selection via statistical machine learning. A support vector machine is first introduced for binary cancer diagnosis. Then, the 1-norm support vector machine, doubly regularized support vector machine, adaptive huberized support vector machine and other extensions are presented to improve the performance of gene selection. Lasso, elastic net, partly adaptive elastic net, group lasso, sparse group lasso, adaptive sparse group lasso and other sparse regression methods are also introduced for performing simultaneous binary cancer classification and gene selection. In addition to three strategies for reducing multiclass problems to binary ones, methods that directly consider all classes of data in a single learning model (multiclass support vector machine, sparse multinomial regression, adaptive multinomial regression and so on) are presented for multiple cancer diagnosis. Limitations and promising directions are also discussed.
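The lasso achieves simultaneous fitting and gene selection because its L1 penalty sets many coefficients exactly to zero. One standard way to fit it is cyclic coordinate descent with soft-thresholding; the minimal pure-Python sketch below assumes that solver, no intercept, a centred response, and pre-scaled feature columns. It illustrates the penalty's effect and is not taken from the review; `lasso_cd` is a hypothetical name.

```python
def soft_threshold(z, gamma):
    """Lasso shrinkage operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Fit lasso coefficients by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept,
    so y should be centred and the columns of X (rows = samples) scaled beforehand.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Per-column mean of squares, (1/n) * sum_i x_ij^2.
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # Correlation of feature j with the partial residual that excludes j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / z[j]
    return b
```

On a toy design where `y` equals twice the first column, `lam=0.0` recovers a coefficient of about 2 for that column, `lam=0.5` shrinks it to 1.5, and a large `lam` such as 3.0 zeroes every coefficient, which is the selection behaviour the gene-selection methods above rely on.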
Affiliation(s)
- Liuyuan Chen
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Juntao Li
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Mingming Chang
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China

45
Zhang T, Chang H, Zhang B, Liu S, Zhao T, Zhao E, Zhao H, Zhang H. Transboundary Pathogenic microRNA Analysis Framework for Crop Fungi Driven by Biological Big Data and Artificial Intelligence Model. Comput Biol Chem 2020; 89:107401. [PMID: 33068919 DOI: 10.1016/j.compbiolchem.2020.107401] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/24/2020] [Revised: 09/19/2020] [Accepted: 10/05/2020] [Indexed: 12/13/2022]
Abstract
Plant fungal diseases such as rice blast, tomato gray mold and potato late blight have long affected the world's agricultural production and economies. Recent studies have shown that fungal pathogens transmit microRNAs to host plants as effectors during infection. However, bioassay-based verification is time-consuming and challenging, and it makes analysis from a global perspective difficult. With the accumulation of fungal and plant-related data, data analysis methods can be used to analyze pathogenic fungal microRNAs further. Based on microRNA expression data of fungal pathogens before and after plant infection, this paper discusses the selection strategy for sample data, the extraction strategy for pathogenic fungal microRNAs, the prediction strategy for fungal pathogenic microRNA target genes, the bicluster-based strategy for functional analysis of fungal pathogenic microRNAs, and experimental verification methods. A general analysis pipeline for plant-fungal pathogenic microRNAs, based on machine learning and bicluster-based functional modules, is proposed. The pipeline was applied to the infection processes of Magnaporthe oryzae and of potato late blight, and these applications verified its feasibility. It can be extended to other relevant crop pathogen research, providing a new approach to fungal research on plant diseases, and can serve as a reference for understanding the interaction between fungi and plants.
Affiliation(s)
- Tianyue Zhang
- College of Computer Science and Technology, Jilin University, China
- Haowu Chang
- College of Computer Science and Technology, Jilin University, China
- Borui Zhang
- Columbia Independent School, Columbia, MO, USA
- Sifei Liu
- College of Computer Science and Technology, Jilin University, China
- Tianheng Zhao
- College of Computer Science and Technology, Jilin University, China
- Enshuang Zhao
- College of Computer Science and Technology, Jilin University, China
- Hengyi Zhao
- College of Computer Science and Technology, Jilin University, China
- Hao Zhang
- College of Computer Science and Technology, Jilin University, China

46
Jansen L, Holleczek B, Kraywinkel K, Weberpals J, Schröder CC, Eberle A, Emrich K, Kajüter H, Katalinic A, Kieschke J, Nennecke A, Sirri E, Heil J, Schneeweiss A, Brenner H. Divergent Patterns and Trends in Breast Cancer Incidence, Mortality and Survival Among Older Women in Germany and the United States. Cancers (Basel) 2020; 12:cancers12092419. [PMID: 32858964 PMCID: PMC7565138 DOI: 10.3390/cancers12092419] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/20/2020] [Revised: 08/14/2020] [Accepted: 08/20/2020] [Indexed: 12/25/2022]
Abstract
Background: Breast cancer treatment has changed tremendously over the last decades. In addition, the use of mammography screening for early detection has increased strongly. To evaluate the impact of these developments, long-term trends in incidence, mortality, stage distribution and survival were investigated for Germany and the United States (US). Methods: Using population-based cancer registry data, long-term incidence and mortality trends (1975–2015), shifts in stage distributions (1998–2015), and trends in five-year relative survival (1979–2015) were estimated. Additionally, trends in five-year relative survival after standardization for stage were explored (2004–2015). Results: Age-standardized breast cancer incidence rates were much higher in the US than in Germany in all periods, whereas age-standardized mortality became lower in the US than in Germany from the 1990s onward. The largest and increasing differences were observed for patients aged 70+ years, with a 19% lower incidence but 45% higher mortality in Germany in 2015. For this age group, large differences in stage distributions were observed, with 29% (Germany) compared to 15% (US) stage III and IV patients. Age-standardized five-year relative survival increased strongly between 1979–1983 and 2013–2015 in Germany (+17% units) and the US (+19% units) but was 9% units lower in German patients aged 70+ years in 2013–2015. This difference was entirely explained by differences in stage distributions. Conclusions: Overall, our results are in line with a later uptake and less extensive utilization of mammography screening in Germany. Further studies and efforts are urgently needed to explore and overcome the increased breast cancer mortality among elderly women in Germany.
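The abstract relies on two standard registry measures: age-standardized rates and relative survival. For reference, their textbook definitions are written out below. The standard population, age bands, and life tables used by the authors are not stated in the abstract, so the symbols here are generic assumptions rather than the study's exact specification.

```latex
% Direct age standardization: weighted average of age-specific rates,
% expressed per 100,000 person-years
\mathrm{ASR} = \frac{\sum_{a} w_a \, \dfrac{d_a}{P_a}}{\sum_{a} w_a} \times 100\,000

% d_a: cases (or deaths) in age group a;  P_a: person-years at risk in age group a;
% w_a: size (or share) of age group a in the chosen standard population.

% Relative survival at time t: observed survival of the patients divided by the
% expected survival of a comparable general population (matched on age, sex, period)
R(t) = \frac{S_{\mathrm{obs}}(t)}{S_{\mathrm{exp}}(t)}
```

Because five-year relative survival is itself expressed as a percentage, changes between periods such as "+17% units" are absolute differences in percentage units, not relative changes.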
Collapse
Affiliation(s)
- Lina Jansen
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Correspondence:
- Klaus Kraywinkel
- German Centre for Cancer Registry Data (ZfKD), Robert Koch-Institute, 13353 Berlin, Germany
- Janick Weberpals
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Chloé Charlotte Schröder
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Andrea Eberle
- Cancer Registry of Bremen, Leibniz Institute for Prevention Research and Epidemiology—BIPS, 28359 Bremen, Germany
- Katharina Emrich
- Cancer Registry of Rhineland-Palatinate, Institute for Medical Biostatistics, Epidemiology and Informatics, University Medical Center, Johannes Gutenberg University Mainz, 55116 Mainz, Germany
- Hiltraud Kajüter
- Cancer Registry of North Rhine-Westphalia, 44801 Bochum, Germany
- Joachim Kieschke
- Cancer Registry of Lower Saxony, 26121 Oldenburg, Germany
- Eunice Sirri
- Cancer Registry of Lower Saxony, 26121 Oldenburg, Germany
- Jörg Heil
- Department of Gynecology and Obstetrics, University Women’s Clinic, 69120 Heidelberg, Germany
- Andreas Schneeweiss
- National Center for Tumor Diseases, Division Gynecologic Oncology, University Hospital and German Cancer Research Center, 69120 Heidelberg, Germany
- Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Division of Preventive Oncology, German Cancer Research Center (DKFZ), and National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center, 69120 Heidelberg, Germany
47
Establishment and Analysis of a Combined Diagnostic Model of Polycystic Ovary Syndrome with Random Forest and Artificial Neural Network. BIOMED RESEARCH INTERNATIONAL 2020; 2020:2613091. [PMID: 32884937 PMCID: PMC7455828 DOI: 10.1155/2020/2613091] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/30/2020] [Revised: 07/27/2020] [Accepted: 08/03/2020] [Indexed: 12/14/2022]
Abstract
Polycystic ovary syndrome (PCOS) is one of the most common metabolic and reproductive endocrinopathies. However, few studies have tried to develop a diagnostic model based on gene biomarkers. In this study, we applied a computational method combining two machine learning algorithms, random forest (RF) and artificial neural network (ANN), to identify gene biomarkers and construct a diagnostic model. We collected gene expression data from the Gene Expression Omnibus (GEO) database comprising 76 PCOS samples and 57 normal samples across five datasets: one for screening differentially expressed genes (DEGs), two for training, and two for validation. First, RF identified 12 of the 264 DEGs as key genes for classifying PCOS and normal samples. Next, the weights of these key genes were calculated with an ANN using the microarray and RNA-seq training datasets, respectively. Diagnostic models for the two dataset types were then developed and named neuralPCOS. Finally, the two validation datasets were used to compare the performance of neuralPCOS with that of two other sets of marker genes by area under the curve (AUC). Our model achieved an AUC of 0.7273 in the microarray dataset and 0.6488 in the RNA-seq dataset. In conclusion, we uncovered gene biomarkers and developed a novel diagnostic model of PCOS that may aid diagnosis.
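The AUC values above summarize how well a per-sample score separates PCOS from normal samples. A minimal sketch, assuming a model that outputs one numeric score per sample (higher meaning more PCOS-like), computes AUC through its equivalence with the Mann-Whitney U statistic: the probability that a randomly chosen positive sample outscores a randomly chosen negative one. The function name `auc` and the 0/1 label encoding are illustrative assumptions, not part of neuralPCOS.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: one numeric score per sample (higher = more likely positive).
    labels: 1 for a positive sample (e.g. PCOS), 0 for a negative (normal).
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of all positive/negative pairs that are correctly ordered.
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(P x N), which is fine for cohorts of a few hundred samples; a rank-based formula gives the same value in O(n log n) for larger data.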
48
Jubair S, Alkhateeb A, Tabl AA, Rueda L, Ngom A. A novel approach to identify subtype-specific network biomarkers of breast cancer survivability. ACTA ACUST UNITED AC 2020. [DOI: 10.1007/s13721-020-00249-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 01/20/2023]
49
Dhillon A, Singh A. eBreCaP: extreme learning-based model for breast cancer survival prediction. IET Syst Biol 2020; 14:160-169. [PMID: 32406380 PMCID: PMC8687246 DOI: 10.1049/iet-syb.2019.0087] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/22/2019] [Revised: 03/19/2020] [Accepted: 03/26/2020] [Indexed: 01/17/2023]
Abstract
Breast cancer is a leading cause of cancer death worldwide. Breast cancer research focuses on early prediction, diagnosis, and prognosis. Breast cancer can be predicted from omics profiles, clinical tests, and pathological images. Omics profiles comprise genomic, proteomic, and transcriptomic profiles, which are available as high-dimensional datasets. Survival prediction on omics data is used for the early prediction of disease onset, relapse, and recurrence, and for biomarker identification. Early prediction of breast cancer is desirable for effective treatment, because delays can allow the cancer to progress to a more advanced stage. In this study, an extreme learning machine (ELM)-based model for breast cancer survival prediction, named eBreCaP, is proposed. It integrates genomic (gene expression, copy number alteration, DNA methylation, protein expression) and pathological image datasets, and trains on them using an ensemble of ELM with the six best-chosen models suited to integrated data. eBreCaP was evaluated on nine performance parameters: sensitivity, specificity, precision, accuracy, Matthews correlation coefficient, area under the curve, area under the precision-recall curve, hazard ratio, and concordance index. eBreCaP achieved an accuracy of 85% for early breast cancer survival prediction using the ensemble of ELM with gradient boosting.
Collapse
Affiliation(s)
- Arwinder Dhillon
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147001, India
- Ashima Singh
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147001, India
50
Pietrucci D, Teofani A, Unida V, Cerroni R, Biocca S, Stefani A, Desideri A. Can Gut Microbiota Be a Good Predictor for Parkinson's Disease? A Machine Learning Approach. Brain Sci 2020; 10:brainsci10040242. [PMID: 32325848 PMCID: PMC7226159 DOI: 10.3390/brainsci10040242] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/16/2020] [Revised: 04/09/2020] [Accepted: 04/16/2020] [Indexed: 12/12/2022]
Abstract
Several studies investigating the involvement of the gut microbiota in Parkinson's disease (PD) have identified some common alterations of the microbial community, such as a decrease in the Lachnospiraceae family and an increase in the Verrucomicrobiaceae family in PD patients. However, results for other bacterial families are often contradictory. Machine learning is a promising tool for building predictive models to classify biological data, such as those produced in metagenomic studies. We tested three machine learning algorithms (random forest, neural networks, and support vector machines) on 846 metagenomic samples (472 from PD patients and 374 from healthy controls), comprising our previously published data and data downloaded from public databases. Prediction performance was evaluated by area under the curve, accuracy, precision, recall, and F-score. The random forest algorithm provided the best results. Bacterial families were ranked by their importance in the classification, and a subset of 22 families was identified for predicting patient status. Although the results are promising, the algorithm must be trained on a larger number of samples to increase the accuracy of the procedure.
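Apart from AUC, the performance metrics listed above all derive from the four cells of a binary confusion matrix. A minimal sketch, assuming string labels where the hypothetical label "PD" marks the positive class and any other label (for example a hypothetical "HC" for healthy controls) is negative; the function name is illustrative, not taken from the study's code.

```python
def classification_metrics(y_true, y_pred, positive="PD"):
    """Accuracy, precision, recall and F-score for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix cells relative to the chosen positive label.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }
```

Reporting precision and recall alongside accuracy matters here because the two classes are not equally sized (472 vs. 374 samples), so accuracy alone can hide a bias toward the larger class.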
Collapse
Affiliation(s)
- Daniele Pietrucci
- Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Adelaide Teofani
- Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Valeria Unida
- Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Rocco Cerroni
- UOSD Parkinson’s Center, Department of Systems Medicine, University of Rome Tor Vergata, 00133 Rome, Italy
- Silvia Biocca
- Department of Systems Medicine, University of Rome Tor Vergata, 00133 Rome, Italy
- Alessandro Stefani
- UOSD Parkinson’s Center, Department of Systems Medicine, University of Rome Tor Vergata, 00133 Rome, Italy
- Alessandro Desideri
- Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Correspondence: Tel.: +39-0672594376