1.
Adnan N, Zand M, Huang THM, Ruan J. Construction and Evaluation of Robust Interpretation Models for Breast Cancer Metastasis Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1344-1353. [PMID: 34662279 PMCID: PMC9254332 DOI: 10.1109/tcbb.2021.3120673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Interpretability of machine learning (ML) models represents the extent to which a model's decision-making process can be understood by model developers and/or end users. Transcriptomics-based cancer prognosis models, for example, while achieving good accuracy, are usually hard to interpret, due to the high-dimensional feature space and the complexity of models. As interpretability is critical for the transparency and fairness of ML models, several algorithms have been proposed to improve the interpretability of arbitrary classifiers. However, evaluation of these algorithms often requires substantial domain knowledge. Here, we propose a breast cancer metastasis prediction model using a very small number of biologically interpretable features, and a simple yet novel model interpretation approach that can provide personalized interpretations. In addition, we contributed, to the best of our knowledge, the first method to quantitatively compare different interpretation algorithms. Experimental results show that our model not only achieved competitive prediction accuracy, but also higher inter-classifier interpretation consistency than state-of-the-art interpretation methods. Importantly, our interpretation results can improve the generalizability of the prediction models. Overall, this work provides several novel ideas to construct and evaluate interpretable ML models that can be valuable to both the cancer machine learning community and related application domains.
2.
Accuracy Enhancement for Breast Cancer Detection using Classification and Feature Selection. International Journal of Information Retrieval Research 2022. [DOI: 10.4018/ijirr.299931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Chronic diseases such as kidney failure, heart disease, and cancer are the major cause of death worldwide today. For women, the most dangerous such disease is breast cancer, which affects women of every age group, especially the middle-aged. Detecting this disease at an early stage is a challenging task. To predict breast cancer early, classification algorithms with high accuracy and a low error rate are desirable. In this work, we used four classification algorithms (K-NN, J48, logistic regression, and Bayes Net) to build the predictive model, and applied the wrapper method of feature selection to improve the accuracy and reduce the error rate of these classifiers. The study uses the Wisconsin Diagnostic Breast Cancer dataset, which contains 569 instances with 32 attributes and a class attribute that indicates the type of tumor, i.e., benign or malignant.
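The abstract names the wrapper method but not the search strategy it used. As a minimal sketch, assuming greedy forward selection scored by leave-one-out accuracy of a 1-nearest-neighbour classifier, the idea can be illustrated as follows. The function names and the toy data are hypothetical and are not taken from the paper or from the Wisconsin dataset.

```python
def loo_accuracy_1nn(rows, labels, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the sample being classified
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_wrapper_selection(rows, labels, n_features):
    """Greedily add the feature that most improves the classifier's score.

    Stops as soon as no remaining feature improves the score.
    """
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [(loo_accuracy_1nn(rows, labels, selected + [f]), f)
                  for f in remaining]
        score, feature = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
toy_rows = [(1.0, 5.0), (1.2, 1.0), (0.9, 3.0),
            (3.0, 4.8), (3.2, 1.1), (2.9, 3.1)]
toy_labels = ["B", "B", "B", "M", "M", "M"]

if __name__ == "__main__":
    print(forward_wrapper_selection(toy_rows, toy_labels, n_features=2))
```

The defining property of a wrapper method is visible in `forward_wrapper_selection`: candidate feature subsets are judged by the classifier's own estimated accuracy, not by a classifier-independent statistic.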
3.
Kotlyar M, Wong SWH, Pastrello C, Jurisica I. Improving Analysis and Annotation of Microarray Data with Protein Interactions. Methods Mol Biol 2022; 2401:51-68. [PMID: 34902122 DOI: 10.1007/978-1-0716-1839-4_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Gene expression microarrays are one of the most widely used high-throughput technologies in molecular biology, with applications such as identification of disease mechanisms and development of diagnostic and prognostic gene signatures. However, the success of these tasks is often limited because microarray analysis does not account for the complex relationships among genes, their products, and overall signaling and regulatory cascades. Incorporating protein-protein interaction data into microarray analysis can help address these challenges. This chapter reviews how protein-protein interactions can help with microarray analysis, leading to benefits such as better explanations of disease mechanisms, more complete gene annotations, improved prioritization of genes for future experiments, and gene signatures that generalize better to new data.
Affiliation(s)
- Max Kotlyar
  - Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, University Health Network, Toronto, ON, Canada
  - Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto Western Hospital, University Health Network, Toronto, ON, Canada
- Serene W H Wong
  - Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, University Health Network, Toronto, ON, Canada
  - Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto Western Hospital, University Health Network, Toronto, ON, Canada
- Chiara Pastrello
  - Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, University Health Network, Toronto, ON, Canada
  - Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto Western Hospital, University Health Network, Toronto, ON, Canada
- Igor Jurisica
  - Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, University Health Network, Toronto, ON, Canada
  - Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto Western Hospital, University Health Network, Toronto, ON, Canada
  - Departments of Medical Biophysics and Computer Science, University of Toronto, Toronto, ON, Canada
  - Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
4.
Schreiter T, Gieseler RK, Vílchez-Vargas R, Jauregui R, Sowa JP, Klein-Scory S, Broering R, Croner RS, Treckmann JW, Link A, Canbay A. Transcriptome-Wide Analysis of Human Liver Reveals Age-Related Differences in the Expression of Select Functional Gene Clusters and Evidence for a PPP1R10-Governed 'Aging Cascade'. Pharmaceutics 2021; 13:pharmaceutics13122009. [PMID: 34959291 PMCID: PMC8709089 DOI: 10.3390/pharmaceutics13122009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/24/2021] [Revised: 11/17/2021] [Accepted: 11/21/2021] [Indexed: 12/27/2022] Open
Abstract
A transcriptome-wide analysis of human liver for demonstrating differences between young and old humans has not yet been performed. However, identifying major age-related alterations in hepatic gene expression may pinpoint ontogenetic shifts with important hepatic and systemic consequences, provide novel pharmacogenetic information, offer clues to efficiently counteract symptoms of old age, and improve the overarching understanding of individual decline. Next-generation sequencing (NGS) data analyzed by the Mann-Whitney nonparametric test and Ensemble Feature Selection (EFS) bioinformatics identified 44 transcripts among 60,617 total and 19,986 protein-encoding transcripts that significantly (p = 0.0003 to 0.0464) and strikingly (EFS score > 0.3: 16 transcripts; EFS score > 0.2: 28 transcripts) differ between young and old livers. Most of these age-related transcripts were assigned to the categories 'regulome', 'inflammaging', 'regeneration', and 'pharmacogenes'. NGS results were confirmed by quantitative real-time polymerase chain reaction. Our results have important implications for the areas of ontogeny/aging and the age-dependent increase in major liver diseases. Finally, we present a broadly substantiated and testable hypothesis on a genetically governed 'aging cascade', wherein PPP1R10 acts as a putative ontogenetic master regulator, prominently flanked by IGFALS and DUSP1. This transcriptome-wide analysis of human liver offers potential clues towards developing safer and improved therapeutic interventions against major liver diseases and increased insights into key mechanisms underlying aging.
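The Mann-Whitney test named in the abstract compares two groups by ranking their pooled values. As a rough illustration only, the sketch below computes the U statistics with midranks for ties; it does not compute a p-value, it is not the authors' pipeline, and the expression values are hypothetical.

```python
from itertools import chain


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for tied values."""
    # Pool both samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted(chain(((v, 0) for v in x), ((v, 1) for v in y)),
                    key=lambda t: t[0])
    rank_sum_x = 0.0
    i, n = 0, len(pooled)
    while i < n:
        # Find the run of tied values pooled[i:j].
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


if __name__ == "__main__":
    # Hypothetical expression values of one transcript in two age groups.
    young = [1.0, 2.0, 3.0]
    old = [4.0, 5.0, 6.0]
    print(mann_whitney_u(young, old))
```

A U value of 0 for one group means every value in that group lies below every value in the other, which is the most extreme separation the test can observe. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used to obtain the p-value as well.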
Affiliation(s)
- Thomas Schreiter
  - Department of Medicine, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany; (T.S.); (R.K.G.); (J.-P.S.); (S.K.-S.)
  - Laboratory of Immunology & Molecular Biology, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany
- Robert K. Gieseler
  - Department of Medicine, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany; (T.S.); (R.K.G.); (J.-P.S.); (S.K.-S.)
  - Laboratory of Immunology & Molecular Biology, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany
- Ramiro Vílchez-Vargas
  - Department of Gastroenterology, Hepatology, and Infectious Diseases, Medical Faculty, Otto-von-Guericke University, 39120 Magdeburg, Germany; (R.V.-V.); (A.L.)
- Ruy Jauregui
  - Data Science Grasslands, Grasslands Research Centre, AgResearch, Palmerston North 4410, New Zealand
- Jan-Peter Sowa
  - Department of Medicine, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany; (T.S.); (R.K.G.); (J.-P.S.); (S.K.-S.)
  - Laboratory of Immunology & Molecular Biology, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany
- Susanne Klein-Scory
  - Department of Medicine, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany; (T.S.); (R.K.G.); (J.-P.S.); (S.K.-S.)
  - Laboratory of Immunology & Molecular Biology, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany
- Ruth Broering
  - Department of Gastroenterology and Hepatology, University Hospital Essen, University of Duisburg-Essen, 45147 Essen, Germany
- Roland S. Croner
  - Department of General, Visceral, Vascular and Transplantation Surgery, Medical Faculty, Otto-von-Guericke University, 39120 Magdeburg, Germany
- Jürgen W. Treckmann
  - Department of General, Visceral and Transplantation Surgery, University Hospital Essen, University of Duisburg-Essen, 45147 Essen, Germany
- Alexander Link
  - Department of Gastroenterology, Hepatology, and Infectious Diseases, Medical Faculty, Otto-von-Guericke University, 39120 Magdeburg, Germany; (R.V.-V.); (A.L.)
- Ali Canbay
  - Department of Medicine, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany; (T.S.); (R.K.G.); (J.-P.S.); (S.K.-S.)
  - Section of Hepatology and Gastroenterology, University Hospital Knappschaftskrankenhaus Bochum, Ruhr University Bochum, 44892 Bochum, Germany
  - Correspondence: ; Tel.: +49-234-299-3401
5.
Liu M, Li Q, Zhao N. Identification of a prognostic chemoresistance-related gene signature associated with immune microenvironment in breast cancer. Bioengineered 2021; 12:8419-8434. [PMID: 34661511 PMCID: PMC8806919 DOI: 10.1080/21655979.2021.1977768] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022] Open
Abstract
Breast cancer is the most common form of cancer among women globally, and chemoresistance is a major challenge to disease treatment that is associated with a poor prognosis. This study was formulated to identify a reliable prognostic biosignature capable of predicting the survival of patients with chemoresistant breast cancer (CRBC) and evaluating the associated tumor immune microenvironment. Through a series of protein-protein interaction and weighted correlation network analyses, genes that were significantly associated with breast cancer chemoresistance were identified. Moreover, univariate Cox regression and lasso-penalized Cox regression analyses were employed to generate a prognostic model, and the prognostic utility of this model was then assessed using time-dependent receiver operating characteristic (ROC) and Kaplan-Meier survival curves. Finally, the CIBERSORT and ESTIMATE algorithms were leveraged to assess relationships between the tumor immune microenvironment and patient prognostic signatures. Overall, a multigenic prognostic biosignature capable of predicting CRBC patient risk was successfully developed based on bioinformatics analysis and in vitro experiments. This biosignature was able to stratify CRBC patients into high- and low-risk subgroups. ROC curves also revealed that this biosignature achieved high diagnostic efficiency, and multivariate regression analyses indicated that this risk signature was an independent risk factor linked to CRBC patient outcomes. In addition, this signature was associated with the infiltration of the tumor microenvironment by multiple immune cell types. In conclusion, the chemoresistance-associated prognostic gene signature developed herein was able to effectively evaluate the prognosis of CRBC patients and to reflect the overall composition of the tumor immune microenvironment.
Affiliation(s)
- Mingzhou Liu
  - Department of Pharmacy, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, Henan, China
  - Tissue Engineering Laboratory, Henan Eye Institute, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, China
- Qiaoyan Li
  - Department of Pharmacy, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, Henan, China
- Ningmin Zhao
  - Department of Pharmacy, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, Henan, China
6.
Adnan N, Lei C, Ruan J. Robust edge-based biomarker discovery improves prediction of breast cancer metastasis. BMC Bioinformatics 2020; 21:359. [PMID: 32998692 PMCID: PMC7526355 DOI: 10.1186/s12859-020-03692-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/15/2022] Open
Abstract
Background: The abundance of molecular profiling of breast cancer tissues entailed active research on molecular marker-based early diagnosis of metastasis. Recently there is a surging interest in combining gene expression with gene networks such as protein-protein interaction (PPI) network, gene co-expression (CE) network and pathway information to identify robust and accurate biomarkers for metastasis prediction, reflecting the common belief that cancer is a systems biology disease. However, controversy exists in the literature regarding whether network markers are indeed better features than genes alone for predicting as well as understanding metastasis. We believe much of the existing results may have been biased by the overly complicated prediction algorithms, unfair evaluation, and lack of rigorous statistics. In this study, we propose a simple approach to use network edges as features, based on two types of networks respectively, and compare their prediction power using three classification algorithms and rigorous statistical procedure on one of the largest datasets available. To detect biomarkers that are significant for the prediction and to compare the robustness of different feature types, we propose an unbiased and novel procedure to measure feature importance that eliminates the potential bias from factors such as different sample size, number of features, as well as class distribution.
Results: Experimental results reveal that edge-based feature types consistently outperformed gene-based feature type in random forest and logistic regression models under all performance evaluation metrics, while the prediction accuracy of edge-based support vector machine (SVM) model was poorer, due to the larger number of edge features compared to gene features and the lack of feature selection in SVM model. Experimental results also show that edge features are much more robust than gene features and the top biomarkers from edge feature types are statistically more significantly enriched in the biological processes that are well known to be related to breast cancer metastasis.
Conclusions: Overall, this study validates the utility of edge features as biomarkers but also highlights the importance of carefully designed experimental procedures in order to achieve statistically reliable comparison results.
Affiliation(s)
- Nahim Adnan
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, 78249, TX, USA
- Chengwei Lei
  - Department of Computer & Electrical Engineering/Computer Science, California State University, Bakersfield, 9001 Stockdale Highway, Bakersfield, 93311, CA, USA
- Jianhua Ruan
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, 78249, TX, USA
7.
Zhang C, Mathé E, Ning X, Zhao Z, Wang K, Li L, Guo Y. The International Conference on Intelligent Biology and Medicine 2019 (ICIBM 2019): computational methods and applications in medical genomics. BMC Med Genomics 2020; 13:47. [PMID: 32241271 PMCID: PMC7119270 DOI: 10.1186/s12920-020-0678-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022] Open
Abstract
In this editorial, we briefly summarize the International Conference on Intelligent Biology and Medicine 2019 (ICIBM 2019), which was held on June 9-11, 2019 in Columbus, Ohio, USA. We further introduce the 19 research articles included in this supplement issue, covering four major areas, namely computational method development, genomics analysis, network-based analysis and biomarker prediction. The selected papers perform cutting-edge computational research applied to a broad range of human diseases such as cancer, neural degenerative and chronic inflammatory disease. They also propose solutions for fundamental medical genomics problems ranging from basic data processing and quality control to functional interpretation, biomarker and drug prediction, and database releasing.
Affiliation(s)
- Chi Zhang
  - Department of Medical & Molecular Genetics, School of Medicine, Indiana University, Indianapolis, IN 46202 USA
- Ewy Mathé
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210 USA
- Xia Ning
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210 USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Lang Li
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210 USA
- Yan Guo
  - Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87131 USA