1
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in the R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
Affiliation(s)
- Olga Fajarda
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal.
- João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
- Sara Duarte-Pereira
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal.
- Raquel M Silva
- Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal.
2
Dandekar T, Kunz M. We Can Think About Ourselves – The Computer Cannot. Bioinformatics 2023. [DOI: 10.1007/978-3-662-65036-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023] Open
3
Li W, Hong T, Liu W, Dong S, Wang H, Tang ZR, Li W, Wang B, Hu Z, Liu Q, Qin Y, Yin C. Development of a Machine Learning-Based Predictive Model for Lung Metastasis in Patients With Ewing Sarcoma. Front Med (Lausanne) 2022; 9:807382. [PMID: 35433754 PMCID: PMC9011057 DOI: 10.3389/fmed.2022.807382] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/02/2021] [Accepted: 03/07/2022] [Indexed: 12/11/2022] Open
Abstract
Background: This study aimed to develop and validate machine learning (ML)-based prediction models for lung metastasis (LM) in patients with Ewing sarcoma (ES), and to deploy the best model as an open access web tool. Methods: We retrospectively analyzed data from the Surveillance Epidemiology and End Results (SEER) Database from 2010 to 2016 and from four medical institutions to develop and validate predictive models for LM in patients with ES. Patient data from the SEER database were used as the training group (n = 929). Using demographic and clinicopathologic variables, six ML-based models for predicting LM were developed and internally validated using 10-fold cross validation. All ML-based models were subsequently externally validated using data from four medical institutions (the validation group, n = 51). The predictive power of the models was evaluated by the area under the receiver operating characteristic curve (AUC). The best-performing model was used to produce an online tool for use by clinicians to identify ES patients at risk of lung metastasis, to improve decision making and optimize individual treatment. Results: The study cohort consisted of 929 patients from the SEER database and 51 patients from multiple medical centers, a total of 980 ES patients. Of these, 175 (18.8%) had lung metastasis. Multivariate logistic regression analysis identified survival time, T-stage, N-stage, surgery, and bone metastasis as independent predictive factors of LM. The AUC values of the six predictive models ranged from 0.585 to 0.705. The Random Forest (RF) model (AUC = 0.705) using 4 variables was identified as the best predictive model of LM in ES patients and was employed to construct an online tool to assist clinicians in optimizing patient treatment (https://share.streamlit.io/liuwencai123/es_lm/main/es_lm.py).
Conclusions: Machine learning models were found to have utility for predicting LM in patients with Ewing sarcoma, and the RF model gave the best performance. The accessibility of the predictive model as a web-based tool offers clear opportunities for improving the personalized treatment of patients with ES.
Affiliation(s)
- Wenle Li
- Department of Orthopedics, Xianyang Central Hospital, Xianyang, China
- Clinical Medical Research Center, Xianyang Central Hospital, Xianyang, China
- Tao Hong
- Department of Cardiac Surgery, Fuwai Hospital Chinese Academy of Medical Sciences, Shenzhen, Shenzhen, China
- Wencai Liu
- Department of Orthopaedic Surgery, the First Affiliated Hospital of Nanchang University, Nanchang, China
- Shengtao Dong
- Department of Spine Surgery, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Haosheng Wang
- Department of Orthopaedics, The Second Hospital of Jilin University, Changchun, China
- Zhi-Ri Tang
- School of Physics and Technology, Wuhan University, Wuhan, China
- Wanying Li
- Clinical Medical Research Center, Xianyang Central Hospital, Xianyang, China
- Bing Wang
- Clinical Medical Research Center, Xianyang Central Hospital, Xianyang, China
- Zhaohui Hu
- Department of Spinal Surgery, Liuzhou People's Hospital, Liuzhou, China
- Qiang Liu
- Department of Orthopedics, Xianyang Central Hospital, Xianyang, China
- Yong Qin
- Department of Orthopedics Surgery, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Chengliang Yin
- Faculty of Medicine, Macau University of Science and Technology, Macau, Macau SAR, China
- *Correspondence: Chengliang Yin
4
Alhebshi H, Tian K, Patnaik L, Taylor R, Bezecny P, Hall C, Muller PAJ, Safari N, Creamer DPM, Demonacos C, Mutti L, Bittar MN, Krstic-Demonacos M. Evaluation of the Role of p53 Tumour Suppressor Posttranslational Modifications and TTC5 Cofactor in Lung Cancer. Int J Mol Sci 2021; 22:ijms222413198. [PMID: 34947995 PMCID: PMC8707832 DOI: 10.3390/ijms222413198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/04/2021] [Revised: 11/29/2021] [Accepted: 12/02/2021] [Indexed: 01/09/2023] Open
Abstract
Mutations in the p53 tumor suppressor are found in over 50% of cancers. p53 function is controlled through posttranslational modifications and cofactor interactions. In this study, we investigated the posttranslationally modified p53, including p53 acetylated at lysine 382 (K382), p53 phosphorylated at serine 46 (S46), and the p53 cofactor TTC5/STRAP (Tetratricopeptide repeat domain 5/Stress-responsive activator of p300-TTC5) proteins in lung cancer. Immunohistochemical (IHC) analysis of lung cancer tissues from 250 patients was carried out and the results were correlated with clinicopathological features. Significant associations between total or modified p53 with a higher grade of the tumour and shorter overall survival (OS) probability were detected, suggesting that mutant and/or modified p53 acts as an oncoprotein in these patients. p53 acetylated at K382 was predominantly nuclear in some samples and cytoplasmic in others. The localization of the K382 acetylated p53 was significantly associated with the gender and grade of the disease. The TTC5 protein levels were significantly associated with the grade, tumor size, and node involvement in a complex manner. SIRT1 expression was evaluated in 50 lung cancer patients and a significant positive correlation was found with p53 S46 intensity, whereas negative TTC5 staining was associated with SIRT1 expression. Furthermore, p53 protein levels showed positive association with poor OS, whereas TTC5 protein levels showed positive association with better OS outcome. Overall, our results indicate that an analysis of p53 modified versions together with TTC5 expression, upon testing on a larger sample size of patients, could serve as useful prognostic factors or drug targets for lung cancer treatment.
Affiliation(s)
- Hasen Alhebshi
- School of Science, Engineering and Environment, University of Salford, Cockcroft Building 305, Manchester M5 4WT, UK
- Kun Tian
- Institute of Biological Anthropology, School of Basical Medical Science, Jinzhou Medical University, Jinzhou 121001, China
- Lipsita Patnaik
- Blackpool Teaching Hospitals NHS Foundation Trust, Blackpool FY3 8NR, UK
- Rebecca Taylor
- Blackpool Teaching Hospitals NHS Foundation Trust, Blackpool FY3 8NR, UK
- Pavel Bezecny
- Blackpool Teaching Hospitals NHS Foundation Trust, Blackpool FY3 8NR, UK
- Callum Hall
- Cancer Research UK Manchester Institute, The University of Manchester, Alderley Park, Manchester SK10 4TG, UK
- Patricia Anthonia Johanna Muller
- Cancer Research UK Manchester Institute, The University of Manchester, Alderley Park, Manchester SK10 4TG, UK
- Nazila Safari
- School of Science, Engineering and Environment, University of Salford, Cockcroft Building 305, Manchester M5 4WT, UK
- Delta Patricia Menendez Creamer
- School of Science, Engineering and Environment, University of Salford, Cockcroft Building 305, Manchester M5 4WT, UK
- Constantinos Demonacos
- Division of Pharmacy and Optometry, Faculty of Biology, Medicine and Health, School of Health Sciences, The University of Manchester, Stopford Building, 3.124 Oxford Road, Manchester M13 9PT, UK
- Luciano Mutti
- Center for Biotechnology, Sbarro Institute for Cancer Research and Molecular Medicine, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Mohamad Nidal Bittar
- Blackpool Teaching Hospitals NHS Foundation Trust, Blackpool FY3 8NR, UK
- Marija Krstic-Demonacos
- School of Science, Engineering and Environment, University of Salford, Cockcroft Building 305, Manchester M5 4WT, UK
- Correspondence:
5
Artificial Intelligence Identifies an Urgent Need for Peripheral Vascular Intervention by Multiplexing Standard Clinical Parameters. Biomedicines 2021; 9:biomedicines9101456. [PMID: 34680572 PMCID: PMC8533252 DOI: 10.3390/biomedicines9101456] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/17/2021] [Revised: 09/30/2021] [Accepted: 10/04/2021] [Indexed: 01/22/2023] Open
Abstract
Background: Peripheral artery disease (PAD) is a significant burden, particularly among patients with severe disease requiring invasive treatment. We applied a general Machine Learning (ML) workflow and investigated whether a multi-dimensional marker set of standard clinical parameters can identify patients in need of vascular intervention without specialized intra-hospital diagnostics. Methods: This is a retrospective study involving patients with stable PAD (sPAD, Fontaine Class I and II, n = 38) and unstable PAD (unPAD, Fontaine Class III and IV, n = 18) in need of invasive therapeutic measures. ML algorithms such as Random Forest were utilized to evaluate a matrix consisting of multiple routinely clinically available parameters (age, complete blood count, inflammation, lipid, iron metabolism). Results: ML enabled the generation of an Artificial Intelligence (AI) PAD score (AI-PAD) that successfully separated sPAD from unPAD patients (high AI-PAD in sPAD, low AI-PAD in unPAD, cutoff at 50 AI-PAD units). Furthermore, the probability score positively coincided with the gold-standard intra-hospital mean ankle-brachial index (ABI). Conclusion: AI-based tools may be promising to enable the correct identification of patients with unstable PAD by using existing clinical information, thus supplementing clinical decision making. Additional studies in larger prospective cohorts are necessary to determine the usefulness of this approach in comparison to standard diagnostic measures.
6
März J, Kurlbaum M, Roche-Lancaster O, Deutschbein T, Peitzsch M, Prehn C, Weismann D, Robledo M, Adamski J, Fassnacht M, Kunz M, Kroiss M. Plasma Metabolome Profiling for the Diagnosis of Catecholamine Producing Tumors. Front Endocrinol (Lausanne) 2021; 12:722656. [PMID: 34557163 PMCID: PMC8453166 DOI: 10.3389/fendo.2021.722656] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/10/2021] [Accepted: 08/09/2021] [Indexed: 12/11/2022] Open
Abstract
CONTEXT: Pheochromocytomas and paragangliomas (PPGL) cause catecholamine excess leading to a characteristic clinical phenotype. Intra-individual changes at metabolome level have been described after surgical PPGL removal. The value of metabolomics for the diagnosis of PPGL has not been studied yet. OBJECTIVE: Evaluation of quantitative metabolomics as a diagnostic tool for PPGL. DESIGN: Targeted metabolomics by liquid chromatography-tandem mass spectrometry of plasma specimens and statistical modeling using ML-based feature selection approaches in a clinically well characterized cohort study. PATIENTS: Prospectively enrolled patients (n=36, 17 female) from the Prospective Monoamine-producing Tumor Study (PMT) with hormonally active PPGL and 36 matched controls in whom PPGL was rigorously excluded. RESULTS: Among 188 measured metabolites, 4 exhibited statistically significant differences between patients with PPGL and controls, but only when the false discovery rate was not considered (histidine p=0.004, threonine p=0.008, lyso PC a C28:0 p=0.044, sum of hexoses p=0.018). Weak but significant correlations of histidine, threonine and lyso PC a C28:0 with total urine catecholamine levels were identified. Only the sum of hexoses (reflecting glucose) showed significant correlations with plasma metanephrines. Using ML-based feature selection approaches, we identified diagnostic signatures which all exhibited low accuracy and sensitivity. The best predictive value (sensitivity 87.5%, accuracy 67.3%) was obtained by using Gradient Boosting Machine Modelling. CONCLUSIONS: The diabetogenic effect of catecholamine excess dominates the plasma metabolome in PPGL patients. While curative surgery for PPGL led to normalization of catecholamine-induced alterations of metabolomics in individual patients, plasma metabolomics are not useful for diagnostic purposes, most likely due to inter-individual variability.
Affiliation(s)
- Juliane März
- Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Max Kurlbaum
- Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Core Unit Clinical Mass Spectrometry, University Hospital, Würzburg, Germany
- *Correspondence: Matthias Kroiss; Max Kurlbaum
- Oisin Roche-Lancaster
- Chair of Medical Informatics, Friedrich-Alexander University (FAU) of Erlangen-Nürnberg, Erlangen, Germany
- Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-Europäische Metropolregion Nürnberg (CCC ER-EMN), Erlangen, Germany
- Timo Deutschbein
- Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Medicover Oldenburg Medizinisches Versorgungszentrum (MVZ), Oldenburg, Germany
- Mirko Peitzsch
- Institute of Clinical Chemistry and Laboratory Medicine, University Hospital Carl Gustav Carus at Technische Universität (TU) Dresden, Dresden, Germany
- Cornelia Prehn
- Metabolomics and Proteomics Core, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Dirk Weismann
- Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Mercedes Robledo
- Hereditary Endocrine Cancer Group, Spanish National Cancer Research Center, Madrid, Spain
- Hereditary Endocrine Cancer Group, Spanish National Cancer Research Center and Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Madrid, Spain
- Jerzy Adamski
- Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Martin Fassnacht
- Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Core Unit Clinical Mass Spectrometry, University Hospital, Würzburg, Germany
- Cancer Center Mainfranken, University of Würzburg, Würzburg, Germany
- Meik Kunz
- Chair of Medical Informatics, Friedrich-Alexander University (FAU) of Erlangen-Nürnberg, Erlangen, Germany
- Fraunhofer Institute of Toxicology and Experimental Medicine, Hannover, Germany
- Matthias Kroiss
- Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Core Unit Clinical Mass Spectrometry, University Hospital, Würzburg, Germany
- Department of Internal Medicine IV, University Hospital Munich, Ludwig-Maximilians-Universität München, Munich, Germany
- *Correspondence: Matthias Kroiss; Max Kurlbaum
7
Machine Learning Applied to Diagnosis of Human Diseases: A Systematic Review. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10155135] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/20/2022]
Abstract
Human healthcare is one of the most important topics for society. It aims to detect disease correctly, effectively and robustly, and as early as possible, so that patients receive appropriate care. Because this detection is often a difficult task, the medical field needs support from other fields such as statistics and computer science. These disciplines face the challenge of exploring new techniques that go beyond the traditional ones. The large number of emerging techniques makes it necessary to provide a comprehensive overview that avoids very particular aspects. To this end, we propose a systematic review of Machine Learning applied to the diagnosis of human diseases. This review focuses on modern techniques related to the development of Machine Learning applied to the diagnosis of human diseases in the medical field, in order to discover interesting patterns, make non-trivial predictions and support decision-making. In this way, this work can help researchers to discover and, if necessary, determine the applicability of machine learning techniques in their particular specialties. We provide some examples of the algorithms used in medicine, analysing some trends that are focused on the goal pursued, the algorithm used, and the area of application. We detail the advantages and disadvantages of each technique to help choose the most appropriate in each real-life situation, as several authors have reported. The authors searched the Scopus, Journal Citation Reports (JCR), Google Scholar, and MedLine databases from the last decades (from approximately the 1980s) up to the present, with English language restrictions, for studies according to the objectives mentioned above. Based on a protocol for data extraction defined and evaluated by all authors using PRISMA methodology, 141 papers were included in this advanced review.
8
Applications of Bioinformatics in Cancer. Cancers (Basel) 2019; 11:cancers11111630. [PMID: 31652939 PMCID: PMC6893424 DOI: 10.3390/cancers11111630] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/22/2019] [Accepted: 10/23/2019] [Indexed: 01/02/2023] Open