1
Cattelani L, Fortino V. Dual-stage optimizer for systematic overestimation adjustment applied to multi-objective genetic algorithms for biomarker selection. Brief Bioinform 2024; 26:bbae674. [PMID: 39737563 DOI: 10.1093/bib/bbae674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 11/27/2024] [Accepted: 12/13/2024] [Indexed: 01/01/2025] Open
Abstract
The selection of biomarker panels in omics data, challenged by numerous molecular features and limited samples, often requires the use of machine learning methods paired with wrapper feature selection techniques, like genetic algorithms. They test various feature sets (potential biomarker solutions) to fine-tune a machine learning model's performance for supervised tasks, such as classifying cancer subtypes. This optimization process is undertaken using validation sets to evaluate and identify the most effective feature combinations. Evaluations have performance estimation error, measurable as the discrepancy between validation and test set performance, and when the selection involves many models the best ones are almost certainly overestimated. This issue is also relevant in a multi-objective feature selection process where various characteristics of the biomarker panels are optimized, such as predictive performance and feature set size. Methods have been proposed to reduce the overestimation after a model has already been selected in single-objective problems, but no algorithm existed capable of reducing the overestimation during the optimization, improving model selection, or applied in the more general multi-objective domain. We propose Dual-stage Optimizer for Systematic overestimation Adjustment in Multi-Objective problems (DOSA-MO), a novel multi-objective optimization wrapper algorithm that learns how the original estimation, its variance, and the feature set size of the solutions predict the overestimation. DOSA-MO adjusts the expectation of the performance during the optimization, improving the composition of the solution set. We verify that DOSA-MO improves the performance of a state-of-the-art genetic algorithm on left-out or external sample sets, when predicting cancer subtypes and/or patient overall survival, using three transcriptomics datasets for kidney and breast cancer.
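The overestimation described in this abstract can be illustrated with a small simulation. The sketch below is not DOSA-MO and is not taken from the paper: it only assumes that every candidate model has the same true score and that validation and test estimates are that score plus independent Gaussian noise. The function name `mean_optimism` and all parameter values are hypothetical.

```python
import random
import statistics

def mean_optimism(n_models, n_trials=500, noise_sd=0.05, true_score=0.7, seed=0):
    """Average (validation - test) gap of the model picked by validation score.

    Assumption for illustration: all candidates share the same true score,
    so any gap is caused purely by selecting on a noisy estimate.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Noisy validation estimates for every candidate model.
        val = [true_score + rng.gauss(0.0, noise_sd) for _ in range(n_models)]
        best = max(range(n_models), key=val.__getitem__)
        # Independent test estimate for the selected candidate only.
        test = true_score + rng.gauss(0.0, noise_sd)
        gaps.append(val[best] - test)
    return statistics.mean(gaps)

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(mean_optimism(n), 3))
```

Under these assumptions the gap is close to zero when a single model is evaluated and grows as more candidates compete, which is the effect an adjustment applied during the optimization would aim to counter.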
Affiliation(s)
- Luca Cattelani
  - School of Medicine, Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1, PO Box 1627, 70211 Kuopio, Finland
- Vittorio Fortino
  - School of Medicine, Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1, PO Box 1627, 70211 Kuopio, Finland
2
Cattelani L, Fortino V. Triple and quadruple optimization for feature selection in cancer biomarker discovery. J Biomed Inform 2024; 159:104736. [PMID: 39395708 DOI: 10.1016/j.jbi.2024.104736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 10/07/2024] [Accepted: 10/09/2024] [Indexed: 10/14/2024]
Abstract
The proliferation of omics data has advanced cancer biomarker discovery but often falls short in external validation, mainly due to a narrow focus on prediction accuracy that neglects clinical utility and validation feasibility. We introduce three- and four-objective optimization strategies based on genetic algorithms to identify clinically actionable biomarkers in omics studies, addressing classification tasks aimed at distinguishing hard-to-differentiate cancer subtypes beyond histological analysis alone. Our hypothesis is that by optimizing more than one characteristic of cancer biomarkers, we may identify biomarkers that will enhance their success in external validation. Our objectives are to: (i) assess the biomarker panel's accuracy using a machine learning (ML) framework; (ii) ensure the biomarkers exhibit significant fold-changes across subtypes, thereby boosting the success rate of PCR or immunohistochemistry validations; (iii) select a concise set of biomarkers to simplify the validation process and reduce clinical costs; and (iv) identify biomarkers crucial for predicting overall survival, which plays a significant role in determining the prognostic value of cancer subtypes. We implemented and applied triple and quadruple optimization algorithms to renal carcinoma gene expression data from TCGA. The study targets kidney cancer subtypes that are difficult to distinguish through histopathology methods. Selected RNA-seq biomarkers were assessed against the gold standard method, which relies solely on clinical information, and in external microarray-based validation datasets. Notably, these biomarkers achieved over 0.8 of accuracy in external validations and added significant value to survival predictions, outperforming the use of clinical data alone with a superior c-index. The provided tool also helps explore the trade-off between objectives, offering multiple solutions for clinical evaluation before proceeding to costly validation or clinical trials.
Affiliation(s)
- L Cattelani
  - Institute of Biomedicine, School of Medicine, University of Eastern Finland, 70210 Kuopio, Finland
- V Fortino
  - Institute of Biomedicine, School of Medicine, University of Eastern Finland, 70210 Kuopio, Finland
3
Cattelani L, Ghosh A, Rintala TJ, Fortino V. A Comprehensive Evaluation Framework for Benchmarking Multi-Objective Feature Selection in Omics-Based Biomarker Discovery. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2432-2446. [PMID: 39401114 DOI: 10.1109/tcbb.2024.3480150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2024]
Abstract
Machine learning algorithms have been extensively used for accurate classification of cancer subtypes driven by gene expression-based biomarkers. However, biomarker models combining multiple gene expression signatures are often not reproducible in external validation datasets and their feature set size is often not optimized, jeopardizing their translatability into cost-effective clinical tools. We investigated how to solve the multi-objective problem of finding the best trade-offs between classification performance and set size applying seven algorithms for machine learning-driven feature subset selection and analyse how they perform in a benchmark with eight large-scale transcriptome datasets of cancer, covering both training and external validation sets. The benchmark includes evaluation metrics assessing the performance of the individual biomarkers and the solution sets, according to their accuracy, diversity, and stability of the composing genes. Moreover, a new evaluation metric for cross-validation studies is proposed that generalizes the hypervolume, which is commonly used to assess the performance of multi-objective optimization algorithms. Biomarkers exhibiting 0.8 of balanced accuracy on the external dataset for breast, kidney and ovarian cancer using respectively 4, 2 and 7 features, were obtained. Genetic algorithms often provided better performance than other considered algorithms, and the recently proposed NSGA2-CH and NSGA2-CHS were the best performing methods in most cases.
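For readers unfamiliar with the hypervolume mentioned in this abstract, a minimal two-objective sketch follows. It shows the standard hypervolume only, not the cross-validation generalization proposed in the paper. It assumes both objectives are maximized and rescaled to be non-negative (for example accuracy and a parsimony score such as one minus the relative set size), with a reference point at the origin; the function name `hypervolume_2d` is hypothetical.

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by `points` and bounded below by `ref`.

    Each point is (f1, f2) with both objectives maximized.
    Points at or below the reference point contribute nothing.
    """
    # Sweep from the largest f1 to the smallest; ties put the larger f2 first.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in ordered:
        if f1 > ref[0] and f2 > best_f2:
            # New horizontal strip not covered by any earlier point.
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area

if __name__ == "__main__":
    front = [(0.9, 0.2), (0.8, 0.5), (0.5, 0.9)]
    print(round(hypervolume_2d(front), 2))
```

A larger value means the solution set covers more of the objective space, and adding a dominated solution leaves the value unchanged.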
4
Vitorino R. Transforming Clinical Research: The Power of High-Throughput Omics Integration. Proteomes 2024; 12:25. [PMID: 39311198 PMCID: PMC11417901 DOI: 10.3390/proteomes12030025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 08/31/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
High-throughput omics technologies have dramatically changed biological research, providing unprecedented insights into the complexity of living systems. This review presents a comprehensive examination of the current landscape of high-throughput omics pipelines, covering key technologies, data integration techniques and their diverse applications. It looks at advances in next-generation sequencing, mass spectrometry and microarray platforms and highlights their contribution to data volume and precision. In addition, this review looks at the critical role of bioinformatics tools and statistical methods in managing the large datasets generated by these technologies. By integrating multi-omics data, researchers can gain a holistic understanding of biological systems, leading to the identification of new biomarkers and therapeutic targets, particularly in complex diseases such as cancer. The review also looks at the integration of omics data into electronic health records (EHRs) and the potential for cloud computing and big data analytics to improve data storage, analysis and sharing. Despite significant advances, there are still challenges such as data complexity, technical limitations and ethical issues. Future directions include the development of more sophisticated computational tools and the application of advanced machine learning techniques, which are critical for addressing the complexity and heterogeneity of omics datasets. This review aims to serve as a valuable resource for researchers and practitioners, highlighting the transformative potential of high-throughput omics technologies in advancing personalized medicine and improving clinical outcomes.
Affiliation(s)
- Rui Vitorino
- iBiMED, Department of Medical Sciences, University of Aveiro, 3810-193 Aveiro, Portugal
- Department of Surgery and Physiology, Cardiovascular R&D Centre—UnIC@RISE, Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
5
Alireza Z, Maleeha M, Kaikkonen M, Fortino V. Enhancing prediction accuracy of coronary artery disease through machine learning-driven genomic variant selection. J Transl Med 2024; 22:356. [PMID: 38627847 PMCID: PMC11020205 DOI: 10.1186/s12967-024-05090-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 03/14/2024] [Indexed: 04/19/2024] Open
Abstract
Machine learning (ML) methods are increasingly becoming crucial in genome-wide association studies for identifying key genetic variants or SNPs that statistical methods might overlook. Statistical methods predominantly identify SNPs with notable effect sizes by conducting association tests on individual genetic variants, one at a time, to determine their relationship with the target phenotype. These genetic variants are then used to create polygenic risk scores (PRSs), estimating an individual's genetic risk for complex diseases like cancer or cardiovascular disorders. Unlike traditional methods, ML algorithms can identify groups of low-risk genetic variants that improve prediction accuracy when combined in a mathematical model. However, the application of ML strategies requires addressing the feature selection challenge to prevent overfitting. Moreover, ensuring the ML model depends on a concise set of genomic variants enhances its clinical applicability, where testing is feasible for only a limited number of SNPs. In this study, we introduce a robust pipeline that applies ML algorithms in combination with feature selection (ML-FS algorithms), aimed at identifying the most significant genomic variants associated with the coronary artery disease (CAD) phenotype. The proposed computational approach was tested on individuals from the UK Biobank, differentiating between CAD and non-CAD individuals within this extensive cohort, and benchmarked against standard PRS-based methodologies like LDpred2 and Lassosum. Our strategy incorporates cross-validation to ensure a more robust evaluation of genomic variant-based prediction models. This method is commonly applied in machine learning strategies but has often been neglected in previous studies assessing the predictive performance of polygenic risk scores. 
Our results demonstrate that the ML-FS algorithm can identify panels with as few as 50 genetic markers that can achieve approximately 80% accuracy when used in combination with known risk factors. The modest increase in accuracy over PRS performances is noteworthy, especially considering that PRS models incorporate a substantially larger number of genetic variants. This extensive variant selection can pose practical challenges in clinical settings. Additionally, the proposed approach revealed novel CAD-genetic variant associations.
Affiliation(s)
- Z Alireza
  - Institute of Biomedicine, University of Eastern Finland, 70210, Kuopio, Finland
- M Maleeha
  - Institute of Biomedicine, University of Eastern Finland, 70210, Kuopio, Finland
- M Kaikkonen
  - A.I. Virtanen Institute, University of Eastern Finland, 70210, Kuopio, Finland
- V Fortino
  - Institute of Biomedicine, University of Eastern Finland, 70210, Kuopio, Finland
6
Park A, Nam S. miRDM-rfGA: Genetic algorithm-based identification of a miRNA set for detecting type 2 diabetes. BMC Med Genomics 2023; 16:195. [PMID: 37608331 PMCID: PMC10463588 DOI: 10.1186/s12920-023-01636-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Accepted: 08/17/2023] [Indexed: 08/24/2023] Open
Abstract
BACKGROUND: Type 2 diabetes mellitus (T2DM) affects approximately 451 million adults globally. In this study, we identified the optimal combination of marker candidates for detecting T2DM using miRNA-Seq data from 95 samples including T2DM and healthy individuals.
METHODS: We utilized the genetic algorithm (GA) in the discovery of an optimal miRNA biomarker set. We discovered miRNA subsets consisting of three miRNAs for detecting T2DM by random forest-based GA (miRDM-rfGA) as a feature selection algorithm and created six GA parameter settings and three settings using traditional feature selection methods (F-test and Lasso). We then evaluated the prediction performance to detect T2DM in the miRNA subsets derived from each setting.
RESULTS: The miRNA subset in setting 5 using miRDM-rfGA performed the best in detecting T2DM (mean AUROC = 0.92). Target mRNA identification and functional enrichment analysis of the best miRNA subset (hsa-miR-125b-5p, hsa-miR-7-5p, and hsa-let-7b-5p) validated that this combination was involved in T2DM. We also confirmed that the targeted genes were negatively correlated with the clinical variables related to T2DM in the BxD mouse genetic reference population database.
CONCLUSIONS: Using GA in miRNA-Seq data, we identified the optimal miRNA biomarker set for T2DM detection. GA can be a useful tool for biomarker discovery and drug-target identification.
Affiliation(s)
- Aron Park
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon, 21999, Korea
- Seungyoon Nam
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon, 21999, Korea
  - Department of Genome Medicine and Science, AI Convergence Center for Medical Science, Gachon University Gil Medical Center, Gachon University College of Medicine, Incheon, 21565, Korea
7
Al-Tashi Q, Saad MB, Muneer A, Qureshi R, Mirjalili S, Sheshadri A, Le X, Vokes NI, Zhang J, Wu J. Machine Learning Models for the Identification of Prognostic and Predictive Cancer Biomarkers: A Systematic Review. Int J Mol Sci 2023; 24:7781. [PMID: 37175487 PMCID: PMC10178491 DOI: 10.3390/ijms24097781] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/28/2023] [Revised: 04/10/2023] [Accepted: 04/19/2023] [Indexed: 05/15/2023] Open
Abstract
The identification of biomarkers plays a crucial role in personalized medicine, both in the clinical and research settings. However, the contrast between predictive and prognostic biomarkers can be challenging due to the overlap between the two. A prognostic biomarker predicts the future outcome of cancer, regardless of treatment, and a predictive biomarker predicts the effectiveness of a therapeutic intervention. Misclassifying a prognostic biomarker as predictive (or vice versa) can have serious financial and personal consequences for patients. To address this issue, various statistical and machine learning approaches have been developed. The aim of this study is to present an in-depth analysis of recent advancements, trends, challenges, and future prospects in biomarker identification. A systematic search was conducted using PubMed to identify relevant studies published between 2017 and 2023. The selected studies were analyzed to better understand the concept of biomarker identification, evaluate machine learning methods, assess the level of research activity, and highlight the application of these methods in cancer research and treatment. Furthermore, existing obstacles and concerns are discussed to identify prospective research areas. We believe that this review will serve as a valuable resource for researchers, providing insights into the methods and approaches used in biomarker discovery and identifying future research opportunities.
Affiliation(s)
- Qasem Al-Tashi
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Maliazurina B. Saad
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Amgad Muneer
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Rizwan Qureshi
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Seyedali Mirjalili
  - Centre for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, QLD 4006, Australia
  - Yonsei Frontier Lab, Yonsei University, Seoul 03722, Republic of Korea
  - University Research and Innovation Center, Obuda University, 1034 Budapest, Hungary
- Ajay Sheshadri
  - Department of Pulmonary Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Xiuning Le
  - Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Natalie I. Vokes
  - Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jianjun Zhang
  - Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jia Wu
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
8
Cattelani L, Fortino V. Identifying gene expression-based biomarkers in online learning environments. Bioinformatics Advances 2022; 2:vbac074. [PMID: 36699355 PMCID: PMC9710669 DOI: 10.1093/bioadv/vbac074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 09/07/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022]
Abstract
Motivation: Gene expression-based classifiers are often developed using historical data by training a model on a small set of patients and a large set of features. Models trained in such a way can be afterwards applied for predicting the output for new unseen patient data. However, very often the accuracy of these models starts to decrease as soon as new data is fed into the trained model. This problem, known as concept drift, complicates the task of learning efficient biomarkers from data and requires special approaches, different from commonly used data mining techniques.
Results: Here, we propose an online ensemble learning method to continually validate and adjust gene expression-based biomarker panels over increasing volume of data. We also propose a computational solution to the problem of feature drift where gene expression signatures used to train the classifier become less relevant over time. A benchmark study was conducted to classify the breast tumors into known subtypes by using a large-scale transcriptomic dataset (∼3500 patients), which was obtained by combining two datasets: SCAN-B and TCGA-BRCA. Remarkably, the proposed strategy improves the classification performances of gold-standard biomarker panels (e.g. PAM50, OncotypeDX and Endopredict) by adding features that are clinically relevant. Moreover, test results show that newly discovered biomarker models can retain a high classification accuracy rate when changing the source generating the gene expression profiles.
Availability and implementation: github.com/UEFBiomedicalInformaticsLab/OnlineLearningBD.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Luca Cattelani
  - Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland
- Vittorio Fortino
  - Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland
9
Cattelani L, Fortino V. Improved NSGA-II algorithms for multi-objective biomarker discovery. Bioinformatics 2022; 38:ii20-ii26. [PMID: 36124794 DOI: 10.1093/bioinformatics/btac463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION: In modern translational research, the development of biomarkers heavily relies on use of omics technologies, but implementations with basic data mining algorithms frequently lead to false positives. Non-dominated Sorting Genetic Algorithm II (NSGA2) is an extremely effective algorithm for biomarker discovery but has been rarely evaluated against large-scale datasets. The exploration of the feature search space is the key to NSGA2 success but in specific cases NSGA2 expresses a shallow exploration of the space of possible feature combinations, possibly leading to models with low predictive performances.
RESULTS: We propose two improved NSGA2 algorithms for finding subsets of biomarkers exhibiting different trade-offs between accuracy and feature number. The performances are investigated on gene expression data of breast cancer patients. The results are compared with NSGA2 and LASSO. The benchmarking dataset includes internal and external validation sets. The results show that the proposed algorithms generate a better approximation of the optimal trade-offs between accuracy and set size. Moreover, validation and test accuracies are better than those provided by NSGA2 and LASSO. Remarkably, the GA-based methods provide biomarkers that achieve a very high prediction accuracy (>80%) with a small number of features (<10), representing a valid alternative to known biomarker models, such as PAM50 and MammaPrint.
AVAILABILITY AND IMPLEMENTATION: The software is publicly available on GitHub at github.com/UEFBiomedicalInformaticsLab/BIODAI/tree/main/MOO.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
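The trade-off between accuracy and feature number that this abstract refers to rests on Pareto dominance, the criterion behind non-dominated sorting. The sketch below is a minimal illustration of extracting the non-dominated set from candidate panels. It is not the authors' improved NSGA2 and omits ranking into successive fronts, crowding distance, and the genetic operators; the function names and example values are hypothetical.

```python
def dominates(a, b):
    """True if panel `a` dominates panel `b`.

    Each panel is (accuracy, n_features): higher accuracy and fewer
    features are better, and `a` must be strictly better in at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(panels):
    """Return the panels that no other panel dominates, in input order."""
    return [p for p in panels if not any(dominates(q, p) for q in panels)]

if __name__ == "__main__":
    candidates = [(0.80, 5), (0.85, 10), (0.78, 7), (0.85, 12), (0.70, 2)]
    print(pareto_front(candidates))
```

In this made-up example the panel (0.78, 7) is dropped because (0.80, 5) is both more accurate and smaller, and (0.85, 12) is dropped because (0.85, 10) matches its accuracy with fewer features.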
Affiliation(s)
- Luca Cattelani
  - School of Medicine, Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Vittorio Fortino
  - School of Medicine, Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
10
Biomarkers of nanomaterials hazard from multi-layer data. Nat Commun 2022; 13:3798. [PMID: 35778420 PMCID: PMC9249793 DOI: 10.1038/s41467-022-31609-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/13/2022] [Accepted: 06/17/2022] [Indexed: 11/09/2022] Open
Abstract
There is an urgent need to apply effective, data-driven approaches to reliably predict engineered nanomaterial (ENM) toxicity. Here we introduce a predictive computational framework based on the molecular and phenotypic effects of a large panel of ENMs across multiple in vitro and in vivo models. Our methodology allows for the grouping of ENMs based on multi-omics approaches combined with robust toxicity tests. Importantly, we identify mRNA-based toxicity markers and extensively replicate them in multiple independent datasets. We find that models based on combinations of omics-derived features and material intrinsic properties display significantly improved predictive accuracy as compared to physicochemical properties alone.
11
Serra A, Saarimäki LA, Pavel A, del Giudice G, Fratello M, Cattelani L, Federico A, Laurino O, Marwah VS, Fortino V, Scala G, Sofia Kinaret PA, Greco D. Nextcast: A software suite to analyse and model toxicogenomics data. Comput Struct Biotechnol J 2022; 20:1413-1426. [PMID: 35386103 PMCID: PMC8956870 DOI: 10.1016/j.csbj.2022.03.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 03/16/2022] [Accepted: 03/16/2022] [Indexed: 11/28/2022] Open
Abstract
The recent advancements in toxicogenomics have led to the availability of large omics data sets, representing the starting point for studying the exposure mechanism of action and identifying candidate biomarkers for toxicity prediction. The current lack of standard methods in data generation and analysis hampers the full exploitation of toxicogenomics-based evidence in regulatory risk assessment. Moreover, the pipelines for the preprocessing and downstream analyses of toxicogenomic data sets can be quite challenging to implement. Over the years, we have developed a number of software packages to address specific questions related to multiple steps of toxicogenomics data analysis and modelling. In this review we present the Nextcast software collection and discuss how its individual tools can be combined into efficient pipelines to answer specific biological questions. Nextcast components are of great support to the scientific community for analysing and interpreting large data sets for the toxicity evaluation of compounds in an unbiased, straightforward, and reliable manner. The Nextcast software suite is available at https://github.com/fhaive/nextcast.
Affiliation(s)
- Angela Serra
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Laura Aliisa Saarimäki
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Alisa Pavel
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Giusy del Giudice
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Michele Fratello
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Luca Cattelani
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Antonio Federico
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Veer Singh Marwah
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
- Vittorio Fortino
  - Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Giovanni Scala
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Department of Biology, University of Naples Federico II, Naples, Italy
- Pia Anneli Sofia Kinaret
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Dario Greco
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
12
Jaddi NS, Saniee Abadeh M. Cell separation algorithm with enhanced search behaviour in miRNA feature selection for cancer diagnosis. Inform Syst 2022. [DOI: 10.1016/j.is.2021.101906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
13
Serra A, Cattelani L, Fratello M, Fortino V, Kinaret PAS, Greco D. Supervised Methods for Biomarker Detection from Microarray Experiments. Methods Mol Biol 2022; 2401:101-120. [PMID: 34902125 DOI: 10.1007/978-1-0716-1839-4_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk. Biomarker detection from microarray data requires several considerations both from the biological and computational points of view. In this chapter, we describe the main methodology used in biomarkers discovery and predictive modeling and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.
Affiliation(s)
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
| | - Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
| | - Michele Fratello
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
| | - Vittorio Fortino
- Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
| | - Pia Anneli Sofia Kinaret
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.
- BioMediTech Institute, Tampere University, Tampere, Finland.
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland.
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
| |
|
14
|
Wisgrill L, Werner P, Fortino V, Fyhrquist N. AIM in Allergy. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_90] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
15
|
Saarimäki LA, Federico A, Lynch I, Papadiamantis AG, Tsoumanis A, Melagraki G, Afantitis A, Serra A, Greco D. Manually curated transcriptomics data collection for toxicogenomic assessment of engineered nanomaterials. Sci Data 2021; 8:49. [PMID: 33558569 PMCID: PMC7870661 DOI: 10.1038/s41597-021-00808-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/23/2020] [Accepted: 12/16/2020] [Indexed: 02/07/2023] Open
Abstract
Toxicogenomics (TGx) approaches are increasingly applied to gain insight into the possible toxicity mechanisms of engineered nanomaterials (ENMs). Omics data can be valuable to elucidate the mechanism of action of chemicals and to develop predictive models in toxicology. While vast amounts of transcriptomics data from ENM exposures have already been accumulated, a unified, easily accessible and reusable collection of transcriptomics data for ENMs is currently lacking. In an attempt to improve the FAIRness of already existing transcriptomics data for ENMs, we curated a collection of homogenized transcriptomics data from human, mouse and rat ENM exposures in vitro and in vivo including the physicochemical characteristics of the ENMs used in each study.
Affiliation(s)
- Laura Aliisa Saarimäki
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
| | - Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
| | - Iseult Lynch
- School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT, Birmingham, United Kingdom
| | - Anastasios G Papadiamantis
- School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT, Birmingham, United Kingdom
- NovaMechanics Ltd, P.O Box 26014 1666, Nicosia, Cyprus
| | - Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.
- BioMediTech Institute, Tampere University, Tampere, Finland.
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
- Finnish Centre for Alternative Methods (FICAM), Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.
| |
|
16
|
Wisgrill L, Werner P, Fortino V, Fyhrquist N. AIM in Allergy. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_90-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
17
|
Machine-learning-driven biomarker discovery for the discrimination between allergic and irritant contact dermatitis. Proc Natl Acad Sci U S A 2020; 117:33474-33485. [PMID: 33318199 PMCID: PMC7776829 DOI: 10.1073/pnas.2009192117] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 01/26/2023] Open
Abstract
Contact dermatitis tremendously impacts the quality of life of suffering patients. Currently, diagnostic regimes rely on allergy testing, exposure specification, and follow-up visits; however, distinguishing the clinical phenotype of irritant and allergic contact dermatitis remains challenging. Employing integrative transcriptomic analysis and machine-learning approaches, we aimed to decipher disease-related signature genes to find suitable sets of biomarkers. A total of 89 positive patch-test reaction biopsies against four contact allergens and two irritants were analyzed via microarray. Coexpression network analysis and Random Forest classification were used to discover potential biomarkers and selected biomarker models were validated in an independent patient group. Differential gene-expression analysis identified major gene-expression changes depending on the stimulus. Random Forest classification identified CD47, BATF, FASLG, RGS16, SYNPO, SELE, PTPN7, WARS, PRC1, EXO1, RRM2, PBK, RAD54L, KIFC1, SPC25, PKMYT, HISTH1A, TPX2, DLGAP5, TPX2, CH25H, and IL37 as potential biomarkers to distinguish allergic and irritant contact dermatitis in human skin. Validation experiments and prediction performances on external testing datasets demonstrated potential applicability of the identified biomarker models in the clinic. Capitalizing on this knowledge, novel diagnostic tools can be developed to guide clinical diagnosis of contact allergies.
|
18
|
Serra A, Fratello M, Cattelani L, Liampa I, Melagraki G, Kohonen P, Nymark P, Federico A, Kinaret PAS, Jagiello K, Ha MK, Choi JS, Sanabria N, Gulumian M, Puzyn T, Yoon TH, Sarimveis H, Grafström R, Afantitis A, Greco D. Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment. NANOMATERIALS (BASEL, SWITZERLAND) 2020; 10:E708. [PMID: 32276469 PMCID: PMC7221955 DOI: 10.3390/nano10040708] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/10/2020] [Revised: 03/25/2020] [Accepted: 03/26/2020] [Indexed: 12/30/2022]
Abstract
Transcriptomics data are relevant to address a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, the TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the publicly available omics datasets are constantly increasing together with a plethora of different methods that are made available to facilitate their analysis, interpretation and the generation of accurate and stable predictive models. In this review, we present the state-of-the-art of data modelling applied to transcriptomics data in TGx. We show how the benchmark dose (BMD) analysis can be applied to TGx data. We review read across and adverse outcome pathways (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.
Affiliation(s)
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland; (A.S.); (M.F.); (L.C.); (A.F.); (P.A.S.K.)
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
| | - Michele Fratello
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland; (A.S.); (M.F.); (L.C.); (A.F.); (P.A.S.K.)
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
| | - Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland; (A.S.); (M.F.); (L.C.); (A.F.); (P.A.S.K.)
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
| | - Irene Liampa
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece; (I.L.); (H.S.)
| | - Georgia Melagraki
- Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus; (G.M.); (A.A.)
| | - Pekka Kohonen
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; (P.K.); (P.N.); (R.G.)
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
| | - Penny Nymark
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; (P.K.); (P.N.); (R.G.)
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
| | - Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland; (A.S.); (M.F.); (L.C.); (A.F.); (P.A.S.K.)
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
| | - Pia Anneli Sofia Kinaret
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland; (A.S.); (M.F.); (L.C.); (A.F.); (P.A.S.K.)
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
| | - Karolina Jagiello
- QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland; (K.J.); (T.P.)
- University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
| | - My Kieu Ha
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea; (M.K.H.); (J.-S.C.); (T.-H.Y.)
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
| | - Jang-Sik Choi
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea; (M.K.H.); (J.-S.C.); (T.-H.Y.)
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
| | - Natasha Sanabria
- National Institute for Occupational Health, Johannesburg 30333, South Africa; (N.S.); (M.G.)
| | - Mary Gulumian
- National Institute for Occupational Health, Johannesburg 30333, South Africa; (N.S.); (M.G.)
- Haematology and Molecular Medicine Department, School of Pathology, University of the Witwatersrand, Johannesburg 2050, South Africa
| | - Tomasz Puzyn
- QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland; (K.J.); (T.P.)
- University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
| | - Tae-Hyun Yoon
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea; (M.K.H.); (J.-S.C.); (T.-H.Y.)
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
| | - Haralambos Sarimveis
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece; (I.L.); (H.S.)
| | - Roland Grafström
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; (P.K.); (P.N.); (R.G.)
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
| | - Antreas Afantitis
- Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus; (G.M.); (A.A.)
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland; (A.S.); (M.F.); (L.C.); (A.F.); (P.A.S.K.)
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
| |
|