1
Mansoor S, Hamid S, Tuan TT, Park JE, Chung YS. Advance computational tools for multiomics data learning. Biotechnol Adv 2024:108447. [PMID: 39251098 DOI: 10.1016/j.biotechadv.2024.108447] [Received: 05/19/2024] [Revised: 09/01/2024] [Accepted: 09/05/2024] [Indexed: 09/11/2024]
Abstract
The burgeoning field of bioinformatics has seen a surge in computational tools tailored for omics data analysis, driven by the heterogeneous and high-dimensional nature of omics data. In biomedical and plant science research, multi-omics data have become pivotal for predictive analytics in the era of big data, necessitating sophisticated computational methodologies. This review explores a diverse array of computational approaches that play crucial roles in processing, normalizing, integrating, and analyzing omics data. Notable methods, such as similarity-based methods, network-based approaches, correlation-based methods, Bayesian methods, fusion-based methods and multivariate techniques, among others, are discussed in detail, each offering unique functionalities to address the complexities of multi-omics data. Furthermore, this review underscores the significance of computational tools in advancing our understanding of data and their transformative impact on research.
Affiliation(s)
- Sheikh Mansoor
- Department of Plant Resources and Environment, Jeju National University, 63243, Republic of Korea
- Saira Hamid
- Watson Crick Centre for Molecular Medicine, Islamic University of Science and Technology, Awantipora, Pulwama, J&K, India
- Thai Thanh Tuan
- Watson Crick Centre for Molecular Medicine, Islamic University of Science and Technology, Awantipora, Pulwama, J&K, India
- Jong Eun Park
- Department of Animal Biotechnology, College of Applied Life Science, Jeju National University, Jeju, Jeju-do, Republic of Korea.
- Yong Suk Chung
- Department of Plant Resources and Environment, Jeju National University, 63243, Republic of Korea.
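The review above lists correlation-based methods among the integration families. As a minimal sketch, assuming two omics blocks measured on the same samples in the same row order (the function name `cross_omics_correlation` and the toy values are hypothetical, not taken from the review), a cross-omics Pearson correlation matrix can be computed as follows:

```python
import numpy as np


def cross_omics_correlation(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pearson correlation between each feature of block x and each feature of block y.

    x: (n_samples, p) matrix, e.g. transcript levels (illustrative only).
    y: (n_samples, q) matrix, e.g. metabolite levels, rows in the same sample order.
    Returns a (p, q) matrix of correlation coefficients.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("both blocks must contain the same samples")
    # Standardize every column (feature) to zero mean and unit population variance.
    xz = (x - x.mean(axis=0)) / x.std(axis=0)
    yz = (y - y.mean(axis=0)) / y.std(axis=0)
    # The mean product of two standardized columns is their Pearson coefficient.
    return xz.T @ yz / x.shape[0]


# Toy usage: 4 samples, 2 "transcripts", 2 "metabolites" (made-up values).
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
y = np.column_stack([2 * x[:, 0] + 1, -x[:, 0]])
r = cross_omics_correlation(x, y)
```

Feature pairs whose absolute correlation exceeds a user-chosen cutoff could then serve as edges of a bipartite network. Constant features cause a division by zero and should be filtered out beforehand.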
2
Novoloaca A, Broc C, Beloeil L, Yu WH, Becker J. Comparative analysis of integrative classification methods for multi-omics data. Brief Bioinform 2024; 25:bbae331. [PMID: 38985929 PMCID: PMC11234228 DOI: 10.1093/bib/bbae331] [Received: 12/19/2023] [Revised: 05/31/2024] [Indexed: 07/12/2024]
Abstract
Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of six methods representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As a non-integrative control, random forest was applied to concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed better than or as well as their non-integrative counterparts. By contrast, DIABLO and the four random forest alternatives outperformed the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail, together with guidelines for future applications.
Affiliation(s)
- Alexei Novoloaca
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Camilo Broc
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Laurent Beloeil
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Wen-Han Yu
- Bill & Melinda Gates Medical Research Institute, Cambridge, Massachusetts, MA 02139, United States
- Jérémie Becker
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
3
Ewald JD, Zhou G, Lu Y, Kolic J, Ellis C, Johnson JD, Macdonald PE, Xia J. Web-based multi-omics integration using the Analyst software suite. Nat Protoc 2024; 19:1467-1497. [PMID: 38355833 DOI: 10.1038/s41596-023-00950-4] [Received: 04/17/2023] [Accepted: 11/21/2023] [Indexed: 02/16/2024]
Abstract
The growing number of multi-omics studies demands clear conceptual workflows coupled with easy-to-use software tools to facilitate data analysis and interpretation. This protocol covers three key components involved in multi-omics analysis, including single-omics data analysis, knowledge-driven integration using biological networks and data-driven integration through joint dimensionality reduction. Using the dataset from a recent multi-omics study of human pancreatic islet tissue and plasma samples, the first section introduces how to perform transcriptomics/proteomics data analysis using ExpressAnalyst and lipidomics data analysis using MetaboAnalyst. On the basis of significant features detected in these workflows, the second section demonstrates how to perform knowledge-driven integration using OmicsNet. The last section illustrates how to perform data-driven integration from the normalized omics data and metadata using OmicsAnalyst. The complete protocol can be executed in ~2 h. Compared with other available options for multi-omics integration, the Analyst software suite described in this protocol enables researchers to perform a wide range of omics data analysis tasks via a user-friendly web interface.
Affiliation(s)
- Jessica D Ewald
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada
- Guangyan Zhou
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada
- Yao Lu
- Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada
- Jelena Kolic
- Life Sciences Institute, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- Cara Ellis
- Department of Pharmacology, University of Alberta, Edmonton, Alberta, Canada
- James D Johnson
- Life Sciences Institute, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- Patrick E Macdonald
- Department of Pharmacology, University of Alberta, Edmonton, Alberta, Canada
- Jianguo Xia
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada.
- Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada.
4
Mardoc E, Sow MD, Déjean S, Salse J. Genomic data integration tutorial, a plant case study. BMC Genomics 2024; 25:66. [PMID: 38233804 PMCID: PMC10792847 DOI: 10.1186/s12864-023-09833-0] [Received: 07/21/2023] [Accepted: 11/22/2023] [Indexed: 01/19/2024]
Abstract
BACKGROUND The ongoing evolution of Next Generation Sequencing (NGS) technologies has led to the production of genomic data on a massive scale. While tools for genomic data integration and analysis are becoming increasingly available, the conceptual and analytical complexities still represent a great challenge in many biological contexts. RESULTS To address this issue, we describe a six-step tutorial on best practices in genomic data integration, consisting of (1) designing a data matrix; (2) formulating a specific biological question toward data description, selection and prediction; (3) selecting a tool adapted to the targeted questions; (4) preprocessing the data; (5) conducting a preliminary analysis; and finally (6) executing genomic data integration. CONCLUSION The tutorial has been tested and demonstrated on publicly available genomic data generated from poplar (Populus L.), a woody plant model. We also developed a new graphical output for the unsupervised multi-block analysis, cimDiablo_v2, which is available at https://forgemia.inra.fr/umr-gdec/omics-integration-on-poplar and allows the selection of master drivers in genomic data variation and interplay.
Affiliation(s)
- Emile Mardoc
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Mamadou Dia Sow
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Sébastien Déjean
- Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, Université Paul Sabatier, Toulouse, France
- Jérôme Salse
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France.
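Steps (1) and (4) of this tutorial, designing a data matrix and preprocessing the data, can be illustrated with a short generic sketch. It assumes each omics block arrives as a mapping from sample ID to a list of feature values; the function names `build_blocks` and `preprocess` are hypothetical, and this is not the tutorial's own code (which, together with cimDiablo_v2, is distributed through the repository linked in the abstract).

```python
import numpy as np


def build_blocks(
    blocks: dict[str, dict[str, list[float]]]
) -> tuple[list[str], dict[str, np.ndarray]]:
    """Step (1): keep only samples measured in every block, in one shared row order."""
    shared = sorted(set.intersection(*(set(b) for b in blocks.values())))
    matrices = {
        name: np.array([b[s] for s in shared], dtype=float) for name, b in blocks.items()
    }
    return shared, matrices


def preprocess(matrix: np.ndarray) -> np.ndarray:
    """Step (4): drop constant variables, then center and scale the rest to unit variance."""
    sd = matrix.std(axis=0, ddof=1)
    keep = sd > 0
    kept = matrix[:, keep]
    return (kept - kept.mean(axis=0)) / sd[keep]


# Toy usage: two blocks that overlap on samples s2 and s3 (made-up values).
blocks = {
    "rna": {"s1": [1.0, 2.0], "s2": [3.0, 2.0], "s3": [5.0, 2.0]},
    "met": {"s2": [10.0], "s3": [20.0], "s4": [30.0]},
}
samples, mats = build_blocks(blocks)
rna_scaled = preprocess(mats["rna"])
```

Unit-variance scaling keeps high-abundance variables from dominating a multi-block analysis; whether to log-transform or filter low-variance features first depends on the data type and is left out of this sketch.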
5
Pasero E, Gaita F, Randazzo V, Meynet P, Cannata S, Maury P, Giustetto C. Artificial Intelligence ECG Analysis in Patients with Short QT Syndrome to Predict Life-Threatening Arrhythmic Events. Sensors (Basel) 2023; 23:8900. [PMID: 37960599 PMCID: PMC10649184 DOI: 10.3390/s23218900] [Received: 07/17/2023] [Revised: 10/23/2023] [Accepted: 10/28/2023] [Indexed: 11/15/2023]
Abstract
Short QT syndrome (SQTS) is an inherited cardiac ion-channel disease associated with an increased risk of sudden cardiac death (SCD) in young and otherwise healthy individuals. SCD is often the first clinical presentation in patients with SQTS. However, arrhythmia risk stratification is presently unsatisfactory in asymptomatic patients. In this context, artificial intelligence-based electrocardiogram (ECG) analysis has never been applied to refine risk stratification in patients with SQTS. The purpose of this study was to analyze ECGs from SQTS patients with the aid of different AI algorithms to evaluate their ability to discriminate between subjects with and without documented life-threatening arrhythmic events. The study group included 104 SQTS patients, 37 of whom had a documented major arrhythmic event at presentation and/or during follow-up. Thirteen ECG features were measured independently by three expert cardiologists; then, the dataset was randomly divided into three subsets (training, validation, and testing). Five shallow neural networks were trained, validated, and tested to predict the subject-specific class (non-event/event) using different subsets of ECG features. Additionally, several deep learning and machine learning algorithms, such as Vision Transformer, Swin Transformer, MobileNetV3, EfficientNetV2, ConvNextTiny, Capsule Networks, and logistic regression, were trained, validated, and tested directly on the scanned ECG images, without any manual feature extraction. Furthermore, a shallow neural network, a 1-D transformer classifier, and a 1-D CNN were trained, validated, and tested on ECG signals extracted from the aforementioned scanned images. Classification performance was evaluated in terms of sensitivity, specificity, positive and negative predictive values, accuracy, and area under the curve. The results show that artificial intelligence can help clinicians better stratify the risk of arrhythmia in patients with SQTS.
In particular, shallow neural networks processing the ECG features showed the best performance in identifying patients who will not suffer a potentially lethal event. This could pave the way for refined ECG-based risk stratification in this group of patients, potentially helping to save the lives of young and otherwise healthy individuals.
Affiliation(s)
- Eros Pasero
- Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Turin, Italy
- Fiorenzo Gaita
- Cardiology Unit, J Medical, 1015 Turin, Italy
- Department of Medical Sciences, University of Turin, 10124 Turin, Italy
- Vincenzo Randazzo
- Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Turin, Italy
- Pierre Meynet
- Department of Medical Sciences, University of Turin, 10124 Turin, Italy
- Division of Cardiology, Città della Salute e della Scienza Hospital, 10126 Turin, Italy
- Sergio Cannata
- Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Turin, Italy
- Philippe Maury
- Department of Cardiology, University Hospital Rangueil, 31400 Toulouse, France
- Carla Giustetto
- Department of Medical Sciences, University of Turin, 10124 Turin, Italy
- Division of Cardiology, Città della Salute e della Scienza Hospital, 10126 Turin, Italy
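The study reports sensitivity, specificity, positive and negative predictive values, and accuracy. The sketch below shows how these follow from a binary confusion matrix in which an arrhythmic event is the positive class; the names `Confusion` and `metrics` and the example counts are hypothetical and not taken from the study. The area under the curve is omitted because it needs continuous classifier scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # event patients predicted as event
    fn: int  # event patients predicted as non-event
    tn: int  # non-event patients predicted as non-event
    fp: int  # non-event patients predicted as event


def metrics(c: Confusion) -> dict[str, float]:
    """Threshold-based classification metrics; NaN when a denominator is zero."""

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),  # recall on event patients
        "specificity": ratio(c.tn, c.tn + c.fp),  # recall on non-event patients
        "ppv": ratio(c.tp, c.tp + c.fp),  # positive predictive value
        "npv": ratio(c.tn, c.tn + c.fn),  # negative predictive value
        "accuracy": ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
    }
```

For example, `metrics(Confusion(tp=8, fn=2, tn=15, fp=5))` gives a sensitivity of 0.8 and a specificity of 0.75. Identifying patients who will not suffer a lethal event, the task where the abstract reports the best performance, corresponds to the negative predictive value, `tn / (tn + fn)`.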
6
Chen Y, Wen Y, Xie C, Chen X, He S, Bo X, Zhang Z. MOCSS: Multi-omics data clustering and cancer subtyping via shared and specific representation learning. iScience 2023; 26:107378. [PMID: 37559907 PMCID: PMC10407241 DOI: 10.1016/j.isci.2023.107378] [Received: 02/17/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 08/11/2023]
Abstract
Cancer is an extremely complex disease, and each type of cancer usually has several subtypes. Multi-omics data can provide more comprehensive biological information for identifying and discovering cancer subtypes. However, existing unsupervised cancer subtyping methods cannot effectively learn both the shared and the specific information in multi-omics data. Therefore, a novel method based on shared and specific representation learning is proposed. For each omics data type, two autoencoders are applied to extract shared and specific information, respectively. To reduce redundancy and mutual interference, an orthogonality constraint is introduced to separate shared from specific information. In addition, contrastive learning is applied to align the shared information and strengthen its consistency. Finally, the shared and specific information obtained for all samples is used for clustering to achieve cancer subtyping. Experimental results demonstrate that the proposed method effectively captures the shared and specific information of multi-omics data and outperforms other state-of-the-art methods on cancer subtyping.
Affiliation(s)
- Yuxin Chen
- School of Informatics, Xiamen University, Xiamen 361005, China
- Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Chenyang Xie
- School of Informatics, Xiamen University, Xiamen 361005, China
- Xinjian Chen
- School of Informatics, Xiamen University, Xiamen 361005, China
- Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen 361005, China
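The MOCSS abstract separates shared and specific information with an orthogonality constraint but does not spell out its form. One common formulation, assumed here and not confirmed as the paper's own loss, penalizes the squared Frobenius norm of the product of the shared and specific embedding matrices computed for the same samples. The function name `orthogonality_penalty` and the toy matrices are hypothetical.

```python
import numpy as np


def orthogonality_penalty(shared: np.ndarray, specific: np.ndarray) -> float:
    """Squared Frobenius norm of shared^T @ specific (an assumed formulation).

    shared, specific: (n_samples, d) embeddings of the same samples for one omics type.
    The penalty is zero exactly when every shared dimension is orthogonal, across
    samples, to every specific dimension.
    """
    return float(np.sum((shared.T @ specific) ** 2))


# Toy embeddings for 4 samples (made-up values).
shared = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
specific = np.array([[1.0], [-1.0], [1.0], [-1.0]])
```

During training, a penalty like this would typically be added, with a tunable weight, to the autoencoders' reconstruction loss and the contrastive alignment term; that weighting scheme is an assumption, not a detail given in the abstract.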
7
Eshghali M, Kannan D, Salmanzadeh-Meydani N, Esmaieeli Sikaroudi AM. Machine learning based integrated scheduling and rescheduling for elective and emergency patients in the operating theatre. Ann Oper Res 2023:1-24. [PMID: 36694896 PMCID: PMC9851122 DOI: 10.1007/s10479-023-05168-x] [Accepted: 01/04/2023] [Indexed: 06/17/2023]
Abstract
Because the operating room (OR) is the largest single source of both revenue and cost in a hospital, OR scheduling is an active research topic. Nonetheless, an integrated model is the missing key to managing ORs and improving their efficiency. This paper presents a fully integrated model built on three concepts: handling elective and emergency patients together, considering both ORs and downstream units, and proposing hierarchical weekly, daily, and rescheduling models. Because emergency patient arrivals are inherently random, a random forest machine learning model and geographical information systems are used to obtain emergency patient surgery duration and arrival time, respectively. In weekly and daily scheduling, fixed capacity is initially reserved for emergency patients according to the machine learning model. When an emergency patient arrives, surgery starts if a reserved OR is available. Otherwise, the first available OR is dedicated to the patient, because emergency patients have higher priority than elective patients. In this case, the operating theatre (OT) schedule must be rescheduled for the remaining patients. Moreover, the three-phase model guarantees that an emergency patient is assigned to an OR within a specific time limit. To solve the models, a genetic algorithm and particle swarm optimization are developed and compared. In addition, a real-world case study is undertaken at a hospital. Comparing the proposed approach with the hospital's current scheduling shows that the three-phase model had a considerable positive effect on the OR schedule.
Affiliation(s)
- Masoud Eshghali
- Department of Systems and Industrial Engineering, University of Arizona, Tucson, AZ 85721 USA
- Devika Kannan
- Centre for Sustainable Supply Chain Engineering, Department of Technology and Innovation, University of Southern Denmark, 5230 Odense M, Denmark
- School of Business, Woxsen University, Sadasivpet, Telangana India
- Navid Salmanzadeh-Meydani
- Centre for Sustainable Supply Chain Engineering, Department of Technology and Innovation, University of Southern Denmark, 5230 Odense M, Denmark
- Department of Industrial Engineering, Amirkabir University of Technology, Tehran, Iran
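The arrival rule described in this abstract (use a reserved OR if one is free; otherwise give the emergency patient the first OR to become available, which may force rescheduling of elective cases) can be sketched as a single assignment decision. The `OperatingRoom` class, its `free_at` field and the `assign_emergency` function are hypothetical; the sketch does not reproduce the paper's weekly and daily models, its time-limit guarantee, or its genetic algorithm and particle swarm optimization solvers.

```python
from dataclasses import dataclass


@dataclass
class OperatingRoom:
    name: str
    reserved_for_emergency: bool
    free_at: float  # hours from the start of the day when the room next becomes free


def assign_emergency(rooms: list[OperatingRoom], arrival: float) -> tuple[str, float, bool]:
    """Return (room name, surgery start time, whether elective cases must be rescheduled)."""
    # First choice: a reserved room that is already free when the patient arrives.
    reserved_now = [r for r in rooms if r.reserved_for_emergency and r.free_at <= arrival]
    if reserved_now:
        return reserved_now[0].name, arrival, False
    # Otherwise take whichever room frees up first; emergencies outrank elective cases.
    room = min(rooms, key=lambda r: r.free_at)
    start = max(arrival, room.free_at)
    # Assumption: every non-reserved room has elective cases booked after free_at,
    # so taking one displaces them and triggers the rescheduling phase.
    return room.name, start, not room.reserved_for_emergency


# Toy usage with made-up availability times.
rooms = [
    OperatingRoom("OR1", False, 3.0),
    OperatingRoom("OR2", True, 1.0),
    OperatingRoom("OR3", False, 2.0),
]
decision = assign_emergency(rooms, arrival=1.5)
```

`assign_emergency` does not update `free_at`; a fuller simulation would advance the chosen room's availability by the predicted surgery duration.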
8
Skin Cancer Metabolic Profile Assessed by Different Analytical Platforms. Int J Mol Sci 2023; 24:ijms24021604. [PMID: 36675128 PMCID: PMC9866771 DOI: 10.3390/ijms24021604] [Received: 12/06/2022] [Revised: 01/03/2023] [Accepted: 01/10/2023] [Indexed: 01/17/2023]
Abstract
Skin cancer, including malignant melanoma (MM) and keratinocyte carcinoma (KC), historically called non-melanoma skin cancer (NMSC), is the most common type of cancer among white-skinned populations. Despite decades of clinical research, the incidence of melanoma is increasing globally. Therefore, a better understanding of disease pathogenesis and resistance mechanisms is considered vital for early diagnosis and satisfactory control. The "omics" field has recently gained attention because it can help identify and explore metabolites and metabolic pathways that assist cancer cells in proliferating, which can in turn be used to improve the diagnosis and treatment of skin cancer. Although skin tissues contain diverse metabolic enzymes, fully characterizing their metabolites remains challenging. Metabolomics is a powerful omics technique that allows us to measure and compare a vast array of metabolites in a biological sample. This technology enables us to study dermal metabolic effects and better explain the pathogenesis of skin diseases. The purpose of this literature review is to illustrate how metabolomics technology can be used to evaluate the metabolic profile of human skin cancer, using a variety of analytical platforms including gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS), and nuclear magnetic resonance (NMR). Data collection has not been based on any analytical method.
9
Chicco D, Bourne PE. Ten simple rules for organizing a special session at a scientific conference. PLoS Comput Biol 2022; 18:e1010395. [PMID: 36006874 PMCID: PMC9409505 DOI: 10.1371/journal.pcbi.1010395] [Indexed: 12/03/2022]
Abstract
Special sessions are important parts of scientific meetings and conferences: they gather researchers and students interested in a specific topic and can contribute strongly to the success of the conference itself. Moreover, they can be a first step for trainees and students toward organizing a scientific event. Organizing a special session, however, can be difficult for beginners and students. Here, we provide ten simple rules for organizing a special session at a scientific conference.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Philip E. Bourne
- School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
10
Fu R, Li Z. An evidence accumulation based block diagonal cluster model for intent recognition from EEG. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103835] [Indexed: 11/02/2022]
11
Progress in and Opportunities for Applying Information Theory to Computational Biology and Bioinformatics. Entropy 2022; 24:e24070925. [PMID: 35885148 PMCID: PMC9323281 DOI: 10.3390/e24070925] [Received: 05/26/2022] [Revised: 06/27/2022] [Accepted: 06/30/2022] [Indexed: 11/25/2022]
12
Lovino M, Montemurro M, Barrese VS, Ficarra E. Identifying the oncogenic potential of gene fusions exploiting miRNAs. J Biomed Inform 2022; 129:104057. [PMID: 35339665 DOI: 10.1016/j.jbi.2022.104057] [Received: 11/30/2021] [Revised: 03/14/2022] [Accepted: 03/15/2022] [Indexed: 12/11/2022]
Abstract
It is estimated that oncogenic gene fusions cause about 20% of human cancer morbidity. Identifying potentially oncogenic gene fusions may improve diagnosis and treatment for affected patients. Previous approaches to this problem exploited specific gene-related information, such as gene function and regulation. Here we propose a model that builds on those findings and also includes microRNAs in the oncogenic assessment. We present ChimerDriver, a tool to classify gene fusions as oncogenic or not oncogenic. ChimerDriver is based on a specifically designed neural network trained on genetic and post-transcriptional information to obtain a reliable classification. The network integrates information related to transcription factors, gene ontologies, microRNAs and other detailed information on the functions of the genes involved in the fusion and on the gene fusion structure. On the test set, ChimerDriver reached an F1-score of 0.83 and a recall of 96%. Comparison with state-of-the-art tools returned comparable or higher results. Moreover, ChimerDriver performed well in a real-world case where 21 out of 24 validated gene fusion samples were detected by the gene fusion detection tool Starfusion. ChimerDriver integrates transcriptional and post-transcriptional information in an ad hoc neural network to effectively discriminate oncogenic gene fusions from passenger ones. ChimerDriver source code is freely available at https://github.com/martalovino/ChimerDriver.
Affiliation(s)
- Marta Lovino
- University of Modena and Reggio Emilia, Via Vivarelli 10/1, 41125 Modena, Italy.
- Venere S Barrese
- Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino, Italy
- Elisa Ficarra
- University of Modena and Reggio Emilia, Via Vivarelli 10/1, 41125 Modena, Italy
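The reported F1-score and recall together imply ChimerDriver's approximate precision on the test set, because F1 is the harmonic mean of precision $P$ and recall $R$. Rearranging, and assuming both reported figures refer to the oncogenic class on the same test set:

```latex
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
P = \frac{F_1 R}{2R - F_1}
  = \frac{0.83 \times 0.96}{2 \times 0.96 - 0.83}
  = \frac{0.7968}{1.09} \approx 0.73
```

Because 0.83 and 96% are rounded, the implied precision of about 0.73 is only approximate; it suggests that the high recall comes with a noticeable share of fusions predicted as oncogenic that are not.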