1
|
Cardoso Rial R. AI in analytical chemistry: Advancements, challenges, and future directions. Talanta 2024; 274:125949. [PMID: 38569367 DOI: 10.1016/j.talanta.2024.125949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2023] [Revised: 03/09/2024] [Accepted: 03/17/2024] [Indexed: 04/05/2024]
Abstract
This article explores the influence and applications of Artificial Intelligence (AI) in analytical chemistry, highlighting its potential to revolutionize the analysis of complex data sets and the development of innovative analytical methods. Additionally, it discusses the role of AI in interpreting large-scale data and optimizing experimental processes. AI has been fundamental in managing heterogeneous data and in advanced analysis of complex spectra in areas such as spectroscopy and chromatography. The article also examines the historical development of AI in chemistry, its current challenges, including the interpretation of AI models and the integration of large volumes of data. Finally, it forecasts future trends and the potential impact of AI on analytical chemistry, emphasizing the need for ethical and secure approaches in the use of AI.
Collapse
Affiliation(s)
- Rafael Cardoso Rial
- Federal Institute of Mato Grosso do Sul, 79750-000, Nova Andradina, MS, Brazil.
| |
Collapse
|
2
|
Mukherjee A, Abraham S, Singh A, Balaji S, Mukunthan KS. From Data to Cure: A Comprehensive Exploration of Multi-omics Data Analysis for Targeted Therapies. Mol Biotechnol 2024:10.1007/s12033-024-01133-6. [PMID: 38565775 DOI: 10.1007/s12033-024-01133-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
In the dynamic landscape of targeted therapeutics, drug discovery has pivoted towards understanding underlying disease mechanisms, placing a strong emphasis on molecular perturbations and target identification. This paradigm shift, crucial for drug discovery, is underpinned by big data, a transformative force in the current era. Omics data, characterized by its heterogeneity and enormity, has ushered biological and biomedical research into the big data domain. Acknowledging the significance of integrating diverse omics data strata, known as multi-omics studies, researchers delve into the intricate interrelationships among various omics layers. This review navigates the expansive omics landscape, showcasing tailored assays for each molecular layer through genomes to metabolomes. The sheer volume of data generated necessitates sophisticated informatics techniques, with machine-learning (ML) algorithms emerging as robust tools. These datasets not only refine disease classification but also enhance diagnostics and foster the development of targeted therapeutic strategies. Through the integration of high-throughput data, the review focuses on targeting and modeling multiple disease-regulated networks, validating interactions with multiple targets, and enhancing therapeutic potential using network pharmacology approaches. Ultimately, this exploration aims to illuminate the transformative impact of multi-omics in the big data era, shaping the future of biological research.
Collapse
Affiliation(s)
- Arnab Mukherjee
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | - Suzanna Abraham
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | - Akshita Singh
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | - S Balaji
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | - K S Mukunthan
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India.
| |
Collapse
|
3
|
Lac L, Leung CK, Hu P. Computational frameworks integrating deep learning and statistical models in mining multimodal omics data. J Biomed Inform 2024; 152:104629. [PMID: 38552994 DOI: 10.1016/j.jbi.2024.104629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Revised: 02/26/2024] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on the strong assumptions of distribution. Statistical methods such as testing and differential expression are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods have been developed for omics studies that combine statistical models and deep learning algorithms. METHODS AND RESULTS The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review report discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.
Collapse
Affiliation(s)
- Leann Lac
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada
| | - Carson K Leung
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada
| | - Pingzhao Hu
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Biochemistry, Western University, London, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada; Department of Oncology, Western University, London, Ontario, Canada; Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada; The Children's Health Research Institute, Lawson Health Research Institute, London, Ontario, Canada.
| |
Collapse
|
4
|
Larrea-Sebal A, Jebari-Benslaiman S, Galicia-Garcia U, Jose-Urteaga AS, Uribe KB, Benito-Vicente A, Martín C. Predictive Modeling and Structure Analysis of Genetic Variants in Familial Hypercholesterolemia: Implications for Diagnosis and Protein Interaction Studies. Curr Atheroscler Rep 2023; 25:839-859. [PMID: 37847331 PMCID: PMC10618353 DOI: 10.1007/s11883-023-01154-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/15/2023] [Indexed: 10/18/2023]
Abstract
PURPOSE OF REVIEW Familial hypercholesterolemia (FH) is a hereditary condition characterized by elevated levels of low-density lipoprotein cholesterol (LDL-C), which increases the risk of cardiovascular disease if left untreated. This review aims to discuss the role of bioinformatics tools in evaluating the pathogenicity of missense variants associated with FH. Specifically, it highlights the use of predictive models based on protein sequence, structure, evolutionary conservation, and other relevant features in identifying genetic variants within LDLR, APOB, and PCSK9 genes that contribute to FH. RECENT FINDINGS In recent years, various bioinformatics tools have emerged as valuable resources for analyzing missense variants in FH-related genes. Tools such as REVEL, Varity, and CADD use diverse computational approaches to predict the impact of genetic variants on protein function. These tools consider factors such as sequence conservation, structural alterations, and receptor binding to aid in interpreting the pathogenicity of identified missense variants. While these predictive models offer valuable insights, the accuracy of predictions can vary, especially for proteins with unique characteristics that might not be well represented in the databases used for training. This review emphasizes the significance of utilizing bioinformatics tools for assessing the pathogenicity of FH-associated missense variants. Despite their contributions, a definitive diagnosis of a genetic variant necessitates functional validation through in vitro characterization or cascade screening. This step ensures the precise identification of FH-related variants, leading to more accurate diagnoses. Integrating genetic data with reliable bioinformatics predictions and functional validation can enhance our understanding of the genetic basis of FH, enabling improved diagnosis, risk stratification, and personalized treatment for affected individuals. The comprehensive approach outlined in this review promises to advance the management of this inherited disorder, potentially leading to better health outcomes for those affected by FH.
Collapse
Affiliation(s)
- Asier Larrea-Sebal
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Fundación Biofisika Bizkaia, 48940, Leioa, Spain
| | - Shifa Jebari-Benslaiman
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
| | - Unai Galicia-Garcia
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
| | - Ane San Jose-Urteaga
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
| | - Kepa B Uribe
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
| | - Asier Benito-Vicente
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
| | - César Martín
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain.
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain.
| |
Collapse
|
5
|
Lareyre F, Chaudhuri A, Nasr B, Raffort J. Machine Learning and Omics Analysis in Aortic Aneurysm. Angiology 2023:33197231206427. [PMID: 37817423 DOI: 10.1177/00033197231206427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/12/2023]
Abstract
Aortic aneurysm is a life-threatening condition and mechanisms underlying its formation and progression are still incompletely understood. Omics approach has brought new insights to identify a broad spectrum of biomarkers and better understand cellular and molecular pathways involved. Omics generate a large amount of data and several studies have highlighted that artificial intelligence (AI) and techniques such as machine learning (ML)/deep learning (DL) can be of use in analyzing such complex datasets. However, only a few studies have so far reported the use of ML/DL for omics analysis in aortic aneurysms. The aim of this study is to summarize recent advances on the use of ML/DL for omics analysis to decipher aortic aneurysm pathophysiology and develop patient-tailored risk prediction models. In the light of current knowledge, we discuss current limits and highlight future directions in the field.
Collapse
Affiliation(s)
- Fabien Lareyre
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, Nice, France
- Inserm U1065, C3M, Université Côte d'Azur, Nice, France
| | - Arindam Chaudhuri
- Bedfordshire-Milton Keynes Vascular Centre, Bedfordshire Hospitals NHS Foundation Trust, Bedford, UK
| | - Bahaa Nasr
- Department of Vascular and Endovascular Surgery, Brest University Hospital, Brest, France
- INSERM UMR 1101, LaTIM, Brest, France
| | - Juliette Raffort
- Inserm U1065, C3M, Université Côte d'Azur, Nice, France
- Clinical Chemistry Laboratory, University Hospital of Nice, Nice, France
- 3IA Institute, Université Côte d'Azur, Nice, France
| |
Collapse
|
6
|
Pan Z, Zhou S, Liu T, Liu C, Zang M, Wang Q. WVDL: Weighted Voting Deep Learning Model for Predicting RNA-Protein Binding Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3322-3328. [PMID: 37028092 DOI: 10.1109/tcbb.2023.3252276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
RNA-binding proteins are important for the process of cell life activities. High-throughput technique experimental method to discover RNA-protein binding sites is time-consuming and expensive. Deep learning is an effective theory for predicting RNA-protein binding sites. Using weighted voting method to integrate multiple basic classifier models can improve model performance. Thus, in our study, we propose a weighted voting deep learning model (WVDL), which uses weighted voting method to combine convolutional neural network (CNN), long short term memory network (LSTM) and residual network (ResNet). First, the final forecast result of WVDL outperforms the basic classifier models and other ensemble strategies. Second, WVDL can extract more effective features by using weighted voting to find the best weighted combination. And, the CNN model also can draw the predicted motif pictures. Third, WVDL gets a competitive experiment result on public RBP-24 datasets comparing with other state-of-the-art methods. The source code of our proposed WVDL can be found in https://github.com/biomg/WVDL.
Collapse
|
7
|
Chong D, Jones NC, Schittenhelm RB, Anderson A, Casillas-Espinosa PM. Multi-omics Integration and Epilepsy: Towards a Better Understanding of Biological Mechanisms. Prog Neurobiol 2023:102480. [PMID: 37286031 DOI: 10.1016/j.pneurobio.2023.102480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 06/09/2023]
Abstract
The epilepsies are a group of complex neurological disorders characterised by recurrent seizures. Approximately 30% of patients fail to respond to anti-seizure medications, despite the recent introduction of many new drugs. The molecular processes underlying epilepsy development are not well understood and this knowledge gap impedes efforts to identify effective targets and develop novel therapies against epilepsy. Omics studies allow a comprehensive characterisation of a class of molecules. Omics-based biomarkers have led to clinically validated diagnostic and prognostic tests for personalised oncology, and more recently for non-cancer diseases. We believe that, in epilepsy, the full potential of multi-omics research is yet to be realised and we envisage that this review will serve as a guide to researchers planning to undertake omics-based mechanistic studies.
Collapse
Affiliation(s)
- Debbie Chong
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia
| | - Nigel C Jones
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
| | - Ralf B Schittenhelm
- Monash Proteomics & Metabolomics Facility and Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria, 3800, Australia
| | - Alison Anderson
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
| | - Pablo M Casillas-Espinosa
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
| |
Collapse
|
8
|
Tognon M, Giugno R, Pinello L. A survey on algorithms to characterize transcription factor binding sites. Brief Bioinform 2023; 24:bbad156. [PMID: 37099664 PMCID: PMC10422928 DOI: 10.1093/bib/bbad156] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2023] [Revised: 03/27/2023] [Accepted: 04/01/2023] [Indexed: 04/28/2023] Open
Abstract
Transcription factors (TFs) are key regulatory proteins that control the transcriptional rate of cells by binding short DNA sequences called transcription factor binding sites (TFBS) or motifs. Identifying and characterizing TFBS is fundamental to understanding the regulatory mechanisms governing the transcriptional state of cells. During the last decades, several experimental methods have been developed to recover DNA sequences containing TFBS. In parallel, computational methods have been proposed to discover and identify TFBS motifs based on these DNA sequences. This is one of the most widely investigated problems in bioinformatics and is referred to as the motif discovery problem. In this manuscript, we review classical and novel experimental and computational methods developed to discover and characterize TFBS motifs in DNA sequences, highlighting their advantages and drawbacks. We also discuss open challenges and future perspectives that could fill the remaining gaps in the field.
Collapse
Affiliation(s)
- Manuel Tognon
- Computer Science Department, University of Verona, Verona, Italy
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Rosalba Giugno
- Computer Science Department, University of Verona, Verona, Italy
| | - Luca Pinello
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America
| |
Collapse
|
9
|
Joiret M, Leclercq M, Lambrechts G, Rapino F, Close P, Louppe G, Geris L. Cracking the genetic code with neural networks. Front Artif Intell 2023; 6:1128153. [PMID: 37091301 PMCID: PMC10117997 DOI: 10.3389/frai.2023.1128153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Accepted: 03/21/2023] [Indexed: 04/09/2023] Open
Abstract
The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could re-discover, on its own, the mapping links between codons and amino acids and build the complete deciphering dictionary upon presentation of transcripts proteins data training pairs. We compared different Deep Learning neural network architectures and estimated quantitatively the size of the required human transcriptomic training set to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated the effect of a codon embedding layer assessing the semantic similarity between codons on the rate of increase of the training accuracy. We further investigated the benefit of quantifying and using the unbalanced representations of amino acids within real human proteins for a faster deciphering of rare amino acids codons. Deep neural networks require huge amount of data to train them. Deciphering the genetic code by a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons such as the tryptophan codon or the stop codons require a training dataset of the order of 4–22 millions cumulated pairs of codons with their associated amino acids presented to the neural network over around 7–40 training epochs, depending on the architecture and settings. We confirm that the wide generic capacities and modularity of deep neural networks allow them to be customized easily to learn the deciphering task of the genetic code efficiently.
Collapse
Affiliation(s)
- Marc Joiret
- Biomechanics Research Unit, GIGA in Silico Medicine, Liège University, Liège, Belgium
- *Correspondence: Marc Joiret
| | - Marine Leclercq
- Cancer Signaling, GIGA Stem Cells, Liège University, Liège, Belgium
| | - Gaspard Lambrechts
- Department of Electrical Engineering and Computer Science, Artificial Intelligence and Deep Learning, Montefiore Institute, Liège University, Liège, Belgium
| | - Francesca Rapino
- Cancer Signaling, GIGA Stem Cells, Liège University, Liège, Belgium
| | - Pierre Close
- Cancer Signaling, GIGA Stem Cells, Liège University, Liège, Belgium
| | - Gilles Louppe
- Department of Electrical Engineering and Computer Science, Artificial Intelligence and Deep Learning, Montefiore Institute, Liège University, Liège, Belgium
| | - Liesbet Geris
- Biomechanics Research Unit, GIGA in Silico Medicine, Liège University, Liège, Belgium
- Skeletal Biology and Engineering Research Center, KU Leuven, Leuven, Belgium
- Biomechanics Section, KU Leuven, Heverlee, Belgium
| |
Collapse
|
10
|
Dixit S, Kumar A, Srinivasan K. A Current Review of Machine Learning and Deep Learning Models in Oral Cancer Diagnosis: Recent Technologies, Open Challenges, and Future Research Directions. Diagnostics (Basel) 2023; 13:diagnostics13071353. [PMID: 37046571 PMCID: PMC10093759 DOI: 10.3390/diagnostics13071353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2023] [Revised: 03/25/2023] [Accepted: 04/03/2023] [Indexed: 04/08/2023] Open
Abstract
Cancer is a problematic global health issue with an extremely high fatality rate throughout the world. The application of various machine learning techniques that have appeared in the field of cancer diagnosis in recent years has provided meaningful insights into efficient and precise treatment decision-making. Due to rapid advancements in sequencing technologies, the detection of cancer based on gene expression data has improved over the years. Different types of cancer affect different parts of the body in different ways. Cancer that affects the mouth, lip, and upper throat is known as oral cancer, which is the sixth most prevalent form of cancer worldwide. India, Bangladesh, China, the United States, and Pakistan are the top five countries with the highest rates of oral cavity disease and lip cancer. The major causes of oral cancer are excessive use of tobacco and cigarette smoking. Many people’s lives can be saved if oral cancer (OC) can be detected early. Early identification and diagnosis could assist doctors in providing better patient care and effective treatment. OC screening may advance with the implementation of artificial intelligence (AI) techniques. AI can provide assistance to the oncology sector by accurately analyzing a large dataset from several imaging modalities. This review deals with the implementation of AI during the early stages of cancer for the proper detection and treatment of OC. Furthermore, performance evaluations of several DL and ML models have been carried out to show that the DL model can overcome the difficult challenges associated with early cancerous lesions in the mouth. For this review, we have followed the rules recommended for the extension of scoping reviews and meta-analyses (PRISMA-ScR). Examining the reference lists for the chosen articles helped us gather more details on the subject. Additionally, we discussed AI’s drawbacks and its potential use in research on oral cancer. There are methods for reducing risk factors, such as reducing the use of tobacco and alcohol, as well as immunization against HPV infection to avoid oral cancer, or to lessen the burden of the disease. Additionally, officious methods for preventing oral diseases include training programs for doctors and patients as well as facilitating early diagnosis via screening high-risk populations for the disease.
Collapse
Affiliation(s)
- Shriniket Dixit
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India
| | - Anant Kumar
- School of Bioscience and Technology, Vellore Institute of Technology, Vellore 632014, India
| | - Kathiravan Srinivasan
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India
| |
Collapse
|
11
|
Gong P, Cheng L, Zhang Z, Meng A, Li E, Chen J, Zhang L. Multi-omics integration method based on attention deep learning network for biomedical data classification. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 231:107377. [PMID: 36739624 DOI: 10.1016/j.cmpb.2023.107377] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Revised: 01/06/2023] [Accepted: 01/25/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND OBJECTIVE Integrating multi-omics data for the comprehensive analysis of the biological processes in human diseases has become one of the most challenging tasks of bioinformatics. Deep learning (DL) algorithms have recently become one of the most promising multi-omics data integration analysis methods. However, existing DL-based studies almost integrate the multi-omics data by concatenation in the input data space or the learned feature space, ignoring the correlations between patients and omics. METHODS We propose a novel multi-omics integration method, called Multi-omics Attention Deep Learning Network (MOADLN), which is used for biomedical data classification. Firstly, for each type of omics data, we use three fully-connected layers and the self-attention mechanism to reduce dimensionality, and construct the correlations between patients, respectively. Then, we apply the feature vector learned from self-attention to generate the initial category labels. Secondly, for the initial label predicted of each omics data, we use an effective Multi-Omics Correlation Discovery Network (MOCDN) to learn the cross-omic correlations in the label space. Finally, we use the softmax classifier for label prediction. RESULTS We demonstrate that our method outperforms several state-of-the-art methods on two datasets with mRNA expression data, DNA methylation data, and miRNA expression data. In addition, we identified essential biomarkers of relevant diseases by MOADLN, and the generality of MOADLN is also demonstrated in the KIRP and KIRC datasets. CONCLUSIONS MOADLN jointly explores correlations between patients in intra-omics and correlations of cross-omics in label space, which is an effective DL-based classification of biomedical data.
Collapse
Affiliation(s)
- Ping Gong
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China.
| | - Lei Cheng
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China
| | - Zhiyuan Zhang
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China
| | - Ao Meng
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China
| | - Enshuo Li
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, CN, China
| | - Jie Chen
- Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, CN, China
| | - Longzhen Zhang
- Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, CN, China
| |
Collapse
|
12
|
Schumann Y, Neumann JE, Neumann P. Robust classification using average correlations as features (ACF). BMC Bioinformatics 2023; 24:101. [PMID: 36941542 PMCID: PMC10026437 DOI: 10.1186/s12859-023-05224-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 03/09/2023] [Indexed: 03/23/2023] Open
Abstract
MOTIVATION In single-cell transcriptomics and other omics technologies, large fractions of missing values commonly occur. Researchers often either consider only those features that were measured for each instance of their dataset, thereby accepting severe loss of information, or use imputation which can lead to erroneous results. Pairwise metrics allow for imputation-free classification with minimal loss of data. RESULTS Using pairwise correlations as metric, state-of-the-art approaches to classification would include the K-nearest-neighbor- (KNN) and distribution-based-classification-classifier. Our novel method, termed average correlations as features (ACF), significantly outperforms those approaches by training tunable machine learning models on inter-class and intra-class correlations. Our approach is characterized in simulation studies and its classification performance is demonstrated on real-world datasets from single-cell RNA sequencing and bottom-up proteomics. Furthermore, we demonstrate that variants of our method offer superior flexibility and performance over KNN classifiers and can be used in conjunction with other machine learning methods. In summary, ACF is a flexible method that enables missing value tolerant classification with minimal loss of data.
Collapse
Affiliation(s)
- Yannis Schumann
- Chair for High Performance Computing, Helmut-Schmidt University, Hamburg, Germany.
| | - Julia E Neumann
- Center for Molecular Neurobiology Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Institute of Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Philipp Neumann
- Chair for High Performance Computing, Helmut-Schmidt University, Hamburg, Germany
| |
Collapse
|
13
|
Mohammed MA, Abdulkareem KH, Dinar AM, Zapirain BG. Rise of Deep Learning Clinical Applications and Challenges in Omics Data: A Systematic Review. Diagnostics (Basel) 2023; 13:diagnostics13040664. [PMID: 36832152 PMCID: PMC9955380 DOI: 10.3390/diagnostics13040664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2022] [Revised: 02/05/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023] Open
Abstract
This research aims to review and evaluate the most relevant scientific studies about deep learning (DL) models in the omics field. It also aims to realize the potential of DL techniques in omics data analysis fully by demonstrating this potential and identifying the key challenges that must be addressed. Numerous elements are essential for comprehending numerous studies by surveying the existing literature. For example, the clinical applications and datasets from the literature are essential elements. The published literature highlights the difficulties encountered by other researchers. In addition to looking for other studies, such as guidelines, comparative studies, and review papers, a systematic approach is used to search all relevant publications on omics and DL using different keyword variants. From 2018 to 2022, the search procedure was conducted on four Internet search engines: IEEE Xplore, Web of Science, ScienceDirect, and PubMed. These indexes were chosen because they offer enough coverage and linkages to numerous papers in the biological field. A total of 65 articles were added to the final list. The inclusion and exclusion criteria were specified. Of the 65 publications, 42 are clinical applications of DL in omics data. Furthermore, 16 out of 65 articles comprised the review publications based on single- and multi-omics data from the proposed taxonomy. Finally, only a small number of articles (7/65) were included in papers focusing on comparative analysis and guidelines. The use of DL in studying omics data presented several obstacles related to DL itself, preprocessing procedures, datasets, model validation, and testbed applications. Numerous relevant investigations were performed to address these issues. Unlike other review papers, our study distinctly reflects different observations on omics with DL model areas. We believe that the result of this study can be a useful guideline for practitioners who look for a comprehensive view of the role of DL in omics data analysis.
Collapse
Affiliation(s)
- Mazin Abed Mohammed
- College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq
- eVIDA Lab, University of Deusto, 48007 Bilbao, Spain
- Correspondence: (M.A.M.); (B.G.Z.)
| | - Karrar Hameed Abdulkareem
- College of Agriculture, Al-Muthanna University, Samawah 66001, Iraq
- College of Engineering, University of Warith Al-Anbiyaa, Karbala 56001, Iraq
| | - Ahmed M. Dinar
- Computer Engineering Department, University of Technology- Iraq, Baghdad 19006, Iraq
| | | |
Collapse
|
14
|
Deep learning in regulatory genomics: from identification to design. Curr Opin Biotechnol 2023; 79:102887. [PMID: 36640453 DOI: 10.1016/j.copbio.2022.102887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2022] [Revised: 11/12/2022] [Accepted: 12/14/2022] [Indexed: 01/14/2023]
Abstract
Genomics and deep learning are a natural match since both are data-driven fields. Regulatory genomics refers to functional noncoding DNA regulating gene expression. In recent years, deep learning applications on regulatory genomics have achieved remarkable advances so-much-so that it has revolutionized the rules of the game of the computational methods in this field. Here, we review two emerging trends: (i) the modeling of very long input sequence (up to 200 kb), which requires self-matched modularization of model architecture; (ii) on the balance of model predictability and model interpretability because the latter is more able to meet biological demands. Finally, we discuss how to employ these two routes to design synthetic regulatory DNA, as a promising strategy for optimizing crop agronomic properties.
Collapse
|
15
|
Cao J, Xiao Y, Zhang M, Huang L, Wang Y, Liu W, Wang X, Wu J, Huang Y, Wang R, Zhou L, Li L, Zhang Y, Ren L, Qian K, Wang J. Deep Learning of Dual Plasma Fingerprints for High-Performance Infection Classification. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2023; 19:e2206349. [PMID: 36470664 DOI: 10.1002/smll.202206349] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/15/2022] [Revised: 11/17/2022] [Indexed: 06/17/2023]
Abstract
Infection classification is the key for choosing the proper treatment plans. Early determination of the causative agents is critical for disease control. Host responses analysis can detect variform and sensitive host inflammatory responses to ascertain the presence and type of the infection. However, traditional host-derived inflammatory indicators are insufficient for clinical infection classification. Fingerprints-based omic analysis has attracted increasing attention globally for analyzing the complex host systemic immune response. A single type of fingerprints is not applicable for infection classification (area under curve (AUC) of 0.550-0.617). Herein, an infection classification platform based on deep learning of dual plasma fingerprints (DPFs-DL) is developed. The DPFs with high reproducibility (coefficient of variation <15%) are obtained at low sample consumption (550 nL native plasma) using inorganic nanoparticle and organic matrix assisted laser desorption/ionization mass spectrometry. A classifier (DPFs-DL) for viral versus bacterial infection discrimination (AUC of 0.775) and coronavirus disease 2019 (COVID-2019) diagnosis (AUC of 0.917) is also built. Furthermore, a metabolic biomarker panel of two differentially regulated metabolites, which may serve as potential biomarkers for COVID-19 management (AUC of 0.677-0.883), is constructed. This study will contribute to the development of precision clinical care for infectious diseases.
Collapse
Affiliation(s)
- Jing Cao
- State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- Department of Biomedical Engineering, Faculty of Engineering, National University of Singapore, Singapore, 117583, Singapore
| | - Yan Xiao
- NHC Key Laboratory of Systems Biology of Pathogens and Christophe Merieux Laboratory, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, P. R. China
- Key Laboratory of Respiratory Disease Pathogenomics, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, P. R. China
| | - Mengji Zhang
- State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
| | - Lin Huang
- Country Department of Clinical Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
| | - Ying Wang
- NHC Key Laboratory of Systems Biology of Pathogens and Christophe Merieux Laboratory, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, P. R. China
| | - Wanshan Liu
- State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
| | - Xinming Wang
- NHC Key Laboratory of Systems Biology of Pathogens and Christophe Merieux Laboratory, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, P. R. China
| | - Jiao Wu
- State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
| | - Yida Huang
- State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
| | - Ruimin Wang
- State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
| | - Li Zhou
- Beijing health biotech co. Ltd, Beijing, 100193, P. R. China
| | - Lin Li
- Beijing health biotech co. Ltd, Beijing, 100193, P. R. China
| | - Yong Zhang
- Department of Biomedical Engineering, Faculty of Engineering, National University of Singapore, Singapore, 117583, Singapore
| | - Lili Ren
- NHC Key Laboratory of Systems Biology of Pathogens and Christophe Merieux Laboratory, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, P. R. China
- Key Laboratory of Respiratory Disease Pathogenomics, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, P. R. China
| | - Kun Qian
- State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
| | - Jianwei Wang
- NHC Key Laboratory of Systems Biology of Pathogens and Christophe Merieux Laboratory, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, P. R. China
- Key Laboratory of Respiratory Disease Pathogenomics, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, P. R. China
| |
Collapse
|
16
|
Han K, Wang J, Wang Y, Zhang L, Yu M, Xie F, Zheng D, Xu Y, Ding Y, Wan J. A review of methods for predicting DNA N6-methyladenine sites. Brief Bioinform 2023; 24:6887111. [PMID: 36502371 DOI: 10.1093/bib/bbac514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2022] [Revised: 10/07/2022] [Accepted: 10/27/2022] [Indexed: 12/14/2022] Open
Abstract
Deoxyribonucleic acid(DNA) N6-methyladenine plays a vital role in various biological processes, and the accurate identification of its site can provide a more comprehensive understanding of its biological effects. There are several methods for 6mA site prediction. With the continuous development of technology, traditional techniques with the high costs and low efficiencies are gradually being replaced by computer methods. Computer methods that are widely used can be divided into two categories: traditional machine learning and deep learning methods. We first list some existing experimental methods for predicting the 6mA site, then analyze the general process from sequence input to results in computer methods and review existing model architectures. Finally, the results were summarized and compared to facilitate subsequent researchers in choosing the most suitable method for their work.
Collapse
Affiliation(s)
- Ke Han
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China.,College of Pharmacy, Harbin University of Commerce, Harbin, 150076, China
| | - Jianchun Wang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Yu Wang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Lei Zhang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Mengyao Yu
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Fang Xie
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Dequan Zheng
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Yaoqun Xu
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Jie Wan
- Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin, 150001, China
| |
Collapse
|
17
|
Zhu Y, Wang M, Yin X, Zhang J, Meijering E, Hu J. Deep Learning in Diverse Intelligent Sensor Based Systems. SENSORS (BASEL, SWITZERLAND) 2022; 23:s23010062. [PMID: 36616657 PMCID: PMC9823653 DOI: 10.3390/s23010062] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/02/2022] [Revised: 12/06/2022] [Accepted: 12/14/2022] [Indexed: 05/27/2023]
Abstract
Deep learning has become a predominant method for solving data analysis problems in virtually all fields of science and engineering. The increasing complexity and the large volume of data collected by diverse sensor systems have spurred the development of deep learning methods and have fundamentally transformed the way the data are acquired, processed, analyzed, and interpreted. With the rapid development of deep learning technology and its ever-increasing range of successful applications across diverse sensor systems, there is an urgent need to provide a comprehensive investigation of deep learning in this domain from a holistic view. This survey paper aims to contribute to this by systematically investigating deep learning models/methods and their applications across diverse sensor systems. It also provides a comprehensive summary of deep learning implementation tips and links to tutorials, open-source codes, and pretrained models, which can serve as an excellent self-contained reference for deep learning practitioners and those seeking to innovate deep learning in this space. In addition, this paper provides insights into research topics in diverse sensor systems where deep learning has not yet been well-developed, and highlights challenges and future opportunities. This survey serves as a catalyst to accelerate the application and transformation of deep learning in diverse sensor systems.
Collapse
Affiliation(s)
- Yanming Zhu
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
| | - Min Wang
- School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
| | - Xuefei Yin
- School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
| | - Jue Zhang
- School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
| | - Erik Meijering
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
| | - Jiankun Hu
- School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
| |
Collapse
|
18
|
Small RNA Targets: Advances in Prediction Tools and High-Throughput Profiling. BIOLOGY 2022; 11:biology11121798. [PMID: 36552307 PMCID: PMC9775672 DOI: 10.3390/biology11121798] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/14/2022] [Revised: 11/27/2022] [Accepted: 12/08/2022] [Indexed: 12/14/2022]
Abstract
MicroRNAs (miRNAs) are an abundant class of small non-coding RNAs that regulate gene expression at the post-transcriptional level. They are suggested to be involved in most biological processes of the cell primarily by targeting messenger RNAs (mRNAs) for cleavage or translational repression. Their binding to their target sites is mediated by the Argonaute (AGO) family of proteins. Thus, miRNA target prediction is pivotal for research and clinical applications. Moreover, transfer-RNA-derived fragments (tRFs) and other types of small RNAs have been found to be potent regulators of Ago-mediated gene expression. Their role in mRNA regulation is still to be fully elucidated, and advancements in the computational prediction of their targets are in their infancy. To shed light on these complex RNA-RNA interactions, the availability of good quality high-throughput data and reliable computational methods is of utmost importance. Even though the arsenal of computational approaches in the field has been enriched in the last decade, there is still a degree of discrepancy between the results they yield. This review offers an overview of the relevant advancements in the field of bioinformatics and machine learning and summarizes the key strategies utilized for small RNA target prediction. Furthermore, we report the recent development of high-throughput sequencing technologies, and explore the role of non-miRNA AGO driver sequences.
Collapse
|
19
|
Han S, Yang X, Sun H, Yang H, Zhang Q, Peng C, Fang W, Li Y. LION: an integrated R package for effective prediction of ncRNA-protein interaction. Brief Bioinform 2022; 23:6713512. [PMID: 36155620 DOI: 10.1093/bib/bbac420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 08/03/2022] [Accepted: 08/30/2022] [Indexed: 12/14/2022] Open
Abstract
Understanding ncRNA-protein interaction is of critical importance to unveil ncRNAs' functions. Here, we propose an integrated package LION which comprises a new method for predicting ncRNA/lncRNA-protein interaction as well as a comprehensive strategy to meet the requirement of customisable prediction. Experimental results demonstrate that our method outperforms its competitors on multiple benchmark datasets. LION can also improve the performance of some widely used tools and build adaptable models for species- and tissue-specific prediction. We expect that LION will be a powerful and efficient tool for the prediction and analysis of ncRNA/lncRNA-protein interaction. The R Package LION is available on GitHub at https://github.com/HAN-Siyu/LION/.
Collapse
Affiliation(s)
- Siyu Han
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, in Jilin University, China
| | - Xiao Yang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Hang Sun
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Hu Yang
- 964 Hospital of Joint Logistic Support Force of the Chinese People's Liberation Army
| | - Qi Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Cheng Peng
- School of Software, Tsinghua University, Beijing, China
| | - Wensi Fang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| |
Collapse
|
20
|
Sidak D, Schwarzerová J, Weckwerth W, Waldherr S. Interpretable machine learning methods for predictions in systems biology from omics data. Front Mol Biosci 2022; 9:926623. [PMID: 36387282 PMCID: PMC9650551 DOI: 10.3389/fmolb.2022.926623] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open
Abstract
Machine learning has become a powerful tool for systems biologists, from diagnosing cancer to optimizing kinetic models and predicting the state, growth dynamics, or type of a cell. Potential predictions from complex biological data sets obtained by “omics” experiments seem endless, but are often not the main objective of biological research. Often we want to understand the molecular mechanisms of a disease to develop new therapies, or we need to justify a crucial decision that is derived from a prediction. In order to gain such knowledge from data, machine learning models need to be extended. A recent trend to achieve this is to design “interpretable” models. However, the notions around interpretability are sometimes ambiguous, and a universal recipe for building well-interpretable models is missing. With this work, we want to familiarize systems biologists with the concept of model interpretability in machine learning. We consider data sets, data preparation, machine learning methods, and software tools relevant to omics research in systems biology. Finally, we try to answer the question: “What is interpretability?” We introduce views from the interpretable machine learning community and propose a scheme for categorizing studies on omics data. We then apply these tools to review and categorize recent studies where predictive machine learning models have been constructed from non-sequential omics data.
Collapse
Affiliation(s)
- David Sidak
- Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
| | - Jana Schwarzerová
- Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czech Republic
| | - Wolfram Weckwerth
- Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- Vienna Metabolomics Center (VIME), Faculty of Life Sciences, University of Vienna, Vienna, Austria
| | - Steffen Waldherr
- Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- *Correspondence: Steffen Waldherr,
| |
Collapse
|
21
|
Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:ijms232012272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022] Open
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and are still expanding as the cost of next-generation sequencing technologies is decreasing. Deep learning (DL) is a viable method to exploit this massive data stream since it has advanced quickly with there being successive innovations. However, an obstacle to scientific progress emerges: the difficulty of applying DL to biology, and this because both fields are evolving at a breakneck pace, thus making it hard for an individual to occupy the front lines of both of them. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Alternatively, this study could be also useful to researchers in biology to understand and utilize the power of DL to gain better insights into and extract important information from the omics data.
Collapse
|
22
|
Gonçalves E, Poulos RC, Cai Z, Barthorpe S, Manda SS, Lucas N, Beck A, Bucio-Noble D, Dausmann M, Hall C, Hecker M, Koh J, Lightfoot H, Mahboob S, Mali I, Morris J, Richardson L, Seneviratne AJ, Shepherd R, Sykes E, Thomas F, Valentini S, Williams SG, Wu Y, Xavier D, MacKenzie KL, Hains PG, Tully B, Robinson PJ, Zhong Q, Garnett MJ, Reddel RR. Pan-cancer proteomic map of 949 human cell lines. Cancer Cell 2022; 40:835-849.e8. [PMID: 35839778 PMCID: PMC9387775 DOI: 10.1016/j.ccell.2022.06.010] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/14/2021] [Revised: 03/29/2022] [Accepted: 06/21/2022] [Indexed: 12/12/2022]
Abstract
The proteome provides unique insights into disease biology beyond the genome and transcriptome. A lack of large proteomic datasets has restricted the identification of new cancer biomarkers. Here, proteomes of 949 cancer cell lines across 28 tissue types are analyzed by mass spectrometry. Deploying a workflow to quantify 8,498 proteins, these data capture evidence of cell-type and post-transcriptional modifications. Integrating multi-omics, drug response, and CRISPR-Cas9 gene essentiality screens with a deep learning-based pipeline reveals thousands of protein biomarkers of cancer vulnerabilities that are not significant at the transcript level. The power of the proteome to predict drug response is very similar to that of the transcriptome. Further, random downsampling to only 1,500 proteins has limited impact on predictive power, consistent with protein networks being highly connected and co-regulated. This pan-cancer proteomic map (ProCan-DepMapSanger) is a comprehensive resource available at https://cellmodelpassports.sanger.ac.uk.
Collapse
Affiliation(s)
- Emanuel Gonçalves
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK; Instituto Superior Técnico (IST), Universidade de Lisboa, 1049-001 Lisboa, Portugal; INESC-ID, 1000-029 Lisboa, Portugal
| | - Rebecca C Poulos
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Zhaoxiang Cai
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Syd Barthorpe
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Srikanth S Manda
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Natasha Lucas
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Alexandra Beck
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Daniel Bucio-Noble
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Michael Dausmann
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Caitlin Hall
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Michael Hecker
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Jennifer Koh
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Howard Lightfoot
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Sadia Mahboob
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Iman Mali
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - James Morris
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Laura Richardson
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Akila J Seneviratne
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Rebecca Shepherd
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Erin Sykes
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Frances Thomas
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Sara Valentini
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Steven G Williams
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Yangxiu Wu
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Dylan Xavier
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Karen L MacKenzie
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Peter G Hains
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Brett Tully
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
| | - Phillip J Robinson
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia.
| | - Qing Zhong
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia.
| | - Mathew J Garnett
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.
| | - Roger R Reddel
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia.
| |
Collapse
|
23
|
Assessing comparative importance of DNA sequence and epigenetic modifications on gene expression using a deep convolutional neural network. Comput Struct Biotechnol J 2022; 20:3814-3823. [PMID: 35891778 PMCID: PMC9307602 DOI: 10.1016/j.csbj.2022.07.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 07/05/2022] [Accepted: 07/05/2022] [Indexed: 11/26/2022] Open
Abstract
Gene expression is regulated at both transcriptional and post-transcriptional levels. DNA sequence and epigenetic modifications are key factors which regulate gene transcription. Understanding their complex interactions and their respective contributions to gene expression regulation remains a challenge in biological studies. We have developed iSEGnet, a framework of deep convolutional neural network to predict mRNA abundance using the information on DNA sequences as well as epigenetic modifications within genes and their cis-regulatory regions. We demonstrate that our framework outperforms other machine learning models in terms of predicting mRNA abundance using transcriptional and epigenetic profiles from six distinct cell lines/types chosen from the ENCODE. The analysis from the learned models also reveals that specific regions around promotors and transcription termination sites are most important for gene expression regulation. Using the method of Integrated Gradients, we identify narrow segments in these regions which are most likely to impact gene expression for a specific epigenetic modification. We further show that these identified segments are enriched in known active regulatory regions by comparing the transcription factor binding sites obtained via ChIP-seq. Moreover, we demonstrate how iSEGnet can uncover potential transcription factors that have regulatory functions in cancer using two cancer multi-omics data.
Collapse
|
24
|
Niu J, Yang J, Guo Y, Qian K, Wang Q. Joint deep learning for batch effect removal and classification toward MALDI MS based metabolomics. BMC Bioinformatics 2022; 23:270. [PMID: 35818047 PMCID: PMC9275160 DOI: 10.1186/s12859-022-04758-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2021] [Accepted: 05/30/2022] [Indexed: 12/02/2022] Open
Abstract
Background Metabolomics is a primary omics topic, which occupies an important position in both clinical applications and basic researches for metabolic signatures and biomarkers. Unfortunately, the relevant studies are challenged by the batch effect caused by many external factors. In last decade, the technique of deep learning has become a dominant tool in data science, such that one may train a diagnosis network from a known batch and then generalize it to a new batch. However, the batch effect inevitably hinders such efforts, as the two batches under consideration can be highly mismatched. Results We propose an end-to-end deep learning framework, for joint batch effect removal and then classification upon metabolomics data. We firstly validate the proposed deep learning framework on a public CyTOF dataset as a simulated experiment. We also visually compare the t-SNE distribution and demonstrate that our method effectively removes the batch effects in latent space. Then, for a private MALDI MS dataset, we have achieved the highest diagnostic accuracy, with about 5.1 ~ 7.9% increase on average over state-of-the-art methods. Conclusions Both experiments conclude that our method performs significantly better in classification than conventional methods benefitting from the effective removal of batch effect. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04758-z.
Collapse
Affiliation(s)
- Jingyang Niu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Jing Yang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Yuyu Guo
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Kun Qian
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Qian Wang
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, 201210, China.
| |
Collapse
|
25
|
Music Score Recognition Method Based on Deep Learning. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:3022767. [PMID: 35845890 PMCID: PMC9282982 DOI: 10.1155/2022/3022767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/18/2022] [Accepted: 03/22/2022] [Indexed: 01/09/2023]
Abstract
In recent years, the recommendation application of artificial intelligence and deep music has gradually become a research hotspot. As a complex machine learning algorithm, deep learning can extract features with value laws through training samples. The rise of deep learning network will promote the development of artificial intelligence and also provide a new idea for music score recognition. In this paper, the improved deep learning algorithm is applied to the research of music score recognition. Based on the traditional neural network, the attention weight value improved convolutional neural network (CNN) and high execution efficiency deep belief network (DBN) are introduced to realize the feature extraction and intelligent recognition of music score. Taking the feature vector set extracted by CNN-DBN as input set, a feature learning algorithm based on CNN&DBN was established to extract music score. Experiments show that the proposed model in a variety of different types of polyphony music recognition showed more accurate recognition and good performance; the recognition rate of the improved algorithm applied to the soundtrack identification is as high as 98.4%, which is significantly better than those of other classic algorithms, proving that CNN&DBN can achieve better effect in music information retrieval. It provides data support for constructing knowledge graph in music field and indicates that deep learning has great research value in music retrieval field.
Collapse
|
26
|
Madhumita, Paul S. Capturing the latent space of an Autoencoder for multi-omics integration and cancer subtyping. Comput Biol Med 2022; 148:105832. [PMID: 35834966 DOI: 10.1016/j.compbiomed.2022.105832] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 06/15/2022] [Accepted: 07/03/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND AND OBJECTIVE The motivation behind cancer subtyping is to identify subgroups of cancer patients with distinguishable phenotypes of clinical importance. It can assist in advancement of subtype-targeted based treatments. Subtype identification is a complicated task, therefore requires multi-omics data integration to identify the precise patients' subgroup. Over the years, several computational attempts have been made to identify the cancer subtypes accurately using integrative multi-omics analysis. Some studies have used Autoencoders (AE) to capture multi-omics feature integration in lower dimensions for identifying subtypes in specific types of cancer. However, capturing the highly informative latent space by learning the deep architectures of AE to attain a satisfactory generalized performance is required. Therefore, in this study, a novel AE-assisted cancer subtyping framework is presented that utilizes the compressed latent space of a Sparse AE neural network for multi-omics clustering. METHODS The proposed framework first performs a supervised feature selection based on the survival status of the patients. The selected features from each of the omic data are passed to the AE. The information embedded in the latent space of the trained AE neural networks are then used for cancer subtyping using Spectral clustering. The AE architecture designed in this study exhaustively searches the best compression for multi-omics data by varying the number of neurons in the hidden layers and penalizing activations within the layers. RESULTS AND CONCLUSION The proposed framework is applied to five different multi-omics cancer datasets taken from The Cancer Genome Atlas. It is observed that for getting a robust information bottleneck, a compression of 10-20% of the input features along with an L1 regularization penalty of 0.01 or 0.001 performs well for most of the cancer datasets. Clustering performed on this latent representation generates clusters with better silhouette scores and significantly varying survival patterns. For further biological assessment, differential expression analysis is performed between the identified subtypes of Glioblastoma multiforme (GBM), followed by enrichment analysis of the differentially expressed biomarkers. Several pathways and disease ontology terms coherent to GBM are found to be significantly associated. Varying responses of the identified GBM subtypes towards the drug Temozolomide is also tested to demonstrate its clinical importance. Hence, the study shows that AE-assisted multi-omics integration can be used for the prediction of clinically significant cancer subtypes.
Collapse
Affiliation(s)
- Madhumita
- Department of Bioscience and Bioengineering, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India.
| | - Sushmita Paul
- Department of Bioscience and Bioengineering, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India; School of Artificial Intelligence and Data Science, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India.
| |
Collapse
|
27
|
Niu J, Xu W, Wei D, Qian K, Wang Q. Deep Learning Framework for Integrating Multibatch Calibration, Classification, and Pathway Activities. Anal Chem 2022; 94:8937-8946. [PMID: 35709357 DOI: 10.1021/acs.analchem.2c00601] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
The amount of available biological data has exploded since the emergence of high-throughput technologies, which is not only revolting the way we recognize molecules and diseases but also bringing novel analytical challenges to bioinformatics analysis. In recent years, deep learning has become a dominant technique in data science. However, classification accuracy is plagued with domain discrepancy. Notably, in the presence of multiple batches, domain discrepancy typically happens between individual batches. Most pairwise adaptation approaches may be suboptimal as they fail to eliminate external factors across multiple batches and take the classification task into account simultaneously. We propose a joint deep learning framework for integrating batch effect removal, classification, and downstream pathway activities upon biological data. To this end, we validate it on two MALDI MS-based metabolomics datasets. We have achieved the highest diagnostic accuracy (ACC), with a notable ∼10% improvement over other methods. Overall, these results indicate that our approach removes batch effect more effectively than state-of-the-art methods and yields more accurate classification as well as biomarkers for smart diagnosis.
Collapse
Affiliation(s)
- JingYang Niu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - Wei Xu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - DongMing Wei
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - Kun Qian
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - Qian Wang
- School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China
| |
Collapse
|
28
|
Abstract
The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.
Collapse
|
29
|
Identifying a Correlation among Qualitative Non-Numeric Parameters in Natural Fish Microbe Dataset Using Machine Learning. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12125927] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Recent technical innovations and developments in computer-based technology have enabled bioscience researchers to acquire comprehensive datasets and identify unique parameters within experimental datasets. However, field researchers may face the challenge that datasets exhibit few associations among any measurement results (e.g., from analytical instruments, phenotype observations as well as field environmental data), and may contain non-numerical, qualitative parameters, which make statistical analyses difficult. Here, we propose an advanced analysis scheme that combines two machine learning steps to mine association rules between non-numerical parameters. The aim of this analysis is to identify relationships between variables and enable the visualization of association rules from data of samples collected in the field, which have less correlations between genetic, physical, and non-numerical qualitative parameters. The analysis scheme presented here may increase the potential to identify important characteristics of big datasets.
Collapse
|
30
|
Egger J, Gsaxner C, Pepe A, Pomykala KL, Jonske F, Kurz M, Li J, Kleesiek J. Medical deep learning-A systematic meta-review. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 221:106874. [PMID: 35588660 DOI: 10.1016/j.cmpb.2022.106874] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/05/2021] [Revised: 04/22/2022] [Accepted: 05/10/2022] [Indexed: 05/22/2023]
Abstract
Deep learning has remarkably impacted several different scientific disciplines over the last few years. For example, in image processing and analysis, deep learning algorithms were able to outperform other cutting-edge methods. Additionally, deep learning has delivered state-of-the-art results in tasks like autonomous driving, outclassing previous attempts. There are even instances where deep learning outperformed humans, for example with object recognition and gaming. Deep learning is also showing vast potential in the medical domain. With the collection of large quantities of patient records and data, and a trend towards personalized treatments, there is a great need for automated and reliable processing and analysis of health information. Patient data is not only collected in clinical centers, like hospitals and private practices, but also by mobile healthcare apps or online websites. The abundance of collected patient data and the recent growth in the deep learning field has resulted in a large increase in research efforts. In Q2/2020, the search engine PubMed returned already over 11,000 results for the search term 'deep learning', and around 90% of these publications are from the last three years. However, even though PubMed represents the largest search engine in the medical field, it does not cover all medical-related publications. Hence, a complete overview of the field of 'medical deep learning' is almost impossible to obtain and acquiring a full overview of medical sub-fields is becoming increasingly more difficult. Nevertheless, several review and survey articles about medical deep learning have been published within the last few years. They focus, in general, on specific medical scenarios, like the analysis of medical images containing specific pathologies. With these surveys as a foundation, the aim of this article is to provide the first high-level, systematic meta-review of medical deep learning surveys.
Collapse
Affiliation(s)
- Jan Egger
- Institute of Computer Graphics and Vision, Faculty of Computer Science and Biomedical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Styria, Austria; Department of Oral &Maxillofacial Surgery, Medical University of Graz, Auenbruggerplatz 5/1, 8036 Graz, Styria, Austria; Computer Algorithms for Medicine Laboratory, Graz, Styria, Austria; Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, 45131 Essen, Germany; Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Hufelandstraße 55, 45147 Essen, Germany.
| | - Christina Gsaxner
- Institute of Computer Graphics and Vision, Faculty of Computer Science and Biomedical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Styria, Austria; Department of Oral &Maxillofacial Surgery, Medical University of Graz, Auenbruggerplatz 5/1, 8036 Graz, Styria, Austria; Computer Algorithms for Medicine Laboratory, Graz, Styria, Austria
| | - Antonio Pepe
- Institute of Computer Graphics and Vision, Faculty of Computer Science and Biomedical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Styria, Austria; Computer Algorithms for Medicine Laboratory, Graz, Styria, Austria
| | - Kelsey L Pomykala
- Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, 45131 Essen, Germany
| | - Frederic Jonske
- Computer Algorithms for Medicine Laboratory, Graz, Styria, Austria; Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, 45131 Essen, Germany
| | - Manuel Kurz
- Institute of Computer Graphics and Vision, Faculty of Computer Science and Biomedical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Styria, Austria; Computer Algorithms for Medicine Laboratory, Graz, Styria, Austria
| | - Jianning Li
- Institute of Computer Graphics and Vision, Faculty of Computer Science and Biomedical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Styria, Austria; Computer Algorithms for Medicine Laboratory, Graz, Styria, Austria; Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, 45131 Essen, Germany
| | - Jens Kleesiek
- Institute for AI in Medicine (IKIM), University Medicine Essen, Girardetstraße 2, 45131 Essen, Germany; Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Hufelandstraße 55, 45147 Essen, Germany; German Cancer Consortium (DKTK), Partner Site Essen, Hufelandstraße 55, 45147 Essen, Germany
| |
Collapse
|
31
|
Gao Y, Chen Y, Feng H, Zhang Y, Yue Z. RicENN: Prediction of Rice Enhancers with Neural Network Based on DNA Sequences. Interdiscip Sci 2022; 14:555-565. [PMID: 35190950 DOI: 10.1007/s12539-022-00503-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Revised: 01/07/2022] [Accepted: 01/18/2022] [Indexed: 01/22/2023]
Abstract
Enhancers are the primary cis-elements of transcriptional regulation and play a vital role in gene expression at different stages of plant growth and development. Having high locational variation and free scattering in non-encoding genomes, identification of enhancers is a crucial, but challenging work in understanding the biological mechanism of model plants. Recently, applications of neural network models are gaining increasing popularity in predicting the function of genomic elements. Although several computational models have shown great advantages to tackle this challenge, a further study of the identification of rice enhancers from DNA sequences is still lacking. We present RicENN, a novel deep learning framework capable of accurately identifying enhancers of rice, integrating convolution neural networks (CNNs), bi-directional recurrent neural networks (RNNs), and attention mechanisms. A combined-feature representation method was designed to extract the sequence features from original DNA sequences using six types of autocorrelation encodings. Moreover, we verified that the integrated model achieves the best performance by an ablation study. Finally, our deep learning framework realized a reliable prediction of the rice enhancers. The results show RicENN outperforms available alternative approaches in rice species, achieving the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) of 0.960 and 0.960 on cross-validation, and 0.879 and 0.877 during independent tests, respectively. This study develops a hybrid model to combine the merits of different neural network architectures, which shows the potential ability to apply deep learning in bioinformatic sequences and contributes to the acceleration of functional genomic studies of rice. RicENN and its code are freely accessible at http://bioinfor.aielab.cc/RicENN/ .
Collapse
Affiliation(s)
- Yujia Gao
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Yiqiong Chen
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Haisong Feng
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Youhua Zhang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
| | - Zhenyu Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
| |
Collapse
|
32
|
Bourgeais V, Zehraoui F, Hanczar B. GraphGONet: a self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression. Bioinformatics 2022; 38:2504-2511. [PMID: 35266505 DOI: 10.1093/bioinformatics/btac147] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2021] [Revised: 02/02/2022] [Accepted: 03/07/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Medical care is becoming more and more specific to patients' needs due to the increased availability of omics data. The application to these data of sophisticated machine learning models, in particular deep learning, can improve the field of precision medicine. However, their use in clinics is limited as their predictions are not accompanied by an explanation. The production of accurate and intelligible predictions can benefit from the inclusion of domain knowledge. Therefore, knowledge-based deep learning models appear to be a promising solution. RESULTS In this paper, we propose GraphGONet, where the Gene Ontology is encapsulated in the hidden layers of a new self-explaining neural network. Each neuron in the layers represents a biological concept, combining the gene expression profile of a patient, and the information from its neighboring neurons. The experiments described in the paper confirm that our model not only performs as accurately as the state-of-the-art (non-explainable ones) but also automatically produces stable and intelligible explanations composed of the biological concepts with the highest contribution. This feature allows experts to use our tool in a medical setting. AVAILABILITY GraphGONet is freely available at https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Victoria Bourgeais
- IBISC,Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
| | - Farida Zehraoui
- IBISC,Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
| | - Blaise Hanczar
- IBISC,Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
| |
Collapse
|
33
|
Ma C, Wu M, Ma S. Analysis of cancer omics data: a selective review of statistical techniques. Brief Bioinform 2022; 23:6510158. [PMID: 35039832 DOI: 10.1093/bib/bbab585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2021] [Revised: 12/19/2021] [Accepted: 12/20/2021] [Indexed: 11/13/2022] Open
Abstract
Cancer is an omics disease. The development in high-throughput profiling has fundamentally changed cancer research and clinical practice. Compared with clinical, demographic and environmental data, the analysis of omics data-which has higher dimensionality, weaker signals and more complex distributional properties-is much more challenging. Developments in the literature are often 'scattered', with individual studies focused on one or a few closely related methods. The goal of this review is to assist cancer researchers with limited statistical expertise in establishing the 'overall framework' of cancer omics data analysis. To facilitate understanding, we mainly focus on intuition, concepts and key steps, and refer readers to the original publications for mathematical details. This review broadly covers unsupervised and supervised analysis, as well as individual-gene-based, gene-set-based and gene-network-based analysis. We also briefly discuss 'special topics' including interaction analysis, multi-datasets analysis and multi-omics analysis.
Collapse
Affiliation(s)
- Chenjin Ma
- College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
| | - Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
| | - Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| |
Collapse
|
34
|
Alzubaidi A, Tepper J. Deep Mining from Omics Data. Methods Mol Biol 2022; 2449:349-386. [PMID: 35507271 DOI: 10.1007/978-1-0716-2095-3_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Since the advent of high-throughput omics technologies, various molecular data such as genes, transcripts, proteins, and metabolites have been made widely available to researchers. This has afforded clinicians, bioinformaticians, statisticians, and data scientists the opportunity to apply their innovations in feature mining and predictive modeling to a rich data resource to develop a wide range of generalizable prediction models. What has become apparent over the last 10 years is that researchers have adopted deep neural networks (or "deep nets") as their preferred paradigm of choice for complex data modeling due to the superiority of performance over more traditional statistical machine learning approaches, such as support vector machines. A key stumbling block, however, is that deep nets inherently lack transparency and are considered to be a "black box" approach. This naturally makes it very difficult for clinicians and other stakeholders to trust their deep learning models even though the model predictions appear to be highly accurate. In this chapter, we therefore provide a detailed summary of the deep net architectures typically used in omics research, together with a comprehensive summary of the notable "deep feature mining" techniques researchers have applied to open up this black box and provide some insights into the salient input features and why these models behave as they do. We group these techniques into the following three categories: (a) hidden layer visualization and interpretation; (b) input feature importance and impact evaluation; and (c) output layer gradient analysis. While we find that omics researchers have made some considerable gains in opening up the black box through interpretation of the hidden layer weights and node activations to identify salient input features, we highlight other approaches for omics researchers, such as employing deconvolutional network-based approaches and development of bespoke attribute impact measures to enable researchers to better understand the relationships between the input data and hidden layer representations formed and thus the output behavior of their deep nets.
Collapse
Affiliation(s)
- Abeer Alzubaidi
- School of Science and Technology, Department of Computer Science, Nottingham Trent University, Nottingham, UK.
| | | |
Collapse
|
35
|
Wu Y, Zeng M, Fei Z, Yu Y, Wu FX, Li M. KAICD: A knowledge attention-based deep learning framework for automatic ICD coding. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2020.05.115] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
|
36
|
Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. ENTROPY (BASEL, SWITZERLAND) 2021; 24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel's concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
Collapse
Affiliation(s)
- Łukasz Huminiecki
- Evolutionary, Computational, and Statistical Genetics, Department of Molecula Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postępu 36A, Jastrzębiec, 05-552 Warsaw, Poland
| |
Collapse
|
37
|
Biological features between miRNAs and their targets are unveiled from deep learning models. Sci Rep 2021; 11:23825. [PMID: 34893648 PMCID: PMC8664955 DOI: 10.1038/s41598-021-03215-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2021] [Accepted: 11/08/2021] [Indexed: 12/02/2022] Open
Abstract
MicroRNAs (miRNAs) are ~ 22 nucleotide ubiquitous gene regulators. They modulate a broad range of essential cellular processes linked to human health and diseases. Consequently, identifying miRNA targets and understanding how they function are critical for treating miRNA associated diseases. In our earlier work, a hybrid deep learning-based approach (miTAR) was developed for predicting miRNA targets. It performs substantially better than the existing methods. The approach integrates two major types of deep learning algorithms: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, the features in miRNA:target interactions learned by miTAR have not been investigated. In the current study, we demonstrated that miTAR captures known features, including the involvement of seed region and the free energy, as well as multiple novel features, in the miRNA:target interactions. Interestingly, the CNN and RNN layers of the model perform differently at capturing the free energy feature: the units in RNN layer is more unique at capturing the feature but collectively the CNN layer is more efficient at capturing the feature. Although deep learning models are commonly thought “black-boxes”, our discoveries support that the biological features in miRNA:target can be unveiled from deep learning models, which will be beneficial to the understanding of the mechanisms in miRNA:target interactions.
Collapse
|
38
|
Deep Learning for Human Disease Detection, Subtype Classification, and Treatment Response Prediction Using Epigenomic Data. Biomedicines 2021; 9:biomedicines9111733. [PMID: 34829962 PMCID: PMC8615388 DOI: 10.3390/biomedicines9111733] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Revised: 10/26/2021] [Accepted: 11/17/2021] [Indexed: 12/25/2022] Open
Abstract
Deep learning (DL) is a distinct class of machine learning that has achieved first-class performance in many fields of study. For epigenomics, the application of DL to assist physicians and scientists in human disease-relevant prediction tasks has been relatively unexplored until very recently. In this article, we critically review published studies that employed DL models to predict disease detection, subtype classification, and treatment responses, using epigenomic data. A comprehensive search on PubMed, Scopus, Web of Science, Google Scholar, and arXiv.org was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Among 1140 initially identified publications, we included 22 articles in our review. DNA methylation and RNA-sequencing data are most frequently used to train the predictive models. The reviewed models achieved a high accuracy ranged from 88.3% to 100.0% for disease detection tasks, from 69.5% to 97.8% for subtype classification tasks, and from 80.0% to 93.0% for treatment response prediction tasks. We generated a workflow to develop a predictive model that encompasses all steps from first defining human disease-related tasks to finally evaluating model performance. DL holds promise for transforming epigenomic big data into valuable knowledge that will enhance the development of translational epigenomics.
Collapse
|
39
|
Protein function prediction using functional inter-relationship. Comput Biol Chem 2021; 95:107593. [PMID: 34736126 DOI: 10.1016/j.compbiolchem.2021.107593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2021] [Revised: 08/25/2021] [Accepted: 10/03/2021] [Indexed: 11/23/2022]
Abstract
With the growth of high throughput sequencing techniques, the generation of protein sequences has become fast and cheap, leading to a huge increase in the number of known proteins. However, it is challenging to identify the functions being performed by these newly discovered proteins. Machine learning techniques have improved traditional methods' efficiency by suggesting relevant functions but fails to perform well when the number of functions to be predicted becomes large. In this work, we propose a machine learning-based approach to predict huge set of protein functions that use the inter-relationships between functions to improve the model's predictability. These inter-relationships of functions is used to reduce the redundancy caused by highly correlated functions. The proposed model is trained on the reduced set of non-redundant functions hindering the ambiguity caused due to inter-related functions. Here, we use two statistical approaches 1) Pearson's correlation coefficient 2) Jaccard similarity coefficient, as a measure of correlation to remove redundant functions. To have a fair evaluation of the proposed model, we recreate our original function set by inverse transforming the reduced set using the two proposed approaches: Direct mapping and Ensemble approach. The model is tested using different feature sets and function sets of biological processes and molecular functions to get promising results on DeepGO and CAFA3 dataset. The proposed model is able to predict specific functions for the test data which were unpredictable by other compared methods. The experimental models, code and other relevant data are available at https://github.com/richadhanuka/PFP-using-Functional-interrelationship.
Collapse
|
40
|
Buterez D. Scaling up DNA digital data storage by efficiently predicting DNA hybridisation using deep learning. Sci Rep 2021; 11:20517. [PMID: 34654863 PMCID: PMC8519920 DOI: 10.1038/s41598-021-97238-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Accepted: 08/23/2021] [Indexed: 11/09/2022] Open
Abstract
Deoxyribonucleic acid (DNA) has shown great promise in enabling computational applications, most notably in the fields of DNA digital data storage and DNA computing. Information is encoded as DNA strands, which will naturally bind in solution, thus enabling search and pattern-matching capabilities. Being able to control and predict the process of DNA hybridisation is crucial for the ambitious future of Hybrid Molecular-Electronic Computing. Current tools are, however, limited in terms of throughput and applicability to large-scale problems. We present the first comprehensive study of machine learning methods applied to the task of predicting DNA hybridisation. For this purpose, we introduce an in silico-generated hybridisation dataset of over 2.5 million data points, enabling the use of deep learning. Depending on hardware, we achieve a reduction in inference time ranging from one to over two orders of magnitude compared to the state-of-the-art, while retaining high fidelity. We then discuss the integration of our methods in modern, scalable workflows.
Collapse
Affiliation(s)
- David Buterez
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK. .,Department of Computing, Imperial College London, London, UK.
| |
Collapse
|
41
|
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022] Open
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic representative of big data producer. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist in studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Collapse
Affiliation(s)
- Claudia Caudai
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
| | - Antonella Galizia
- CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
| | - Filippo Geraci
- CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
| | - Loredana Le Pera
- CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
| | - Veronica Morea
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
| | - Emanuele Salerno
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
| | - Allegra Via
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
| | - Teresa Colombo
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
| |
Collapse
|
42
|
Chen D, Fulmer C, Gordon IO, Syed S, Stidham RW, Vande Casteele N, Qin Y, Falloon K, Cohen BL, Wyllie R, Rieder F. Application of Artificial Intelligence to Clinical Practice in Inflammatory Bowel Disease - What the Clinician Needs to Know. J Crohns Colitis 2021; 16:460-471. [PMID: 34558619 PMCID: PMC8919817 DOI: 10.1093/ecco-jcc/jjab169] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Artificial intelligence [AI] techniques are quickly spreading across medicine as an analytical method to tackle challenging clinical questions. What were previously thought of as highly complex data sources, such as images or free text, are now becoming manageable. Novel analytical methods merge the latest developments in information technology infrastructure with advances in computer science. Once primarily associated with Silicon Valley, AI techniques are now making their way into medicine, including in the field of inflammatory bowel diseases [IBD]. Understanding potential applications and limitations of these techniques can be difficult, in particular for busy clinicians. In this article, we explain the basic terminologies and provide a particular focus on the foundations behind state-of-the-art AI methodologies in both imaging and text. We explore the growing applications of AI in medicine, with a specific focus on IBD to inform the practising gastroenterologist and IBD specialist. Finally, we outline possible future uses of these technologies in daily clinical practice.
Collapse
Affiliation(s)
- David Chen
- Medical Operations, Cleveland Clinic Foundation, Cleveland, OH, USA,Department of Gastroenterology, Hepatology and Nutrition, Digestive Diseases and Surgery Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Clifton Fulmer
- Department of Pathology, Robert J. Tomsich Pathology and Laboratory Medicine Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Ilyssa O Gordon
- Department of Pathology, Robert J. Tomsich Pathology and Laboratory Medicine Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Sana Syed
- Division of Gastroenterology, Hepatology and Nutrition, Department of Pediatrics, School of Medicine, University of Virginia, Charlottesville, VA, USA,School of Data Science, University of Virginia, Charlottesville, VA, USA
| | - Ryan W Stidham
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Michigan Medicine, Ann Arbor, MI, USA,Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | | | - Yi Qin
- Department of Gastroenterology, Hepatology and Nutrition, Digestive Diseases and Surgery Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Katherine Falloon
- Department of Gastroenterology, Hepatology and Nutrition, Digestive Diseases and Surgery Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Benjamin L Cohen
- Department of Gastroenterology, Hepatology and Nutrition, Digestive Diseases and Surgery Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Robert Wyllie
- Medical Operations, Cleveland Clinic Foundation, Cleveland, OH, USA,Department of Pediatric Gastroenterology, Hepatology, and Nutrition, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Florian Rieder
- Corresponding author: Florian Rieder, MD, Department of Inflammation and Immunity, and Department of Gastroenterology, Hepatology, & Nutrition, Cleveland Clinic Foundation, 9500 Euclid Ave., Cleveland, OH 44195, USA. Tel: (216) 445-5631; Fax: (216) 636-0104; E-mail:
| |
Collapse
|
43
|
Alharbi A, Abdur Rahman MD. Review of Recent Technologies for Tackling COVID-19. SN COMPUTER SCIENCE 2021; 2:460. [PMID: 34549196 PMCID: PMC8444512 DOI: 10.1007/s42979-021-00841-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/26/2020] [Accepted: 08/26/2021] [Indexed: 01/09/2023]
Abstract
The current pandemic caused by the COVID-19 virus requires more effort, experience, and science-sharing to overcome the damage caused by the pathogen. The fast and wide human-to-human transmission of the COVID-19 virus demands a significant role of the newest technologies in the form of local and global computing and information sharing, data privacy, and accurate tests. The advancements of deep neural networks, cloud computing solutions, blockchain technology, and beyond 5G (B5G) communication have contributed to the better management of the COVID-19 impacts on society. This paper reviews recent attempts to tackle the COVID-19 situation using these technological advancements.
Collapse
Affiliation(s)
- Ayman Alharbi
- Department Of Computer Engineering, College of Computer and Information systems, Umm AL-Qura University, Mecca, Saudi Arabia
| | - MD Abdur Rahman
- Department of Cyber Security and Forensic Computing, College of Computer and Cyber Sciences, University of Prince Mugrin, Madinah, 41499 Saudi Arabia
| |
Collapse
|
44
|
Albaradei S, Thafar M, Alsaedi A, Van Neste C, Gojobori T, Essack M, Gao X. Machine learning and deep learning methods that use omics data for metastasis prediction. Comput Struct Biotechnol J 2021; 19:5008-5018. [PMID: 34589181 PMCID: PMC8450182 DOI: 10.1016/j.csbj.2021.09.001] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2021] [Revised: 08/16/2021] [Accepted: 09/02/2021] [Indexed: 12/14/2022] Open
Abstract
Knowing metastasis is the primary cause of cancer-related deaths, incentivized research directed towards unraveling the complex cellular processes that drive the metastasis. Advancement in technology and specifically the advent of high-throughput sequencing provides knowledge of such processes. This knowledge led to the development of therapeutic and clinical applications, and is now being used to predict the onset of metastasis to improve diagnostics and disease therapies. In this regard, predicting metastasis onset has also been explored using artificial intelligence approaches that are machine learning, and more recently, deep learning-based. This review summarizes the different machine learning and deep learning-based metastasis prediction methods developed to date. We also detail the different types of molecular data used to build the models and the critical signatures derived from the different methods. We further highlight the challenges associated with using machine learning and deep learning methods, and provide suggestions to improve the predictive performance of such methods.
Collapse
Key Words
- AE, autoencoder
- ANN, Artificial Neural Network
- AUC, area under the curve
- Acc, Accuracy
- Artificial intelligence
- BC, Betweenness centrality
- BH, Benjamini-Hochberg
- BioGRID, Biological General Repository for Interaction Datasets
- CCP, compound covariate predictor
- CEA, Carcinoembryonic antigen
- CNN, convolution neural networks
- CV, cross-validation
- Cancer
- DBN, deep belief network
- DDBN, discriminative deep belief network
- DEGs, differentially expressed genes
- DIP, Database of Interacting Proteins
- DNN, Deep neural network
- DT, Decision Tree
- Deep learning
- EMT, epithelial-mesenchymal transition
- FC, fully connected
- GA, Genetic Algorithm
- GANs, generative adversarial networks
- GEO, Gene Expression Omnibus
- HCC, hepatocellular carcinoma
- HPRD, Human Protein Reference Database
- KNN, K-nearest neighbor
- L-SVM, linear SVM
- LIMMA, linear models for microarray data
- LOOCV, Leave-one-out cross-validation
- LR, Logistic Regression
- MCCV, Monte Carlo cross-validation
- MLP, multilayer perceptron
- Machine learning
- Metastasis
- NPV, negative predictive value
- PCA, Principal component analysis
- PPI, protein-protein interaction
- PPV, positive predictive value
- RC, ridge classifier
- RF, Random Forest
- RFE, recursive feature elimination
- RMA, robust multi‐array average
- RNN, recurrent neural networks
- SGD, stochastic gradient descent
- SMOTE, synthetic minority over-sampling technique
- SVM, Support Vector Machine
- Se, sensitivity
- Sp, specificity
- TCGA, The Cancer Genome Atlas
- k-CV, k-fold cross validation
- mRMR, minimum redundancy maximum relevance
Collapse
Affiliation(s)
- Somayah Albaradei
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- King Abdulaziz University, Faculty of Computing and Information Technology, Jeddah, Saudi Arabia
| | - Maha Thafar
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Taif University, Collage of Computers and Information Technology, Taif, Saudi Arabia
| | - Asim Alsaedi
- King Saud bin Abdulaziz University for Health Sciences, Jeddah, Saudi Arabia
- King Abdulaziz Medical City, Jeddah, Saudi Arabia
| | - Christophe Van Neste
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Takashi Gojobori
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Magbubah Essack
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| |
Collapse
|
45
|
Su R, Hu J, Zou Q, Manavalan B, Wei L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief Bioinform 2021; 21:408-420. [PMID: 30649170 DOI: 10.1093/bib/bby124] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2018] [Revised: 11/30/2018] [Accepted: 11/30/2018] [Indexed: 12/16/2022] Open
Abstract
Cell-penetrating peptides (CPPs) facilitate the delivery of therapeutically relevant molecules, including DNA, proteins and oligonucleotides, into cells both in vitro and in vivo. This unique ability explores the possibility of CPPs as therapeutic delivery and its potential applications in clinical therapy. Over the last few decades, a number of machine learning (ML)-based prediction tools have been developed, and some of them are freely available as web portals. However, the predictions produced by various tools are difficult to quantify and compare. In particular, there is no systematic comparison of the web-based prediction tools in performance, especially in practical applications. In this work, we provide a comprehensive review on the biological importance of CPPs, CPP database and existing ML-based methods for CPP prediction. To evaluate current prediction tools, we conducted a comparative study and analyzed a total of 12 models from 6 publicly available CPP prediction tools on 2 benchmark validation sets of CPPs and non-CPPs. Our benchmarking results demonstrated that a model from the KELM-CPPpred, namely KELM-hybrid-AAC, showed a significant improvement in overall performance, when compared to the other 11 prediction models. Moreover, through a length-dependency analysis, we find that existing prediction tools tend to more accurately predict CPPs and non-CPPs with the length of 20-25 residues long than peptides in other length ranges.
Collapse
Affiliation(s)
- Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jie Hu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | | | - Leyi Wei
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
Collapse
|
46
|
The Trifecta of Single-Cell, Systems-Biology, and Machine-Learning Approaches. Genes (Basel) 2021; 12:genes12071098. [PMID: 34356114 PMCID: PMC8306972 DOI: 10.3390/genes12071098] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2021] [Revised: 07/13/2021] [Accepted: 07/18/2021] [Indexed: 12/18/2022] Open
Abstract
Together, single-cell technologies and systems biology have been used to investigate previously unanswerable questions in biomedicine with unparalleled detail. Despite these advances, gaps in analytical capacity remain. Machine learning, which has revolutionized biomedical imaging analysis, drug discovery, and systems biology, is an ideal strategy to fill these gaps in single-cell studies. Machine learning additionally has proven to be remarkably synergistic with single-cell data because it remedies unique challenges while capitalizing on the positive aspects of single-cell data. In this review, we describe how systems-biology algorithms have layered machine learning with biological components to provide systems level analyses of single-cell omics data, thus elucidating complex biological mechanisms. Accordingly, we highlight the trifecta of single-cell, systems-biology, and machine-learning approaches and illustrate how this trifecta can significantly contribute to five key areas of scientific research: cell trajectory and identity, individualized medicine, pharmacology, spatial omics, and multi-omics. Given its success to date, the systems-biology, single-cell omics, and machine-learning trifecta has proven to be a potent combination that will further advance biomedical research.
Collapse
|
47
|
Zhang J, Chen Q, Liu B. DeepDRBP-2L: A New Genome Annotation Predictor for Identifying DNA-Binding Proteins and RNA-Binding Proteins Using Convolutional Neural Network and Long Short-Term Memory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1451-1463. [PMID: 31722485 DOI: 10.1109/tcbb.2019.2952338] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two kinds of crucial proteins, which are associated with various cellule activities and some important diseases. Accurate identification of DBPs and RBPs facilitate both theoretical research and real world application. Existing sequence-based DBP predictors can accurately identify DBPs but incorrectly predict many RBPs as DBPs, and vice versa, resulting in low prediction precision. Moreover, some proteins (DRBPs) interacting with both DNA and RNA play important roles in gene expression and cannot be identified by existing computational methods. In this study, a two-level predictor named DeepDRBP-2L was proposed by combining Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM). It is the first computational method that is able to identify DBPs, RBPs and DRBPs. Rigorous cross-validations and independent tests showed that DeepDRBP-2L is able to overcome the shortcoming of the existing methods and can go one further step to identify DRBPs. Application of DeepDRBP-2L to tomato genome further demonstrated its performance. The webserver of DeepDRBP-2L is freely available at http://bliulab.net/DeepDRBP-2L.
Collapse
|
48
|
Lorkowski J, Kolaszyńska O, Pokorski M. Artificial Intelligence and Precision Medicine: A Perspective. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2021; 1375:1-11. [PMID: 34138457 DOI: 10.1007/5584_2021_652] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
This article aims to present how the advanced solutions of artificial intelligence and precision medicine work together to refine medical management. Multi-omics seems the most suitable approach for biological analysis of data on precision medicine and artificial intelligence. We searched PubMed and Google Scholar databases to collect pertinent articles appearing up to 5 March 2021. Genetics, oncology, radiology, and the recent coronavirus disease (COVID-19) pandemic were chosen as representative fields addressing the cross-compliance of artificial intelligence (AI) and precision medicine based on the highest number of articles, topicality, and interconnectedness of the issue. Overall, we identified and perused 1572 articles. AI is a breakthrough that takes part in shaping the Fourth Industrial Revolution in medicine and health care, changing the long-time accepted diagnostic and treatment regimens and approaches. AI-based link prediction models may be outstandingly helpful in the literature search for drug repurposing or finding new therapeutical modalities in rapidly erupting wide-scale diseases such as the recent COVID-19.
Collapse
Affiliation(s)
- Jacek Lorkowski
- Department of Orthopedics, Traumatology and Sports Medicine, Central Clinical Hospital of the Ministry of Internal Affairs and Administration, Warsaw, Poland. .,Faculty of Health Sciences, Medical University of Mazovia, Warsaw, Poland.
| | - Oliwia Kolaszyńska
- Department of Cardiology, Independent Public Regional Hospital, Szczecin, Poland
| | - Mieczysław Pokorski
- Institute of Health Sciences, Opole University, Opole, Poland.,Faculty of Health Sciences, The Jan Długosz University in Częstochowa, Częstochowa, Poland
| |
Collapse
|
49
|
Kang Q, Meng J, Shi W, Luan Y. Ensemble Deep Learning Based on Multi-level Information Enhancement and Greedy Fuzzy Decision for Plant miRNA-lncRNA Interaction Prediction. Interdiscip Sci 2021; 13:603-614. [PMID: 33900552 DOI: 10.1007/s12539-021-00434-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2021] [Revised: 04/01/2021] [Accepted: 04/16/2021] [Indexed: 12/18/2022]
Abstract
MicroRNAs (miRNAs) and long non-coding RNAs (lncRNAs) are both non-coding RNAs (ncRNAs) and their interactions play important roles in biological processes. Computational methods, such as machine learning and various bioinformatics tools, can predict potential miRNA-lncRNA interactions, which is significant for studying their mechanisms and biological functions. A growing number of RNA interaction predictors for animal have been reported, but they are unreliable for plant due to the differences of ncRNAs in animal and plant. It is urgent to build a reliable plant predictor, especially for cross-species. This paper proposes an ensemble deep learning model based on multi-level information enhancement and greedy fuzzy decision (PmliPEMG) for plant miRNA-lncRNA interaction prediction. The fusion complex features, multi-scale convolutional long short-term memory networks, and attention mechanism are adopted to enhance the sample information at the feature, scale, and model levels, respectively. An ensemble deep learning model is built based on a novel method (greedy fuzzy decision) which greatly improves the efficiency. The multi-level information enhancement and greedy fuzzy decision are verified to have the positive effects on prediction performance. PmliPEMG can be applied to the cross-species prediction. It shows better performance and stronger generalization ability than state-of-the-art predictors and may provide valuable references for related research.
Collapse
Affiliation(s)
- Qiang Kang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
| | - Wenhao Shi
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
| |
Collapse
|
50
|
Performance Comparison of Deep Learning Autoencoders for Cancer Subtype Detection Using Multi-Omics Data. Cancers (Basel) 2021; 13:cancers13092013. [PMID: 33921978 PMCID: PMC8122584 DOI: 10.3390/cancers13092013] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Revised: 03/29/2021] [Accepted: 04/06/2021] [Indexed: 12/14/2022] Open
Abstract
A heterogeneous disease such as cancer is activated through multiple pathways and different perturbations. Depending upon the activated pathway(s), the survival of the patients varies significantly and shows different efficacy to various drugs. Therefore, cancer subtype detection using genomics level data is a significant research problem. Subtype detection is often a complex problem, and in most cases, needs multi-omics data fusion to achieve accurate subtyping. Different data fusion and subtyping approaches have been proposed over the years, such as kernel-based fusion, matrix factorization, and deep learning autoencoders. In this paper, we compared the performance of different deep learning autoencoders for cancer subtype detection. We performed cancer subtype detection on four different cancer types from The Cancer Genome Atlas (TCGA) datasets using four autoencoder implementations. We also predicted the optimal number of subtypes in a cancer type using the silhouette score and found that the detected subtypes exhibit significant differences in survival profiles. Furthermore, we compared the effect of feature selection and similarity measures for subtype detection. For further evaluation, we used the Glioblastoma multiforme (GBM) dataset and identified the differentially expressed genes in each of the subtypes. The results obtained are consistent with other genomic studies and can be corroborated with the involved pathways and biological functions. Thus, it shows that the results from the autoencoders, obtained through the interaction of different datatypes of cancer, can be used for the prediction and characterization of patient subgroups and survival profiles.
Collapse
|