1
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024]
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. In recognition of this importance, numerous diverse interpretability strategies have been developed in recent years, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
  - Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands

2
Ruiz-Arenas C, Marín-Goñi I, Wang L, Ochoa I, Pérez-Jurado L, Hernaez M. NetActivity enhances transcriptional signals by combining gene expression into robust gene set activity scores through interpretable autoencoders. Nucleic Acids Res 2024; 52:e44. [PMID: 38597610 PMCID: PMC11109970 DOI: 10.1093/nar/gkae197] [Received: 09/20/2023] [Revised: 01/23/2024] [Accepted: 03/12/2024] [Indexed: 04/11/2024]
Abstract
Grouping gene expression into gene set activity scores (GSAS) provides better biological insights than studying individual genes. However, existing gene set projection methods cannot return representative, robust, and interpretable GSAS. We developed NetActivity, a machine learning framework that generates GSAS based on a sparsely connected autoencoder, where each neuron in the inner layer represents a gene set. We proposed a three-tier training scheme that yielded representative, robust, and interpretable GSAS. The NetActivity model was trained with 1518 GO biological process terms and KEGG pathways and all GTEx samples. NetActivity generates GSAS that are robust to the initialization parameters and representative of the original transcriptome, and it assigns higher importance to more biologically relevant genes. Moreover, NetActivity returns GSAS with a more consistent definition and higher interpretability than GSVA and hipathia, state-of-the-art gene set projection methods. Finally, NetActivity enables combining bulk RNA-seq and microarray datasets in a meta-analysis of prostate cancer progression, highlighting gene sets related to cell division, which are key for disease progression. When applied to metastatic prostate cancer, gene sets associated with cancer progression were also altered due to drug resistance, whereas a classical enrichment analysis identified gene sets irrelevant to the phenotype. NetActivity is publicly available in Bioconductor and GitHub.
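A sparsely connected layer in which each inner-layer neuron stands for one gene set can be pictured as a weight matrix multiplied element-wise by a fixed binary gene-set mask. The Python sketch below is a minimal illustration under stated assumptions, not NetActivity's implementation: the gene names, gene sets, function names (`build_mask`, `encode`) and random weights are hypothetical, the layer is linear, and the decoder and the three-tier training are omitted.

```python
import random

def build_mask(genes, gene_sets):
    """Return a binary mask with one row per gene set and one column per gene.

    mask[s][g] is 1.0 when gene g belongs to gene set s, else 0.0.
    Genes listed in a set but absent from `genes` are ignored.
    """
    index = {gene: i for i, gene in enumerate(genes)}
    mask = []
    for members in gene_sets.values():
        row = [0.0] * len(genes)
        for gene in members:
            if gene in index:
                row[index[gene]] = 1.0
        mask.append(row)
    return mask

def encode(expression, weights, mask):
    """Masked linear encoder: each gene-set unit only sums its member genes."""
    return [
        sum(w * m * x for w, m, x in zip(w_row, m_row, expression))
        for w_row, m_row in zip(weights, mask)
    ]

# Hypothetical toy example: 4 genes, 2 gene sets.
genes = ["TP53", "MKI67", "CDK1", "ALB"]
gene_sets = {"cell_division": ["MKI67", "CDK1"], "liver_function": ["ALB"]}
mask = build_mask(genes, gene_sets)

rng = random.Random(0)  # in a trained model these weights would be learned
weights = [[rng.gauss(0.0, 1.0) for _ in genes] for _ in gene_sets]

sample = [1.2, 3.4, 2.0, 0.1]  # expression values for one hypothetical sample
scores = encode(sample, weights, mask)  # one activity score per gene set
```

Because TP53 belongs to neither toy gene set, changing its value leaves both scores unchanged; in this sketch the fixed mask, rather than a sparsity penalty, enforces the sparse connectivity.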
Affiliation(s)
- Carlos Ruiz-Arenas
  - Computational Biology Program, CIMA University of Navarra, idiSNA, Pamplona 31008, Spain
  - Department MELIS, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Irene Marín-Goñi
  - Computational Biology Program, CIMA University of Navarra, idiSNA, Pamplona 31008, Spain
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Idoia Ochoa
  - Department of Electrical and Electronics Engineering, Tecnun, University of Navarra, Donostia, Spain
  - Institute for Data Science and Artificial Intelligence (DATAI), University of Navarra, Pamplona 31008, Spain
- Luis A Pérez-Jurado
  - Department MELIS, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona, Spain
  - Genetics Service, Hospital del Mar & Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- Mikel Hernaez
  - Computational Biology Program, CIMA University of Navarra, idiSNA, Pamplona 31008, Spain
  - Institute for Data Science and Artificial Intelligence (DATAI), University of Navarra, Pamplona 31008, Spain

3
Feng X, Xiu YH, Long HX, Wang ZT, Bilal A, Yang LM. Advancing single-cell RNA-seq data analysis through the fusion of multi-layer perceptron and graph neural network. Brief Bioinform 2023; 25:bbad481. [PMID: 38171931 PMCID: PMC10764207 DOI: 10.1093/bib/bbad481] [Received: 09/25/2023] [Revised: 11/18/2023] [Accepted: 12/03/2023] [Indexed: 01/05/2024]
Abstract
Advances in single-cell sequencing technology have made it easier to conduct biological studies at the cellular level. Nevertheless, single-cell RNA sequencing (scRNA-seq) data present several obstacles owing to their considerable heterogeneity, sparsity and complexity. Although many machine-learning models have been devised to tackle these difficulties, their efficiency and accuracy still need to be improved. Current deep learning methods often fail to fully exploit the intrinsic interconnections within cells, leading to unsatisfactory results. To address these obstacles, we propose a novel approach for analyzing scRNA-seq data called scMPN. This method integrates a multi-layer perceptron and a graph neural network, including an attention network, to perform gene imputation and cell clustering. To evaluate the gene imputation performance of scMPN and compare it with existing approaches, we use metrics such as cosine similarity, median L1 distance and root mean square error. To assess cell clustering across approaches, we use criteria such as adjusted mutual information, normalized mutual information and integrity score. Experimental results on four datasets with gold-standard cell labels show that scMPN outperforms current single-cell data processing techniques in both cell clustering and gene imputation, demonstrating the efficacy of the proposed deep learning method for interpreting scRNA-seq data.
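The imputation and clustering metrics named in this abstract have standard definitions, although the exact evaluation protocol (for example, which matrix entries are held out) is not given here. The Python sketch below assumes the imputation metrics are computed on paired vectors of true and imputed values at held-out entries, treats "median L1 distance" as the median absolute difference, and normalizes NMI by the arithmetic mean of the two label entropies (the scikit-learn default, one of several conventions). Adjusted mutual information and the integrity score are not shown, and all function names are illustrative.

```python
import math
import statistics
from collections import Counter

def cosine_similarity(true_vals, imputed_vals):
    """Cosine of the angle between the true and imputed value vectors."""
    dot = sum(t * p for t, p in zip(true_vals, imputed_vals))
    norm_true = math.sqrt(sum(t * t for t in true_vals))
    norm_imp = math.sqrt(sum(p * p for p in imputed_vals))
    return dot / (norm_true * norm_imp)

def median_l1_distance(true_vals, imputed_vals):
    """Median absolute difference between paired true and imputed values."""
    return statistics.median([abs(t - p) for t, p in zip(true_vals, imputed_vals)])

def rmse(true_vals, imputed_vals):
    """Root mean square error over paired values."""
    squared = [(t - p) ** 2 for t, p in zip(true_vals, imputed_vals)]
    return math.sqrt(sum(squared) / len(squared))

def _entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization: 2 * I(U; V) / (H(U) + H(V))."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true, count_pred = Counter(labels_true), Counter(labels_pred)
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[u] * count_pred[v]))
        for (u, v), c in joint.items()
    )
    denom = _entropy(labels_true) + _entropy(labels_pred)
    # Two single-cluster labellings are treated as identical partitions.
    return 1.0 if denom == 0 else 2.0 * mutual_info / denom
```

NMI is invariant to how cluster IDs are named, so `normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0 even though no label matches literally.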
Affiliation(s)
- Xiang Feng
  - Department of Information Science Technology, Hainan Normal University, 99 Longkun Road, Haikou, Hainan 571158, China
- Yu-Han Xiu
  - Department of Information Science Technology, Hainan Normal University, 99 Longkun Road, Haikou, Hainan 571158, China
- Hai-Xia Long
  - Department of Information Science Technology, Hainan Normal University, 99 Longkun Road, Haikou, Hainan 571158, China
- Zi-Tong Wang
  - Department of Pathophysiology, School of Basic Medical Sciences, Harbin Medical University, Harbin 150081, China
- Anas Bilal
  - Department of Information Science Technology, Hainan Normal University, 99 Longkun Road, Haikou, Hainan 571158, China
- Li-Ming Yang
  - Department of Pathophysiology, School of Basic Medical Sciences, Harbin Medical University, Harbin 150081, China

4
Yu Z, Su Y, Lu Y, Yang Y, Wang F, Zhang S, Chang Y, Wong KC, Li X. Topological identification and interpretation for single-cell gene regulation elucidation across multiple platforms using scMGCA. Nat Commun 2023; 14:400. [PMID: 36697410 PMCID: PMC9877026 DOI: 10.1038/s41467-023-36134-7] [Received: 04/21/2022] [Accepted: 01/16/2023] [Indexed: 01/26/2023]
Abstract
Single-cell RNA sequencing provides high-throughput gene expression information to explore cellular heterogeneity at the individual cell level. A major challenge in characterizing high-throughput gene expression data arises from its high dimensionality and the prevalence of dropout events. To address these concerns, we develop a deep graph learning method, scMGCA, for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representation and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanism. We demonstrate that in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully provides annotations on the specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signalling pathways.
Affiliation(s)
- Zhuohan Yu
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yanchi Su
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yifu Lu
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yuning Yang
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Shixiong Zhang
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Yi Chang
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Jilin, China

5
Single-Cell RNAseq Complexity Reduction. Methods Mol Biol 2022; 2584:217-230. [PMID: 36495452 DOI: 10.1007/978-1-0716-2756-3_10] [Indexed: 12/13/2022]
Abstract
An important step in single-cell RNAseq data analysis is the preparation of the single cell transcription data for cell sub-population partitioning. In this chapter, we describe how to perform complexity reduction for 3' end single-cell RNAseq transcriptomics data.

6
Abstract
rCASC is a modular workflow providing an integrated environment for single-cell RNA-seq (scRNA-Seq) data analysis, exploiting Docker containers to achieve functional and computational reproducibility. It was initially developed as an R package that can also be used through a Java GUI. However, the Java frontend cannot be used when running rCASC on a remote server, a typical setup given the substantial computational resources commonly needed to analyze scRNA-Seq data. To allow the use of rCASC through a graphical user interface on the client side and to harness the many advantages provided by the Galaxy platform, we have made rCASC available as a set of Galaxy tools, and we also provide a dedicated public instance of Galaxy named "Galaxy-rCASC." To integrate rCASC into Galaxy, all its functions, originally implemented as a set of Docker containers to maximize reproducibility, have been extensively reworked to become independent of the R package functions that launch them in the original implementation. Furthermore, suitable Galaxy wrappers have been developed for most rCASC functions. We provide a detailed reference document for the use of Galaxy-rCASC, with insights and explanations on the platform's functionalities, parameters, and output, while guiding the reader through the typical rCASC analysis workflow for an scRNA-Seq dataset.

7
Danielski K. Guidance on Processing the 10x Genomics Single Cell Gene Expression Assay. Methods Mol Biol 2022; 2584:1-28. [PMID: 36495443 DOI: 10.1007/978-1-0716-2756-3_1] [Indexed: 12/13/2022]
Abstract
The demand for technologies that allow the study of gene expression at single-cell resolution continues to increase. One such assay was launched in 2016 by the US-based company 10x Genomics Inc. Harnessing single-cell analysis on a large scale (Zheng et al. Nat Commun 8:14049, 2017), capturing thousands of cells at once, has shaped the life sciences ever since and has allowed researchers to gain new insights in their respective fields of study, such as oncology, neurobiology, and immunology (among others). Obtaining high-quality data is key to making these meaningful discoveries, and data quality is directly linked to the quality of the initial cell (or nuclei) suspension used to load the 10x Genomics Chromium Single Cell Gene Expression assay. A successful workflow relies on a cell suspension that is fully dissociated, extremely clean, and of high viability. While the workflow itself has been detailed elsewhere (De Simone et al. Methods Mol Biol 1979:87-110, 2019), in this chapter we focus on the importance of the quality of the initial cell suspension, as well as common mistakes that can occur while running a Single Cell Gene Expression assay. These tips and tricks refer to the current version of the 10x Genomics User Guide (Chromium Single Cell 3' Reagent Kits User Guide (v3.1 Chemistry Dual Index). https://support.10xgenomics.com/single-cell-gene-expression/index/doc/user-guide-chromium-single-cell-3-reagent-kits-user-guide-v31-chemistry-dual-index), which can be downloaded from the Support section of the 10x Genomics website (10x Genomics website. https://www.10xgenomics.com). These documents and user guides are continuously improved and updated; hence, it is important to regularly check the company's website for the most recent version.

8
Alessandri L, Calogero RA. Functional-Feature-Based Data Reduction Using Sparsely Connected Autoencoders. Methods Mol Biol 2022; 2584:231-240. [PMID: 36495453 DOI: 10.1007/978-1-0716-2756-3_11] [Indexed: 12/13/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) allows the creation of large collections of individual-cell transcriptomes. Unsupervised clustering is an essential element in the analysis of these data, and it represents the initial step in identifying different cell types to investigate the cell subpopulation structure of a biological sample. However, because scRNA-seq data are characterized by high noise, the features driving cluster aggregation may not perfectly match the underlying biology. In this chapter, we describe a functional-feature-driven data reduction approach, which could provide a better link between cell clusters and their underlying cell biology.
Affiliation(s)
- Luca Alessandri
  - Molecular Biotechnology Center, University of Torino, Turin, Italy

9
Olivero M, Calogero RA. Single-Cell RNAseq Data QC and Preprocessing. Methods Mol Biol 2022; 2584:205-215. [PMID: 36495451 DOI: 10.1007/978-1-0716-2756-3_9] [Indexed: 12/13/2022]
Abstract
The first step in single-cell RNAseq data analysis is the evaluation of the overall quality of the cell transcriptome and the preparation of the single-cell transcription data for clustering. In this chapter, we describe one of the possible approaches to perform single-cell data preprocessing for 3' end single-cell RNAseq transcriptomics data.
Affiliation(s)
- Martina Olivero
  - Department of Oncology, University of Torino, Torino, Italy
  - Candiolo Cancer Institute-FPO, IRCCS, Candiolo, TO, Italy

10
Beccuti M, Calogero RA. Single-Cell RNAseq Clustering. Methods Mol Biol 2022; 2584:241-250. [PMID: 36495454 DOI: 10.1007/978-1-0716-2756-3_12] [Indexed: 12/13/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) allows the creation of large collections of individual-cell transcriptomes. Unsupervised clustering is an essential element in the analysis of these data, and it represents the initial step in identifying different cell types to investigate the cell subpopulation organization of a sample. In this chapter, we describe how to approach the clustering of single-cell RNAseq transcriptomics data using various clustering tools, and we provide some information on the limitations affecting the clustering procedure.
Affiliation(s)
- Marco Beccuti
  - Department of Computer Science, University of Torino, Turin, Italy

11
Antico F, Gai M, Arigoni M. Tissue RNA Integrity in Visium Spatial Protocol (Fresh Frozen Samples). Methods Mol Biol 2022; 2584:191-203. [PMID: 36495450 DOI: 10.1007/978-1-0716-2756-3_8] [Indexed: 12/13/2022]
Abstract
The transcriptome of a tissue can be acquired both by single-cell RNAseq (scRNA-seq) and by spatial transcriptomics (ST). The dissociation step, which is mandatory in scRNA-seq methods, might lead to the loss of fragile cells and of spatial information, thus limiting the acquisition of the tissue's cellular organization. Spatial transcriptomics methods mitigate these issues and provide single-cell transcript detection over an intact fresh frozen tissue section. The Visium platform, commercialized by 10x Genomics, provides whole-transcriptome spatial transcriptomics and does not require dedicated instruments other than those available in any pathology laboratory. In spatial transcriptomics, proper tissue handling is mandatory to preserve the morphological quality of the tissue sections and the integrity of mRNA transcripts, and it is critical for downstream library preparation and sequencing performance. In this chapter, we describe the most critical steps of the Visium protocol on fresh frozen tissues and provide guidance on interpreting the data obtained from the quality control analysis recommended during the workflow.
Affiliation(s)
- Federica Antico
  - Molecular Biotechnology Center, University of Torino, Torino, Italy
- Marta Gai
  - Molecular Biotechnology Center, University of Torino, Torino, Italy
- Maddalena Arigoni
  - Molecular Biotechnology Center, University of Torino, Torino, Italy

12
Abstract
The idea behind novel single-cell RNA sequencing (scRNA-seq) pipelines is to isolate single cells through microfluidic approaches and generate sequencing libraries in which the transcripts are tagged to track their cell of origin. Modern scRNA-seq platforms can analyze many thousands of cells in each run. Combined with massive high-throughput sequencing producing billions of reads, scRNA-seq thus allows the assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. In this chapter, we describe how the cell subpopulation discovery algorithms integrated into rCASC could be efficiently executed on cloud-HPC infrastructure. To achieve this, we focus on the StreamFlow framework, which provides container-native runtime support for scientific workflows in cloud/HPC environments.

13
Identifying Gene Markers Associated with Cell Subpopulations. Methods Mol Biol 2022; 2584:251-268. [PMID: 36495455 DOI: 10.1007/978-1-0716-2756-3_13] [Indexed: 12/13/2022]
Abstract
An important point in the analysis of a single-cell RNA experiment is the identification of the key elements, i.e., genes, characterizing each cell subpopulation cluster. In this chapter, we describe the use of a sparsely connected autoencoder as a tool to convert single-cell clusters into pseudo-RNAseq experiments to be used as input for differential expression analysis, and the use of COMET as a tool to depict cluster-specific gene markers.

14
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Affiliation(s)
- Michelle M Li
  - Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kexin Huang
  - Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA

15
Tripathi S, Purchase D, Govarthanan M, Chandra R, Yadav S. Regulatory and innovative mechanisms of bacterial quorum sensing-mediated pathogenicity: a review. Environmental Monitoring and Assessment 2022; 195:75. [PMID: 36334179 DOI: 10.1007/s10661-022-10564-0] [Received: 11/12/2021] [Accepted: 01/29/2022] [Indexed: 06/16/2023]
Abstract
Quorum sensing (QS) is a system by which bacterial cells communicate with each other; it is linked to cell density in the microbiome. A high-density colony population can provide enough small-molecule signals to enable a range of cellular activities, gene expression, pathogenicity, and antibiotic resistance that cause damage to the hosts. QS underlies chronic illnesses in humans through microbial sporulation, expression of virulence factors, biofilm formation, secretion of enzymes, or production of membrane vesicles. The transfer of antimicrobial resistance genes (ARGs) among antibiotic-resistant bacteria is a major public health concern, and QS-mediated biofilm is a hub for horizontal transfer of ARGs. To develop innovative approaches to prevent microbial pathogenesis, it is essential to understand the role of QS, especially in response to environmental stressors such as exposure to antibiotics. This review provides the latest knowledge on the relationship between QS and pathogenicity and explores novel approaches to control QS via quorum quenching (QQ) using QS inhibitors (QSIs) and QQ enzymes. State-of-the-art knowledge on the role of QS and the potential of QQ will help to overcome the threats of rapidly emerging bacterial pathogenesis.
Affiliation(s)
- Sonam Tripathi
  - Department of Environmental Microbiology, School for Environmental Sciences, Babasaheb Bhimrao Ambedkar University (A Central University), Vidya Vihar, Raebareli Road, Lucknow, 226025, UP, India
- Diane Purchase
  - Department of Natural Sciences, Faculty of Science and Technology, Middlesex University, The Burroughs, Hendon, London, NW4 4BT, UK
- Muthusamy Govarthanan
  - Department of Environmental Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu, 41566, South Korea
- Ram Chandra
  - Department of Environmental Microbiology, School for Environmental Sciences, Babasaheb Bhimrao Ambedkar University (A Central University), Vidya Vihar, Raebareli Road, Lucknow, 226025, UP, India
- Sangeeta Yadav
  - Department of Environmental Microbiology, School for Environmental Sciences, Babasaheb Bhimrao Ambedkar University (A Central University), Vidya Vihar, Raebareli Road, Lucknow, 226025, UP, India
  - Department of Botany, Vaishno Devi Prashikshan Mahavidyalaya, Gondahi, Kunda, Pratapgarh, India

16
Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review. Genomics, Proteomics & Bioinformatics 2022; 20:814-835. [PMID: 36528240 PMCID: PMC10025684 DOI: 10.1016/j.gpb.2022.11.011] [Received: 01/23/2022] [Revised: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during the development of complex organisms, and improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance in artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has the capacity to extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims to survey recently developed deep learning techniques in scRNA-seq data analysis, identify key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explain the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges faced by current deep learning approaches applied to scRNA-seq data and discuss potential directions for improving deep learning algorithms for scRNA-seq data analysis.
Affiliation(s)
- Matthew Brendel
  - Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
  - Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Chang Su
  - Department of Health Service Administration and Policy, Temple University, Philadelphia, PA 19122, USA
- Zilong Bai
  - Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Hao Zhang
  - Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Olivier Elemento
  - Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA

17
scCAN: single-cell clustering using autoencoder and network fusion. Sci Rep 2022; 12:10267. [PMID: 35715568 PMCID: PMC9206025 DOI: 10.1038/s41598-022-14218-6] [Received: 11/09/2021] [Accepted: 06/02/2022] [Indexed: 11/30/2022]
Abstract
Unsupervised clustering of single-cell RNA sequencing (scRNA-seq) data is important because it allows us to identify putative cell types. However, the large number of cells (up to millions), the high dimensionality of the data (tens of thousands of genes), and the high dropout rates all present substantial challenges in single-cell analysis. Here we introduce a new method, named single-cell Clustering using Autoencoder and Network fusion (scCAN), that can overcome these challenges to accurately segregate different cell types in large and sparse scRNA-seq data. In an extensive analysis using 28 real scRNA-seq datasets (more than three million cells) and 243 simulated datasets, we show that scCAN: (1) correctly estimates the number of true cell types, (2) accurately segregates cells of different types, (3) is robust against dropouts, and (4) is fast and memory efficient. We also compare scCAN with CIDR, SEURAT3, Monocle3, SHARP, and SCANPY; scCAN outperforms these state-of-the-art methods in terms of both accuracy and scalability. The scCAN package is available at https://cran.r-project.org/package=scCAN. Data and R scripts are available at http://sccan.tinnguyen-lab.com/
18
Abondio P, De Intinis C, da Silva Gonçalves Vianez Júnior JL, Pace L. Single cell multiomic approaches to disentangle T cell heterogeneity. Immunol Lett 2022; 246:37-51. [DOI: 10.1016/j.imlet.2022.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Revised: 04/16/2022] [Accepted: 04/26/2022] [Indexed: 11/29/2022]
19
A novel graph mining approach to predict and evaluate food-drug interactions. Sci Rep 2022; 12:1061. [PMID: 35058561 PMCID: PMC8776972 DOI: 10.1038/s41598-022-05132-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/09/2021] [Accepted: 01/05/2022] [Indexed: 12/26/2022]
Abstract
Food-drug interactions (FDIs) arise when dietary consumption regulates biochemical mechanisms involved in drug metabolism. This study proposes FDMine, a novel systematic framework that models the FDI problem as a homogeneous graph. Our dataset consists of 788 unique approved small-molecule drugs with metabolism-related drug-drug interactions and 320 unique food items composed of 563 unique compounds. The potential number of interactions is 87,192 and 92,143 for the disjoint and joint versions of the graph, respectively. We defined several similarity subnetworks: food-drug, drug-drug, and food-food similarity networks. A distinctive part of the graph encodes the food composition as a set of nodes and calculates a content contribution score. To predict new FDIs, we considered several link prediction algorithms and various performance metrics, including precision@top (top 1%, 2%, and 5%) of the newly predicted links. The shortest path-based method achieved a precision of 84%, 60%, and 40% for the top 1%, 2%, and 5% of identified FDIs, respectively. We validated the top FDIs predicted by FDMine to demonstrate its applicability, and we relate the predicted FDIs to therapeutic anti-inflammatory effects of food items. FDMine is publicly available to support clinicians and researchers.
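The shortest-path scoring and precision@top evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not FDMine's code: it assumes an unweighted graph stored as an adjacency dictionary, scores each candidate food-drug pair by the inverse of its shortest-path length (an assumption about what a "shortest path-based" score looks like), and measures precision over the top fraction of ranked predictions. All function names and node names are hypothetical.

```python
from collections import deque
from math import ceil


def bfs_distances(adj, source):
    """Return unweighted shortest-path lengths from `source` via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def shortest_path_scores(adj, candidate_pairs):
    """Score each candidate pair as 1 / d(u, v); unreachable (or identical) pairs score 0."""
    scores = {}
    for u, v in candidate_pairs:
        d = bfs_distances(adj, u).get(v)
        scores[(u, v)] = 1.0 / d if d else 0.0
    return scores


def precision_at_top(scores, true_links, fraction):
    """Share of true links among the top `fraction` of pairs ranked by score."""
    k = max(1, ceil(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for pair in ranked if pair in true_links)
    return hits / k


# Hypothetical toy graph of known food-drug and food-food links (undirected).
adj = {
    "drug1": ["food1"],
    "food1": ["drug1", "drug2"],
    "drug2": ["food1", "food2"],
    "food2": ["drug2", "food3"],
    "food3": ["food2"],
}
candidates = [("drug1", "food2"), ("drug2", "food3"), ("drug1", "food3")]
scores = shortest_path_scores(adj, candidates)
true_links = {("drug2", "food3")}
print(precision_at_top(scores, true_links, 0.3))  # → 1.0 (top 1 of 3 pairs)
print(precision_at_top(scores, true_links, 0.5))  # → 0.5 (top 2 of 3 pairs)
```

Python's `sorted(..., reverse=True)` keeps tied pairs in input order, so a real evaluation would need an explicit tie-breaking rule.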
20
Wang SH, Satapathy SC, Zhou Q, Zhang X, Zhang YD. Secondary Pulmonary Tuberculosis Identification Via pseudo-Zernike Moment and Deep Stacked Sparse Autoencoder. Journal of Grid Computing 2021; 20:1. [PMID: 34931118 PMCID: PMC8674408 DOI: 10.1007/s10723-021-09596-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/20/2021] [Accepted: 11/28/2021] [Indexed: 05/26/2023]
Abstract
Secondary pulmonary tuberculosis (SPT) is one of the top ten causes of death from a single infectious agent. To recognize SPT more accurately, this paper proposes a novel artificial intelligence model that uses the pseudo-Zernike moment (PZM) as the feature extractor and a deep stacked sparse autoencoder (DSSAE) as the classifier. In addition, 18-way data augmentation is employed to reduce overfitting. The model is abbreviated as PZM-DSSAE. Over ten runs of 10-fold cross-validation, the model achieves a sensitivity of 93.33% ± 1.47%, a specificity of 93.13% ± 0.95%, a precision of 93.15% ± 0.89%, an accuracy of 93.23% ± 0.81%, and an F1 score of 93.23% ± 0.83%. The area under the curve (AUC) reaches 0.9739. PZM-DSSAE outperforms five state-of-the-art approaches.
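The sensitivity, specificity, precision, accuracy, and F1 values reported above are standard confusion-matrix metrics. The sketch below shows how they are conventionally computed, plus a mean ± standard deviation summary over repeated cross-validation runs. It assumes the ± values are sample standard deviations across runs, which the abstract does not state; the counts and function names are hypothetical and not taken from the paper.

```python
from statistics import mean, stdev


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def summarize(run_values):
    """Mean and sample standard deviation of one metric across repeated runs."""
    return mean(run_values), stdev(run_values)


# Hypothetical counts pooled over the folds of one cross-validation run.
m = binary_metrics(tp=90, fp=20, tn=80, fn=10)
print({name: round(value, 4) for name, value in m.items()})
```

Calling `summarize` on the per-run values of each metric (one value per cross-validation run) yields a "mean ± SD" pair in the style of the abstract.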
Affiliation(s)
- Shui-Hua Wang
- School of Mathematics and Actuarial Science, University of Leicester, Leicester, LE1 7RH UK
- Qinghua Zhou
- School of Informatics, University of Leicester, Leicester, LE1 7RH UK
- Xin Zhang
- Department of Medical Imaging, The Fourth People’s Hospital of Huai’an, Huai’an, 223002 Jiangsu Province China
- Yu-Dong Zhang
- School of Informatics, University of Leicester, Leicester, LE1 7RH UK
21
Bao S, Li K, Yan C, Zhang Z, Qu J, Zhou M. Deep learning-based advances and applications for single-cell RNA-sequencing data analysis. Brief Bioinform 2021; 23:6444320. [PMID: 34849562 DOI: 10.1093/bib/bbab473] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/06/2021] [Revised: 09/24/2021] [Accepted: 10/15/2021] [Indexed: 11/14/2022]
Abstract
The rapid development of single-cell RNA-sequencing (scRNA-seq) technology has raised significant computational and analytical challenges. The application of deep learning to scRNA-seq data analysis is evolving rapidly and can address the unique challenges in upstream (quality control and normalization) and downstream (cell-, gene- and pathway-level) analysis of scRNA-seq data. This study summarizes recent advances and applications of deep learning-based methods, together with specific tools for scRNA-seq data analysis, and examines the future perspectives and challenges of deep-learning techniques for the appropriate analysis and interpretation of scRNA-seq data. It aims to provide evidence supporting the biomedical application of deep learning-based tools and may help biologists and bioinformaticians navigate this exciting and fast-moving area.
Affiliation(s)
- Siqi Bao
- School of Information and Communication Engineering, Hainan University, Haikou 570228, P. R. China; School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China; Hainan Institute of Real World Data, Haikou 570228, P. R. China
- Ke Li
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Congcong Yan
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Zicheng Zhang
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jia Qu
- School of Information and Communication Engineering, Hainan University, Haikou 570228, P. R. China; School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China; Hainan Institute of Real World Data, Haikou 570228, P. R. China
- Meng Zhou
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
22
Sparsely Connected Autoencoders: A Multi-Purpose Tool for Single Cell omics Analysis. Int J Mol Sci 2021; 22:ijms222312755. [PMID: 34884559 PMCID: PMC8657975 DOI: 10.3390/ijms222312755] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/27/2021] [Revised: 11/12/2021] [Accepted: 11/23/2021] [Indexed: 02/02/2023]
Abstract
Background: Biological processes are based on complex networks of cells and molecules. Single-cell multi-omics is a new tool that aims to provide new insights into the complex network of events controlling cell function. Methods: Because single-cell technologies provide many sample measurements, they are an ideal setting for deep learning and machine learning approaches. An autoencoder is composed of an encoder and a decoder sub-model and is a very powerful tool for data compression and noise removal. However, the decoder model remains a black box from which it is impossible to determine the contribution of individual input elements. We have recently developed a new class of autoencoders, called Sparsely Connected Autoencoders (SCA), which provide a controlled association between the input layer and the decoder module. With this architecture, the decoder model is no longer a black box and can be used to reveal new biologically interesting features in single-cell data. Results: Here, we show that the SCA hidden layer can capture information usually hidden in single-cell data, for example by enabling clustering on meta-features that are difficult to depict in single-cell RNA-seq data (e.g., transcription factor expression) or technically impossible to depict there (e.g., miRNA expression). Furthermore, the SCA representation of cell clusters can simulate a conventional bulk RNA-seq experiment, a data transformation that allows similarities among independent experiments to be identified. Conclusions: In our opinion, SCA represents the bioinformatics version of a universal “Swiss Army knife” for extracting hidden, informative features from single-cell omics data.
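The abstract describes the sparse connections only at a high level. As a minimal sketch, assuming the sparsity is imposed by a binary prior-knowledge mask (for example, transcription-factor-to-target-gene lists) that zeroes out disallowed input-to-hidden weights, a sparsely connected layer could look like the code below. The mask, weights, and function names are hypothetical; this is not the SCA authors' implementation.

```python
def masked_encode(x, weights, mask):
    """Encode one cell through a sparsely connected ReLU layer.

    x:       expression values, one per input gene
    weights: genes x hidden-units weight matrix (list of lists)
    mask:    genes x hidden-units 0/1 matrix; 0 removes that connection
    """
    n_units = len(mask[0])
    hidden = []
    for j in range(n_units):
        # Only genes with mask[i][j] == 1 contribute to hidden unit j.
        z = sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
        hidden.append(max(0.0, z))  # ReLU activation
    return hidden


def dense_decode(hidden, dec_weights):
    """Fully connected linear decoder mapping hidden units back to gene space."""
    n_genes = len(dec_weights[0])
    return [sum(h * dec_weights[j][g] for j, h in enumerate(hidden))
            for g in range(n_genes)]


# Hypothetical prior knowledge: genes 0 and 1 are targets of factor A,
# gene 2 is a target of factor B.
mask = [[1, 0],
        [1, 0],
        [0, 1]]
weights = [[0.5, 0.5] for _ in range(3)]

hidden = masked_encode([1.0, 2.0, 3.0], weights, mask)
perturbed = masked_encode([1.0, 2.0, 10.0], weights, mask)
print(hidden, perturbed)  # → [1.5, 1.5] [1.5, 5.0]

reconstruction = dense_decode(hidden, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
```

Because the weight linking gene 2 to the first hidden unit is masked out, perturbing gene 2 leaves that unit unchanged, which is what lets each hidden unit be read as the activity of one predefined gene set.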
23
Tangaro MA, Mandreoli P, Chiara M, Donvito G, Antonacci M, Parisi A, Bianco A, Romano A, Bianchi DM, Cangelosi D, Uva P, Molineris I, Nosi V, Calogero RA, Alessandri L, Pedrini E, Mordenti M, Bonetti E, Sangiorgi L, Pesole G, Zambelli F. Laniakea@ReCaS: exploring the potential of customisable Galaxy on-demand instances as a cloud-based service. BMC Bioinformatics 2021; 22:544. [PMID: 34749633 PMCID: PMC8574934 DOI: 10.1186/s12859-021-04401-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/12/2021] [Accepted: 09/24/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND Improving the availability and usability of data and analytical tools is a critical precondition for further advancing modern biological and biomedical research. For instance, one of the many ramifications of the COVID-19 global pandemic has been to highlight even further the importance of making bioinformatics tools and data readily actionable by researchers through convenient access points supported by adequate IT infrastructures. One of the most successful efforts to improve the availability and usability of bioinformatics tools and data is the Galaxy workflow manager and its thriving community. In 2020 we introduced Laniakea, a software platform conceived to streamline the configuration and deployment of "on-demand" Galaxy instances over the cloud. By facilitating the set-up and configuration of Galaxy web servers, Laniakea provides researchers with a powerful and highly customisable platform for executing complex bioinformatics analyses. The system can be accessed through a dedicated and user-friendly web interface that handles the Galaxy web server's initial configuration and deployment.
RESULTS "Laniakea@ReCaS", the first instance of a Laniakea-based service, is managed by ELIXIR-IT and was officially launched in February 2020, after about one year of development and testing that involved several users. Researchers can request access to Laniakea@ReCaS through an open-ended call for use-cases. Ten project proposals have been accepted since then, totalling 18 Galaxy on-demand virtual servers that employ ~ 100 CPUs, ~ 250 GB of RAM and ~ 5 TB of storage and serve several different communities and purposes. Herein, we present eight use cases demonstrating the versatility of the platform.
CONCLUSIONS During this first year of activity, the Laniakea-based service emerged as a flexible platform that facilitated the rapid development of bioinformatics tools, the efficient delivery of training activities, and the provision of public bioinformatics services in different settings, including food safety and clinical research. Laniakea@ReCaS provides a proof of concept of how enabling access to appropriate, reliable IT resources and ready-to-use bioinformatics tools can considerably streamline researchers' work.
Affiliation(s)
- Marco Antonio Tangaro
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
- National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
- Pietro Mandreoli
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
- Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy
- Matteo Chiara
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
- Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy
- Giacinto Donvito
- National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
- Marica Antonacci
- National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
- Antonio Parisi
- Istituto Zooprofilattico Sperimentale Della Puglia e Della Basilicata, Via Manfredonia 20, 71121, Foggia, Italy
- Angelica Bianco
- Istituto Zooprofilattico Sperimentale Della Puglia e Della Basilicata, Via Manfredonia 20, 71121, Foggia, Italy
- Angelo Romano
- National Reference Laboratory for Coagulase-Positive Staphylococci Including Staphylococcus Aureus, Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Daniela Manila Bianchi
- National Reference Laboratory for Coagulase-Positive Staphylococci Including Staphylococcus Aureus, Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Davide Cangelosi
- Clinical Bioinformatics Unit, Scientific Direction, IRCCS Istituto Giannina Gaslini, Via Gerolamo Gaslini 5, 16147, Genova, Italy
- Paolo Uva
- Clinical Bioinformatics Unit, Scientific Direction, IRCCS Istituto Giannina Gaslini, Via Gerolamo Gaslini 5, 16147, Genova, Italy
- Italian Institute of Technology, Via Morego 30, 16163, Genova, Italy
- Ivan Molineris
- Department of Life Science and System Biology, University of Turin, Via Accademia Albertina, 13-1023, Turin, Italy
- Vladimir Nosi
- Department of Computer Science, University of Turin, Via Pessinetto 12, 10049, Turin, Italy
- Raffaele A Calogero
- Department of Molecular Biotechnology and Health Sciences, Via Nizza 52, 10126, Turin, Italy
- Luca Alessandri
- Department of Molecular Biotechnology and Health Sciences, Via Nizza 52, 10126, Turin, Italy
- Elena Pedrini
- Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
- Marina Mordenti
- Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
- Emanuele Bonetti
- Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
- Department of Experimental Oncology, European Institute of Oncology, Via Adamello 16, 20139, Milan, Italy
- Luca Sangiorgi
- Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
- Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy.
- Department of Biosciences, Biotechnologies and Biopharmaceutics, University of Bari, Via Orabona 4, 70126, Bari, Italy.
- Federico Zambelli
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy.
- Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy.
24
Dannhauser D, Rossi D, Palatucci AT, Rubino V, Carriero F, Ruggiero G, Ripaldi M, Toriello M, Maisto G, Netti PA, Terrazzano G, Causa F. Non-invasive and label-free identification of human natural killer cell subclasses by biophysical single-cell features in microfluidic flow. Lab on a Chip 2021; 21:4144-4154. [PMID: 34515262 DOI: 10.1039/d1lc00651g] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/13/2023]
Abstract
Natural killer (NK) cells are considered promising candidates for innovative therapeutic treatments and are divided into two subclasses: immature regulatory CD56bright NK cells and mature cytotoxic CD56dim NK cells. Because CD56dim cells are more cytotoxic, the ability to discriminate CD56dim from CD56bright cells could be very useful. Currently, NK cell classification is routinely performed by cytometric analysis based on surface receptor expression. Here, we present an in-flow, label-free and non-invasive biophysical analysis of NK cells that combines light scattering and machine learning (ML) for NK cell subclass classification. To identify relevant biophysical cell features, we stimulated NK cells with interleukin-15, inducing a subclass transition from CD56bright to CD56dim. We trained our ML algorithm on sorted NK cell subclasses (≥86% accuracy). Next, we applied our NK cell classification algorithm to cells stimulated over time to investigate the transition from CD56bright to CD56dim and the accompanying changes in biophysical features. Finally, we tested our approach on several proband samples, highlighting the potential of our measurement approach. We demonstrate a label-free way to robustly identify NK cell subclasses based on biophysical features, which can be applied in both cell biology and cell therapy.
Affiliation(s)
- David Dannhauser
- Interdisciplinary Research Centre on Biomaterials (CRIB) and Dipartimento di Ingegneria Chimica, dei Materiali e della Produzione Industriale, Università degli Studi di Napoli "Federico II", Piazzale Tecchio 80, 80125 Naples, Italy.
- Domenico Rossi
- Center for Advanced Biomaterials for Healthcare@CRIB, Istituto Italiano di Tecnologia, Largo Barsanti e Matteucci 53, 80125 Naples, Italy
- Anna Teresa Palatucci
- Dipartimento di Scienze (DiS), Università della Basilicata, Via dell'Ateneo Lucano 10, 85100 Potenza, Italy
- Valentina Rubino
- Dipartimento di Scienze Mediche Traslazionali, Università degli Studi di Napoli "Federico II", Naples, Italy
- Flavia Carriero
- Dipartimento di Scienze Mediche Traslazionali, Università degli Studi di Napoli "Federico II", Naples, Italy
- Giuseppina Ruggiero
- Dipartimento di Scienze Mediche Traslazionali, Università degli Studi di Napoli "Federico II", Naples, Italy
- Mimmo Ripaldi
- Dipartimento Oncologia AORN Santobono Pausilipon Hospital, Via Posillipo, 226, 80123, Naples, Italy
- Mario Toriello
- Dipartimento Oncologia AORN Santobono Pausilipon Hospital, Via Posillipo, 226, 80123, Naples, Italy
- Giovanna Maisto
- Dipartimento Oncologia AORN Santobono Pausilipon Hospital, Via Posillipo, 226, 80123, Naples, Italy
- Paolo Antonio Netti
- Interdisciplinary Research Centre on Biomaterials (CRIB) and Dipartimento di Ingegneria Chimica, dei Materiali e della Produzione Industriale, Università degli Studi di Napoli "Federico II", Piazzale Tecchio 80, 80125 Naples, Italy.
- Center for Advanced Biomaterials for Healthcare@CRIB, Istituto Italiano di Tecnologia, Largo Barsanti e Matteucci 53, 80125 Naples, Italy
- Giuseppe Terrazzano
- Dipartimento di Scienze (DiS), Università della Basilicata, Via dell'Ateneo Lucano 10, 85100 Potenza, Italy
- Filippo Causa
- Interdisciplinary Research Centre on Biomaterials (CRIB) and Dipartimento di Ingegneria Chimica, dei Materiali e della Produzione Industriale, Università degli Studi di Napoli "Federico II", Piazzale Tecchio 80, 80125 Naples, Italy.
25
Asada K, Takasawa K, Machino H, Takahashi S, Shinkai N, Bolatkan A, Kobayashi K, Komatsu M, Kaneko S, Okamoto K, Hamamoto R. Single-Cell Analysis Using Machine Learning Techniques and Its Application to Medical Research. Biomedicines 2021; 9:biomedicines9111513. [PMID: 34829742 PMCID: PMC8614827 DOI: 10.3390/biomedicines9111513] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/17/2021] [Revised: 10/06/2021] [Accepted: 10/19/2021] [Indexed: 01/14/2023]
Abstract
In recent years, the diversity of cancer cells in tumor tissues resulting from intratumor heterogeneity has attracted attention. In particular, the development of single-cell analysis technology has made a significant contribution to the field; technologies centered on single-cell RNA sequencing (scRNA-seq) have been reported to analyze cancer constituent cells, identify cell groups responsible for therapeutic resistance, and analyze gene signatures of resistant cell groups. However, although single-cell analysis is a powerful tool, various issues have been reported, including batch effects and transcriptional noise due to gene expression variation and mRNA degradation. To overcome these issues, machine learning techniques are being introduced into single-cell analysis, with promising results. Machine learning has also been used in other ways for single-cell analysis, such as single-cell assay for transposase-accessible chromatin sequencing (ATAC-seq), chromatin immunoprecipitation sequencing (ChIP-seq) analysis, and multi-omics analysis; it thus contributes to a deeper understanding of the characteristics of human diseases, especially cancer, and supports clinical applications. In this review, we present a comprehensive introduction to the implementation of machine learning techniques for single-cell analysis in medical research and discuss their usefulness and future potential.
Affiliation(s)
- Ken Asada
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Correspondence: (K.A.); (R.H.); Tel.: +81-3-3547-5271 (R.H.)
- Ken Takasawa
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Hidenori Machino
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Satoshi Takahashi
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Norio Shinkai
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8510, Japan
- Amina Bolatkan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Kazuma Kobayashi
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Masaaki Komatsu
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Syuzo Kaneko
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Koji Okamoto
- Division of Cancer Differentiation, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Ryuji Hamamoto
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8510, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Correspondence: (K.A.); (R.H.); Tel.: +81-3-3547-5271 (R.H.)
26
Pratella D, Ait-El-Mkadem Saadi S, Bannwarth S, Paquis-Fluckinger V, Bottini S. A Survey of Autoencoder Algorithms to Pave the Diagnosis of Rare Diseases. Int J Mol Sci 2021; 22:10891. [PMID: 34639231 PMCID: PMC8509321 DOI: 10.3390/ijms221910891] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/14/2021] [Revised: 10/04/2021] [Accepted: 10/07/2021] [Indexed: 12/28/2022]
Abstract
Rare diseases (RDs) comprise a broad range of disorders and can have various origins. For a long time, the scientific community was unaware of RDs. Impressive progress has been made for certain RDs; however, owing to a lack of sufficient knowledge, many patients remain undiagnosed. Today, advances in high-throughput sequencing technologies, such as whole-genome sequencing, single-cell sequencing, and others, have boosted the understanding of RDs. To extract biological meaning from the data generated by these methods, different analysis techniques have been proposed, including machine learning algorithms, which have recently proven valuable in the medical field. Among such approaches, unsupervised learning methods based on neural networks, including autoencoders (AEs) and variational autoencoders (VAEs), have shown promising performance on various types of data and in different contexts, from cancer to healthy patient tissues. In this review, we discuss how AEs and VAEs have been used in biomedical settings. Specifically, we discuss their current applications and the improvements they have achieved in patient diagnosis and survival. We focus on applications in the field of RDs and discuss how employing AEs and VAEs could enhance RD understanding and diagnosis.
Affiliation(s)
- David Pratella
- Center of Modeling, Simulation and Interactions, Université Côte d’Azur, 06200 Nice, France
- Samira Ait-El-Mkadem Saadi
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Sylvie Bannwarth
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Véronique Paquis-Fluckinger
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Silvia Bottini
- Center of Modeling, Simulation and Interactions, Université Côte d’Azur, 06200 Nice, France
27
Randhawa V, Kumar M. An integrated network analysis approach to identify potential key genes, transcription factors, and microRNAs regulating human hematopoietic stem cell aging. Mol Omics 2021; 17:967-984. [PMID: 34605522 DOI: 10.1039/d1mo00199j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
Hematopoietic stem cells (HSCs) undergo functional deterioration with increasing age that causes loss of their self-renewal and regenerative potential. Despite various efforts, molecular regulators of HSC aging have not been identified with significant success, a prime reason being the unavailability of appropriate human HSC samples. To demonstrate the scope of integrating and re-analyzing the available HSC transcriptomics data, we used existing tools and databases to build a sequential data analysis pipeline that predicts potential candidate genes, transcription factors, and microRNAs simultaneously. This sequential approach comprises (i) collecting matched young and aged mouse HSC sample datasets, (ii) identifying differentially expressed genes, (iii) identifying human homologs of differentially expressed genes, (iv) inferring gene co-expression network (GCN) modules, and (v) inferring the microRNA-transcription factor-gene regulatory network. Systems-level analyses of HSC interaction networks provided various insights, on the basis of which several candidates were predicted. For example, 16 HSC aging-related candidate genes were predicted (e.g., CD38, BRCA1, AGTR1, GSTM1, etc.) from GCN analysis. Following this, shortest path distance-based analyses of the regulatory network predicted several novel candidate miRNAs and TFs. Among these, miR-124-3p was a common regulator in candidate gene modules, while the TFs MYC and SP1 were identified as regulators of various candidate genes. Based on the regulatory interactions among candidate genes, TFs, and miRNAs, a potential regulation model of biological processes in each candidate module was predicted, which provided systems-level insights into the molecular complexity of each module in regulating HSC aging.
Affiliation(s)
- Vinay Randhawa
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific & Industrial Research, Chandigarh-160036, India.
- Manoj Kumar
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific & Industrial Research, Chandigarh-160036, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
28
MET Exon 14 Skipping: A Case Study for the Detection of Genetic Variants in Cancer Driver Genes by Deep Learning. Int J Mol Sci 2021; 22:ijms22084217. [PMID: 33921709 PMCID: PMC8072630 DOI: 10.3390/ijms22084217] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/25/2021] [Revised: 04/13/2021] [Accepted: 04/17/2021] [Indexed: 11/17/2022]
Abstract
BACKGROUND Disruption of alternative splicing (AS) is frequently observed in cancer and might represent an important signature for tumor progression and therapy. Exon skipping (ES) is one of the most frequent AS events, and in non-small cell lung cancer (NSCLC), MET exon 14 skipping has been shown to be targetable. METHODS We constructed neural networks (NN/CNN) specifically designed to detect MET exon 14 skipping events from RNA-seq data. In addition, for discovery purposes, we developed a sparsely connected autoencoder to identify uncharacterized MET isoforms. RESULTS The neural networks had a MET exon 14 skipping detection rate greater than 94% when tested on a manually curated set of 690 TCGA bronchus and lung samples. When the networks were applied globally to 2605 TCGA samples, we observed that most false positives were characterized by blurry coverage of exon 14; interestingly, they share a common coverage peak in the second intron, and we speculate that this event could be the transcription signature of a LINE1 (Long Interspersed Nuclear Element 1)-MET (Mesenchymal Epithelial Transition receptor tyrosine kinase) fusion. CONCLUSIONS Taken together, our results indicate that neural networks can be an effective tool for quickly classifying pathological transcription events, and sparsely connected autoencoders could form the basis of an effective discovery tool.