1
|
Cui Z, Wang SG, He Y, Chen ZH, Zhang QH. DeepTPpred: A Deep Learning Approach With Matrix Factorization for Predicting Therapeutic Peptides by Integrating Length Information. IEEE J Biomed Health Inform 2023; 27:4611-4622. [PMID: 37368803 DOI: 10.1109/jbhi.2023.3290014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/29/2023]
Abstract
The abuse of traditional antibiotics has led to increased resistance of bacteria and viruses. Efficient therapeutic peptide prediction is critical for peptide drug discovery. However, most of the existing methods only make effective predictions for one class of therapeutic peptides. It is worth noting that currently no predictive method considers sequence length information as a distinct feature of therapeutic peptides. In this article, a novel deep learning approach with matrix factorization for predicting therapeutic peptides (DeepTPpred) by integrating length information are proposed. The matrix factorization layer can learn the potential features of the encoded sequence through the mechanism of first compression and then restoration. And the length features of the sequence of therapeutic peptides are embedded with encoded amino acid sequences. To automatically learn therapeutic peptide predictions, these latent features are input into the neural networks with self-attention mechanism. On eight therapeutic peptide datasets, DeepTPpred achieved excellent prediction results. Based on these datasets, we first integrated eight datasets to obtain a full therapeutic peptide integration dataset. Then, we obtained two functional integration datasets based on the functional similarity of the peptides. Finally, we also conduct experiments on the latest versions of the ACP and CPP datasets. Overall, the experimental results show that our work is effective for the identification of therapeutic peptides.
Collapse
|
2
|
Ten FW, Yuan D, Jabareen N, Phua YJ, Eils R, Lukassen S, Conrad C. resVAE ensemble: Unsupervised identification of gene sets in multi-modal single-cell sequencing data using deep ensembles. Front Cell Dev Biol 2023; 11:1091047. [PMID: 36875765 PMCID: PMC9975353 DOI: 10.3389/fcell.2023.1091047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2022] [Accepted: 01/24/2023] [Indexed: 02/17/2023] Open
Abstract
Feature identification and manual inspection is currently still an integral part of biological data analysis in single-cell sequencing. Features such as expressed genes and open chromatin status are selectively studied in specific contexts, cell states or experimental conditions. While conventional analysis methods construct a relatively static view on gene candidates, artificial neural networks have been used to model their interactions after hierarchical gene regulatory networks. However, it is challenging to identify consistent features in this modeling process due to the inherently stochastic nature of these methods. Therefore, we propose using ensembles of autoencoders and subsequent rank aggregation to extract consensus features in a less biased manner. Here, we performed sequencing data analyses of different modalities either independently or simultaneously as well as with other analysis tools. Our resVAE ensemble method can successfully complement and find additional unbiased biological insights with minimal data processing or feature selection steps while giving a measurement of confidence, especially for models using stochastic or approximation algorithms. In addition, our method can also work with overlapping clustering identity assignment suitable for transitionary cell types or cell fates in comparison to most conventional tools.
Collapse
Affiliation(s)
- Foo Wei Ten
- Center for Digital Health, Berlin Institute of Health (BIH) at Charité-Universitatsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Dongsheng Yuan
- Center for Digital Health, Berlin Institute of Health (BIH) at Charité-Universitatsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany.,Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Berlin, Germany
| | - Nabil Jabareen
- Center for Digital Health, Berlin Institute of Health (BIH) at Charité-Universitatsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Yin Jun Phua
- Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
| | - Roland Eils
- Center for Digital Health, Berlin Institute of Health (BIH) at Charité-Universitatsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany.,Health Data Science Unit, Faculty of Medicine, University of Heidelberg, Heidelberg, Germany
| | - Sören Lukassen
- Center for Digital Health, Berlin Institute of Health (BIH) at Charité-Universitatsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Christian Conrad
- Center for Digital Health, Berlin Institute of Health (BIH) at Charité-Universitatsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany
| |
Collapse
|
3
|
Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:814-835. [PMID: 36528240 PMCID: PMC10025684 DOI: 10.1016/j.gpb.2022.11.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/23/2022] [Revised: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during the development of complex organisms, and improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance of artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has a capacity to extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims at surveying recently developed deep learning techniques in scRNA-seq data analysis, identifying key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explaining the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges in current deep learning approaches faced within scRNA-seq data and discuss potential directions for improvements in deep learning algorithms for scRNA-seq data analysis.
Collapse
Affiliation(s)
- Matthew Brendel
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA; Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Chang Su
- Department of Health Service Administration and Policy, Temple University, Philadelphia, PA 19122, USA.
| | - Zilong Bai
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Hao Zhang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Olivier Elemento
- Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA.
| |
Collapse
|
4
|
Lu Y, Li Q, Li T. PPA-GCN: A Efficient GCN Framework for Prokaryotic Pathways Assignment. Front Genet 2022; 13:839453. [PMID: 35444686 PMCID: PMC9013948 DOI: 10.3389/fgene.2022.839453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Accepted: 03/17/2022] [Indexed: 11/17/2022] Open
Abstract
With the rapid development of sequencing technology, completed genomes of microbes have explosively emerged. For a newly sequenced prokaryotic genome, gene functional annotation and metabolism pathway assignment are important foundations for all subsequent research work. However, the assignment rate for gene metabolism pathways is lower than 48% on the whole. It is even lower for newly sequenced prokaryotic genomes, which has become a bottleneck for subsequent research. Thus, the development of a high-precision metabolic pathway assignment framework is urgently needed. Here, we developed PPA-GCN, a prokaryotic pathways assignment framework based on graph convolutional network, to assist functional pathway assignments using KEGG information and genomic characteristics. In the framework, genomic gene synteny information was used to construct a network, and ideas of self-supervised learning were inspired to enhance the framework’s learning ability. Our framework is applicable to the genera of microbe with sufficient whole genome sequences. To evaluate the assignment rate, genomes from three different genera (Flavobacterium (65 genomes) and Pseudomonas (100 genomes), Staphylococcus (500 genomes)) were used. The initial functional pathway assignment rate of the three test genera were 27.7% (Flavobacterium), 49.5% (Pseudomonas) and 30.1% (Staphylococcus). PPA-GCN achieved excellence performance of 84.8% (Flavobacterium), 77.0% (Pseudomonas) and 71.0% (Staphylococcus) for assignment rate. At the same time, PPA-GCN was proved to have strong fault tolerance. The framework provides novel insights into assignment for metabolism pathways and is likely to inform future deep learning applications for interpreting functional annotations and extends to all prokaryotic genera with sufficient genomes.
Collapse
Affiliation(s)
- Yuntao Lu
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China.,College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Qi Li
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Tao Li
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| |
Collapse
|