Reference Citation Analysis

For: Sun S, Zhu J, Ma Y, Zhou X. Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol 2019;20:269. [PMID: 31823809 PMCID: PMC6902413 DOI: 10.1186/s13059-019-1898-6] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Received: 05/16/2019] [Accepted: 11/22/2019] [Indexed: 01/01/2023] Open

Cited by Other Article(s)

Sin DD. What Single Cell RNA Sequencing Has Taught Us about Chronic Obstructive Pulmonary Disease. Tuberc Respir Dis (Seoul) 2024;87:252-260. [PMID: 38369875 PMCID: PMC11222093 DOI: 10.4046/trd.2024.0001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 02/17/2024] [Indexed: 02/20/2024] Open

Fang C, Selega A, Campbell KR. Beyond benchmarking and towards predictive models of dataset-specific single-cell RNA-seq pipeline performance. Genome Biol 2024;25:159. [PMID: 38886757 PMCID: PMC11184819 DOI: 10.1186/s13059-024-03304-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open

Sun Y, Kong L, Huang J, Deng H, Bian X, Li X, Cui F, Dou L, Cao C, Zou Q, Zhang Z. A comprehensive survey of dimensionality reduction and clustering methods for single-cell and spatial transcriptomics data. Brief Funct Genomics 2024:elae023. [PMID: 38860675 DOI: 10.1093/bfgp/elae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/29/2024] [Accepted: 05/27/2024] [Indexed: 06/12/2024] Open

Canzar S, Do VH, Jelić S, Laue S, Matijević D, Prusina T. Metric multidimensional scaling for large single-cell datasets using neural networks. Algorithms Mol Biol 2024;19:21. [PMID: 38863064 PMCID: PMC11165904 DOI: 10.1186/s13015-024-00265-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 05/22/2024] [Indexed: 06/13/2024] Open

Shi M, Tian Y, Luo Y, Elze T, Wang M. RNFLT2Vec: Artifact-corrected representation learning for retinal nerve fiber layer thickness maps. Med Image Anal 2024;94:103110. [PMID: 38458093 DOI: 10.1016/j.media.2024.103110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 02/09/2024] [Accepted: 02/15/2024] [Indexed: 03/10/2024]

Abstract

Optical coherence tomography imaging provides a crucial clinical measurement for diagnosing and monitoring glaucoma through the two-dimensional retinal nerve fiber layer (RNFL) thickness (RNFLT) map. Researchers have been increasingly using neural models to extract meaningful features from the RNFLT map, aiming to identify biomarkers for glaucoma and its progression. However, accurately representing the RNFLT map features relevant to glaucoma is challenging due to significant variations in retinal anatomy among individuals, which confound the pathological thinning of the RNFL. Moreover, the presence of artifacts in the RNFLT map, caused by segmentation errors in the context of degraded image quality and defective imaging procedures, further complicates the task. In this paper, we propose a general framework called RNFLT2Vec for unsupervised learning of vectorized feature representations from RNFLT maps. Our method includes an artifact correction component that learns to rectify RNFLT values at artifact locations, producing a representation reflecting the RNFLT map without artifacts. Additionally, we incorporate two regularization techniques to encourage discriminative representation learning. Firstly, we introduce a contrastive learning-based regularization to capture the similarities and dissimilarities between RNFLT maps. Secondly, we employ a consistency learning-based regularization to align pairwise distances of RNFLT maps with their corresponding thickness distributions. Through extensive experiments on a large-scale real-world dataset, we demonstrate the superiority of RNFLT2Vec in three different clinical tasks: RNFLT pattern discovery, glaucoma detection, and visual field prediction. Our results validate the effectiveness of our framework and its potential to contribute to a better understanding and diagnosis of glaucoma.

Wu J, Wang L, Xi S, Ma C, Zou F, Fang G, Liu F, Wang X, Qu L. Biological significance of METTL5 in atherosclerosis: comprehensive analysis of single-cell and bulk RNA sequencing data. Aging (Albany NY) 2024;16:7267-7276. [PMID: 38663914 PMCID: PMC11087127 DOI: 10.18632/aging.205755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 03/27/2024] [Indexed: 05/08/2024]

Wang Y, Chen X, Zheng Z, Huang L, Xie W, Wang F, Zhang Z, Wong KC. scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics. iScience 2024;27:109352. [PMID: 38510148 PMCID: PMC10951644 DOI: 10.1016/j.isci.2024.109352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/29/2023] [Accepted: 02/23/2024] [Indexed: 03/22/2024] Open

Luecken MD, Gigante S, Burkhardt DB, Cannoodt R, Strobl DC, Markov NS, Zappia L, Palla G, Lewis W, Dimitrov D, Vinyard ME, Magruder DS, Andersson A, Dann E, Qin Q, Otto DJ, Klein M, Botvinnik OB, Deconinck L, Waldrant K, Bloom JM, Pisco AO, Saez-Rodriguez J, Wulsin D, Pinello L, Saeys Y, Theis FJ, Krishnaswamy S. Defining and benchmarking open problems in single-cell analysis. Research Square 2024:rs.3.rs-4181617. [PMID: 38645152 PMCID: PMC11030530 DOI: 10.21203/rs.3.rs-4181617/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]

Affiliation(s)

Malte D Luecken: Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany; Institute of Lung Health & Immunity, Helmholtz Munich; Member of the German Center for Lung Research (DZL), Munich, Germany
Scott Gigante: Immunai, New York, USA
Daniel B Burkhardt: Cellarity, Inc., Somerville, USA
Robrecht Cannoodt: Data Intuitive, Lebbeke, Belgium; Data Mining and Modelling for Biomedicine group, VIB Center for Inflammation Research, Ghent, Belgium; Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Ghent, Belgium
Daniel C Strobl: Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany; Institute of Clinical Chemistry and Pathobiochemistry, School of Medicine, Technical University of Munich, Munich, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Germany
Nikolay S Markov: Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University
Luke Zappia: Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany; Department of Mathematics, School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
Giovanni Palla: Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Germany
Wesley Lewis: Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
Daniel Dimitrov: Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
Michael E Vinyard: Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA; Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
D S Magruder: Department of Computer Science, Yale University, New Haven CT, USA
Alma Andersson: Genentech Inc; Royal Institute of Technology (KTH), Gene Technology Science for Life Laboratory (SciLifeLab)
Emma Dann: Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
Qian Qin: Broad Institute of Harvard and MIT, Cambridge, MA, USA
Dominik J Otto: Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle WA; Computational Biology Program, Public Health Sciences Division, Seattle WA; Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle WA
Michal Klein: Apple
Olga Borisovna Botvinnik: Data Sciences Platform, Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158; Bridge Bio Pharma, 3160 Porter Drive, Suite 250, Palo Alto, CA, 94304
Louise Deconinck: Data Mining and Modelling for Biomedicine group, VIB Center for Inflammation Research, Ghent, Belgium; Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Ghent, Belgium
Kai Waldrant: Data Intuitive, Lebbeke, Belgium
Jonathan M Bloom: Massachusetts Institute of Technology
Angela Oliveira Pisco: Data Sciences Platform, Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158; Insitro, South San Francisco
Julio Saez-Rodriguez: Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
Drausin Wulsin: Immunai, New York, USA
Luca Pinello: Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
Yvan Saeys: Data Mining and Modelling for Biomedicine group, VIB Center for Inflammation Research, Ghent, Belgium; Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Ghent, Belgium; VIB Center for AI & Computational Biology (VIB.AI), Gent, Belgium
Fabian J Theis: Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany; Department of Mathematics, School of Computing, Information and Technology, Technical University of Munich, Munich, Germany; Cellular Genetics Programme, Wellcome Sanger Institute, Hinxton, UK (associated faculty)
Smita Krishnaswamy: Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA; Department of Computer Science, Yale University, New Haven CT, USA; Department of Genetics, Yale University, New Haven CT, USA

Yuan CU, Quah FX, Hemberg M. Single-cell and spatial transcriptomics: Bridging current technologies with long-read sequencing. Mol Aspects Med 2024;96:101255. [PMID: 38368637 DOI: 10.1016/j.mam.2024.101255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 01/30/2024] [Accepted: 02/07/2024] [Indexed: 02/20/2024]

Weine E, Carbonetto P, Stephens M. Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca. bioRxiv 2024:2024.03.23.586420. [PMID: 38585920 PMCID: PMC10996495 DOI: 10.1101/2024.03.23.586420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]

Kang Y, Zhang H, Guan J. scINRB: single-cell gene expression imputation with network regularization and bulk RNA-seq data. Brief Bioinform 2024;25:bbae148. [PMID: 38600665 PMCID: PMC11006796 DOI: 10.1093/bib/bbae148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/26/2024] [Accepted: 03/18/2024] [Indexed: 04/12/2024] Open

Li T, Qian K, Wang X, Li WV, Li H. scBiG for representation learning of single-cell gene expression data based on bipartite graph embedding. NAR Genom Bioinform 2024;6:lqae004. [PMID: 38288376 PMCID: PMC10823585 DOI: 10.1093/nargab/lqae004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 12/19/2023] [Accepted: 01/09/2024] [Indexed: 01/31/2024] Open

Xia L, Lee C, Li JJ. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. Nat Commun 2024;15:1753. [PMID: 38409103 PMCID: PMC10897166 DOI: 10.1038/s41467-024-45891-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 02/06/2024] [Indexed: 02/28/2024] Open

Chen Y, Zheng R, Liu J, Li M. scMLC: an accurate and robust multiplex community detection method for single-cell multi-omics data. Brief Bioinform 2024;25:bbae101. [PMID: 38493339 PMCID: PMC10944569 DOI: 10.1093/bib/bbae101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 01/03/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024] Open

Atitey K, Motsinger-Reif AA, Anchang B. Model-based evaluation of spatiotemporal data reduction methods with unknown ground truth through optimal visualization and interpretability metrics. Brief Bioinform 2023;25:bbad455. [PMID: 38113074 PMCID: PMC10729792 DOI: 10.1093/bib/bbad455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 11/06/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023] Open

Abstract

Optimizing and benchmarking data reduction methods for dynamic or spatial visualization and interpretation (DSVI) face challenges due to many factors, including data complexity, lack of ground truth, time-dependent metrics, dimensionality bias and different visual mappings of the same data. Current studies often focus on independent static visualization or interpretability metrics that require ground truth. To overcome this limitation, we propose the MIBCOVIS framework, a comprehensive and interpretable benchmarking and computational approach. MIBCOVIS enhances the visualization and interpretability of high-dimensional data without relying on ground truth by integrating five robust metrics, including a novel time-ordered Markov-based structural metric, into a semi-supervised hierarchical Bayesian model. The framework assesses method accuracy and considers interaction effects among metric features. We apply MIBCOVIS using linear and nonlinear dimensionality reduction methods to evaluate optimal DSVI for four distinct dynamic and spatial biological processes captured by three single-cell data modalities: CyTOF, scRNA-seq and CODEX. These data vary in complexity based on feature dimensionality, unknown cell types and dynamic or spatial differences. Unlike traditional single-summary score approaches, MIBCOVIS compares accuracy distributions across methods. Our findings underscore the joint evaluation of visualization and interpretability, rather than relying on separate metrics. We reveal that prioritizing average performance can obscure method feature performance. Additionally, we explore the impact of data complexity on visualization and interpretability. Specifically, we provide optimal parameters and features and recommend methods, like the optimized variational contractive autoencoder, for targeted DSVI for various data complexities. MIBCOVIS shows promise for evaluating dynamic single-cell atlases and spatiotemporal data reduction models.

Hassan AZ, Ward HN, Rahman M, Billmann M, Lee Y, Myers CL. Dimensionality reduction methods for extracting functional networks from large-scale CRISPR screens. Mol Syst Biol 2023;19:e11657. [PMID: 37750448 PMCID: PMC10632734 DOI: 10.15252/msb.202311657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Revised: 08/28/2023] [Accepted: 09/05/2023] [Indexed: 09/27/2023] Open

Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023;24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023] Open

Du J, Gu XR, Yu XX, Cao YJ, Hou J. Essential procedures of single-cell RNA sequencing in multiple myeloma and its translational value. Blood Science 2023;5:221-236. [PMID: 37941914 PMCID: PMC10629747 DOI: 10.1097/bs9.0000000000000172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 09/18/2023] [Indexed: 11/10/2023] Open

Xia L, Lee C, Li JJ. scDEED: a statistical method for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. bioRxiv 2023:2023.04.21.537839. [PMID: 37163087 PMCID: PMC10168265 DOI: 10.1101/2023.04.21.537839] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]

Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. bioRxiv 2023:2023.03.03.531029. [PMID: 36945441 PMCID: PMC10028846 DOI: 10.1101/2023.03.03.531029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]

Erfanian N, Heydari AA, Feriz AM, Iañez P, Derakhshani A, Ghasemigol M, Farahpour M, Razavi SM, Nasseri S, Safarpour H, Sahebkar A. Deep learning applications in single-cell genomics and transcriptomics data analysis. Biomed Pharmacother 2023;165:115077. [PMID: 37393865 DOI: 10.1016/j.biopha.2023.115077] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/24/2023] [Revised: 06/22/2023] [Accepted: 06/23/2023] [Indexed: 07/04/2023] Open

Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. Cell Reports Methods 2023;3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]

Kana O, Nault R, Filipovic D, Marri D, Zacharewski T, Bhattacharya S. Generative modeling of single-cell gene expression for dose-dependent chemical perturbations. Patterns (New York, N.Y.) 2023;4:100817. [PMID: 37602218 PMCID: PMC10436058 DOI: 10.1016/j.patter.2023.100817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 12/07/2022] [Accepted: 07/14/2023] [Indexed: 08/22/2023]

Morales-Hernandez AG, Martinez-Aguilar V, Chavez-Gonzalez TM, Mendez-Avila JC, Frias-Becerril JV, Morales-Hernandez LA, Cruz-Albarran IA. Short-Term Thermal Effect of Continuous Ultrasound from 3 MHz to 1 and 0.5 W/cm² Applied to Gastrocnemius Muscle. Diagnostics (Basel) 2023;13:2644. [PMID: 37627903 PMCID: PMC10453025 DOI: 10.3390/diagnostics13162644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 08/05/2023] [Indexed: 08/27/2023] Open

Raimundo F, Prompsy P, Vert JP, Vallot C. A benchmark of computational pipelines for single-cell histone modification data. Genome Biol 2023;24:143. [PMID: 37340307 PMCID: PMC10280832 DOI: 10.1186/s13059-023-02981-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/21/2022] [Accepted: 06/07/2023] [Indexed: 06/22/2023] Open

Abstract

BACKGROUND

Single-cell histone post-translational modification (scHPTM) assays such as scCUT&Tag or scChIP-seq allow single-cell mapping of diverse epigenomic landscapes within complex tissues and are likely to unlock our understanding of various mechanisms involved in development or diseases. Running scHPTM experiments and analyzing the data produced remain challenging, since few consensus guidelines currently exist regarding good practices for experimental design and data analysis pipelines.

RESULTS

We perform a computational benchmark to assess the impact of experimental parameters and data analysis pipelines on the ability of the cell representation to recapitulate known biological similarities. We run more than ten thousand experiments to systematically study the impact of coverage and number of cells, of the count matrix construction method, of feature selection and normalization, and of the dimension reduction algorithm used. This allows us to identify key experimental parameters and computational choices to obtain a good representation of single-cell HPTM data. We show in particular that the count matrix construction step has a strong influence on the quality of the representation and that using fixed-size bin counts outperforms annotation-based binning. Dimension reduction methods based on latent semantic indexing outperform others, and feature selection is detrimental, while keeping only high-quality cells has little influence on the final representation as long as enough cells are analyzed.

CONCLUSIONS

This benchmark provides a comprehensive study on how experimental parameters and computational choices affect the representation of single-cell HPTM data. We propose a series of recommendations regarding matrix construction, feature and cell selection, and dimensionality reduction algorithms.

Li K, Sun YH, Ouyang Z, Negi S, Gao Z, Zhu J, Wang W, Chen Y, Piya S, Hu W, Zavodszky MI, Yalamanchili H, Cao S, Gehrke A, Sheehan M, Huh D, Casey F, Zhang X, Zhang B. scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing. BMC Genomics 2023;24:228. [PMID: 37131143 PMCID: PMC10155351 DOI: 10.1186/s12864-023-09332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 04/25/2023] [Indexed: 05/04/2023] Open

Zhang S, Li X, Lin J, Lin Q, Wong KC. Review of single-cell RNA-seq data clustering for cell-type identification and characterization. RNA (New York, N.Y.) 2023;29:517-530. [PMID: 36737104 PMCID: PMC10158997 DOI: 10.1261/rna.078965.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/27/2022] [Accepted: 01/03/2023] [Indexed: 05/06/2023]

Zhang Z, Wei X. Artificial intelligence-assisted selection and efficacy prediction of antineoplastic strategies for precision cancer therapy. Semin Cancer Biol 2023;90:57-72. [PMID: 36796530 DOI: 10.1016/j.semcancer.2023.02.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/19/2022] [Revised: 01/12/2023] [Accepted: 02/13/2023] [Indexed: 02/16/2023]

Wang K, Yang Y, Wu F, Song B, Wang X, Wang T. Comparative analysis of dimension reduction methods for cytometry by time-of-flight data. Nat Commun 2023;14:1836. [PMID: 37005472 PMCID: PMC10067013 DOI: 10.1038/s41467-023-37478-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 03/20/2023] [Indexed: 04/04/2023] Open

Crowell HL, Morillo Leonardo SX, Soneson C, Robinson MD. The shaky foundations of simulating single-cell RNA sequencing data. Genome Biol 2023;24:62. [PMID: 36991470 PMCID: PMC10061781 DOI: 10.1186/s13059-023-02904-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/23/2021] [Accepted: 03/20/2023] [Indexed: 03/31/2023] Open

Zernab Hassan A, Ward HN, Rahman M, Billmann M, Lee Y, Myers CL. Dimensionality reduction methods for extracting functional networks from large-scale CRISPR screens. bioRxiv 2023:2023.02.22.529573. [PMID: 36993440 PMCID: PMC10054965 DOI: 10.1101/2023.02.22.529573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]

Choi Y, Li R, Quon G. siVAE: interpretable deep generative models for single-cell transcriptomes. Genome Biol 2023;24:29. [PMID: 36803416 PMCID: PMC9940350 DOI: 10.1186/s13059-023-02850-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/08/2022] [Accepted: 01/06/2023] [Indexed: 02/22/2023] Open

Hsu LL, Culhane AC. Correspondence analysis for dimension reduction, batch integration, and visualization of single-cell RNA-seq data. Sci Rep 2023;13:1197. [PMID: 36681709 PMCID: PMC9867729 DOI: 10.1038/s41598-022-26434-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 12/14/2022] [Indexed: 01/22/2023] Open

Drake RS, Villanueva MA, Vilme M, Russo DD, Navia A, Love JC, Shalek AK. Profiling Transcriptional Heterogeneity with Seq-Well S³: A Low-Cost, Portable, High-Fidelity Platform for Massively Parallel Single-Cell RNA-Seq. Methods Mol Biol 2023;2584:57-104. [PMID: 36495445 DOI: 10.1007/978-1-0716-2756-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]

Affiliation(s)

Riley S Drake: Institute for Medical Engineering and Science (IMES), Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Martin Arreola Villanueva: Institute for Medical Engineering and Science (IMES), Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Mike Vilme: Institute for Medical Engineering and Science (IMES), Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Daniela D Russo: Institute for Medical Engineering and Science (IMES), Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Andrew Navia: Institute for Medical Engineering and Science (IMES), Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
J Christopher Love: Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Alex K Shalek: Institute for Medical Engineering and Science (IMES), Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; The Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA


Chatterjee D, Deng WM. Standardization of Single-Cell RNA-Sequencing Analysis Workflow to Study Drosophila Ovary. Methods Mol Biol 2023;2677:151-171. [PMID: 37464241 DOI: 10.1007/978-1-0716-3259-8_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2023]

Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023;21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023] Open

Quah FX, Hemberg M. SC3s: efficient scaling of single cell consensus clustering to millions of cells. BMC Bioinformatics 2022;23:536. [PMID: 36503522 PMCID: PMC9743492 DOI: 10.1186/s12859-022-05085-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/03/2022] [Accepted: 11/25/2022] [Indexed: 12/14/2022] Open

Spatial-ID: a cell typing method for spatially resolved transcriptomics via transfer learning and spatial embedding. Nat Commun 2022;13:7640. [PMID: 36496406 PMCID: PMC9741613 DOI: 10.1038/s41467-022-35288-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/17/2022] [Accepted: 11/25/2022] [Indexed: 12/13/2022] Open

Su M, Pan T, Chen QZ, Zhou WW, Gong Y, Xu G, Yan HY, Li S, Shi QZ, Zhang Y, He X, Jiang CJ, Fan SC, Li X, Cairns MJ, Wang X, Li YS. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res 2022;9:68. [PMID: 36461064 PMCID: PMC9716519 DOI: 10.1186/s40779-022-00434-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open

Affiliation(s)

Min Su State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Tao Pan College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Qiu-Zhen Chen State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Wei-Wei Zhou College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
Yi Gong State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China; Department of Immunology, Nanjing Medical University, Nanjing, 211166, China
Gang Xu College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Huan-Yu Yan State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Si Li College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Qiao-Zhen Shi State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Ya Zhang College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Xiao He Department of Laboratory Medicine, Women and Children's Hospital of Chongqing Medical University, Chongqing, 401174, China
Chun-Jie Jiang Baylor College of Medicine, Houston, TX, 77030, USA
Shi-Cai Fan Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110, Guangdong, China
Xia Li College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
Murray J Cairns School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, the University of Newcastle, University Drive, Callaghan, NSW, 2308, Australia; Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW, 2305, Australia
Xi Wang State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Yong-Sheng Li College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China


Spatially aware dimension reduction for spatial transcriptomics. Nat Commun 2022;13:7203. [PMID: 36418351 PMCID: PMC9684472 DOI: 10.1038/s41467-022-34879-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 03/10/2022] [Accepted: 11/10/2022] [Indexed: 11/27/2022] Open

Lee H, Han B. FastRNA: An efficient solution for PCA of single-cell RNA-sequencing data based on a batch-accounting count model. Am J Hum Genet 2022;109:1974-1985. [PMID: 36206757 PMCID: PMC9674949 DOI: 10.1016/j.ajhg.2022.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 09/14/2022] [Indexed: 01/26/2023] Open

Predicting the prevalence of lung cancer using feature transformation techniques. Egyptian Informatics Journal 2022. [DOI: 10.1016/j.eij.2022.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]

Li Z, Zhou X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol 2022;23:168. [PMID: 35927760 PMCID: PMC9351148 DOI: 10.1186/s13059-022-02734-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/31/2022] [Accepted: 07/21/2022] [Indexed: 02/08/2023] Open

Unified K-means coupled self-representation and neighborhood kernel learning for clustering single-cell RNA-sequencing data. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]

Wang Y, Xu Y, Zang Z, Wu L, Li Z. Panoramic Manifold Projection (Panoramap) for Single-Cell Data Dimensionality Reduction and Visualization. Int J Mol Sci 2022;23:7775. [PMID: 35887125 PMCID: PMC9316349 DOI: 10.3390/ijms23147775] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/06/2022] [Revised: 07/03/2022] [Accepted: 07/12/2022] [Indexed: 12/22/2022] Open

Bard JE, Nowak NJ, Buck MJ, Sinha S. Multimodal Dimension Reduction and Subtype Classification of Head and Neck Squamous Cell Tumors. Front Oncol 2022;12:892207. [PMID: 35912202 PMCID: PMC9326399 DOI: 10.3389/fonc.2022.892207] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/08/2022] [Accepted: 06/09/2022] [Indexed: 01/18/2023] Open

Ellis D, Wu D, Datta S. SAREV: A review on statistical analytics of single-cell RNA sequencing data. Wiley Interdisciplinary Reviews: Computational Statistics 2022;14:e1558. [PMID: 36034329 PMCID: PMC9400796 DOI: 10.1002/wics.1558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2020] [Accepted: 04/09/2021] [Indexed: 06/15/2023]

Context-aware deconvolution of cell-cell communication with Tensor-cell2cell. Nat Commun 2022;13:3665. [PMID: 35760817 PMCID: PMC9237099 DOI: 10.1038/s41467-022-31369-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/29/2021] [Accepted: 06/14/2022] [Indexed: 12/23/2022] Open

Abstract

Cell interactions determine phenotypes, and intercellular communication is shaped by cellular contexts such as disease state, organismal life stage, and tissue microenvironment. Single-cell technologies measure the molecules mediating cell–cell communication, and emerging computational tools can exploit these data to decipher intercellular communication. However, current methods either disregard cellular context or rely on simple pairwise comparisons between samples, thus limiting the ability to decipher complex cell–cell communication across multiple time points, levels of disease severity, or spatial contexts. Here we present Tensor-cell2cell, an unsupervised method using tensor decomposition, which deciphers context-driven intercellular communication by simultaneously accounting for multiple stages, states, or locations of the cells. To do so, Tensor-cell2cell uncovers context-driven patterns of communication associated with different phenotypic states and determined by unique combinations of cell types and ligand-receptor pairs. As such, Tensor-cell2cell robustly improves upon and extends the analytical capabilities of existing tools. We show Tensor-cell2cell can identify multiple modules associated with distinct communication processes (e.g., participating cell–cell and ligand-receptor pairs) linked to severities of Coronavirus Disease 2019 and to Autism Spectrum Disorder. Thus, we introduce an effective and easy-to-use strategy for understanding complex communication patterns across diverse conditions.

Cellular contexts such as disease state, organismal life stage and tissue microenvironment shape intercellular communication and ultimately affect an organism’s phenotypes. Here, the authors present Tensor-cell2cell, an unsupervised method for deciphering context-driven intercellular communication.


Zandavi SM, Koch FC, Vijayan A, Zanini F, Mora FV, Ortega DG, Vafaee F. Disentangling single-cell omics representation with a power spectral density-based feature extraction. Nucleic Acids Res 2022;50:5482-5492. [PMID: 35639509 PMCID: PMC9178020 DOI: 10.1093/nar/gkac436] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/21/2021] [Revised: 04/26/2022] [Accepted: 05/10/2022] [Indexed: 12/13/2022] Open

Wang Y, Peng Q, Mou X, Wang X, Li H, Han T, Sun Z, Wang X. A successful hybrid deep learning model aiming at promoter identification. BMC Bioinformatics 2022;23:206. [PMID: 35641900 PMCID: PMC9158169 DOI: 10.1186/s12859-022-04735-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 05/16/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The zone adjacent to a transcription start site (TSS), namely, the promoter, is primarily involved in the process of DNA transcription initiation and regulation. As a result, proper promoter identification is critical for further understanding the mechanism of the networks controlling genomic regulation. A number of methodologies for the identification of promoters have been proposed. Nonetheless, due to the great heterogeneity existing in promoters, the results of these procedures are still unsatisfactory. In order to establish additional discriminative characteristics and properly recognize promoters, we developed the hybrid model for promoter identification (HMPI), a hybrid deep learning model that can characterize both the native sequences of promoters and the morphological outline of promoters at the same time. We developed the HMPI to combine a method called the PSFN (promoter sequence features network), which characterizes native promoter sequences and deduces sequence features, with a technique referred to as the DSPN (deep structural profiles network), which is specially structured to model the promoters in terms of their structural profile and to deduce their structural attributes.

RESULTS

The HMPI was applied to human, plant and Escherichia coli K-12 strain datasets, and the findings showed that the HMPI was successful at extracting the features of the promoter while greatly enhancing the promoter identification performance. In addition, after the improvements of synthetic sampling, transfer learning and label smoothing regularization, the improved HMPI models achieved good results in identifying subtypes of promoters on prokaryotic promoter datasets.

CONCLUSIONS

The results showed that the HMPI was successful at extracting the features of promoters while greatly enhancing the performance of identifying promoters on both eukaryotic and prokaryotic datasets, and the improved HMPI models are good at identifying subtypes of promoters on prokaryotic promoter datasets. The HMPI is additionally adaptable to different biological functional sequences, allowing for the addition of new features or models.
