1
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025; 2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22] [Indexed: 09/25/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable owing to differences in technologies and tools and to low data resolution. Machine learning methods have emerged as an alternative way to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequence information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P G Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- J Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA

2
Zhong G, Liu H, Deng L. Ensemble Machine Learning and Predicted Properties Promote Antimicrobial Peptide Identification. Interdiscip Sci 2024; 16:951-965. [PMID: 38972032 DOI: 10.1007/s12539-024-00640-z] [Received: 01/22/2024] [Revised: 06/04/2024] [Accepted: 06/07/2024] [Indexed: 07/08/2024]
Abstract
The emergence of antibiotic-resistant microbes creates a pressing demand for novel alternative treatments. One promising alternative is antimicrobial peptides (AMPs), a class of innate immunity mediators within the therapeutic peptide realm. AMPs offer salient advantages such as high specificity, cost-effective synthesis, and reduced toxicity. Although some computational methodologies have been proposed to identify potential AMPs with the rapid development of artificial intelligence techniques, there is still ample room to improve their performance. This study proposes a predictive framework that combines deep learning and statistical learning methods in an ensemble to screen peptides with antimicrobial activity. We integrate multiple LightGBM classifiers and convolutional neural networks that leverage various predicted sequential, structural, and physicochemical properties, derived from the residue sequences by diverse machine learning paradigms. Comparative experiments show that our method outperforms other state-of-the-art approaches on an independent test dataset in terms of representative performance measures. We also analyse discrimination quality under different kinds of attribute information, which reveals that combining multiple features can improve prediction. In addition, a case study illustrates the method's identification performance. We establish a web application at http://amp.denglab.org to provide convenient usage of our proposal and make the predictive framework, source code, and datasets publicly accessible at https://github.com/researchprotein/amp.
Affiliation(s)
- Guolun Zhong
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hui Liu
  - College of Computer and Information Engineering, Nanjing Tech University, Nanjing, 211816, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China

3
Xiu YH, Sun SL, Zhou BW, Wan Y, Tang H, Long HX. DGSIST: Clustering spatial transcriptome data based on deep graph structure Infomax. Methods 2024; 231:226-236. [PMID: 39413889 DOI: 10.1016/j.ymeth.2024.10.002] [Received: 06/12/2024] [Revised: 09/26/2024] [Accepted: 10/04/2024] [Indexed: 10/18/2024]
Abstract
Although spatial transcriptomics data provide valuable insights into gene expression profiles and the spatial structure of tissues, most studies rely solely on gene expression information, underutilizing the spatial data. To fully leverage the potential of spatial transcriptomics and graph neural networks, the DGSI (Deep Graph Structure Infomax) model is proposed. This graph data processing model uses graph convolutional neural networks and employs an unsupervised learning approach. It maximizes the mutual information between graph-level and node-level representations, emphasizing flexible sampling and aggregation of nodes and their neighbors. This effectively captures and incorporates local information from nodes into the overall graph structure. Additionally, this paper develops the DGSIST framework, an unsupervised cell clustering method that integrates the DGSI model, the SVD dimensionality reduction algorithm, and the k-means++ clustering algorithm to identify cell types accurately. DGSIST makes full use of spatial transcriptomics data and outperforms existing methods in accuracy. Demonstrations across various tissue types and technological platforms show that DGSIST accurately identifies spatial domains in multiple tissue sections. Compared to other spatial clustering methods, DGSIST excels in cell clustering and effectively eliminates batch effects without needing batch correction. DGSIST also performs well in spatial clustering analysis, spatial variation identification, and differential gene expression detection, and it applies directly to graph analysis tasks such as node classification, link prediction, and graph clustering. We anticipate that the DGSIST framework will contribute to a deeper understanding of the spatial organizational structures of diseases such as cancer.
Affiliation(s)
- Yu-Han Xiu
  - College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China
- Si-Lin Sun
  - College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China
- Bing-Wei Zhou
  - College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China
- Ying Wan
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Hua Tang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China; Medical Engineering & Medical Informatics Integration and Transformational Medicine Key Laboratory of Luzhou City, Luzhou 646000, China
- Hai-Xia Long
  - College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China

4
Kumar Halder A, Agarwal A, Jodkowska K, Plewczynski D. A systematic analyses of different bioinformatics pipelines for genomic data and its impact on deep learning models for chromatin loop prediction. Brief Funct Genomics 2024; 23:538-548. [PMID: 38555493 DOI: 10.1093/bfgp/elae009] [Received: 11/28/2023] [Revised: 02/07/2024] [Accepted: 03/04/2024] [Indexed: 04/02/2024]
Abstract
Genomic data analysis has witnessed a surge in complexity and volume, primarily driven by the advent of high-throughput technologies. In particular, studying chromatin loops and structures has become pivotal in understanding gene regulation and genome organization. This systematic investigation explores the realm of specialized bioinformatics pipelines designed specifically for the analysis of chromatin loops and structures. Our investigation incorporates two protein (CTCF and Cohesin) factor-specific loop interaction datasets from six distinct pipelines, amassing a comprehensive collection of 36 diverse datasets. Through a meticulous review of existing literature, we offer a holistic perspective on the methodologies, tools and algorithms underpinning the analysis of this multifaceted genomic feature. We illuminate the vast array of approaches deployed, encompassing pivotal aspects such as data preparation pipeline, preprocessing, statistical features and modelling techniques. Beyond this, we rigorously assess the strengths and limitations inherent in these bioinformatics pipelines, shedding light on the interplay between data quality and the performance of deep learning models, ultimately advancing our comprehension of genomic intricacies.
Affiliation(s)
- Anup Kumar Halder
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Abhishek Agarwal
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Karolina Jodkowska
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland

5
Deng Q, Zhang J, Liu J, Liu Y, Dai Z, Zou X, Li Z. Identifying Protein Phosphorylation Site-Disease Associations Based on Multi-Similarity Fusion and Negative Sample Selection by Convolutional Neural Network. Interdiscip Sci 2024; 16:649-664. [PMID: 38457108 DOI: 10.1007/s12539-024-00615-0] [Received: 08/30/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 03/09/2024]
Abstract
As one of the most important post-translational modifications (PTMs), protein phosphorylation plays a key role in a variety of biological processes. Many studies have shown that protein phosphorylation is associated with various human diseases. Therefore, identifying protein phosphorylation site-disease associations can help to elucidate the pathogenesis of disease and discover new drug targets. Networks of sequence similarity and Gaussian interaction profile kernel similarity were constructed for phosphorylation sites, and networks of disease semantic similarity, disease symptom similarity, and Gaussian interaction profile kernel similarity were constructed for diseases. To effectively combine the different phosphorylation site and disease similarity information, the random walk with restart algorithm was used to obtain the topology information of each network. Then, the diffusion component analysis method was used to obtain the comprehensive phosphorylation site similarity and disease similarity. Meanwhile, reliable negative samples were screened using a Euclidean distance method. Finally, a convolutional neural network (CNN) model was constructed to identify potential associations between phosphorylation sites and diseases. Under tenfold cross-validation, the model achieved an accuracy of 93.48%, specificity of 96.82%, sensitivity of 90.15%, precision of 96.62%, Matthews correlation coefficient of 0.8719, area under the receiver operating characteristic curve of 0.9786, and area under the precision-recall curve of 0.9836. Additionally, most of the top 20 predicted disease-related phosphorylation sites (19/20 for Alzheimer's disease; 16/20 for neuroblastoma) were verified by the literature and databases. These results show that the proposed method has outstanding prediction performance and high practical value.
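The random walk with restart step named in the abstract can be illustrated with a minimal sketch. The function name, the restart probability of 0.5, the column normalization, and the toy three-node similarity network are all assumptions made for illustration; the paper's actual settings are not given here and may differ.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probabilities of a walker that
    starts at node `seed` and jumps back to it with probability `restart`.

    `adj` is a square list-of-lists of non-negative similarity weights
    (hypothetical input format chosen for this sketch).
    """
    n = len(adj)
    # Column-normalize so each column is a transition distribution.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [0.0] * n
    start[seed] = 1.0
    prob = start[:]
    for _ in range(max_iter):
        # One step: walk along the network, or restart at the seed.
        nxt = [
            (1.0 - restart) * sum(trans[i][j] * prob[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, prob)) < tol:
            return nxt
        prob = nxt
    return prob


# Toy path network 0 - 1 - 2 (assumed example, not data from the paper).
toy_network = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(toy_network, seed=0)
```

Nodes closer to the seed receive higher steady-state probability, which is how such a walk encodes network topology around each node.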
Affiliation(s)
- Qian Deng
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
- Jing Zhang
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
- Jie Liu
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
- Yuqi Liu
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
- Zong Dai
  - School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, 510275, China
- Xiaoyong Zou
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, China
- Zhanchao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China

6
Shi P, Han J, Zhang Y, Li G, Zhou X. IMI-driver: Integrating multi-level gene networks and multi-omics for cancer driver gene identification. PLoS Comput Biol 2024; 20:e1012389. [PMID: 39186807 PMCID: PMC11379397 DOI: 10.1371/journal.pcbi.1012389] [Received: 04/19/2024] [Revised: 09/06/2024] [Accepted: 08/05/2024] [Indexed: 08/28/2024]
Abstract
The identification of cancer driver genes is crucial for early detection, effective therapy, and precision medicine of cancer. Cancer is caused by the dysregulation of several genes at various levels of regulation. However, current techniques only capture a limited amount of regulatory information, which may hinder their efficacy. In this study, we present IMI-driver, a model that integrates multi-omics data into eight biological networks and applies Multi-view Collaborative Network Embedding to embed the gene regulation information from the biological networks into a low-dimensional vector space to identify cancer drivers. We apply IMI-driver to 29 cancer types from The Cancer Genome Atlas (TCGA) and compare its performance with nine other methods on nine benchmark datasets. IMI-driver outperforms the other methods, demonstrating that multi-level network integration enhances prediction accuracy. We also perform a pan-cancer analysis using the genes identified by IMI-driver, which confirms almost all our selected candidate genes as known or potential drivers. Case studies of the new positive genes suggest their roles in cancer development and progression.
Affiliation(s)
- Peiting Shi
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Junmin Han
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Yinghao Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Guanpu Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Xionghui Zhou
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
  - Key Laboratory of Smart Farming for Agricultural Animals, Ministry of Agriculture and Rural Affairs, People's Republic of China

7
Wu Y, Shi Z, Zhou X, Zhang P, Yang X, Ding J, Wu H. scHiCyclePred: a deep learning framework for predicting cell cycle phases from single-cell Hi-C data using multi-scale interaction information. Commun Biol 2024; 7:923. [PMID: 39085477 PMCID: PMC11291681 DOI: 10.1038/s42003-024-06626-3] [Received: 10/30/2023] [Accepted: 07/24/2024] [Indexed: 08/02/2024]
Abstract
The emergence of single-cell Hi-C (scHi-C) technology has provided unprecedented opportunities for investigating the intricate relationship between cell cycle phases and the three-dimensional (3D) structure of chromatin. However, accurately predicting cell cycle phases based on scHi-C data remains a formidable challenge. Here, we present scHiCyclePred, a prediction model that integrates multiple feature sets to leverage scHi-C data for predicting cell cycle phases. scHiCyclePred extracts 3D chromatin structure features by incorporating multi-scale interaction information. Comparative analysis shows that scHiCyclePred surpasses existing methods such as Nagano_method and CIRCLET across various metrics, including accuracy (ACC), F1 score, Precision, Recall, and balanced accuracy (BACC). In addition, we evaluate scHiCyclePred against the previously published CIRCLET using a dataset of complex tissues (Liu_dataset). Experimental results show that scHiCyclePred improves on CIRCLET by 0.39, 0.52, 0.52, and 0.39 in ACC, F1 score, Precision, and Recall, respectively. Furthermore, we conduct analyses on three-dimensional chromatin dynamics and gene features during the cell cycle, providing a more comprehensive understanding of cell cycle dynamics through chromatin structure. scHiCyclePred not only offers insights into cell biology but also holds promise for catalyzing breakthroughs in disease research. scHiCyclePred is available on GitHub at https://github.com/HaoWuLab-Bioinformatics/scHiCyclePred.
Affiliation(s)
- Yingfu Wu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Shenzhen Research Institute of Shandong University, Shenzhen, Guangdong, China
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Zhenqi Shi
  - School of Software, Shandong University, Jinan, Shandong, China
- Xiangfei Zhou
  - School of Software, Shandong University, Jinan, Shandong, China
- Pengyu Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Xiuhui Yang
  - School of Software, Shandong University, Jinan, Shandong, China
- Jun Ding
  - Department of Medicine, Meakins-Christie Laboratories, McGill University, Montreal, QC, Canada
- Hao Wu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Shenzhen Research Institute of Shandong University, Shenzhen, Guangdong, China

8
Wang Y, Kong X, Bi X, Cui L, Yu H, Wu H. ResDeepSurv: A Survival Model for Deep Neural Networks Based on Residual Blocks and Self-attention Mechanism. Interdiscip Sci 2024; 16:405-417. [PMID: 38489147 DOI: 10.1007/s12539-024-00617-y] [Received: 10/09/2023] [Revised: 01/30/2024] [Accepted: 02/01/2024] [Indexed: 03/17/2024]
Abstract
Survival analysis, a widely used method for analyzing and predicting the timing of event occurrence, plays a crucial role in the medical field. Medical professionals use survival models to gain insight into the effects of patient covariates on disease and their correlation with the effectiveness of different treatment strategies. This knowledge is essential for developing treatment plans and improving treatment approaches. Conventional survival models, such as the Cox proportional hazards model, require a significant amount of feature engineering or prior knowledge to support personalized modeling. To address these limitations, we propose a novel residual-based self-attention deep neural network for survival modeling, called ResDeepSurv, which combines the benefits of neural networks and the Cox proportional hazards regression model. The proposed model captures the distribution of survival time and the correlation between covariates and outcomes without imposing strict assumptions on the underlying distribution of survival data. This approach effectively accounts for both linear and nonlinear risk functions in survival data analysis. The performance of our model on survival data with various risk functions is on par with or superior to that of other existing survival analysis methods. Furthermore, we validate the performance of our model against existing methods on multiple publicly available clinical datasets. Through this study, we demonstrate the effectiveness of our proposed model in survival analysis, providing a promising alternative to traditional approaches. The application of deep learning techniques and the ability to capture complex relationships between covariates and survival outcomes without relying on extensive feature engineering make our model a valuable tool for personalized medicine and decision-making in clinical practice.
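Networks that build on the Cox proportional hazards model are commonly trained by minimizing the negative log partial likelihood of the predicted risk scores. The sketch below illustrates that general loss only; it is not taken from the ResDeepSurv paper. The function name, the averaging over observed events, and the Breslow-style handling of tied times are assumptions.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative log partial likelihood of a Cox-type model.

    risk_scores: predicted log-risk per patient (e.g. a network output).
    times:       observed follow-up time per patient.
    events:      1 if the event was observed, 0 if the patient was censored.
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at time t_i.
        log_risk_set = math.log(
            sum(math.exp(h) for h, t in zip(risk_scores, times) if t >= t_i)
        )
        total += risk_scores[i] - log_risk_set
        n_events += 1
    return -total / n_events if n_events else 0.0


# Toy data (assumed): three patients, all events observed.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 1, 1]
concordant = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], toy_times, toy_events)
discordant = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], toy_times, toy_events)
```

The loss is lower when patients who fail earlier are assigned higher risk scores, so `concordant` is smaller than `discordant` on the toy data.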
Affiliation(s)
- Yuchen Wang
  - School of Software, Shandong University, Jinan, 250101, China
- Xianchun Kong
  - Department of Pediatric Surgery, Heze Municipal Hospital, Heze, 274000, China
- Xiao Bi
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Lizhen Cui
  - School of Software, Shandong University, Jinan, 250101, China
- Hong Yu
  - School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Hao Wu
  - School of Software, Shandong University, Jinan, 250101, China

9
Li F, Zhang J, Li K, Peng Y, Zhang H, Xu Y, Yu Y, Zhang Y, Liu Z, Wang Y, Huang L, Zhou F. GANSamples-ac4C: Enhancing ac4C site prediction via generative adversarial networks and transfer learning. Anal Biochem 2024; 689:115495. [PMID: 38431142 DOI: 10.1016/j.ab.2024.115495] [Received: 10/03/2023] [Revised: 02/18/2024] [Accepted: 02/22/2024] [Indexed: 03/05/2024]
Abstract
The RNA modification N4-acetylcytidine (ac4C) is catalyzed by N-acetyltransferase 10 (NAT10) and plays an essential role across tRNA, rRNA, and mRNA. It influences various cellular functions, including mRNA stability and rRNA biosynthesis. Wet-lab detection of ac4C modification sites is highly resource-intensive and costly. Therefore, various machine learning and deep learning techniques have been employed for computational detection of ac4C modification sites. However, the known ac4C modification sites are too few to train an accurate and stable prediction model. This study introduces GANSamples-ac4C, a novel framework that combines transfer learning and a generative adversarial network (GAN) to generate synthetic RNA sequences for training a better ac4C modification site prediction model. Comparative analysis reveals that GANSamples-ac4C outperforms existing state-of-the-art methods in identifying ac4C sites. Moreover, our results underscore the potential of synthetic data in mitigating data scarcity for biological sequence prediction tasks. Another major advantage of GANSamples-ac4C is its interpretable decision logic. Multi-faceted interpretability analyses detect key regions in the ac4C sequences that influence the discrimination between positive and negative samples, a pronounced enrichment of G in these regions, and ac4C-associated motifs. These findings may offer novel insights for ac4C research. The GANSamples-ac4C framework and its source code are publicly accessible at http://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Fei Li
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Jiale Zhang
  - College of Software, Jilin University, Changchun, Jilin, 130012, China
- Kewei Li
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Yu Peng
  - College of Software, Jilin University, Changchun, Jilin, 130012, China
- Haotian Zhang
  - College of Software, Jilin University, Changchun, Jilin, 130012, China
- Yiping Xu
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Yue Yu
  - College of Software, Jilin University, Changchun, Jilin, 130012, China
- Yuteng Zhang
  - College of Software, Jilin University, Changchun, Jilin, 130012, China
- Zewen Liu
  - College of Software, Jilin University, Changchun, Jilin, 130012, China
- Ying Wang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Lan Huang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Fengfeng Zhou
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China; School of Biology and Engineering, Guizhou Medical University, Guiyang, 550025, Guizhou, China

10
Khan R, Xiao C, Liu Y, Tian J, Chen Z, Su L, Li D, Hassan H, Li H, Xie W, Zhong W, Huang B. Transformative Deep Neural Network Approaches in Kidney Ultrasound Segmentation: Empirical Validation with an Annotated Dataset. Interdiscip Sci 2024; 16:439-454. [PMID: 38413547 DOI: 10.1007/s12539-024-00620-3] [Received: 07/04/2023] [Revised: 01/06/2024] [Accepted: 02/05/2024] [Indexed: 02/29/2024]
Abstract
Kidney ultrasound (US) images are primarily employed for diagnosing different renal diseases. A key task is renal localization and detection, which can be carried out by segmenting the kidney US images. However, kidney segmentation from US images is challenging due to low contrast, speckle noise, fluid, variations in kidney shape, and modality artifacts. Moreover, well-annotated US datasets for renal segmentation and detection are scarce. This study aims to build a novel, well-annotated dataset containing 44,880 US images. In addition, we propose a novel training scheme that utilizes the encoder and decoder parts of a state-of-the-art segmentation algorithm. In the pre-processing step, pixel intensity normalization improves contrast and facilitates model convergence. The modified encoder-decoder architecture improves pyramid-shaped hole pooling, cascaded multiple-hole convolutions, and batch normalization. The pre-processing step gradually reconstructs spatial information, including the capture of complete object boundaries, and the post-processing module with a concave curvature reduces the false positive rate of the results. We present benchmark findings to validate the quality of the proposed training scheme and dataset. We applied six evaluation metrics and several baseline segmentation approaches to our novel kidney US dataset. Among the evaluated models, DeepLabv3+ performed best, achieving dice, Hausdorff distance 95, accuracy, specificity, average symmetric surface distance, and recall scores of 89.76%, 9.91, 98.14%, 98.83%, 3.03, and 90.68%, respectively. The proposed training strategy aids state-of-the-art segmentation models, resulting in better-segmented predictions. Furthermore, the large, well-annotated public kidney US dataset will serve as a valuable baseline source for future medical image analysis research.
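Four of the six metrics reported above (dice, accuracy, specificity, and recall) are overlap counts on binary masks and can be sketched briefly. The function name, the flattened-list mask format, and the convention of returning 1.0 when a denominator is empty are assumptions for illustration; the two distance-based metrics are not covered.

```python
def segmentation_scores(pred, target):
    """Compute overlap metrics for two binary masks given as flat sequences.

    pred, target: equal-length sequences of 0/1 pixel labels
    (hypothetical input format chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Assumed convention: an empty denominator counts as a perfect score.
        return num / den if den else 1.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "specificity": ratio(tn, tn + fp),
        "recall": ratio(tp, tp + fn),
    }


# Toy masks (assumed): six pixels, two true positives.
toy_scores = segmentation_scores([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

Dice weights the overlap twice relative to the errors, so it penalizes both false positives and false negatives, whereas recall ignores false positives and specificity ignores false negatives.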
Collapse
Affiliation(s)
- Rashid Khan
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518188, China
- College of Applied Sciences, Shenzhen University, Shenzhen, 518060, China
- Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University Health Science Center, Shenzhen, 518060, China
| | - Chuda Xiao
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518188, China
- Wuerzburg Dynamics Inc., Shenzhen, 518188, China
| | - Yang Liu
- Department of Urology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, 510120, China
| | - Jinyu Tian
- Wuerzburg Dynamics Inc., Shenzhen, 518188, China
| | - Zhuo Chen
- Wuerzburg Dynamics Inc., Shenzhen, 518188, China
| | - Liyilei Su
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518188, China
- College of Applied Sciences, Shenzhen University, Shenzhen, 518060, China
- Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University Health Science Center, Shenzhen, 518060, China
| | - Dan Li
- Wuerzburg Dynamics Inc., Shenzhen, 518188, China
| | - Haseeb Hassan
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518188, China
| | - Haoyu Li
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518188, China
| | - Weiguo Xie
- Wuerzburg Dynamics Inc., Shenzhen, 518188, China
| | - Wen Zhong
- Department of Urology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, 510120, China.
| | - Bingding Huang
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518188, China
|
11
|
Zhao L, Hao R, Chai Z, Fu W, Yang W, Li C, Liu Q, Jiang Y. DeepOCR: A multi-species deep-learning framework for accurate identification of open chromatin regions in livestock. Comput Biol Chem 2024; 110:108077. [PMID: 38691895 DOI: 10.1016/j.compbiolchem.2024.108077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 03/27/2024] [Accepted: 04/16/2024] [Indexed: 05/03/2024]
Abstract
A wealth of experimental evidence has suggested that open chromatin regions (OCRs) are involved in many critical biological activities, such as DNA replication, enhancer activity, and gene transcription. Accurately identifying OCRs in livestock species can provide critical insights into the distribution and characteristics of OCRs for disease treatment in livestock, thereby improving animal welfare. However, most current machine-learning methods for OCR prediction were originally designed for a limited number of model organisms, such as humans, and thus their performance on non-model organisms, specifically livestock, is often unsatisfactory. To bridge this gap, we propose DeepOCR, a lightweight depth-separable residual network model for predicting OCRs in livestock, including chicken, cattle, and sheep. DeepOCR integrates a single convolution layer and two improved residue structure blocks to extract and learn important features from the input DNA sequences. A fully connected layer was also employed to further process the extracted features and improve the robustness of the entire network. Our benchmarking experiments demonstrated superior prediction performance of DeepOCR compared to state-of-the-art approaches on testing datasets of the three species. The source code of DeepOCR is freely available for academic purposes at https://github.com/jasonzhao371/DeepOCR/. We anticipate that DeepOCR will serve as a practical and reliable computational tool for OCR-related studies in livestock species.
Affiliation(s)
- Liangwei Zhao
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Ran Hao
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Ziyi Chai
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Weiwei Fu
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, Gansu 730020, China
| | - Wei Yang
- National Clinical Research Center for Infectious Diseases, Shenzhen Third People's Hospital, Shenzhen 518112, China
| | - Chen Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
| | - Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling 712100, China.
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China; Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China.
|
12
|
Shi Z, Wu H. CTPredictor: A comprehensive and robust framework for predicting cell types by integrating multi-scale features from single-cell Hi-C data. Comput Biol Med 2024; 173:108336. [PMID: 38513390 DOI: 10.1016/j.compbiomed.2024.108336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 03/01/2024] [Accepted: 03/17/2024] [Indexed: 03/23/2024]
Abstract
Single-cell Hi-C (scHi-C) has emerged as a powerful technology for deciphering cell-to-cell variability in three-dimensional (3D) chromatin organization, providing insights into genome-wide chromatin interactions and their correlation with cellular functions. Nevertheless, the accurate identification of cell types across different datasets remains a formidable challenge, hindering comprehensive investigations into genome structure. In response, we introduce CTPredictor, an innovative computational method that integrates multi-scale features to accurately predict cell types in various datasets. CTPredictor strategically incorporates three distinct feature sets, namely, small intra-domain contact probability (SICP), smoothed small intra-domain contact probability (SSICP), and smoothed bin contact probability (SBCP). The resulting fusion classification model significantly enhances the accuracy of cell type prediction based on single-cell Hi-C data (scHi-C). Rigorous benchmarking against established methods and three conventional machine learning approaches demonstrates the robust performance of CTPredictor, positioning it as an advanced tool for cell type prediction within scHi-C data. Beyond its prediction capabilities, CTPredictor holds promise in illuminating 3D genome structures and their functional significance across a wide array of biological processes.
Affiliation(s)
- Zhenqi Shi
- School of Software, Shandong University, 250100, Jinan, China
| | - Hao Wu
- School of Software, Shandong University, 250100, Jinan, China.
|
13
|
Rana V, Peng J, Pan C, Lyu H, Cheng A, Kim M, Milenkovic O. Interpretable online network dictionary learning for inferring long-range chromatin interactions. PLoS Comput Biol 2024; 20:e1012095. [PMID: 38753877 PMCID: PMC11135774 DOI: 10.1371/journal.pcbi.1012095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 05/29/2024] [Accepted: 04/20/2024] [Indexed: 05/18/2024] Open
Abstract
Dictionary learning (DL), implemented via matrix factorization (MF), is commonly used in computational biology to tackle ubiquitous clustering problems. The method is favored due to its conceptual simplicity and relatively low computational complexity. However, DL algorithms produce results that lack interpretability in terms of real biological data. Additionally, they are not optimized for graph-structured data and hence often fail to handle them in a scalable manner. In order to address these limitations, we propose a novel DL algorithm called online convex network dictionary learning (online cvxNDL). Unlike classical DL algorithms, online cvxNDL is implemented via MF and designed to handle extremely large datasets by virtue of its online nature. Importantly, it enables the interpretation of dictionary elements, which serve as cluster representatives, through convex combinations of real measurements. Moreover, the algorithm can be applied to data with a network structure by incorporating specialized subnetwork sampling techniques. To demonstrate the utility of our approach, we apply cvxNDL to 3D-genome RNAPII ChIA-Drop data with the goal of identifying important long-range interaction patterns (long-range dictionary elements). ChIA-Drop probes higher-order interactions, and produces data in the form of hypergraphs whose nodes represent genomic fragments. The hyperedges represent observed physical contacts. Our hypergraph model analysis has the objective of creating an interpretable dictionary of long-range interaction patterns that accurately represent global chromatin physical contact maps. Through the use of dictionary information, one can also associate the contact maps with RNA transcripts and infer cellular functions. To accomplish the task at hand, we focus on RNAPII-enriched ChIA-Drop data from Drosophila melanogaster S2 cell lines. Our results offer two key insights.
First, we demonstrate that online cvxNDL retains the accuracy of classical DL (MF) methods while simultaneously ensuring unique interpretability and scalability. Second, we identify distinct collections of proximal and distal interaction patterns involving chromatin elements shared by related processes across different chromosomes, as well as patterns unique to specific chromosomes. To associate the dictionary elements with biological properties of the corresponding chromatin regions, we employ Gene Ontology (GO) enrichment analysis and perform multiple RNA coexpression studies.
Affiliation(s)
- Vishal Rana
- Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Illinois, United States of America
| | - Jianhao Peng
- Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Illinois, United States of America
| | - Chao Pan
- Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Illinois, United States of America
| | - Hanbaek Lyu
- Department of Mathematics, University of Wisconsin - Madison, Madison, Wisconsin, United States of America
| | - Albert Cheng
- School of Biological and Health Systems Engineering, Arizona State University, Phoenix, Arizona, United States of America
| | - Minji Kim
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Olgica Milenkovic
- Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Illinois, United States of America
|
14
|
Yang G, Li J, Hu J, Shi JY. Recognition of cyanobacteria promoters via Siamese network-based contrastive learning under novel non-promoter generation. Brief Bioinform 2024; 25:bbae193. [PMID: 38701419 PMCID: PMC11066903 DOI: 10.1093/bib/bbae193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/08/2024] [Accepted: 04/05/2024] [Indexed: 05/05/2024] Open
Abstract
It is a vital step to recognize cyanobacteria promoters on a genome-wide scale. Computational methods are promising to assist in difficult biological identification. When building recognition models, these methods rely on non-promoter generation to cope with the lack of real non-promoters. Nevertheless, the factitious significant difference between promoters and non-promoters causes over-optimistic prediction. Moreover, designed for E. coli or B. subtilis, existing methods cannot uncover novel, distinct motifs among cyanobacterial promoters. To address these issues, this work first proposes a novel non-promoter generation strategy called phantom sampling, which can eliminate the factitious difference between promoters and generated non-promoters. Furthermore, it elaborates a novel promoter prediction model based on the Siamese network (SiamProm), which can amplify the hidden difference between promoters and non-promoters through a joint characterization of global associations, upstream and downstream contexts, and neighboring associations w.r.t. k-mer tokens. The comparison with state-of-the-art methods demonstrates the superiority of our phantom sampling and SiamProm. Both comprehensive ablation studies and feature space illustrations also validate the effectiveness of the Siamese network and its components. More importantly, SiamProm, upon our phantom sampling, finds a novel cyanobacterial promoter motif ('GCGATCGC'), which is palindrome-patterned, content-conserved, but position-shifted.
Affiliation(s)
- Guang Yang
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
| | - Jianing Li
- School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
| | - Jinlu Hu
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
| | - Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
|
15
|
Zhang Y, Zhang P, Wu H. Enhancer-MDLF: a novel deep learning framework for identifying cell-specific enhancers. Brief Bioinform 2024; 25:bbae083. [PMID: 38485768 PMCID: PMC10938904 DOI: 10.1093/bib/bbae083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 01/27/2024] [Accepted: 02/07/2024] [Indexed: 03/18/2024] Open
Abstract
Enhancers, noncoding DNA fragments, play a pivotal role in gene regulation, facilitating gene transcription. Identifying enhancers is crucial for understanding genomic regulatory mechanisms, pinpointing key elements and investigating networks governing gene expression and disease-related mechanisms. Existing enhancer identification methods exhibit limitations, prompting the development of our novel multi-input deep learning framework, termed Enhancer-MDLF. Experimental results illustrate that Enhancer-MDLF outperforms the previous method, Enhancer-IF, across eight distinct human cell lines and exhibits superior performance on generic enhancer datasets and enhancer-promoter datasets, affirming the robustness of Enhancer-MDLF. Additionally, we introduce transfer learning to provide an effective and potential solution to address the prediction challenges posed by enhancer specificity. Furthermore, we utilize model interpretation to identify transcription factor binding site motifs that may be associated with enhancer regions, with important implications for facilitating the study of enhancer regulatory mechanisms. The source code is openly accessible at https://github.com/HaoWuLab-Bioinformatics/Enhancer-MDLF.
Affiliation(s)
- Yao Zhang
- School of Software, Shandong University, Jinan, 250100, Shandong, China
| | - Pengyu Zhang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Hao Wu
- School of Software, Shandong University, Jinan, 250100, Shandong, China
|
16
|
Song C, Zhang G, Mu X, Feng C, Zhang Q, Song S, Zhang Y, Yin M, Zhang H, Tang H, Li C. eRNAbase: a comprehensive database for decoding the regulatory eRNAs in human and mouse. Nucleic Acids Res 2024; 52:D81-D91. [PMID: 37889077 PMCID: PMC10767853 DOI: 10.1093/nar/gkad925] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/14/2023] [Revised: 09/26/2023] [Accepted: 10/12/2023] [Indexed: 10/28/2023] Open
Abstract
Enhancer RNAs (eRNAs) transcribed from distal active enhancers serve as key regulators in gene transcriptional regulation. The accumulation of eRNAs from multiple sequencing assays has led to an urgent need to comprehensively collect and process these data to illustrate the regulatory landscape of eRNAs. To address this need, we developed the eRNAbase (http://bio.liclab.net/eRNAbase/index.php) to store the massive available resources of human and mouse eRNAs and provide comprehensive annotation and analyses for eRNAs. The current version of eRNAbase cataloged 10 399 928 eRNAs from 1012 samples, including 858 human samples and 154 mouse samples. These eRNAs were first identified and uniformly processed from 14 eRNA-related experiment types manually collected from GEO/SRA and ENCODE. Importantly, the eRNAbase provides detailed and abundant (epi)genetic annotations in eRNA regions, such as super enhancers, enhancers, common single nucleotide polymorphisms, expression quantitative trait loci, transcription factor binding sites, CRISPR/Cas9 target sites, DNase I hypersensitivity sites, chromatin accessibility regions, methylation sites, chromatin interactions regions, topologically associating domains and RNA spatial interactions. Furthermore, the eRNAbase provides users with three novel analyses including eRNA-mediated pathway regulatory analysis, eRNA-based variation interpretation analysis and eRNA-mediated TF-target gene analysis. Hence, eRNAbase is a powerful platform to query, browse and visualize regulatory cues associated with eRNAs.
Affiliation(s)
- Chao Song
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Guorui Zhang
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Xinxin Mu
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Chenchen Feng
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Qinyi Zhang
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Shuang Song
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Yuexin Zhang
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Mingxue Yin
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Hang Zhang
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Huifang Tang
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Clinical Research Center for Myocardial Injury in Hunan Province, Hengyang, Hunan, 421001, China
| | - Chunquan Li
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Maternal and Child Health Care Hospital, National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
|
17
|
Bose S, Saha S, Goswami H, Shanmugam G, Sarkar K. Involvement of CCCTC-binding factor in epigenetic regulation of cancer. Mol Biol Rep 2023; 50:10383-10398. [PMID: 37840067 DOI: 10.1007/s11033-023-08879-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/13/2023] [Accepted: 10/03/2023] [Indexed: 10/17/2023]
Abstract
A major global health burden continues to be borne by the complex and multifaceted disease of cancer. Epigenetic changes, which are essential for the emergence and spread of cancer, have drawn a huge amount of attention recently. The CCCTC-binding factor (CTCF), which takes part in a wide range of cellular processes including genomic imprinting, X chromosome inactivation, 3D chromatin architecture, local modifications of histone, and RNA polymerase II-mediated gene transcription, stands out among the diverse array of epigenetic regulators. CTCF not only functions as an architectural protein but also modulates DNA methylation and histone modifications. Epigenetic regulation of cancer has already been the focus of plenty of studies. Understanding the role of CTCF in the cancer epigenetic landscape may lead to the development of novel targeted therapeutic strategies for cancer. CTCF has already earned its status as a tumor suppressor gene by acting like a homeostatic regulator of genome integrity and function. Moreover, CTCF has a direct effect on many important transcriptional regulators that control the cell cycle, apoptosis, senescence, and differentiation. As we learn more about CTCF-mediated epigenetic modifications and transcriptional regulations, the possibility of utilizing CTCF as a diagnostic marker and therapeutic target for cancer will also increase. Thus, the current review intends to promote personalized and precision-based therapeutics for cancer patients by shedding light on the complex interplay between CTCF and epigenetic processes.
Affiliation(s)
- Sayani Bose
- Department of Biotechnology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, 603203, India
| | - Srawsta Saha
- Department of Biotechnology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, 603203, India
| | - Harsita Goswami
- Department of Biotechnology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, 603203, India
| | - Geetha Shanmugam
- Department of Biotechnology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, 603203, India
| | - Koustav Sarkar
- Department of Biotechnology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, 603203, India.
|
18
|
Peng L, Huang L, Su Q, Tian G, Chen M, Han G. LDA-VGHB: identifying potential lncRNA-disease associations with singular value decomposition, variational graph auto-encoder and heterogeneous Newton boosting machine. Brief Bioinform 2023; 25:bbad466. [PMID: 38127089 PMCID: PMC10734633 DOI: 10.1093/bib/bbad466] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/05/2023] [Revised: 10/05/2023] [Accepted: 11/25/2023] [Indexed: 12/23/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) participate in various biological processes and have close linkages with diseases. In vivo and in vitro experiments have validated many associations between lncRNAs and diseases. However, biological experiments are time-consuming and expensive. Here, we introduce LDA-VGHB, an lncRNA-disease association (LDA) identification framework, by incorporating feature extraction based on singular value decomposition and variational graph autoencoder and LDA classification based on heterogeneous Newton boosting machine. LDA-VGHB was compared with four classical LDA prediction methods (i.e. SDLDA, LDNFSGB, IPCARF and LDASR) and four popular boosting models (XGBoost, AdaBoost, CatBoost and LightGBM) under 5-fold cross-validations on lncRNAs, diseases, lncRNA-disease pairs and independent lncRNAs and independent diseases, respectively. It greatly outperformed the other methods with its prominent performance under four different cross-validations on the lncRNADisease and MNDR databases. We further investigated potential lncRNAs for lung cancer, breast cancer, colorectal cancer and kidney neoplasms and inferred the top 20 lncRNAs associated with them among all their unobserved lncRNAs. The results showed that most of the predicted top 20 lncRNAs have been verified by biomedical experiments provided by the Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease databases as well as publications. We found that HAR1A, KCNQ1DN, ZFAT-AS1 and HAR1B could associate with lung cancer, breast cancer, colorectal cancer and kidney neoplasms, respectively. The results need further biological experimental validation. We foresee that LDA-VGHB is capable of identifying possible lncRNAs for complex diseases. LDA-VGHB is publicly available at https://github.com/plhhnu/LDA-VGHB.
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, 412007, Hunan, China
- College of Life Sciences and Chemistry, Hunan University of Technology, 412007, Hunan, China
| | - Liangliang Huang
- School of Computer Science, Hunan University of Technology, 412007, Hunan, China
| | - Qiongli Su
- Department of Pharmacy, the Affiliated Zhuzhou Hospital Xiangya Medical College CSU, 412007, Hunan, China
| | - Geng Tian
- Geneis (Beijing) Co. Ltd, China, 100102, Beijing, China
| | - Min Chen
- School of Computer Science, Hunan Institute of Technology, 421002, No. 18 Henghua Road, Zhuhui District, Hengyang, Hunan, China
| | - Guosheng Han
- School of Mathematics and Computational Science, Xiangtan University, 411105, Yuhu District, Xiangtan, Hunan, China
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, 411105, Yuhu District, Xiangtan, Hunan, China
|
19
|
Liang X, Cao L, Chen H, Wang L, Wang Y, Fu L, Tan X, Chen E, Ding Y, Tang J. A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study. Brief Bioinform 2023; 25:bbad497. [PMID: 38168839 PMCID: PMC10782910 DOI: 10.1093/bib/bbad497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 10/13/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses. The performance of clustering considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate cell label sets that assign each cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variations in cell-type identification based on the generated label sets. Currently, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms including four deep learning-based clustering algorithms and commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices based on 10 different scRNA-seq benchmarks to systematically evaluate their clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster based on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tools varied across different datasets, suggesting sensitivity to certain dataset characteristics. Notably, DESC exhibited promising results for cell subtype identification and capturing cellular heterogeneity. In addition, SC3 requires more memory and exhibits slower computation speed compared to other algorithms for the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
Affiliation(s)
- Xiao Liang
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lijie Cao
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Hao Chen
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lidan Wang
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Yangyun Wang
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lijuan Fu
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
- Department of Pharmacology, Academician Workstation, Changsha Medical University, Changsha 410219, China
| | - Xiaqin Tan
- The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
| | - Enxiang Chen
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Yubin Ding
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Jing Tang
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| |
Collapse
|
20
|
Wu H, Zhou B, Zhou H, Zhang P, Wang M. Be-1DCNN: a neural network model for chromatin loop prediction based on bagging ensemble learning. Brief Funct Genomics 2023; 22:475-484. [PMID: 37133976 DOI: 10.1093/bfgp/elad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 03/10/2023] [Accepted: 03/29/2023] [Indexed: 05/04/2023] Open
Abstract
The chromatin loops in the three-dimensional (3D) structure of chromosomes are essential for the regulation of gene expression. Despite the fact that high-throughput chromatin capture techniques can identify the 3D structure of chromosomes, chromatin loop detection utilizing biological experiments is arduous and time-consuming. Therefore, a computational method is required to detect chromatin loops. Deep neural networks can form complex representations of Hi-C data and provide the possibility of processing biological datasets. Therefore, we propose a bagging ensemble one-dimensional convolutional neural network (Be-1DCNN) to detect chromatin loops from genome-wide Hi-C maps. First, to obtain accurate and reliable chromatin loops in genome-wide contact maps, the bagging ensemble learning method is utilized to synthesize the prediction results of multiple 1DCNN models. Second, each 1DCNN model consists of three 1D convolutional layers for extracting high-dimensional features from input samples and one dense layer for producing the prediction results. Finally, the prediction results of Be-1DCNN are compared to those of the existing models. The experimental results indicate that Be-1DCNN predicts high-quality chromatin loops and outperforms the state-of-the-art methods using the same evaluation metrics. The source code of Be-1DCNN is available for free at https://github.com/HaoWuLab-Bioinformatics/Be1DCNN.
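The bagging step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base learner is a hypothetical nearest-centroid classifier standing in for each 1D CNN, and combining the models by majority vote is an assumption, since the abstract only says that the predictions of multiple models are synthesized. All function names are illustrative.

```python
import random
from collections import Counter


def fit_centroid_model(samples, labels):
    """Hypothetical stand-in for one 1D CNN: a nearest-centroid classifier."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        # Per-feature mean of all rows belonging to this class.
        centroids[cls] = [sum(col) / len(col) for col in zip(*rows)]

    def predict(x):
        # Return the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def fit_bagging(samples, labels, n_models=5, seed=0):
    """Train n_models base learners, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(samples)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            fit_centroid_model([samples[i] for i in idx], [labels[i] for i in idx])
        )
    return models


def predict_bagging(models, x):
    """Combine base predictions by majority vote (an assumed aggregation rule)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Using an odd number of base models avoids tied votes in the two-class case (loop versus non-loop). Averaging predicted probabilities would be an equally plausible aggregation rule.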
Affiliation(s)
- Hao Wu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
  - School of Software, Shandong University, Jinan, 250101 Shandong, China
- Bing Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Haoru Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Pengyu Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Meili Wang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
21
Yu X, Hu J, Zhang Y. SNN6mA: Improved DNA N6-methyladenine site prediction using Siamese network-based feature embedding. Comput Biol Med 2023; 166:107533. [PMID: 37793205 DOI: 10.1016/j.compbiomed.2023.107533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 09/01/2023] [Accepted: 09/27/2023] [Indexed: 10/06/2023]
Abstract
DNA N6-methyladenine (6mA) is one of the most common and abundant modifications, which plays essential roles in various biological processes and cellular functions. Therefore, the accurate identification of DNA 6mA sites is of great importance for a better understanding of its regulatory mechanisms and biological functions. Although significant progress has been made, there is still room for further improvement in 6mA site prediction in DNA sequences. In this study, we report a smart but accurate 6mA predictor, termed SNN6mA, using a Siamese network. To be specific, DNA segments are first encoded into feature vectors using the one-hot encoding scheme; then, these original feature vectors are mapped to a low-dimensional embedding space derived from the Siamese network to capture more discriminative features; finally, the obtained low-dimensional features are fed to a fully connected neural network to perform the final prediction. Stringent benchmarking tests on the datasets of two species demonstrated that the proposed SNN6mA is superior to the state-of-the-art 6mA predictors. Detailed data analyses show that the major advantage of SNN6mA lies in the utilization of the Siamese network, which can map the original features into a low-dimensional embedding space with more discriminative capability. In summary, the proposed SNN6mA is the first attempt to use a Siamese network for 6mA site prediction and could be easily extended to predict other types of modifications. The codes and datasets used in the study are freely available at https://github.com/YuXuan-Glasgow/SNN6mA for academic use.
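The first step named in this abstract, one-hot encoding of a DNA segment, can be sketched as follows. This is an illustrative sketch only: the column order A, C, G, T and the choice to map any other symbol (such as N) to an all-zero row are assumptions, and the function name is hypothetical.

```python
NUCLEOTIDES = "ACGT"  # assumed column order


def one_hot_encode(segment):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Bases outside A/C/G/T (for example N) become all-zero rows, which is an
    assumed convention rather than one stated in the abstract.
    """
    rows = []
    for base in segment.upper():
        rows.append([1 if base == nucleotide else 0 for nucleotide in NUCLEOTIDES])
    return rows
```

A segment of length L therefore becomes an L x 4 matrix, which is the kind of input that an embedding network would then map to a lower-dimensional space.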
Affiliation(s)
- Xuan Yu
  - Glasgow College, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Ying Zhang
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
22
Peng L, He X, Peng X, Li Z, Zhang L. STGNNks: Identifying cell types in spatial transcriptomics data based on graph neural network, denoising auto-encoder, and k-sums clustering. Comput Biol Med 2023; 166:107440. [PMID: 37738898 DOI: 10.1016/j.compbiomed.2023.107440] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/12/2023] [Revised: 08/15/2023] [Accepted: 08/29/2023] [Indexed: 09/24/2023]
Abstract
BACKGROUND Spatial transcriptomics technologies fully utilize spatial location information, tissue morphological features, and transcriptional profiles. Integrating these data can greatly advance our understanding of cell biology in its morphological context. METHODS We developed an innovative spatial clustering method called STGNNks by combining a graph neural network, a denoising auto-encoder, and k-sums clustering. First, spatially resolved transcriptomics data are preprocessed and a hybrid adjacency matrix is constructed. Next, gene expression and spatial context are integrated to learn the spots' embedding features with a deep graph infomax-based graph convolutional network. Third, the learned features are mapped to a low-dimensional space through a zero-inflated negative binomial (ZINB)-based denoising auto-encoder. Fourth, a k-sums clustering algorithm is developed to identify spatial domains by combining the k-means and ratio-cut clustering algorithms. Finally, it implements spatial trajectory inference, spatially variable gene identification, and differentially expressed gene detection based on the pseudo-space-time method on six 10x Genomics Visium datasets. RESULTS We compared our proposed STGNNks method with five other spatial clustering methods: CCST, Seurat, stLearn, Scanpy and SEDR. For the first time, four internal indicators from machine learning, namely the silhouette coefficient, the Davies-Bouldin index, the Calinski-Harabasz index, and the S_Dbw index, were used to compare the clustering performance of STGNNks with that of CCST, Seurat, stLearn, Scanpy and SEDR on five spatial transcriptomics datasets without labels (i.e., Adult Mouse Brain (FFPE), Adult Mouse Kidney (FFPE), Human Breast Cancer (Block A Section 2), Human Breast Cancer (FFPE), and Human Lymph Node). In addition, two external indicators, the adjusted Rand index (ARI) and normalized mutual information (NMI), were applied to evaluate the performance of the above six methods on Human Breast Cancer (Block A Section 1) with real labels. The comparison experiments showed that STGNNks obtained the smallest Davies-Bouldin and S_Dbw values and the largest silhouette coefficient, Calinski-Harabasz, ARI and NMI values, significantly outperforming the above five spatial transcriptomics analysis algorithms. Furthermore, we detected the top six spatially variable genes and the top five differentially expressed genes in each cluster on the above five unlabeled datasets. The pseudo-space-time tree plot with hierarchical layout demonstrated a flow of Human Breast Cancer (Block A Section 1) progression in three clades branching from three invasive ductal carcinoma regions to multiple ductal carcinoma in situ sub-clusters. CONCLUSION We anticipate that STGNNks can efficiently improve spatial transcriptomics data analysis and further boost the diagnosis and therapy of related diseases. The codes are publicly available at https://github.com/plhhnu/STGNNks.
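For readers unfamiliar with the external indicators mentioned in this abstract, the adjusted Rand index can be computed from a contingency table of the two labelings. The sketch below uses the standard pair-counting definition and only the Python standard library. It is not taken from the STGNNks code, and the function name and the handling of degenerate inputs are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as trivial agreement (assumed)

    # Pairs of items that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a cluster within each labeling separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2           # upper bound on the agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings (assumed)
    return (sum_cells - expected) / (maximum - expected)
```

The index is 1 for identical partitions regardless of how the clusters are named, near 0 for random agreement, and can be negative when the agreement is worse than chance.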
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
  - College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Xianzhi He
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Xinhuai Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China
23
Chen W, Wang H, Liang C. Deep multi-view contrastive learning for cancer subtype identification. Brief Bioinform 2023; 24:bbad282. [PMID: 37539822 DOI: 10.1093/bib/bbad282] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/22/2023] [Revised: 05/29/2023] [Accepted: 07/19/2023] [Indexed: 08/05/2023] Open
Abstract
Cancer heterogeneity has posed great challenges in exploring precise therapeutic strategies for cancer treatment. The identification of cancer subtypes aims to detect patients with distinct molecular profiles and thus could provide new clues on effective clinical therapies. While great efforts have been made, it remains challenging to develop powerful computational methods that can efficiently integrate multi-omics datasets for the task. In this paper, we propose a novel self-supervised learning model called Deep Multi-view Contrastive Learning (DMCL) for cancer subtype identification. Specifically, by incorporating the reconstruction loss, contrastive loss and clustering loss into a unified framework, our model simultaneously encodes the sample discriminative information into the extracted feature representations and well preserves the sample cluster structures in the embedded space. Moreover, DMCL is an end-to-end framework where the cancer subtypes could be directly obtained from the model outputs. We compare DMCL with eight alternatives ranging from classic cancer subtype identification methods to recently developed state-of-the-art systems on 10 widely used cancer multi-omics datasets as well as an integrated dataset, and the experimental results validate the superior performance of our method. We further conduct a case study on liver cancer and the analysis results indicate that different subtypes might have different responses to the selected chemotherapeutic drugs.
Affiliation(s)
- Wenlan Chen
  - School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
- Hong Wang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
24
Peng L, Huang L, Tian G, Wu Y, Li G, Cao J, Wang P, Li Z, Duan L. Predicting potential microbe-disease associations with graph attention autoencoder, positive-unlabeled learning, and deep neural network. Front Microbiol 2023; 14:1244527. [PMID: 37789848 PMCID: PMC10543759 DOI: 10.3389/fmicb.2023.1244527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 08/16/2023] [Indexed: 10/05/2023] Open
Abstract
Background Microbes have dense linkages with human diseases. Balanced microorganisms protect the human body against physiological disorders, while unbalanced ones may cause diseases. Thus, identification of potential associations between microbes and diseases can contribute to the diagnosis and therapy of various complex diseases. Biological experiments for microbe-disease association (MDA) prediction are expensive, time-consuming, and labor-intensive. Methods We developed a computational MDA prediction method called GPUDMDA by combining a graph attention autoencoder, positive-unlabeled learning, and a deep neural network. First, GPUDMDA computes disease similarity and microbe similarity matrices by integrating their functional similarity and Gaussian association profile kernel similarity, respectively. Next, it learns the feature representation of each microbe-disease pair using a graph attention autoencoder based on the obtained disease similarity and microbe similarity matrices. Third, it selects a few reliable negative MDAs based on positive-unlabeled learning. Finally, it takes the learned MDA features and the selected negative MDAs as inputs and designs a deep neural network to predict potential MDAs. Results GPUDMDA was compared with four state-of-the-art MDA identification models (i.e., MNNMDA, GATMDA, LRLSHMDA, and NTSHMDA) on the HMDAD and Disbiome databases under five-fold cross validations on microbes, diseases, and microbe-disease pairs. Under the three five-fold cross validations, GPUDMDA computed the best AUCs of 0.7121, 0.9454, and 0.9501 on the HMDAD database and 0.8372, 0.8908, and 0.8948 on the Disbiome database, respectively, outperforming the other four MDA prediction methods. Asthma is the most common chronic respiratory condition and affects ~339 million people worldwide. Inflammatory bowel disease is a class of chronic intestinal diseases, prevalent worldwide, that affect the gut, the gastrointestinal tract, and extraintestinal organs of patients. In particular, inflammatory bowel disease severely affects the growth and development of children. We used the proposed GPUDMDA method and found that Enterobacter hormaechei had potential associations with both asthma and inflammatory bowel disease, which need further biological experimental validation. Conclusion The proposed GPUDMDA demonstrated powerful MDA prediction ability. We anticipate that GPUDMDA will help screen therapeutic clues for microbe-related diseases.
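The abstract does not give a formula for the Gaussian association profile kernel similarity. A formulation often used in association-prediction work, assumed here for illustration, is K(i, j) = exp(-gamma * ||p_i - p_j||^2), where p_i is the binary association profile of entity i and gamma is a bandwidth divided by the mean squared norm of all profiles. The Python sketch below follows that assumed formulation; the function name and the `bandwidth` parameter are hypothetical and not taken from GPUDMDA.

```python
from math import exp


def gaussian_profile_kernel(profiles, bandwidth=1.0):
    """Pairwise Gaussian kernel similarity between association profiles.

    profiles: list of equal-length numeric rows, for example one row per
    microbe with a 1 for every disease it is known to be associated with.
    """
    squared_norms = [sum(v * v for v in row) for row in profiles]
    mean_norm = sum(squared_norms) / len(squared_norms)
    if mean_norm == 0:
        raise ValueError("all profiles are empty; the kernel width is undefined")
    gamma = bandwidth / mean_norm  # normalise the width by the mean profile norm

    return [
        [exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in profiles]
        for p in profiles
    ]
```

Entities with identical profiles receive a similarity of 1, and the similarity decays as the profiles share fewer associations. A second matrix computed on the transposed association matrix would give the similarity for the other entity type.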
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
  - College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
- Liangliang Huang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Geng Tian
  - Geneis (Beijing) Co. Ltd., Beijing, China
- Yan Wu
  - Geneis (Beijing) Co. Ltd., Beijing, China
- Guang Li
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
- Jianying Cao
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
- Peng Wang
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lian Duan
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
25
Zhang P, Wu H. IChrom-Deep: An Attention-Based Deep Learning Model for Identifying Chromatin Interactions. IEEE J Biomed Health Inform 2023; 27:4559-4568. [PMID: 37402191 DOI: 10.1109/jbhi.2023.3292299] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Indexed: 07/06/2023]
Abstract
Identification of chromatin interactions is crucial for advancing our knowledge of gene regulation. However, due to the limitations of high-throughput experimental techniques, there is an urgent need to develop computational methods for predicting chromatin interactions. In this study, we propose a novel attention-based deep learning model, termed IChrom-Deep, to identify chromatin interactions using sequence features and genomic features. The experimental results based on the datasets of three cell lines demonstrate that the IChrom-Deep achieves satisfactory performance and is superior to the previous methods. We also investigate the effect of DNA sequence and associated features and genomic features on chromatin interactions, and highlight the applicable scenarios of some features, such as sequence conservation and distance. Moreover, we identify a few genomic features that are extremely important across different cell lines, and IChrom-Deep achieves comparable performance with only these significant genomic features versus using all genomic features. It is believed that IChrom-Deep can serve as a useful tool for future studies that seek to identify chromatin interactions.
26
Liu T, Wang Z. DeepChIA-PET: Accurately predicting ChIA-PET from Hi-C and ChIP-seq with deep dilated networks. PLoS Comput Biol 2023; 19:e1011307. [PMID: 37440599 PMCID: PMC10368233 DOI: 10.1371/journal.pcbi.1011307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023] Open
Abstract
Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) can capture genome-wide chromatin interactions mediated by a specific DNA-associated protein. ChIA-PET experiments have been applied to explore the key roles of different protein factors in chromatin folding and transcription regulation. However, compared with widely available Hi-C and ChIP-seq data, there are not many ChIA-PET datasets available in the literature. A computational method that accurately predicts ChIA-PET interactions from Hi-C and ChIP-seq data is needed to save the effort of performing wet-lab experiments. Here we present DeepChIA-PET, a supervised deep learning approach that can accurately predict ChIA-PET interactions by learning the latent relationships between ChIA-PET and two widely used data types: Hi-C and ChIP-seq. We trained our deep models with CTCF-mediated ChIA-PET of GM12878 as ground truth, and the deep network contains 40 dilated residual convolutional blocks. We first showed that DeepChIA-PET with only Hi-C as input significantly outperforms Peakachu, another computational method for predicting ChIA-PET from Hi-C but using random forests. We next showed that adding ChIP-seq as one extra input does improve the classification performance of DeepChIA-PET, but Hi-C plays a more prominent role in DeepChIA-PET than ChIP-seq. Our evaluation results indicate that our learned models can accurately predict not only CTCF-mediated ChIA-PET in GM12878 and HeLa but also non-CTCF ChIA-PET interactions, including RNA polymerase II (RNAPII) ChIA-PET of GM12878, RAD21 ChIA-PET of GM12878, and RAD21 ChIA-PET of K562. In total, DeepChIA-PET is an accurate tool for predicting the ChIA-PET interactions mediated by various chromatin-associated proteins from different cell types.
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
- Zheng Wang
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
27
Su D, Xiong Y, Wei H, Wang S, Ke J, Liang P, Zhang H, Yu Y, Zuo Y, Yang L. Integrated analysis of ovarian cancer patients from prospective transcription factor activity reveals subtypes of prognostic significance. Heliyon 2023; 9:e16147. [PMID: 37215759 PMCID: PMC10199194 DOI: 10.1016/j.heliyon.2023.e16147] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/17/2023] [Revised: 05/04/2023] [Accepted: 05/07/2023] [Indexed: 05/24/2023] Open
Abstract
Transcription factors are protein molecules that act as regulators of gene expression. Aberrant activity of transcription factors can have a significant impact on tumor progression and metastasis in cancer patients. In this study, 868 immune-related transcription factors were identified from the transcription factor activity profile of 1823 ovarian cancer patients. The prognosis-related transcription factors were identified through univariate Cox analysis and random survival tree analysis, and two distinct clustering subtypes were subsequently derived based on these transcription factors. We assessed the clinical significance and genomic landscape of the two clustering subtypes and found statistically significant differences in prognosis, response to immunotherapy, and response to chemotherapy among ovarian cancer patients with different subtypes. Multi-scale Embedded Gene Co-expression Network Analysis was used to identify differential gene modules between the two clustering subtypes, which allowed us to conduct further analysis of biological pathways that exhibited significant differences between them. Finally, a ceRNA network was constructed to analyze lncRNA-miRNA-mRNA regulatory pairs with differential expression levels between the two clustering subtypes. We expect that our study may provide useful references for stratifying and treating patients with ovarian cancer.
Affiliation(s)
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yuqiang Xiong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Haodong Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Jiawei Ke
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Pengfei Liang
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Haoxin Zhang
  - Department of Gastrointestinal Oncology, Harbin Medical University Cancer Hospital, Harbin 150081, China
- Yao Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
  - Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
28
Wu H, Liu M, Zhang P, Zhang H. iEnhancer-SKNN: a stacking ensemble learning-based method for enhancer identification and classification using sequence information. Brief Funct Genomics 2023; 22:302-311. [PMID: 36715222 DOI: 10.1093/bfgp/elac057] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/10/2022] [Revised: 12/01/2022] [Accepted: 12/13/2022] [Indexed: 01/31/2023] Open
Abstract
Enhancers, a class of distal cis-regulatory elements located in the non-coding region of DNA, play a key role in gene regulation. It is difficult to identify enhancers from DNA sequence data because enhancers are freely distributed in the non-coding region, have no specific sequence features, and lie at a long distance from their target promoters. Therefore, this study presents a stacking ensemble learning method to accurately identify enhancers and classify them into strong and weak enhancers. First, we obtain the fusion feature matrix by fusing the four features Kmer, PseDNC, PCPseDNC and Z-Curve9. Second, five K-Nearest Neighbor (KNN) models with different parameters are trained as the base models, and the Logistic Regression algorithm is utilized as the meta-model. Third, the stacking ensemble learning strategy is utilized to construct a two-layer model based on the base models and meta-model to train on the preprocessed feature sets. The proposed method, named iEnhancer-SKNN, is a two-layer prediction model, in which the first layer predicts whether the given DNA sequences are enhancers or non-enhancers, and the second layer distinguishes whether the predicted enhancers are strong or weak enhancers. The performance of iEnhancer-SKNN is evaluated on the independent testing dataset, and the results show that the proposed method has better performance in predicting enhancers and their strength. In enhancer identification, iEnhancer-SKNN achieves an accuracy of 81.75%, an improvement of 1.35% to 8.75% compared with other predictors, and in enhancer classification, iEnhancer-SKNN achieves an accuracy of 80.50%, an improvement of 5.5% to 25.5% compared with other predictors. Moreover, we identify key transcription factor binding site motifs in the enhancer regions and further explore the biological functions of the enhancers and these key motifs. Source code and data can be downloaded from https://github.com/HaoWuLab-Bioinformatics/iEnhancer-SKNN.
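Of the four sequence features fused in this abstract, the Kmer feature is the simplest to illustrate: it is the normalized frequency of every length-k word in the sequence. The Python sketch below is an assumption-laden illustration rather than the iEnhancer-SKNN code. The default value of `k`, the lexicographic ordering of the k-mers, and the choice to skip windows containing symbols other than A, C, G and T are all assumptions.

```python
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Return the normalized frequency of every DNA k-mer, in lexicographic order.

    The resulting vector has 4**k entries. Windows containing characters other
    than A, C, G or T are skipped (an assumed convention).
    """
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    # Slide a window of width k across the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]
```

Feature vectors of this kind, concatenated with the other descriptors, would form the rows of a fused feature matrix that base classifiers such as KNN models are trained on.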
Affiliation(s)
- Hao Wu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
  - School of Software, Shandong University, Jinan, 250101, Shandong, China
- Mengdi Liu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Pengyu Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Hongming Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
29
Wang Y, Lian B, Zhang H, Zhong Y, He J, Wu F, Reinert K, Shang X, Yang H, Hu J. A multi-view latent variable model reveals cellular heterogeneity in complex tissues for paired multimodal single-cell data. Bioinformatics 2023; 39:btad005. [PMID: 36622018 PMCID: PMC9857983 DOI: 10.1093/bioinformatics/btad005] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/03/2022] [Revised: 12/27/2022] [Accepted: 01/06/2023] [Indexed: 01/10/2023] Open
Abstract
MOTIVATION Single-cell multimodal assays allow us to simultaneously measure two different molecular features of the same cell, enabling new insights into cellular heterogeneity, cell development and diseases. However, most existing methods suffer from inaccurate dimensionality reduction for the joint-modality data, hindering their discovery of novel or rare cell subpopulations. RESULTS Here, we present VIMCCA, a computational framework based on variational-assisted multi-view canonical correlation analysis to integrate paired multimodal single-cell data. Our statistical model uses a common latent variable to interpret the common source of variances in two different data modalities. Our approach jointly learns an inference model and two modality-specific non-linear models by leveraging variational inference and deep learning. We perform VIMCCA and compare it with 10 existing state-of-the-art algorithms on four paired multi-modal datasets sequenced by different protocols. Results demonstrate that VIMCCA facilitates integrating various types of joint-modality data, thus leading to more reliable and accurate downstream analysis. VIMCCA improves our ability to identify novel or rare cell subtypes compared to existing widely used methods. Besides, it can also facilitate inferring cell lineage based on joint-modality profiles. AVAILABILITY AND IMPLEMENTATION The VIMCCA algorithm has been implemented in our toolkit package scbean (≥0.5.0), and its code has been archived at https://github.com/jhu99/scbean under MIT license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuwei Wang
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Bin Lian
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Haohui Zhang
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Yuanke Zhong
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Jie He
  - Department of Biostatistics, School of Public Health, Peking University Health Science Center, Beijing 100191, China
- Fashuai Wu
  - Department of Orthopaedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Knut Reinert
  - Institut für Informatik, Freie Universität Berlin, 14195 Berlin, Germany
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Hui Yang
  - School of Life Science, Northwestern Polytechnical University, Shaanxi 710072, China
- Jialu Hu
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China