1
Zou HT, Ji BY, Xie XL. A multi-source molecular network representation model for protein-protein interactions prediction. Sci Rep 2024; 14:6184. [PMID: 38485942 PMCID: PMC10940665 DOI: 10.1038/s41598-024-56286-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024] Open
Abstract
The prediction of potential protein-protein interactions (PPIs) is a critical step in decoding diseases and understanding cellular mechanisms. Traditional biological experiments have identified plenty of potential PPIs in recent years, but this problem is still far from being solved. Hence, there is an urgent need to develop computational models with good performance and high efficiency to predict potential PPIs. In this study, we propose a multi-source molecular network representation learning model (called MultiPPIs) to predict potential protein-protein interactions. Specifically, we first extract the protein sequence features according to the physicochemical properties of amino acids by utilizing the auto covariance method. Second, a multi-source association network is constructed by integrating the known associations among miRNAs, proteins, lncRNAs, drugs, and diseases. The graph representation learning method, DeepWalk, is adopted to extract the multi-source association information of proteins with other biomolecules. In this way, the known protein-protein interaction pairs can be represented as a concatenation of the protein sequence and the multi-source association features of proteins. Finally, the Random Forest classifier and corresponding optimal parameters are used for training and prediction. In the results, MultiPPIs obtains an average prediction accuracy of 86.03% with a sensitivity of 82.69% and an AUC of 93.03% under five-fold cross-validation. The experimental results indicate that MultiPPIs has good prediction performance and provides valuable insights into the prediction of potential protein-protein interactions. MultiPPIs is freely available at https://github.com/jiboyalab/multiPPIs.
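The auto covariance step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common textbook form AC(lag) = 1/(n - lag) * sum_i (x_i - mean)(x_{i+lag} - mean) over one per-residue property; the function name, the `TOY_SCALE` values, and the example sequence are hypothetical and are not taken from the MultiPPIs code.

```python
from statistics import mean

# Made-up property scale for illustration only (NOT real physicochemical values).
TOY_SCALE = {"A": 1.0, "C": 2.0, "D": -1.0, "E": -2.0}


def auto_covariance(seq, scale, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for one per-residue property scale."""
    values = [scale[res] for res in seq]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mu = mean(values)
    features = []
    for lag in range(1, max_lag + 1):
        # Average product of centred values that are `lag` positions apart.
        total = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(n - lag))
        features.append(total / (n - lag))
    return features


# One feature per lag; a real pipeline would repeat this for each property scale
# and concatenate the results into a fixed-length vector.
features = auto_covariance("ACDEACDE", TOY_SCALE, max_lag=3)
```

Because the number of lags is fixed, sequences of different lengths map to vectors of the same size, which is what makes such features usable as classifier input.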
Affiliation(s)
- Hai-Tao Zou
  - College of Information Science and Engineering, Guilin University of Technology, Guilin, 541000, China
- Bo-Ya Ji
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
- Xiao-Lan Xie
  - College of Information Science and Engineering, Guilin University of Technology, Guilin, 541000, China

2
Gong M, He Y, Wang M, Zhang Y, Ding C. Interpretable single-cell transcription factor prediction based on deep learning with attention mechanism. Comput Biol Chem 2023; 106:107923. [PMID: 37598467 DOI: 10.1016/j.compbiolchem.2023.107923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 07/01/2023] [Accepted: 07/12/2023] [Indexed: 08/22/2023]
Abstract
Predicting the transcription factor binding site (TFBS) in the whole genome range is essential in exploring the rule of gene transcription control. Although many deep learning methods to predict TFBS have been proposed, predicting TFBS using single-cell ATAC-seq data and embedding attention mechanisms needs to be improved. To this end, we present IscPAM, an interpretable method based on deep learning with an attention mechanism to predict single-cell transcription factors. Our model adopts the convolution neural network to extract the data feature and optimize the pre-trained model. In particular, the model obtains faster training and prediction due to the embedded attention mechanism. For datasets, we take ATAC-seq, ChIP-seq, and DNA sequences data for the pre-trained model, and single-cell ATAC-seq data is used to predict the TF binding graph in the given cell. We verify the interpretability of the model through ablation experiments and sensitivity analysis. IscPAM can efficiently predict the combination of whole genome transcription factors in single cells and study cellular heterogeneity through chromatin accessibility of related diseases.
Affiliation(s)
- Meiqin Gong
  - West China Second University Hospital, Sichuan University, Chengdu 610041, China
- Yuchen He
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Maocheng Wang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Chunli Ding
  - Sichuan Institute of Computer Sciences, Chengdu 610041, China

3
Zhu X, Pang L, Ding X, Lan W, Meng S, Peng X. A Gene Correlation Measurement Method for Spatial Transcriptome Data Based on Partitioning and Distribution. J Comput Biol 2023; 30:877-888. [PMID: 37471241 DOI: 10.1089/cmb.2023.0108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/22/2023] Open
Abstract
Spatial transcriptome (ST) technology provides both the spatial location and transcriptional profile of spots, as well as tissue images. ST data can be utilized to construct gene regulatory networks, which can help identify gene modules that facilitate the understanding of biological processes such as cell communication. Correlation measurement is the core basis for constructing a gene regulatory network. However, due to the high noise and sparsity in ST data, common correlation measurement methods such as the Pearson correlation coefficient (PCC) and Spearman correlation coefficient (SPCC) are not suitable. In this work, a new gene correlation measurement method called STgcor is proposed. STgcor defines vertexes as spots in a two-dimensional coordinate plane consisting of axes X and Y from the gene pair (X and Y). The joint probability density of Gaussian distribution of the gene pair (X and Y) is calculated to identify and eliminate outliers. To overcome sparsity, the degree, trend, and location of the distribution of vertexes are used to measure the correlation between gene pairs (X, Y). To validate the performance of the STgcor method, it is compared with the PCC and SPCC in a weighted coexpression network analysis method using two ST datasets of breast cancer and prostate cancer. The gene modules identified by these methods are then compared and analyzed. The results show that the STgcor method detects some special gene modules and cancer-related pathways that cannot be detected by the other two methods.
Affiliation(s)
- Xiaoshu Zhu
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, China
  - School of Computer, Electronics, and Information Science and Engineering, Guangxi University, Nanning, China
- Liyuan Pang
  - School of Computer, Electronics, and Information Science and Engineering, Guangxi University, Nanning, China
- Xiaojun Ding
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Wei Lan
  - School of Computer, Electronics, and Information Science and Engineering, Guangxi University, Nanning, China
- Shuang Meng
  - School of Computer Science and Engineering, Guangxi Normal University, Guilin, China
- Xiaoqing Peng
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China

4
Zhang P, Zhang W, Sun W, Li L, Xu J, Wang L, Wong L. A lncRNA-disease association prediction tool development based on bridge heterogeneous information network via graph representation learning for family medicine and primary care. Front Genet 2023; 14:1084482. [PMID: 37274787 PMCID: PMC10234424 DOI: 10.3389/fgene.2023.1084482] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/12/2022] [Accepted: 05/02/2023] [Indexed: 06/07/2023] Open
Abstract
Identification of long non-coding RNAs (lncRNAs) associated with common diseases is crucial for patient self-diagnosis and monitoring of health conditions using artificial intelligence (AI) technology at home. LncRNAs have gained significant attention due to their crucial roles in the pathogenesis of complex human diseases and identifying their associations with diseases can aid in developing diagnostic biomarkers at the molecular level. Computational methods for predicting lncRNA-disease associations (LDAs) have become necessary due to the time-consuming and labor-intensive nature of wet biological experiments in hospitals, enabling patients to access LDAs through their AI terminal devices at any time. Here, we have developed a predictive tool, LDAGRL, for identifying potential LDAs using a bridge heterogeneous information network (BHnet) constructed via Structural Deep Network Embedding (SDNE). The BHnet consists of three types of molecules as bridge nodes to implicitly link the lncRNA with disease nodes and the SDNE is used to learn high-quality node representations and make LDA predictions in a unified graph space. To assess the feasibility and performance of LDAGRL, extensive experiments, including 5-fold cross-validation, comparison with state-of-the-art methods, comparison on different classifiers and comparison of different node feature combinations, were conducted, and the results showed that LDAGRL achieved satisfactory prediction performance, indicating its potential as an effective LDAs prediction tool for family medicine and primary care.
Affiliation(s)
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Weihan Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Weicheng Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Jinsheng Xu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Lei Wang
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, China
- Leon Wong
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, China
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China

5
Li W, Wang S, Xu J, Xiang J. Inferring Latent MicroRNA-Disease Associations on a Gene-Mediated Tripartite Heterogeneous Multiplexing Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3190-3201. [PMID: 35041612 DOI: 10.1109/tcbb.2022.3143770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
MicroRNA (miRNA) is a class of non-coding single-stranded RNA molecules encoded by endogenous genes with a length of about 22 nucleotides. MiRNAs have been successfully identified as differentially expressed in various cancers. There is evidence that disorders of miRNAs are associated with a variety of complex diseases. Therefore, inferring potential miRNA-disease associations (MDAs) is very important for understanding the aetiology and pathogenesis of many diseases and is useful for disease diagnosis, prognosis and treatment. First, we fused multiple similarity subnetworks from multiple sources for miRNAs, genes and diseases, respectively, by multiplexing technology. Then, the three multiplexed biological subnetworks are connected through the extended binary association to form a tripartite complete heterogeneous multiplexed network (Tri-HM). Finally, because the constructed Tri-HM network retains the subnetworks' original topology and biological functions and expands the binary association and dependence between the three biological entities, rich neighbourhood information is obtained iteratively from neighbours by a non-equilibrium random walk. Our tri-HM-RWR model obtained an AUC value of 0.8657 and an AUPR value of 0.2139 in the global 5-fold cross-validation, which shows that our model can more fully infer disease-related miRNAs.
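For readers unfamiliar with the random-walk family that the "RWR" in the model name suggests, here is a minimal sketch of a standard random walk with restart on a small graph. It is not the non-equilibrium variant described in the abstract; the function name, the restart probability of 0.5, and the four-node path graph are assumptions made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing."""
    n = len(adj)
    # Column-normalise: column j holds the transition probabilities out of node j.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical path graph 0 - 1 - 2 - 3, with the walk seeded at node 0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seed=0)
```

The resulting scores rank nodes by proximity to the seed; in an association-prediction setting, unlabelled nodes with high scores would be the candidate associations.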

6
Li B, Tian Y, Tian Y, Zhang S, Zhang X. Predicting Cancer Lymph-Node Metastasis From LncRNA Expression Profiles Using Local Linear Reconstruction Guided Distance Metric Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3179-3189. [PMID: 35139024 DOI: 10.1109/tcbb.2022.3149791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Lymph-node metastasis is the most perilous stage of cancer progression, and long non-coding RNA (lncRNA) has been confirmed to be an important genetic indicator in cancer prediction. However, lncRNA expression profiles are often characterized by many features and few samples, so it is urgent to establish an efficient method to deal with such high-dimensional lncRNA data, which will aid in clinical targeted treatment. Thus, in this study, a local linear reconstruction guided distance metric learning method is put forward to handle lncRNA data for the determination of cancer lymph-node metastasis. In the original locally linear embedding (LLE) approach, any point can be approximately linearly reconstructed using its nearest neighborhood points, from which a novel distance metric can be learned by satisfying both nonnegative and sum-to-one constraints on the reconstruction weights. Taking the defined distance metric and the supervised information of the lncRNA data into account, a local margin model is deduced to find a low-dimensional subspace for lncRNA signature extraction. Finally, a classifier is constructed to predict cancer lymph-node metastasis, where the learned distance metric is also adopted. Several experiments on lncRNA data sets have been carried out, and the experimental results show the performance of the proposed method through comparisons with other related dimensionality reduction methods and classical classifier models.

7
Lu X, Li J, Zhu Z, Yuan Y, Chen G, He K. Predicting miRNA-Disease Associations via Combining Probability Matrix Feature Decomposition With Neighbor Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3160-3170. [PMID: 34260356 DOI: 10.1109/tcbb.2021.3097037] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Predicting the associations of miRNAs and diseases may uncover the causation of various diseases. Many methods are emerging to tackle the sparse and unbalanced disease-related miRNA prediction problem. Here, we propose a Probabilistic matrix decomposition combined with neighbor learning to identify MiRNA-Disease Associations utilizing heterogeneous data (PMDA). First, we build similarity networks for diseases and miRNAs, respectively, by integrating semantic information and functional interactions. Second, we construct a neighbor learning model in which the neighbor information of an individual miRNA or disease is utilized to enhance the association relationship and so tackle the sparsity problem. Third, we predict the potential associations between miRNAs and diseases via probability matrix decomposition. The experimental results show that PMDA is superior to five other methods on sparse and unbalanced data. The case study shows that the new miRNA-disease interactions predicted by PMDA are effective and that the performance of PMDA is superior to that of other methods.

8
Zhang Q, Zhang Y, Wang S, Chen ZH, Gribova V, Filaretov VF, Huang DS. Predicting In-Vitro DNA-Protein Binding With a Spatially Aligned Fusion of Sequence and Shape. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3144-3153. [PMID: 34882561 DOI: 10.1109/tcbb.2021.3133869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Discovery of transcription factor binding sites (TFBSs) is of primary importance for understanding the underlying binding mechanic and gene regulation process. Growing evidence indicates that apart from the primary DNA sequences, DNA shape landscape has a significant influence on transcription factor binding preference. To effectively model the co-influence of sequence and shape features, we emphasize the importance of position information of sequence motif and shape pattern. In this paper, we propose a novel deep learning-based architecture, named hybridShape eDeepCNN, for TFBS prediction which integrates DNA sequence and shape information in a spatially aligned manner. Our model utilizes the power of the multi-layer convolutional neural network and constructs an independent subnetwork to adapt for the distinct data distribution of heterogeneous features. Besides, we explore the usage of continuous embedding vectors as the representation of DNA sequences. Based on the experiments on 20 in-vitro datasets derived from universal protein binding microarrays (uPBMs), we demonstrate the superiority of our proposed method and validate the underlying design logic.

9
Narayanan G, Ali MS, Alsulami H, Saeed T, Ahmad B. Synchronization of T–S Fuzzy Fractional-Order Discrete-Time Complex-Valued Molecular Models of mRNA and Protein in Regulatory Mechanisms with Leakage Effects. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11010-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]

10
Reza MS, Hossen MA, Harun-Or-Roshid M, Siddika MA, Kabir MH, Mollah MNH. Metadata analysis to explore hub of the hub-genes highlighting their functions, pathways and regulators for cervical cancer diagnosis and therapies. Discov Oncol 2022; 13:79. [PMID: 35994213 PMCID: PMC9395557 DOI: 10.1007/s12672-022-00546-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Accepted: 08/11/2022] [Indexed: 11/25/2022] Open
Abstract
Cervical cancer (CC) is considered the fourth most common cancer in women globally and shows malignant features of local infiltration and invasion into adjacent organs and tissues. Several individual studies in the literature have explored CC-causing hub-genes (HubGs); however, we observed that their results are not consistent. Therefore, the main objective of this study was to explore the hub of the HubGs (hHubGs), which might be more representative CC-causing HubGs compared to the single-study-based HubGs. We reviewed 52 published articles and found 255 HubGs/studied-genes in total. Among them, we selected 10 HubGs (CDK1, CDK2, CHEK1, MKI67, TOP2A, BRCA1, PLK1, CCNA2, CCNB1, TYMS) as the hHubGs by protein-protein interaction (PPI) network analysis. Then, we validated their differential expression patterns between CC and control samples through the GPEA database. The enrichment analysis of HubGs revealed some crucial CC-causing biological processes (BPs), molecular functions (MFs) and cellular components (CCs) involving hHubGs. The gene regulatory network (GRN) analysis identified four TF proteins and three miRNAs as the key transcriptional and post-transcriptional regulators of hHubGs. Then, we identified the 10 top-ranked hHubGs-guided FDA-approved candidate drugs and validated them against state-of-the-art independent receptors by molecular docking analysis. Finally, we investigated the binding stability of the top-ranked three candidate drugs (Docetaxel, Temsirolimus, Paclitaxel) by using 100 ns MD-based MM-PBSA simulations and observed their stable performance. Therefore, the findings of this study might be useful resources for CC diagnosis and therapies.
Affiliation(s)
- Md. Selim Reza
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
- Md. Alim Hossen
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
- Md. Harun-Or-Roshid
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
- Mst. Ayesha Siddika
  - Microbiology Lab, Department of Veterinary and Animal Sciences, University of Rajshahi, Rajshahi-6205, Bangladesh
- Md. Hadiul Kabir
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
- Md. Nurul Haque Mollah
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh

11
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:1543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
Protein is the basic organic substance that constitutes the cell and is the material condition for the life activity and the guarantee of the biological function activity. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. As an important protein interaction, self-interacting protein (SIP) has a critical role. The fast growth of high-throughput experimental techniques among biomolecules has led to a massive influx of available SIP data. How to conduct scientific research using the massive amount of SIP data has become a new challenge that is being faced in related research fields such as biology and medicine. In this work, we design an SIP prediction method SIPGCN using a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which is able to describe the biological evolutionary message, then their hidden features are extracted by the deep learning method GCN, and, finally, the random forest is utilized to predict whether there are interrelationships between proteins. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity in the human data set. SIPGCN achieved 90.69% and 99.08% of these two indicators in the yeast data set, respectively. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable instrument for predicting SIP and may be a reliable candidate for future wet experiments.
Affiliation(s)
- Ying Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Lin-Lin Wang (corresponding author)
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Leon Wong
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Lei Wang (corresponding author)
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China

12
Bagherabadi A, Hooshmand A, Shekari N, Singh P, Zolghadri S, Stanek A, Dohare R. Correlation of NTRK1 Downregulation with Low Levels of Tumor-Infiltrating Immune Cells and Poor Prognosis of Prostate Cancer Revealed by Gene Network Analysis. Genes (Basel) 2022; 13:840. [PMID: 35627227 PMCID: PMC9140438 DOI: 10.3390/genes13050840] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/20/2022] [Revised: 05/05/2022] [Accepted: 05/06/2022] [Indexed: 11/23/2022] Open
Abstract
Prostate cancer (PCa) is a life-threatening heterogeneous malignancy of the urinary tract. Due to the incidence of prostate cancer and the crucial need to elucidate its molecular mechanisms, we searched for possible prognosis impactful genes in PCa using bioinformatics analysis. A script in R language was used for the identification of Differentially Expressed Genes (DEGs) from the GSE69223 dataset. The gene ontology (GO) of the DEGs and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed. A protein-protein interaction (PPI) network was constructed using the STRING online database to identify hub genes. GEPIA and UALCAN databases were utilized for survival analysis and expression validation, and 990 DEGs (316 upregulated and 674 downregulated) were identified. The GO analysis was enriched mainly in the "collagen-containing extracellular matrix", and the KEGG pathway analysis was enriched mainly in "focal adhesion". The downregulation of neurotrophic receptor tyrosine kinase 1 (NTRK1) was associated with a poor prognosis of PCa and had a significant positive correlation with infiltrating levels of immune cells. We acquired a collection of pathways related to primary PCa, and our findings invite the further exploration of NTRK1 as a biomarker for early diagnosis and prognosis, and as a future potential molecular therapeutic target for PCa.
Affiliation(s)
- Arash Bagherabadi
  - Department of Biology, Faculty of Sciences, University of Mohaghegh Ardabili, Ardabil 56199-11367, Iran
- Amirreza Hooshmand
  - Department of Biology, Jahrom Branch, Islamic Azad University, Jahrom 74147-85318, Iran
- Nooshin Shekari
  - Department of Biology, Faculty of Sciences, Shahid Chamran University of Ahvaz, Ahvaz 61357-83151, Iran
- Prithvi Singh
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi 110025, India
- Samaneh Zolghadri
  - Department of Biology, Jahrom Branch, Islamic Azad University, Jahrom 74147-85318, Iran
- Agata Stanek
  - Department and Clinic of Internal Medicine, Angiology and Physical Medicine, Faculty of Medical Sciences in Zabrze, Medical University of Silesia, Batorego 15 St., 41-902 Bytom, Poland
- Ravins Dohare
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi 110025, India

13
Bioinformatics Screening of Potential Biomarkers from mRNA Expression Profiles to Discover Drug Targets and Agents for Cervical Cancer. Int J Mol Sci 2022; 23:3968. [PMID: 35409328 PMCID: PMC8999699 DOI: 10.3390/ijms23073968] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/03/2022] [Revised: 03/13/2022] [Accepted: 03/22/2022] [Indexed: 02/06/2023] Open
Abstract
Bioinformatics analysis has been playing a vital role in identifying potential genomic biomarkers more accurately from an enormous number of candidates by reducing time and cost compared to the wet-lab-based experimental procedures for disease diagnosis, prognosis, and therapies. Cervical cancer (CC) is one of the most malignant diseases seen in women worldwide. This study aimed at identifying potential key genes (KGs), highlighting their functions, signaling pathways, and candidate drugs for CC diagnosis and targeting therapies. Four publicly available microarray datasets of CC were analyzed for identifying differentially expressed genes (DEGs) by the LIMMA approach through GEO2R online tool. We identified 116 common DEGs (cDEGs) that were utilized to identify seven KGs (AURKA, BRCA1, CCNB1, CDK1, MCM2, NCAPG2, and TOP2A) by the protein–protein interaction (PPI) network analysis. The GO functional and KEGG pathway enrichment analyses of KGs revealed some important functions and signaling pathways that were significantly associated with CC infections. The interaction network analysis identified four TFs proteins and two miRNAs as the key transcriptional and post-transcriptional regulators of KGs. Considering seven KGs-based proteins, four key TFs proteins, and already published top-ranked seven KGs-based proteins (where five KGs were common with our proposed seven KGs) as drug target receptors, we performed their docking analysis with the 80 meta-drug agents that were already published by different reputed journals as CC drugs. We found Paclitaxel, Vinorelbine, Vincristine, Docetaxel, Everolimus, Temsirolimus, and Cabazitaxel as the top-ranked seven candidate drugs. Finally, we investigated the binding stability of the top-ranked three drugs (Paclitaxel, Vincristine, Vinorelbine) by using 100 ns MD-based MM-PBSA simulations with the three top-ranked proposed receptors (AURKA, CDK1, TOP2A) and observed their stable performance. 
Therefore, the proposed drugs might play a vital role in the treatment against CC.

14
Shen Z, Zhang Q, Han K, Huang DS. A Deep Learning Model for RNA-Protein Binding Preference Prediction Based on Hierarchical LSTM and Attention Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:753-762. [PMID: 32750884 DOI: 10.1109/tcbb.2020.3007544] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 06/11/2023]
Abstract
The attention mechanism can find important information in a sequence. The regions of an RNA sequence that can bind to proteins are more important than those that cannot. Neither conventional methods nor deep learning-based methods are good at learning this information. In this study, LSTM is used to extract the correlation features between different sites in the RNA sequence. We also use the attention mechanism to evaluate the importance of different sites in the RNA sequence. We obtain the optimal combination of k-mer length, k-mer stride window, k-mer sentence length, k-mer sentence stride window, and optimization function through hyper-parameter experiments. The results show that the performance of our method is better than that of other methods. We tested the effects of changes in k-mer vector length on model performance. We show how model performance changes under various k-mer related parameter settings. Furthermore, we investigate the effect of the attention mechanism and RNA structure data on model performance.
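The four k-mer parameters named in this abstract can be pictured with a short sketch. It assumes that a "k-mer sentence" is a sliding window over the list of k-mers, which is an interpretation and not something the abstract states; the function names and the example sequence are hypothetical.

```python
def kmer_tokens(seq, k, stride):
    """Split a sequence into k-mers, moving `stride` positions between k-mers."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_sentences(tokens, sentence_len, sentence_stride):
    """Group consecutive k-mers into overlapping windows ("sentences")."""
    return [tokens[i:i + sentence_len]
            for i in range(0, len(tokens) - sentence_len + 1, sentence_stride)]


# Hypothetical RNA fragment and parameter values.
tokens = kmer_tokens("AUGGCUACG", k=3, stride=2)
sentences = kmer_sentences(tokens, sentence_len=2, sentence_stride=1)
```

Under this reading, each k-mer would be embedded as a vector and each sentence passed to a sequence model, so the four parameters jointly control how much overlap and context every input window carries.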
|
15
|
Hameed Y, Khan M. Discovery of novel six genes-based cervical cancer-associated biomarkers that are capable to break the heterogeneity barrier and applicable at the global level. J Cancer Res Ther 2022. [DOI: 10.4103/jcrt.jcrt_1588_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
|
16
|
González-Espinoza A, Zamora-Fuentes J, Hernández-Lemus E, Espinal-Enríquez J. Gene Co-Expression in Breast Cancer: A Matter of Distance. Front Oncol 2021; 11:726493. [PMID: 34868919 PMCID: PMC8636045 DOI: 10.3389/fonc.2021.726493] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/17/2021] [Accepted: 10/26/2021] [Indexed: 01/16/2023] Open
Abstract
Gene regulatory and signaling phenomena are known to be relevant players underlying the establishment of cellular phenotypes. It is also known that such regulatory programs are disrupted in cancer, leading to the onset and development of malignant phenotypes. Gene co-expression matrices have allowed us to compare and analyze complex phenotypes such as breast cancer (BrCa) and their control counterparts. Global co-expression patterns have revealed, for instance, that the highest gene-gene co-expression interactions often occur between genes from the same chromosome (cis-), whereas inter-chromosome (trans-) interactions are scarce and have lower correlation values. Furthermore, the strength of cis- correlations has been shown to decay with the chromosome distance of gene couples. Although this loss of long-distance co-expression has been clearly identified, it has been observed only in a small fraction of the whole co-expression landscape, namely the most significant interactions. For that reason, an approach that takes into account the whole interaction set is appealing. In this work, we developed a hybrid method to analyze whole-chromosome Pearson correlation matrices for the four BrCa subtypes (Luminal A, Luminal B, HER2+ and Basal), as well as matrices derived from adjacent normal breast tissue. We implemented a systematic method for clustering gene couples, using eigenvalue spectral decomposition and the k-medoids algorithm, allowing us to determine a number of clusters without removing any interaction. With this method we compared, for each chromosome in the five phenotypes: (a) whether or not the gene-gene co-expression decays with distance in the breast cancer subtypes, (b) the chromosome location of cis- clusters of gene couples, and (c) whether or not the loss of long-distance co-expression is observed in the whole range of interactions.
We found that in the correlation matrix for the control phenotype, positive and negative Pearson correlations deviate from a random null model independently of the distance between couples. Conversely, for all BrCa subtypes, in all chromosomes, positive correlations decay with distance, and negative correlations do not differ from the null model. We also found that BrCa clusters are distance-dependent, whereas for the control phenotype, chromosome location does not determine the clustering. To our knowledge, this is the first time that a dependence on distance is reported for gene clusters in breast cancer. Since this method uses the whole cis- interaction gene set, combination with other -omics approaches may provide further evidence to understand, in a more integrative fashion, the mechanisms that disrupt gene regulation in cancer.
Affiliation(s)
- Alfredo González-Espinoza
- Department of Biology, University of Pennsylvania, Philadelphia, PA, United States; Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Jose Zamora-Fuentes
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
17
|
Xiao Q, Dai J, Luo J. A survey of circular RNAs in complex diseases: databases, tools and computational methods. Brief Bioinform 2021; 23:6407737. [PMID: 34676391 DOI: 10.1093/bib/bbab444] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/13/2021] [Revised: 09/21/2021] [Accepted: 09/28/2021] [Indexed: 01/22/2023] Open
Abstract
Circular RNAs (circRNAs) are a category of newly discovered competing endogenous non-coding RNAs that have been shown to be implicated in many complex human diseases. A large number of circRNAs have been confirmed to be involved in cancer progression and are expected to become promising biomarkers for tumor diagnosis and targeted therapy. Deciphering the underlying relationships between circRNAs and diseases may provide new insights into the pathogenesis of complex diseases and help further characterize the biological functions of circRNAs. As traditional experimental methods are usually time-consuming and laborious, computational models have made significant progress in systematically exploring potential circRNA-disease associations, which not only creates new opportunities for investigating pathogenic mechanisms at the level of circRNAs, but also helps to significantly improve the efficiency of clinical trials. In this review, we first summarize the functions and characteristics of circRNAs and introduce some representative circRNAs related to tumorigenesis. Then, we survey the available databases and tools dedicated to circRNA and disease studies. Next, we present a comprehensive review of computational methods for predicting circRNA-disease associations and classify them into five categories: network propagation-based, path-based, matrix factorization-based, deep learning-based and other machine learning methods. Finally, we discuss the challenges and future research directions in this field.
Affiliation(s)
- Qiu Xiao
- Hunan Normal University and Hunan Xiangjiang Artificial Intelligence Academy, Changsha, China
- Jianhua Dai
- Hunan Normal University and Hunan Xiangjiang Artificial Intelligence Academy, Changsha, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
18
|
Thummadi NB, T M, Vindal V, P M. Prioritizing the candidate genes related to cervical cancer using the moment of inertia tensor. Proteins 2021; 90:363-371. [PMID: 34468998 DOI: 10.1002/prot.26226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Revised: 08/07/2021] [Accepted: 08/16/2021] [Indexed: 12/24/2022]
Abstract
It is well known that, among all cancer types, cervical cancer is the fourth most common malignancy threatening women worldwide. There has been tremendous progress in understanding the underlying molecular associations in cervical cancer. Several studies have reported evidence for the involvement of various genes in disease progression. However, with ever-evolving bioinformatics tools, there has been an upsurge in the number of genes predicted to be responsible for cervical cancer progression, making it highly complex to select genes for further evaluation. In this article, we prioritized the candidate genes based on sequence similarity analysis with known cancer genes. For this purpose, we used the concept of the moment of inertia tensor, which reveals the similarities between protein sequences more efficiently. The moment of inertia tensor captures the similarity of protein sequences based on the physicochemical properties of amino acids. From our analysis, we obtained 14 candidate cervical cancer genes that are highly similar to known cervical cancer genes. Further, we analyzed the GO terms and prioritized these genes based on the number of hits for biological processes and molecular functions and on their involvement in KEGG pathways. We also discuss the evidence-based involvement of the prioritized genes in other cancers and list the available drugs for those genes.
Affiliation(s)
- Neelesh Babu Thummadi
- Department of Animal Biology, School of Life Sciences, University of Hyderabad, Gachibowli, Hyderabad, India
- Mallikarjuna T
- Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Gachibowli, Hyderabad, India
- Vaibhav Vindal
- Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Gachibowli, Hyderabad, India
- Manimaran P
- School of Physics, University of Hyderabad, Gachibowli, Hyderabad, India
|
19
|
Zhang Q, Yu W, Han K, Nandi AK, Huang DS. Multi-Scale Capsule Network for Predicting DNA-Protein Binding Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1793-1800. [PMID: 32960766 DOI: 10.1109/tcbb.2020.3025579] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
Discovering DNA-protein binding sites, also known as motif discovery, is the foundation for further analysis of transcription factors (TFs). Deep learning algorithms such as convolutional neural networks (CNNs) have been introduced to the motif discovery task and have achieved state-of-the-art performance. However, due to the limitations of CNNs, CNN-based motif discovery methods do not take full advantage of the large-scale sequencing data generated by high-throughput sequencing technology. Hence, in this paper we propose a multi-scale capsule network architecture (MSC) that integrates a multi-scale CNN, a CNN variant able to extract motif features of different lengths, with a capsule network, a novel type of artificial neural network architecture aimed at improving on CNNs. The proposed method is tested on real ChIP-seq datasets, and the experimental results show a considerable improvement over two well-tested deep learning-based sequence models, DeepBind and DeepSEA.
|
20
|
Zhang Q, Wang D, Han K, Huang DS. Predicting TF-DNA Binding Motifs from ChIP-seq Datasets Using the Bag-Based Classifier Combined With a Multi-Fold Learning Scheme. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1743-1751. [PMID: 32946398 DOI: 10.1109/tcbb.2020.3025007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
The rapid development of high-throughput sequencing technology provides unique opportunities for studying transcription factor binding sites, but also brings new computational challenges. Recently, a series of discriminative motif discovery (DMD) methods have been proposed and offer promising solutions for addressing these challenges. However, because of the huge computation cost, most of them have to choose approximate schemes that either sacrifice the accuracy of motif representation or tune motif parameters indirectly. In this paper, we propose a bag-based classifier combined with a multi-fold learning scheme (BCMF) to discover motifs from ChIP-seq datasets. First, BCMF naturally formulates the input sequences as a labeled bag. Then, a bag-based classifier, combined with a bag feature extraction strategy, is applied to construct the objective function, and a multi-fold learning scheme is used to solve it. Compared with the existing DMD tools, BCMF features three improvements: (1) learning the position weight matrix (PWM) directly in a continuous space; (2) representing a positive bag with a feature fused from its k "most positive" patterns; and (3) applying a more advanced learning scheme. The experimental results on 134 ChIP-seq datasets show that BCMF substantially outperforms existing DMD methods (including DREME, HOMER, XXmotif, motifRG, EDCOD and our previous work).
|
21
|
Gakii C, Bwana BK, Mugambi GG, Mukoya E, Mireji PO, Rimiru R. In silico-driven analysis of the Glossina morsitans morsitans antennae transcriptome in response to repellent or attractant compounds. PeerJ 2021; 9:e11691. [PMID: 34249514 PMCID: PMC8255069 DOI: 10.7717/peerj.11691] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/03/2021] [Accepted: 06/08/2021] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND High-throughput sequencing generates large volumes of biological data that must be interpreted to make meaningful inferences about biological function. Problems arise due to the large number of characteristics p (dimensions) that describe each record (n) in the database. Feature selection using a subset of variables extracted from the large datasets is one approach to solving this problem. METHODOLOGY In this study we analyzed the transcriptome of Glossina morsitans morsitans (tsetse fly) antennae after exposure to either a repellent (δ-nonalactone) or an attractant (ε-nonalactone). We identified 308 genes that were upregulated or downregulated due to exposure to a repellent (δ-nonalactone) or an attractant (ε-nonalactone), respectively. Weighted gene coexpression network analysis was used to cluster the genes into 12 modules and filter unconnected genes. Discretization and association rule mining were used to find associations between genes, thereby predicting the putative function of unannotated genes. RESULTS AND DISCUSSION Among the significantly expressed chemosensory genes (FDR < 0.05) in response to ε-nonalactone were gustatory receptors (GrIA and Gr28b), ionotropic receptors (Ir41a and Ir75a), odorant binding proteins (Obp99b, Obp99d, Obp59a and Obp28a) and the odorant receptor (Or67d). Several non-chemosensory genes with no assigned function in the NCBI database were co-expressed with the chemosensory genes. Exposure to a repellent (δ-nonalactone) did not show any significant change between the treatment and control samples. We generated a coexpression network with 276 edges and 130 nodes. Genes CAH3, Ahcy, Ir64a, Or67c, Ir8a and Or67a had node degree values above 11 and therefore could be regarded as the top hub genes in the network. Association rule mining showed a relation between various genes based on their appearance in the same itemsets as consequent and antecedent.
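The hub-gene criterion mentioned in this abstract (node degree above a threshold in the coexpression network) can be sketched in a few lines of Python. The edge list, the gene labels, and the threshold below are toy placeholders rather than data from the study, and the sketch assumes an undirected network in which each edge is listed once.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node of an undirected network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def hub_genes(edges, min_degree):
    """Return nodes whose degree exceeds min_degree, highest degree first."""
    degree = node_degrees(edges)
    hubs = [gene for gene, d in degree.items() if d > min_degree]
    return sorted(hubs, key=lambda gene: (-degree[gene], gene))


# Hypothetical toy network; the abstract reports a cutoff of degree above 11.
edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
    ("geneD", "geneE"),
]
hubs = hub_genes(edges, min_degree=2)
print(hubs)
```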
Affiliation(s)
- Consolata Gakii
- Department of Mathematics, Computing and Information Technology, University of Embu, Embu, Eastern, Kenya
- School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Nairobi, Kenya
- Grace Gathoni Mugambi
- School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Nairobi, Kenya
- Esther Mukoya
- School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Nairobi, Kenya
- Paul O. Mireji
- Biotechnology Research Center, Kenya Agricultural & Livestock Research Organization, Nairobi, Nairobi, Kenya
- Richard Rimiru
- School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Nairobi, Kenya
|
22
|
Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021; 13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
Affiliation(s)
- Youngjun Park
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Dominik Heider
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Anne-Christin Hauschild
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany
|
23
|
Lin X, Zhang X. Identification of hot regions in hub protein-protein interactions by clustering and PPRA optimization. BMC Med Inform Decis Mak 2021; 21:143. [PMID: 33941163 PMCID: PMC8094484 DOI: 10.1186/s12911-020-01350-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/19/2020] [Accepted: 11/23/2020] [Indexed: 11/24/2022] Open
Abstract
Background Protein–protein interactions (PPIs) are the core of protein function and provide an effective means of understanding function at the cellular level. Identification of PPIs is a crucial foundation for predicting drug-target interactions. Although traditional biological experiments for identifying PPIs are becoming available, these experiments remain extremely time-consuming and expensive. Therefore, various computational models have been introduced to identify PPIs. In a protein-protein interaction network (PPIN), a Hub protein, as a highly connected node, can coordinate PPIs and perform biological functions. Detecting hot regions on Hub protein interaction interfaces is an issue worth investigating. Methods Two clustering methods, LCSD and RCNOIK, are used in this paper to detect the hot regions on Hub protein interaction interfaces. To improve the efficiency of the K-means clustering algorithm, the best k value is selected by calculating the sum of squared distances and the average silhouette coefficient. Then, the residue coordination number optimization strategy is used to calculate the average coordination number. In addition, the pair potentials and relative ASA (PPRA) strategy is used to optimize the predicted results. Results The DataHub and PartyHub datasets were used to train two clustering models, respectively. Experiments show that LCSD and RCNOIK have the same coverage on the Hub protein datasets, and that RCNOIK is slightly higher than LCSD in precision. The predicted hot regions are closer to the standard hot regions. Conclusions This paper optimizes two clustering methods based on the PPRA strategy. Compared with well-known approaches for hot region prediction, our improved methods are more reliable and are effective for predicting hot regions on Hub protein interaction interfaces.
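The Methods sentence on choosing k from the squared-distance sum and the average silhouette coefficient can be illustrated with a small, dependency-free Python sketch. Everything here is an assumption made for illustration: the toy 2-D points, the evenly spaced initialisation, and the rule of taking the k with the highest mean silhouette. The abstract does not state how the authors combine the two criteria.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; initial centres are evenly spaced picks (assumption)."""
    n = len(points)
    centers = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centers[c])  # keep the old centre for an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels, centers


def sse(points, labels, centers):
    """Sum of squared distances from each point to its assigned centre."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def mean_silhouette(points, labels):
    """Average silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
sse_by_k, sil_by_k = {}, {}
for k in (2, 3, 4):
    labels, centers = kmeans(points, k)
    sse_by_k[k] = sse(points, labels, centers)
    sil_by_k[k] = mean_silhouette(points, labels)

best_k = max(sil_by_k, key=sil_by_k.get)
print(best_k, sse_by_k, sil_by_k)
```

The squared-distance sum always shrinks as k grows, so on its own it cannot pick k; pairing it with the silhouette score, as the abstract describes, gives a criterion with an interior maximum.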
Affiliation(s)
- Xiaoli Lin
- Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, People's Republic of China
- Xiaolong Zhang
- Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, People's Republic of China
|
24
|
Wang J, Zhao Y, Gong W, Liu Y, Wang M, Huang X, Tan J. EDLMFC: an ensemble deep learning framework with multi-scale features combination for ncRNA-protein interaction prediction. BMC Bioinformatics 2021; 22:133. [PMID: 33740884 PMCID: PMC7980572 DOI: 10.1186/s12859-021-04069-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/23/2021] [Accepted: 03/05/2021] [Indexed: 11/29/2022] Open
Abstract
Background Non-coding RNA (ncRNA) and protein interactions play essential roles in various physiological and pathological processes. The experimental methods used for predicting ncRNA–protein interactions are time-consuming and labor-intensive. Therefore, there is an increasing demand for computational methods to accurately and efficiently predict ncRNA–protein interactions. Results In this work, we present an ensemble deep learning-based method, EDLMFC, to predict ncRNA–protein interactions using a combination of multi-scale features, including primary sequence features, secondary structure sequence features, and tertiary structure features. Conjoint k-mer was used to extract protein/ncRNA sequence features, which were integrated with tertiary structure features and then fed into an ensemble deep learning model that combines a convolutional neural network (CNN), to learn dominating biological information, with a bi-directional long short-term memory network (BLSTM), to capture long-range dependencies among the features identified by the CNN. Compared with other state-of-the-art methods under five-fold cross-validation, EDLMFC shows the best performance, with accuracies of 93.8%, 89.7%, and 86.1% on the RPI1807, NPInter v2.0, and RPI488 datasets, respectively. The results of the independent test demonstrate that EDLMFC can effectively predict potential ncRNA–protein interactions from different organisms. Furthermore, EDLMFC is also shown to successfully predict hub ncRNAs and proteins present in ncRNA–protein networks of Mus musculus. Conclusions In general, our proposed method EDLMFC improves the accuracy of ncRNA–protein interaction predictions and is expected to provide helpful guidance for research on ncRNA functions. The source code of EDLMFC and the datasets used in this work are available at https://github.com/JingjingWang-87/EDLMFC. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04069-9.
Affiliation(s)
- Jingjing Wang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
- Yanpeng Zhao
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
- Weikang Gong
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
- Yang Liu
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
- Mei Wang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
- Xiaoqian Huang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
- Jianjun Tan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
|
25
|
Zhang Q, Shen Z, Huang DS. Predicting in-vitro Transcription Factor Binding Sites Using DNA Sequence + Shape. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:667-676. [PMID: 31634140 DOI: 10.1109/tcbb.2019.2947461] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 06/10/2023]
Abstract
Discovery of transcription factor binding sites (TFBSs) is essential for understanding the underlying binding mechanisms and cellular functions. Recently, convolutional neural networks (CNNs) have succeeded in predicting TFBSs from primary DNA sequences. In addition to DNA sequences, several lines of evidence suggest that protein-DNA binding is partly mediated by properties of DNA shape. Although many methods have been proposed to jointly account for DNA sequences and shape properties in predicting TFBSs, they ignore the power of combining deep learning with DNA sequence + shape. Therefore, in this paper we develop a deep-learning-based sequence + shape framework (DLBSS), which appropriately integrates DNA sequences and shape properties, to better understand protein-DNA binding preference. This method uses a shared CNN to find common patterns from DNA sequences and their corresponding shape features, which are then concatenated to compute a predicted value. Using 66 in-vitro datasets derived from universal protein binding microarrays (uPBMs), we show that our proposed method DLBSS significantly improves the performance of predicting TFBSs. In addition, through a series of experiments, we explain why the shared CNN should be used and explore the performance of DLBSS when using a deeper CNN.
|
26
|
Schulc K, Nagy ZT, Kamp S, Molnár J, Veres DV, Csermely P, Kovács BM. Modular Reorganization of Signaling Networks during the Development of Colon Adenoma and Carcinoma. J Phys Chem B 2021; 125:1716-1726. [PMID: 33562960 PMCID: PMC8023713 DOI: 10.1021/acs.jpcb.0c09307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/09/2022]
Abstract
Network science is an emerging tool in systems biology and oncology, providing novel, system-level insight into the development of cancer. The aim of this project was to study the signaling networks in the process of oncogenesis to explore the adaptive mechanisms taking part in the cancerous transformation of healthy cells. For this purpose, colon cancer proved to be an excellent candidate, as the preliminary phase, adenoma, has a long evolution time. In our work, transcriptomic data have been collected from normal colon, colon adenoma, and colon cancer samples to calculate link (i.e., network edge) weights as approximative proxies for protein abundances, and link weights were included in the Human Cancer Signaling Network. Here we show that the adenoma phase clearly differs from the normal and cancer states in terms of a more scattered link weight distribution and enlarged network diameter. Modular analysis shows the rearrangement of the apoptosis- and the cell-cycle-related modules, whose pathway enrichment analysis supports the relevance of targeted therapy. Our work enriches the system-wide assessment of cancer development, showing specific changes for the adenoma state.
Affiliation(s)
- Klára Schulc
- Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Zsolt T Nagy
- Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Daniel V Veres
- Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary; Turbine Ltd, Budapest, Hungary
- Peter Csermely
- Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Borbála M Kovács
- Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
|
27
|
Wang S, Zhang Q, Shen Z, He Y, Chen ZH, Li J, Huang DS. Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture. MOLECULAR THERAPY-NUCLEIC ACIDS 2021; 24:154-163. [PMID: 33767912 PMCID: PMC7972936 DOI: 10.1016/j.omtn.2021.02.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/19/2020] [Accepted: 02/14/2021] [Indexed: 12/26/2022]
Abstract
The study of transcriptional regulation is still difficult yet fundamental in molecular biology research. Recent research has shown that the double helix structure of nucleotides plays an important role in improving the accuracy and interpretability of transcription factor binding sites (TFBSs). Although several computational methods have been designed to take both DNA sequence and DNA shape features into consideration simultaneously, how to design an efficient model is still an intractable topic. In this paper, we proposed a hybrid convolutional recurrent neural network (CNN/RNN) architecture, CRPTS, to predict TFBSs by combining DNA sequence and DNA shape features. The novelty of our proposed method relies on three critical aspects: (1) the application of a shared hybrid CNN and RNN has the ability to efficiently extract features from large-scale genomic sequences obtained by high-throughput technology; (2) the common patterns were found from DNA sequences and their corresponding DNA shape features; (3) our proposed CRPTS can capture local structural information of DNA sequences without completely relying on DNA shape data. A series of comprehensive experiments on 66 in vitro datasets derived from universal protein binding microarrays (uPBMs) shows that our proposed method CRPTS obviously outperforms the state-of-the-art methods.
Affiliation(s)
- Siguo Wang
- The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
| | - Qinhu Zhang
- The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China.,Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Tongji University, Siping Road 1239, Shanghai 200092, China
| | - Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Changjiang Road 80, Nanyang, Henan 473004, China
| | - Ying He
- The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
| | - Zhen-Heng Chen
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
| | - Jianqiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
| | - De-Shuang Huang
- The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
| |
|
28
|
Gupta MK, Ramakrishna V. Identification of targeted molecules in cervical cancer by computational approaches. A THERANOSTIC AND PRECISION MEDICINE APPROACH FOR FEMALE-SPECIFIC CANCERS 2021:213-222. [DOI: 10.1016/b978-0-12-822009-2.00011-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/06/2023]
|
29
|
Iliopoulos A, Beis G, Apostolou P, Papasotiriou I. Complex Networks, Gene Expression and Cancer Complexity: A Brief Review of Methodology and Applications. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017093504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
This brief survey describes various aspects of cancer complexity and how that complexity can be confronted using modern complex network theory and gene expression datasets. In particular, the causes and basic features of cancer complexity, as well as the challenges it brings, are underlined, while the importance of gene expression data in cancer research and in reverse engineering of gene co-expression networks is highlighted. In addition, an introduction to the corresponding theoretical and mathematical framework of graph theory and complex networks is provided. The basics of network reconstruction, along with the limitations of gene network inference, enrichment and survival analysis, evolution, robustness-resilience, and cascades in complex networks, are described. Finally, an indicative example of cancer gene co-expression network inference and analysis is given.
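As a rough illustration of the co-expression network inference mentioned in this abstract, the sketch below links two genes whenever the absolute Pearson correlation of their expression profiles reaches a threshold. The function names, the threshold of 0.8, and the toy expression values are assumptions made for illustration; the survey itself may describe different inference schemes.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)

def coexpression_edges(expr, threshold=0.8):
    """Return (gene_a, gene_b, r) for every pair with |r| >= threshold.

    expr maps a gene name to its expression values across samples.
    The threshold is a hypothetical choice, not a recommended value.
    """
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                edges.append((g, h, r))
    return edges

# Toy data (hypothetical): A and B rise together, C oscillates.
toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, 1, -1]}
edges = coexpression_edges(toy)
```

Hard thresholding is only one way to build such a network; it is used here because it keeps the sketch short.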
Affiliation(s)
- A.C. Iliopoulos
- Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
| | - G. Beis
- Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
| | - P. Apostolou
- Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
| | - I. Papasotiriou
- Research Genetic Cancer Centre International GmbH, Zug, Switzerland
| |
|
30
|
Hind J, Lisboa P, Hussain AJ, Al-Jumeily D. A Novel Approach to Detecting Epistasis using Random Sampling Regularisation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1535-1545. [PMID: 31634840 DOI: 10.1109/tcbb.2019.2948330] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Epistasis analysis is a progressive approach that complements the 'common disease, common variant' hypothesis by highlighting the potential for connected networks of genetic variants to collaborate in producing a phenotypic expression. It is commonly performed in a pairwise or limitless-arity capacity, considering variant networks as either variant-versus-variant or high-order interactions. This type of analysis extends the number of tests beyond those performed in a standard approach such as a Genome-Wide Association Study (GWAS), in which False Discovery Rate (FDR) is already an issue; multiplying the number of tests, up to a factorial rate, therefore worsens the FDR problem. Further to this, epistasis analysis introduces its own limitations of computational complexity and intensity, which depend on the analysis performed; in the most intense case, a multivariate analysis has a time complexity of O(n!). This paper proposes a novel methodology for the detection of epistasis using interpretable methods and best practice to outline interactions through filtering processes. It uses Random Sampling Regularisation, which randomly splits the data into sample sets and applies a voting system to regularise the significance and reliability of biological markers (SNPs). Preliminary results are promising, outlining a concise detection of interactions. Results for the detection of epistasis in the classification of breast cancer patients indicated eight risk candidate interactions from five variants and a single candidate variant with a high protective association.
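The voting idea behind random sampling can be sketched as follows: draw many random subsets of the samples, test each marker on each subset, and report the fraction of subsets in which the marker passed. This is only a minimal sketch under stated assumptions. The function names, the subset fraction, the number of rounds, and the toy significance test are all hypothetical and are not taken from the paper.

```python
import random

def vote_markers(samples, labels, is_significant, n_rounds=50, frac=0.5, seed=0):
    """Fraction of random subsets in which each marker is flagged.

    samples: list of rows, one value per marker; labels: 0/1 per row.
    is_significant(values, labels) -> bool is supplied by the caller.
    """
    rng = random.Random(seed)
    n = len(samples)
    k = max(1, int(n * frac))
    n_markers = len(samples[0])
    votes = [0] * n_markers
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)  # one random sample set
        sub_x = [samples[i] for i in idx]
        sub_y = [labels[i] for i in idx]
        for m in range(n_markers):
            if is_significant([row[m] for row in sub_x], sub_y):
                votes[m] += 1
    return [v / n_rounds for v in votes]

def mean_gap(values, labs, min_gap=0.5):
    """Toy stand-in for a significance test (hypothetical): flags a marker
    when case and control means differ by more than min_gap."""
    cases = [v for v, l in zip(values, labs) if l == 1]
    controls = [v for v, l in zip(values, labs) if l == 0]
    if not cases or not controls:
        return False
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls)) > min_gap

# Toy data (hypothetical): marker 0 tracks the label, marker 1 is constant.
X = [[1, 5]] * 4 + [[0, 5]] * 4
y = [1, 1, 1, 1, 0, 0, 0, 0]
support = vote_markers(X, y, mean_gap, n_rounds=20, frac=0.75)
```

A marker that is flagged in most subsets is a more reliable candidate than one flagged only in a single lucky split, which is the intuition the abstract describes.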
|
31
|
Lin X, Zhang X, Xu X. Efficient Classification of Hot Spots and Hub Protein Interfaces by Recursive Feature Elimination and Gradient Boosting. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1525-1534. [PMID: 31380766 DOI: 10.1109/tcbb.2019.2931717] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Proteins are not isolated biological molecules; they have specific three-dimensional structures and interact with other proteins to perform functions. A small number of residues (hot spots) in protein-protein interactions (PPIs) play a vital role in influencing and controlling biological processes. This paper uses the boosting algorithm and the gradient boosting algorithm, based on two feature selection strategies, to classify hot spots on three common datasets and two hub protein datasets. First, correlation-based feature selection is used to remove highly correlated features to improve prediction accuracy. Then, recursive feature elimination based on a support vector machine (SVM-RFE) is adopted to select the optimal feature subset and improve training performance. Finally, boosting and gradient boosting (G-boosting) methods are invoked to generate classification results. Gradient boosting is capable of obtaining an excellent model by reducing the loss function in the gradient direction to avoid overfitting. Five datasets from different protein databases are used to verify our models in the experiments. Experimental results show that our proposed classification models perform competitively with existing classification methods.
|
32
|
Shen Z, Deng SP, Huang DS. RNA-Protein Binding Sites Prediction via Multi Scale Convolutional Gated Recurrent Unit Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1741-1750. [PMID: 30990191 DOI: 10.1109/tcbb.2019.2910513] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
RNA-protein binding plays important roles in gene expression. With the development of high-throughput sequencing, several conventional and deep learning-based methods have been proposed to predict RNA-protein binding preferences. These methods can hardly account for the dependencies between subsequences and the varying motif lengths of different translation factors (TFs). To overcome such limitations, we propose a predictive model that combines multi-scale convolutional layers with a bidirectional gated recurrent unit (GRU) layer. The multi-scale convolutional layer can capture motif features of different lengths, and the bidirectional GRU layer can capture the dependencies among subsequences. Experimental results show that the proposed method performs better than four state-of-the-art methods in this field. In addition, we investigate the effect of model structure on performance by running our method with different convolutional layers and different kernel sizes. We also demonstrate the effectiveness of the bidirectional GRU in improving model performance through comparative experiments.
|
33
|
Peng C, Zheng Y, Huang DS. Capsule Network Based Modeling of Multi-omics Data for Discovery of Breast Cancer-Related Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1605-1612. [PMID: 30969931 DOI: 10.1109/tcbb.2019.2909905] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
Breast cancer is one of the most common cancers all over the world, which bring about more than 450,000 deaths each year. Although this malignancy has been extensively studied by a large number of researchers, its prognosis is still poor. Since therapeutic advance can be obtained based on gene signatures, there is an urgent need to discover genes related to breast cancer that may help uncover the mechanisms in cancer progression. We propose a deep learning method for the discovery of breast cancer-related genes by using Capsule Network based Modeling of Multi-omics Data (CapsNetMMD). In CapsNetMMD, we make use of known breast cancer-related genes to transform the issue of gene identification into the issue of supervised classification. The features of genes are generated through comprehensive integration of multi-omics data, e.g., mRNA expression, z scores for mRNA expression, DNA methylation, and two forms of DNA copy-number alterations (CNAs). By modeling features based on the capsule network, we identify breast cancer-related genes with a significantly better performance than other existing machine learning methods. The predicted genes with prognostic values play potential important roles in breast cancer and may serve as candidates for biologists and medical scientists in the future studies of biomarkers.
|
34
|
Shen Z, Deng SP, Huang DS. Capsule Network for Predicting RNA-Protein Binding Preferences Using Hybrid Feature. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1483-1492. [PMID: 31562101 DOI: 10.1109/tcbb.2019.2943465] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
RNA-Protein binding is involved in many different biological processes. With the progress of technology, more and more data are available for research. Based on these data, many prediction methods have been proposed to predict RNA-Protein binding preference. Some of these methods use only RNA sequence features for prediction, and some methods use multiple features for prediction. But, the performance of these methods is not satisfactory. In this study, we propose an improved capsule network to predict RNA-protein binding preferences, which can use both RNA sequence features and structure features. Experimental results show that our proposed method iCapsule performs better than three baseline methods in this field. We used both RNA sequence features and structure features in the model, so we tested the effect of primary capsule layer changes on model performance. In addition, we also studied the impact of model structure on model performance by performing our proposed method with different number of convolution layers and different kernel sizes.
|
35
|
Abdulla M, Khasawneh MT. G-Forest: An ensemble method for cost-sensitive feature selection in gene expression microarrays. Artif Intell Med 2020; 108:101941. [DOI: 10.1016/j.artmed.2020.101941] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2020] [Revised: 06/27/2020] [Accepted: 08/07/2020] [Indexed: 12/27/2022]
|
36
|
Yu L, Wei M, Li F. Longitudinal Analysis of Gene Expression Changes During Cervical Carcinogenesis Reveals Potential Therapeutic Targets. Evol Bioinform Online 2020; 16:1176934320920574. [PMID: 32489245 PMCID: PMC7241206 DOI: 10.1177/1176934320920574] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2019] [Accepted: 03/24/2020] [Indexed: 01/06/2023] Open
Abstract
Despite advances in the treatment of cervical cancer (CC), the prognosis of patients with CC remains to be improved. This study aimed to explore candidate gene targets for CC. CC datasets were downloaded from the Gene Expression Omnibus database. Genes with similar expression trends in varying steps of CC development were clustered using Short Time-series Expression Miner (STEM) software. Gene functions were then analyzed using the Gene Ontology (GO) database and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. Protein interactions among genes of interest were predicted, followed by drug-target genes and prognosis-associated genes. The expressions of the predicted genes were determined using real-time quantitative polymerase chain reaction (RT-qPCR) and Western blotting. Red and green profiles with upward and downward gene expressions, respectively, were screened using STEM software. Genes with increased expression were significantly enriched in DNA replication, cell-cycle-related biological processes, and the p53 signaling pathway. Based on the predicted results of the Drug-Gene Interaction database, 17 drug-gene interaction pairs, including 3 red profile genes (TOP2A, RRM2, and POLA1) and 16 drugs, were obtained. The Cancer Genome Atlas data analysis showed that high POLA1 expression was significantly correlated with prolonged survival, indicating that POLA1 is protective against CC. RT-qPCR and Western blotting showed that the expressions of TOP2A, RRM2, and POLA1 gradually increased in the multistep process of CC. TOP2A, RRM2, and POLA1 may be targets for the treatment of CC. However, many studies are needed to validate our findings.
Affiliation(s)
- Lijun Yu
- Department of Gynecology, First Hospital of Shanxi Medical University, Taiyuan, China
| | - Meiyan Wei
- Department of Gynecology, First Hospital of Shanxi Medical University, Taiyuan, China
| | - Fengyan Li
- Department of Gynecology, First Hospital of Shanxi Medical University, Taiyuan, China
| |
|
37
|
Xiao Q, Zhang N, Luo J, Dai J, Tang X. Adaptive multi-source multi-view latent feature learning for inferring potential disease-associated miRNAs. Brief Bioinform 2020; 22:2043-2057. [PMID: 32186712 DOI: 10.1093/bib/bbaa028] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Revised: 02/16/2020] [Accepted: 01/14/2020] [Indexed: 12/13/2022] Open
Abstract
Accumulating evidence has shown that microRNAs (miRNAs) play crucial roles in different biological processes, and their mutations and dysregulations have been proved to contribute to tumorigenesis. In silico identification of disease-associated miRNAs is a cost-effective strategy to discover those most promising biomarkers for disease diagnosis and treatment. The increasing available omics data sources provide unprecedented opportunities to decipher the underlying relationships between miRNAs and diseases by computational models. However, most existing methods are biased towards a single representation of miRNAs or diseases and are also not capable of discovering unobserved associations for new miRNAs or diseases without association information. In this study, we present a novel computational method with adaptive multi-source multi-view latent feature learning (M2LFL) to infer potential disease-associated miRNAs. First, we adopt multiple data sources to obtain similarity profiles and capture different latent features according to the geometric characteristic of miRNA and disease spaces. Then, the multi-modal latent features are projected to a common subspace to discover unobserved miRNA-disease associations in both miRNA and disease views, and an adaptive joint graph regularization term is developed to preserve the intrinsic manifold structures of multiple similarity profiles. Meanwhile, the Lp,q-norms are imposed into the projection matrices to ensure the sparsity and improve interpretability. The experimental results confirm the superior performance of our proposed method in screening reliable candidate disease miRNAs, which suggests that M2LFL could be an efficient tool to discover diagnostic biomarkers for guiding laborious clinical trials.
|
38
|
Abstract
Aims: Post-translational modifications (PTMs), which include more than 450 types, can be regarded as fundamental cellular regulation. Background: Recently, experiments demonstrated that lysine malonylation is a significant process in several organisms and cells. Malonylation also plays an important role in regulating protein subcellular localization, stability, translocation to lipid rafts, and many other protein functions. Objective: Identifying malonylation will contribute to understanding molecular mechanisms in biology. However, existing experimental approaches are expensive and time-consuming and can hardly keep pace with high-speed data generation. Moreover, some machine learning methods can hardly meet the need for high accuracy on this problem. Methods: In this study, we propose a method named MSIT (malonylation sites identification tree), which uses amino acid residues and profile information to identify lysine malonylation sites with a tree-structured neural network at the peptide sequence level. Results: The proposed algorithm achieves an F1 score of 0.8699 and a true positive ratio of 89.34% in E. coli. MSIT outperformed existing malonylation site identification methods and features on datasets from different species. Conclusion: Based on these measures, MSIT can be helpful in identifying candidate malonylation sites.
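For readers unfamiliar with the two metrics reported in this abstract, the sketch below computes the F1 score and the true positive ratio (recall) from confusion-matrix counts. The function name and the counts in the usage line are hypothetical; they are not the paper's data.

```python
def f1_and_tpr(tp, fp, fn):
    """Return (F1 score, true positive ratio) from confusion-matrix counts.

    tp, fp, fn are the numbers of true positives, false positives,
    and false negatives. The true positive ratio equals recall.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0, recall
    f1 = 2 * precision * recall / (precision + recall)
    return f1, recall

# Hypothetical counts: 8 sites found correctly, 2 false alarms, 2 missed.
f1, tpr = f1_and_tpr(tp=8, fp=2, fn=2)
```

Because F1 is the harmonic mean of precision and recall, a high true positive ratio alone does not guarantee a high F1 score.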
Affiliation(s)
- Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou 221018, China
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
| | - Yue-Hui Chen
- School of Information, University of Jinan, Jinan 250022, China
| |
|
39
|
Gan Y, Li N, Xin Y, Zou G. TriPCE: A Novel Tri-Clustering Algorithm for Identifying Pan-Cancer Epigenetic Patterns. Front Genet 2020; 10:1298. [PMID: 32010182 PMCID: PMC6974616 DOI: 10.3389/fgene.2019.01298] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2019] [Accepted: 11/25/2019] [Indexed: 11/20/2022] Open
Abstract
Epigenetic alteration is a fundamental characteristic of nearly all human cancers. Tumor cells not only harbor genetic alterations, but also are regulated by diverse epigenetic modifications. Identification of epigenetic similarities across different cancer types is beneficial for the discovery of treatments that can be extended to different cancers. Nowadays, abundant epigenetic modification profiles have provided a great opportunity to achieve this goal. Here, we proposed a new approach TriPCE, introducing tri-clustering strategy to integrative pan-cancer epigenomic analysis. The method is able to identify coherent patterns of various epigenetic modifications across different cancer types. To validate its capability, we applied the proposed TriPCE to analyze six important epigenetic marks among seven cancer types, and identified significant cross-cancer epigenetic similarities. These results suggest that specific epigenetic patterns indeed exist among these investigated cancers. Furthermore, the gene functional analysis performed on the associated gene sets demonstrates strong relevance with cancer development and reveals consistent risk tendency among these investigated cancer types.
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Ning Li
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Yongchang Xin
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
| |
|
40
|
Identification of common candidate genes and pathways for progression of ovarian, cervical and endometrial cancers. Meta Gene 2020. [DOI: 10.1016/j.mgene.2019.100634] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
|
41
|
Sun S, Yu X, Sun F, Tang Y, Zhao J, Zeng T. Dynamically characterizing individual clinical change by the steady state of disease-associated pathway. BMC Bioinformatics 2019; 20:697. [PMID: 31874621 PMCID: PMC6929545 DOI: 10.1186/s12859-019-3271-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Background Along with the development of precision medicine, individual heterogeneity is attracting more and more attention in clinical research and application. Although the biomolecular reactions seem to vary when different individuals suffer from the same disease (e.g. virus infection), the final pathogen outcomes of individuals can mainly be described by two clinical categories, i.e. symptomatic and asymptomatic. Thus, it is still a great challenge to characterize the individual-specific intrinsic regulatory convergence during dynamic gene regulation and expression. Besides individual heterogeneity, the sampling time also increases expression diversity, so capturing similar steady biological states is key to characterizing individual dynamic biological processes. Results Assuming that similar biological functions (e.g. pathways) are better suited than chaotic genes to detecting consistent functions, we design and implement a new computational framework (ABP: Attractor analysis of Boolean network of Pathway). ABP aims to identify the dynamic phenotype-associated pathways in a state-transition manner, using the network attractor to model and quantify the steady pathway states characterizing the final steady biological state of individuals (e.g. normal or disease). By analyzing multiple temporal gene expression datasets of virus infections, ABP has shown its effectiveness in identifying key pathways associated with phenotype change, inferring the consensus functional cascade among key pathways, and grouping pathway activity states corresponding to disease states. Conclusions Collectively, ABP can detect key pathways and infer their consensus functional cascade during a dynamic process (e.g. virus infection), and can also categorize individuals by disease state well, which is helpful for disease classification and prediction.
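The notion of a network attractor can be illustrated with a small synchronous Boolean network: iterate the update rule from a starting state until a state repeats, and the repeated cycle is the attractor. This is a minimal sketch, not the ABP implementation. The function names and the three-node update rule are hypothetical.

```python
def find_attractor(update, state, max_steps=1000):
    """Follow a deterministic update rule until a state repeats.

    update maps a state tuple to the next state tuple.
    Returns the list of states forming the attractor cycle
    (length 1 means a fixed point).
    """
    seen = {}        # state -> step at which it was first visited
    trajectory = []
    for step in range(max_steps):
        if state in seen:
            return trajectory[seen[state]:]
        seen[state] = step
        trajectory.append(state)
        state = update(state)
    raise RuntimeError("no attractor found within max_steps")

def toy_update(s):
    """Hypothetical 3-node rule: a' = b, b' = a, c' = a AND b."""
    a, b, c = s
    return (b, a, a & b)

fixed_point = find_attractor(toy_update, (1, 1, 0))
cycle = find_attractor(toy_update, (1, 0, 0))
```

Here the first start state settles into a fixed point and the second into a two-state cycle, the two kinds of steady behaviour an attractor analysis distinguishes.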
Affiliation(s)
- Shaoyan Sun
- School of Mathematics and Statistics Science, Ludong University, Yantai, 264025, China.
| | - Xiangtian Yu
- Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, 200233, China.,Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, 200031, China
| | - Fengnan Sun
- Medical Laboratory, Yantaishan Hospital, Yantai, 264001, China
| | - Ying Tang
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, 200031, China
| | - Juan Zhao
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, 200031, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, 200031, China. .,Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, 201210, China.
| |
|
42
|
Wu H, Yang R, Fu Q, Chen J, Lu W, Li H. Research on predicting 2D-HP protein folding using reinforcement learning with full state space. BMC Bioinformatics 2019; 20:685. [PMID: 31874607 PMCID: PMC6929271 DOI: 10.1186/s12859-019-3259-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein structure prediction has always been an important issue in bioinformatics. Prediction of the two-dimensional structure of proteins based on the hydrophobic polarity model is a typical non-deterministic polynomial hard problem. Currently reported hydrophobic polarity model optimization methods, such as the greedy method, the brute-force method, and the genetic algorithm, usually cannot converge robustly to the lowest energy conformations. Reinforcement learning, with its advantages of continuous Markov optimal decision-making and maximization of global cumulative return, is especially suitable for solving global optimization problems of biological sequences. RESULTS In this study, we proposed a novel hydrophobic polarity model optimization method derived from reinforcement learning, which structured the full state space and used an energy-based reward function and a rigid overlap detection rule. To validate the performance, sixteen sequences were selected from the classical data set. The results indicated that reinforcement learning with full states successfully converged to the lowest energy conformations for all sequences, while reinforcement learning with partial states folded 50% of the sequences to the lowest energy conformations. In the last 100 episodes, reinforcement learning with full states hits the lowest energy 5 times on average, which is 40 and 100% higher than the three and zero hits by the greedy algorithm and reinforcement learning with partial states, respectively. CONCLUSIONS Our results indicate that reinforcement learning with full states is a powerful method for predicting two-dimensional hydrophobic-polarity protein structure. It has obvious competitive advantages compared with the greedy algorithm and reinforcement learning with partial states.
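The energy function and overlap rule of the standard 2D hydrophobic-polar (HP) lattice model can be sketched as below: each pair of H residues that are lattice neighbours but not chain neighbours contributes -1, and a conformation that revisits a lattice site is rejected. The function name and the U/D/L/R move encoding are assumptions for illustration; the paper's reward function may differ in detail.

```python
def hp_energy(sequence, moves):
    """Energy of a 2D HP conformation, or None if the chain overlaps.

    sequence: string of 'H' and 'P'.
    moves: string of U/D/L/R steps, one fewer than the residues.
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly len(sequence) - 1 moves")
    steps = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    coords = [(0, 0)]
    for m in moves:
        dx, dy = steps[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        return None  # rigid overlap rule: two residues on one site
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy

folded = hp_energy("HPPH", "RUL")     # chain bent into a square
straight = hp_energy("HPPH", "RRR")   # fully extended chain
clash = hp_energy("HPPH", "RLR")      # steps back onto itself
```

A reinforcement learning agent could use the negative of this energy as a terminal reward and treat a None result as an invalid action, though that mapping is an assumption here.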
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Ru Yang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Qiming Fu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China. .,Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China.
| | - Jianping Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.,Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Weizhong Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Haiou Li
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| |
|
43
|
Han Z, Wang T, Tian R, Zhou W, Wang P, Ren P, Zong J, Hu Y, Jin S, Jiang Q. BIN1 rs744373 variant shows different association with Alzheimer's disease in Caucasian and Asian populations. BMC Bioinformatics 2019; 20:691. [PMID: 31874619 PMCID: PMC6929404 DOI: 10.1186/s12859-019-3264-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND The association between the BIN1 rs744373 variant and Alzheimer's disease (AD) has been identified by genome-wide association studies (GWASs) as well as candidate gene studies in Caucasian populations. In East Asian populations, however, association studies have reported both positive and negative results. Considering the smaller sample sizes of the studies in East Asians, we believe that those results did not have enough statistical power. RESULTS We conducted a meta-analysis with 71,168 samples (22,395 AD cases and 48,773 controls, from 37 studies of 19 articles). Based on the additive model, we observed significant genetic heterogeneity in pooled populations as well as in Caucasians and East Asians. We identified a significant association between the rs744373 polymorphism and AD in pooled populations (P = 5 × 10^-7, odds ratio (OR) = 1.12, 95% confidence interval (CI) 1.07-1.17) and in Caucasian populations (P = 3.38 × 10^-8, OR = 1.16, 95% CI 1.10-1.22). In East Asian populations, however, the association was not identified (P = 0.393, OR = 1.057, 95% CI 0.95-1.15). In addition, the regression analysis suggested no significant publication bias. The results of the sensitivity analysis, as well as of the meta-analysis under the dominant and recessive models, remained consistent, which demonstrated the reliability of our finding. CONCLUSIONS The large-scale meta-analysis highlighted the significant association between the rs744373 polymorphism and AD risk in Caucasian populations but not in East Asian populations.
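To show how per-study odds ratios are typically combined, the sketch below pools them with fixed-effect inverse-variance weighting on the log scale, recovering each standard error from the reported 95% confidence interval. This is an assumption-laden illustration: the abstract does not state which pooling model was used, and with the heterogeneity it reports a random-effects model may be more appropriate. The function name and study values are hypothetical.

```python
from math import exp, log, sqrt

def pool_odds_ratios(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of odds ratios.

    studies: list of (odds_ratio, ci_low, ci_high) with 95% CIs.
    Returns (pooled_or, ci_low, ci_high).
    """
    num = 0.0
    den = 0.0
    for odds_ratio, lo, hi in studies:
        # Standard error of log(OR), recovered from the CI width.
        se = (log(hi) - log(lo)) / (2 * z)
        weight = 1.0 / se ** 2
        num += weight * log(odds_ratio)
        den += weight
    pooled = num / den
    se_pooled = sqrt(1.0 / den)
    return (exp(pooled),
            exp(pooled - z * se_pooled),
            exp(pooled + z * se_pooled))

# Two hypothetical studies with identical estimates.
pooled_or, pooled_lo, pooled_hi = pool_odds_ratios(
    [(1.2, 1.0, 1.44), (1.2, 1.0, 1.44)]
)
```

Pooling two identical studies leaves the point estimate unchanged but narrows the interval, which is why a meta-analysis can reach significance where small individual studies cannot.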
Affiliation(s)
- Zhifa Han
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Tao Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Rui Tian
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Peng Ren
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Jian Zong
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Shuilin Jin
- Department of Mathematics, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
|
44
|
Zou B, Chen C, Zhao R, Ouyang P, Zhu C, Chen Q, Duan X. A novel glaucomatous representation method based on Radon and wavelet transform. BMC Bioinformatics 2019; 20:693. [PMID: 31874641 PMCID: PMC6929399 DOI: 10.1186/s12859-019-3267-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/02/2022] Open
Abstract
Background Glaucoma is an irreversible eye disease caused by optic nerve injury, and it usually changes the structure of the optic nerve head (ONH). Clinically, ONH assessment based on fundus images is one of the most useful ways to detect glaucoma. However, finding an effective representation for ONH assessment is challenging because the structural changes produce complex and mixed visual patterns. Method We propose a novel feature representation based on the Radon and wavelet transforms to capture these visual patterns. First, the Radon transform (RT) maps the fundus image into the Radon domain, in which the spatial radial variations of the ONH are converted to a discrete signal that describes the structural features of the image. Second, the discrete wavelet transform (DWT) is used to capture differences and obtain a quantitative representation. Finally, principal component analysis (PCA) and a support vector machine (SVM) are used for dimensionality reduction and glaucoma detection, respectively. Results The proposed method achieves state-of-the-art detection performance on the RIMONE-r2 dataset, with an accuracy of 0.861 and an area under the curve (AUC) of 0.906. Conclusion The proposed method can serve as an effective tool for large-scale glaucoma screening and can provide a reference for the clinical diagnosis of glaucoma.
Affiliation(s)
- Beiji Zou
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Changlong Chen
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Rongchang Zhao
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Pingbo Ouyang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China; The Second Xiangya Hospital of Central South University, Changsha, 410011, China
- Chengzhang Zhu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Qilin Chen
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Xuanchu Duan
- The Second Xiangya Hospital of Central South University, Changsha, 410011, China
|
45
|
Wang S, Wang X. Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion. BMC Bioinformatics 2019; 20:701. [PMID: 31874617 PMCID: PMC6929547 DOI: 10.1186/s12859-019-3276-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Protein structural class prediction is a heavily researched subject in bioinformatics that plays a vital role in protein functional analysis, protein folding recognition, rational drug design and other related fields. However, when traditional feature expression methods are adopted, the features usually contain considerable redundant information, which leads to a very low recognition rate for protein structural classes. RESULTS We constructed a prediction model based on wavelet denoising using different feature expression methods. A new fusion idea, first fuse and then denoise, is proposed in this article. Two types of pseudo amino acid compositions are utilized to distill feature vectors. Then, a two-dimensional (2-D) wavelet denoising algorithm is used to remove the redundant information from the two extracted feature vectors. The two feature vectors based on parallel 2-D wavelet denoising are fused, which is known as PWD-FU-PseAAC. The related source code is available at https://github.com/Xiaoheng-Wang12/Wang-xiaoheng/tree/master. CONCLUSIONS Experimental verification on three low-similarity datasets suggests that the proposed model achieves notably good results in the prediction of protein structural classes.
Affiliation(s)
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
- Xiaoheng Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
|
46
|
Abstract
Background A cost-sensitive algorithm is an effective strategy for solving imbalanced classification problems. However, the misclassification costs are usually determined empirically based on user expertise, which leads to unstable performance of cost-sensitive classification. Therefore, an efficient and accurate method is needed to calculate the optimal cost weights. Results In this paper, two approaches are proposed to search for the optimal cost weights, targeting the highest weighted classification accuracy (WCA). One is a grid search over the cost weights and the other is function fitting. The two approaches are compared. In the experiments, we classify imbalanced gene expression data using an extreme learning machine to test the cost weights obtained by the two approaches. Conclusions Comprehensive experimental results show that the function fitting method is generally more efficient and can find the optimal cost weights with acceptable WCA.
|
47
|
Guo M, Yu Y, Wen T, Zhang X, Liu B, Zhang J, Zhang R, Zhang Y, Zhou X. Analysis of disease comorbidity patterns in a large-scale China population. BMC Med Genomics 2019; 12:177. [PMID: 31829182 PMCID: PMC6907122 DOI: 10.1186/s12920-019-0629-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Disease comorbidity is common and has significant implications for disease progression and management. We aim to detect the general disease comorbidity patterns in Chinese populations using a large-scale clinical data set. METHODS We extracted the diseases from a large-scale anonymized data set derived from 8,572,137 inpatients in 453 hospitals across China. We built a Disease Comorbidity Network (DCN) using correlation analysis and detected the topological patterns of disease comorbidity using both complex network and data mining methods. The comorbidity patterns were further validated by shared molecular mechanisms using disease-gene associations and pathways. To predict disease occurrence over the whole course of disease progression, we applied four machine learning methods to model the disease trajectories of patients. RESULTS We obtained a DCN with 5702 nodes and 258,535 edges, which shows a power-law distribution of degree and weight. It further indicated high heterogeneity of comorbidities across different diseases, and we found that the DCN is a hierarchical modular network with community structures that contain both homogeneous and heterogeneous disease categories. Furthermore, consistent with previous work on US and European populations, we found that disease comorbidities share underlying molecular mechanisms. Finally, taking hypertension and psychiatric disease as examples, we used four classification methods to predict disease occurrence from the comorbid disease trajectories and obtained acceptable performance; random forest achieved the best overall performance (F1-score of 0.6689 for hypertension and 0.6802 for psychiatric disease).
CONCLUSIONS Our study indicates that disease comorbidity is significant and valuable for understanding disease incidence and disease interactions in real-world populations, which will provide important insights for detecting patterns in disease classification, diagnosis and prognosis.
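The abstract says only that the DCN was built "using correlation analysis", without naming the measure. As a minimal sketch of one plausible construction, the code below computes the phi coefficient between every pair of co-occurring diseases and keeps positively correlated pairs as weighted edges. The choice of the phi coefficient, the function name `comorbidity_edges`, the `min_phi` threshold and the toy patient records are all assumptions made for illustration, not details from the study.

```python
import math
from itertools import combinations


def comorbidity_edges(patients, min_phi=0.0):
    """Build weighted comorbidity edges from per-patient disease sets.

    patients: list of sets of disease labels, one set per patient.
    Returns {(disease_a, disease_b): phi} for pairs with phi > min_phi.
    """
    n = len(patients)
    counts = {}
    pair_counts = {}
    for diseases in patients:
        for disease in diseases:
            counts[disease] = counts.get(disease, 0) + 1
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(diseases), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    edges = {}
    for (a, b), both in pair_counts.items():
        n_a, n_b = counts[a], counts[b]
        denom = math.sqrt(n_a * n_b * (n - n_a) * (n - n_b))
        if denom == 0:
            continue  # a disease present in every patient has no variance
        phi = (both * n - n_a * n_b) / denom
        if phi > min_phi:
            edges[(a, b)] = phi
    return edges


# Toy records, for illustration only.
toy_patients = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes"},
    {"asthma"},
    {"asthma", "hypertension"},
]
toy_edges = comorbidity_edges(toy_patients)
```

On real inpatient data the resulting edge dictionary could be loaded into a graph library to study degree distributions and community structure.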
Affiliation(s)
- Mengfei Guo
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
- Yanan Yu
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
- Tiancai Wen
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China; School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, Shanxi Province, China
- Xiaoping Zhang
- China Academy of Chinese Medicine Sciences, Beijing, 100070, China
- Baoyan Liu
- China Academy of Chinese Medicine Sciences, Beijing, 100070, China
- Jin Zhang
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Runshun Zhang
- China Academy of Chinese Medical Sciences, Guang'anmen Hospital, Beijing, 100053, China
- Yanning Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, Shanxi Province, China
- Xuezhong Zhou
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
|
48
|
Wang S, Hu X, Feng Z, Zhang X, Liu L, Sun K, Xu S. Recognizing ion ligand binding sites by SMO algorithm. BMC Mol Cell Biol 2019; 20:53. [PMID: 31823742 PMCID: PMC6905020 DOI: 10.1186/s12860-019-0237-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022] Open
Abstract
Background In many important life activities, the execution of protein function depends on the interaction between proteins and ligands. Because ion ligands are an important class of protein-binding ligands, identifying their binding sites plays an important role in the study of protein function. Results In this study, four acid radical ion ligands (NO₂⁻, CO₃²⁻, SO₄²⁻, PO₄³⁻) and ten metal ion ligands (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, K⁺, Co²⁺) were selected as research objects, and a sequential minimal optimization (SMO) algorithm based on sequence information was proposed. Better prediction results were obtained under 5-fold cross-validation. Conclusions An efficient method for predicting ion ligand binding sites was presented.
Affiliation(s)
- Shan Wang
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Xiuzhen Hu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Zhenxing Feng
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Xiaojin Zhang
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Liu Liu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Kai Sun
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Shuang Xu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
|
49
|
Xu J, Wu P, Chen Y, Meng Q, Dawood H, Dawood H. A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data. BMC Bioinformatics 2019; 20:527. [PMID: 31660856 PMCID: PMC6819613 DOI: 10.1186/s12859-019-3116-7] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 05/10/2019] [Accepted: 09/27/2019] [Indexed: 12/11/2022] Open
Abstract
Background Cancer subtype classification is of great importance for the accurate diagnosis and personalized treatment of cancer. Recent developments in high-throughput sequencing technologies have rapidly produced multi-omics data for the same cancer sample. Many computational methods have been proposed to classify cancer subtypes; however, most of them build the model using only gene expression data. It has been shown that integrating multi-omics data contributes to cancer subtype classification. Results A new hierarchical integration deep flexible neural forest framework, named HI-DFNForest, is proposed to integrate multi-omics data for cancer subtype classification. A stacked autoencoder (SAE) is used to learn high-level representations of each omics data type, and the complex representations are then learned by integrating all learned representations into a layer of autoencoder. The final learned data representations (from the stacked autoencoder) are used to classify patients into different cancer subtypes using the deep flexible neural forest (DFNForest) model. Cancer subtype classification is verified on the BRCA, GBM and OV data sets from TCGA by integrating gene expression, miRNA expression and DNA methylation data. The results demonstrate that integrating multiple omics data improves the accuracy of cancer subtype classification compared with using gene expression data alone, and that the proposed framework achieves better performance than other conventional methods. Conclusion The new hierarchical integration deep flexible neural forest framework (HI-DFNForest) is an effective method for integrating multi-omics data to classify cancer subtypes.
Affiliation(s)
- Jing Xu
- School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Peng Wu
- School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Qingfang Meng
- School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Hussain Dawood
- Department of Computer and Network Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Hassan Dawood
- Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan
|
50
|
Zhang Q, Zhu L, Huang DS. High-Order Convolutional Neural Network Architecture for Predicting DNA-Protein Binding Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1184-1192. [PMID: 29993783 DOI: 10.1109/tcbb.2018.2819660] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Indexed: 05/15/2023]
Abstract
Although deep learning algorithms have outperformed conventional methods in predicting the sequence specificities of DNA-protein binding, they fail to consider the dependencies among nucleotides and the diverse binding lengths of different transcription factors (TFs). To address these two limitations simultaneously, we propose a high-order convolutional neural network architecture (HOCNN), which employs a high-order encoding method to build high-order dependencies among nucleotides and a multi-scale convolutional layer to capture motif features of different lengths. Experimental results on real ChIP-seq datasets show that the proposed method outperforms the state-of-the-art deep learning method (DeepBind) in the motif discovery task. In addition, we provide further insights into the importance of introducing additional convolutional kernels and the degeneration problem of introducing high-order encoding in the motif discovery task.
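The abstract does not spell out how the high-order encoding is defined. One common way to capture dependencies among adjacent nucleotides, shown here as a hedged sketch, is to one-hot encode overlapping k-mers instead of single bases, so that each position is represented by a vector of length 4^k. The function name `high_order_one_hot`, the handling of ambiguous bases and the example sequence are assumptions for illustration and may differ from what HOCNN actually does.

```python
from itertools import product


def high_order_one_hot(sequence, order=2, alphabet="ACGT"):
    """One-hot encode overlapping k-mers of a DNA sequence.

    Returns a list of len(sequence) - order + 1 rows, each of length
    len(alphabet) ** order. With order=1 this is plain one-hot encoding.
    """
    kmers = ["".join(chars) for chars in product(alphabet, repeat=order)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    encoded = []
    for start in range(len(sequence) - order + 1):
        row = [0] * len(kmers)
        kmer = sequence[start:start + order]
        # Windows containing an ambiguous base (e.g. N) stay all-zero.
        if kmer in index:
            row[index[kmer]] = 1
        encoded.append(row)
    return encoded


# Second-order (dinucleotide) encoding of a short example sequence.
encoded = high_order_one_hot("ACGT", order=2)
```

The resulting matrix can be fed to convolutional filters of several widths, which is the role the multi-scale convolutional layer plays in the description above.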
|