1
Geng G, Wang L, Xu Y, Wang T, Ma W, Duan H, Zhang J, Mao A. MGDDI: A multi-scale graph neural networks for drug-drug interaction prediction. Methods 2024; 228:22-29. [PMID: 38754712 DOI: 10.1016/j.ymeth.2024.05.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 05/09/2024] [Accepted: 05/12/2024] [Indexed: 05/18/2024]
Abstract
Drug-drug interaction (DDI) prediction is crucial for identifying interactions within drug combinations, especially adverse effects due to physicochemical incompatibility. While current methods have made strides in predicting adverse drug interactions, limitations persist. Most methods rely on handcrafted features, restricting their applicability. They predominantly extract information from individual drugs, neglecting the importance of interaction details between drug pairs. To address these issues, we propose MGDDI, a graph neural network-based model for predicting potential adverse drug interactions. Notably, we use a multiscale graph neural network (MGNN) to learn drug molecule representations, addressing substructure size variations and preventing gradient issues. For capturing interaction details between drug pairs, we integrate a substructure interaction learning module based on attention mechanisms. Our experimental results demonstrate MGDDI's superiority in predicting adverse drug interactions, offering a solution to current methodological limitations.
Affiliation(s)
- Guannan Geng
- Department of Endocrinology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Lizhuang Wang
- Beidahuang Industry Group General Hospital, Harbin, China
- Yanwei Xu
- Beidahuang Group Neuropsychiatric Hospital, Jiamusi, China; Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Tianshuo Wang
- School of Software, Shandong University, Jinan, China
- Wei Ma
- Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Jiahui Zhang
- Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China.
- Anqiong Mao
- The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Department of Anesthesiology, Luzhou, China.
2
Gouveia Roque C, Phatnani H, Hengst U. The broken Alzheimer's disease genome. Cell Genomics 2024; 4:100555. [PMID: 38697121 PMCID: PMC11099344 DOI: 10.1016/j.xgen.2024.100555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 02/25/2024] [Accepted: 04/07/2024] [Indexed: 05/04/2024]
Abstract
The complex pathobiology of late-onset Alzheimer's disease (AD) poses significant challenges to therapeutic and preventative interventions. Despite these difficulties, genomics and related disciplines are allowing fundamental mechanistic insights to emerge with clarity, particularly with the introduction of high-resolution sequencing technologies. After all, the disrupted processes at the interface between DNA and gene expression, which we call the broken AD genome, offer detailed quantitative evidence unrestrained by preconceived notions about the disease. In addition to highlighting biological pathways beyond the classical pathology hallmarks, these advances have revitalized drug discovery efforts and are driving improvements in clinical tools. We review genetic, epigenomic, and gene expression findings related to AD pathogenesis and explore how their integration enables a better understanding of the multicellular imbalances contributing to this heterogeneous condition. The frontiers opening on the back of these research milestones promise a future of AD care that is both more personalized and predictive.
Affiliation(s)
- Cláudio Gouveia Roque
- Center for Genomics of Neurodegenerative Disease, New York Genome Center, New York, NY 10013, USA; The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA.
- Hemali Phatnani
- Center for Genomics of Neurodegenerative Disease, New York Genome Center, New York, NY 10013, USA; Department of Neurology, Center for Translational and Computational Neuroimmunology, Columbia University, New York, NY 10032, USA
- Ulrich Hengst
- The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA; Department of Pathology & Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA.
3
Yang X, Jin J, Wang R, Li Z, Wang Y, Wei L. CACPP: A Contrastive Learning-Based Siamese Network to Identify Anticancer Peptides Based on Sequence Only. J Chem Inf Model 2024; 64:2807-2816. [PMID: 37252890 DOI: 10.1021/acs.jcim.3c00297] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/01/2023]
Abstract
Anticancer peptides (ACPs) recently have been receiving increasing attention in cancer therapy due to their low consumption, few adverse side effects, and easy accessibility. However, it remains a great challenge to identify anticancer peptides via experimental approaches, requiring expensive and time-consuming experimental studies. In addition, traditional machine-learning-based methods are proposed for ACP prediction mainly depending on hand-crafted feature engineering, which normally achieves low prediction performance. In this study, we propose CACPP (Contrastive ACP Predictor), a deep learning framework based on the convolutional neural network (CNN) and contrastive learning for accurately predicting anticancer peptides. In particular, we introduce the TextCNN model to extract the high-latent features based on the peptide sequences only and exploit the contrastive learning module to learn more distinguishable feature representations to make better predictions. Comparative results on the benchmark data sets indicate that CACPP outperforms all the state-of-the-art methods in the prediction of anticancer peptides. Moreover, to intuitively show that our model has good classification ability, we visualize the dimension reduction of the features from our model and explore the relationship between ACP sequences and anticancer functions. Furthermore, we also discuss the influence of data set construction on model prediction and explore our model performance on the data sets with verified negative samples.
Affiliation(s)
- Xuetong Yang
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Junru Jin
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Ruheng Wang
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Zhongshen Li
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Yu Wang
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Leyi Wei
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
4
Wang X, Yu C, Sun Y, Liu Y, Tang S, Sun Y, Zhou Y. Three-dimensional morphology scoring of hepatocellular carcinoma stratifies prognosis and immune infiltration. Comput Biol Med 2024; 172:108253. [PMID: 38484698 DOI: 10.1016/j.compbiomed.2024.108253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/18/2024] [Accepted: 03/06/2024] [Indexed: 03/26/2024]
Abstract
BACKGROUND: The morphological attributes could serve as pivotal indicators precipitating early recurrence and dismal overall survival in hepatocellular carcinoma (HCC), and quantifying morphological features may better stratify the prognosis of HCC. OBJECTIVE: To develop a radiomics approach based on 3D tumor morphology features for predicting the prognosis of HCC and identifying differentially expressed genes related to morphology to guide HCC treatment. MATERIALS AND METHODS: Retrospective study of 357 HCC patients. Radiomic features were extracted from MRI tumor regions; 14 morphology-related features predicted early HCC recurrence and patient stratification via LASSO-Cox modeling. Overall survival (OS) and recurrence-free survival (RFS) were analyzed. RNA sequencing from the Cancer Imaging Archive (TCIA) examined drug sensitivity and stratified HCC using morphological immunity genes, validating recurrence and prognosis. RESULTS: Patients were split into training (n = 225), test (n = 132), and 50 TCIA dataset cohorts. Two features (Maximum2DdiameterColumn, Sphericity) in Cox regression stratified patients into high/low-risk Morphological Radiological Score (Morph-RS) groups. Significant OS and RFS were seen across all sets. Differentially expressed genes focused on T cell receptor signaling; low-risk group had higher T cells (P = 0.039), B cells (P = 0.041), NK cells (P = 0.018). SN-38, GSK2126458 might treat high-risk morphology. Morphology-immune genes stratified HCC, showing significant RFS/OS differences. CONCLUSION: Tumor Morph-RS effectively stratifies HCC patients' recurrence and prognosis. Limited immune infiltration seen in Morph-RS high-risk groups signifies the potential of employing tumor morphology as a potent visual biomarker for diagnosing and managing HCC.
Affiliation(s)
- Xinxin Wang
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin, China
- Can Yu
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin, China
- Yu Sun
- Beidahuang Industry Group General Hospital, Harbin, China
- Yixin Liu
- Basic Medicine College, Harbin Medical University, Harbin, China
- Shuli Tang
- Department of Outpatient Chemotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Yige Sun
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin, China; Genomics Research Center (Key Laboratory of Gut Microbiota and Pharmacogenomics of Heilongjiang Province, State-Province Key Laboratory of Biomedicine-Pharmaceutics of China), College of Pharmacy, Harbin Medical University, Harbin, China.
- Yang Zhou
- Department of Radiology, Harbin Medical University Cancer Hospital, Harbin, China.
5
Chen M, Sun M, Su X, Tiwari P, Ding Y. Fuzzy kernel evidence Random Forest for identifying pseudouridine sites. Brief Bioinform 2024; 25:bbae169. [PMID: 38622357 PMCID: PMC11018548 DOI: 10.1093/bib/bbae169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/27/2024] [Accepted: 03/31/2024] [Indexed: 04/17/2024]
Abstract
Pseudouridine is an RNA modification that is widely distributed in both prokaryotes and eukaryotes, and plays a critical role in numerous biological activities. Despite its importance, the precise identification of pseudouridine sites through experimental approaches poses significant challenges, requiring substantial time and resources. Therefore, there is a growing need for computational techniques that can reliably and quickly identify pseudouridine sites from vast amounts of RNA sequencing data. In this study, we propose fuzzy kernel evidence Random Forest (FKeERF) to identify pseudouridine sites. This method is called PseU-FKeERF, which demonstrates high accuracy in identifying pseudouridine sites from RNA sequencing data. The PseU-FKeERF model selected four RNA feature coding schemes with relatively good performance for feature combination, and then input them into the newly proposed FKeERF method for category prediction. FKeERF not only uses fuzzy logic to expand the original feature space, but also combines kernel methods that are easy to interpret in general for category prediction. Both cross-validation tests and independent tests on benchmark datasets have shown that PseU-FKeERF has better predictive performance than several state-of-the-art methods. This new method not only improves the accuracy of pseudouridine site identification, but also provides a certain reference for disease control and related drug development in the future.
Affiliation(s)
- Mingshuai Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Mingai Sun
- Beidahuang Industry Group General Hospital, Harbin 150001, China
- Xi Su
- Foshan Women and Children Hospital, Foshan 528000, China
- Prayag Tiwari
- School of Information Technology, Halmstad University, Sweden
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
6
Qiu S, Sun M, Xu Y, Hu Y. Integrating multi-omics data to reveal the effect of genetic variant rs6430538 on Alzheimer's disease risk. Front Neurosci 2024; 18:1277187. [PMID: 38562299 PMCID: PMC10982421 DOI: 10.3389/fnins.2024.1277187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 02/26/2024] [Indexed: 04/04/2024]
Abstract
Introduction: Growing evidence highlights a potential genetic overlap between Alzheimer's disease (AD) and Parkinson's disease (PD); however, the role of the PD risk variant rs6430538 in AD remains unclear. Methods: In Stage 1, we investigated the risk associated with the rs6430538 C allele in seven large-scale AD genome-wide association study (GWAS) cohorts. In Stage 2, we performed expression quantitative trait loci (eQTL) analysis to calculate the cis-regulated effect of rs6430538 on TMEM163 in both AD and neuropathologically normal samples. Stage 3 involved evaluating the differential expression of TMEM163 in 4 brain tissues from AD cases and controls. Finally, in Stage 4, we conducted a transcriptome-wide association study (TWAS) to identify any association between TMEM163 expression and AD. Results: The results showed that genetic variant rs6430538 C allele might increase the risk of AD. eQTL analysis revealed that rs6430538 up-regulated TMEM163 expression in AD brain tissue, but down-regulated its expression in normal samples. Interestingly, TMEM163 showed differential expression in entorhinal cortex (EC) and temporal cortex (TCX). Furthermore, the TWAS analysis indicated strong associations between TMEM163 and AD in various tissues. Discussion: In summary, our findings suggest that rs6430538 may influence AD by regulating TMEM163 expression. These discoveries may open up new opportunities for therapeutic strategies targeting AD.
Affiliation(s)
- Shizheng Qiu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Meili Sun
- Beidahuang Industry Group General Hospital, Harbin, China
- Yanwei Xu
- Beidahuang Group Neuropsychiatric Hospital, Jiamusi, China
- Yang Hu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
7
Li J, Chen D, Liu H, Xi Y, Luo H, Wei Y, Liu J, Liang H, Zhang Q. Identifying potential genetic epistasis implicated in Alzheimer's disease via detection of SNP-SNP interaction on quantitative trait CSF Aβ42. Neurobiol Aging 2024; 134:84-93. [PMID: 38039940 DOI: 10.1016/j.neurobiolaging.2023.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 10/06/2023] [Accepted: 10/10/2023] [Indexed: 12/03/2023]
Abstract
Although genome-wide association studies have identified multiple Alzheimer's disease (AD)-associated loci by selecting the main effects of individual single-nucleotide polymorphisms (SNPs), the interpretation of genetic variance in AD is limited. Based on the linear regression method, we performed genome-wide SNP-SNP interaction on cerebrospinal fluid Aβ42 to identify potential genetic epistasis implicated in AD, with age, gender, and diagnosis as covariates. A GPU-based method was used to address the computational challenges posed by the analysis of epistasis. We found 368 SNP pairs to be statistically significant, and highly significant SNP-SNP interactions were identified between the marginal main effects of SNP pairs, which explained a relatively high variance at the Aβ42 level. Our results replicated 100 previously reported AD-related genes and 5 gene-gene interaction pairs of the protein-protein interaction network. Our bioinformatics analyses provided preliminary evidence that the 5-overlapping gene-gene interaction pairs play critical roles in inducing synaptic loss and dysfunction, thereby leading to memory decline and cognitive impairment in AD-affected brains.
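As an illustration of the kind of model this abstract describes, the sketch below fits a linear regression with a SNP-SNP interaction term and covariates on synthetic, noise-free data. It is a minimal pure-Python sketch, not the authors' GPU-based method: the function names, the 0/1/2 genotype coding, the covariate encoding, and the simulated coefficients are all assumptions made for the example, and no significance testing is performed.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interaction_model(snp_a, snp_b, covariates, trait):
    """Ordinary least squares for
    trait ~ 1 + snp_a + snp_b + snp_a*snp_b + covariates.
    Returns the coefficient list; index 3 is the interaction term."""
    rows = [[1.0, a, b, a * b] + list(cov)
            for a, b, cov in zip(snp_a, snp_b, covariates)]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, trait)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic data (hypothetical): genotypes coded as 0/1/2 minor-allele counts,
# covariates are (age, sex, diagnosis), and the trait is generated without noise.
rng = random.Random(0)
n = 200
snp_a = [rng.choice([0, 1, 2]) for _ in range(n)]
snp_b = [rng.choice([0, 1, 2]) for _ in range(n)]
covariates = [(rng.uniform(55, 90), rng.choice([0, 1]), rng.choice([0, 1]))
              for _ in range(n)]
trait = [10.0 + 0.5 * a - 0.3 * b + 1.5 * a * b + 0.02 * age + 0.4 * sex - 0.8 * dx
         for a, b, (age, sex, dx) in zip(snp_a, snp_b, covariates)]

coef = fit_interaction_model(snp_a, snp_b, covariates, trait)
print("estimated interaction effect:", round(coef[3], 6))
```

A genome-wide scan would repeat such a fit for every SNP pair and test the interaction coefficient, which is where the computational cost mentioned in the abstract comes from.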
Affiliation(s)
- Jin Li
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Dandan Chen
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China; School of Automation Engineering, Northeast Electric Power University, Jilin, China
- Hongwei Liu
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Yang Xi
- School of Computer Science, Northeast Electric Power University, Jilin, China
- Haoran Luo
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Yiming Wei
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Junfeng Liu
- School of Computer Science, Northeast Electric Power University, Jilin, China
- Hong Liang
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China.
- Qiushi Zhang
- School of Computer Science, Northeast Electric Power University, Jilin, China.
8
Wan H, Zhang Y, Huang S. Prediction of thermophilic protein using 2-D general series correlation pseudo amino acid features. Methods 2023; 218:141-148. [PMID: 37604248 DOI: 10.1016/j.ymeth.2023.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 07/08/2023] [Accepted: 08/18/2023] [Indexed: 08/23/2023]
Abstract
The demand for thermophilic protein has been increasing in protein engineering recently. Many machine-learning methods for identifying thermophilic proteins have emerged during this period. However, most machine learning-based thermophilic protein identification studies have only focused on accuracy. The relationship between the features' meaning and the proteins' physicochemical properties has yet to be studied in depth. In this article, we focused on the relationship between the features and the thermal stability of thermophilic proteins. This method used 2-D general series correlation pseudo amino acid (SC-PseAAC-General) features and achieved an accuracy of 82.76% using the J48 classifier. In addition, this research found higher frequencies of glutamic acid in thermophilic proteins, which helps thermophilic proteins maintain their thermal stability by forming hydrogen bonds and salt bridges that prevent denaturation at high temperatures.
Affiliation(s)
- Hao Wan
- College of Life Science, Qingdao University, Qingdao 266071, China.
- Yanan Zhang
- College of Life Science, Qingdao University, Qingdao 266071, China
- Shibo Huang
- Beidahuang Industry Group General Hospital, Harbin 150001, China
9
Ju H, Bai J, Jiang J, Che Y, Chen X. Comparative evaluation and analysis of DNA N4-methylcytosine methylation sites using deep learning. Front Genet 2023; 14:1254827. [PMID: 37671040 PMCID: PMC10476523 DOI: 10.3389/fgene.2023.1254827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 07/31/2023] [Indexed: 09/07/2023]
Abstract
DNA N4-methylcytosine (4mC) is significantly involved in biological processes, such as DNA expression, repair, and replication. Therefore, accurate prediction methods are urgently needed. Deep learning methods have transformed applications that previously required sequencing expertise into engineering challenges that do not require expertise to solve. Here, we compare a variety of state-of-the-art deep learning models on six benchmark datasets to evaluate their performance in 4mC methylation site detection. We visualize the statistical analysis of the datasets and the performance of different deep-learning models. We conclude that deep learning can greatly expand the potential of methylation site prediction.
Affiliation(s)
- Hong Ju
- Heilongjiang Agricultural Engineering Vocational College, Harbin, China
- Jie Bai
- Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education, Hangzhou, China
- Jing Jiang
- Beidahuang Industry Group General Hospital, Harbin, China
- Yusheng Che
- Heilongjiang Agricultural Engineering Vocational College, Harbin, China
- Xin Chen
- Department of Neurosurgical Laboratory, The First Affiliated Hospital of Harbin Medical University, Harbin, China
10
Zhu W, Yuan SS, Li J, Huang CB, Lin H, Liao B. A First Computational Frame for Recognizing Heparin-Binding Protein. Diagnostics (Basel) 2023; 13:2465. [PMID: 37510209 PMCID: PMC10377868 DOI: 10.3390/diagnostics13142465] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 06/10/2023] [Revised: 07/13/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
Heparin-binding protein (HBP) is a cationic antibacterial protein derived from multinuclear neutrophils and an important biomarker of infectious diseases. The correct identification of HBP is of great significance to the study of infectious diseases. This work provides the first HBP recognition framework based on machine learning to accurately identify HBP. By using four sequence descriptors, HBP and non-HBP samples were represented by discrete numbers. By inputting these features into a support vector machine (SVM) and random forest (RF) algorithm and comparing the prediction performances of these methods on training data and independent test data, it is found that the SVM-based classifier has the greatest potential to identify HBP. The model could produce an auROC of 0.981 ± 0.028 on training data using 10-fold cross-validation and an overall accuracy of 95.0% on independent test data. As the first model for HBP recognition, it will provide some help for infectious diseases and stimulate further research in related fields.
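The "auROC of 0.981 ± 0.028" figure above is a mean and spread of per-fold scores under cross-validation. The sketch below shows one way such a summary could be computed, using the rank-based definition of auROC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). The function names and the toy fold data are hypothetical; this is not the authors' evaluation code, and it uses two folds instead of ten only to keep the example short.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outscores a random
    negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    """Return (mean, sample standard deviation) of per-fold auROC values."""
    aucs = [auroc(labels, scores) for labels, scores in fold_results]
    return mean(aucs), stdev(aucs)


# Toy held-out predictions for two folds: (true labels, classifier scores).
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2]),
]
fold_mean, fold_sd = summarize_folds(folds)
print(f"auROC = {fold_mean:.3f} ± {fold_sd:.3f}")
```

The pairwise form is quadratic in the number of samples; for large test sets a sort-based rank computation gives the same value more cheaply.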
Affiliation(s)
- Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Shi-Shi Yuan
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Jian Li
- School of Basic Medical Sciences, Chengdu University, Chengdu 610106, China
- Cheng-Bing Huang
- School of Computer Science and Technology, ABa Teachers University, Chengdu 623002, China
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
11
Du L, Liu H, Zhang L, Lu Y, Li M, Hu Y, Zhang Y. Deep ensemble learning for accurate retinal vessel segmentation. Comput Biol Med 2023; 158:106829. [PMID: 37054633 DOI: 10.1016/j.compbiomed.2023.106829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Revised: 03/09/2023] [Accepted: 03/26/2023] [Indexed: 04/15/2023]
Abstract
Significant progress has been made in deep learning-based retinal vessel segmentation in recent years. However, current methods suffer from low performance, and the robustness of the models is limited. Our work introduces a novel framework for retinal vessel segmentation based on deep ensemble learning. The results of benchmarking comparisons indicate that our model outperforms the existing ones on multiple datasets, demonstrating that our models are more effective and robust for retinal vessel segmentation. It evinces the capability of our model to capture discriminative feature representations by introducing an ensemble strategy that integrates different base deep learning models such as pyramid vision Transformer and FCN-Transformer. We expect that our proposed method can benefit and accelerate the development of accurate retinal vessel segmentation.
Affiliation(s)
- Lingling Du
- Department of Ophthalmology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Hanruo Liu
- The Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Lan Zhang
- Department of Cardiovascular, Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Yao Lu
- Department of Ophthalmology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Mengyao Li
- Department of Ophthalmology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Yang Hu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yi Zhang
- Department of Ophthalmology, The First Affiliated Hospital of Harbin Medical University, Harbin, China.
12
Zulfiqar H, Guo Z, Grace-Mercure BK, Zhang ZY, Gao H, Lin H, Wu Y. Empirical Comparison and Recent Advances of Computational Prediction of Hormone Binding Proteins Using Machine Learning Methods. Comput Struct Biotechnol J 2023; 21:2253-2261. [PMID: 37035551 PMCID: PMC10073991 DOI: 10.1016/j.csbj.2023.03.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/21/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023]
Abstract
Hormone binding proteins (HBPs) belong to the group of soluble carrier proteins. These proteins selectively and non-covalently interact with hormones and promote growth hormone signaling in humans and other animals. HBPs are useful in many medical and commercial fields. Thus, the identification of HBPs is very important because it can help to discover more details about hormone binding proteins. Meanwhile, experimental methods for recognizing hormone binding proteins are time-consuming and expensive. Computational prediction methods have played significant roles in the correct recognition of hormone binding proteins with the use of sequence information and ML algorithms. In this review, we compared and assessed the implementation of ML-based tools in recognition of HBPs in a unique way. We hope that this study will give enough awareness and knowledge for research on HBPs.
13
Han K, Wang J, Wang Y, Zhang L, Yu M, Xie F, Zheng D, Xu Y, Ding Y, Wan J. A review of methods for predicting DNA N6-methyladenine sites. Brief Bioinform 2023; 24:6887111. [PMID: 36502371 DOI: 10.1093/bib/bbac514] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/18/2022] [Revised: 10/07/2022] [Accepted: 10/27/2022] [Indexed: 12/14/2022]
Abstract
Deoxyribonucleic acid (DNA) N6-methyladenine plays a vital role in various biological processes, and the accurate identification of its site can provide a more comprehensive understanding of its biological effects. There are several methods for 6mA site prediction. With the continuous development of technology, traditional techniques with high costs and low efficiency are gradually being replaced by computer methods. Computer methods that are widely used can be divided into two categories: traditional machine learning and deep learning methods. We first list some existing experimental methods for predicting the 6mA site, then analyze the general process from sequence input to results in computer methods and review existing model architectures. Finally, the results were summarized and compared to facilitate subsequent researchers in choosing the most suitable method for their work.
Affiliation(s)
- Ke Han
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China.,College of Pharmacy, Harbin University of Commerce, Harbin, 150076, China
| | - Jianchun Wang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Yu Wang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Lei Zhang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Mengyao Yu
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Fang Xie
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Dequan Zheng
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Yaoqun Xu
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Jie Wan
- Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin, 150001, China
|
14
|
Identification of adaptor proteins using the ANOVA feature selection technique. Methods 2022; 208:42-47. [DOI: 10.1016/j.ymeth.2022.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 10/01/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022] Open
|
15
|
iEnhancer-MRBF: Identifying enhancers and their strength with a multiple Laplacian-regularized radial basis function network. Methods 2022; 208:1-8. [DOI: 10.1016/j.ymeth.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 09/26/2022] [Accepted: 10/03/2022] [Indexed: 11/07/2022] Open
|
16
|
Shi H, Li Y, Chen Y, Qin Y, Tang Y, Zhou X, Zhang Y, Wu Y. ToxMVA: An end-to-end multi-view deep autoencoder method for protein toxicity prediction. Comput Biol Med 2022; 151:106322. [PMID: 36435057 DOI: 10.1016/j.compbiomed.2022.106322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/29/2022] [Revised: 11/03/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022]
Abstract
Effectively predicting protein toxicity is an essential step in the early stage of protein-based drug discovery and helps to speed up novel drug screening and reduce costs. Recently, several relevant datasets have been designed, and machine learning-based methods have been proposed to predict protein toxicity with satisfactory performance. However, previous studies generally concatenate different protein features directly, which may introduce irrelevant information and decrease model performance. In this study, we present a novel end-to-end deep learning-based method, called ToxMVA, to predict protein toxicity. Specifically, we first build comprehensive feature profiles of proteins based on primary sequences, including sequential, physicochemical, and contextual semantic information. Next, an autoencoder network is introduced to integrate the multi-view information and obtain a more concise and accurate feature representation. Extensive experimental results on three datasets demonstrate that ToxMVA achieves superior performance for protein toxicity prediction and is more robust across the three datasets.
Affiliation(s)
- Hua Shi
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China
| | - Yan Li
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China
| | - Yi Chen
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China
| | - Yuming Qin
- Anesthesiology Department, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, China
| | - Yifan Tang
- Anesthesiology Department, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, China
| | - Xun Zhou
- Beidahuang Industry Group General Hospital, Harbin, China.
| | - Ying Zhang
- Anesthesiology Department, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, China.
| | - Yun Wu
- College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China.
|
17
|
Zhao D, Wang L, Chen Z, Zhang L, Xu L. KRAS is a prognostic biomarker associated with diagnosis and treatment in multiple cancers. Front Genet 2022; 13:1024920. [PMID: 36330448 PMCID: PMC9624065 DOI: 10.3389/fgene.2022.1024920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 09/20/2022] [Indexed: 11/21/2022] Open
Abstract
KRAS encodes K-Ras proteins, which take part in the MAPK pathway. The expression level of KRAS is high in tumor patients. Our study compared KRAS expression levels across 33 types of tumor tissue. Additionally, we studied the association of KRAS expression levels with diagnostic and prognostic values, clinicopathological features, and tumor immunity. We established 22 immune-infiltrating cell expression datasets to calculate immune and stromal scores to evaluate the tumor microenvironment. KRAS genes, immune checkpoint genes and interacting genes were selected to construct the PPI network. We selected 79 immune checkpoint genes and interacting related genes to calculate the correlation. Based on the 33 tumor expression datasets, we conducted GSEA (gene set enrichment analysis) to show the KRAS and other co-expressed genes associated with cancers. KRAS may be a reliable prognostic biomarker in the diagnosis of cancer patients and has the potential to be included in cancer-targeted drugs.
Affiliation(s)
- Da Zhao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- School of food and drug, Shenzhen Polytechnic, Shenzhen, China
| | - Lizhuang Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Zheng Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- School of food and drug, Shenzhen Polytechnic, Shenzhen, China
| | - Lijun Zhang
- School of food and drug, Shenzhen Polytechnic, Shenzhen, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- *Correspondence: Lei Xu,
|
18
|
Chen H, Li D, Liao J, Wei L, Wei L. MultiscaleDTA: a multiscale-based method with a self-attention mechanism for drug-target binding affinity prediction. Methods 2022; 207:103-109. [PMID: 36155250 DOI: 10.1016/j.ymeth.2022.09.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/13/2022] [Revised: 09/15/2022] [Accepted: 09/19/2022] [Indexed: 11/28/2022] Open
Abstract
The task of predicting drug-target affinity (DTA) plays an increasingly important role at the early stage of in silico drug discovery and development. Currently, a variety of machine learning-based methods have been presented for DTA prediction and have achieved outstanding performance, which is beneficial for speeding up the development of new drugs. However, most convolutional neural network (CNN)-based methods ignore the significance of information from CNN layers with different scales for DTA prediction. In addition, each feature contributes differently to the final task. Therefore, in this study, we propose a novel end-to-end deep learning-based framework, called MultiscaleDTA, to predict drug-target binding affinity. MultiscaleDTA incorporates multi-scale CNNs and a self-attention mechanism to capture multi-scale and comprehensive features for characterizing the intrinsic properties of drugs and targets. Extensive experimental results on both regression and binary classification tasks demonstrate that MultiscaleDTA achieves competitive performance compared to state-of-the-art methods.
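To make the "multi-scale" idea in this abstract concrete, the sketch below runs one-dimensional convolutions with several kernel widths over the same signal and pools each response. It is only an illustration under stated assumptions: the kernels are fixed averaging filters rather than learned weights, the self-attention step is omitted, and the names `conv1d_valid` and `multiscale_features` are hypothetical and do not come from MultiscaleDTA.

```python
# Multi-scale 1D convolution followed by global max pooling.
# Assumption: fixed averaging kernels stand in for learned CNN filters,
# and the input signal is a made-up numeric encoding of a sequence.


def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding ('valid' mode)."""
    n, k = len(signal), len(kernel)
    if k > n:
        raise ValueError("kernel is longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


def multiscale_features(signal, kernel_sizes=(2, 3, 5)):
    """Return one pooled feature per kernel size (one per scale)."""
    features = []
    for size in kernel_sizes:
        kernel = [1.0 / size] * size      # averaging filter of this width
        response = conv1d_valid(signal, kernel)
        features.append(max(response))    # global max pooling
    return features


signal = [0.0, 1.0, 0.0, 2.0, 4.0, 2.0, 0.0]  # hypothetical encoded input
print(multiscale_features(signal))
```

Narrow kernels respond to short local patterns and wide kernels to longer ones, so concatenating the pooled values gives a representation that mixes several receptive-field sizes.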
Affiliation(s)
- Haoyang Chen
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China; School of Software, Shandong University, Jinan, China
| | - Dahe Li
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Jiaqi Liao
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China
| | - Lesong Wei
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan.
| | - Leyi Wei
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China; School of Software, Shandong University, Jinan, China.
|
19
|
Liu J, Li M, Chen X. AntiMF: A deep learning framework for predicting anticancer peptides based on multi-view feature extraction. Methods 2022; 207:38-43. [PMID: 36100141 DOI: 10.1016/j.ymeth.2022.07.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/15/2022] [Revised: 07/20/2022] [Accepted: 07/26/2022] [Indexed: 01/10/2023] Open
Abstract
In recent years, anticancer peptides have emerged as a viable new option in cancer therapy, with the ability to overcome the considerable side effects and poor outcomes of standard cancer therapies. Accurate identification of anticancer peptides can facilitate their discovery and speed up their application in treating cancer. However, many recent approaches are based on machine learning, which not only restricts the representation ability of the models but also requires a complex hand-crafted feature extraction process. Here, we propose AntiMF, a deep learning model that uses a multi-view mechanism based on different feature extraction models. Comparative results show that our model performs better than state-of-the-art methods in predicting anticancer peptides. By using an ensemble learning framework to extract representations, AntiMF can capture information of different dimensions, which makes the model representation more complete. Moreover, we visualize what AntiMF learns on one of its ensemble models to intuitively show the effectiveness of our model, indicating that AntiMF has great potential to be an effective and useful model for accurately identifying anticancer peptides.
Affiliation(s)
- Jingjing Liu
- Eye Hospital, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
| | - Minghao Li
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Xin Chen
- Eye Hospital, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China; Department of Neurosurgical Laboratory, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China.
|
20
|
Identification of DNA-binding proteins via Multi-view LSSVM with independence criterion. Methods 2022; 207:29-37. [PMID: 36087888 DOI: 10.1016/j.ymeth.2022.08.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 08/06/2022] [Accepted: 08/25/2022] [Indexed: 11/24/2022] Open
Abstract
DNA-binding proteins actively participate in life activities such as DNA replication, recombination, gene expression and regulation, and play a prominent role in these processes. As more DNA-binding proteins continue to be discovered, it is imperative to design an efficient and accurate identification tool. Considering that traditional experimental technology is time-consuming and expensive, and that computational methods based on structural information suffer from an insufficient number of samples, we proposed a machine learning algorithm based on sequence information to identify DNA-binding proteins, named multi-view Least Squares Support Vector Machine via Hilbert-Schmidt Independence Criterion (multi-view LSSVM via HSIC). This method took 6 feature sets as multi-view input and trained each single view with the LSSVM algorithm. Then, we integrated HSIC into LSSVM as a regularization term to reduce the dependence between views and explored the complementary information of multiple views. Subsequently, we trained and coordinated the submodels and finally combined them by weighting to obtain the final prediction model. On the training set PDB1075, the prediction results of our model were better than those of most existing methods. Independent tests were conducted on the datasets PDB186 and PDB2272, where the prediction accuracy was 85.5% and 79.36%, respectively. This result exceeded the current state-of-the-art methods, which showed that the multi-view LSSVM via HSIC can be used as an efficient predictor.
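The Hilbert-Schmidt Independence Criterion used above as a dependence penalty between views can be estimated from two kernel matrices. The sketch below computes the standard biased empirical estimate, tr(KHLH)/(n-1)^2, where H is the centering matrix. It assumes linear kernels and tiny made-up views; the abstract does not state which kernels the authors used or exactly how the term enters the LSSVM objective, so this is not their implementation.

```python
# Empirical HSIC between two views of the same n samples.
# Assumption: linear kernels and toy data; function names are hypothetical.


def linear_kernel(rows):
    """Gram matrix K[i][j] = <x_i, x_j> for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]


def center(K):
    """Double-center a Gram matrix, equivalent to H K H with H = I - 11^T/n."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(view_a, view_b):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(view_a)
    Kc = center(linear_kernel(view_a))
    Lc = center(linear_kernel(view_b))
    trace = sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


view_a = [[1.0], [2.0], [3.0]]       # hypothetical feature view
view_const = [[5.0], [5.0], [5.0]]   # a view carrying no information
print(hsic(view_a, view_a))      # 1.0
print(hsic(view_a, view_const))  # 0.0
```

A larger value indicates stronger dependence between two views, so penalizing it pushes the per-view submodels toward complementary rather than redundant information.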
|
21
|
Ma J, Qiu S. Genetic variant rs11136000 upregulates clusterin expression and reduces Alzheimer's disease risk. Front Neurosci 2022; 16:926830. [PMID: 36033622 PMCID: PMC9407972 DOI: 10.3389/fnins.2022.926830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Accepted: 07/11/2022] [Indexed: 11/29/2022] Open
Abstract
Clusterin (CLU) is an extracellular chaperone involved in reducing amyloid beta (Aβ) toxicity and aggregation. Although previous genome-wide association studies (GWAS) have reported a potential protective effect of CLU on Alzheimer's disease (AD) patients, how intron-located rs11136000 (CLU) affects AD risk by regulating CLU expression remains unknown. In this study, we integrated multiple omics data to construct the regulated pathway of rs11136000-CLU-AD. In step 1, we investigated the effects of variant rs11136000 on AD risk with different genders and diagnostic methods using GWAS summary statistics for AD from International Genomics of Alzheimer's Project (IGAP) and UK Biobank. In step 2, we assessed the regulation of rs11136000 on CLU expression in AD brain samples from Mayo clinic and controls from Genotype-Tissue Expression (GTEx). In step 3, we investigated the differential gene/protein expression of CLU in AD and controls from four large cohorts. The results showed that rs11136000 T allele reduced AD risk in either clinically diagnosed or proxy AD patients. By using expression quantitative trait loci (eQTL) analysis, rs11136000 variant downregulated CLU expression in 13 normal brain tissues, but upregulated CLU expression in cerebellum and temporal cortex of AD samples. Importantly, CLU was significantly differentially expressed in temporal cortex, dorsolateral prefrontal cortex and anterior prefrontal cortex of AD patients compared with normal controls. Together, rs11136000 may reduce AD risk by regulating CLU expression, which may provide important information about the biological mechanism of rs9848497 in AD progress.
Affiliation(s)
- Jin Ma
- Department of Emergency Medicine, Affiliated Kunshan Hospital of Jiangsu University, Kunshan, China
| | - Shizheng Qiu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
22
|
Eissman JM, Dumitrescu L, Mahoney ER, Smith AN, Mukherjee S, Lee ML, Scollard P, Choi SE, Bush WS, Engelman CD, Lu Q, Fardo DW, Trittschuh EH, Mez J, Kaczorowski CC, Hernandez Saucedo H, Widaman KF, Buckley RF, Properzi MJ, Mormino EC, Yang HS, Harrison TM, Hedden T, Nho K, Andrews SJ, Tommet D, Hadad N, Sanders RE, Ruderfer DM, Gifford KA, Zhong X, Raghavan NS, Vardarajan BN, Pericak-Vance MA, Farrer LA, Wang LS, Cruchaga C, Schellenberg GD, Cox NJ, Haines JL, Keene CD, Saykin AJ, Larson EB, Sperling RA, Mayeux R, Cuccaro ML, Bennett DA, Schneider JA, Crane PK, Jefferson AL, Hohman TJ. Sex differences in the genetic architecture of cognitive resilience to Alzheimer's disease. Brain 2022; 145:2541-2554. [PMID: 35552371 PMCID: PMC9337804 DOI: 10.1093/brain/awac177] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 12/23/2021] [Revised: 04/07/2022] [Accepted: 04/14/2022] [Indexed: 12/04/2022] Open
Abstract
Approximately 30% of elderly adults are cognitively unimpaired at time of death despite the presence of Alzheimer's disease neuropathology at autopsy. Studying individuals who are resilient to the cognitive consequences of Alzheimer's disease neuropathology may uncover novel therapeutic targets to treat Alzheimer's disease. It is well established that there are sex differences in response to Alzheimer's disease pathology, and growing evidence suggests that genetic factors may contribute to these differences. Taken together, we sought to elucidate sex-specific genetic drivers of resilience. We extended our recent large scale genomic analysis of resilience in which we harmonized cognitive data across four cohorts of cognitive ageing, in vivo amyloid PET across two cohorts, and autopsy measures of amyloid neuritic plaque burden across two cohorts. These data were leveraged to build robust, continuous resilience phenotypes. With these phenotypes, we performed sex-stratified [n (males) = 2093, n (females) = 2931] and sex-interaction [n (both sexes) = 5024] genome-wide association studies (GWAS), gene and pathway-based tests, and genetic correlation analyses to clarify the variants, genes and molecular pathways that relate to resilience in a sex-specific manner. Estimated among cognitively normal individuals of both sexes, resilience was 20-25% heritable, and when estimated in either sex among cognitively normal individuals, resilience was 15-44% heritable. In our GWAS, we identified a female-specific locus on chromosome 10 [rs827389, β (females) = 0.08, P (females) = 5.76 × 10⁻⁹, β (males) = -0.01, P (males) = 0.70, β (interaction) = 0.09, P (interaction) = 1.01 × 10⁻⁴] in which the minor allele was associated with higher resilience scores among females. This locus is located within chromatin loops that interact with promoters of genes involved in RNA processing, including GATA3. Finally, our genetic correlation analyses revealed shared genetic architecture between resilience phenotypes and other complex traits, including a female-specific association with frontotemporal dementia and male-specific associations with heart rate variability traits. We also observed opposing associations between sexes for multiple sclerosis, such that more resilient females had a lower genetic susceptibility to multiple sclerosis, and more resilient males had a higher genetic susceptibility to multiple sclerosis. Overall, we identified sex differences in the genetic architecture of resilience, identified a female-specific resilience locus and highlighted numerous sex-specific molecular pathways that may underlie resilience to Alzheimer's disease pathology. This study illustrates the need to conduct sex-aware genomic analyses to identify novel targets that are unidentified in sex-agnostic models. Our findings support the theory that the most successful treatment for an individual with Alzheimer's disease may be personalized based on their biological sex and genetic context.
Affiliation(s)
- Jaclyn M Eissman
- Vanderbilt Memory and Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Logan Dumitrescu
- Vanderbilt Memory and Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Emily R Mahoney
- Vanderbilt Memory and Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Alexandra N Smith
- Vanderbilt Memory and Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | | | - Michael L Lee
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - Phoebe Scollard
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - Seo Eun Choi
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - William S Bush
- Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
| | - Corinne D Engelman
- Department of Population Health Sciences, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
| | - Qiongshi Lu
- Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
| | - David W Fardo
- Department of Biostatistics, College of Public Health, University of Kentucky, Lexington, KY, USA
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
| | - Emily H Trittschuh
- Department of Psychiatry and Behavioral Sciences, University of Washington School of Medicine, Seattle, WA, USA
- VA Puget Sound Health Care System, GRECC, Seattle, WA, USA
| | - Jesse Mez
- Department of Neurology, Boston University School of Medicine, Boston, MA, USA
| | | | - Hector Hernandez Saucedo
- UC Davis Alzheimer's Disease Research Center, Department of Neurology, University of California Davis Medical Center, Sacramento, CA, USA
| | | | - Rachel F Buckley
- Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
- Center for Alzheimer's Research and Treatment, Department of Neurology, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, USA
- Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Australia
| | - Michael J Properzi
- Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
| | - Elizabeth C Mormino
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
| | - Hyun Sik Yang
- Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
- Center for Alzheimer's Research and Treatment, Department of Neurology, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, USA
| | - Theresa M Harrison
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
| | - Trey Hedden
- Icahn School of Medicine at Mount Sinai, New York City, NY, USA
| | - Kwangsik Nho
- Department of Radiology and Imaging Sciences, Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Shea J Andrews
- Icahn School of Medicine at Mount Sinai, New York City, NY, USA
| | - Douglas Tommet
- Department of Psychiatry and Human Behavior, Brown University School of Medicine, Providence, RI, USA
| | | | | | - Douglas M Ruderfer
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Katherine A Gifford
- Vanderbilt Memory and Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Xiaoyuan Zhong
- Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
| | - Neha S Raghavan
- Department of Neurology, Columbia University, New York, NY, USA
- The Taub Institute for Research on Alzheimer's Disease and The Aging Brain, Columbia University, New York, NY, USA
- The Institute for Genomic Medicine, Columbia University Medical Center and The New York Presbyterian Hospital, New York, NY, USA
| | - Badri N Vardarajan
- Department of Neurology, Columbia University, New York, NY, USA
- The Taub Institute for Research on Alzheimer's Disease and The Aging Brain, Columbia University, New York, NY, USA
- The Institute for Genomic Medicine, Columbia University Medical Center and The New York Presbyterian Hospital, New York, NY, USA
| | | | | | | | - Margaret A Pericak-Vance
- John P. Hussman Institute for Human Genomics, University of Miami School of Medicine, Miami, FL, USA
| | - Lindsay A Farrer
- Department of Neurology, Boston University School of Medicine, Boston, MA, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA, USA
| | - Li San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
| | - Gerard D Schellenberg
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Nancy J Cox
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jonathan L Haines
- Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
| | - C Dirk Keene
- Department of Pathology, University of Washington, Seattle, WA, USA
| | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Eric B Larson
- Department of Medicine, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | - Reisa A Sperling
- Department of Neurology, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA
| | - Richard Mayeux
- Department of Neurology, Columbia University, New York, NY, USA
- The Taub Institute for Research on Alzheimer's Disease and The Aging Brain, Columbia University, New York, NY, USA
- The Institute for Genomic Medicine, Columbia University Medical Center and The New York Presbyterian Hospital, New York, NY, USA
| | - Michael L Cuccaro
- John P. Hussman Institute for Human Genomics, University of Miami School of Medicine, Miami, FL, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Julie A Schneider
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Paul K Crane
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - Angela L Jefferson
- Vanderbilt Memory and Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Timothy J Hohman
- Vanderbilt Memory and Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
|
23
|
An integrated pan-cancer analysis of identifying biomarkers about the EGR family genes in human carcinomas. Comput Biol Med 2022; 148:105889. [DOI: 10.1016/j.compbiomed.2022.105889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 06/25/2022] [Accepted: 07/16/2022] [Indexed: 12/24/2022]
|
24
|
Liu P, Ding Y, Rong Y, Chen D. Prediction of cell penetrating peptides and their uptake efficiency using random forest‐based feature selections. AIChE J 2022. [DOI: 10.1002/aic.17781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Peng Liu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Institute of Yangtze Delta Region (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yijie Ding
- Institute of Yangtze Delta Region (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Ying Rong
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
|
25
|
Zhao S, Pan Q, Zou Q, Ju Y, Shi L, Su X. Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM. Comput Math Methods Med 2022; 2022:7518779. [PMID: 35422876 PMCID: PMC9005296 DOI: 10.1155/2022/7518779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 03/12/2022] [Indexed: 11/17/2022]
Abstract
Enhancers are a class of noncoding DNA elements located near structural genes. In recent years, their identification and classification have been the focus of research in the field of bioinformatics. However, because of their high free scattering and position variability, there is still considerable room for progress even though the performance of prediction models has continuously improved. In this paper, density-based spatial clustering of applications with noise (DBSCAN) was used to screen the physicochemical properties of dinucleotides to extract dinucleotide-based auto-cross covariance (DACC) features; the features were then reduced with the Python feature selection toolkit MRMD 2.0. The reduced features were input into a random forest to identify enhancers. The enhancer classification model was built with word2vec and attention-based Bi-LSTM. Finally, the accuracies of our enhancer identification and classification models were 77.25% and 73.50%, respectively, and the Matthews correlation coefficients (MCCs) were 0.5470 and 0.4881, respectively, which were better than the performance of most predictors.
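For readers comparing the accuracy and MCC figures reported above, the sketch below shows how both metrics are computed from a binary confusion matrix. The counts in the example are made up for illustration and are not the paper's results; the function names are hypothetical.

```python
import math

# Accuracy and Matthews correlation coefficient from a confusion matrix.
# Assumption: the counts below are invented and do not reproduce the
# values reported in the abstract.


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical test set: 100 enhancers and 100 non-enhancers.
tp, tn, fp, fn = 80, 75, 25, 20
print(accuracy(tp, tn, fp, fn))            # 0.775
print(round(matthews_corrcoef(tp, tn, fp, fn), 4))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported next to accuracy for binary predictors.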
Affiliation(s)
- Shulin Zhao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Qingfeng Pan
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
| | - Xi Su
- Foshan Maternal and Child Health Hospital, Foshan, Guangdong, China
|
26
|
He S, Dou L, Li X, Zhang Y. Review of bioinformatics in Azheimer's Disease Research. Comput Biol Med 2022; 143:105269. [PMID: 35158118 DOI: 10.1016/j.compbiomed.2022.105269] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/06/2022] [Revised: 01/21/2022] [Accepted: 01/23/2022] [Indexed: 01/05/2023]
Abstract
Alzheimer's disease (AD) is a severe neurodegenerative disease with a slow onset and progressive deterioration over time. As the global population ages, AD has become a disease that seriously threatens the health of the elderly; therefore, the effective prevention and treatment of AD is an extremely important area of study for researchers and clinicians. Rapid technological developments have promoted the analysis of various kinds of complex data sets using machine learning methods. Common machine learning algorithms, such as Lasso, SVM and Random Forest, are very important in AD research. To help accelerate AD-related research, we review recent research progress on Alzheimer's disease, including databases, image analysis, and gene expression, which can provide AD researchers with more comprehensive research methods.
Affiliation(s)
- Shida He
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China; Department of Computer Science, University of Tsukuba, Japan
| | - Lijun Dou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China; School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Xuehong Li
- Beidahuang Industry Group General Hospital, Harbin, China.
| | - Ying Zhang
- Department of Anesthesiology, Hospital (T.C.M) Affiliated To Southwest Medical University, Luzhou, China.
| |
|
27
|
Chen Y, Gong Y, Dou L, Zhou X, Zhang Y. Bioinformatics analysis methods for cell-free DNA. Comput Biol Med 2022; 143:105283. [PMID: 35149459 DOI: 10.1016/j.compbiomed.2022.105283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/06/2022] [Revised: 01/29/2022] [Accepted: 01/30/2022] [Indexed: 12/13/2022]
Abstract
As a novel non-invasive marker for molecular detection, cell-free DNA (cfDNA) has potential value for the early diagnosis of diseases, prognosis assessment, and efficacy monitoring. Continued developments in molecular biology detection technologies have led to an increase in clinical studies on the use of cfDNA detection methods for patients, and many encouraging outcomes have been achieved. This review discusses the contributions of bioinformatics tools to the study of cfDNA. It focuses on cfDNA identification signals, cfDNA identification methods, and the relationship of cfDNA with human diseases such as hepatic cancer, lung cancer, end-stage kidney disease, and ischemic stroke. The research significance and existing problems of using cfDNA as a biomarker for diseases are also discussed.
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yuxin Gong
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lijun Dou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Xun Zhou
- Beidahuang Industry Group General Hospital, Harbin, China.
| | - Ying Zhang
- Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China.
| |
|
28
|
Chen Y, Wang Y, Ding Y, Su X, Wang C. RGCNCDA: Relational graph convolutional network improves circRNA-disease association prediction by incorporating microRNAs. Comput Biol Med 2022; 143:105322. [PMID: 35217342 DOI: 10.1016/j.compbiomed.2022.105322] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/05/2022] [Revised: 02/11/2022] [Accepted: 02/13/2022] [Indexed: 12/21/2022]
Abstract
Recently, a large number of studies have indicated that circRNAs with covalently closed loops play important roles in biological processes and have potential as diagnostic biomarkers. Therefore, research on the circRNA-disease relationship is helpful in disease diagnosis and treatment. However, traditional biological verification methods require considerable labor and time costs. In this paper, we propose a new computational method (RGCNCDA) to predict circRNA-disease associations based on relational graph convolutional networks (R-GCNs). The method first integrates the circRNA similarity network, miRNA similarity network, disease similarity network and association networks among them to construct a global heterogeneous network. Then, it employs the random walk with restart (RWR) and principal component analysis (PCA) models to learn low-dimensional and high-order information from the global heterogeneous network as the topological features. Finally, a prediction model based on an R-GCN encoder and a DistMult decoder is built to predict the potential disease-associated circRNA. The predicted results demonstrate that RGCNCDA performs significantly better than the other six state-of-the-art methods in a 5-fold cross validation. Furthermore, the case study illustrates that RGCNCDA can effectively discover potential circRNA-disease associations.
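The random walk with restart (RWR) step mentioned in this abstract can be illustrated with a small sketch. This is a generic RWR on a toy adjacency matrix, not the RGCNCDA implementation: the function name, the restart probability of 0.5, and the convergence tolerance are assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seed`.

    adj     : square adjacency matrix as a list of lists (non-negative weights)
    seed    : index of the node the walker restarts from
    restart : probability of jumping back to the seed at each step (assumed value)
    """
    n = len(adj)
    # Column-normalise so that W[i][j] is the probability of moving from j to i.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:  # stop once the distribution no longer changes
            break
    return p


# Toy path graph 0 - 1 - 2, walker restarting at node 0.
toy_adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
scores = random_walk_with_restart(toy_adj, seed=0)
```

Running RWR once per node yields one probability vector per node; stacking these vectors gives the kind of high-dimensional topological profile that a method such as PCA can then compress.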
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yanpeng Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Xi Su
- Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Foshan, China.
| | - Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China.
| |
|
29
|
Shi H, Li S, Su X. Plant6mA: a predictor for predicting N6-methyladenine sites with lightweight structure in plant genomes. Methods 2022; 204:126-131. [DOI: 10.1016/j.ymeth.2022.02.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 02/20/2022] [Accepted: 02/24/2022] [Indexed: 10/19/2022] Open
|
30
|
Han K, Cao P, Wang Y, Xie F, Ma J, Yu M, Wang J, Xu Y, Zhang Y, Wan J. A Review of Approaches for Predicting Drug-Drug Interactions Based on Machine Learning. Front Pharmacol 2022; 12:814858. [PMID: 35153767 PMCID: PMC8835726 DOI: 10.3389/fphar.2021.814858] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 11/14/2021] [Accepted: 12/20/2021] [Indexed: 01/01/2023] Open
Abstract
Drug-drug interactions play a vital role in drug research. However, they may also cause adverse reactions in patients, with serious consequences. Manual detection of drug-drug interactions is time-consuming and expensive, so it is urgent to use computer methods to solve the problem. There are two ways for computers to identify drug interactions: one is to identify known drug interactions, and the other is to predict unknown drug interactions. In this paper, we review the research progress of machine learning in predicting unknown drug interactions. Among these methods, the literature-based method is special because it combines the extraction method of DDI and the prediction method of DDI. We first introduce the common databases, then briefly describe each method, and summarize the advantages and disadvantages of some prediction models. Finally, we discuss the challenges and prospects of machine learning methods in predicting drug interactions. This review aims to provide useful guidance for interested researchers to further promote bioinformatics algorithms to predict DDI.
Affiliation(s)
- Ke Han
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- College of Pharmacy, Harbin University of Commerce, Harbin, China
| | - Peigang Cao
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Yu Wang
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Fang Xie
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Jiaqi Ma
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Mengyao Yu
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Jianchun Wang
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Yaoqun Xu
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Yu Zhang
- Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Jie Wan
- Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin, China
| |
|
31
|
Chen Z, Guo Y, Zhao D, Zou Q, Yu F, Zhang L, Xu L. Comprehensive Analysis Revealed that CDKN2A is a Biomarker for Immune Infiltrates in Multiple Cancers. Front Cell Dev Biol 2022; 9:808208. [PMID: 35004697 PMCID: PMC8733648 DOI: 10.3389/fcell.2021.808208] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/03/2021] [Accepted: 12/06/2021] [Indexed: 01/22/2023] Open
Abstract
The CDKN2A (cyclin dependent kinase inhibitor 2A/multiple tumor suppressor 1) gene, also known as the P16 gene, encodes multiple tumor suppressor 1 (MTS1), which belongs to the INK4 family. In tumor tissue, CDKN2A has a high expression level compared with normal tissue and reflects prognosis in tumor patients. Our research targeted the analysis of CDKN2A expression in 33 tumors and clinical parameters, patient prognosis and tumor immunity roles. The CDKN2A expression level was significantly correlated with the tumor mutation burden (TMB) in 10 tumors, and the expression of CDKN2A was also correlated with MSI (microsatellite instability) in 10 tumors. CDKN2A expression was associated with infiltrating lymphocyte (TIL) levels in 22 pancancers, thus suggesting that CDKN2A expression is associated with tumor immunity. Enrichment analysis indicated that CDKN2A expression was involved in natural killer cell-mediated cytotoxicity pathways, antigen processing and presentation, olfactory transduction pathways, and regulation of the autophagy pathway in multiple cancers. CDKN2A was significantly associated with several immune cell infiltrates in pantumors. CDKN2A may serve as a promising prognostic biomarker and is associated with immune infiltrates across cancers.
Affiliation(s)
- Zheng Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.,School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
| | - Yingjie Guo
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.,School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Da Zhao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.,School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Fusheng Yu
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Lijun Zhang
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| |
|
32
|
Zhang Z, Cui F, Cao C, Wang Q, Zou Q. Single-cell RNA analysis reveals the potential risk of organ-specific cell types vulnerable to SARS-CoV-2 infections. Comput Biol Med 2022; 140:105092. [PMID: 34864302 PMCID: PMC8628631 DOI: 10.1016/j.compbiomed.2021.105092] [Citation(s) in RCA: 64] [Impact Index Per Article: 32.0] [Received: 10/19/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/20/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic of coronavirus disease 2019 (COVID-19) since December 2019 that has led to more than 160 million confirmed cases, including 3.3 million deaths. To understand the mechanism by which SARS-CoV-2 invades human cells and to reveal organ-specific susceptible cell types for COVID-19, we conducted a comprehensive bioinformatic analysis using public single-cell RNA sequencing datasets. Using the expression information of six confirmed COVID-19 receptors (ACE2, TMPRSS2, NRP1, AXL, FURIN and CTSL), we demonstrated that macrophages are the cells most likely to be associated with SARS-CoV-2 pathogenesis in the lung. Besides the widely reported 'chemokine storm', we identified ribosome-related pathways that may also be potential therapeutic targets for patients with COVID-19 lung infection. Moreover, cell-cell communication analysis and trajectory analysis revealed that M1-like macrophages showed the strongest association with severe COVID-19. We also demonstrated that up-regulation of chemokine pathways generally leads to severe symptoms, while down-regulation of ribosome and RNA activity related pathways is more likely to be associated with mild symptoms. Other organ-specific susceptible cell type analyses could also provide potential targets for COVID-19 therapy. This work can provide clues for understanding the pathogenesis of COVID-19 and contribute to understanding the mechanism by which SARS-CoV-2 invades human cells.
Affiliation(s)
- Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Feifei Cui
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Chen Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Qingsuo Wang
- Beidahuang Industry Group General Hospital, Harbin, 150001, China.
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China.
| |
|
33
|
Cui F, Li S, Zhang Z, Sui M, Cao C, El-Latif Hesham A, Zou Q. DeepMC-iNABP: Deep learning for multiclass identification and classification of nucleic acid-binding proteins. Comput Struct Biotechnol J 2022; 20:2020-2028. [PMID: 35521556 PMCID: PMC9065708 DOI: 10.1016/j.csbj.2022.04.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/08/2022] [Revised: 04/06/2022] [Accepted: 04/20/2022] [Indexed: 11/29/2022] Open
Abstract
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play vital roles in gene expression. Accurate identification of these proteins is crucial. However, there are two existing challenges: one is the problem of ignoring DNA- and RNA-binding proteins (DRBPs), and the other is a cross-predicting problem referring to DBP predictors predicting DBPs as RBPs, and vice versa. In this study, we proposed a computational predictor, called DeepMC-iNABP, with the goal of solving these difficulties by utilizing a multiclass classification strategy and deep learning approaches. DBPs, RBPs, DRBPs and non-NABPs as separate classes of data were used for training the DeepMC-iNABP model. The results on test data collected in this study and two independent test datasets showed that DeepMC-iNABP has a strong advantage in identifying the DRBPs and has the ability to alleviate the cross-prediction problem to a certain extent. The web-server of DeepMC-iNABP is freely available at http://www.deepmc-inabp.net/. The datasets used in this research can also be downloaded from the website.
Affiliation(s)
- Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Shuang Li
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Miaomiao Sui
- Graduate School Agricultural and Life Science, The University of Tokyo, Tokyo 1138657, Japan
| | - Chen Cao
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
| | - Abd El-Latif Hesham
- Genetics Department, Faculty of Agriculture, Beni-Suef University, Beni-Suef 62511, Egypt
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Corresponding author at: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
|
34
|
Guo X, Zhou W, Yu Y, Cai Y, Zhang Y, Du A, Lu Q, Ding Y, Li C. Multiple Laplacian Regularized RBF Neural Network for Assessing Dry Weight of Patients With End-Stage Renal Disease. Front Physiol 2021; 12:790086. [PMID: 34966294 PMCID: PMC8711098 DOI: 10.3389/fphys.2021.790086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/25/2021] [Accepted: 11/17/2021] [Indexed: 11/28/2022] Open
Abstract
Dry weight (DW) is an important dialysis index for patients with end-stage renal disease and can guide clinical hemodialysis. Brain natriuretic peptide, chest computed tomography images, ultrasound, and bioelectrical impedance analysis are key indicators (multisource information) for assessing DW. With these indicators, clinicians assess DW by a trial-and-error method (the traditional measurement method), which is time-consuming. In this study, we developed a method based on artificial intelligence technology to estimate patient DW. Based on the conventional radial basis function neural network (RBFN), we propose a multiple Laplacian-regularized RBFN (MLapRBFN) model to predict patient DW. MLapRBFN integrates multiple Laplace regularization terms and employs an efficient iterative algorithm to solve the model. Compared with other models and the body composition monitor, our method achieves the lowest root mean square error (1.3226). In the Bland-Altman analysis, MLapRBFN has the fewest samples outside the agreement interval (17 samples); the proportion of samples outside the interval is 3.57%, which is lower than 5%. Therefore, our method can be tentatively applied for clinical evaluation of DW in hemodialysis patients.
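The two evaluation quantities named in this abstract, root mean square error and the share of samples outside the Bland-Altman agreement interval, can be computed as in the sketch below. It assumes the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations); the abstract does not state which limits the authors used, and the function names are illustrative.

```python
import math


def rmse(pred, true):
    """Root mean square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def bland_altman(pred, true):
    """Return (mean difference, lower limit, upper limit, fraction outside).

    Limits of agreement are mean ± 1.96 * sample standard deviation of the
    paired differences (an assumed, conventional choice).
    """
    diffs = [p - t for p, t in zip(pred, true)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    lower, upper = mean - 1.96 * sd, mean + 1.96 * sd
    outside = sum(1 for d in diffs if d < lower or d > upper)
    return mean, lower, upper, outside / n
```

If the differences are roughly normally distributed, about 5% of samples are expected to fall outside these limits, which is why a proportion below 5% is read as acceptable agreement.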
Affiliation(s)
- Xiaoyi Guo
- Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Wei Zhou
- Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
| | - Yan Yu
- Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
| | - Yinghua Cai
- Department of Nursing, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
| | - Yuan Zhang
- Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
| | - Aiyan Du
- Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
| | - Qun Lu
- Department of Nursing, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Chao Li
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| |
|
35
|
Gong Y, Zhu W, Sun M, Shi L. Bioinformatics Analysis of Long Non-coding RNA and Related Diseases: An Overview. Front Genet 2021; 12:813873. [PMID: 34956340 PMCID: PMC8692768 DOI: 10.3389/fgene.2021.813873] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/12/2021] [Accepted: 11/26/2021] [Indexed: 12/30/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are usually located in the nucleus and cytoplasm of cells. The transcripts of lncRNAs are >200 nucleotides in length and do not encode proteins. Compared with small RNAs, lncRNAs have longer sequences, more complex spatial structures, and more diverse and complex mechanisms involved in the regulation of gene expression. LncRNAs are widely involved in the biological processes of cells, and in the occurrence and development of many human diseases. Many studies have shown that lncRNAs can induce the occurrence of diseases, and some lncRNAs undergo specific changes in tumor cells. Research into the roles of lncRNAs has covered the diagnosis of, for example, cardiovascular, cerebrovascular, and central nervous system diseases. The bioinformatics of lncRNAs has gradually become a research hotspot and has led to the discovery of a large number of lncRNAs and associated biological functions, and lncRNA databases and recognition models have been developed. In this review, the research progress of lncRNAs is discussed, and lncRNA-related databases and the mechanisms and modes of action of lncRNAs are described. In addition, disease-related lncRNA methods and the relationships between lncRNAs and human lung adenocarcinoma, rectal cancer, colon cancer, heart disease, and diabetes are discussed. Finally, the significance and existing problems of lncRNA research are considered.
Affiliation(s)
- Yuxin Gong
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Meili Sun
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
| |
|
36
|
Guo Y, Cheng H, Yuan Z, Liang Z, Wang Y, Du D. Testing Gene-Gene Interactions Based on a Neighborhood Perspective in Genome-wide Association Studies. Front Genet 2021; 12:801261. [PMID: 34956337 PMCID: PMC8693929 DOI: 10.3389/fgene.2021.801261] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/25/2021] [Accepted: 11/15/2021] [Indexed: 12/21/2022] Open
Abstract
Unexplained genetic variation that causes complex diseases is often induced by gene-gene interactions (GGIs). Gene-based methods are one of the current statistical methodologies for discovering GGIs in case-control genome-wide association studies that are not only powerful statistically, but also interpretable biologically. However, most approaches include assumptions about the form of GGIs, which results in poor statistical performance. As a result, we propose gene-based testing based on the maximal neighborhood coefficient (MNC) called gene-based gene-gene interaction through a maximal neighborhood coefficient (GBMNC). MNC is a metric for capturing a wide range of relationships between two random vectors with arbitrary, but not necessarily equal, dimensions. We established a statistic that leverages the difference in MNC in case and in control samples as an indication of the existence of GGIs, based on the assumption that the joint distribution of two genes in cases and controls should not be substantially different if there is no interaction between them. We then used a permutation-based statistical test to evaluate this statistic and calculate a statistical p-value to represent the significance of the interaction. Experimental results using both simulation and real data showed that our approach outperformed earlier methods for detecting GGIs.
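The testing logic described in this abstract, comparing a dependence measure between cases and controls and assessing the difference by permutation, can be sketched as follows. This is a simplified illustration, not GBMNC: Pearson correlation between two scalar per-sample gene summaries stands in for the maximal neighborhood coefficient (which operates on vectors and is not defined here), and all names and default values are assumptions.

```python
import random


def pearson(pairs):
    """Pearson correlation of (x, y) pairs; 0.0 if either variance is zero."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def interaction_statistic(cases, controls):
    """Absolute difference in dependence between the two groups."""
    return abs(pearson(cases) - pearson(controls))


def permutation_pvalue(cases, controls, n_perm=200, seed=0):
    """Permutation p-value: shuffle case/control labels and recompute."""
    rng = random.Random(seed)
    observed = interaction_statistic(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if interaction_statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Shuffling the case/control labels simulates the null hypothesis that the joint distribution of the two genes is the same in both groups, which is the assumption the abstract states for the no-interaction case.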
Affiliation(s)
- Yingjie Guo
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Honghong Cheng
- School of Information, Shanxi University of Finance and Economics, Taiyuan, China
| | - Zhian Yuan
- Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China
| | - Zhen Liang
- School of Life Science, Shanxi University, Taiyuan, China
| | - Yang Wang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Debing Du
- Beidahuang Industry Group General Hospital, Harbin, China
| |
|
37
|
Chen Y, Juan L, Lv X, Shi L. Bioinformatics Research on Drug Sensitivity Prediction. Front Pharmacol 2021; 12:799712. [PMID: 34955863 PMCID: PMC8696280 DOI: 10.3389/fphar.2021.799712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Accepted: 11/18/2021] [Indexed: 11/28/2022] Open
Abstract
Modeling-based anti-cancer drug sensitivity prediction has been extensively studied in recent years. Most drug sensitivity prediction models use only gene expression data and neglect the remarkable impacts of gene mutation, methylation, and copy number variation on drug sensitivity. Drug sensitivity prediction can both help protect patients from some adverse drug reactions and improve the efficacy of treatment. Genomics data are extremely useful for drug sensitivity prediction tasks. This article reviews the role of drug sensitivity prediction and describes a variety of methods for predicting drug sensitivity. The research significance of drug sensitivity prediction and its existing problems are also discussed.
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Xiao Lv
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Lei Shi
- Department of Spine Surgery Changzheng Hospital, Naval Medical University, Shanghai, China
| |
|
38
|
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022] Open
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic unit controlling biological traits. They underpin the basic structures and functions of organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. During the evolution of organisms, a class of genes has persisted in a conserved form across multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, the genes that are irreplaceable for an organism's life activities are the essential genes. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful to assess the potential risks of disease, and to help determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve the quality of life. Therefore, it is of great theoretical and practical significance to identify important and essential genes.
In this paper, the research status of essential genes and the essential genome database of bacteria are reviewed, the computational prediction method of essential genes based on communication coding theory is expounded, and the significance and practical application value of essential genes are discussed.
Affiliation(s)
- Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| | - Lihong Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| |
|
39
|
Gu X, Guo L, Liao B, Jiang Q. Pseudo-188D: Phage Protein Prediction Based on a Model of Pseudo-188D. Front Genet 2021; 12:796327. [PMID: 34925468 PMCID: PMC8672092 DOI: 10.3389/fgene.2021.796327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022]
Abstract
Phages have strongly affected the world's biochemical systems. They are related not only to our health but also to medical treatments for many cancers and skin infections; this paper therefore sought to identify phage proteins. In this paper, a Pseudo-188D model was established. The digital features of the phage were extracted by PseudoKNC, an appropriate vector was selected by the AdaBoost tool, and features were extracted by 188D. The extracted digital features were then combined, and finally the viral proteins of the phage were predicted by a stochastic gradient descent algorithm. Our model's performance reached 93.4853%. To verify the stability of our model, we randomly selected 80% of the downloaded data to train the model and used the remaining 20% of the data to verify its robustness.
Affiliation(s)
- Xiaomei Gu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Institute of Yangtze River Delta, University of Electronic Science and Technology of China, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lina Guo
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Qinghua Jiang
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| |
|
40
|
Han S, Wang N, Guo Y, Tang F, Xu L, Ju Y, Shi L. Application of Sparse Representation in Bioinformatics. Front Genet 2021; 12:810875. [PMID: 34976030 PMCID: PMC8715914 DOI: 10.3389/fgene.2021.810875] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/08/2021] [Accepted: 12/01/2021] [Indexed: 11/15/2022]
Abstract
Inspired by L1-norm minimization methods, such as basis pursuit, compressed sensing, and Lasso feature selection, sparse representation has emerged in recent years as a novel and powerful data processing method. Researchers have not only extended the sparse representation of a signal to image presentation, but also applied the sparsity of vectors to that of matrices. Moreover, sparse representation has been applied to pattern recognition with good results. Because of its multiple advantages, such as insensitivity to noise, strong robustness, less sensitivity to selected features, and no “overfitting” phenomenon, the application of sparse representation in bioinformatics should be studied further. This article reviews the development of sparse representation and explains its applications in bioinformatics, namely the use of low-rank representation matrices to identify and study cancer molecules and the use of low-rank sparse representations to analyze and process gene expression profiles. It also introduces related cancer and gene expression profile databases.
Affiliation(s)
- Shuguang Han
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Ning Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Furong Tang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
- *Correspondence: Ying Ju; Lei Shi
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
- *Correspondence: Ying Ju; Lei Shi
| |
|
41
|
Lin X. Genomic Variation Prediction: A Summary From Different Views. Front Cell Dev Biol 2021; 9:795883. [PMID: 34901036 PMCID: PMC8656232 DOI: 10.3389/fcell.2021.795883] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/15/2021] [Accepted: 11/11/2021] [Indexed: 12/02/2022]
Abstract
Structural variations in the genome are closely related to human health and the occurrence and development of various diseases. To understand the mechanisms of diseases, find pathogenic targets, and carry out personalized precision medicine, it is critical to detect such variations. The rapid development of high-throughput sequencing technologies has accelerated the accumulation of large amounts of genomic mutation data, including synonymous mutations. Identifying pathogenic synonymous mutations that play important roles in the occurrence and development of diseases from all the available mutation data is of great importance. In this paper, machine learning theories and methods are reviewed, efficient and accurate pathogenic synonymous mutation prediction methods are developed, and a standardized three-level variant analysis framework is constructed. In addition, multiple variation tolerance prediction models are studied and integrated, and new ideas for structural variation detection based on deep information mining are explored.
Affiliation(s)
- Xiuchun Lin
- College of Information and Electrical Engineering, China Agricultural University, Beijing, China
| |
|
42
|
Sun X, Guo Y, Zhang Y, Zhao P, Wang Z, Wei Z, Qiao H. Colon Cancer-Related Genes Identification and Function Study Based on Single-Cell Multi-Omics Integration. Front Cell Dev Biol 2021; 9:789587. [PMID: 34901030 PMCID: PMC8657154 DOI: 10.3389/fcell.2021.789587] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2021] [Accepted: 11/01/2021] [Indexed: 12/13/2022]
Abstract
Transcriptomes and DNA methylation of colon cancer at the single-cell level are used to identify marker genes and improve diagnoses and therapies. Seven colon cancer subtypes are recognized based on single-cell RNA sequencing, and the differentially expressed genes regulated by dysregulated methylation are identified as marker genes for different types of colon cancer. Compared with normal colon cells, the marker genes of different types show very obvious specificity, especially the genes upregulated in tumors. Functional enrichment analysis of the marker genes indicates a possible relation between colon cancer and nervous system disease; moreover, a weakened immune system is confirmed in colon cancer. The heightened expression of markers and the reduction of methylation in colon cancer promote tumor development through mechanisms so broad that no biological process is enriched across the different types.
Affiliation(s)
- Xuepu Sun
- The First Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Yu Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yu Zhang
- Department of Neurosurgery, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Peng Zhao
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Zhaoqing Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Zheng Wei
- The First Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Haiquan Qiao
- The First Affiliated Hospital of Harbin Medical University, Harbin, China
| |
|
43
|
Yang F, Chen S, Qu Z, Wang K, Xie X, Cui H. Genetic Liability to Sedentary Behavior in Relation to Stroke, Its Subtypes and Neurodegenerative Diseases: A Mendelian Randomization Study. Front Aging Neurosci 2021; 13:757388. [PMID: 34867285 PMCID: PMC8641575 DOI: 10.3389/fnagi.2021.757388] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/12/2021] [Accepted: 10/18/2021] [Indexed: 01/16/2023]
Abstract
Objective: To investigate the causal association of domain-specific sedentary behaviors with cerebrovascular diseases and neurodegenerative diseases, and the potential mediators among these associations. Methods: Genetic instruments were identified for television watching, computer use and driving behavior from a genome-wide association study including 408,815 subjects. Mendelian randomization (MR) analysis was used to estimate the causal effect of sedentary behaviors on cerebrovascular diseases and neurodegenerative diseases. Multivariable MR analysis was applied to adjust for potential confounding factors, and mediation analysis was conducted to explore potential mediators. Results: Genetic predisposition to a 1.5 h/day increase in leisure time watching television was associated with increased risk of all-cause stroke [odds ratio (OR) = 1.32, 95% confidence interval (CI) = 1.15-1.52, p-value for MR-Egger method (P Egger) = 0.11, I² = 37%, Cochran's Q = 212, p-value for Cochran's Q test (P Q) < 0.001] and ischemic stroke (OR = 1.28, 95% CI = 1.10-1.49, P Egger = 0.04, I² = 35%, Cochran's Q = 206, P Q = 0.002). Interestingly, television watching may decrease the risk of Parkinson's disease (OR = 0.65, 95% CI = 0.50-0.84, P Egger = 0.47, I² = 19%, Cochran's Q = 157, P Q = 0.04). Television watching was detrimental to cognitive performance (estimate = -0.46, 95% CI = -0.55 to -0.37, P Egger = 0.001, I² = 85%, Cochran's Q = 862, P Q < 0.001). Sensitivity analyses using the leave-one-out method and the MR-PRESSO method suggested weak evidence of pleiotropy. Conclusion: We provided genetic evidence for the causal association of television watching with increased risk of all-cause stroke and ischemic stroke, decreased risk of Parkinson's disease, and worse cognitive performance. The results should be interpreted with caution considering the pleiotropy.
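The heterogeneity statistics quoted in this abstract (Cochran's Q and I²) follow standard definitions. The following is a minimal sketch of how they are commonly computed from per-variant causal estimates and their standard errors; the function names and the example numbers are illustrative assumptions and are not taken from the study.

```python
def cochran_q(estimates, std_errors):
    """Cochran's Q: weighted squared deviations from the pooled estimate.

    Weights are inverse variances (1 / se**2), the usual fixed-effect choice.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))


def i_squared(q, num_estimates):
    """I² in percent: share of variation attributed to heterogeneity.

    Uses the common definition max(0, (Q - df) / Q) with df = k - 1.
    """
    df = num_estimates - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # Hypothetical per-variant estimates (log odds ratios) and standard errors.
    betas = [0.10, 0.30, 0.22, 0.05]
    ses = [0.10, 0.10, 0.08, 0.12]
    q = cochran_q(betas, ses)
    print(f"Q = {q:.3f}, I2 = {i_squared(q, len(betas)):.1f}%")
```

A large I² together with a small P Q, as reported for cognitive performance above, indicates that the per-variant estimates disagree more than sampling error alone would explain.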
Affiliation(s)
- Fangkun Yang
- Department of Cardiology, Ningbo Hospital of Zhejiang University (Ningbo First Hospital), Ningbo, China
- Department of Cardiology, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Songzan Chen
- School of Medicine, Zhejiang University, Hangzhou, China
| | - Zihao Qu
- School of Medicine, Zhejiang University, Hangzhou, China
| | - Kai Wang
- School of Medicine, Zhejiang University, Hangzhou, China
| | - Xiaojie Xie
- Department of Cardiology, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Hanbin Cui
- Cardiology Center, Ningbo First Hospital, Ningbo University, Ningbo, China
| |
|
44
|
Guo Y, Hou L, Zhu W, Wang P. Prediction of Hormone-Binding Proteins Based on K-mer Feature Representation and Naive Bayes. Front Genet 2021; 12:797641. [PMID: 34887905 PMCID: PMC8650314 DOI: 10.3389/fgene.2021.797641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/19/2021] [Accepted: 11/05/2021] [Indexed: 11/29/2022]
Abstract
Hormone binding protein (HBP) is a soluble carrier protein that interacts selectively with different types of hormones and has various effects on the body's life activities. HBPs play an important role in the growth process of organisms, but their specific role is still unclear. Therefore, correctly identifying HBPs is the first step towards understanding and studying their biological function. However, because traditional biochemical experiments are costly and time-consuming, it is difficult for them to correctly identify HBPs among an increasing number of proteins, so the characterization of HBPs has become a challenging task for researchers. To study HBPs effectively, an accurate and reliable prediction model for their identification is desirable. In this paper, we construct the prediction model HBP_NB. First, HBP data were collected from the UniProt database, and a dataset was established. Then, based on the established high-quality dataset, the k-mer (k = 3) feature representation method was used to extract features. Next, a feature selection algorithm was used to reduce the dimensionality of the extracted features and select an optimal feature set. Finally, the selected features were input into Naive Bayes to construct the prediction model, and the model was evaluated using 10-fold cross-validation. The final results were 95.45% accuracy, 94.17% sensitivity and 96.73% specificity. These results indicate that our model is feasible and effective.
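The k-mer feature step and the reported evaluation metrics can be illustrated with a short sketch. This is not the authors' HBP_NB code: the function names, the 20-letter alphabet constant, and the example sequence are assumptions made for illustration, and the feature selection and Naive Bayes steps are omitted.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def kmer_frequencies(sequence, k=3):
    """Return the relative frequency of each overlapping k-mer in a sequence."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def to_feature_vector(freqs, k=3):
    """Map frequencies onto a fixed-order vector over all 20**k possible k-mers.

    k-mers containing non-standard residues are simply not represented.
    """
    return [freqs.get("".join(p), 0.0) for p in product(AMINO_ACIDS, repeat=k)]


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    freqs = kmer_frequencies("ACDACD")  # hypothetical toy sequence
    vector = to_feature_vector(freqs)
    print(len(vector), freqs)
```

With k = 3 the vector has 20³ = 8,000 dimensions, which is why a dimensionality-reducing feature selection step before the classifier is a natural choice.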
Affiliation(s)
- Yuxin Guo
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Yangtze Delta Region Institute, University of Electronic Science and Technology of China, Quzhou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Liping Hou
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Peng Wang
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| |
|
45
|
Qiu S, Li M, Jin S, Lu H, Hu Y. Rheumatoid Arthritis and Cardio-Cerebrovascular Disease: A Mendelian Randomization Study. Front Genet 2021; 12:745224. [PMID: 34745219 PMCID: PMC8567962 DOI: 10.3389/fgene.2021.745224] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 07/21/2021] [Accepted: 08/20/2021] [Indexed: 01/05/2023]
Abstract
A significant genetic association exists between rheumatoid arthritis (RA) and cardiovascular disease. The associated mechanisms include common inflammatory mediators, changes in lipoprotein composition and function, immune responses, etc. However, the causality between RA and vascular/heart problems remains unknown. Herein, we performed Mendelian randomization (MR) analysis using a large-scale RA genome-wide association study (GWAS) dataset (462,933 cases and 457,732 controls) and six cardio-cerebrovascular disease GWAS datasets, including age angina (461,880 cases and 447,052 controls), hypertension (461,880 cases and 337,653 controls), age heart attack (10,693 cases and 451,187 controls), abnormalities of heartbeat (461,880 cases and 361,194 controls), stroke (7,055 cases and 454,825 controls), and coronary heart disease (361,194 cases and 351,037 controls) from the UK Biobank. We further carried out heterogeneity and sensitivity analyses. We confirmed the causality of RA with age angina (OR = 1.17, 95% CI: 1.04–1.33, p = 1.07E−02), hypertension (OR = 1.45, 95% CI: 1.20–1.75, p = 9.64E−05), age heart attack (OR = 1.15, 95% CI: 1.05–1.26, p = 3.56E−03), abnormalities of heartbeat (OR = 1.07, 95% CI: 1.01–1.12, p = 1.49E−02), stroke (OR = 1.06, 95% CI: 1.01–1.12, p = 2.79E−02), and coronary heart disease (OR = 1.19, 95% CI: 1.01–1.39, p = 3.33E−02), contributing to the understanding of the overlapping genetic mechanisms and therapeutic approaches between RA and cardiovascular disease.
Affiliation(s)
- Shizheng Qiu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Meijie Li
- Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
| | - Shunshan Jin
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Haoyu Lu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Yang Hu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| |
|
46
|
Li G, Liu Y, Li D, Liu B, Li J, Hu Y, Wang Y. Fast and Accurate Classification of Meta-Genomics Long Reads With deSAMBA. Front Cell Dev Biol 2021; 9:643645. [PMID: 34012962 PMCID: PMC8127778 DOI: 10.3389/fcell.2021.643645] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/18/2020] [Accepted: 02/09/2021] [Indexed: 12/01/2022]
Abstract
There is still a lack of fast and accurate classification tools to identify the taxonomies of noisy long reads, which is a bottleneck to the use of the promising long-read metagenomic sequencing technologies. Herein, we propose de Bruijn graph-based Sparse Approximate Match Block Analyzer (deSAMBA), a tailored long-read classification approach that uses a novel pseudo alignment algorithm based on sparse approximate match block (SAMB). Benchmarks on real sequencing datasets demonstrate that deSAMBA achieves high yields and fast speed simultaneously, outperforms state-of-the-art tools, and has great potential for cutting-edge metagenomics studies.
Affiliation(s)
- Gaoyang Li
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yongzhuang Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Deying Li
- Department of Internal Medicine, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Junyi Li
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
| | - Yang Hu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|
47
|
Qiu S, Cao P, Guo Y, Lu H, Hu Y. Exploring the Causality Between Hypothyroidism and Non-alcoholic Fatty Liver: A Mendelian Randomization Study. Front Cell Dev Biol 2021; 9:643582. [PMID: 33791302 PMCID: PMC8005565 DOI: 10.3389/fcell.2021.643582] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 12/18/2020] [Accepted: 02/09/2021] [Indexed: 12/12/2022]
Abstract
The etiology of non-alcoholic fatty liver disease (NAFLD) involves a complex interaction of genetic and environmental factors. A large number of observational studies have shown that hypothyroidism contributes to a high risk of NAFLD. However, the exact causality is still unknown. Owing to the progress of genome-wide association studies (GWAS) and the development of Mendelian randomization (MR), it is possible to explore the causality between the two diseases. In this study, to investigate the influence of intermediate phenotypes on the outcome, nine independent genetic variants of hypothyroidism obtained from the GWAS were used as instrumental variables (IVs) to perform MR analysis on NAFLD. Since there was no heterogeneity between IVs (P = 0.70), a fixed-effects model was used. The association between hypothyroidism and NAFLD was evaluated using the inverse-variance weighted (IVW) method and the weighted median method. A sensitivity analysis was then performed. The results showed a high OR (1.7578; 95% CI 1.1897–2.5970; P = 0.0046) and a low intercept (−0.095; P = 0.431). None of the genetic variants drove the overall result (P < 0.01). In short, we demonstrated for the first time that the risk of NAFLD increases significantly in patients with hypothyroidism. Furthermore, we explained possible mechanisms by which hypothyroidism causes NAFLD.
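The fixed-effects inverse-variance weighted (IVW) estimate mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard first-order IVW formula, assuming per-variant summary statistics are available; the function names and all numbers are hypothetical and do not reproduce the study's data or results.

```python
import math


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta = sum(bx * by / se_y**2) / sum(bx**2 / se_y**2)
    se   = sqrt(1 / sum(bx**2 / se_y**2))
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        numerator += bx * by / se ** 2
        denominator += bx ** 2 / se ** 2
    return numerator / denominator, math.sqrt(1.0 / denominator)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate into an OR with a 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical variant effects on the exposure and on the outcome (log odds).
    bx = [0.12, 0.09, 0.15]
    by = [0.06, 0.05, 0.09]
    se_y = [0.03, 0.04, 0.05]
    beta, se = ivw_fixed_effect(bx, by, se_y)
    or_, lower, upper = odds_ratio_with_ci(beta, se)
    print(f"OR = {or_:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

The fixed-effect form is appropriate only when the instruments show no heterogeneity, which is the condition the abstract cites (P = 0.70) before choosing this model.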
Affiliation(s)
- Shizheng Qiu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Peigang Cao
- Department of Cardiovascular, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Yu Guo
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Haoyu Lu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Yang Hu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| |
|