1
Chen M, Zhang X, Ju Y, Liu Q, Ding Y. iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:13829-13850. [PMID: 36654069 DOI: 10.3934/mbe.2022644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Biological sequence analysis is an important area of basic research in bioinformatics. With the explosive growth of data, machine learning methods play an increasingly important role in biological sequence analysis. A classifier is constructed to predict and evaluate input sequence feature vectors, and knowledge of gene structure, function and evolution is obtained from a large amount of sequence information, which lays a foundation for in-depth research. At present, many machine learning methods have been applied to biological sequence analysis tasks such as RNA gene recognition and protein secondary structure prediction. As a biological sequence, RNA plays an important biological role in the encoding, decoding, regulation and expression of genes. RNA data are currently analyzed in terms of structure and function, including secondary structure prediction, non-coding RNA identification and functional site prediction. Pseudouridine (Ψ) is the most widespread and abundant RNA modification and has been discovered in a variety of RNAs. Accurately identifying Ψ sites in RNA sequences is essential for studying the related functional mechanisms and for disease diagnosis. Several computational approaches have been proposed as alternatives to experimental methods for detecting Ψ sites, but their performance can still be improved. In this study, we present a model based on the twin support vector machine (TWSVM) for Ψ site identification. The model combines a variety of feature representation techniques and uses the max-relevance and min-redundancy method to obtain the optimum feature subset for training. The independent testing accuracy is improved by 3.4% in comparison to current advanced Ψ site predictors. The outcomes demonstrate that our model has better generalization performance and improves the accuracy of Ψ site identification. iPseU-TWSVM can be a helpful tool to identify Ψ sites.
Affiliation(s)
- Mingshuai Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Xin Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Qing Liu
  - Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
2
Sun W, Han Y, Yang S, Zhuang H, Zhang J, Cheng L, Fu L. The Assessment of Interleukin-18 on the Risk of Coronary Heart Disease. Med Chem 2021; 16:626-634. [PMID: 31584380 DOI: 10.2174/1573406415666191004115128] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/09/2019] [Revised: 05/13/2019] [Accepted: 08/23/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND Observational studies support the inflammation hypothesis in coronary heart disease (CHD). Interleukin-18 (IL-18), a pleiotropic proinflammatory cytokine, has also been found to be associated with the risk of CHD. However, to our knowledge, Mendelian Randomization has not been used to explore the causal effect of IL-18 on CHD. OBJECTIVE To assess the causal effect of IL-18 on the risk of CHD. METHODS AND RESULTS Genetic variant instruments for IL-18 were obtained from the CHS and InCHIANTI cohorts, and consisted of the per-allele difference in mean IL-18 for 16 independent variants that reached genome-wide significance. The per-allele difference in log-odds of CHD for each of these variants was estimated from CARDIoGRAMplusC4D, a two-stage meta-analysis. Two-sample Mendelian Randomization (MR) was then performed. Various MR analyses were used, including inverse-variance weighting, MR-Egger regression, robust regression, and penalized regression. The OR of elevated IL-18 associated with CHD was only 0.005 (95%CI -0.105~0.095; P-value=0.927). Similar results were obtained with MR-Egger regression, suggesting that directional pleiotropy was unlikely to be biasing these results (intercept -0.050, P-value=0.220). Moreover, the robust regression and penalized regression analyses gave essentially similar findings. CONCLUSION Our findings indicate that, by itself, IL-18 is unlikely to represent even a modest causal factor for CHD risk.
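The inverse-variance weighted step of a two-sample MR analysis can be sketched as follows. This is a minimal illustration of the standard fixed-effect IVW formula, not the authors' code; the function name `ivw_estimate` and the toy summary statistics are assumptions made for the example and are not taken from the study.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    beta_exposure: per-allele effect of each variant on the exposure
    beta_outcome:  per-allele log-odds of the outcome for each variant
    se_outcome:    standard error of each outcome effect
    """
    # Weighted regression of outcome effects on exposure effects through the origin
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    estimate = num / den           # causal estimate on the log-odds scale
    std_err = math.sqrt(1.0 / den)
    return estimate, std_err

# Toy data (hypothetical, for illustration only)
est, se = ivw_estimate([0.1, 0.2], [0.01, 0.02], [0.05, 0.05])
lower, upper = est - 1.96 * se, est + 1.96 * se
print(f"log-odds estimate {est:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The estimate is on the log-odds scale; exponentiating it gives an odds ratio per unit change in the exposure.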
Affiliation(s)
- Weiju Sun
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Ying Han
  - Cardiovascular Department, the Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Shuo Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- He Zhuang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jingwen Zhang
  - Department of Physiology and Biology, University of Mississippi Medical Center, United States
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lu Fu
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
3
The construction and analysis of tumor-infiltrating immune cells and ceRNA networks in metastatic adrenal cortical carcinoma. Biosci Rep 2021; 40:222366. [PMID: 32175564 PMCID: PMC7103591 DOI: 10.1042/bsr20200049] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/01/2020] [Revised: 03/08/2020] [Accepted: 03/09/2020] [Indexed: 12/12/2022] Open
Abstract
Purpose: To construct and analyze tumor-infiltrating immune cell and ceRNA (competitive endogenous RNA) networks in metastatic adrenal cortical carcinoma (ACC). Methods: A ceRNA network was established to identify the ceRNAs involved in metastasis of ACC based on 92 samples from TCGA, including 18 cases of metastasis and 74 cases of non-metastatic primary tumors. The algorithm “cell type identification by estimating relative subsets of RNA transcripts (CIBERSORT)” was used to quantify the proportion of immune cells in ACC. In addition, predictive nomograms based on the types of important immune cells or ceRNAs were constructed to predict ACC prognosis. Moreover, we evaluated the relationships between metastatic ACC-specific immune cells and ceRNA networks to identify the potential immune gene characteristics. Results: Ten prognostic biomarkers were identified as key members of the ceRNA network and three tumor-infiltrating immune cells were identified by the CIBERSORT algorithm. Co-expression analysis between immune cells and the ceRNA network indicated significant correlations between Macrophages M0 and hsa-miR-130b-3p (P < 0.001) and between Macrophages M0 and H2AFX (P = 0.003). Conclusions: The present study inferred that the metastasis-related ceRNAs H2AFX and hsa-miR-130b-3p, together with Macrophages M0, might play important roles in ACC metastasis.
4
Li Z, Jiang H, Kong L, Chen Y, Lang K, Fan X, Zhang L, Pian C. Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species. PLoS Comput Biol 2021; 17:e1008767. [PMID: 33600435 PMCID: PMC7924747 DOI: 10.1371/journal.pcbi.1008767] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/11/2020] [Revised: 03/02/2021] [Accepted: 02/03/2021] [Indexed: 12/25/2022] Open
Abstract
N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites are not cost-effective, which implies a great need for new computational methods for this problem. In this paper, we developed, without requiring any prior knowledge of 6mA or manually crafted sequence features, a deep learning framework named Deep6mA to identify DNA 6mA sites, and its performance is superior to other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a benchmark dataset of rice gives the sensitivity and specificity of Deep6mA as 92.96% and 95.06%, respectively, and the overall prediction accuracy is 94%. Importantly, we find that sequences with 6mA sites share similar patterns across different species. The model trained with rice data predicts well the 6mA sites of three other species, Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which means the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main way it regulates downstream gene expression.
DNA N6-methyladenine (6mA) is a newly recognized methylation modification in eukaryotes. It is widespread and conserved across organisms, and its modification level changes dynamically over the whole life cycle. This study proposes an algorithm based on a deep learning framework including LSTM and CNN to predict 6mA sites. The results showed that our method could accurately predict the 6mA sites in different species, which means DNA sub-sequences containing 6mA sites are conserved to some degree among species. Importantly, we found that 6mA methylation in most species is more likely to occur on the GAGG motif. In addition, we also found that 6mA is enriched in the promoter’s TATA box, which may be a mechanism of regulating downstream gene expression.
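The reported sensitivity, specificity and overall accuracy can be related with a short calculation. The sketch below assumes equal numbers of positive and negative samples; this is an assumption for illustration, since the abstract does not state the class sizes. The function name `accuracy_from_rates` and the sample counts are hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by sensitivity and specificity for given class sizes."""
    true_pos = sensitivity * n_pos   # correctly predicted 6mA sites
    true_neg = specificity * n_neg   # correctly predicted non-6mA sites
    return (true_pos + true_neg) / (n_pos + n_neg)

# Assumed balanced classes (hypothetical counts)
acc = accuracy_from_rates(0.9296, 0.9506, n_pos=1000, n_neg=1000)
print(f"accuracy = {acc:.4f}")
```

Under the balanced-class assumption the accuracy is simply the mean of the two rates, which agrees with the reported overall accuracy of about 94%.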
Affiliation(s)
- Zutan Li
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
- Hangjin Jiang
  - Center for Data Science, Zhejiang University, Hangzhou, China
- Lingpeng Kong
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
- Yuanyuan Chen
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
- Kun Lang
  - College of information science & Technology, Nanjing Agricultural University, Nanjing, China
- Xiaodan Fan
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
- Liangyun Zhang
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
  - * E-mail: (LYZ); (CP)
- Cong Pian
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
  - * E-mail: (LYZ); (CP)
5
Peng J, Lu G, Shang X. A Survey of Network Representation Learning Methods for Link Prediction in Biological Network. Curr Pharm Des 2021; 26:3076-3084. [PMID: 31951161 DOI: 10.2174/1381612826666200116145057] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/12/2019] [Accepted: 01/09/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Networks are powerful resources for describing complex systems. Link prediction is an important issue in network analysis and has important practical application value. Network representation learning has proven to be useful for network analysis, especially for link prediction tasks. OBJECTIVE To review the application of network representation learning to link prediction in biological networks, we summarize recent methods for link prediction in biological networks and discuss the application and significance of network representation learning in link prediction tasks. METHOD & RESULTS We first introduce the widely used link prediction algorithms, then briefly introduce the development of network representation learning methods, focusing on a few widely used methods and their application in biological network link prediction. Existing studies demonstrate that using network representation learning to predict links in biological networks can achieve better performance. Finally, some possible future directions are discussed.
Affiliation(s)
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Guilin Lu
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
6
Gong Y, Chang C, Liu X, He Y, Wu Y, Wang S, Zhang C. Stimulator of Interferon Genes Signaling Pathway and its Role in Anti-tumor Immune Therapy. Curr Pharm Des 2021; 26:3085-3095. [PMID: 32520678 DOI: 10.2174/1381612826666200610183048] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/01/2020] [Accepted: 05/04/2020] [Indexed: 12/19/2022]
Abstract
Stimulator of interferon genes is an important innate immune signaling molecule in the body and is involved in the innate immune signal transduction pathway induced by pathogen-associated molecular patterns or damage-associated molecular patterns. Stimulator of interferon genes promotes the production of type I interferon and thus plays an important role in the innate immune response to infection. In addition, according to a recent study, the stimulator of interferon genes pathway also contributes to anti-inflammatory and anti-tumor reactions. In this paper, current research on the stimulator of interferon genes signaling pathway and its relationship with tumor immunity is reviewed. A series of critical problems to be addressed in subsequent studies is also discussed.
Affiliation(s)
- Yuanjin Gong
  - Department of Pathology, Harbin Medical University, Harbin, China
- Chang Chang
  - Department of Pathology, Harbin Medical University, Harbin, China
- Xi Liu
  - Center of Cardiovascular Disease, Inner Mongolia People's Hospital, Hohhot, China
- Yan He
  - Department of Pathology, Harbin Medical University, Harbin, China
- Yiqi Wu
  - Department of Pathology, Harbin Medical University, Harbin, China
- Song Wang
  - Department of Pathology, Harbin Medical University, Harbin, China
- Chongyou Zhang
  - Basic Medical College, Harbin Medical University, Harbin, China
7
Shi W, Chen X, Deng L. A Review of Recent Developments and Progress in Computational Drug Repositioning. Curr Pharm Des 2021; 26:3059-3068. [PMID: 31951162 DOI: 10.2174/1381612826666200116145559] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/04/2019] [Accepted: 01/09/2020] [Indexed: 12/27/2022]
Abstract
Computational drug repositioning is an efficient approach to discovering new indications for existing drugs. In recent years, with the accumulation of online health-related information and the extensive use of biomedical databases, computational drug repositioning approaches have achieved significant progress in drug discovery. In this review, we summarize recent advancements in drug repositioning. First, we describe the available data sources that are conducive to identifying novel indications. We then summarize the commonly used computing approaches. For each method, we briefly describe the techniques, case studies, and evaluation criteria. Finally, we discuss the limitations of the existing computing approaches.
Affiliation(s)
- Wanwan Shi
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Xuegong Chen
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha, China
8
Lv Z, Ding H, Wang L, Zou Q. A Convolutional Neural Network Using Dinucleotide One-hot Encoder for identifying DNA N6-Methyladenine Sites in the Rice Genome. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.056] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/13/2022]
9
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at the protein or gene level can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By integrating experimental findings on protein-protein interactions (PPIs) with computational prediction techniques, researchers have been able to gain much valuable data on PPIs, including some advanced databases. Moreover, many useful tools and visualization programs enable researchers to establish, annotate, and analyze biological networks. Here we review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
  - Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weiju Sun
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
10
Zhan Q, Fu Y, Jiang Q, Liu B, Peng J, Wang Y. SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically. Protein Pept Lett 2020; 27:295-302. [PMID: 31385760 DOI: 10.2174/0929866526666190806143959] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/06/2019] [Revised: 04/26/2019] [Accepted: 06/14/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it horizontally into two groups and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different conservation; this property may lead to incorrect residue-residue or residue-gap pairs, which cannot be corrected by this strategy. OBJECTIVE In this article, our motivation is to develop a novel refinement method based on splitting-splicing vertically. METHODS Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle, realign the middle part alone, and splice the realigned middle part with the other two initial pieces to obtain a refined alignment. In the realignment step of our method, the aligner focuses only on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs. RESULTS We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) score. The results show that given appropriate proportions to split the initial alignment, the average scores are increased comparably or slightly after using our method. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools. CONCLUSION The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.
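The split, realign and splice steps described in METHODS can be sketched as below. This is a minimal illustration under stated assumptions, not the SpliVert implementation: the function names, the use of column indices `start` and `end` in place of split proportions, and the `pad_right` stand-in aligner are all hypothetical. In practice `realign` would wrap a real MSA tool.

```python
from typing import Callable, List

def split_realign_splice(alignment: List[str], start: int, end: int,
                         realign: Callable[[List[str]], List[str]]) -> List[str]:
    """Split an alignment vertically at columns start and end, realign the middle, splice."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    if not 0 <= start <= end <= n_cols:
        raise ValueError("need 0 <= start <= end <= number of columns")
    left = [row[:start] for row in alignment]
    # Remove gap characters from the middle part before realigning it alone
    middle = [row[start:end].replace("-", "") for row in alignment]
    right = [row[end:] for row in alignment]
    realigned = realign(middle)
    # Splice the realigned middle back between the untouched left and right parts
    return [l + m + r for l, m, r in zip(left, realigned, right)]

def pad_right(seqs: List[str]) -> List[str]:
    """Placeholder 'aligner' (not a real MSA method): pads sequences to equal length."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]

rows = ["ACG-TAC", "AC-GTAC", "ACGGTAC"]
for row in split_realign_splice(rows, 2, 5, pad_right):
    print(row)
```

Passing the aligner as a callable keeps the sketch independent of any particular MSA tool, which matches the abstract's description of applying the strategy to more than one aligner.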
Affiliation(s)
- Qing Zhan
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yilei Fu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Bo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
11
Zhuang H, Zhang Y, Yang S, Cheng L, Liu SL. A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk. Curr Gene Ther 2020; 19:224-231. [PMID: 31553296 DOI: 10.2174/1566523219666190925115535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/30/2019] [Revised: 06/15/2019] [Accepted: 06/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Infant length (IL) is a phenotype positively associated with type 2 diabetes mellitus (T2DM), but the causal relationship is still unclear. Here, we applied a Mendelian randomization (MR) study to explore the causal relationship between IL and T2DM, which has the potential to provide guidance for assessing T2DM activity and T2DM prevention in young at-risk populations. MATERIALS AND METHODS A two-sample MR, using genetic instrumental variables (IVs) to explore the causal effect, was applied to test the influence of IL on the risk of T2DM. In this study, MR was carried out on GWAS data using 8 independent IL SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated by the inverse-variance weighted (IVW) method to assess the risk that shorter IL brings to T2DM. Sensitivity validation was conducted to identify the effect of individual SNPs. MR-Egger regression was used to detect pleiotropic bias of IVs. RESULTS The pooled odds ratio from the IVW method was 1.03 (95% CI 0.89-1.18, P = 0.0785), the intercept was low (-0.477, P = 0.252), and the ORs fluctuated only slightly in leave-one-out validation, with relative changes ranging from -0.062 ((0.966 - 1.03) / 1.03) to 0.05 ((1.081 - 1.03) / 1.03). CONCLUSION We validated that shorter IL causes no additional risk of T2DM. The sensitivity analysis and the MR-Egger regression analysis also provided adequate evidence that the above result was not due to any heterogeneity or pleiotropic effect of IVs.
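The leave-one-out fluctuation quoted in RESULTS is a relative change of each leave-one-out OR against the pooled OR. The arithmetic can be checked with a short sketch; the function name `relative_change` is an assumption for illustration, while the numbers are those given in the abstract.

```python
def relative_change(leave_one_out_or: float, pooled_or: float) -> float:
    """Relative deviation of a leave-one-out odds ratio from the pooled odds ratio."""
    return (leave_one_out_or - pooled_or) / pooled_or

pooled = 1.03
lowest = relative_change(0.966, pooled)    # smallest leave-one-out OR in the abstract
highest = relative_change(1.081, pooled)   # largest leave-one-out OR in the abstract
print(round(lowest, 3), round(highest, 2))
```

Both values stay within a few percent of the pooled estimate, which is what the abstract describes as a small fluctuation.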
Affiliation(s)
- He Zhuang
  - Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China; HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, 150001, Harbin, China
- Shuo Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shu-Lin Liu
  - Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China; HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, Canada; Department of Infectious Diseases, The First Affiliated Hospital, Harbin Medical University, Harbin, China; Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
12
Wu N, Wang L, Hu J, Zhao S, Liu B, Li Y, Du H, Zhang Y, Li X, Yan Z, Wang S, Wang Y, Zhang J, Wu Z, Disco Deciphering Disorders Involving Scoliosis Comorbidities Study Group, Qiu G. A Recurrent Rare SOX9 Variant (M469V) is Associated with Congenital Vertebral Malformations. Curr Gene Ther 2020; 19:242-247. [PMID: 31549955 DOI: 10.2174/1566523219666190924120307] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/30/2019] [Revised: 06/11/2019] [Accepted: 06/12/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE Genetic variations contribute to a substantial proportion of congenital vertebral malformations (CVM). The SOX9 gene, a member of the SOX gene family, has been implicated in CVM. Studying SOX9 mutations in CVM patients is of great significance for explaining the pathogenesis of scoliosis (the clinical manifestation of CVM) and for exploring the pathogenesis of SOX9-related skeletal deformities. METHODS A total of 50 singleton patients with CVM were included in this study. Exome Sequencing (ES) was performed on all the patients. The recurrent candidate variant of the SOX9 gene was validated by Sanger sequencing. A luciferase assay was performed to investigate the functional changes of this variant. RESULTS A recurrent rare heterozygous missense variant in the SOX9 gene (NM_000346.3: c.1405A>G, p.M469V), which had not been reported previously, was identified in three CVM patients who had the clinical findings of congenital scoliosis without deformities in other systems. This variant was absent from our in-house database and was predicted to be deleterious (CADD = 24.5). The luciferase assay demonstrated that the transactivation capacity of the mutated SOX9 protein was significantly lower than that of the wild type for the two luciferase reporters (p = 0.0202, p = 0.0082, respectively). CONCLUSION This SOX9 mutation (p.M469V) may contribute to CVM without other systemic deformity, which has important implications and provides a better understanding of phenotypic variability in SOX9-related skeletal deformities.
Collapse
Affiliation(s)
- Nan Wu
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China.,Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Beijing, China.,Medical Research Center of Orthopedics, Chinese Academy of Medical Sciences, Beijing, China
| | - Lianlei Wang
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China.,Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China
| | - Jianhua Hu
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China.,Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Beijing, China.,Medical Research Center of Orthopedics, Chinese Academy of Medical Sciences, Beijing, China
| | - Sen Zhao
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China.,Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China
| | - Bowen Liu
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China.,Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China
| | - Yaqi Li
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China.,Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China
| | - Huakang Du
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China.,Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China
| | - Yuanqiang Zhang
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China.,Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China
| | - Xiaoxin Li
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China; Department of Central Laboratory, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Zihui Yan
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China; Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China
- Shengru Wang
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China; Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Beijing, China; Medical Research Center of Orthopedics, Chinese Academy of Medical Sciences, Beijing, China
- Yipeng Wang
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China; Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Beijing, China; Medical Research Center of Orthopedics, Chinese Academy of Medical Sciences, Beijing, China
- Jianguo Zhang
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China; Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Beijing, China; Medical Research Center of Orthopedics, Chinese Academy of Medical Sciences, Beijing, China
- Zhihong Wu
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China; Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Beijing, China; Medical Research Center of Orthopedics, Chinese Academy of Medical Sciences, Beijing, China; Department of Central Laboratory, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Guixing Qiu
- Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China; Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Beijing, China; Medical Research Center of Orthopedics, Chinese Academy of Medical Sciences, Beijing, China
13
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022]
Abstract
MicroRNAs (miRNAs), a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental methods and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. Therefore, the review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide valid information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides scholars with the full background of pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of the progress of research in this field.
Affiliation(s)
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
14
Deng S, Sun Y, Zhao T, Hu Y, Zang T. A Review of Drug Side Effect Identification Methods. Curr Pharm Des 2020; 26:3096-3104. [PMID: 32532187 DOI: 10.2174/1381612826666200612163819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/05/2020] [Accepted: 05/18/2020] [Indexed: 11/22/2022]
Abstract
Drug side effects have become an important indicator for evaluating the safety of drugs. There are two main factors in the frequent occurrence of drug safety problems; on the one hand, the clinical understanding of drug side effects is insufficient, leading to frequent adverse drug reactions, while on the other hand, due to the long-term period and complexity of clinical trials, side effects of approved drugs on the market cannot be reported in a timely manner. Therefore, many researchers have focused on developing methods to identify drug side effects. In this review, we summarize the methods of identifying drug side effects and common databases in this field. We classified methods of identifying side effects into four categories: biological experimental, machine learning, text mining and network methods. We point out the key points of each kind of method. In addition, we also explain the advantages and disadvantages of each method. Finally, we propose future research directions.
Affiliation(s)
- Shuai Deng
- College of Science, Beijing Forestry University, Beijing, China
- Yige Sun
- Microbiology Department, Harbin Medical University, Harbin, 150081, China
- Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
15
Meng C, Guo F, Zou Q. CWLy-SVM: A support vector machine-based tool for identifying cell wall lytic enzymes. Comput Biol Chem 2020; 87:107304. [PMID: 32580129 DOI: 10.1016/j.compbiolchem.2020.107304] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/27/2019] [Revised: 06/07/2020] [Accepted: 06/08/2020] [Indexed: 12/21/2022]
Abstract
Cell wall lytic enzymes, as an important biotechnical tool in drug development, agriculture and the food industry, have attracted more research attention. In this research, the accurate identification of cell wall lytic enzymes is one of the key and fundamental tasks. In this study, in order to eliminate the inefficiency of in vitro experiments, a support vector machine-based cell wall lytic enzyme identification model was constructed using bioinformatics. This machine learning process includes feature extraction, feature selection, model training and optimization. According to the jackknife cross validation test, this model obtained a sensitivity of 0.853, a specificity of 0.977, an MCC of 0.845 and an AUC of 0.915. These benchmark results demonstrate that the proposed model outperforms the state-of-the-art method and that it has powerful cell wall lytic enzyme identification ability. Furthermore, we comprehensively analyzed the selected optimal features and used the proposed model to construct a user friendly web server called the CWLy-SVM to identify cell wall lytic enzymes, which is available at http://server.malab.cn/CWLy-SVM/index.jsp.
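The sensitivity, specificity, and MCC reported in this abstract are standard binary-classification metrics. A minimal sketch of how they are computed from confusion-matrix counts follows; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Assumes at least one positive and one negative sample, so the
    sensitivity and specificity denominators are non-zero.
    """
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only.
sn, sp, mcc = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"Sn={sn:.3f} Sp={sp:.3f} MCC={mcc:.3f}")
```

In a jackknife (leave-one-out) test, each sample is predicted by a model trained on all the others, and the counts above are accumulated over those predictions.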
Affiliation(s)
- Chaolu Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
16
Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BIOMED RESEARCH INTERNATIONAL 2020; 2020:2468789. [PMID: 32566672 PMCID: PMC7275950 DOI: 10.1155/2020/2468789] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/03/2020] [Revised: 03/20/2020] [Accepted: 03/25/2020] [Indexed: 12/19/2022]
Abstract
Fungi play essential roles in many ecological processes, and taxonomic classification is fundamental for microbial community characterization and vital for the study and preservation of fungal biodiversity. To cope with massive fungal barcode data, tools that can process extensive volumes of barcode sequences, especially the internal transcribed spacer (ITS) region, are necessary. However, high variation in the ITS region and computational requirements for processing high-dimensional features remain challenging for existing predictors. In this study, we developed Its2vec, a bioinformatics tool for the classification of fungal ITS barcodes to the species level. An ITS database covering more than 25,000 species in a broad range of fungal taxa was assembled. For dimensionality reduction, a word embedding algorithm was used to represent an ITS sequence as a dense low-dimensional vector. A random forest-based classifier was built for species identification. Benchmarking results showed that our model achieved an accuracy comparable to that of several state-of-the-art predictors, and more importantly, it could handle large datasets and greatly reduce dimensionality. We expect the Its2vec model to be helpful for fungal species identification and, thus, for revealing microbial community structures and deepening our understanding of their functional mechanisms.
17
Cheng L, Qi C, Zhuang H, Fu T, Zhang X. gutMDisorder: a comprehensive database for dysbiosis of the gut microbiota in disorders and interventions. Nucleic Acids Res 2020; 48:D554-D560. [PMID: 31584099 PMCID: PMC6943049 DOI: 10.1093/nar/gkz843] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Received: 08/03/2019] [Revised: 09/18/2019] [Accepted: 10/01/2019] [Indexed: 12/11/2022]
Abstract
gutMDisorder (http://bio-annotation.cn/gutMDisorder), a manually curated database, aims to provide a comprehensive resource on dysbiosis of the gut microbiota in disorders and interventions. Alterations in the composition of the gut microbial community play crucial roles in the development of chronic disorders. In addition, the beneficial effects of drugs, foods and other intervention measures on disorders could be microbially mediated. The current version of gutMDisorder documents 2263 curated associations between 579 gut microbes and 123 disorders or 77 intervention measures in Human, and 930 curated associations between 273 gut microbes and 33 disorders or 151 intervention measures in Mouse. Each entry in gutMDisorder contains detailed information on an association, including an intestinal microbe, a disorder name, intervention measures, experimental technology and platform, characteristics of samples, web sites for downloading the sequencing data, a brief description of the association, a literature reference, and so on. gutMDisorder provides a user-friendly interface to browse and retrieve entries by gut microbes, disorders, and intervention measures. It also offers pages for downloading all the entries and submitting new experimentally validated associations.
Affiliation(s)
- Liang Cheng
- NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, Heilongjiang, China, 150028; College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- Changlu Qi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- He Zhuang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- Tongze Fu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- Xue Zhang
- NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, Heilongjiang, China, 150028; McKusick-Zhang Center for Genetic Medicine, Peking Union Medical College, Beijing, China, 100005
18
Kou N, Zhou W, He Y, Ying X, Chai S, Fei T, Fu W, Huang J, Liu H. A Mendelian Randomization Analysis to Expose the Causal Effect of IL-18 on Osteoporosis Based on Genome-Wide Association Study Data. Front Bioeng Biotechnol 2020; 8:201. [PMID: 32266232 PMCID: PMC7099043 DOI: 10.3389/fbioe.2020.00201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/29/2020] [Accepted: 02/28/2020] [Indexed: 01/16/2023]
Abstract
Accumulating evidence shows that interleukin (IL) levels are associated with osteoporosis. However, most of these associations are based on observational studies, so their causality remains unclear. Mendelian randomization (MR) is a widely used statistical framework that uses genetic instrumental variables (IVs) to explore the causality of an intermediate phenotype with disease. To clarify this causality, we conducted an MR analysis to investigate the effect of IL-18 level on the risk of osteoporosis. First, based on summarized genome-wide association study (GWAS) data, 8 independent IL-18 SNPs reaching genome-wide significance were used as IVs. Next, the simple median method was used to calculate the pooled odds ratio (OR) of these 8 SNPs to assess the effect of IL-18 on the risk of osteoporosis. Then, MR-Egger regression was used to detect potential bias due to horizontal pleiotropy of these IVs. The simple median method gave an estimate of −0.001 (95% CI −0.002 to 0; P = 0.042), which suggests that a low IL-18 level could increase the risk of developing osteoporosis. The low intercept (0; 95% CI −0.001 to 0; P = 0.59) shows there is no bias due to horizontal pleiotropy of the IVs.
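The simple median method named in this abstract can be sketched as follows, assuming the usual summary-data formulation in which each SNP yields a ratio (Wald) estimate, the SNP-outcome effect divided by the SNP-exposure effect, and the pooled estimate is the unweighted median of those ratios. The function name `simple_median_estimate` and the effect sizes are hypothetical; the paper's actual data and confidence-interval procedure are not reproduced here.

```python
from statistics import median


def simple_median_estimate(beta_exposure, beta_outcome):
    """Unweighted median of per-SNP ratio estimates (outcome / exposure)."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    return median(ratios)


# Hypothetical per-SNP effects on exposure and on outcome (log-odds scale).
beta_x = [0.10, 0.20, 0.40]
beta_y = [0.02, 0.02, 0.12]
print(simple_median_estimate(beta_x, beta_y))
```

The median is used because it stays consistent when fewer than half of the instruments are invalid, whereas a mean of the ratios can be pulled off target by a single pleiotropic SNP.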
Affiliation(s)
- Ni Kou
- Department of Oral Prosthodontics, School of Stomatology, Dalian Medical University, Dalian, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yuzhu He
- Department of Oral Prosthodontics, School of Stomatology, Dalian Medical University, Dalian, China
- Xiaoxia Ying
- Department of Oral Prosthodontics, School of Stomatology, Dalian Medical University, Dalian, China
- Songling Chai
- Department of Oral Prosthodontics, School of Stomatology, Dalian Medical University, Dalian, China
- Tao Fei
- Department of Oral Prosthodontics, School of Stomatology, Dalian Medical University, Dalian, China
- Wenqi Fu
- Department of Oral Prosthodontics, School of Stomatology, Dalian Medical University, Dalian, China
- Jiaqian Huang
- Department of Oral Prosthodontics, School of Stomatology, Dalian Medical University, Dalian, China
- Huiying Liu
- Department of Oral Prosthodontics, School of Stomatology, Dalian Medical University, Dalian, China
- *Correspondence: Huiying Liu
19
Chu Y, Nie C, Wang Y. A Pipeline for Reconstructing Somatic Copy Number Alternation's Subclonal Population-Based Next-Generation Sequencing Data. Front Genet 2020; 10:1374. [PMID: 32180789 PMCID: PMC7058119 DOI: 10.3389/fgene.2019.01374] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/16/2019] [Accepted: 12/16/2019] [Indexed: 12/23/2022]
Abstract
State-of-the-art next-generation sequencing (NGS)-based subclonal reconstruction methods perform poorly on somatic copy number alternations (SCNAs), not only because they need to simultaneously estimate the subclonal population frequency and the absolute copy number for each SCNA, but also because there are complex bias and noise in the tumor and its paired normal sequencing data. Both existing NGS-based SCNA detection methods and tools for inferring SCNA subclonal population frequency use the read count ratio (RCR) of the tumor to its paired normal as the key feature of tumor sequencing data; however, sequencing error and bias have a great impact on RCR, which leads to a large number of redundant SCNA segments that make the subsequent inference of SCNA subclonal population frequency and subclonal reconstruction time-consuming and inaccurate. We perform a mathematical analysis of the number of solutions for an SCNA's subclonal frequency, and based on it we propose a computational algorithm to reduce the impact of false breakpoints. We construct a new probability model that incorporates the RCR bias correction algorithm, and by chaining it with the false breakpoint filtering algorithm, we construct a complete SCNA subclonal population reconstruction pipeline. The experimental results show that our pipeline outperforms existing subclonal reconstruction programs on both simulated data and TCGA data. Source code is publicly available as a Python package at https://github.com/dustincys/msphy-SCNAClonal.
Affiliation(s)
- Yanshuo Chu
- Center of Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chenxi Nie
- Center of Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- Center of Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
20
Dou L, Li X, Ding H, Xu L, Xiang H. Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem? MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 19:293-303. [PMID: 31865116 PMCID: PMC6931122 DOI: 10.1016/j.omtn.2019.11.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/08/2019] [Revised: 10/29/2019] [Accepted: 11/11/2019] [Indexed: 01/01/2023]
Abstract
Pseudouridine (Ψ) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Ψ sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Ψ sites, they are expensive and time consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect the Ψ sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Ψ sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than the iPseU-CUU predictor for the H_990 and S_628 datasets, respectively. Almost the same-level results were obtained for M_994 and an independent H_200 dataset, even showing a 5.0% improvement for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features are selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets were achieved at 70.54% and 72.45%, which were 2.39% and 0.65% higher than iPseU-CUU. For the S_200 dataset, it was also improved 8% from 69% to 77%. However, there was no obvious improvement for H. sapiens, which was evaluated as approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Ψ identification using BPB features as well as the combined features were not obviously improved. Although some kinds of feature extraction methods based on the RNA sequence information have been applied to construct the predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Ψ modification prediction problem.
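Two of the sequence features named in this abstract, basic Kmer composition and nucleotide density (ND), are simple enough to illustrate. The sketch below assumes the commonly used definitions: Kmer is the normalized frequency of each length-k substring, and ND at position i is the fraction of the first i nucleotides that equal the nucleotide at position i. The function names and the example sequence are hypothetical, and the paper's exact feature pipeline may differ.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Normalized k-mer counts, ordered lexicographically over `alphabet`."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows
    for i in range(max(total, 0)):
        window = seq[i:i + k]
        if window in counts:  # windows with unknown symbols are skipped
            counts[window] += 1
    if total <= 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def nucleotide_density(seq):
    """For each position i (1-based), frequency of seq[i] within seq[:i]."""
    seen = {}
    density = []
    for i, base in enumerate(seq, start=1):
        seen[base] = seen.get(base, 0) + 1
        density.append(seen[base] / i)
    return density


# Hypothetical toy sequence; real Ψ-site samples are longer windows.
print(nucleotide_density("AUAU"))
print(kmer_frequencies("AUAU", k=2)[3])  # frequency of "AU"
```

Kmer features discard position, while ND keeps a position-dependent signal, which is one reason the two are often concatenated before feature selection.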
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
21
Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020; 11:5. [PMID: 32117433 PMCID: PMC7010852 DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022]
Abstract
Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, to overcome the inherent difficulties of the long experimental period and high cost of traditional biomedical methods, researchers have proposed many gene prioritization algorithms that use a large amount of biological data to mine pathogenic genes. However, because the currently known gene–disease association matrix is still very sparse and lacks evidence that genes and diseases are unrelated, there are limits to the predictive performance of gene prioritization algorithms. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, making up for the lack of sparse data. On the other hand, based on the prior knowledge that most of the unknown gene–disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm regarding the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. The PUIMCHIF algorithm has higher priority than those from other methods, such as IMC and CATAPULT.
Affiliation(s)
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jie Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xueping Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing, China
22
Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Front Bioeng Biotechnol 2020; 8:134. [PMID: 32175316 PMCID: PMC7054385 DOI: 10.3389/fbioe.2020.00134] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 01/17/2020] [Accepted: 02/10/2020] [Indexed: 12/21/2022]
Abstract
One of the ubiquitous chemical modifications in RNA, pseudouridine modification is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to precisely identify pseudouridine sites in RNA. Several useful machine learning approaches have become available recently, with the increasing progress of next-generation sequencing technology; however, existing methods cannot predict sites with high accuracy. Thus, a more accurate predictor is required. In this study, a random forest-based predictor named RF-PseU is proposed for prediction of pseudouridylation sites. To optimize feature representation and obtain a better model, the light gradient boosting machine algorithm and incremental feature selection strategy were used to select the optimum feature space vector for training the random forest model RF-PseU. Compared with previous state-of-the-art predictors, the results on the same benchmark data sets of three species demonstrate that RF-PseU performs better overall. The integrated average leave-one-out cross-validation and independent testing accuracy scores were 71.4% and 74.7%, respectively, representing increments of 3.63% and 4.77% versus the best existing predictor. Moreover, the final RF-PseU model for prediction was built on leave-one-out cross-validation and provides a reliable and robust tool for identifying pseudouridine sites. A web server with a user-friendly interface is accessible at http://148.70.81.170:10228/rfpseu.
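The incremental feature selection strategy mentioned in this abstract can be sketched in a model-agnostic way: features are first ranked by some importance score, then the top-k prefixes of that ranking are evaluated for k = 1, 2, ... and the best-scoring prefix is kept. In the sketch below the ranking and the scoring function are supplied by the caller; `incremental_feature_selection`, `toy_score`, and the feature names are hypothetical, and in the paper the ranking comes from a light gradient boosting machine while the score would be a cross-validated accuracy.

```python
def incremental_feature_selection(ranked_features, score_fn):
    """Evaluate every top-k prefix of a ranked feature list.

    ranked_features: feature names ordered from most to least important.
    score_fn: callable mapping a list of features to a score (higher is better).
    Returns (best_subset, best_score); ties keep the smaller subset.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


def toy_score(subset):
    """Stand-in for cross-validated accuracy: reward useful features,
    penalize subset size. Purely illustrative."""
    useful = {"f3", "f1"}
    return sum(1 for f in subset if f in useful) - 0.1 * len(subset)


subset, score = incremental_feature_selection(["f3", "f1", "f2", "f4"], toy_score)
print(subset, round(score, 2))
```

Because only prefixes of the ranking are tried, the search costs one model evaluation per feature instead of one per subset, at the price of depending on the quality of the initial ranking.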
Affiliation(s)
- Zhibin Lv
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
23
Pan W, Sun W, Yang S, Zhuang H, Jiang H, Ju H, Wang D, Han Y. LDL-C plays a causal role on T2DM: a Mendelian randomization analysis. Aging (Albany NY) 2020; 12:2584-2594. [PMID: 32040442 PMCID: PMC7041740 DOI: 10.18632/aging.102763] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/26/2019] [Accepted: 01/12/2020] [Indexed: 06/10/2023]
Abstract
Diabetic dyslipidemia is a common condition in patients with Type 2 diabetes mellitus (T2DM). However, with the increasing application of statins, which mainly decrease low-density lipoprotein cholesterol (LDL-C) levels, clinical trials and meta-analyses showed a clear increase in the incidence of new-onset diabetes mellitus, partly due to genetic factors. To determine whether a causal relationship exists between LDL-C and T2DM, we conducted a two-sample Mendelian randomization (MR) analysis using genetic variations as instrumental variables (IVs). Initially, 29 SNPs significantly related to LDL-C (P ≤ 5.0×10⁻⁸) were selected based on results from the study of Henry et al, which processed loci data influencing lipids identified by the Global Lipids Genetics Consortium (GLGC) from 188,577 individuals of European ancestry. Six SNPs related to T2DM (P < 5×10⁻²) were removed, and the remaining 23 SNPs without linkage disequilibrium (LD) were used as IVs. The combined effect of these 23 SNPs on T2DM, estimated with the penalized robust inverse-variance weighted (IVW) method (beta 0.24, 95% CI 0.087 to 0.393, P = 0.002), demonstrated that elevated LDL-C levels significantly increased the risk of T2DM. The same analysis of the relationship between LDL-C and Type 1 diabetes mellitus (T1DM) produced negative pooled results (beta −0.202, 95% CI −2.888 to 2.484, P = 0.883).
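For orientation, the plain fixed-effect inverse-variance weighted (IVW) estimator that underlies this kind of analysis can be sketched as below. This is not the penalized robust variant the abstract reports; it is the basic form, in which per-SNP ratio estimates are averaged with weights equal to the squared SNP-exposure effect divided by the squared standard error of the SNP-outcome effect. The function name `ivw_estimate` and all numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Weights are bx^2 / se_y^2, the inverse variance of each ratio by / bx
    under the usual first-order approximation.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_err = math.sqrt(1.0 / total)
    return estimate, std_err


# Hypothetical summary statistics for two SNPs.
beta, se = ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.02])
low, high = beta - 1.96 * se, beta + 1.96 * se  # approximate 95% CI
print(f"beta={beta:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

Robust and penalized variants change how outlying SNPs are weighted, but they reduce to this weighted average when all instruments agree.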
Affiliation(s)
- Wenbin Pan
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Weiju Sun
- Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Shuo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- He Zhuang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Huijie Jiang
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Hong Ju
- Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, China
- Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Ying Han
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
24
Zhang D, Huo D, Xie H, Wu L, Zhang J, Liu L, Jin Q, Chen X. CHG: A Systematically Integrated Database of Cancer Hallmark Genes. Front Genet 2020; 11:29. [PMID: 32117445 PMCID: PMC7013921 DOI: 10.3389/fgene.2020.00029] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 07/01/2019] [Accepted: 01/09/2020] [Indexed: 12/20/2022]
Abstract
Background: The analysis of cancer diversity based on a logical framework of hallmarks has greatly improved our understanding of the occurrence, development and metastasis of various cancers. Methods: We designed the Cancer Hallmark Genes (CHG) database, which focuses on integrating hallmark genes in a systematic, standard way and annotates the potential roles of the hallmark genes in cancer processes. Following the conceptual criteria description of hallmark function, the keywords for each hallmark were manually selected from the literature. Candidate hallmark genes were derived from 301 pathways of the KEGG database by Lucene and manually corrected. Results: Based on the variation data, we identified the hallmark genes of various types of cancer and constructed CHG. We also analyzed the relationships among hallmarks and the potential characteristics and relationships of hallmark genes based on the topological structures of their networks. We manually confirmed the hallmark genes identified by CHG based on the literature and databases. We also predicted the prognosis of breast cancer, glioblastoma multiforme and kidney papillary cell carcinoma patients based on CHG data. Conclusions: In summary, CHG, which was constructed based on a hallmark feature set, provides a new perspective for analyzing the diversity and development of cancers.
Affiliation(s)
- Denan Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Diwei Huo
- The 2nd Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Hongbo Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Lingxiang Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Juan Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Lei Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Qing Jin
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Xiujie Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
|
25
|
Huang Q, Zhang J, Wei L, Guo F, Zou Q. 6mA-RicePred: A Method for Identifying DNA N6-Methyladenine Sites in the Rice Genome Based on Feature Fusion. FRONTIERS IN PLANT SCIENCE 2020; 11:4. [PMID: 32076430 PMCID: PMC7006724 DOI: 10.3389/fpls.2020.00004] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/04/2019] [Accepted: 01/06/2020] [Indexed: 06/01/2023]
Abstract
MOTIVATION: The biological function of N6-methyladenine DNA (6mA) in plants is largely unknown. Rice is one of the most important crops worldwide and is a model species for molecular and genetic studies. There are few methods for 6mA site recognition in the rice genome, and an effective computational method is needed. RESULTS: In this paper, we propose a new computational method called 6mA-Pred to identify 6mA sites in the rice genome. 6mA-Pred employs a feature fusion method to combine advantageous features from other methods and thus obtain a new feature to identify 6mA sites. This method achieved an accuracy of 87.27% in the identification of 6mA sites with 10-fold cross-validation and achieved an accuracy of 85.6% in independent test sets.
Affiliation(s)
- Qianfei Huang
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Leyi Wei
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
26
|
Wang C, Zhao N, Yuan L, Liu X. Computational Detection of Breast Cancer Invasiveness with DNA Methylation Biomarkers. Cells 2020; 9:E326. [PMID: 32019269 PMCID: PMC7072524 DOI: 10.3390/cells9020326] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Revised: 01/28/2020] [Accepted: 01/28/2020] [Indexed: 12/14/2022] Open
Abstract
Breast cancer is the most common female malignancy. It has high mortality, primarily due to metastasis and recurrence. Patients with invasive and noninvasive breast cancer require different treatments, so there is an urgent need for predictive tools to guide clinical decision making and avoid overtreatment of noninvasive breast cancer and undertreatment of invasive cases. Here, we divided the sample set based on the genome-wide methylation distance to make full use of metastatic cancer data. Specifically, we implemented two differential methylation analysis methods to identify specific CpG sites. After effective dimensionality reduction, we constructed a methylation-based classifier using the Random Forest algorithm to categorize the primary breast cancer. We took advantage of breast cancer (BRCA) HM450 DNA methylation data and accompanying clinical data from The Cancer Genome Atlas (TCGA) database to validate the performance of the classifier. Overall, this study demonstrates DNA methylation as a potential biomarker to predict breast tumor invasiveness and as a possible parameter that could be included in the studies aiming to predict breast cancer aggressiveness. However, more comparative studies are needed to assess its usability in the clinic. Towards this, we developed a website based on these algorithms to facilitate its use in studies and predictions of breast cancer invasiveness.
Affiliation(s)
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150080, China
| | - Ning Zhao
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
| | - Linlin Yuan
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150080, China
|
27
|
Zhou W, Yang F, Xu Z, Luo M, Wang P, Guo Y, Nie H, Yao L, Jiang Q. Comprehensive Analysis of Copy Number Variations in Kidney Cancer by Single-Cell Exome Sequencing. Front Genet 2020; 10:1379. [PMID: 32038722 PMCID: PMC6989475 DOI: 10.3389/fgene.2019.01379] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2019] [Accepted: 12/17/2019] [Indexed: 12/16/2022] Open
Abstract
Clear-cell renal cell carcinoma (ccRCC) is the most common and lethal subtype of kidney cancer. VHL and PBRM1 are the top two significantly mutated genes in ccRCC specimens, while the genetic mechanism of the VHL/PBRM1-negative ccRCC remains to be elucidated. Here we carried out a comprehensive analysis of single-cell genomic copy number variations (CNVs) in VHL/PBRM1-negative ccRCC. Genomic CNVs were identified at the single-cell level, and the tumor cells showed widespread amplification and deletion across the whole genome. Functional enrichment analysis indicated that the amplified genes are significantly enriched in cancer-related signaling transduction pathways. Besides, receptor protein tyrosine kinase (RTK) genes also showed widespread copy number variations in cancer cells. Our studies indicated that the genomic CNVs in RTK genes and downstream signaling transduction pathways may be involved in VHL/PBRM1-negative ccRCC pathogenesis and progression, and highlighted the role of the comprehensive investigation of genomic CNVs at the single-cell level in both clarifying pathogenic mechanism and identifying potential therapeutic targets in cancers.
Affiliation(s)
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Fan Yang
- Department of Neurology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Zhaochun Xu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Meng Luo
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yu Guo
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Huan Nie
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Lifen Yao
- Department of Neurology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
|
28
|
Wang T, Peng Q, Liu B, Liu X, Liu Y, Peng J, Wang Y. eQTLMAPT: Fast and Accurate eQTL Mediation Analysis With Efficient Permutation Testing Approaches. Front Genet 2020; 10:1309. [PMID: 31998368 PMCID: PMC6970436 DOI: 10.3389/fgene.2019.01309] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2019] [Accepted: 11/27/2019] [Indexed: 12/21/2022] Open
Abstract
Expression quantitative trait locus (eQTL) analyses are critical in understanding the complex functional regulatory nature of genetic variation and have been widely used in the interpretation of disease-associated variants identified by genome-wide association studies (GWAS). Emerging evidence has shown that trans-eQTL effects on remote gene expression can be mediated by local transcripts, which is known as the mediation effect. To discover genome-wide eQTL mediation effects by combining genomic and transcriptomic profiles, it is necessary to develop novel computational methods that rapidly scan a large number of candidate associations while controlling for multiple testing appropriately. Here, we present eQTLMAPT, an R package that performs eQTL mediation analysis and implements efficient permutation procedures for multiple testing correction. eQTLMAPT offers three advantages. First, it accelerates mediation analysis by effectively pruning the permutation process through an adaptive permutation scheme. Second, it can efficiently and accurately estimate the significance level of mediation effects by modeling the null distribution with a generalized Pareto distribution (GPD) trained from a few permutation statistics. Third, eQTLMAPT provides flexible interfaces for users to combine various permutation schemes with different confounding adjustment methods. Experiments on a real eQTL dataset demonstrate that eQTLMAPT provides higher resolution of the estimated significance of mediation effects and is an order of magnitude faster than the compared methods with similar accuracy.
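The adaptive permutation idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not eQTLMAPT's R implementation: the function name `adaptive_permutation_pvalue`, the stopping threshold `min_exceed`, the p-value estimator, and the toy mean-difference statistic (used here in place of a mediation statistic) are all assumptions chosen for illustration. The sketch stops permuting as soon as enough null statistics reach the observed value, so clearly non-significant tests cost only a few permutations.

```python
import random

def adaptive_permutation_pvalue(observed, draw_null_stat, max_perms=10000, min_exceed=10):
    """Estimate a permutation p-value, stopping early once `min_exceed`
    null statistics are at least as large as the observed statistic."""
    exceed = 0
    for n_done in range(1, max_perms + 1):
        if draw_null_stat() >= observed:
            exceed += 1
            if exceed >= min_exceed:
                # Early stop: the test is clearly not significant.
                return exceed / n_done, n_done
    # Budget exhausted: add-one estimate avoids reporting a p-value of zero.
    return (exceed + 1) / (max_perms + 1), max_perms

def mean_diff(values, labels):
    """Toy statistic (an assumption): absolute difference of the two group means."""
    a = [v for v, g in zip(values, labels) if g == 1]
    b = [v for v, g in zip(values, labels) if g == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))

rng = random.Random(0)                      # fixed seed for repeatability
values = [rng.gauss(0, 1) for _ in range(40)]
labels = [1] * 20 + [0] * 20
observed = mean_diff(values, labels)

def draw_null_stat():
    """One permutation: shuffle group labels and recompute the statistic."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    return mean_diff(values, shuffled)

p_value, n_used = adaptive_permutation_pvalue(observed, draw_null_stat, max_perms=2000)
print(p_value, n_used)
```

The second return value reports how many permutations were actually spent, which is where the saving over a fixed permutation budget shows up.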
Affiliation(s)
- Tao Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Qidi Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Xiaoli Liu
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
| | - Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
29
|
Zhao L, Wang J, Pang L, Liu Y, Zhang J. GANsDTA: Predicting Drug-Target Binding Affinity Using GANs. Front Genet 2020; 10:1243. [PMID: 31993067 PMCID: PMC6962343 DOI: 10.3389/fgene.2019.01243] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2019] [Accepted: 11/11/2019] [Indexed: 01/09/2023] Open
Abstract
The computational prediction of interactions between drugs and targets is a standing challenge in drug discovery. State-of-the-art methods for drug-target interaction prediction are primarily based on supervised machine learning with known label information. However, in biomedicine, obtaining labeled training data is an expensive and laborious process. This paper proposes a semi-supervised generative adversarial networks (GANs)-based method to predict binding affinity. Our method comprises two parts: two GANs for feature extraction and a regression network for prediction. The semi-supervised mechanism allows our model to learn protein and drug features from both labeled and unlabeled data. We evaluate the performance of our method using multiple public datasets. Experimental results demonstrate that our method achieves competitive performance while utilizing freely available unlabeled data. Our results suggest that utilizing such unlabeled data can considerably help improve performance in various biomedical relation extraction processes, for example, drug-target interaction and protein-protein interaction, particularly when only limited labeled data are available in such tasks. To the best of our knowledge, this is the first semi-supervised GANs-based method to predict binding affinity.
Affiliation(s)
- Lingling Zhao
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Junjie Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Long Pang
- Institute of Space Environment and Material Science, Harbin Institute of Technology, Harbin, China
| | - Yang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jun Zhang
- Department of Rehabilitation, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
|
30
|
Ru X, Cao P, Li L, Zou Q. Selecting Essential MicroRNAs Using a Novel Voting Method. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 18:16-23. [PMID: 31479921 PMCID: PMC6727015 DOI: 10.1016/j.omtn.2019.07.019] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 06/20/2019] [Accepted: 07/08/2019] [Indexed: 02/06/2023]
Abstract
Among the large number of known microRNAs (miRNAs), some miRNAs play negligible roles in cell regulation. Therefore, selecting essential miRNAs is an important initial step for a deeper understanding of miRNAs and their functions. In this study, we generated 60 classification models by combining 12 representative feature extraction methods and 5 commonly used classification algorithms. The optimal model for essential miRNA classification that we obtained is based on the Mismatch feature extraction method combined with the random forest algorithm. The F-Measure, area under the curve, and accuracy values of this model were 93.2%, 96.7%, and 93.0%, respectively. We also found that the distribution of the positive and negative examples of the first few features greatly influenced the classification results. The feature extraction methods performed best when the differences between the positive and negative examples were obvious, and this led to better classification of essential miRNAs. Because each classifier's predictions for the same sample may be different, we employed a novel voting method to improve the accuracy of the classification of essential miRNAs. The performance results showed that the best classification results were obtained when five classification models were used in the voting. The five classification models were constructed based on the Mismatch, pseudo-distance structure status pair composition, Subsequence, Kmer, and Triplet feature extraction methods. The voting result was 95.3%. Our results suggest that the voting method can be an important tool for selecting essential miRNAs.
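The voting step described in this abstract can be sketched in a few lines of Python. The abstract does not state the exact voting rule, so this sketch assumes plain majority voting over binary labels; the function name `majority_vote` and the toy predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the classifiers' predictions for one sample."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    # most_common(1) returns a one-element list of (label, count).
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical outputs of five models (1 = essential miRNA, 0 = non-essential),
# one row per sample. With an odd number of binary voters no tie can occur.
per_sample_predictions = [
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1],
]
ensemble_labels = [majority_vote(row) for row in per_sample_predictions]
print(ensemble_labels)  # [1, 0, 1]
```

Using an odd number of models, as with the five models reported in the abstract, keeps a binary majority vote free of ties.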
Affiliation(s)
- Xiaoqing Ru
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Peigang Cao
- Department of Cardiology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
31
|
Zhong W, Zhong B, Zhang H, Chen Z, Chen Y. Identification of Anti-cancer Peptides Based on Multi-classifier System. Comb Chem High Throughput Screen 2019; 22:694-704. [PMID: 31793417 DOI: 10.2174/1386207322666191203141102] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2019] [Revised: 07/18/2019] [Accepted: 07/30/2019] [Indexed: 01/01/2023]
Abstract
AIMS AND OBJECTIVE: Cancer is one of the deadliest diseases, taking the lives of millions every year. Traditional methods of treating cancer are expensive and toxic to normal cells. Fortunately, anti-cancer peptides (ACPs) can eliminate this side effect. However, identifying and developing new anti-cancer peptides experimentally takes a lot of time and money, so a fast and accurate computational model for identifying anti-cancer peptides is needed. Machine learning algorithms are a good choice. MATERIALS AND METHODS: In our study, a multi-classifier system combining multiple machine learning models was used to predict anti-cancer peptides. The individual learners are built from different feature information and algorithms, and form a multi-classifier system by voting. RESULTS AND CONCLUSION: The experiments show that the overall prediction rate of each individual learner is above 80% and that the overall accuracy of the multi-classifier system for anti-cancer peptide prediction can reach 95.93%, which is better than existing prediction models.
Affiliation(s)
- Wanben Zhong
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
| | - Bineng Zhong
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, 210094, China
| | - Hongbo Zhang
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
| | - Ziyi Chen
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
| | - Yan Chen
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
|
32
|
Taxonomy dimension reduction for colorectal cancer prediction. Comput Biol Chem 2019; 83:107160. [DOI: 10.1016/j.compbiolchem.2019.107160] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Revised: 11/02/2019] [Accepted: 11/04/2019] [Indexed: 02/01/2023]
|
33
|
Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019; 20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND: From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and in studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular topic. METHODS: We systematically review recent work on identifying drug targets from the perspectives of data and methods. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories, biological experiments and machine learning, each of which is subdivided into subclasses and described in detail. RESULTS: Machine learning algorithms account for the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. CONCLUSION: The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Ningyi Zhang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
34
|
Zhan Q, Wang N, Jin S, Tan R, Jiang Q, Wang Y. ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function. BMC Bioinformatics 2019; 20:573. [PMID: 31760933 PMCID: PMC6876095 DOI: 10.1186/s12859-019-3132-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In multiple sequence alignment (MSA), it is essential to use the substitution scores of pairwise alignments. To compute adaptive scores for alignment, researchers usually use a hidden Markov model (HMM) or probabilistic consistency methods such as the partition function. Recent studies show that optimizing the parameters of the HMM, as well as integrating the HMM with the partition function, can raise the accuracy of alignment. However, these studies overlooked the combination of the partition function with an optimized HMM, which could further improve alignment accuracy. RESULTS: A novel algorithm for MSA called ProbPFP is presented in this paper. It integrates an HMM optimized by particle swarm optimization (PSO) with the partition function. PSO was applied to optimize the HMM's parameters. The posterior probability obtained by the HMM was then combined with the one obtained by the partition function to calculate an integrated substitution score for alignment. To evaluate the effectiveness of ProbPFP, we compared it with 13 outstanding or classic MSA methods. The results demonstrate that the alignments obtained by ProbPFP achieved the highest mean TC scores and mean SP scores on two benchmark datasets, SABmark and OXBench, and the second highest mean TC scores and mean SP scores on the benchmark dataset BAliBASE. ProbPFP was also compared with 4 other outstanding methods by reconstructing the phylogenetic trees for six protein families extracted from the TreeFam database, based on the alignments obtained by these 5 methods. The result indicates that the phylogenetic trees reconstructed from the alignments obtained by ProbPFP are closer to the reference trees than those of the other methods. CONCLUSIONS: We propose a new multiple sequence alignment method combining an optimized HMM and the partition function. The results show that this method can substantially improve alignment accuracy.
Affiliation(s)
- Qing Zhan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Nan Wang
- Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
| | - Shuilin Jin
- Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
| | - Renjie Tan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
|
35
|
Zhao T, Wang D, Hu Y, Zhang N, Zang T, Wang Y. Identifying Alzheimer’s Disease-related miRNA Based on Semi-clustering. Curr Gene Ther 2019; 19:216-223. [DOI: 10.2174/1566523219666190924113737] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Revised: 06/05/2019] [Accepted: 06/12/2019] [Indexed: 01/14/2023]
Abstract
Background:
A growing number of researchers are trying to use miRNAs as specific biomarkers for Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Multiple studies have indicated that miRNAs are associated with poor axonal growth and loss of synaptic structures, both of which are early events in AD. The overall loss of miRNA may be associated with aging, increasing the incidence of AD, and may also be involved in the disease through some specific molecular mechanisms.
Objective:
Identifying Alzheimer’s disease-related miRNAs can help us find new drug targets and support early diagnosis.
Materials and Methods:
We used genes as a bridge to connect AD and miRNAs. First, a protein-protein interaction network is used to find more AD-related genes from known AD-related genes. Then, each miRNA’s correlation with these genes is obtained from miRNA-gene interactions. Finally, each miRNA receives a feature vector representing its correlation with AD. Unlike other studies, we do not randomly generate negative samples and then apply a classification method to identify AD-related miRNAs. Instead, we use a semi-clustering method, ‘one-class SVM’. AD-related miRNAs are considered as outliers, and our aim is to identify the miRNAs that are similar to known AD-related miRNAs (outliers).
Results and Conclusion:
We identified 257 novel AD-related miRNAs and compared our method with an SVM trained on generated negative samples. The AUC of our method is much higher than that of the SVM, and we carried out case studies to show that our results are reliable.
Affiliation(s)
- Tianyi Zhao
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Donghua Wang
- Department of General Surgery, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Ningyi Zhang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Tianyi Zang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
36
|
Han Z, Hua J, Xue W, Zhu F. Integrating the Ribonucleic Acid Sequencing Data From Various Studies for Exploring the Multiple Sclerosis-Related Long Noncoding Ribonucleic Acids and Their Functions. Front Genet 2019; 10:1136. [PMID: 31781177 PMCID: PMC6861379 DOI: 10.3389/fgene.2019.01136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2019] [Accepted: 10/18/2019] [Indexed: 12/19/2022] Open
Abstract
Multiple sclerosis (MS) is a chronic fatal central nervous system (CNS) disease involving complex immune dysfunction. Recently, long noncoding RNAs (lncRNAs) were discovered to be important regulatory factors in the pathogenesis of MS. However, these findings often cannot be repeated and confirmed by subsequent studies. We considered that small sample sizes or heterogeneity among tissues may cause the results to diverge. Currently, RNA-seq has become a powerful approach to quantify the abundances of lncRNA transcripts. Therefore, we comprehensively collected MS-related RNA-seq data from a variety of previous studies, and integrated these data using an expression-based meta-analysis to identify the differentially expressed lncRNAs between MS patients and controls in whole samples and subgroups. Then, we performed Jensen-Shannon (JS) divergence and cluster analysis to explore the heterogeneity and expression specificity among various tissues. Finally, we investigated the potential function of the identified lncRNAs in MS using weighted gene co-expression network analysis (WGCNA) and gene set enrichment analysis (GSEA), and 5,420 MS-related lncRNAs specifically expressed in the brain tissue were identified. The subgroup analysis found a small heterogeneity of the lncRNA expression profiles between brain and blood tissues. The results of WGCNA and GSEA showed that a potentially important function of lncRNAs in MS may involve the regulation of ribonucleoproteins and tumor necrosis factor cytokine receptors. In summary, this study provides a strategy to explore disease-related lncRNAs on a genome-wide scale, and our findings will help improve the understanding of MS pathogenesis.
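For readers unfamiliar with the Jensen-Shannon divergence mentioned in this abstract, a minimal Python sketch follows. It uses the standard base-2 definition, JSD(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M = ½(P + Q), which ranges from 0 (identical distributions) to 1 (non-overlapping distributions). How the authors turned expression values into distributions is not stated in the abstract, so the `normalize` helper and the toy tissue profiles are assumptions for illustration only.

```python
import math

def normalize(values):
    """Scale non-negative values so they sum to 1 (assumed preprocessing step)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical expression of one lncRNA set across three bins in two tissues.
brain = normalize([8.0, 1.0, 1.0])
blood = normalize([1.0, 1.0, 8.0])
print(js_divergence(brain, blood))
```

Because the mixture M is positive wherever P or Q is positive, the divergence stays finite even when one profile has zero entries, which is one reason it is often preferred over the plain KL divergence for comparing expression profiles.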
Affiliation(s)
- Zhijie Han
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China; School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
| | - Jiao Hua
- School of Mathematics, Harbin Institute of Technology, Harbin, China
| | - Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China; School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
|
37
|
Abstract
Protein methylation is an important and reversible post-translational modification that regulates many biological processes in cells. It occurs mainly on lysine and arginine residues and is involved in many important biological processes, including transcriptional activity, signal transduction, and the regulation of gene expression. Protein methylation and its regulatory enzymes are related to a variety of human diseases, so improved identification of methylation sites is useful for designing drugs for a variety of related diseases. In this review, we systematically summarize and analyze the tools used for the prediction of protein methylation sites on arginine and lysine residues over the last decade.
Affiliation(s)
- Chunyan Ao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Shunshan Jin
- Department of Neurology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Yuan Lin
- Department of System Integration, Sparebanken Vest, Bergen, Norway
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
38
|
Zhao T, Hu Y, Zang T, Wang Y. Integrate GWAS, eQTL, and mQTL Data to Identify Alzheimer's Disease-Related Genes. Front Genet 2019; 10:1021. [PMID: 31708967 PMCID: PMC6824203 DOI: 10.3389/fgene.2019.01021] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2019] [Accepted: 09/24/2019] [Indexed: 12/19/2022] Open
Abstract
It is estimated that the impact of related genes on the risk of Alzheimer's disease (AD) is nearly 70%. Identifying candidate causal genes can help treatment and diagnosis. The maturity of sequencing technology and the reduction in cost have made the genome-wide association study (GWAS) an important means of finding disease-related mutation sites. Because of linkage disequilibrium (LD), neither the gene regulated by a SNP nor the specific SNP can be determined. Because GWAS is affected by sample size and interaction, we introduced empirical Bayes (EB) to perform a meta-analysis of GWAS and greatly reduce the bias caused by sampling and by SNP interaction. In addition, most SNPs are in noncoding regions, so it is not clear how they relate to phenotype. In this paper, expression quantitative trait locus (eQTL) studies and methylation quantitative trait locus (mQTL) studies are combined with GWAS to find genes associated with Alzheimer's disease at the expression level through pleiotropy. Summary data-based Mendelian randomization (SMR) is introduced to integrate GWAS and eQTL/mQTL data. We prioritized 274 significant SNPs belonging to 20 genes by eQTL analysis and 379 significant SNPs belonging to seven known genes by mQTL analysis. Among them, 93 SNPs and 2 genes overlap. Finally, we carried out 10 case studies to demonstrate the effectiveness of our method.
Affiliation(s)
- Tianyi Zhao
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

39
Wang Y, Xie Y, Li L, He Y, Zheng D, Yu P, Yu L, Tang L, Wang Y, Wang Z. EZH2 RIP-seq Identifies Tissue-specific Long Non-coding RNAs. Curr Gene Ther 2019; 18:275-285. [PMID: 30295189 PMCID: PMC6249712 DOI: 10.2174/1566523218666181008125010] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 03/25/2018] [Revised: 05/24/2018] [Accepted: 09/17/2018] [Indexed: 02/07/2023]
Abstract
Background: Polycomb Repressive Complex 2 (PRC2) catalyzes histone methylation at H3 Lys27 and plays crucial roles during development and disease in numerous systems. Its catalytic subunit EZH2 is a key nuclear target for long non-coding RNAs (lncRNAs), which are emerging as a novel class of epigenetic regulators and participate in diverse cellular processes. LncRNAs are characterized by high tissue specificity; however, little is known about the tissue profile of the EZH2-interacting lncRNAs. Objective: Here we performed a global screening for EZH2-binding lncRNAs in tissues including brain, lung, heart, liver, kidney, intestine, spleen, testis, muscle and blood by combining RNA immunoprecipitation and RNA sequencing. We identified 1328 EZH2-binding lncRNAs, among which 470 were shared by at least two tissues while 858 were detected in only a single tissue. An RNA motif with a specific secondary structure was identified in a number of lncRNAs, albeit not in all EZH2-binding lncRNAs. The EZH2-binding lncRNAs fell into four categories, namely intergenic lncRNA, antisense lncRNA, intron-related lncRNA and promoter-related lncRNA, suggesting diverse regulation through both cis and trans mechanisms. A promoter-related lncRNA, Hnf1aos1, bound to EZH2 specifically in the liver, a feature shared with its paired coding gene Hnf1a, further confirming the validity of our study. In addition to well-known EZH2-binding lncRNAs such as Kcnq1ot1, Gas5, Meg3, Hotair and Malat1, the majority of the lncRNAs were reported for the first time to be associated with EZH2. Conclusion: Our findings provide a profiling view of the EZH2-interacting lncRNAs across different tissues and suggest critical roles of lncRNAs during cell differentiation and maturation.
Affiliation(s)
- Yan Wang
- Department of Cardiovascular Medicine, Beijing Hospital, National Center of Gerontology, Beijing 100730, China
- Yinping Xie
- Department of Cardiology, Central Laboratory, Renmin Hospital, Wuhan University, Wuhan 430060, China
- Lili Li
- Department of Cardiology, Central Laboratory, Renmin Hospital, Wuhan University, Wuhan 430060, China
- Yuan He
- Department of Cardiology, Central Laboratory, Renmin Hospital, Wuhan University, Wuhan 430060, China
- Di Zheng
- Department of Orthopedics, Renmin Hospital, Wuhan University, Wuhan 430060, China
- Pengcheng Yu
- Department of Cardiology, Central Laboratory, Renmin Hospital, Wuhan University, Wuhan 430060, China
- Ling Yu
- Department of Orthopedics, Renmin Hospital, Wuhan University, Wuhan 430060, China
- Lixu Tang
- Wushu College, Wuhan Sports University, Wuhan, Hubei 430079, China
- Yibin Wang
- Departments of Anesthesiology, Division of Molecular Medicine, Physiology and Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, United States
- Zhihua Wang
- Department of Cardiology, Central Laboratory, Renmin Hospital, Wuhan University, Wuhan 430060, China

40
Cheng L, Zhao H, Wang P, Zhou W, Luo M, Li T, Han J, Liu S, Jiang Q. Computational Methods for Identifying Similar Diseases. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 18:590-604. [PMID: 31678735 PMCID: PMC6838934 DOI: 10.1016/j.omtn.2019.09.019] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 04/25/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 02/01/2023]
Abstract
Although our knowledge of human diseases has increased dramatically, the molecular basis, phenotypic traits, and therapeutic targets of most diseases still remain unclear. An increasing number of studies have observed that similar diseases often are caused by similar molecules, can be diagnosed by similar markers or phenotypes, or can be cured by similar drugs. Thus, the identification of diseases similar to known ones has attracted considerable attention worldwide. To this end, the associations between diseases at the molecular, phenotypic, and taxonomic levels were used to measure the pairwise similarity in diseases. The corresponding performance assessment strategies for these methods involving the terms “category-based,” “simulated-patient-based,” and “benchmark-data-based” were thus further emphasized. Then, frequently used methods were evaluated using a benchmark-data-based strategy. To facilitate the assessment of disease similarity scores, researchers have designed dozens of tools that implement these methods for calculating disease similarity. Currently, disease similarity has been advantageous in predicting noncoding RNA (ncRNA) function and therapeutic drugs for diseases. In this article, we review disease similarity methods, evaluation strategies, tools, and their applications in the biomedical community. We further evaluate the performance of these methods and discuss the current limitations and future trends for calculating disease similarity.
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hengqiang Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Meng Luo
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Tianxin Li
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shulin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, Heilongjiang, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China

41
Huang R, Zeng Z, Li G, Song D, Yan P, Yin H, Hu P, Zhu X, Chang R, Zhang X, Zhang J, Meng T, Huang Z. The Construction and Comprehensive Analysis of ceRNA Networks and Tumor-Infiltrating Immune Cells in Bone Metastatic Melanoma. Front Genet 2019; 10:828. [PMID: 31608101 PMCID: PMC6774271 DOI: 10.3389/fgene.2019.00828] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/26/2019] [Accepted: 08/12/2019] [Indexed: 12/19/2022]
Abstract
Background/Aims: As a malignant melanocytic tumor, cutaneous melanoma is a devastating skin tumor with high rates of recurrence and metastasis. Bone is a common metastatic site, and bone metastasis may result in pathologic fracture, neurologic damage, and severe bone pain. Although metastatic melanoma has been reported to benefit from immunotherapy, the molecular mechanisms and immune microenvironment underlying melanoma bone metastasis, as well as its prognostic factors, are still unknown. Methods: Gene expression profiling of 112 samples, including 104 primary melanomas and 8 bone metastatic melanomas from The Cancer Genome Atlas database, was used to construct a ceRNA network associated with bone metastases. In addition, we estimated the fraction of 22 immune cell types in melanoma using the algorithm "cell type identification by estimating relative subsets of RNA transcripts (CIBERSORT)." Based on the significant ceRNAs or immune cells, we constructed nomograms to predict the prognosis of patients with melanoma. Finally, correlation analysis was performed to explore the relationship between the significant ceRNAs and immune cells and to reveal potential signaling pathways. Results: We constructed a ceRNA network based on the interaction among 8 pairs of long noncoding RNA–microRNA and 15 pairs of microRNA–mRNA. CIBERSORT and ceRNA integration analysis found that AL118506.1 has both significant prognostic value (P = 0.002) and a high correlation with T follicular helper cells (P = 0.033). Meanwhile, CD8 T cells and M2 macrophages were negatively correlated (P < 0.001). Moreover, we constructed two satisfactory nomograms (area under curve of 3-year survival: 0.899; 5-year survival: 0.885; and concordance index: 0.780) with significant ceRNAs or immune cells to predict the prognosis of patients.
Conclusions: In this study, we suggest that bone metastasis in melanoma might be related to AL118506.1 and its role in regulating thrombospondin 2 and T follicular helper cells. Two nomograms were constructed to predict the prognosis of patients with melanoma and demonstrated their value in improving the personalized management.
Affiliation(s)
- Runzhi Huang
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Division of Spine, Department of Orthopedics, Tongji Hospital affiliated to Tongji University School of Medicine, Shanghai, China; Tongji University School of Medicine, Tongji University, Shanghai, China
- Zhiwei Zeng
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Guangyu Li
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Dianwen Song
- Department of Orthopedics, Shanghai General Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Penghui Yan
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Huabin Yin
- Department of Orthopedics, Shanghai General Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Peng Hu
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Xiaolong Zhu
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Ruizhi Chang
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Xu Zhang
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Jie Zhang
- Shanghai East Hospital, Key Laboratory of Arrhythmias, Ministry of Education, Tongji University School of Medicine, Shanghai, China
- Tong Meng
- Division of Spine, Department of Orthopedics, Tongji Hospital affiliated to Tongji University School of Medicine, Shanghai, China; Tongji University School of Medicine, Tongji University, Shanghai, China; Department of Orthopedics, Shanghai General Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Zongqiang Huang
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China

42
Wang Y, Nie C, Zang T, Wang Y. Predicting circRNA-Disease Associations Based on circRNA Expression Similarity and Functional Similarity. Front Genet 2019; 10:832. [PMID: 31572444 PMCID: PMC6751509 DOI: 10.3389/fgene.2019.00832] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/24/2019] [Accepted: 08/13/2019] [Indexed: 12/19/2022]
Abstract
Circular RNAs (circRNAs) are a novel class of endogenous noncoding RNAs that have well-conserved sequences. Emerging evidence has shown that circRNAs can be novel biomarkers or therapeutic targets for many diseases and play an important role in the development of various pathological conditions. Therefore, identifying potential disease-related circRNAs is helpful in improving the efficiency of finding therapeutic targets for diseases. Here, we propose a computational model (PreCDA) to predict potential circRNA-disease associations. First, we calculated the circRNA expression similarity based on circRNA expression profiles. The circRNA functional similarity is calculated based on cosine similarity, and the disease similarity is used as the dimension of each circRNA vector. The associations between circRNAs and diseases are defined based on the circRNA functional similarity and expression similarity. We constructed a disease-related circRNA association network and used a graph-based recommendation algorithm (PersonalRank) to sort candidate disease-related circRNAs. As a result, PreCDA has an average area under the receiver operating characteristic curve value of 78.15% in predicting candidate disease-related circRNAs. In addition, we discuss the factors that affect the performance of this method and find some unknown circRNAs related to diseases, with several common diseases used as case studies. These results show that PreCDA has good performance in predicting potential circRNA-disease associations and is helpful for the diagnosis and treatment of human diseases.
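The two building blocks named in this abstract, cosine similarity between circRNA vectors and a PersonalRank-style random walk with restart over an association network, can be illustrated with a minimal sketch. This is not the PreCDA implementation: the function names, the toy graph, the restart parameter `alpha` and the iteration count are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def personal_rank(graph, root, alpha=0.8, iterations=50):
    """Random walk with restart on a graph given as {node: [neighbours]}.

    At each step the walker follows a random edge with probability alpha
    and jumps back to `root` with probability 1 - alpha.
    """
    rank = {node: 0.0 for node in graph}
    rank[root] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            if not neighbours:
                continue
            share = alpha * rank[node] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        nxt[root] += 1.0 - alpha  # restart mass returns to the query node
        rank = nxt
    return rank


# Hypothetical toy network: two diseases and three circRNAs.
toy_graph = {
    "D1": ["circA", "circB"],
    "D2": ["circB", "circC"],
    "circA": ["D1"],
    "circB": ["D1", "D2"],
    "circC": ["D2"],
}

scores = personal_rank(toy_graph, "D1")
candidates = sorted(
    (n for n in scores if n.startswith("circ")),
    key=lambda n: scores[n],
    reverse=True,
)
print(candidates)
```

In this toy network the circRNAs are ranked for disease `D1`; a circRNA that is only reachable through another disease receives a lower score than the directly linked ones, which is the behaviour a recommendation-style ranking relies on.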
Affiliation(s)
- Tianyi Zang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

43
Lv H, Dao FY, Guan ZX, Zhang D, Tan JX, Zhang Y, Chen W, Lin H. iDNA6mA-Rice: A Computational Tool for Detecting N6-Methyladenine Sites in Rice. Front Genet 2019; 10:793. [PMID: 31552096 PMCID: PMC6746913 DOI: 10.3389/fgene.2019.00793] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 06/13/2019] [Accepted: 07/26/2019] [Indexed: 01/08/2023]
Abstract
DNA N6-methyladenine (6mA) is a dominant DNA modification form and involved in many biological functions. The accurate genome-wide identification of 6mA sites may increase understanding of its biological functions. Experimental methods for 6mA detection in eukaryotes genome are laborious and expensive. Therefore, it is necessary to develop computational methods to identify 6mA sites on a genomic scale, especially for plant genomes. Based on this consideration, the study aims to develop a machine learning-based method of predicting 6mA sites in the rice genome. We initially used mono-nucleotide binary encoding to formulate positive and negative samples. Subsequently, the machine learning algorithm named Random Forest was utilized to perform the classification for identifying 6mA sites. Our proposed method could produce an area under the receiver operating characteristic curve of 0.964 with an overall accuracy of 0.917, as indicated by the fivefold cross-validation test. Furthermore, an independent dataset was established to assess the generalization ability of our method. Finally, an area under the receiver operating characteristic curve of 0.981 was obtained, suggesting that the proposed method had good performance of predicting 6mA sites in the rice genome. For the convenience of retrieving 6mA sites, on the basis of the computational method, we built a freely accessible web server named iDNA6mA-Rice at http://lin-group.cn/server/iDNA6mA-Rice.
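As an illustration of the mono-nucleotide binary encoding mentioned in this abstract, the sketch below maps each base to a four-bit indicator vector and concatenates the results. The bit order (A, C, G, T), the all-zero code for unknown bases, and the function name are assumptions; the abstract does not specify them.

```python
# Assumed one-hot code per nucleotide; the ordering is illustrative only.
BINARY_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode_sequence(seq):
    """Concatenate the 4-bit code of every base; unknown bases become (0, 0, 0, 0)."""
    vector = []
    for base in seq.upper():
        vector.extend(BINARY_CODE.get(base, (0, 0, 0, 0)))
    return vector


features = encode_sequence("ACGT")
print(len(features), features)
```

A sequence of length n yields a feature vector of length 4n. Vectors of this kind could then be passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`, although the abstract does not say which implementation the authors used.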
Affiliation(s)
- Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Jiu-Xin Tan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yong Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China

44
Lv H, Zhang ZM, Li SH, Tan JX, Chen W, Lin H. Evaluation of different computational methods on 5-methylcytosine sites identification. Brief Bioinform 2019; 21:982-995. [DOI: 10.1093/bib/bbz048] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 02/20/2019] [Revised: 03/25/2019] [Accepted: 04/01/2019] [Indexed: 11/13/2022]
Abstract
5-Methylcytosine (m5C) plays an extremely important role in basic biochemical processes. Despite the great increase in identified m5C sites in a wide variety of organisms, their epigenetic roles remain largely unknown. Hence, accurate identification of m5C sites is a key step in understanding their biological functions. Over the past several years, more attention has been paid to the identification of m5C sites in multiple species. In this work, we first summarized current progress in the computational prediction of m5C sites and then constructed a more powerful and reliable model for identifying m5C sites. To train the model, we collected experimentally confirmed m5C data from Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Arabidopsis thaliana, and compared the performance of different feature extraction methods and classification algorithms to optimize the prediction model. Based on the optimal model, a novel predictor called iRNA-m5C was developed for the recognition of m5C sites. Finally, we critically evaluated the performance of iRNA-m5C and compared it with existing methods. The results showed that iRNA-m5C produced the best prediction performance. We hope that this paper can provide a guide to the computational identification of m5C sites, and we anticipate that the proposed iRNA-m5C will become a powerful tool for large-scale identification of m5C sites.
Affiliation(s)
- Hao Lv
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zi-Mei Zhang
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Shi-Hao Li
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Jiu-Xin Tan
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Wei Chen
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China

45
Zhuang H, Han J, Cheng L, Liu SL. A Positive Causal Influence of IL-18 Levels on the Risk of T2DM: A Mendelian Randomization Study. Front Genet 2019; 10:295. [PMID: 31024619 PMCID: PMC6459887 DOI: 10.3389/fgene.2019.00295] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/10/2019] [Accepted: 03/19/2019] [Indexed: 12/21/2022]
Abstract
A large number of clinical studies have shown that interleukin-18 (IL-18) plasma levels are positively correlated with the pathogenesis and development of type 2 diabetes mellitus (T2DM), but it remains unclear whether IL-18 causes T2DM, primarily due to the influence of reverse causality and residual confounding factors. Genome-wide association studies have led to the discovery of numerous common variants associated with IL-18 and T2DM and opened unprecedented opportunities for investigating possible associations between genetic traits and diseases. In this study, we employed a two-sample Mendelian randomization (MR) method to analyze the causal relationships between IL-18 plasma levels and T2DM using IL18-related SNPs as genetic instrumental variables (IVs). We first selected eight SNPs that were significantly associated with IL-18 but independent of T2DM. We then used these SNPs as IVs to evaluate their effects on T2DM using the inverse-variance weighted (IVW) method. Finally, we conducted sensitivity analysis and MR-Egger regression analysis to evaluate the heterogeneity and pleiotropic effects of each variant. The results based on the IVW method demonstrate that high IL-18 plasma levels significantly increase the risk of T2DM, and no heterogeneity or pleiotropic effects appeared after the sensitivity and MR-Egger analyses.
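The inverse-variance weighted (IVW) estimate used in two-sample Mendelian randomization can be sketched as follows. This is the standard fixed-effect form, in which each SNP's ratio estimate is weighted by the precision of its outcome association; it is not taken from the paper's code. The function name and the three SNP effect sizes are made-up illustrative values, not data from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    beta_exposure: per-SNP effects on the exposure (e.g. IL-18 level)
    beta_outcome:  per-SNP effects on the outcome (e.g. log odds of T2DM)
    se_outcome:    standard errors of the outcome effects
    """
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summary statistics for three instruments.
bx = [0.10, 0.20, 0.40]
by = [0.05, 0.10, 0.20]
se = [0.01, 0.02, 0.03]

beta, beta_se = ivw_estimate(bx, by, se)
odds_ratio = math.exp(beta)  # interpretable as an OR when outcome effects are log odds
print(beta, beta_se, odds_ratio)
```

Because every toy SNP here has the same ratio of outcome effect to exposure effect, the pooled estimate equals that common ratio; with real data the SNP-specific ratios differ and the weights determine how much each one contributes.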
Affiliation(s)
- He Zhuang
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
- Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shu-Lin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada

46
Ru X, Li L, Wang C. Identification of Phage Viral Proteins With Hybrid Sequence Features. Front Microbiol 2019; 10:507. [PMID: 30972038 PMCID: PMC6443926 DOI: 10.3389/fmicb.2019.00507] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/24/2018] [Accepted: 02/27/2019] [Indexed: 02/01/2023]
Abstract
The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of the bacteriophage virion proteins is the main area of interest. Therefore, it is very important to classify bacteriophage virion proteins and non-phage virion proteins accurately. Extracting comprehensive and effective sequence features from proteins plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by a feature representation algorithm based on sequence information (CCPA) with those extracted by a feature representation algorithm based on sequence and structure information. After feature extraction, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set with the strongest correlation between features and class labels and low redundancy between features. Given the randomness of the samples selected by the random forest classification algorithm and the randomness of the features used to produce each node variable, a random forest method is employed to perform 10-fold cross-validation on the bacteriophage protein classification. The accuracy of this model is as high as 93.5% in the classification of phage proteins in this study. This study also found that, among the eight physicochemical properties considered, the charge property has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool in bacteriophage protein research.
Affiliation(s)
- Xiaoqing Ru
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

47
Cheng L, Zhuang H, Ju H, Yang S, Han J, Tan R, Hu Y. Exposing the Causal Effect of Body Mass Index on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study. Front Genet 2019; 10:94. [PMID: 30891058 PMCID: PMC6413727 DOI: 10.3389/fgene.2019.00094] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 10/21/2018] [Accepted: 01/29/2019] [Indexed: 12/17/2022]
Abstract
Introduction: High body mass index (BMI) is a positively associated phenotype of type 2 diabetes mellitus (T2DM). Abundant studies have observed this from a clinical perspective. With the rapid increase in the number of genetic variants identified by genome-wide association studies (GWAS), common SNPs of BMI and T2DM have been identified as the genetic basis for understanding their association. Currently, their causality remains unclear. Materials and Methods: To clarify it, Mendelian randomization (MR), which uses genetic instrumental variables (IVs) to explore the causality between an intermediate phenotype and a disease, was utilized here to test the effect of BMI on the risk of T2DM. In this article, MR was carried out on GWAS data using 52 independent BMI SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated using the inverse-variance weighted method to assess the effect of a 5 kg/m² higher BMI on the risk of T2DM. Leave-one-out validation was conducted to identify the effect of individual SNPs. MR-Egger regression was utilized to detect potential pleiotropic bias of the variants. Results: We obtained a high OR (1.470; 95% CI 1.170 to 1.847; P = 0.001), a low intercept (0.004, P = 0.661), and a small fluctuation of ORs (from -0.039 [(1.412 - 1.470) / 1.470] to 0.075 [(1.568 - 1.470) / 1.470]) in leave-one-out validation. Conclusion: We validate the causal effect of high BMI on the risk of T2DM. The low intercept shows no pleiotropic bias of the IVs. The small changes in ORs caused by removing individual SNPs show that no single SNP drives our estimate.
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- He Zhuang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hong Ju
- Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, China
- Shuo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Renjie Tan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yang Hu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China

48
Hu Y, Zhao T, Zang T, Zhang Y, Cheng L. Identification of Alzheimer's Disease-Related Genes Based on Data Integration Method. Front Genet 2019; 9:703. [PMID: 30740125 PMCID: PMC6355707 DOI: 10.3389/fgene.2018.00703] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/15/2018] [Accepted: 12/14/2018] [Indexed: 01/18/2023]
Abstract
Alzheimer's disease (AD) is the fourth major cause of death in the elderly, following cancer, heart disease and cerebrovascular disease. Finding candidate causal genes can help in the design of gene-targeted drugs and effectively reduce the risk of the disease. Complex diseases such as AD are usually caused by multiple genes. Genome-wide association studies (GWAS) have identified potential genetic variants for most diseases. However, because of linkage disequilibrium (LD), it is difficult to identify the causative mutations that directly cause disease. In this study, we combined expression quantitative trait locus (eQTL) studies with GWAS to comprehensively define the genes that cause Alzheimer's disease. The method used was Summary Mendelian randomization (SMR), a novel method for integrating summarized data. Two GWAS studies and five eQTL studies were referenced in this paper. We found several candidate SNPs that have a strong relationship with AD. Most of these SNPs overlap across different data sets, providing relatively strong reliability. We also explain the function of the novel AD-related genes we have discovered.
Affiliation(s)
- Yang Hu
- Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Tianyi Zhao
- Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Tianyi Zang
- Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Ying Zhang
- Department of Rehabilitation, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Liang Cheng
- Department of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| |
|
49
|
Pang L, Wang J, Zhao L, Wang C, Zhan H. A Novel Protein Subcellular Localization Method With CNN-XGBoost Model for Alzheimer's Disease. Front Genet 2019; 9:751. [PMID: 30713552 PMCID: PMC6345701 DOI: 10.3389/fgene.2018.00751] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/15/2018] [Accepted: 12/31/2018] [Indexed: 12/26/2022] Open
Abstract
The disordered distribution of proteins in compartments or organelles leads to many human diseases, including neurodegenerative diseases such as Alzheimer's disease. The prediction of protein subcellular localization plays an important role in understanding the mechanism of protein function, pathogenesis and disease therapy. This paper proposes a novel subcellular localization method that integrates a Convolutional Neural Network (CNN) and eXtreme Gradient Boosting (XGBoost): the CNN acts as a feature extractor that automatically obtains features from the original sequence information, and an XGBoost classifier acts as a recognizer that identifies the protein subcellular localization from the output of the CNN. Experiments are implemented on three protein datasets. The results show that the CNN-XGBoost method performs better than the general protein subcellular localization methods.
Affiliation(s)
- Long Pang
- Harbin Nebula Bioinformatics Technology Development Co., Ltd., Harbin, China
| | - Junjie Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Lingling Zhao
- School of Electronic Engineering, Heilongjiang University, Harbin, China
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Hui Zhan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|
50
|
Wang N, Zhang Y, Xu L, Jin S. Relationship Between Alzheimer's Disease and the Immune System: A Meta-Analysis of Differentially Expressed Genes. Front Neurosci 2019; 12:1026. [PMID: 30705616 PMCID: PMC6344412 DOI: 10.3389/fnins.2018.01026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/14/2018] [Accepted: 12/18/2018] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease (neuro-disease) that is prevalent in the elderly and seriously affects the lives of individuals. Many studies have discussed the relationship between the immune system and AD pathogenesis. Here, a meta-analysis of differentially expressed (DE) genes based on microarray data was conducted to study the association between AD and the immune system. A total of 9519 target genes from the hippocampus of 146 subjects (73 AD cases and 73 controls) in 4 microarray data sets were compiled, and DE genes with p < 1.00E-04 were selected for pathway analysis. The results indicated that the DE genes were significantly enriched in the neuro-disease pathways as well as the immune system pathways.
Affiliation(s)
- Nan Wang
- Department of Mathematics, Harbin Institute of Technology, Harbin, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Li Xu
- College of Computer Science and Technology, Harbin Engineering University, Harbin, China
| | - Shuilin Jin
- Department of Mathematics, Harbin Institute of Technology, Harbin, China
| |
|