151
|
Zheng G, Zhang Y, Wang H, Ding E, Qu A, Su P, Yang Y, Zou M, Zhang Y. Genome-wide DNA methylation analysis by MethylRad and the transcriptome profiles reveal the potential cancer-related lncRNAs in colon cancer. Cancer Med 2020; 9:7601-7612. [PMID: 32869528 PMCID: PMC7571838 DOI: 10.1002/cam4.3412] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/05/2020] [Revised: 08/03/2020] [Accepted: 08/05/2020] [Indexed: 12/13/2022] Open
Abstract
Colon cancer (CC) is characterized by global aberrant DNA methylation that may affect gene expression and genomic stability. A series of studies have demonstrated that DNA methylation can regulate the expression of not only protein-coding genes but also ncRNAs. However, the regulatory role of lncRNA gene methylation in CC remains largely unknown. In the present study, we systematically characterized the DNA methylation profile, especially the aberrant methylation of lncRNA genes, using MethylRAD technology. A total of 132,999 CCGG and 8,487 CCWGG sites were identified as differentially methylated sites (DMSs), which were mainly located in introns and intergenic elements. Moreover, 1,359 CCGG and 1,052 CCWGG differentially methylated genes (DMGs) were screened. Our results demonstrated that aberrant methylation of lncRNA genes occurred most frequently, accounting for 37.5% and 44.3% of CCGG and CCWGG DMGs, respectively. In addition, 963 lncRNA DMGs were co-analyzed with 1,328 differentially expressed lncRNAs identified from the TCGA database. We found that 15 lncRNAs might be CC-related. ZNF667-AS1 and MAFA-AS1 were down-regulated in CC and might be silenced by hypermethylation, whereas 13 lncRNAs were hypomethylated and up-regulated in CC. Moreover, we validated the expression and methylation levels of the CC-related lncRNAs by RT-qPCR and pyrosequencing. In conclusion, we performed a genome-wide DNA methylation analysis by MethylRAD to acquire both CCGG and CCWGG DMSs and DMGs in CC. The results identified lncRNA DMSs as potential biomarkers and 15 lncRNAs as CC-related lncRNAs. This study provides novel therapeutic targets and valuable insights into the molecular mechanisms of tumorigenesis and development of CC.
Affiliation(s)
- Guixi Zheng
- Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, Shandong, China
| | - Yuzhi Zhang
- Department of Clinical Laboratory, Affiliated Hospital of Weifang Medical University, Weifang, Shandong, China
| | - Hongchun Wang
- Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, Shandong, China
| | - E Ding
- Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, Shandong, China
| | - Ailin Qu
- Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, Shandong, China
| | - Peng Su
- Department of Pathology, Shandong University School of Medicine, Jinan, Shandong, China
| | - Yongmei Yang
- Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, Shandong, China
| | - Mingjin Zou
- Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, Shandong, China
| | - Yi Zhang
- Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, Shandong, China
|
152
|
Zhuang H, Zhang Y, Yang S, Cheng L, Liu SL. A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk. Curr Gene Ther 2020; 19:224-231. [PMID: 31553296 DOI: 10.2174/1566523219666190925115535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/30/2019] [Revised: 06/15/2019] [Accepted: 06/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Infant length (IL) is positively associated with type 2 diabetes mellitus (T2DM), but whether the relationship is causal is still unclear. Here, we applied a Mendelian randomization (MR) study to explore the causal relationship between IL and T2DM, which has the potential to provide guidance for assessing T2DM activity and for T2DM prevention in young at-risk populations. MATERIALS AND METHODS A two-sample MR, using genetic instrumental variables (IVs) to explore the causal effect, was applied to test the influence of IL on the risk of T2DM. MR was carried out on GWAS data using 8 independent IL SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated by the inverse-variance weighted (IVW) method to assess the risk that shorter IL brings to T2DM. Sensitivity validation was conducted to identify the effect of individual SNPs. MR-Egger regression was used to detect pleiotropic bias of the IVs. RESULTS The pooled OR from the IVW method was 1.03 (95% CI 0.89-1.18, P = 0.0785), the intercept was low (-0.477, P = 0.252), and the ORs fluctuated only slightly in leave-one-out validation, from -0.062 ((0.966 - 1.03) / 1.03) to 0.05 ((1.081 - 1.03) / 1.03). CONCLUSION We validated that shorter IL causes no additional risk of T2DM. The sensitivity analysis and the MR-Egger regression analysis also provided adequate evidence that the above result was not due to any heterogeneity or pleiotropic effect of the IVs.
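The pooled estimate and the leave-one-out fluctuation quoted in this abstract can be illustrated with a minimal sketch. The fluctuation arithmetic uses the odds ratios reported above. The function names and the per-SNP effect sizes passed to `ivw_estimate` are hypothetical placeholders, not data from the study, and the fixed-effect IVW formula is the textbook one, which may differ from the authors' exact pipeline.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Returns the log odds ratio per unit of exposure and its standard error.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

def relative_fluctuation(leave_one_out_or, pooled_or):
    """Relative change of a leave-one-out odds ratio against the pooled one."""
    return (leave_one_out_or - pooled_or) / pooled_or

# Values reported in the abstract: pooled OR 1.03, leave-one-out extremes 0.966 and 1.081.
low = relative_fluctuation(0.966, 1.03)
high = relative_fluctuation(1.081, 1.03)

# Hypothetical per-SNP summary statistics (illustrative only, not study data).
beta, se = ivw_estimate([0.10, 0.20], [0.01, 0.02], [0.05, 0.05])
pooled_or = math.exp(beta)
```

Rounded, `low` and `high` reproduce the -0.062 and 0.05 quoted in the RESULTS sentence.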
Affiliation(s)
- He Zhuang
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China; HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, 150001, Harbin, China
| | - Shuo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Shu-Lin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China; HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, Canada; Department of Infectious Diseases, The First Affiliated Hospital, Harbin Medical University, Harbin, China; Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
|
153
|
Chang Z, Huang R, Fu W, Li J, Ji G, Huang J, Shi W, Yin H, Wang W, Meng T, Huang Z, Wei Q, Qin H. The Construction and Analysis of ceRNA Network and Patterns of Immune Infiltration in Colon Adenocarcinoma Metastasis. Front Cell Dev Biol 2020; 8:688. [PMID: 32850813 PMCID: PMC7417319 DOI: 10.3389/fcell.2020.00688] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/27/2020] [Accepted: 07/06/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Colon adenocarcinoma (COAD) is a malignant and lethal tumor of the digestive system, and distant metastasis leads to poor prognosis. The metastasis-specific competitive endogenous RNAs (ceRNAs) and tumor-infiltrating immune cells might be associated with tumor prognosis and distant metastasis. Nonetheless, few studies have concentrated on ceRNAs and immune cells in COAD. METHODS The gene expression profiles and clinical information of COAD were downloaded from TCGA and divided into two groups: primary tumors with or without distant metastasis. We applied comprehensive bioinformatics methods to analyze differentially expressed genes (DEGs) related to metastasis and to establish the ceRNA networks. Cox analysis and Lasso regression were utilized to screen the pivotal genes and prevent overfitting. Based on them, prognosis prediction nomograms were established. The cell type identification by estimating relative subsets of RNA transcripts (CIBERSORT) algorithm was then applied to screen significant tumor-infiltrating immune cells associated with COAD metastasis and to establish another prognosis prediction model. Ultimately, co-expression analysis was applied to explore the relationship between key genes in the ceRNA networks and significant immune cells. Multiple databases and preliminary clinical specimen validation were used to test the expression of key biomarkers at the cellular and tissue levels. RESULTS We identified 1 significantly differentially expressed lncRNA, 1 significantly differentially expressed miRNA, 8 survival-related immune-infiltrating cells, and 5 immune cells associated with distant metastasis. In addition, 3 pairs of important biomarkers associated with COAD metastasis were identified: T cells follicular helper and hsa-miR-125b-5p (R = -0.200, P < 0.001), Macrophages M0 and hsa-miR-125b-5p (R = 0.170, P < 0.001), and Macrophages M0 and FAS (R = -0.370, P < 0.001). Multidimensional validation and preliminary clinical specimen validation also supported the results. CONCLUSION In this research, we found that some significant ceRNAs (FAS and hsa-miR-125b-5p) and tumor-infiltrating immune cells (T cells follicular helper and Macrophages M0) might be related to distant metastasis and prognosis of COAD. The nomograms could assist scientific and medical researchers in clinical management.
Affiliation(s)
- Zhengyan Chang
- Department of Pathology, Shanghai Tenth People’s Hospital, Tongji University School of Medicine, Shanghai, China
| | - Runzhi Huang
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Division of Spine, Department of Orthopedics, Tongji Hospital Affiliated to Tongji University School of Medicine, Shanghai, China
- Tongji University School of Medicine, Tongji University, Shanghai, China
| | - Wanting Fu
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Jiehan Li
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Guo Ji
- Department of Pathology, Shanghai Tenth People’s Hospital, Tongji University School of Medicine, Shanghai, China
| | - Jinglei Huang
- Department of Pathology, Shanghai Tenth People’s Hospital, Tongji University School of Medicine, Shanghai, China
| | - Weijun Shi
- Department of Pathology, Shanghai Tenth People’s Hospital, Tongji University School of Medicine, Shanghai, China
| | - Huabin Yin
- Department of Orthopedics, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Weifeng Wang
- Department of Central Laboratory, Shanghai Tenth People’s Hospital, Shanghai, China
| | - Tong Meng
- Tongji University Cancer Center, Shanghai Tenth People’s Hospital of Tongji University, School of Medicine, Tongji University, Shanghai, China
| | - Zongqiang Huang
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Qing Wei
- Department of Pathology, Shanghai Tenth People’s Hospital, Tongji University School of Medicine, Shanghai, China
| | - Huanlong Qin
- Department of Gastrointestinal Surgery, Shanghai Tenth People’s Hospital, Tongji University School of Medicine, Shanghai, China
|
154
|
Predicting Preference of Transcription Factors for Methylated DNA Using Sequence Information. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 22:1043-1050. [PMID: 33294291 PMCID: PMC7691157 DOI: 10.1016/j.omtn.2020.07.035] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/27/2020] [Accepted: 07/28/2020] [Indexed: 12/12/2022]
Abstract
Transcription factors play key roles in cell-fate decisions by regulating 3D genome conformation and gene expression. The traditional view is that DNA methylation hinders transcription factor binding, but recent research has shown that many transcription factors prefer to bind to methylated DNA. Therefore, identifying such transcription factors and understanding their functions is a stepping-stone for studying methylation-mediated biological processes. In this paper, a two-step discrimination method was proposed to recognize transcription factors and their preference for methylated DNA based only on sequence information. In the first step, the proposed model was used to discriminate transcription factors from non-transcription factors. The areas under the curve (AUCs) were 0.9183 and 0.9116 for the 5-fold cross-validation test and the independent dataset test, respectively. Subsequently, for the classification of transcription factors that prefer methylated DNA versus those that prefer non-methylated DNA, our model produced AUCs of 0.7744 and 0.7356 for the 5-fold cross-validation test and the independent dataset test, respectively. Based on the proposed model, a user-friendly web server called TFPred was built, which can be freely accessed at http://lin-group.cn/server/TFPred/.
|
155
|
Prediction of N7-methylguanosine sites in human RNA based on optimal sequence features. Genomics 2020; 112:4342-4347. [PMID: 32721444 DOI: 10.1016/j.ygeno.2020.07.035] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/04/2020] [Revised: 07/18/2020] [Accepted: 07/22/2020] [Indexed: 12/14/2022]
Abstract
N-7 methylguanosine (m7G) modification is a ubiquitous post-transcriptional RNA modification that is vital for maintaining RNA function and protein translation. Developing computational tools will help to predict m7G sites in RNA sequences easily. In this work, we designed a sequence-based method to identify the modification sites in human RNA sequences. First, several kinds of sequence features were extracted to encode m7G and non-m7G samples. Subsequently, we used mRMR, F-score, and Relief to obtain the optimal subset of features that produced the maximum prediction accuracy. In 10-fold cross-validation, the highest accuracy for identifying m7G sites in the human genome was 94.67%, achieved by a support vector machine (SVM). In addition, we examined the performance of other algorithms and found that the SVM-based model outperformed them. The results indicated that the predictor could be a useful tool for studying m7G. A prediction model is available at https://github.com/MapFM/m7g_model.git.
|
156
|
Wong L, You ZH, Guo ZH, Yi HC, Chen ZH, Cao MY. MIPDH: A Novel Computational Model for Predicting microRNA-mRNA Interactions by DeepWalk on a Heterogeneous Network. ACS OMEGA 2020; 5:17022-17032. [PMID: 32715187 PMCID: PMC7376568 DOI: 10.1021/acsomega.9b04195] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/08/2019] [Accepted: 03/06/2020] [Indexed: 06/11/2023]
Abstract
Analysis of miRNA-target mRNA interactions (MTIs) is of crucial significance in discovering new target candidates for miRNAs. However, the biological experiments for identifying MTIs have a high false positive rate and are expensive, time-consuming, and arduous. It is an urgent task to develop effective computational approaches to enhance the investigation of miRNA-target mRNA relationships. In this study, a novel method called MIPDH is developed for miRNA-mRNA interaction prediction by using DeepWalk on a heterogeneous network. More specifically, MIPDH extracts two kinds of features: a biological behavior feature, learned using a network embedding algorithm on a heterogeneous network constructed from 17 kinds of associations among drugs, diseases, and 6 kinds of biomolecules, and an attribute feature, learned using the k-mer method on the sequences of miRNAs and target mRNAs. Then, a random forest classifier is trained on the combination of the biological behavior feature and the attribute feature. In a 5-fold cross-validation experiment, MIPDH achieved an average accuracy, sensitivity, specificity, and AUC of 75.85%, 74.37%, 77.33%, and 0.8044, respectively. To further evaluate the performance of MIPDH, other classifiers and feature descriptors were evaluated for comparison, and MIPDH achieved better performance. Additionally, case studies on hsa-miR-106b-5p, hsa-let-7d-5p, and hsa-let-7e-5p were carried out. As a result, 14, 9, and 9 of the top 15 targets predicted to interact with these miRNAs were verified using the experimental literature or other databases. All these prediction results indicate that MIPDH is an effective method for predicting miRNA-target mRNA interactions.
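The k-mer attribute feature mentioned in this abstract can be sketched as a normalized k-mer frequency vector. This is a minimal illustration, not the MIPDH implementation: the function name `kmer_features`, the RNA alphabet `ACGU`, the choice of `k`, and the example sequence are all assumptions made here.

```python
from itertools import product

def kmer_features(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed (lexicographic) order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing symbols outside the alphabet
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical toy sequence (illustrative only).
features = kmer_features("AUGGCUAUG", k=2)
```

With `k=2` the vector has 4^2 = 16 entries, one per dinucleotide, so sequences of different lengths map to vectors of the same size.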
Affiliation(s)
- Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- XinJiang Laboratory of Minority Speech and Language Information Processing, Chinese Academy of Sciences, Urumqi 830011, China
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- XinJiang Laboratory of Minority Speech and Language Information Processing, Chinese Academy of Sciences, Urumqi 830011, China
| | - Zhen-Hao Guo
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- XinJiang Laboratory of Minority Speech and Language Information Processing, Chinese Academy of Sciences, Urumqi 830011, China
| | - Hai-Cheng Yi
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- XinJiang Laboratory of Minority Speech and Language Information Processing, Chinese Academy of Sciences, Urumqi 830011, China
| | - Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- XinJiang Laboratory of Minority Speech and Language Information Processing, Chinese Academy of Sciences, Urumqi 830011, China
| | - Mei-Yuan Cao
- Guang Dong Polytechnic College, Zhaoqing 526100, Guangdong, China
|
157
|
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs (miRNAs), a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. Therefore, this review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide valid information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides scholars with the full background of pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of research progress in this field.
Affiliation(s)
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
158
|
Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs. BIOMED RESEARCH INTERNATIONAL 2020; 2020:9235920. [PMID: 32596396 PMCID: PMC7273372 DOI: 10.1155/2020/9235920] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/14/2020] [Accepted: 04/22/2020] [Indexed: 11/17/2022]
Abstract
Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
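The two sequence encodings named in this abstract, AAC and CKSAAP, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the 20-letter standard alphabet, the normalization by the number of windows, and the toy sequence are choices made here, and a non-empty input sequence is assumed.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(sequence):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    n = len(sequence)
    return [sequence.count(a) / n for a in AMINO_ACIDS]

def cksaap(sequence, k):
    """Composition of k-spaced amino acid pairs for a single gap k.

    A pair is (sequence[i], sequence[i + k + 1]), i.e. two residues separated
    by k other residues; counts are divided by the number of such windows.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    windows = len(sequence) - k - 1
    for i in range(max(windows, 0)):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / windows if windows > 0 else 0.0 for p in pairs]

# Hypothetical toy sequence (illustrative only).
seq = "ACDACD"
composition = aac(seq)     # 20 values
gap1 = cksaap(seq, k=1)    # 400 values for pairs one residue apart
```

Concatenating `aac` with `cksaap` vectors for several values of `k` gives one fixed-length feature vector per protein, which is the form a classifier such as an SVM expects.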
|
159
|
Xiao N, Hu Y, Juan L. Comprehensive Analysis of Differentially Expressed lncRNAs in Gastric Cancer. Front Cell Dev Biol 2020; 8:557. [PMID: 32695786 PMCID: PMC7338654 DOI: 10.3389/fcell.2020.00557] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/14/2020] [Accepted: 06/11/2020] [Indexed: 01/26/2023] Open
Abstract
Gastric cancer (GC) is the fourth most common malignant tumor. The mechanism underlying GC occurrence and development remains unclear. Previous studies have indicated that long non-coding RNAs (lncRNAs) are significantly associated with gastric cancer, but a systematic understanding of the role of lncRNAs in gastric cancer is lacking. In recent years, with the development of next-generation sequencing technology, tens of thousands of lncRNAs have been discovered. However, a large number of unannotated lncRNAs remain unidentified in different tissues, including potential gastric cancer-related lncRNAs. In this study, RNA sequencing (RNA-seq) data from 16 samples of eight gastric cancer patients were obtained and analyzed. A total of 1,854 previously unannotated lncRNAs were identified by ab initio assembly, and 520 differentially expressed lncRNAs were validated in the TCGA expression dataset. Methylation and copy number variation (CNV) array data from the same sample were integrated in the analysis. Changes in DNA methylation levels and CNVs may be responsible for the differential expression of 91 lncRNAs. Differentially expressed lncRNAs were enriched in coexpressed clusters of genes related to functions such as cell signaling, cell cycle, immune response, metabolic processes, angiogenesis, and regulation of retinoic acid (RA) receptors. Finally, a differentially expressed lncRNA, AC004510.3, was identified as a potential biomarker for the prediction of the overall survival of gastric cancer patients.
Affiliation(s)
- Nan Xiao
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
| | - Yang Hu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Liran Juan
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
|
160
|
Li H, Du H, Wang X, Gao P, Liu Y, Lin W. Remarks on Computational Method for Identifying Acid and Alkaline Enzymes. Curr Pharm Des 2020; 26:3105-3114. [PMID: 32552636 DOI: 10.2174/1381612826666200617170826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2020] [Accepted: 05/07/2020] [Indexed: 11/22/2022]
Abstract
The catalytic efficiency of enzymes is thousands of times higher than that of ordinary catalysts; thus, they are widely used in industrial and medical fields. However, enzymes, being proteins, can be destroyed and inactivated at high temperature or in strongly acidic or strongly alkaline environments. It is well known that most enzymes work well at a pH of 6-8, while some special enzymes remain active only in an alkaline environment with pH > 8 or an acidic environment with pH < 6. Therefore, the identification of acidic and alkaline enzymes has become a key task for industrial production. Because of the wide variety of enzymes, determining the acidity or alkalinity of an enzyme by experimental methods is laborious and sometimes infeasible. Converting protein sequences into numerical features and building computational models can identify the acidity and alkalinity of enzymes efficiently and accurately. This review summarizes progress on the numerical features used to represent proteins and on computational methods to identify acidic and alkaline enzymes. We hope that this paper will provide convenience, ideas, and guidance for computationally classifying acid and alkaline enzymes.
Affiliation(s)
- Hongfei Li
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Haoze Du
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, 27109, United States
| | - Xianfang Wang
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Peng Gao
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Yifeng Liu
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Weizhong Lin
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, United States
|
161
|
Deng S, Sun Y, Zhao T, Hu Y, Zang T. A Review of Drug Side Effect Identification Methods. Curr Pharm Des 2020; 26:3096-3104. [PMID: 32532187 DOI: 10.2174/1381612826666200612163819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/05/2020] [Accepted: 05/18/2020] [Indexed: 11/22/2022]
Abstract
Drug side effects have become an important indicator for evaluating the safety of drugs. There are two main factors in the frequent occurrence of drug safety problems; on the one hand, the clinical understanding of drug side effects is insufficient, leading to frequent adverse drug reactions, while on the other hand, due to the long-term period and complexity of clinical trials, side effects of approved drugs on the market cannot be reported in a timely manner. Therefore, many researchers have focused on developing methods to identify drug side effects. In this review, we summarize the methods of identifying drug side effects and common databases in this field. We classified methods of identifying side effects into four categories: biological experimental, machine learning, text mining and network methods. We point out the key points of each kind of method. In addition, we also explain the advantages and disadvantages of each method. Finally, we propose future research directions.
Affiliation(s)
- Shuai Deng
- College of Science, Beijing Forestry University, Beijing, China
| | - Yige Sun
- Microbiology Department, Harbin Medical University, Harbin, 150081, China
| | - Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Tianyi Zang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
162
|
Yi HC, You ZH, Huang DS, Guo ZH, Chan KCC, Li Y. Learning Representations to Predict Intermolecular Interactions on Large-Scale Heterogeneous Molecular Association Network. iScience 2020; 23:101261. [PMID: 32580123 PMCID: PMC7317230 DOI: 10.1016/j.isci.2020.101261] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/03/2019] [Revised: 04/29/2020] [Accepted: 06/08/2020] [Indexed: 02/07/2023] Open
Abstract
Molecular components that are functionally interdependent in human cells constitute molecular association networks. Disease can be caused by the disturbance of multiple molecular interactions. New biomolecular regulatory mechanisms can be revealed by discovering new biomolecular interactions. To this end, a heterogeneous molecular association network is formed by systematically integrating comprehensive associations between miRNAs, lncRNAs, circRNAs, mRNAs, proteins, drugs, microbes, and complex diseases. We propose a machine learning method for predicting intermolecular interactions, named MMI-Pred. More specifically, a network embedding model is developed to fully exploit the network behavior of biomolecules, and attribute features are also calculated. Then, these discriminative features are combined to train a random forest classifier to predict intermolecular interactions. MMI-Pred achieves an outstanding performance of 93.50% accuracy in hybrid association prediction under 5-fold cross-validation. This work provides a systematic landscape and a machine learning method to model and infer complex associations between various biological components.
Affiliation(s)
- Hai-Cheng Yi
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhu-Hong You
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
| | - Zhen-Hao Guo
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| | - Keith C C Chan
- Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR 999077, China
| | - Yangming Li
- College of Engineering Technology, Rochester Institute of Technology, Rochester, NY 14623, USA
| |
|
163
|
Chen ZH, You ZH, Guo ZH, Yi HC, Luo GX, Wang YB. Prediction of Drug-Target Interactions From Multi-Molecular Network Based on Deep Walk Embedding Model. Front Bioeng Biotechnol 2020; 8:338. [PMID: 32582646 PMCID: PMC7283956 DOI: 10.3389/fbioe.2020.00338] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 01/20/2020] [Accepted: 03/26/2020] [Indexed: 12/16/2022] Open
Abstract
Predicting drug-target interactions (DTIs) is crucial in innovative drug discovery, drug repositioning and other fields. However, traditional biological experimental methods for predicting DTIs have many shortcomings, such as high cost, time consumption, and low efficiency, which make them difficult to apply widely. As a supplement, in silico methods can provide helpful information for predicting DTIs in a timely manner. In this work, a deep walk embedding method is developed for predicting DTIs from a multi-molecular network. More specifically, a multi-molecular network, also called a molecular association network, is constructed by integrating the associations among drug, protein, disease, lncRNA, and miRNA. Then, each node is represented as a behavior feature vector by using a deep walk embedding method. Finally, we compared behavior features with traditional attribute features on an integrated dataset by using various classifiers. The experimental results revealed that the behavior features performed better across different classifiers, especially the random forest classifier. It is also demonstrated that the use of behavior information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work is not only well suited to predicting DTIs, but also provides a new perspective for predicting associations among other biomolecules.
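As a rough illustration of the first stage of a deep walk style embedding, the sketch below generates truncated random walks over a toy multi-molecular network. It is not the authors' implementation: the node names, the edges, and the parameters `walks_per_node` and `walk_length` are all hypothetical, and the step that trains a skip-gram model on the walks is omitted.

```python
import random


def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Generate truncated random walks, the corpus a DeepWalk-style model trains on."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy multi-molecular network (hypothetical nodes and edges, for illustration only).
toy_network = {
    "drug:D1": ["protein:P1", "disease:X"],
    "protein:P1": ["drug:D1", "miRNA:M1"],
    "disease:X": ["drug:D1", "lncRNA:L1"],
    "miRNA:M1": ["protein:P1"],
    "lncRNA:L1": ["disease:X"],
}
walks = random_walks(toy_network)
```

Each walk plays the role of a sentence and each node the role of a word, so nodes that co-occur in many walks would end up with similar vectors once an embedding model is trained on them.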
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhen-Hao Guo
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Hai-Cheng Yi
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Gong-Xu Luo
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yan-Bin Wang
- School of Cyber Science and Technology, Zhejiang University, Hangzhou, China
| |
|
164
|
Guo ZH, You ZH, Wang YB, Huang DS, Yi HC, Chen ZH. Bioentity2vec: Attribute- and behavior-driven representation for predicting multi-type relationships between bioentities. Gigascience 2020; 9:giaa032. [PMID: 32533701 PMCID: PMC7293023 DOI: 10.1093/gigascience/giaa032] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/07/2019] [Revised: 01/06/2020] [Accepted: 03/13/2020] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems. RESULTS We constructed a molecular association network, which contains 18 edges (relationships) between 8 nodes (bioentities). Based on this, we propose Bioentity2vec, a new method for representing bioentities, which integrates information about the attributes and behaviors of a bioentity. Applying the random forest classifier, we achieved promising performance on 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572. CONCLUSIONS Our study shows that constructing a network with rich topological and biological information is important for systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information about classification tasks. Our method is also able to simultaneously predict relationships between single types and multiple types, which will accelerate progress in biological experimental research and industrial product development.
Affiliation(s)
- Zhen-Hao Guo
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhu-Hong You
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yan-Bin Wang
- School of Cyber Science and Technology, Zhejiang University, Hangzhou 310000, Zhejiang, China
| | - De-Shuang Huang
- Computer Science Department, Tongji University, Shanghai 200000, China
| | - Hai-Cheng Yi
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhan-Heng Chen
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
165
|
Cheng L, Nan C, Kang L, Zhang N, Liu S, Chen H, Hong C, Chen Y, Liang Z, Liu X. Whole blood transcriptomic investigation identifies long non-coding RNAs as regulators in sepsis. J Transl Med 2020; 18:217. [PMID: 32471511 PMCID: PMC7257169 DOI: 10.1186/s12967-020-02372-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/24/2020] [Accepted: 05/12/2020] [Indexed: 12/21/2022] Open
Abstract
Background Sepsis is a fatal disease referring to the presence of a known or strongly suspected infection coupled with systemic and uncontrolled immune activation causing multiple organ failure. However, current knowledge of the role of lncRNAs in sepsis is still extremely limited. Methods We performed an in silico investigation of the gene coexpression pattern for patients' response to all-cause sepsis in consecutive intensive care unit (ICU) admissions. Sepsis coexpression gene modules were identified using WGCNA and enrichment analysis. lncRNAs were determined as sepsis biomarkers based on the interactions among lncRNAs and the identified modules. Results Twenty-three sepsis modules, including both differentially expressed modules and prognostic modules, were identified from the whole blood RNA expression profiling of sepsis patients. Five lncRNAs, FENDRR, MALAT1, TUG1, CRNDE, and ANCR, were detected as sepsis regulators based on the interactions among lncRNAs and the identified coexpression modules. Furthermore, we found that CRNDE and MALAT1 may act as miRNA sponges of sepsis-related miRNAs to regulate the expression of sepsis modules. Ultimately, FENDRR, MALAT1, TUG1, and CRNDE were reannotated using three independent lncRNA expression datasets and validated as differentially expressed lncRNAs. Conclusion The procedure facilitates the identification of prognostic biomarkers and novel therapeutic strategies for sepsis. Our findings highlight the importance of transcriptome modularity and regulatory lncRNAs in the progression of sepsis.
Affiliation(s)
- Lixin Cheng
- Department of Critical Care Medicine, Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China
| | - Chuanchuan Nan
- Department of Critical Care Medicine, Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China
| | - Lin Kang
- Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China
| | - Ning Zhang
- Department of Critical Care Medicine, Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China
| | - Sheng Liu
- Department of Critical Care Medicine, Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China
| | - Huaisheng Chen
- Department of Critical Care Medicine, Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China
| | - Chengying Hong
- Department of Critical Care Medicine, Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China
| | - Youlian Chen
- Department of Critical Care Medicine, Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China
| | - Zhen Liang
- Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China.
| | - Xueyan Liu
- Department of Critical Care Medicine, Shenzhen People's Hospital, The Second Clinical Medicine College of Jinan University, Shenzhen, China.
| |
|
166
|
Wang C, Zhang Y, Han S. Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BIOMED RESEARCH INTERNATIONAL 2020; 2020:2468789. [PMID: 32566672 PMCID: PMC7275950 DOI: 10.1155/2020/2468789] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/03/2020] [Revised: 03/20/2020] [Accepted: 03/25/2020] [Indexed: 12/19/2022]
Abstract
Fungi play essential roles in many ecological processes, and taxonomic classification is fundamental for microbial community characterization and vital for the study and preservation of fungal biodiversity. To cope with massive fungal barcode data, tools that can handle extensive volumes of barcode sequences, especially the internal transcribed spacer (ITS) region, are necessary. However, high variation in the ITS region and computational requirements for processing high-dimensional features remain challenging for existing predictors. In this study, we developed Its2vec, a bioinformatics tool for the classification of fungal ITS barcodes to the species level. An ITS database covering more than 25,000 species in a broad range of fungal taxa was assembled. For dimensionality reduction, a word embedding algorithm was used to represent an ITS sequence as a dense low-dimensional vector. A random forest-based classifier was built for species identification. Benchmarking results showed that our model achieved an accuracy comparable to that of several state-of-the-art predictors, and more importantly, it could handle large datasets and greatly reduce dimensionality. We expect the Its2vec model to be helpful for fungal species identification and, thus, for revealing microbial community structures and deepening our understanding of their functional mechanisms.
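Word embedding methods need a sequence to be split into "words" first. The sketch below shows one common way to do that, overlapping k-mers, purely as an illustration: the abstract does not state how Its2vec tokenizes sequences, so the function `kmer_words`, the choice `k=3`, and the example sequence are assumptions.

```python
from collections import Counter


def kmer_words(sequence, k=3):
    """Split a nucleotide sequence into overlapping k-mers (the 'words' of the sequence)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical short barcode fragment, for illustration only.
words = kmer_words("ACGTACGT", k=3)
vocabulary = Counter(words)  # k-mer frequencies, e.g. to build an embedding vocabulary
```

A word embedding model trained on such k-mer lists would give each k-mer a dense vector, and a sequence could then be represented by combining the vectors of its k-mers.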
Affiliation(s)
- Chao Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
| | - Shuguang Han
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
167
|
Cheng L, Qi C, Zhuang H, Fu T, Zhang X. gutMDisorder: a comprehensive database for dysbiosis of the gut microbiota in disorders and interventions. Nucleic Acids Res 2020; 48:D554-D560. [PMID: 31584099 PMCID: PMC6943049 DOI: 10.1093/nar/gkz843] [Citation(s) in RCA: 133] [Impact Index Per Article: 33.3] [Received: 08/03/2019] [Revised: 09/18/2019] [Accepted: 10/01/2019] [Indexed: 12/11/2022] Open
Abstract
gutMDisorder (http://bio-annotation.cn/gutMDisorder), a manually curated database, aims to provide a comprehensive resource on dysbiosis of the gut microbiota in disorders and interventions. Alterations in the composition of the gut microbial community play crucial roles in the development of chronic disorders. Moreover, the beneficial effects of drugs, foods and other intervention measures on disorders could be microbially mediated. The current version of gutMDisorder documents 2263 curated associations between 579 gut microbes and 123 disorders or 77 intervention measures in Human, and 930 curated associations between 273 gut microbes and 33 disorders or 151 intervention measures in Mouse. Each entry in gutMDisorder contains detailed information on an association, including an intestinal microbe, a disorder name, intervention measures, experimental technology and platform, sample characteristics, web sites for downloading the sequencing data, a brief description of the association, a literature reference, and so on. gutMDisorder provides a user-friendly interface to browse and retrieve each entry using gut microbes, disorders, and intervention measures. It also offers pages for downloading all the entries and submitting new experimentally validated associations.
Affiliation(s)
- Liang Cheng
- NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, Heilongjiang, China, 150028; College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
| | - Changlu Qi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
| | - He Zhuang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
| | - Tongze Fu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
| | - Xue Zhang
- NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, Heilongjiang, China, 150028; McKusick-Zhang Center for Genetic Medicine, Peking Union Medical College, Beijing, China, 100005
| |
|
168
|
Zhao H, Shi J, Zhang Y, Xie A, Yu L, Zhang C, Lei J, Xu H, Leng Z, Li T, Huang W, Lin S, Wang L, Xiao Y, Li X. LncTarD: a manually-curated database of experimentally-supported functional lncRNA-target regulations in human diseases. Nucleic Acids Res 2020; 48:D118-D126. [PMID: 31713618 PMCID: PMC7145524 DOI: 10.1093/nar/gkz985] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 07/31/2019] [Revised: 10/12/2019] [Accepted: 10/16/2019] [Indexed: 12/11/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are associated with human diseases. Although lncRNA–disease associations have received significant attention, no online repository is available to collect lncRNA-mediated regulatory mechanisms, key downstream targets, and important biological functions driven by disease-related lncRNAs in human diseases. We thus developed LncTarD (http://biocc.hrbmu.edu.cn/LncTarD/ or http://bio-bigdata.hrbmu.edu.cn/LncTarD), a manually-curated database that provides a comprehensive resource of key lncRNA–target regulations, lncRNA-influenced functions, and lncRNA-mediated regulatory mechanisms in human diseases. LncTarD offers (i) 2822 key lncRNA–target regulations involving 475 lncRNAs and 1039 targets associated with 177 human diseases; (ii) 1613 experimentally-supported functional regulations and 1209 expression associations in human diseases; (iii) important biological functions driven by disease-related lncRNAs in human diseases; (iv) lncRNA–target regulations responsible for drug resistance or sensitivity in human diseases and (v) lncRNA microarray, lncRNA sequence data and transcriptome data of an 11 373 pan-cancer patient cohort from TCGA to help characterize the functional dynamics of these lncRNA–target regulations. LncTarD also provides a user-friendly interface to conveniently browse, search, and download data. LncTarD will be a useful resource platform for the further understanding of functions and molecular mechanisms of lncRNA deregulation in human disease, which will help to identify novel and sensitive biomarkers and therapeutic targets.
Affiliation(s)
- Hongying Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Jian Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yunpeng Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Aimin Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Lei Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Caiyu Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Junjie Lei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Haotian Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Zhijun Leng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Tengyue Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Waidong Huang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Shihua Lin
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Li Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yun Xiao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China; College of Bioinformatics, Hainan Medical University, Haikou 570100, China
| |
|
169
|
Ying X, Jin X, Wang P, He Y, Zhang H, Ren X, Chai S, Fu W, Zhao P, Chen C, Ma G, Liu H. Integrative Analysis for Elucidating Transcriptomics Landscapes of Glucocorticoid-Induced Osteoporosis. Front Cell Dev Biol 2020; 8:252. [PMID: 32373610 PMCID: PMC7176994 DOI: 10.3389/fcell.2020.00252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/07/2020] [Accepted: 03/25/2020] [Indexed: 11/13/2022] Open
Abstract
Osteoporosis is the most common bone metabolic disease, characterized by bone mass loss and bone microstructure changes due to unbalanced bone conversion, which increases bone fragility and fracture risk. Glucocorticoids are clinically used to treat a variety of diseases, including inflammation, cancer and autoimmune diseases. However, excess glucocorticoids can cause osteoporosis. Herein we performed an integrated analysis of two glucocorticoid-related microarray datasets. The WGCNA analysis identified 3 and 4 glucocorticoid-related gene modules, respectively. Differential expression analysis revealed 1047 and 844 differentially expressed genes in the two datasets. After integrating differentially expressed glucocorticoid-related genes, we found that most of the robust differentially expressed genes were up-regulated. Through protein-protein interaction analysis, we obtained 158 glucocorticoid-related candidate genes. Enrichment analysis showed that these genes are significantly enriched in the osteoporosis related pathways. Our results provided new insights into glucocorticoid-induced osteoporosis and potential candidate markers of osteoporosis.
Affiliation(s)
- Xiaoxia Ying
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Xiyun Jin
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Pingping Wang
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Yuzhu He
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Haomiao Zhang
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Xiang Ren
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Songling Chai
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Wenqi Fu
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Pengcheng Zhao
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Chen Chen
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Guowu Ma
- School of Stomatology, Dalian Medical University, Dalian, China
| | - Huiying Liu
- School of Stomatology, Dalian Medical University, Dalian, China
| |
|
170
|
Feng C, Ma Z, Yang D, Li X, Zhang J, Li Y. A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features. Front Bioeng Biotechnol 2020; 8:285. [PMID: 32432088 PMCID: PMC7214540 DOI: 10.3389/fbioe.2020.00285] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/15/2020] [Accepted: 03/18/2020] [Indexed: 11/13/2022] Open
Abstract
The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to achieve this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, the physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated to form a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods and a dataset containing 500 random observations out of 915 thermophilic proteins and 500 random samples out of 793 non-thermophilic proteins were used to train and predict the data. The experimental results showed that 98.2% of thermophilic and non-thermophilic proteins were correctly identified using 10-fold cross-validation. Moreover, our analysis of the final reserved features and removed features yielded information about the crucial, unimportant and insensitive elements; it also provided essential information for enzyme design.
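To make the idea of recoding a sequence with a reduced amino acid alphabet and then counting reduced dipeptides more concrete, here is a minimal sketch. The six-group reduction scheme in `GROUPS` is hypothetical and chosen only for illustration; the abstract does not say which scheme the authors adopted, and the other features (physicochemical characteristics, ACC) are not shown.

```python
from collections import Counter
from itertools import product

# Hypothetical reduction scheme: 20 amino acids mapped onto 6 group labels.
GROUPS = {"a": "AVLIMC", "r": "FWYH", "p": "STNQ", "k": "KR", "d": "DE", "g": "GP"}
RESIDUE_TO_GROUP = {res: label for label, members in GROUPS.items() for res in members}


def reduce_sequence(protein):
    """Recode each residue by the label of the group it belongs to."""
    return "".join(RESIDUE_TO_GROUP[res] for res in protein.upper())


def reduced_dipeptide_composition(protein):
    """Frequency of every ordered pair of group labels among adjacent residues."""
    reduced = reduce_sequence(protein)
    pairs = Counter(reduced[i:i + 2] for i in range(len(reduced) - 1))
    total = max(len(reduced) - 1, 1)  # guard against division by zero for short input
    labels = sorted(GROUPS)
    return {a + b: pairs[a + b] / total for a, b in product(labels, repeat=2)}


reduced = reduce_sequence("MKVLDE")
features = reduced_dipeptide_composition("MKVLDE")
```

With six groups the dipeptide feature vector has 36 entries instead of the 400 needed for the full 20-letter alphabet, which is the kind of dimensionality saving a reduction scheme is meant to provide.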
Affiliation(s)
- Changli Feng
- College of Information Science and Technology, Taishan University, Tai’an, China
| | - Zhaogui Ma
- College of Information Science and Technology, Taishan University, Tai’an, China
| | - Deyun Yang
- College of Information Science and Technology, Taishan University, Tai’an, China
| | - Xin Li
- College of Information Science and Technology, Taishan University, Tai’an, China
| | - Jun Zhang
- Department of Rehabilitation, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Yanjuan Li
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
| |
|
171
|
Peng J, Zhu L, Wang Y, Chen J. Mining Relationships among Multiple Entities in Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:769-776. [PMID: 30872239 DOI: 10.1109/tcbb.2019.2904965] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 06/09/2023]
Abstract
Identifying topological relationships among multiple entities in biological networks is critical to understanding the organizational principles of network functionality. Theoretically, this problem can be solved using minimum Steiner tree (MSTT) algorithms. However, due to large network size, it remains computationally challenging, and the predictive value of multi-entity topological relationships is still unclear. We present a novel solution called Cluster-based Steiner Tree Miner (CST-Miner) to instantly identify multi-entity topological relationships in biological networks. Given a list of user-specific entities, CST-Miner decomposes a biological network into nested cluster-based subgraphs, on which multiple minimum Steiner trees are identified. By merging all of them into a minimum cost tree, the optimal topological relationships among all the user-specific entities are revealed. Experimental results showed that CST-Miner can finish in nearly log-linear time and the tree constructed by CST-Miner is close to the global minimum.
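For readers unfamiliar with the Steiner tree problem, the sketch below connects a set of terminal nodes in an unweighted graph using the classical shortest-path heuristic: start from one terminal and repeatedly attach the closest remaining terminal. This is a generic textbook approximation, not CST-Miner, and it involves no clustering; the graph and the terminal list are hypothetical.

```python
from collections import deque


def path_to_tree(graph, source, tree_nodes):
    """Breadth-first search from `source` to the nearest node already in the tree."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in tree_nodes:
            path = []
            while node is not None:  # follow parent links back to the source
                path.append(node)
                node = parents[node]
            return path  # runs from the tree node to the source terminal
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    raise ValueError(f"{source!r} cannot reach the tree")


def steiner_tree_heuristic(graph, terminals):
    """Grow a tree from the first terminal, attaching the closest terminal each round."""
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    while remaining:
        candidates = [path_to_tree(graph, t, tree_nodes) for t in sorted(remaining)]
        best = min(candidates, key=len)  # shortest connection to the current tree
        tree_edges.update(frozenset(edge) for edge in zip(best, best[1:]))
        tree_nodes.update(best)
        remaining -= tree_nodes
    return tree_edges


# Hypothetical toy network: a path A-B-C-D with an extra branch B-E.
graph = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}
edges = steiner_tree_heuristic(graph, ["A", "D", "E"])
```

Each round runs one breadth-first search per remaining terminal, so the cost grows quickly with the number of terminals and the size of the network, which is the scaling problem the abstract refers to.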
|
172
|
Dao FY, Lv H, Yang YH, Zulfiqar H, Gao H, Lin H. Computational identification of N6-methyladenosine sites in multiple tissues of mammals. Comput Struct Biotechnol J 2020; 18:1084-1091. [PMID: 32435427 PMCID: PMC7229270 DOI: 10.1016/j.csbj.2020.04.015] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 02/23/2020] [Revised: 04/20/2020] [Accepted: 04/21/2020] [Indexed: 12/12/2022] Open
Abstract
N6-methyladenosine (m6A) is the methylation of the adenosine at the nitrogen-6 position, which is the most abundant RNA methylation modification and is involved in a series of important biological processes. Accurate genome-wide identification of m6A sites is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by a physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical property. Subsequently, these features were optimized by using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best m6A classification models were trained by Support Vector Machine (SVM) with a 5-fold cross-validation test. Prediction results on an independent dataset showed that our proposed method has excellent generalization ability. We also established a user-friendly webserver called iRNA-m6A, which is freely accessible at http://lin-group.cn/server/iRNA-m6A. This tool will make it more convenient for users to study m6A modification in different tissues.
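Two of the encodings named above can be sketched in a few lines. The one-hot table is the usual mono-nucleotide binary encoding. The three-bit table follows a commonly used nucleotide chemical property convention (ring structure, functional group, hydrogen bonding), which is assumed here rather than taken from the paper. The function name `encode_rna` and the example sequence are hypothetical, and the physical-chemical property matrix, mRMR selection and SVM training are not shown.

```python
# Mono-nucleotide binary (one-hot) encoding.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}

# Assumed chemical property convention: (purine ring, amino group, weak hydrogen bond).
CHEMICAL_PROPERTY = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def encode_rna(sequence):
    """Concatenate one-hot and chemical property codes for every base in the sequence."""
    vector = []
    for base in sequence.upper().replace("T", "U"):  # accept DNA-style input as well
        vector.extend(ONE_HOT[base])
        vector.extend(CHEMICAL_PROPERTY[base])
    return vector


features = encode_rna("GAC")  # 3 bases x 7 values per base
```

Fixed-length windows around a candidate adenosine encoded this way would yield equal-length feature vectors, which is what a feature selection step and a classifier such as an SVM expect as input.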
Affiliation(s)
| | | | - Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hasan Zulfiqar
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Gao
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
173
|
Salamini-Montemurri M, Lamas-Maceiras M, Barreiro-Alonso A, Vizoso-Vázquez Á, Rodríguez-Belmonte E, Quindós-Varela M, Cerdán ME. The Challenges and Opportunities of LncRNAs in Ovarian Cancer Research and Clinical Use. Cancers (Basel) 2020; 12:E1020. [PMID: 32326249 PMCID: PMC7225988 DOI: 10.3390/cancers12041020] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 03/26/2020] [Revised: 04/15/2020] [Accepted: 04/17/2020] [Indexed: 12/24/2022] Open
Abstract
Ovarian cancer is one of the most lethal gynecological malignancies worldwide because it tends to be detected late, when the disease has already spread, and prognosis is poor. In this review we aim to highlight the importance of long non-coding RNAs (lncRNAs) in diagnosis, prognosis and treatment choice, to make progress towards increasingly personalized medicine in this malignancy. We review the effects of lncRNAs associated with ovarian cancer in the context of cancer hallmarks. We also discuss the molecular mechanisms by which lncRNAs become involved in cellular physiology; the onset, development and progression of ovarian cancer; and lncRNAs' regulatory mechanisms at the transcriptional, post-transcriptional and post-translational stages of gene expression. Finally, we compile a series of online resources useful for the study of lncRNAs, especially in the context of ovarian cancer. Future work required in the field is also discussed along with some concluding remarks.
Affiliation(s)
- Martín Salamini-Montemurri
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain; (M.S.-M.); (M.L.-M.); (A.B.-A.); (E.R.-B.)
| | - Mónica Lamas-Maceiras
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain; (M.S.-M.); (M.L.-M.); (A.B.-A.); (E.R.-B.)
| | - Aida Barreiro-Alonso
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain; (M.S.-M.); (M.L.-M.); (A.B.-A.); (E.R.-B.)
| | - Ángel Vizoso-Vázquez
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain; (M.S.-M.); (M.L.-M.); (A.B.-A.); (E.R.-B.)
| | - Esther Rodríguez-Belmonte
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain; (M.S.-M.); (M.L.-M.); (A.B.-A.); (E.R.-B.)
| | - María Quindós-Varela
- Translational Cancer Research Group, Instituto de Investigación Biomédica de A Coruña (INIBIC), Carretera del Pasaje s/n, 15006 A Coruña, Spain;
| | - María Esperanza Cerdán
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain; (M.S.-M.); (M.L.-M.); (A.B.-A.); (E.R.-B.)
| |
Collapse
|
174
|
Ji BY, You ZH, Cheng L, Zhou JR, Alghazzawi D, Li LP. Predicting miRNA-disease association from heterogeneous information network with GraRep embedding model. Sci Rep 2020; 10:6658. [PMID: 32313121 PMCID: PMC7170854 DOI: 10.1038/s41598-020-63735-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 01/20/2020] [Accepted: 03/16/2020] [Indexed: 12/27/2022] Open
Abstract
In recent years, accumulating evidence has shown that microRNAs (miRNAs) play an important role in the exploration and treatment of diseases, so the detection of associations between miRNAs and diseases has drawn increasing attention. However, traditional experimental methods are costly and time-consuming; a computational method can help predict potential miRNA-disease associations more systematically and effectively. In this work, we proposed a novel network embedding-based heterogeneous information integration method to predict miRNA-disease associations. More specifically, a heterogeneous information network is constructed by combining the known associations among lncRNA, drug, protein, disease, and miRNA. After that, the network embedding method Learning Graph Representations with Global Structural Information (GraRep) is employed to learn embeddings of the nodes in the heterogeneous information network. The embedding representations of miRNAs and diseases are then integrated with their attribute information (e.g., miRNA sequence information and disease semantic similarity) to represent miRNA-disease association pairs. Finally, a Random Forest (RF) classifier is used to predict potential miRNA-disease associations. Under 5-fold cross-validation, our method obtained 85.11% prediction accuracy and 80.41% sensitivity, with an AUC of 91.25%. In addition, in case studies of three major human diseases, 45 (Colon Neoplasms), 42 (Breast Neoplasms) and 44 (Esophageal Neoplasms) of the top-50 predicted miRNAs were verified by other miRNA-disease association databases. In conclusion, the experimental results suggest that our method can be a powerful and useful tool for predicting potential miRNA-disease associations.
Affiliation(s)
- Bo-Ya Ji
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhu-Hong You
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Li Cheng
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Ji-Ren Zhou
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Daniyal Alghazzawi
  - Department of Information Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Li-Ping Li
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
175
Donato L, Scimone C, Alibrandi S, Rinaldi C, Sidoti A, D’Angelo R. Transcriptome Analyses of lncRNAs in A2E-Stressed Retinal Epithelial Cells Unveil Advanced Links between Metabolic Impairments Related to Oxidative Stress and Retinitis Pigmentosa. Antioxidants (Basel) 2020; 9:E318. [PMID: 32326576 PMCID: PMC7222347 DOI: 10.3390/antiox9040318] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 03/21/2020] [Revised: 04/08/2020] [Accepted: 04/14/2020] [Indexed: 12/12/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are untranslated transcripts which regulate many biological processes. Changes in lncRNA expression patterns are well known to be related to various human disorders, such as ocular diseases. Among them, retinitis pigmentosa, one of the most heterogeneous inherited disorders, is strictly related to oxidative stress. However, little is known about the regulatory aspects that link oxidative stress to the etiopathogenesis of retinitis. Thus, we performed a total RNA-Seq experiment, analyzing human retinal pigment epithelium cells treated with the oxidant agent N-retinylidene-N-retinylethanolamine (A2E), considering three independent experimental groups (untreated control cells, cells treated for 3 h and cells treated for 6 h). Differentially expressed lncRNAs were filtered, explored with specific tools and databases, and finally subjected to pathway analysis. We detected three 3'-overlapping ncRNAs, 107 antisense, 24 sense-intronic, four sense-overlapping and 227 lincRNAs highly differentially expressed throughout all considered time points. The analyzed lncRNAs could be involved in several biochemical pathways related to compromised response to oxidative stress, carbohydrate and lipid metabolism impairment, melanin biosynthetic process alteration, deficiency in cellular response to amino acid starvation, and unbalanced regulation of cofactor metabolic process, all leading to retinal cell death. The explored lncRNAs could play a relevant role in retinitis pigmentosa etiopathogenesis, and seem to be ideal candidates for novel molecular markers and therapeutic strategies.
Affiliation(s)
- Luigi Donato
  - Department of Biomedical and Dental Sciences and Morphofunctional Imaging, Division of Medical Biotechnologies and Preventive Medicine, University of Messina, 98125 Messina, Italy
  - Department of Biomolecular Strategies, Genetics and Avant-Garde Therapies, I.E.ME.S.T., 90139 Palermo, Italy
- Concetta Scimone
  - Department of Biomedical and Dental Sciences and Morphofunctional Imaging, Division of Medical Biotechnologies and Preventive Medicine, University of Messina, 98125 Messina, Italy
  - Department of Biomolecular Strategies, Genetics and Avant-Garde Therapies, I.E.ME.S.T., 90139 Palermo, Italy
- Simona Alibrandi
  - Department of Biomedical and Dental Sciences and Morphofunctional Imaging, Division of Medical Biotechnologies and Preventive Medicine, University of Messina, 98125 Messina, Italy
  - Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, 98125 Messina, Italy
- Carmela Rinaldi
  - Department of Biomedical and Dental Sciences and Morphofunctional Imaging, Division of Medical Biotechnologies and Preventive Medicine, University of Messina, 98125 Messina, Italy
- Antonina Sidoti
  - Department of Biomedical and Dental Sciences and Morphofunctional Imaging, Division of Medical Biotechnologies and Preventive Medicine, University of Messina, 98125 Messina, Italy
  - Department of Biomolecular Strategies, Genetics and Avant-Garde Therapies, I.E.ME.S.T., 90139 Palermo, Italy
- Rosalia D’Angelo
  - Department of Biomedical and Dental Sciences and Morphofunctional Imaging, Division of Medical Biotechnologies and Preventive Medicine, University of Messina, 98125 Messina, Italy
  - Department of Biomolecular Strategies, Genetics and Avant-Garde Therapies, I.E.ME.S.T., 90139 Palermo, Italy
176
Meng C, Hu Y, Zhang Y, Guo F. PSBP-SVM: A Machine Learning-Based Computational Identifier for Predicting Polystyrene Binding Peptides. Front Bioeng Biotechnol 2020; 8:245. [PMID: 32296690 PMCID: PMC7137786 DOI: 10.3389/fbioe.2020.00245] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/31/2020] [Accepted: 03/09/2020] [Indexed: 12/11/2022] Open
Abstract
Polystyrene binding peptides (PSBPs) play a key role in the immobilization process. The correct identification of PSBPs is the first step of all related work. In this paper, we proposed a novel support vector machine-based bioinformatic identification model. This model comprises four machine learning steps: feature extraction, feature selection, model training and optimization. In a five-fold cross-validation test, the model achieves a sensitivity (SN) of 90.38%, a specificity (SP) of 84.62%, an accuracy (ACC) of 87.50%, and an AUC of 0.90. The model outperforms the state-of-the-art identifier in terms of SN and ACC while using a smaller feature set. Furthermore, we constructed a web server that includes the proposed model, which is freely accessible at http://server.malab.cn/PSBP-SVM/index.jsp.
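The SN, SP, and ACC figures quoted in this abstract are standard confusion-matrix ratios. The following is a minimal Python sketch of how such values are computed; the function name `binary_metrics` and the counts in the usage lines are assumptions made for illustration. The counts were chosen so that the ratios reproduce the reported percentages, but they are not taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions in [0, 1]."""
    sn = tp / (tp + fn)                      # share of true binders that are recovered
    sp = tn / (tn + fp)                      # share of non-binders correctly rejected
    acc = (tp + tn) / (tp + fn + tn + fp)    # share of all predictions that are correct
    return sn, sp, acc


# Hypothetical counts (an assumption, not from the paper): 52 positives, 52 negatives.
sn, sp, acc = binary_metrics(tp=47, fn=5, tn=44, fp=8)
print(f"SN={sn:.2%} SP={sp:.2%} ACC={acc:.2%}")  # SN=90.38% SP=84.62% ACC=87.50%
```

AUC is not derived from a single confusion matrix; it summarizes performance over all decision thresholds, which is why it is reported on a 0 to 1 scale rather than as a percentage.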
Affiliation(s)
- Chaolu Meng
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Yang Hu
  - School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
177
Li HF, Wang XF, Tang H. Predicting Bacteriophage Enzymes and Hydrolases by Using Combined Features. Front Bioeng Biotechnol 2020; 8:183. [PMID: 32266225 PMCID: PMC7105632 DOI: 10.3389/fbioe.2020.00183] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/02/2020] [Accepted: 02/24/2020] [Indexed: 12/19/2022] Open
Abstract
Bacteriophages are viruses that infect host bacteria, and they have been applied in the treatment of pathogenic bacterial infections. Phage enzymes and hydrolases play the most important role in the destruction of bacterial cells. Correctly identifying the hydrolases encoded by phages is not only beneficial to the study of their function, but also conducive to antibacterial drug discovery. Thus, this work aims to recognize the enzymes and hydrolases in phages. A combination of different features was used to represent samples of phage enzymes and hydrolases. A feature selection technique, analysis of variance, was applied to optimize the features. The classification was performed using a support vector machine (SVM). The prediction process includes two steps. The first step is to identify phage enzymes. The second step is to determine whether a phage enzyme is a hydrolase or not. The jackknife cross-validated results showed that our method could produce overall accuracies of 85.1% and 94.3%, respectively, for the two predictions, demonstrating that the proposed method is promising.
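Analysis of variance as a feature selection step usually means scoring each feature by its one-way F-statistic (between-class variance over within-class variance) and keeping the top-ranked features. The sketch below illustrates that general idea only; the names `anova_f` and `rank_features` and the toy data are hypothetical, and the abstract does not specify the paper's actual feature encoding or cutoff.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for one feature, given its values split by class."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Assumes ss_within > 0; a feature that is constant within every class would divide by zero.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rank_features(samples, labels):
    """Return feature indices sorted from most to least discriminative by F-score."""
    classes = sorted(set(labels))
    scored = []
    for j in range(len(samples[0])):
        groups = [[s[j] for s, y in zip(samples, labels) if y == c] for c in classes]
        scored.append((anova_f(groups), j))
    return [j for _, j in sorted(scored, reverse=True)]


# Toy data (hypothetical): feature 0 separates the two classes, feature 1 does not.
samples = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 6], [6, 3]]
labels = [0, 0, 0, 1, 1, 1]
print(rank_features(samples, labels))  # [0, 1]
```

The ranked list would then be truncated at whatever feature count gives the best cross-validated accuracy before training the classifier.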
Affiliation(s)
- Hong-Fei Li
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China
  - School of Computer and Information Engineering, Henan Normal University, Henan, China
- Xian-Fang Wang
  - School of Computer and Information Engineering, Henan Normal University, Henan, China
- Hua Tang
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China
178
Chu Y, Nie C, Wang Y. A Pipeline for Reconstructing Somatic Copy Number Alternation's Subclonal Population-Based Next-Generation Sequencing Data. Front Genet 2020; 10:1374. [PMID: 32180789 PMCID: PMC7058119 DOI: 10.3389/fgene.2019.01374] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/16/2019] [Accepted: 12/16/2019] [Indexed: 12/23/2022] Open
Abstract
State-of-the-art next-generation sequencing (NGS)-based subclonal reconstruction methods perform poorly on somatic copy number alternations (SCNAs), not only because they must simultaneously estimate the subclonal population frequency and the absolute copy number of each SCNA, but also because of complex bias and noise in the tumor and its paired normal sequencing data. Both existing NGS-based SCNA detection methods and tools for inferring the subclonal population frequency of SCNAs use the read count ratio (RCR) of the tumor to its paired normal sample as the key feature of tumor sequencing data; however, sequencing error and bias have a great impact on the RCR, which leads to a large number of redundant SCNA segments that make the subsequent inference of SCNA subclonal population frequency and subclonal reconstruction time-consuming and inaccurate. We perform a mathematical analysis of the number of solutions for an SCNA's subclonal frequency, and based on it we propose a computational algorithm to reduce the impact of false breakpoints. We construct a new probability model that incorporates the RCR bias correction algorithm, and by chaining it with the false breakpoint filtering algorithm, we construct a complete pipeline for reconstructing the subclonal populations of SCNAs. The experimental results show that our pipeline outperforms existing subclonal reconstruction programs on both simulated data and TCGA data. Source code is publicly available as a Python package at https://github.com/dustincys/msphy-SCNAClonal.
Affiliation(s)
- Yanshuo Chu
  - Center of Bioinfomatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chenxi Nie
  - Center of Bioinfomatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
  - Center of Bioinfomatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
179
A learning based framework for diverse biomolecule relationship prediction in molecular association network. Commun Biol 2020; 3:118. [PMID: 32170157 PMCID: PMC7070057 DOI: 10.1038/s42003-020-0858-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/24/2019] [Accepted: 02/20/2020] [Indexed: 12/18/2022] Open
Abstract
Abundant life activities are maintained by various biomolecule relationships in human cells. However, many previous computational models focus only on isolated objects, without considering that the cell is a complete entity with ample functions. Inspired by holism, we constructed a Molecular Associations Network (MAN) including 9 kinds of relationships among 5 types of biomolecules, and a prediction model called MAN-GF. More specifically, biomolecules are represented as vectors by an algorithm called biomarker2vec, which combines two kinds of information: attributes learned from k-mers and similar descriptors, and behavior learned by Graph Factorization (GF). Then, a Random Forest classifier is applied for training, validation and testing. MAN-GF achieved an AUC of 0.9647 and an AUPR of 0.9521 under 5-fold cross-validation. The results imply that MAN-GF, with its overall perspective, can serve as an ancillary tool in practice. It also holds promise for providing new insight into regulatory mechanisms. Guo et al. construct a large-scale Molecular Associations Network (MAN) including 9 kinds of associations among 5 types of biomolecules, namely protein, miRNA, lncRNA, disease and drug. They further propose a computational model, MAN-GF, that can predict links between these biomolecules and displays substantial performance under 5-fold cross-validation.
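The k-mer attribute mentioned in this abstract can be illustrated with a generic k-mer frequency encoding, which turns a sequence into a fixed-length numeric vector. This is a minimal sketch of that general technique, not the biomarker2vec implementation; the function name `kmer_frequencies`, the default `k=3`, and the RNA alphabet are assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the normalized k-mer frequency vector of seq.

    Entries follow the order produced by itertools.product over the alphabet
    (AA, AC, AG, AU, CA, ... for k=2), so the vector has len(alphabet) ** k entries.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing other characters (e.g. N) are skipped
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


vec = kmer_frequencies("ACGUAC", k=2)
print(len(vec), vec[1])  # 16 0.4  (AC occurs in 2 of the 5 windows)
```

A vector of this kind could be concatenated with a graph-derived embedding to form the input of a downstream classifier, which is the general pattern the abstract describes.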
180
Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020; 26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]
Abstract
The number of human deaths caused by malaria is increasing day by day. The mitochondrial proteins of the malaria parasite play vital roles in the organism. To develop effective drugs and vaccines against infection, it is necessary to accurately identify the mitochondrial proteins of the malaria parasite. Although biochemical experiments can provide precise details about mitochondrial proteins, they are expensive and time-consuming. In this review, we summarized the machine learning-based methods for mitochondrial protein identification in the malaria parasite and compared the construction strategies of these computational methods. Finally, we also discussed the future development of algorithms for mitochondrial protein recognition.
Affiliation(s)
- Ting Liu
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
- Hua Tang
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
181
Dou L, Li X, Ding H, Xu L, Xiang H. Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem? Mol Ther Nucleic Acids 2020; 19:293-303. [PMID: 31865116 PMCID: PMC6931122 DOI: 10.1016/j.omtn.2019.11.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/08/2019] [Revised: 10/29/2019] [Accepted: 11/11/2019] [Indexed: 01/01/2023]
Abstract
Pseudouridine (Ψ) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Ψ sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Ψ sites, they are expensive and time-consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect the Ψ sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Ψ sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than the iPseU-CUU predictor for the H_990 and S_628 datasets, respectively. Almost the same-level results were obtained for M_994 and an independent H_200 dataset, even showing a 5.0% improvement for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features were selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets were 70.54% and 72.45%, which were 2.39% and 0.65% higher than iPseU-CUU. For the S_200 dataset, accuracy also improved by 8 percentage points, from 69% to 77%. However, there was no obvious improvement for H. sapiens, which was evaluated as approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Ψ identification using BPB features as well as the combined features were not obviously improved. Although some kinds of feature extraction methods based on the RNA sequence information have been applied to construct the predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Ψ modification prediction problem.
Affiliation(s)
- Lijun Dou
  - School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Xiaoling Li
  - Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Hui Ding
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Huaikun Xiang
  - School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
182
Pan W, Sun W, Yang S, Zhuang H, Jiang H, Ju H, Wang D, Han Y. LDL-C plays a causal role on T2DM: a Mendelian randomization analysis. Aging (Albany NY) 2020; 12:2584-2594. [PMID: 32040442 PMCID: PMC7041740 DOI: 10.18632/aging.102763] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/26/2019] [Accepted: 01/12/2020] [Indexed: 06/10/2023]
Abstract
Diabetic dyslipidemia is a common condition in patients with Type 2 diabetes mellitus (T2DM). However, with the increasing use of statins, which mainly decrease low-density lipoprotein cholesterol (LDL-C) levels, clinical trials and meta-analyses have shown a clear increase in the incidence of new-onset diabetes mellitus, partly due to genetic factors. To determine whether a causal relationship exists between LDL-C and T2DM, we conducted a two-sample Mendelian Randomization (MR) analysis using genetic variations as instrumental variables (IVs). Initially, 29 SNPs significantly related to LDL-C (P ≤ 5.0×10⁻⁸) were selected based on results from the study of Henry et al., which processed loci data influencing lipids identified by the Global Lipids Genetics Consortium (GLGC) from 188,577 individuals of European ancestry. Six SNPs related to T2DM (P < 5×10⁻²) were removed, and the remaining 23 SNPs without LD were used as IVs. The combined effect of these 23 SNPs on T2DM, estimated with the penalized robust inverse-variance weighted (IVW) method (Beta 0.24, 95% CI 0.087 to 0.393, P = 0.002), demonstrated that elevated LDL-C levels significantly increased the risk of T2DM. The relationship between LDL-C and Type 1 diabetes mellitus (T1DM) was also assessed, with this analysis producing negative pooled results (Beta -0.202, 95% CI -2.888 to 2.484, P = 0.883).
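The effect estimates, confidence intervals, and P-values reported in this abstract can be cross-checked against one another. The sketch below assumes the intervals are symmetric Wald intervals under a normal approximation, which the abstract does not state; the function name `wald_p_from_ci` is hypothetical. It recovers the standard error from the interval width and then computes a two-sided P-value.

```python
import math


def wald_p_from_ci(beta, ci_low, ci_high, z_crit=1.959964):
    """Recover (standard error, z-score, two-sided p) from an estimate and its 95% CI.

    Assumes a symmetric normal-approximation interval: beta +/- z_crit * se.
    """
    se = (ci_high - ci_low) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of a standard normal
    return se, z, p


# Values as reported in the abstract.
for label, beta, lo, hi in [("T2DM", 0.24, 0.087, 0.393), ("T1DM", -0.202, -2.888, 2.484)]:
    se, z, p = wald_p_from_ci(beta, lo, hi)
    print(f"{label}: se={se:.3f} z={z:.2f} p={p:.3f}")
```

Under these assumptions the recovered P-values come out close to 0.002 for T2DM and 0.883 for T1DM, consistent with the figures in the abstract.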
Affiliation(s)
- Wenbin Pan
  - Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Weiju Sun
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Shuo Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- He Zhuang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Huijie Jiang
  - Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Hong Ju
  - Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, China
- Donghua Wang
  - Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Ying Han
  - Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
183
Zhang D, Huo D, Xie H, Wu L, Zhang J, Liu L, Jin Q, Chen X. CHG: A Systematically Integrated Database of Cancer Hallmark Genes. Front Genet 2020; 11:29. [PMID: 32117445 PMCID: PMC7013921 DOI: 10.3389/fgene.2020.00029] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 07/01/2019] [Accepted: 01/09/2020] [Indexed: 12/20/2022] Open
Abstract
Background: The analysis of cancer diversity based on a logical framework of hallmarks has greatly improved our understanding of the occurrence, development and metastasis of various cancers. Methods: We designed the Cancer Hallmark Genes (CHG) database, which integrates hallmark genes in a systematic, standard way and annotates their potential roles in cancer processes. Following the conceptual description of each hallmark function, the keywords for each hallmark were manually selected from the literature. Candidate hallmark genes were derived from 301 pathways of the KEGG database using Lucene and manually corrected. Results: Based on the variation data, we identified the hallmark genes of various types of cancer and constructed CHG. We also analyzed the relationships among hallmarks, as well as the potential characteristics and relationships of hallmark genes, based on the topological structures of their networks. We manually confirmed the hallmark genes identified by CHG against the literature and databases. We also predicted the prognosis of breast cancer, glioblastoma multiforme and kidney papillary cell carcinoma patients based on CHG data. Conclusions: In summary, CHG, which was constructed based on a hallmark feature set, provides a new perspective for analyzing the diversity and development of cancers.
Affiliation(s)
- Denan Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Diwei Huo
  - The 2nd Affiliated Hospital of Harbin Medical University, Harbin, China
- Hongbo Xie
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lingxiang Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Juan Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lei Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qing Jin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiujie Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
184
Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020; 11:5. [PMID: 32117433 PMCID: PMC7010852 DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022] Open
Abstract
Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, researchers have proposed many gene prioritization algorithms that mine large amounts of biological data for pathogenic genes, overcoming the long experimental periods and high costs inherent in traditional biomedical methods. However, because the currently known gene-disease association matrix is still very sparse and lacks evidence that genes and diseases are unrelated, there are limits to the predictive performance of gene prioritization algorithms. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, compensating for data sparsity. On the other hand, based on the prior knowledge that most of the unknown gene-disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm on the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. PUIMCHIF prioritizes genes better than other methods, such as IMC and CATAPULT.
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jie Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xueping Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ke Han
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing, China
185
Peng L, Liu F, Yang J, Liu X, Meng Y, Deng X, Peng C, Tian G, Zhou L. Probing lncRNA-Protein Interactions: Data Repositories, Models, and Algorithms. Front Genet 2020; 10:1346. [PMID: 32082358 PMCID: PMC7005249 DOI: 10.3389/fgene.2019.01346] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/27/2019] [Accepted: 12/09/2019] [Indexed: 12/31/2022] Open
Abstract
Identifying lncRNA-protein interactions (LPIs) is vital to understanding various key biological processes. Wet-lab experiments have found a few LPIs, but experimental methods are costly and time-consuming. Therefore, computational methods are increasingly exploited to capture LPI candidates. We introduce relevant data repositories and focus on two types of LPI prediction models: network-based methods and machine learning-based methods. Machine learning-based methods include matrix factorization-based techniques and ensemble learning-based techniques. To assess the performance of computational methods, we compared a selection of LPI prediction models using leave-one-out cross-validation (LOOCV) and fivefold cross-validation. The results show that SFPEL-LPI achieved the best AUC. Although computational models have efficiently unraveled some LPI candidates, many limitations remain. We discuss future directions to further boost LPI predictive performance.
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Fuxing Liu
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Jialiang Yang
  - Department of Sciences, Genesis (Beijing) Co. Ltd., Beijing, China
- Xiaojun Liu
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yajie Meng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiaojun Deng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Cheng Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Geng Tian
  - Department of Sciences, Genesis (Beijing) Co. Ltd., Beijing, China
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
|
186
|
Wang C, Zhao N, Yuan L, Liu X. Computational Detection of Breast Cancer Invasiveness with DNA Methylation Biomarkers. Cells 2020; 9:E326. [PMID: 32019269 PMCID: PMC7072524 DOI: 10.3390/cells9020326] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/02/2020] [Revised: 01/28/2020] [Accepted: 01/28/2020] [Indexed: 12/14/2022] Open
Abstract
Breast cancer is the most common female malignancy. It has high mortality, primarily due to metastasis and recurrence. Patients with invasive and noninvasive breast cancer require different treatments, so there is an urgent need for predictive tools to guide clinical decision making and avoid overtreatment of noninvasive breast cancer and undertreatment of invasive cases. Here, we divided the sample set based on the genome-wide methylation distance to make full use of metastatic cancer data. Specifically, we implemented two differential methylation analysis methods to identify specific CpG sites. After effective dimensionality reduction, we constructed a methylation-based classifier using the Random Forest algorithm to categorize the primary breast cancer. We took advantage of breast cancer (BRCA) HM450 DNA methylation data and accompanying clinical data from The Cancer Genome Atlas (TCGA) database to validate the performance of the classifier. Overall, this study demonstrates DNA methylation as a potential biomarker to predict breast tumor invasiveness and as a possible parameter that could be included in the studies aiming to predict breast cancer aggressiveness. However, more comparative studies are needed to assess its usability in the clinic. Towards this, we developed a website based on these algorithms to facilitate its use in studies and predictions of breast cancer invasiveness.
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Ning Zhao
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Linlin Yuan
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Xiaoyan Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150080, China
|
187
|
Zhao S, Jiang H, Liang ZH, Ju H. Integrating Multi-Omics Data to Identify Novel Disease Genes and Single-Neucleotide Polymorphisms. Front Genet 2020; 10:1336. [PMID: 32038707 PMCID: PMC6993083 DOI: 10.3389/fgene.2019.01336] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/15/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022] Open
Abstract
Stroke ranks as the second leading cause of death among people over the age of 60 worldwide. Stroke is widely regarded as a complex disease that is affected by genetic and environmental factors. Evidence from twin and family studies suggests that genetic factors may play an important role in its pathogenesis. Therefore, research on the genetic association of susceptibility genes can help in understanding the mechanism of stroke. Genome-wide association studies (GWAS) have found a large number of stroke-related loci, but their mechanisms are unknown. To explore the function of single-nucleotide polymorphisms (SNPs) at the molecular level, we integrated 8 GWAS datasets with a brain expression quantitative trait loci (eQTL) dataset to identify SNPs and genes related to four types of stroke (ischemic stroke, large artery stroke, cardioembolic stroke, small vessel stroke). Thirty-eight SNPs that can affect the expression of 14 genes were found to be associated with stroke. Among these 14 genes, the expression of 10 is associated with ischemic stroke, one with large artery stroke, six with cardioembolic stroke, and eight with small vessel stroke. To explore the effects of environmental factors on stroke, we identified methylation susceptibility loci associated with stroke using methylation quantitative trait loci (MQTL). Thirty-one of these 38 SNPs are at greater risk of methylation and can significantly change gene expression levels. Overall, the genetic pathogenesis of stroke is explored from locus to gene, gene to gene expression, and gene expression to phenotype.
Affiliation(s)
- Sheng Zhao
  - Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Huijie Jiang
  - Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Zong-Hui Liang
  - Department of Radiology, Jian'an District Centre Hospital of Fudan University, Shanghai, China
- Hong Ju
  - Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, China
|
188
|
Zhou W, Yang F, Xu Z, Luo M, Wang P, Guo Y, Nie H, Yao L, Jiang Q. Comprehensive Analysis of Copy Number Variations in Kidney Cancer by Single-Cell Exome Sequencing. Front Genet 2020; 10:1379. [PMID: 32038722 PMCID: PMC6989475 DOI: 10.3389/fgene.2019.01379] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/23/2019] [Accepted: 12/17/2019] [Indexed: 12/16/2022] Open
Abstract
Clear-cell renal cell carcinoma (ccRCC) is the most common and lethal subtype of kidney cancer. VHL and PBRM1 are the top two significantly mutated genes in ccRCC specimens, while the genetic mechanism of the VHL/PBRM1-negative ccRCC remains to be elucidated. Here we carried out a comprehensive analysis of single-cell genomic copy number variations (CNVs) in VHL/PBRM1-negative ccRCC. Genomic CNVs were identified at the single-cell level, and the tumor cells showed widespread amplification and deletion across the whole genome. Functional enrichment analysis indicated that the amplified genes are significantly enriched in cancer-related signaling transduction pathways. Besides, receptor protein tyrosine kinase (RTK) genes also showed widespread copy number variations in cancer cells. Our studies indicated that the genomic CNVs in RTK genes and downstream signaling transduction pathways may be involved in VHL/PBRM1-negative ccRCC pathogenesis and progression, and highlighted the role of the comprehensive investigation of genomic CNVs at the single-cell level in both clarifying pathogenic mechanism and identifying potential therapeutic targets in cancers.
Affiliation(s)
- Wenyang Zhou
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Fan Yang
  - Department of Neurology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Zhaochun Xu
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Meng Luo
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Pingping Wang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yu Guo
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Huan Nie
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Lifen Yao
  - Department of Neurology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
|
189
|
Wang T, Peng Q, Liu B, Liu X, Liu Y, Peng J, Wang Y. eQTLMAPT: Fast and Accurate eQTL Mediation Analysis With Efficient Permutation Testing Approaches. Front Genet 2020; 10:1309. [PMID: 31998368 PMCID: PMC6970436 DOI: 10.3389/fgene.2019.01309] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/08/2019] [Accepted: 11/27/2019] [Indexed: 12/21/2022] Open
Abstract
Expression quantitative trait locus (eQTL) analyses are critical in understanding the complex functional regulatory nature of genetic variation and have been widely used in the interpretation of disease-associated variants identified by genome-wide association studies (GWAS). Emerging evidence has shown that trans-eQTL effects on remote gene expression can be mediated by local transcripts, which is known as the mediation effect. To discover genome-wide eQTL mediation effects by combining genomic and transcriptomic profiles, it is necessary to develop novel computational methods that rapidly scan a large number of candidate associations while controlling for multiple testing appropriately. Here, we present eQTLMAPT, an R package that performs eQTL mediation analysis with efficient permutation procedures for multiple testing correction. eQTLMAPT has three advantages. First, it accelerates mediation analysis by effectively pruning the permutation process through an adaptive permutation scheme. Second, it can efficiently and accurately estimate the significance level of mediation effects by modeling the null distribution with a generalized Pareto distribution (GPD) trained on a few permutation statistics. Third, eQTLMAPT provides flexible interfaces for users to combine various permutation schemes with different confounding adjustment methods. Experiments on a real eQTL dataset demonstrate that eQTLMAPT provides higher resolution of the estimated significance of mediation effects and is an order of magnitude faster than the compared methods, with similar accuracy.
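The idea behind an adaptive permutation scheme can be sketched as follows. This is not eQTLMAPT's implementation (the package is written in R); it is a generic early-stopping permutation test in the style of sequential Monte Carlo p-values. The test statistic (absolute difference in group means) and the names `mean_diff`, `max_perm`, and `min_exceed` are assumptions made for illustration, and the GPD tail-fitting step is not covered.

```python
import random


def mean_diff(values, labels):
    """Hypothetical test statistic: absolute difference in group means."""
    a = [v for v, l in zip(values, labels) if l]
    b = [v for v, l in zip(values, labels) if not l]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def adaptive_permutation_pvalue(values, labels, max_perm=10000,
                                min_exceed=10, seed=0):
    """Permutation p-value with early stopping.

    Permuting stops as soon as `min_exceed` permuted statistics reach the
    observed one: at that point the association is clearly not significant
    and further permutations would be wasted. Returns (p_value, n_perms).
    """
    rng = random.Random(seed)
    observed = mean_diff(values, labels)
    shuffled = list(labels)
    exceed = 0
    for n in range(1, max_perm + 1):
        rng.shuffle(shuffled)
        if mean_diff(values, shuffled) >= observed:
            exceed += 1
            if exceed >= min_exceed:
                return exceed / n, n  # stopped early
    # Budget exhausted: use the usual add-one permutation estimate.
    return (exceed + 1) / (max_perm + 1), max_perm


if __name__ == "__main__":
    values = [0.0] * 10 + [10.0] * 10
    labels = [0] * 10 + [1] * 10
    print(adaptive_permutation_pvalue(values, labels, max_perm=500))
```

Most candidate associations in a genome-wide scan are null, so they stop after a few dozen permutations, and the full permutation budget is spent only on the small number of promising candidates.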
Affiliation(s)
- Tao Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Qidi Peng
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Bo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiaoli Liu
  - Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Yongzhuang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
190
|
Zhao L, Wang J, Pang L, Liu Y, Zhang J. GANsDTA: Predicting Drug-Target Binding Affinity Using GANs. Front Genet 2020; 10:1243. [PMID: 31993067 PMCID: PMC6962343 DOI: 10.3389/fgene.2019.01243] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 08/23/2019] [Accepted: 11/11/2019] [Indexed: 01/09/2023] Open
Abstract
The computational prediction of interactions between drugs and targets is a standing challenge in drug discovery. State-of-the-art methods for drug-target interaction prediction are primarily based on supervised machine learning with known label information. However, in biomedicine, obtaining labeled training data is an expensive and laborious process. This paper proposes a semi-supervised method based on generative adversarial networks (GANs) to predict binding affinity. Our method comprises two parts: two GANs for feature extraction and a regression network for prediction. The semi-supervised mechanism allows our model to learn protein and drug features from both labeled and unlabeled data. We evaluate the performance of our method using multiple public datasets. Experimental results demonstrate that our method achieves competitive performance while utilizing freely available unlabeled data. Our results suggest that utilizing such unlabeled data can considerably improve performance in various biomedical relation extraction tasks, for example, drug-target interaction and protein-protein interaction, particularly when only limited labeled data are available. To the best of our knowledge, this is the first semi-supervised GANs-based method to predict binding affinity.
Affiliation(s)
- Lingling Zhao
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Junjie Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Long Pang
  - Institute of Space Environment and Material Science, Harbin Institute of Technology, Harbin, China
- Yang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jun Zhang
  - Department of Rehabilitation, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
|
191
|
Lin Y, Liu T, Cui T, Wang Z, Zhang Y, Tan P, Huang Y, Yu J, Wang D. RNAInter in 2020: RNA interactome repository with increased coverage and annotation. Nucleic Acids Res 2020; 48:D189-D197. [PMID: 31906603 PMCID: PMC6943043 DOI: 10.1093/nar/gkz804] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Received: 08/08/2019] [Revised: 09/03/2019] [Accepted: 09/10/2019] [Indexed: 01/23/2023] Open
Abstract
Research on RNA-associated interactions has exploded in recent years, and increasing numbers of studies are not limited to RNA-RNA and RNA-protein interactions but also include RNA-DNA/compound interactions. To facilitate the development of the interactome and promote understanding of the biological functions and molecular mechanisms of RNA, we updated RAID v2.0 to RNAInter (RNA Interactome Database), a repository for RNA-associated interactions that is freely accessible at http://www.rna-society.org/rnainter/ or http://www.rna-society.org/raid/. Compared to RAID v2.0, new features in RNAInter include (i) 8-fold more interaction data and 94 additional species; (ii) more definite annotations organized, including RNA editing/localization/modification/structure and homology interaction; (iii) advanced functions including fuzzy/batch search, interaction network and RNA dynamic expression and (iv) four embedded RNA interactome tools: RIscoper, IntaRNA, PRIdictor and DeepBind. Consequently, RNAInter contains >41 million RNA-associated interaction entries, involving more than 450 thousand unique molecules, including RNA, protein, DNA and compound. Overall, RNAInter provides a comprehensive RNA interactome resource for researchers and paves the way to investigate the regulatory landscape of cellular RNAs.
Affiliation(s)
- Yunqing Lin
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Tianyuan Liu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Tianyu Cui
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Zhao Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yuncong Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Puwen Tan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Yan Huang
  - Shunde Hospital, Southern Medical University (The First People's Hospital of Shunde), Foshan 528308, China
- Jia Yu
  - State Key Laboratory of Medical Molecular Biology, Department of Biochemistry & Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) & Peking Union Medical College (PUMC), Beijing 100730, China
- Dong Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - Shunde Hospital, Southern Medical University (The First People's Hospital of Shunde), Foshan 528308, China
  - Dermatology Hospital, Southern Medical University, Guangzhou 510091, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
  - To whom correspondence should be addressed. Tel: +86 20 61648279; Fax: +86 20 61648279; or
|
192
|
Ru X, Cao P, Li L, Zou Q. Selecting Essential MicroRNAs Using a Novel Voting Method. Mol Ther Nucleic Acids 2019; 18:16-23. [PMID: 31479921 PMCID: PMC6727015 DOI: 10.1016/j.omtn.2019.07.019] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/28/2019] [Revised: 06/20/2019] [Accepted: 07/08/2019] [Indexed: 02/06/2023]
Abstract
Among the large number of known microRNAs (miRNAs), some miRNAs play negligible roles in cell regulation. Therefore, selecting essential miRNAs is an important initial step for a deeper understanding of miRNAs and their functions. In this study, we generated 60 classification models by combining 12 representative feature extraction methods and 5 commonly used classification algorithms. The optimal model for essential miRNA classification that we obtained is based on the Mismatch feature extraction method combined with the random forest algorithm. The F-Measure, area under the curve, and accuracy values of this model were 93.2%, 96.7%, and 93.0%, respectively. We also found that the distribution of the positive and negative examples of the first few features greatly influenced the classification results. The feature extraction methods performed best when the differences between the positive and negative examples were obvious, and this led to better classification of essential miRNAs. Because each classifier's predictions for the same sample may be different, we employed a novel voting method to improve the accuracy of the classification of essential miRNAs. The performance results showed that the best classification results were obtained when five classification models were used in the voting. The five classification models were constructed based on the Mismatch, pseudo-distance structure status pair composition, Subsequence, Kmer, and Triplet feature extraction methods. The voting result was 95.3%. Our results suggest that the voting method can be an important tool for selecting essential miRNAs.
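Combining the predictions of several classifiers by voting can be sketched as below. The abstract does not spell out the details of the authors' novel voting scheme, so this shows plain majority voting only; the function name `majority_vote` and the toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    `predictions` holds one list of labels per model, all of equal length.
    With an odd number of binary classifiers there are no ties; otherwise
    ties go to the label that appears first among the votes.
    """
    voted = []
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


if __name__ == "__main__":
    # Five hypothetical models, four samples, 1 = essential miRNA.
    model_outputs = [
        [1, 0, 1, 1],
        [1, 0, 0, 1],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
    ]
    print(majority_vote(model_outputs))
```

Using an odd number of models, such as the five reported in the abstract, avoids tied votes for a binary label.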
Affiliation(s)
- Xiaoqing Ru
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Peigang Cao
  - Department of Cardiology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Lihong Li
  - School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
193
|
Zhong W, Zhong B, Zhang H, Chen Z, Chen Y. Identification of Anti-cancer Peptides Based on Multi-classifier System. Comb Chem High Throughput Screen 2019; 22:694-704. [PMID: 31793417 DOI: 10.2174/1386207322666191203141102] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/20/2019] [Revised: 07/18/2019] [Accepted: 07/30/2019] [Indexed: 01/01/2023]
Abstract
AIMS AND OBJECTIVE: Cancer is one of the deadliest diseases, taking the lives of millions every year. Traditional methods of treating cancer are expensive and toxic to normal cells. Fortunately, anti-cancer peptides (ACPs) can eliminate this side effect. However, identifying and developing new anti-cancer peptides through experiments takes a lot of time and money, so it is necessary to develop a fast and accurate computational model to identify anti-cancer peptides. Machine learning algorithms are a good choice. MATERIALS AND METHODS: In our study, a multi-classifier system combining multiple machine learning models was used to predict anti-cancer peptides. The individual learners are built from different feature information and algorithms, and form a multi-classifier system by voting. RESULTS AND CONCLUSION: The experiments show that the overall prediction rate of each individual learner is above 80% and that the overall accuracy of the multi-classifier system for anti-cancer peptide prediction can reach 95.93%, which is better than existing prediction models.
Affiliation(s)
- Wanben Zhong
  - School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
- Bineng Zhong
  - School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, 210094, China
- Hongbo Zhang
  - School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
- Ziyi Chen
  - School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
- Yan Chen
  - School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
|
194
|
Taxonomy dimension reduction for colorectal cancer prediction. Comput Biol Chem 2019; 83:107160. [DOI: 10.1016/j.compbiolchem.2019.107160] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/24/2019] [Revised: 11/02/2019] [Accepted: 11/04/2019] [Indexed: 02/01/2023]
|
195
|
Guo ZH, You ZH, Yi HC. Integrative Construction and Analysis of Molecular Association Network in Human Cells by Fusing Node Attribute and Behavior Information. Mol Ther Nucleic Acids 2019; 19:498-506. [PMID: 31923739 PMCID: PMC6951835 DOI: 10.1016/j.omtn.2019.10.046] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/28/2019] [Revised: 10/07/2019] [Accepted: 10/21/2019] [Indexed: 11/27/2022]
Abstract
Detecting whether a pair of biomolecules associate is of great significance in the study of molecular biology. Hence, computational methods are urgently needed as guidance for practice. However, most previous prediction models, influenced by reductionism, focused on isolated research objects and therefore have inherent defects. Inspired by holism, a machine-learning-based framework called MAN-node2vec is proposed to predict multi-type relationships in the molecular associations network (MAN). Specifically, we constructed a large-scale MAN composed of 1,023 miRNAs, 1,649 proteins, 769 long non-coding RNAs (lncRNAs), 1,025 drugs, and 2,062 diseases. Then, each biomolecule in the MAN is represented as a vector that combines its attributes, learned by k-mer and similar encodings, with its behavior, learned by node2vec. Finally, a random forest classifier is applied to carry out the relationship prediction task. The proposed model achieved reliable performance, with an area under the curve (AUC) of 0.9677 and an area under the precision-recall curve (AUPR) of 0.9562 under 5-fold cross-validation. Additional experiments showed that the proposed global model is more competitive than the traditional local method. These results provide systematic insight into the synergistic interactions between various molecules and diseases. It is anticipated that this work can inspire and advance related systems biology and biomedical research.
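A k-mer attribute vector of the kind mentioned in this abstract can be sketched as follows. The abstract does not give the exact encoding used in MAN-node2vec, so the choice of k = 3, the RNA alphabet, and the function name `kmer_frequencies` are assumptions for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return the normalized k-mer frequency vector of a sequence.

    The vector has len(alphabet) ** k entries, ordered lexicographically
    by the alphabet order. Windows containing characters outside the
    alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    vec = kmer_frequencies("ACGUACGUAGC", k=3)
    print(len(vec), sum(vec))
```

Such a fixed-length vector can be concatenated with a network embedding of the same node before being passed to a classifier.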
Affiliation(s)
- Zhen-Hao Guo
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Hai-Cheng Yi
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
|
196
|
Hou H, Gan T, Yang Y, Zhu X, Liu S, Guo W, Hao J. Using deep reinforcement learning to speed up collective cell migration. BMC Bioinformatics 2019; 20:571. [PMID: 31760946 PMCID: PMC6876083 DOI: 10.1186/s12859-019-3126-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND: Collective cell migration is a significant and complex phenomenon that affects many basic biological processes. The coordination between leader cells and follower cells affects the rate of collective cell migration. However, there are still very few papers on the impact of the stimulus signal released by the leader on the follower. Tracking cell movement using 3D time-lapse microscopy images provides an unprecedented opportunity to systematically study and analyze collective cell migration. RESULTS: Recently, deep reinforcement learning algorithms have become very popular. In our paper, we also use this method to train the number of cells and control signals. By experimenting with single-follower and multi-follower cells, we conclude that the number of stimulation signals is proportional to the rate of collective cell movement. Such research provides a more diverse approach to studying biological problems. CONCLUSION: Traditional research methods are always based on real-life scenarios, but as the number of cells grows exponentially, the research process becomes too time-consuming. Agent-based modeling is a robust framework that approximates cells as isotropic, elastic, and sticky objects. In this paper, an agent-based modeling framework is used to establish a simulation platform for collective cell migration. The goal of the platform is to build a biomimetic environment to demonstrate the importance of stimuli between the leading and following cells.
Affiliation(s)
- Hanxu Hou
  - School of Electrical Engineering & Intelligentization, Dongguan University of Technology, No.1 University Road, DongGuan, 523808 China
- Tian Gan
  - College of Intelligence and Computing, TianJin University, No.135 Yaguan Road, TianJin, 300350 China
- Yaodong Yang
  - College of Intelligence and Computing, TianJin University, No.135 Yaguan Road, TianJin, 300350 China
- Xianglei Zhu
  - Automotive Data Center, CATARC, No.69 Xianfeng Road, TianJin, 300300 China
- Sen Liu
  - Automotive Data Center, CATARC, No.69 Xianfeng Road, TianJin, 300300 China
- Weiming Guo
  - Automotive Data Center, CATARC, No.69 Xianfeng Road, TianJin, 300300 China
- Jianye Hao
  - School of Electrical Engineering & Intelligentization, Dongguan University of Technology, No.1 University Road, DongGuan, 523808 China
|
197
|
Abstract
BACKGROUND: With the development of e-Health, predicting whether a doctor's answer will be accepted by a patient in an online healthcare community is becoming increasingly important. Unlike previous work, which focuses mainly on numerical features, our framework combines both numerical and textual information to predict the acceptance of answers. The textual information is composed of questions posted by patients and answers posted by doctors. To extract textual features from them, we first trained a sentence encoder on a held-out dataset to encode a question-answer pair into a co-dependent representation. After that, we use it to predict the acceptance of doctors' answers. RESULTS: Our experimental results on a real-world dataset demonstrate that by applying our model, additional features can be extracted from text and the prediction becomes more accurate. That is, the model that takes both textual and numerical features as input performs significantly better than the model that takes only numerical features on all four metrics (Accuracy, AUC, F1-score and Recall). CONCLUSIONS: This work proposes a generic framework combining numerical and textual features for acceptance prediction, in which textual features are first extracted from text using deep learning methods and then used to achieve better prediction results.
Affiliation(s)
- Qianlong Liu
  - School of Data Science, Fudan University, Handan Road, Shanghai, China
  - Jockey Club School of Public Health and Primary Care The Chinese University of Hong Kong, Hong Kong, China
- Kangenbei Liao
  - School of Data Science, Fudan University, Handan Road, Shanghai, China
  - Jockey Club School of Public Health and Primary Care The Chinese University of Hong Kong, Hong Kong, China
- Kelvin Kam-fai Tsoi
  - Jockey Club School of Public Health and Primary Care The Chinese University of Hong Kong, Hong Kong, China
- Zhongyu Wei
  - School of Data Science, Fudan University, Handan Road, Shanghai, China
|
198
|
Peng J, Lu G, Xue H, Wang T, Shang X. TS-GOEA: a web tool for tissue-specific gene set enrichment analysis based on gene ontology. BMC Bioinformatics 2019; 20:572. [PMID: 31760951 PMCID: PMC6876092 DOI: 10.1186/s12859-019-3125-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND: The Gene Ontology (GO) knowledgebase is the world's largest source of information on the functions of genes. Since the beginning of the GO project, various tools have been developed to perform GO enrichment analysis. GO enrichment analysis has become a commonly used method of gene function analysis. Existing GO enrichment analysis tools do not consider tissue-specific information, although this information is very important to current research. RESULTS: In this paper, we built an easy-to-use web tool called TS-GOEA that allows users to perform tissue-specific GO enrichment analysis. TS-GOEA uses a strict-threshold statistical method for GO enrichment analysis and provides statistical tests to improve the reliability of the analysis results. TS-GOEA also provides tools for comparing different experimental results. To evaluate its performance, we used TS-GOEA to analyze genes associated with platelet disease. CONCLUSIONS: TS-GOEA is an effective GO analysis tool with unique features. The experimental results show that our method performs better and provides a useful supplement to existing GO enrichment analysis tools. TS-GOEA is available at http://120.77.47.2:5678.
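GO enrichment is commonly scored with a one-sided hypergeometric test. The abstract does not state which test TS-GOEA uses, so the sketch below is an assumption about the general technique rather than a description of the tool; the function and parameter names are hypothetical. A tissue-specific variant could, for instance, restrict the background population to genes expressed in the tissue of interest, which is likewise an assumption.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value for GO term enrichment.

    population: number of background genes
    annotated:  background genes annotated with the GO term
    sample:     size of the query gene list
    overlap:    query genes annotated with the GO term
    Returns P(X >= overlap) for X ~ Hypergeometric(population, annotated, sample).
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, sample - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 4 annotated, query of 3, all 3 annotated.
    print(hypergeom_enrichment_pvalue(10, 4, 3, 3))
```

When many GO terms are tested at once, the resulting p-values would normally also be corrected for multiple testing.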
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
- Guilin Lu
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
- Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
- Tao Wang
- School of Computer Science, Harbin Institute of Technology, Harbin, 150001 China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
|
199
|
Abstract
Background: Alzheimer’s disease (AD) imposes a heavy burden on society and on families. Diagnosing AD early and discovering new drug targets are therefore crucial, and both can be advanced by identifying AD-related proteins. Because biological experiments are time-consuming and costly, researchers have turned to developing more advanced algorithms to identify AD-related proteins. Results: We first proposed the hypothesis that “similar diseases share similar related proteins”. Five similarity calculation methods are then introduced to find other diseases that are similar to AD. The proteins related to these diseases can be obtained from public data sets. Finally, these proteins serve as features of each disease and can be used to map each disease's similarity to AD. We developed a novel method, ‘LRRGD’, which combines Logistic Regression (LR) and Gradient Descent (GD) and borrows the idea of Random Forest (RF). LR is introduced to regress features to similarities. Borrowing the idea of RF, hundreds of LR models are built by randomly selecting 40 features (proteins) each time. GD is used to find the optimal result. To avoid getting trapped in a local optimum, a good initial value is selected using some known AD-related proteins. In total, 376 proteins are found to be related to AD. Conclusion: Of the 376 proteins, 308 are novel. Three case studies demonstrate the method’s effectiveness. These 308 proteins give researchers a basis for biological experiments that may aid the treatment and diagnosis of AD.
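The combination described in the abstract (logistic regression fitted by gradient descent, repeated over random subsets of 40 features, with initial weights informed by known AD-related proteins) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' LRRGD code: the learning rate, step count, the initial weight of 1.0 for known proteins, and the averaging of weights into a per-feature score are all assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_lr_gd(X, y, init_w, lr=0.1, steps=200):
    """Fit logistic regression by batch gradient descent on cross-entropy loss.

    y may hold soft targets in [0, 1], such as disease similarities to AD.
    """
    w = list(init_w)
    b = 0.0
    n = len(X)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def random_subspace_lr(X, y, n_models=100, subset_size=40, known=(), seed=0):
    """Train many LR models on random feature subsets and average their weights.

    known: indices of features assumed to be known AD-related proteins; they
    start at weight 1.0 instead of 0.0 (an assumed initialisation scheme).
    Returns one averaged weight per feature as a hypothetical relevance score.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    subset_size = min(subset_size, n_features)
    totals = [0.0] * n_features
    counts = [0] * n_features
    for _ in range(n_models):
        cols = rng.sample(range(n_features), subset_size)
        X_sub = [[row[c] for c in cols] for row in X]
        init = [1.0 if c in known else 0.0 for c in cols]
        w, _ = fit_lr_gd(X_sub, y, init)
        for c, wc in zip(cols, w):
            totals[c] += wc
            counts[c] += 1
    return [t / k if k else 0.0 for t, k in zip(totals, counts)]
```

Here each row of `X` would be a disease, each column a protein, and `y` the disease's similarity to AD; features with consistently large averaged weights would be the candidates for follow-up.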
Affiliation(s)
- Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
|
200
|
Wang Z, Wang Y. Extracting a biologically latent space of lung cancer epigenetics with variational autoencoders. BMC Bioinformatics 2019; 20:568. [PMID: 31760935 PMCID: PMC6876071 DOI: 10.1186/s12859-019-3130-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/14/2022] Open
Abstract
Background: Lung cancer is one of the most malignant tumors, causing over 1,000,000 deaths each year worldwide. Deep learning has brought success in many domains in recent years. DNA methylation, an epigenetic factor, is used for model training in many studies. There is an opportunity for deep learning methods to analyze lung cancer epigenetic data and determine its subtypes for appropriate treatment. Results: Here, we employ variational autoencoders (VAEs), an unsupervised deep learning framework, on 450K DNA methylation data of TCGA-LUAD and TCGA-LUSC to learn latent representations of the DNA methylation landscape. We extract a biologically relevant latent space of LUAD and LUSC samples. We show that bivariate classifiers on the further compressed latent features can classify the subtypes accurately. Through clustering of methylation-based latent space features, we demonstrate that the VAEs can capture differential methylation patterns of lung cancer subtypes. Conclusions: VAEs can distinguish the original subtypes in a manually mixed methylation data frame using the encoded features of the latent space. Further applications of VAEs should focus on fine-grained subtype identification for precision medicine.
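For readers unfamiliar with VAEs, the quantity such a model minimises can be sketched as below: a reconstruction term plus a KL divergence that keeps the latent code close to a standard normal, with sampling done through the reparameterization trick. This is a generic illustration and not the paper's implementation. The Bernoulli reconstruction loss is an assumption (chosen because methylation beta values lie in [0, 1]), and the encoder and decoder networks that would produce `mu`, `log_var` and `x_hat` are omitted.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so z stays differentiable in mu and sigma."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def bernoulli_recon_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy between input values x and reconstruction x_hat, both in [0, 1]."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total


def negative_elbo(x, x_hat, mu, log_var):
    """Per-sample VAE loss: reconstruction error plus KL regulariser."""
    return bernoulli_recon_loss(x, x_hat) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.2, -0.3], [0.0, 0.0]  # hypothetical encoder outputs
    z = reparameterize(mu, log_var, rng)   # latent code that a decoder would consume
    x = [0.8, 0.1, 0.5]                    # hypothetical methylation beta values
    x_hat = [0.7, 0.2, 0.5]                # hypothetical decoder output
    print(z, negative_elbo(x, x_hat, mu, log_var))
```

The latent means `mu` are what would typically be clustered or passed to a downstream classifier once training has finished.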
Affiliation(s)
- Zhenxing Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
|