1
Yu S, Liu L, Wang H, Yan S, Zheng S, Ning J, Luo R, Fu X, Deng X. AtML: An Arabidopsis thaliana root cell identity recognition tool for medicinal ingredient accumulation. Methods 2024; 231:61-69. [PMID: 39293728 DOI: 10.1016/j.ymeth.2024.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Revised: 08/05/2024] [Accepted: 09/12/2024] [Indexed: 09/20/2024] Open
Abstract
Arabidopsis thaliana synthesizes various medicinal compounds, and serves as a model plant for medicinal plant research. Single-cell transcriptomics technologies are essential for understanding the developmental trajectory of plant roots, facilitating the analysis of synthesis and accumulation patterns of medicinal compounds in different cell subpopulations. Although methods for interpreting single-cell transcriptomics data are rapidly advancing in Arabidopsis, challenges remain in precisely annotating cell identity due to the lack of marker genes for certain cell types. In this work, we trained a machine learning system, AtML, using sequencing datasets from six cell subpopulations, comprising a total of 6000 cells, to predict Arabidopsis root cell stages and identify biomarkers through complete model interpretability. Performance testing using an external dataset revealed that AtML achieved 96.50% accuracy and 96.51% recall. Through the interpretability provided by AtML, our model identified 160 important marker genes, contributing to the understanding of cell type annotations. In conclusion, we trained AtML to efficiently identify Arabidopsis root cell stages, providing a new tool for elucidating the mechanisms of medicinal compound accumulation in Arabidopsis roots.
Affiliation(s)
- Shicong Yu
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Lijia Liu
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hao Wang
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shen Yan
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shuqin Zheng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Jing Ning
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Ruxian Luo
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Xiangzheng Fu
  - Research Institute of Hunan University in Chongqing, Chongqing 401120, China.
- Xiaoshu Deng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China; Chongqing Academy of Chinese Materia Medica, Chongqing 400065, China.
2
Jin W, Jia J, Si Y, Liu J, Li H, Zhu H, Wu Z, Zuo Y, Yu L. Identification of Key lncRNAs Associated with Immune Infiltration and Prognosis in Gastric Cancer. Biochem Genet 2024:10.1007/s10528-024-10801-w. [PMID: 38658494 DOI: 10.1007/s10528-024-10801-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 04/05/2024] [Indexed: 04/26/2024]
Abstract
Long non-coding RNAs (lncRNAs), as promising novel biomarkers for cancer treatment and prognosis, can function as tumor suppressors and oncogenes in the occurrence and development of many types of cancer, including gastric cancer (GC). However, little is known about the complex regulatory system of lncRNAs in GC. In this study, we systematically analyzed lncRNA and miRNA transcriptomic profiles of GC based on bioinformatics methods and experimental validation. An lncRNA-miRNA interaction network related to GC was constructed, and nine crucial lncRNAs were identified. These nine lncRNAs were found to be associated with the prognosis of GC patients by Cox proportional hazards regression analysis. Among them, the expression of lncRNA SNHG14 can affect the survival of GC patients as a potential prognostic marker. Moreover, it was shown that SNHG14 was involved in immune-related pathways and significantly correlated with immune cell infiltration in GC. Meanwhile, we found that SNHG14 affected immune function in many cancers, such as breast cancer and esophageal carcinoma. These findings suggest that SNHG14 may serve as a potential target for cancer immunotherapy. Our study may also offer practical and theoretical guidance for the clinical application of non-coding RNAs.
Affiliation(s)
- Wen Jin
  - Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China
- Jianchao Jia
  - Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China
- Yangming Si
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Jianli Liu
  - School of Water Resource and Environment Engineering, China University of Geosciences, Beijing, 100083, China
- Hanshuang Li
  - College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Hao Zhu
  - Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China
- Zhouying Wu
  - Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China
- Yongchun Zuo
  - College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
  - Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot, 010010, China.
  - Inner Mongolia International Mongolian Hospital, Hohhot, 010065, China.
- Lan Yu
  - Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China.
  - Department of Endocrine and Metabolic Diseases, Inner Mongolia People's Hospital, Hohhot, 010010, China.
3
Wang H, Lin YN, Yan S, Hong JP, Tan JR, Chen YQ, Cao YS, Fang W. NRTPredictor: identifying rice root cell state in single-cell RNA-seq via ensemble learning. Plant Methods 2023; 19:119. [PMID: 37925413 PMCID: PMC10625708 DOI: 10.1186/s13007-023-01092-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 10/15/2023] [Indexed: 11/06/2023]
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) measurements of gene expression show great promise for studying the cellular heterogeneity of rice roots. Annotating cell identity precisely remains a major unresolved problem in plant scRNA-seq analysis because of the inherent high dimensionality and sparsity of the data. RESULTS: To address this challenge, we present NRTPredictor, an ensemble-learning system, to predict rice root cell stage and mine biomarkers through complete model interpretability. The performance of NRTPredictor was evaluated using a test dataset, with 98.01% accuracy and 95.45% recall. With the power of interpretability provided by NRTPredictor, our model recognizes 110 marker genes partially involved in phenylpropanoid biosynthesis. Expression patterns of rice root could be mapped by the above-mentioned candidate genes, showing the superiority of NRTPredictor. Integrated analysis of scRNA and bulk RNA-seq data revealed aberrant expression of epidermis cell subpopulations under flooding, Pi, and salt stresses. CONCLUSION: Taken together, our results demonstrate that NRTPredictor is a useful tool for automated prediction of rice root cell stage and provides a valuable resource for deciphering the rice root cellular heterogeneity and the molecular mechanisms of flooding, Pi, and salt stresses. Based on the proposed model, a free webserver has been established, which is available at https://www.cgris.net/nrtp.
Affiliation(s)
- Hao Wang
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yu-Nan Lin
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Shen Yan
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jing-Peng Hong
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jia-Rui Tan
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yan-Qing Chen
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
- Yong-Sheng Cao
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
- Wei Fang
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
4
Wu C, Guo D. Identification of Two Flip-Over Genes in Grass Family as Potential Signature of C4 Photosynthesis Evolution. Int J Mol Sci 2023; 24:14165. [PMID: 37762466 PMCID: PMC10531853 DOI: 10.3390/ijms241814165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 09/05/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023] Open
Abstract
In flowering plants, C4 photosynthesis is superior to C3 type in carbon fixation efficiency and adaptation to extreme environmental conditions, but the mechanisms behind the assembly of C4 machinery remain elusive. This study attempts to dissect the evolutionary divergence from C3 to C4 photosynthesis in five photosynthetic model plants from the grass family, using a combined comparative transcriptomics and deep learning technology. By examining and comparing gene expression levels in bundle sheath and mesophyll cells of five model plants, we identified 16 differentially expressed signature genes showing cell-specific expression patterns in C3 and C4 plants. Among them, two showed distinctively opposite cell-specific expression patterns in C3 vs. C4 plants (named as FOGs). The in silico physicochemical analysis of the two FOGs illustrated that C3 homologous proteins of LHCA6 had low and stable pI values of ~6, while the pI values of LHCA6 homologs increased drastically in C4 plants Setaria viridis (7), Zea mays (8), and Sorghum bicolor (over 9), suggesting this protein may have different functions in C3 and C4 plants. Interestingly, based on pairwise protein sequence/structure similarities between each homologous FOG protein, one FOG PGRL1A showed local inconsistency between sequence similarity and structure similarity. To find more examples of the evolutionary characteristics of FOG proteins, we investigated the protein sequence/structure similarities of other FOGs (transcription factors) and found that FOG proteins have diversified incompatibility between sequence and structure similarities during grass family evolution. This raised an interesting question as to whether the sequence similarity is related to structure similarity during C4 photosynthesis evolution.
Affiliation(s)
- Dianjing Guo
  - State Key Laboratory of Agrobiotechnology, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
5
Rahman MS, Paul KC, Rahman MM, Samuel J, Thill JC, Hossain MA, Ali GGMN. Pandemic vulnerability index of US cities: A hybrid knowledge-based and data-driven approach. Sustainable Cities and Society 2023; 95:104570. [PMID: 37065624 PMCID: PMC10085879 DOI: 10.1016/j.scs.2023.104570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Revised: 04/01/2023] [Accepted: 04/01/2023] [Indexed: 06/19/2023]
Abstract
Cities become mission-critical zones during pandemics and it is vital to develop a better understanding of the factors that are associated with infection levels. The COVID-19 pandemic has impacted many cities severely; however, there is significant variance in its impact across cities. Pandemic infection levels are associated with inherent features of cities (e.g., population size, density, mobility patterns, socioeconomic condition, and health & environment), which need to be better understood. Intuitively, the infection levels are expected to be higher in big urban agglomerations, but the measurable influence of a specific urban feature is unclear. The present study examines 41 variables and their potential influence on the incidence of COVID-19 infection cases. The study uses a multi-method approach to study the influence of variables, classified as demographic, socioeconomic, mobility and connectivity, urban form and density, and health and environment dimensions. This study develops an index dubbed the pandemic vulnerability index at city level (PVI-CI) for classifying the pandemic vulnerability levels of cities, grouping them into five vulnerability classes, from very high to very low. Furthermore, clustering and outlier analysis provides insights on the spatial clustering of cities with high and low vulnerability scores. This study provides strategic insights into levels of influence of key variables upon the spread of infections, along with an objective ranking for the vulnerability of cities. Thus, it provides critical wisdom needed for urban healthcare policy and resource management. The calculation method for the pandemic vulnerability index and the associated analytical process present a blueprint for the development of similar indices for cities in other countries, leading to a better understanding and improved pandemic management for urban areas, and more resilient planning for future pandemics in cities across the world.
Affiliation(s)
- Md Shahinoor Rahman
  - Department of Earth and Environmental Sciences, New Jersey City University, Jersey City, NJ, 07305, USA
- Kamal Chandra Paul
  - Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
- Md Mokhlesur Rahman
  - The William States Lee College of Engineering, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
  - Department of Urban and Regional Planning, Khulna University of Engineering & Technology (KUET), Khulna, Khulna, 9203, Bangladesh
- Jim Samuel
  - E.J. Bloustein School of Planning & Public Policy, Rutgers University, NJ, 08901, USA
- Jean-Claude Thill
  - Department of Geography and Earth Sciences, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
  - School of Data Science, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
- Md Amjad Hossain
  - Department of Accounting, Information Systems, and Finance, Emporia State University, Emporia, KS, 66801, USA
- G G Md Nawaz Ali
  - Department of Computer Science and Information Systems, Bradley University, Peoria, IL, 61625, USA
6
Zhou Y, Ping X, Guo Y, Heng BC, Wang Y, Meng Y, Jiang S, Wei Y, Lai B, Zhang X, Deng X. Assessing Biomaterial-Induced Stem Cell Lineage Fate by Machine Learning-Based Artificial Intelligence. Advanced Materials (Deerfield Beach, Fla.) 2023; 35:e2210637. [PMID: 36756993 DOI: 10.1002/adma.202210637] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/16/2022] [Revised: 02/02/2023] [Indexed: 05/12/2023]
Abstract
Current functional assessment of biomaterial-induced stem cell lineage fate in vitro mainly relies on biomarker-dependent methods with limited accuracy and efficiency. Here a "Mesenchymal stem cell Differentiation Prediction (MeD-P)" framework for biomaterial-induced cell lineage fate prediction is reported. MeD-P contains a cell-type-specific gene expression profile as a reference by integrating public RNA-seq data related to tri-lineage differentiation (osteogenesis, chondrogenesis, and adipogenesis) of human mesenchymal stem cells (hMSCs) and a predictive model for classifying hMSCs differentiation lineages using the k-nearest neighbors (kNN) strategy. It is shown that MeD-P exhibits an overall accuracy of 90.63% on testing datasets, which is significantly higher than the model constructed based on canonical marker genes (80.21%). Moreover, evaluations of multiple biomaterials show that MeD-P provides accurate prediction of lineage fate on different types of biomaterials as early as the first week of hMSCs culture. In summary, it is demonstrated that MeD-P is an efficient and accurate strategy for stem cell lineage fate prediction and preliminary biomaterial functional evaluation.
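The k-nearest neighbors step mentioned in this abstract can be illustrated with a minimal sketch. It is not the MeD-P implementation: the function name `knn_predict`, the three-gene toy profiles, the Euclidean distance, and `k=3` are all assumptions chosen for illustration, since the abstract does not state the distance metric, the value of k, or the feature set.

```python
import math
from collections import Counter


def knn_predict(reference, query, k=3):
    """Return the majority lineage label among the k reference profiles nearest to query.

    reference: list of (expression_vector, label) pairs (hypothetical toy data).
    query: expression vector of the sample to classify.
    """
    # Rank reference profiles by Euclidean distance to the query (assumed metric).
    ranked = sorted(reference, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy reference profiles: three made-up genes, two samples per lineage.
REFERENCE = [
    ((5.0, 1.0, 1.0), "osteogenesis"),
    ((6.0, 1.0, 0.0), "osteogenesis"),
    ((1.0, 5.0, 1.0), "chondrogenesis"),
    ((0.0, 6.0, 1.0), "chondrogenesis"),
    ((1.0, 1.0, 5.0), "adipogenesis"),
    ((1.0, 0.0, 6.0), "adipogenesis"),
]

if __name__ == "__main__":
    print(knn_predict(REFERENCE, (5.0, 2.0, 1.0)))
```

In a real setting the reference vectors would come from the integrated RNA-seq profile the abstract describes, and k would be tuned on held-out data.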
Affiliation(s)
- Yingying Zhou
  - Department of Dental Materials and Dental Medical Devices Testing Center, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Xianfeng Ping
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - Central Laboratory, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Yusi Guo
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - Department of Geriatric Dentistry, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Boon Chin Heng
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - Central Laboratory, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Yijun Wang
  - Department of Dental Materials and Dental Medical Devices Testing Center, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Yanze Meng
  - Department of Dental Materials and Dental Medical Devices Testing Center, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Shengjie Jiang
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - Department of Geriatric Dentistry, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Yan Wei
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - Department of Geriatric Dentistry, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Binbin Lai
  - Biomedical Engineering Department, Peking University, Beijing, 100191, P. R. China
  - Department of Dermatology and Venereology, Peking University First Hospital, Beijing, 100034, P. R. China
- Xuehui Zhang
  - Department of Dental Materials and Dental Medical Devices Testing Center, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
- Xuliang Deng
  - National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, NMPA Key Laboratory for Dental Materials, Beijing Laboratory of Biomedical Materials, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - Department of Geriatric Dentistry, Peking University School and Hospital of Stomatology, Beijing, 100081, P. R. China
  - Biomedical Engineering Department, Peking University, Beijing, 100191, P. R. China
7
Liang Y, Yang S, Zheng L, Wang H, Zhou J, Huang S, Yang L, Zuo Y. Research progress of reduced amino acid alphabets in protein analysis and prediction. Comput Struct Biotechnol J 2022; 20:3503-3510. [PMID: 35860409 PMCID: PMC9284397 DOI: 10.1016/j.csbj.2022.07.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/14/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 11/29/2022] Open
Abstract
- A comprehensive summary of the literature on reduced amino acid alphabets.
- A systematic review of the development history of reduced amino acid alphabets.
- Rich application cases of reduced amino acid alphabets are described in the article.
- A detailed analysis of the properties and uses of reduced amino acid alphabets.

Proteins are the executors of cellular physiological activities, and accurate elucidation of their structure and function is crucial for the refined mapping of proteins. As a feature engineering method, the reduction of amino acid composition is not only an important method for protein structure and function analysis, but also opens a broad horizon for the complex field of machine learning. Representing sequences with fewer amino acid types greatly reduces the dimensionality, complexity, and noise of traditional feature engineering, and provides more interpretable predictive models for machine learning to capture key features. In this paper, we systematically reviewed the strategies and methods behind reduced amino acid (RAA) alphabets, and summarized their main applications in protein sequence alignment, functional classification, and prediction of structural properties. Finally, we gave a comprehensive analysis of 672 RAA alphabets from 74 reduction methods.
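As a rough illustration of what representing sequences with fewer amino acid types means in practice, the sketch below maps the 20 standard residues onto a six-letter alphabet and computes the composition over the reduced letters. The grouping in `GROUPS` is a hypothetical physicochemical clustering invented for this example; it is not claimed to be one of the 672 alphabets analyzed in the review, and the function names are likewise assumptions.

```python
from collections import Counter

# Hypothetical six-group reduction (illustration only, not from the review).
# Each key is the representative letter for the residues in its value.
GROUPS = {
    "A": "AVLIMC",  # aliphatic and sulfur-containing
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # small or conformationally special
}

# Invert the grouping into a residue -> representative lookup table.
RESIDUE_TO_GROUP = {res: rep for rep, members in GROUPS.items() for res in members}


def reduce_sequence(sequence):
    """Rewrite a protein sequence in the reduced alphabet; unknown residues become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(res, "X") for res in sequence.upper())


def reduced_composition(sequence):
    """Return the fraction of each reduced letter, a 6-dimensional feature vector."""
    reduced = reduce_sequence(sequence)
    counts = Counter(reduced)
    return {rep: counts[rep] / len(reduced) if reduced else 0.0 for rep in GROUPS}


if __name__ == "__main__":
    print(reduce_sequence("MKVLDE"))
    print(reduced_composition("MKVLDE"))
```

With this grouping the composition feature shrinks from 20 dimensions to 6, which is the kind of dimensionality reduction the abstract refers to.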
Affiliation(s)
- Yuchao Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Siqi Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Hao Wang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Jian Zhou
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Shenghui Huang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - Corresponding authors.
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
  - Corresponding authors.
8
Zhang Y, Wang K, Yu W, Guo X, Wen J, Luo Y. Minimal EEG channel selection for depression detection with connectivity features during sleep. Comput Biol Med 2022; 147:105690. [DOI: 10.1016/j.compbiomed.2022.105690] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2022] [Revised: 05/29/2022] [Accepted: 05/31/2022] [Indexed: 11/30/2022]
9
Dimitriu MA, Lazar-Contes I, Roszkowski M, Mansuy IM. Single-Cell Multiomics Techniques: From Conception to Applications. Front Cell Dev Biol 2022; 10:854317. [PMID: 35386194 PMCID: PMC8979110 DOI: 10.3389/fcell.2022.854317] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/13/2022] [Accepted: 02/14/2022] [Indexed: 01/16/2023] Open
Abstract
Recent advances in methods for single-cell analyses and barcoding strategies have led to considerable progress in research. The development of multiplexed assays offers the possibility to conduct parallel analyses of multiple factors and processes for comprehensive characterization of cellular and molecular states in health and disease. These technologies have expanded extremely rapidly in the past years and constantly evolve and provide better specificity, precision and resolution. This review summarizes recent progress in single-cell multiomics approaches, and focuses, in particular, on the most innovative techniques that integrate genome, epigenome and transcriptome profiling. It describes the methodologies, discusses their advantages and limitations, and explains how they have been applied to studies on cell heterogeneity and differentiation, and epigenetic reprogramming.
Affiliation(s)
- Isabelle M. Mansuy
  - Laboratory of Neuroepigenetics, Brain Research Institute, University of Zurich and Institute for Neuroscience, ETH Zurich, Zurich, Switzerland
10
Sun Y, Li H, Zheng L, Li J, Hong Y, Liang P, Kwok LY, Zuo Y, Zhang W, Zhang H. iProbiotics: a machine learning platform for rapid identification of probiotic properties from whole-genome primary sequences. Brief Bioinform 2021; 23:6444315. [PMID: 34849572 DOI: 10.1093/bib/bbab477] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 08/12/2021] [Revised: 09/28/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022] Open
Abstract
Lactic acid bacteria consortia are commonly present in food, and some of these bacteria possess probiotic properties. However, discovery and experimental validation of probiotics require extensive time and effort. Therefore, it is of great interest to develop effective screening methods for identifying probiotics. Advances in sequencing technology have generated massive genomic data, enabling us to create a machine learning-based platform for such purpose in this work. This study first selected a comprehensive probiotics genome dataset from the probiotic database (PROBIO) and literature surveys. Then, k-mer (from 2 to 8) compositional analysis was performed, revealing diverse oligonucleotide composition in strain genomes and apparently more probiotic (P-) features in probiotic genomes than non-probiotic genomes. To reduce noise and improve computational efficiency, 87 376 k-mers were refined by an incremental feature selection (IFS) method, and the model achieved the maximum accuracy level at 184 core features, with a high prediction accuracy (97.77%) and area under the curve (98.00%). Functional genomic analysis using annotations from gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Rapid Annotation using Subsystem Technology (RAST) databases, as well as analysis of genes associated with host gastrointestinal survival/settlement, carbohydrate utilization, drug resistance and virulence factors, revealed that the distribution of P-features was biased toward genes/pathways related to probiotic function. Our results suggest that the role of probiotics is not determined by a single gene, but by a combination of k-mer genomic components, providing new insights into the identification and underlying mechanisms of probiotics. This work created a novel and free online bioinformatic tool, iProbiotics, which would facilitate rapid screening for probiotics.
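The k-mer compositional analysis described in this abstract can be sketched as follows. This is an assumed, simplified reading rather than the iProbiotics pipeline: the function names are hypothetical, only the forward strand is counted, and windows containing bases other than A, C, G, T are skipped. One detail can be checked directly: the number of possible DNA k-mers for k from 2 to 8 is 4^2 + 4^3 + ... + 4^8 = 87 376, which matches the feature count quoted above and suggests, though the abstract does not say so explicitly, that the starting feature set was the full k-mer space.

```python
from collections import Counter


def kmer_frequencies(sequence, k):
    """Return the relative frequency of each k-mer observed in a DNA sequence."""
    sequence = sequence.upper()
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= set("ACGT")  # skip windows with ambiguous bases
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def kmer_profile(sequence, k_min=2, k_max=8):
    """Concatenate k-mer frequencies for every k in [k_min, k_max] into one feature dict."""
    profile = {}
    for k in range(k_min, k_max + 1):
        profile.update(kmer_frequencies(sequence, k))
    return profile


def total_possible_kmers(k_min=2, k_max=8):
    """Size of the full DNA k-mer feature space over the given range of k."""
    return sum(4 ** k for k in range(k_min, k_max + 1))


if __name__ == "__main__":
    print(kmer_frequencies("ATATA", 2))
    print(total_possible_kmers())
```

A feature-selection step such as the incremental feature selection named in the abstract would then rank these features and keep a small subset; that step is not shown here.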
Affiliation(s)
- Yu Sun
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
- Haicheng Li
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
- Jinzhao Li
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
- Yan Hong
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
- Pengfei Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
- Lai-Yu Kwok
  - Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
- Wenyi Zhang
  - Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China
- Heping Zhang
  - Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China
11
|
Lv H, Shi L, Berkenpas JW, Dao FY, Zulfiqar H, Ding H, Zhang Y, Yang L, Cao R. Application of artificial intelligence and machine learning for COVID-19 drug discovery and vaccine design. Brief Bioinform 2021; 22:bbab320. [PMID: 34410360 PMCID: PMC8511807 DOI: 10.1093/bib/bbab320] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 06/24/2021] [Revised: 07/15/2021] [Accepted: 07/22/2021] [Indexed: 12/13/2022] Open
Abstract
The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, has led to a dramatic loss of human life worldwide. Despite many efforts, the development of effective drugs and vaccines for this novel virus will take considerable time. Artificial intelligence (AI) and machine learning (ML) offer promising solutions that could accelerate the discovery and optimization of new antivirals. Motivated by this, in this paper, we present an extensive survey on the application of AI and ML for combating COVID-19 based on the rapidly emerging literature. Particularly, we point out the challenges and future directions associated with state-of-the-art solutions to effectively control the COVID-19 pandemic. We hope that this review provides researchers with new insights into the ways AI and ML fight and have fought the COVID-19 outbreak.
Affiliation(s)
- Hao Lv
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai 200433, China
- Fu-Ying Dao
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hasan Zulfiqar
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Liming Yang
- Department of Pathophysiology, Harbin Medical University-Daqing, Daqing 163319, China
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma 98447, USA
|
12
|
Zhao YW, Zhang S, Ding H. Recent development of machine learning methods in sumoylation sites prediction. Curr Med Chem 2021; 29:894-907. [PMID: 34525906 DOI: 10.2174/0929867328666210915112030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 07/24/2021] [Accepted: 08/07/2021] [Indexed: 11/22/2022]
Abstract
Sumoylation is an important reversible post-translational modification of proteins that mediates a variety of cellular processes. SUMO-modified proteins can change their subcellular localization, activity and stability. Sumoylation also plays an important role in cellular processes such as transcriptional regulation and signal transduction. Abnormal sumoylation is involved in many diseases, including neurodegeneration and immune-related diseases, as well as the development of cancer. Therefore, identification of sumoylation sites (SUMO sites) is fundamental to understanding their molecular mechanisms and regulatory roles. In contrast to labor-intensive and costly experimental approaches, computational prediction of sumoylation sites in silico has attracted much attention for its accuracy, convenience and speed. At present, many computational prediction models have been used to identify SUMO sites, but this work has not been comprehensively summarized and reviewed. Therefore, this paper summarizes and discusses the research progress of the relevant models. We briefly summarize the development of bioinformatics methods for sumoylation site prediction, focusing mainly on benchmark dataset construction, feature extraction, machine learning methods, published results and online tools. We hope the review will be of help to experimental researchers.
Affiliation(s)
- Yi-Wei Zhao
- School of Medicine, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shihua Zhang
- College of Life Science and Health, Wuhan University of Science and Technology, Wuhan 430065, China
- Hui Ding
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
13
|
Yang YH, Wang JS, Yuan SS, Liu ML, Su W, Lin H, Zhang ZY. A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods. Curr Med Chem 2021; 29:789-806. [PMID: 34514982 DOI: 10.2174/0929867328666210910125802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]
Abstract
Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5'-triphosphate (ATP) is one such ligand that plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions and signaling. Knowing the ATP binding residues of proteins is helpful for annotation of protein function and for drug design. However, due to the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP binding residues is costly and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods in detecting ATP binding residues of proteins. We expect this review will be helpful for further research.
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jia-Shu Wang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Meng-Lu Liu
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Su
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
14
|
Min X, Lu F, Li C. Sequence-Based Deep Learning Frameworks on Enhancer-Promoter Interactions Prediction. Curr Pharm Des 2021; 27:1847-1855. [PMID: 33234095 DOI: 10.2174/1381612826666201124112710] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2020] [Revised: 07/29/2020] [Accepted: 08/06/2020] [Indexed: 11/22/2022]
Abstract
Enhancer-promoter interactions (EPIs) in the human genome are of great significance to transcriptional regulation, which tightly controls gene expression. Identification of EPIs can help us better decipher gene regulation and understand disease mechanisms. However, experimental methods to identify EPIs are constrained by funds, time, and manpower, while computational methods using DNA sequences and genomic features are viable alternatives. Deep learning methods have shown promise in classification and have been used to identify EPIs. In this survey, we specifically focus on sequence-based deep learning methods and conduct a comprehensive review of the literature. First, we briefly introduce existing sequence-based frameworks for EPI prediction and their technical details. After that, we elaborate on the datasets, pre-processing methods, and evaluation strategies. Finally, we conclude with the challenges these methods face and suggest several future opportunities. We hope this review will provide a useful reference for further studies on enhancer-promoter interactions.
Affiliation(s)
- Xiaoping Min
- School of Informatics, Xiamen University, Xiamen 361005, China
- Fengqing Lu
- School of Informatics, Xiamen University, Xiamen 361005, China
- Chunyan Li
- Graduate School, Yunnan Minzu University, Kunming 650504, China
|
15
|
Liang P, Zheng L, Long C, Yang W, Yang L, Zuo Y. HelPredictor models single-cell transcriptome to predict human embryo lineage allocation. Brief Bioinform 2021; 22:6284371. [PMID: 34037706 DOI: 10.1093/bib/bbab196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2021] [Revised: 04/15/2021] [Accepted: 04/29/2021] [Indexed: 01/08/2023] Open
Abstract
The in-depth understanding of cellular fate decisions in human preimplantation embryos has prompted investigations into changes in lineage allocation, which are far from trivial and remain time-consuming to study by experimental methods. It is desirable to develop a novel and effective bioinformatics strategy that considers transitions of coordinated embryo lineage allocation and stage-specific patterns. Machine learning models are increasingly applied to interpret complex datasets and identify candidate development-related factors and lineage-determining molecular events. Here we developed the first machine learning platform, HelPredictor, which integrates three feature selection methods (principal component analysis, the F-score algorithm and the squared coefficient of variation) with four classical machine learning classifiers; each combination of feature selection method and classifier produces an independent output via the incremental feature selection method. Applied to single-cell sequencing data of human embryos, HelPredictor not only achieved 94.9% and 90.9% with cross-validation and an independent test, respectively, but also rapidly classified different embryonic lineages and their developmental trajectories using fewer HelPredictor-predicted factors. These candidate lineage-specific genes were discussed in detail and were clustered to explore transitions of embryonic heterogeneity. Our tool can quickly and efficiently reveal potential lineage-specific and stage-specific biomarkers and provide insights into how advanced computational tools contribute to developmental research. The source code is available at https://github.com/liameihao/HelPredictor.
Affiliation(s)
- Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Chunshen Long
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Wuritu Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
|
16
|
Zulfiqar H, Khan RS, Hassan F, Hippe K, Hunt C, Ding H, Song XM, Cao R. Computational identification of N4-methylcytosine sites in the mouse genome with machine-learning method. Mathematical Biosciences and Engineering: MBE 2021; 18:3348-3363. [PMID: 34198389 DOI: 10.3934/mbe.2021167] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 04/24/2023]
Abstract
N4-methylcytosine (4mC) is a DNA modification that can regulate multiple biological processes. Correctly identifying 4mC sites in genomic sequences can provide precise knowledge about their genetic roles. This study aimed to develop an ensemble model to predict 4mC sites in the mouse genome. In the proposed model, DNA sequences were encoded by k-mer, enhanced nucleic acid composition and composition of k-spaced nucleic acid pairs. Subsequently, these features were optimized using minimum redundancy maximum relevance (mRMR) with incremental feature selection (IFS) and five-fold cross-validation. The resulting optimal features were fed into a random forest classifier to discriminate 4mC from non-4mC sites in mouse. On the independent dataset, our model yielded an overall accuracy of 85.41%, which was approximately 3.8% and 6.3% higher than the two existing models, i4mC-Mouse and 4mCpred-EL, respectively. The data and source code of the model can be freely downloaded from https://github.com/linDing-groups/model_4mc.
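The incremental feature selection (IFS) step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feature ranking is taken as given (the abstract obtains it with mRMR), a simple nearest-centroid classifier stands in for the random forest, the folds are assigned deterministically, and all function names and the toy data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, folds=5):
    """Accuracy under a deterministic k-fold split (sample i goes to fold i % folds)."""
    correct = 0
    for fold in range(folds):
        train = [i for i in range(len(X)) if i % folds != fold]
        test = [i for i in range(len(X)) if i % folds == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            nearest_centroid_predict(train_X, train_y, X[i]) == y[i] for i in test
        )
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features, folds=5):
    """Grow the feature set in ranked order; return (best_size, best_accuracy, curve)."""
    curve = []
    for size in range(1, len(ranked_features) + 1):
        cols = ranked_features[:size]
        subset = [[row[c] for c in cols] for row in X]
        curve.append(cv_accuracy(subset, y, folds))
    best_acc = max(curve)
    best_size = curve.index(best_acc) + 1  # smallest subset reaching the maximum
    return best_size, best_acc, curve


# Toy data (hypothetical): feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.1, 5, 3], [9.8, 4, 7], [0.3, 6, 1], [10.2, 5, 2], [-0.2, 3, 8],
     [10.1, 7, 4], [0.0, 4, 6], [9.9, 6, 5], [0.2, 5, 9], [10.0, 3, 0]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
ranked = [0, 1, 2]  # assumed output of a ranking step such as mRMR

best_size, best_acc, curve = incremental_feature_selection(X, y, ranked)
```

Choosing the smallest subset that reaches the maximum accuracy is one reasonable tie-breaking rule; the abstract does not say how ties were handled.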
Affiliation(s)
- Hasan Zulfiqar
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Rida Sarwar Khan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Farwa Hassan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Kyle Hippe
- Department of Computer Science, Pacific Lutheran University, Tacoma 98447, USA
- Cassandra Hunt
- Department of Computer Science, Pacific Lutheran University, Tacoma 98447, USA
- Hui Ding
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiao-Ming Song
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- School of Life Sciences, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma 98447, USA
|
17
|
Zhang X, Jonassen I, Goksøyr A. Machine Learning Approaches for Biomarker Discovery Using Gene Expression Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022] Open
|
18
|
Wang H, Liang P, Zheng L, Long C, Li H, Zuo Y. eHSCPr discriminating the cell identity involved in endothelial to hematopoietic transition. Bioinformatics 2021; 37:2157-2164. [PMID: 33532815 DOI: 10.1093/bioinformatics/btab071] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/04/2020] [Revised: 01/15/2021] [Accepted: 01/28/2021] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION: Hematopoietic stem cells (HSCs) give rise to all blood cells and play a vital role throughout the whole lifespan through their pluripotency and self-renewal properties. Accurately identifying the stages of early HSCs is extremely important, as it may open up new prospects for extracorporeal blood research. Existing experimental techniques for identifying the early stages of HSC development are time-consuming and expensive. Machine learning has shown its excellence in massive single-cell data processing, and it is desirable to develop related computational models as good complements to experimental techniques. RESULTS: In this study, we present a novel predictor called eHSCPr, designed specifically for predicting the early stages of HSC development. To reveal the distinct genes at each developmental stage of HSCs, we compared F-score with three state-of-the-art differential gene selection methods (limma, DESeq2, edgeR) and evaluated their performance. F-score captured more of the critical surface markers of endothelial cells and hematopoietic cells, and the area under the receiver operating characteristic (ROC) curve was 0.987. Based on SVM, the 10-fold cross-validation accuracy of eHSCPr in the independent dataset and the training dataset reached 94.84% and 94.19%, respectively. Importantly, we performed transcription analysis on the F-score gene set, which further enriched the signal markers of HSC developmental stages. eHSCPr can be a powerful tool for predicting early stages of HSC development, facilitating hypothesis-driven experimental design and providing crucial clues for in vitro blood regeneration studies. AVAILABILITY: http://bioinfor.imu.edu.cn/ehscpr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
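The abstract does not give the F-score formula. As an assumption, the sketch below uses a widely used two-class definition (the squared deviations of each class mean from the overall mean, divided by the sum of the within-class sample variances) to rank genes. eHSCPr distinguishes several developmental stages and may use a multi-class variant, so this is an illustration of the idea only; gene names and expression values are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature: between-class separation over within-class spread."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        # Constant within both classes: perfectly separating or completely uninformative.
        return float("inf") if between > 0 else 0.0
    return between / within


# Hypothetical expression values per gene: (samples in class 1, samples in class 2).
expression = {
    "geneA": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),  # well separated between classes
    "geneB": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # identical distributions
}

scores = {gene: f_score(pos, neg) for gene, (pos, neg) in expression.items()}
ranked_genes = sorted(scores, key=scores.get, reverse=True)
```

Genes with larger scores separate the two groups more cleanly relative to their within-group spread, so a ranking like `ranked_genes` can feed a downstream classifier such as the SVM named in the abstract.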
Affiliation(s)
- Hao Wang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- ChunShen Long
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- HanShuang Li
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
|
19
|
Predicting Preference of Transcription Factors for Methylated DNA Using Sequence Information. Molecular Therapy. Nucleic Acids 2020; 22:1043-1050. [PMID: 33294291 PMCID: PMC7691157 DOI: 10.1016/j.omtn.2020.07.035] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/27/2020] [Accepted: 07/28/2020] [Indexed: 12/12/2022]
Abstract
Transcription factors play key roles in cell-fate decisions by regulating 3D genome conformation and gene expression. The traditional view is that DNA methylation hinders transcription factor binding, but recent research has shown that many transcription factors prefer to bind to methylated DNA. Therefore, identifying such transcription factors and understanding their functions is a stepping-stone for studying methylation-mediated biological processes. In this paper, a two-step discrimination method is proposed to recognize transcription factors and their preference for methylated DNA based only on sequence information. In the first step, the proposed model was used to discriminate transcription factors from non-transcription factors. The areas under the curve (AUCs) are 0.9183 and 0.9116 for the 5-fold cross-validation test and the independent dataset test, respectively. Subsequently, for the classification of transcription factors that prefer methylated DNA versus those that prefer non-methylated DNA, our model produced AUCs of 0.7744 and 0.7356 for the 5-fold cross-validation test and the independent dataset test, respectively. Based on the proposed model, a user-friendly web server called TFPred was built, which can be freely accessed at http://lin-group.cn/server/TFPred/.
|