1
Wen Y, Yang H, Hong Y. Transcriptomic Approaches to Cardiomyocyte-Biomaterial Interactions: A Review. ACS Biomater Sci Eng 2024; 10:4175-4194. [PMID: 38934720 DOI: 10.1021/acsbiomaterials.4c00303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024]
Abstract
Biomaterials, essential for supporting, enhancing, and repairing damaged tissues, play a critical role in various medical applications. This review focuses on the interaction between biomaterials and cardiomyocytes, emphasizing the value of transcriptomic approaches for understanding these interactions, which are pivotal in cardiac bioengineering and regenerative medicine. Transcriptomic approaches serve as powerful tools to investigate how cardiomyocytes respond to biomaterials, shedding light on the gene expression patterns, regulatory pathways, and cellular processes involved. Emerging technologies such as bulk RNA-seq, single-cell RNA-seq, single-nucleus RNA-seq, and spatial transcriptomics offer promising avenues for more precise and in-depth investigations. Longitudinal studies, pathway analyses, and machine learning techniques further improve the ability to explore the complex regulatory mechanisms involved. This review also discusses the challenges and opportunities of utilizing transcriptomic techniques in cardiomyocyte-biomaterial research. Although challenges remain, such as costs, cell size limitations, sample differences, and complex analytical processes, there are exciting prospects in comprehensive gene expression analyses, biomaterial design, cardiac disease treatment, and drug testing. These multimodal methodologies can deepen our understanding of the intricate interaction network between cardiomyocytes and biomaterials, potentially revolutionizing cardiac research with the aim of promoting heart health, and they are also promising for studying interactions between biomaterials and other cell types.
Affiliation(s)
- Yufeng Wen
  - Department of Bioengineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Huaxiao Yang
  - Department of Biomedical Engineering, University of North Texas, Denton, Texas 76207, United States
- Yi Hong
  - Department of Bioengineering, University of Texas at Arlington, Arlington, Texas 76019, United States
2
Zimmer A, Wang ER, Choudhary G, Zhang P. Protocol for simultaneous isolation of high-quality and high-quantity cardiomyocytes and non-myocyte cells from adult rat hearts. STAR Protoc 2024; 5:103174. [PMID: 38970791 PMCID: PMC11264182 DOI: 10.1016/j.xpro.2024.103174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/17/2024] [Indexed: 07/08/2024] Open
Abstract
Isolating different cell types at high quality is a powerful but challenging approach for understanding cellular composition and features in the heart. The available protocols typically focus on isolating one or two cell types. Here, we present a protocol to simultaneously isolate high-quality and high-quantity cardiomyocytes and non-myocyte cells, including immune cells, from adult rat hearts. We describe steps for purifying cells using bovine serum albumin. We also detail procedures for viability analysis and cell type identification using fluorescence-activated cell sorting. For complete details on the use and execution of this protocol, please refer to Zhang et al. [1], Valkov et al. [2], Vang et al. [3], and Li et al. [4].
Affiliation(s)
- Alexsandra Zimmer
  - Vascular Research Laboratory, Providence VA Medical Center, Providence, RI 02908, USA; Department of Medicine, Alpert Medical School of Brown University, Providence, RI, USA; Lifespan Cardiovascular Institute, Rhode Island Hospital, Providence, RI, USA
- Eric R Wang
  - Vascular Research Laboratory, Providence VA Medical Center, Providence, RI 02908, USA; Department of Medicine, Alpert Medical School of Brown University, Providence, RI, USA
- Gaurav Choudhary
  - Vascular Research Laboratory, Providence VA Medical Center, Providence, RI 02908, USA; Department of Medicine, Alpert Medical School of Brown University, Providence, RI, USA; Lifespan Cardiovascular Institute, Rhode Island Hospital, Providence, RI, USA
- Peng Zhang
  - Vascular Research Laboratory, Providence VA Medical Center, Providence, RI 02908, USA; Department of Medicine, Alpert Medical School of Brown University, Providence, RI, USA; Lifespan Cardiovascular Institute, Rhode Island Hospital, Providence, RI, USA
3
Hu C, Francisco J, Del Re DP, Sadoshima J. Decoding the Impact of the Hippo Pathway on Different Cell Types in Heart Failure. Circ J 2024:CJ-24-0171. [PMID: 38644191 DOI: 10.1253/circj.cj-24-0171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
The evolutionarily conserved Hippo pathway plays a pivotal role in governing a variety of biological processes. Heart failure (HF) is a major global health problem with a significant risk of mortality. This review provides a contemporary understanding of the Hippo pathway in regulating different cell types during HF. Through a systematic analysis of each component's regulatory mechanisms within the Hippo pathway, we elucidate their specific effects on cardiomyocytes, fibroblasts, endothelial cells, and macrophages in response to various cardiac injuries. Insights gleaned from both in vitro and in vivo studies highlight the therapeutic promise of targeting the Hippo pathway to address cardiovascular diseases, particularly HF.
Affiliation(s)
- Chengchen Hu
  - Department of Cell Biology and Molecular Medicine, Rutgers New Jersey Medical School
- Jamie Francisco
  - Department of Cell Biology and Molecular Medicine, Rutgers New Jersey Medical School
- Dominic P Del Re
  - Department of Cell Biology and Molecular Medicine, Rutgers New Jersey Medical School
- Junichi Sadoshima
  - Department of Cell Biology and Molecular Medicine, Rutgers New Jersey Medical School
4
Hegemann N, Barth L, Döring Y, Voigt N, Grune J. Implications for neutrophils in cardiac arrhythmias. Am J Physiol Heart Circ Physiol 2024; 326:H441-H458. [PMID: 38099844 PMCID: PMC11219058 DOI: 10.1152/ajpheart.00590.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 12/08/2023] [Accepted: 12/13/2023] [Indexed: 02/03/2024]
Abstract
Cardiac arrhythmias commonly occur as a result of aberrant electrical impulse formation or conduction in the myocardium. Frequently discussed triggers include underlying heart diseases such as myocardial ischemia, electrolyte imbalances, or genetic anomalies of ion channels involved in the tightly regulated cardiac action potential. Recently, the role of innate immune cells in the onset of arrhythmic events has been highlighted in numerous studies, correlating leukocyte expansion in the myocardium to increased arrhythmic burden. Here, we aim to call attention to the role of neutrophils in the pathogenesis of cardiac arrhythmias and their expansion during myocardial ischemia and infectious disease manifestation. In addition, we will elucidate molecular mechanisms associated with neutrophil activation and discuss their involvement as direct mediators of arrhythmogenicity.
Affiliation(s)
- Niklas Hegemann
  - Department of Cardiothoracic and Vascular Surgery, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany
  - Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - German Centre for Cardiovascular Research (DZHK), Berlin, Germany
- Lukas Barth
  - Department of Cardiothoracic and Vascular Surgery, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany
  - Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - German Centre for Cardiovascular Research (DZHK), Berlin, Germany
- Yannic Döring
  - Institute of Pharmacology and Toxicology, University Medical Center Göttingen, Georg August University Göttingen, Göttingen, Germany
  - German Centre for Cardiovascular Research (DZHK), Göttingen, Germany
- Niels Voigt
  - Institute of Pharmacology and Toxicology, University Medical Center Göttingen, Georg August University Göttingen, Göttingen, Germany
  - German Centre for Cardiovascular Research (DZHK), Göttingen, Germany
  - Cluster of Excellence "Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells" (MBExC), University of Göttingen, Göttingen, Germany
- Jana Grune
  - Department of Cardiothoracic and Vascular Surgery, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany
  - Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - German Centre for Cardiovascular Research (DZHK), Berlin, Germany
5
Xu Y, Zhang W, Zheng X, Cai X. Combining Global-Constrained Concept Factorization and a Regularized Gaussian Graphical Model for Clustering Single-Cell RNA-seq Data. Interdiscip Sci 2024; 16:1-15. [PMID: 37815679 DOI: 10.1007/s12539-023-00587-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 09/14/2023] [Accepted: 09/17/2023] [Indexed: 10/11/2023]
Abstract
Single-cell RNA sequencing technology is one of the most cost-effective ways to uncover transcriptomic heterogeneity. With the rapid rise of this technology, enormous amounts of scRNA-seq data have been produced. Due to the high dimensionality, noise, sparsity and missing features of the available scRNA-seq data, accurately clustering the scRNA-seq data for downstream analysis is a significant challenge. Many computational methods have been designed to address this issue; nevertheless, the efficacy of the available methods is still inadequate. In addition, most similarity-based methods require a number of clusters as input, which is difficult to achieve in real applications. In this study, we developed a novel computational method for clustering scRNA-seq data by considering both global and local information, named GCFG. This method characterizes the global properties of data by applying concept factorization, and the regularized Gaussian graphical model is utilized to evaluate the local embedding relationship of data. To learn the cell-cell similarity matrix, we integrated the two components, and an iterative optimization algorithm was developed. The categorization of single cells is obtained by applying Louvain, a modularity-based community discovery algorithm, to the similarity matrix. The behavior of the GCFG approach is assessed on 14 real scRNA-seq datasets in terms of ACC and ARI, and comparison results with 17 other competitive methods suggest that GCFG is effective and robust.
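The abstract above reports clustering quality in terms of ARI, which in clustering evaluation conventionally denotes the adjusted Rand index. As a rough illustration of what that score measures, the sketch below computes it from two label lists using only the Python standard library. The function name and the toy labels are illustrative assumptions and are not taken from the GCFG paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(labels_true)
    # Contingency counts: how many items fall in each (true, predicted) pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example (hypothetical labels): cluster names differ, partitions agree.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Toy example: every predicted cluster mixes the two true clusters.
mixed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, mixed)
```

The index is invariant to how clusters are named, which is why it suits comparisons between a clustering method's output and reference cell-type annotations.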
Affiliation(s)
- Yaxin Xu
  - School of Sciences, East China Jiaotong University, Nanchang, 330013, China
- Wei Zhang
  - School of Sciences, East China Jiaotong University, Nanchang, 330013, China
- Xiaoying Zheng
  - Operations Research and Planning Department, Naval University of Engineering, Wuhan, 430033, China
- Xianxian Cai
  - School of Sciences, East China Jiaotong University, Nanchang, 330013, China
6
Cui YH, Wu CR, Xu D, Tang JG. Exploration of neuron heterogeneity in human heart failure with dilated cardiomyopathy through single-cell RNA sequencing analysis. BMC Cardiovasc Disord 2024; 24:86. [PMID: 38310240 PMCID: PMC10838417 DOI: 10.1186/s12872-024-03739-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 01/19/2024] [Indexed: 02/05/2024] Open
Abstract
OBJECTIVE: We aimed to explore the heterogeneity of neurons in heart failure with dilated cardiomyopathy (DCM). METHODS: Single-cell RNA sequencing (scRNA-seq) data from patients with DCM and chronic heart failure and from healthy samples in the GSE183852 dataset were downloaded from the NCBI Gene Expression Omnibus, and the neuron data were extracted for investigation. Cell clustering analysis, differential expression analysis, trajectory analysis, and cell communication analysis were performed. Genes highly expressed in neurons from patients were used to construct a protein-protein interaction (PPI) network and were validated with the GSE120895 dataset. RESULTS: Neurons were divided into six subclusters involved in various biological processes, and each subcluster had its own specific cell communication pathways. Neurons differentiated into two branches along pseudotime: one branch differentiated into mature neurons, whereas the other tended to be involved in the immune and inflammatory response. Genes exhibited branch-specific differential expression patterns. FLNA, ITGA6, ITGA1, and MDK interacted more with other gene-product proteins in the PPI network. The differential expression of FLNA between DCM and control was validated. CONCLUSION: Neurons show significant heterogeneity in heart failure with DCM and may be involved in the immune and inflammatory response to heart failure.
Affiliation(s)
- Yu-Hui Cui
  - Department of Trauma-Emergency & Critical Care Medicine Center, Shanghai Fifth People's Hospital, Fudan University, No.801 Heqing Road, Minhang District, Shanghai, 200240, China
- Chun-Rong Wu
  - Department of Trauma-Emergency & Critical Care Medicine Center, Shanghai Fifth People's Hospital, Fudan University, No.801 Heqing Road, Minhang District, Shanghai, 200240, China
- Dan Xu
  - Department of Trauma-Emergency & Critical Care Medicine Center, Shanghai Fifth People's Hospital, Fudan University, No.801 Heqing Road, Minhang District, Shanghai, 200240, China
- Jian-Guo Tang
  - Department of Trauma-Emergency & Critical Care Medicine Center, Shanghai Fifth People's Hospital, Fudan University, No.801 Heqing Road, Minhang District, Shanghai, 200240, China
7
Bazgir F, Nau J, Nakhaei-Rad S, Amin E, Wolf MJ, Saucerman JJ, Lorenz K, Ahmadian MR. The Microenvironment of the Pathogenesis of Cardiac Hypertrophy. Cells 2023; 12:1780. [PMID: 37443814 PMCID: PMC10341218 DOI: 10.3390/cells12131780] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/07/2023] [Revised: 06/22/2023] [Accepted: 06/29/2023] [Indexed: 07/15/2023] Open
Abstract
Pathological cardiac hypertrophy is a key risk factor for the development of heart failure and predisposes individuals to cardiac arrhythmia and sudden death. While physiological cardiac hypertrophy is adaptive, hypertrophy resulting from conditions such as hypertension, aortic stenosis, or genetic mutations, as in hypertrophic cardiomyopathy, is maladaptive. Here, we highlight the essential role of, and reciprocal interactions between, cardiomyocytes and non-myocardial cells in response to pathological conditions. Prolonged cardiovascular stress causes cardiomyocytes and non-myocardial cells to enter an activated state in which they release numerous pro-hypertrophic, pro-fibrotic, and pro-inflammatory mediators, such as vasoactive hormones, growth factors, and cytokines, which initiate signaling events that collectively cause cardiac hypertrophy. Fibrotic remodeling is mediated by cardiac fibroblasts as the central players, but endothelial cells and resident and infiltrating immune cells also enhance these processes. Many of these hypertrophic mediators are now being integrated into computational models that provide system-level insights and will help to translate our knowledge into new pharmacological targets. This perspective article summarizes advances in cardiac hypertrophy research over recent decades and discusses the complex myocardial microenvironment and signaling components involved.
Affiliation(s)
- Farhad Bazgir
  - Institute of Biochemistry and Molecular Biology II, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Julia Nau
  - Institute of Biochemistry and Molecular Biology II, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Saeideh Nakhaei-Rad
  - Stem Cell Biology, and Regenerative Medicine Research Group, Institute of Biotechnology, Ferdowsi University of Mashhad, Mashhad 91779-48974, Iran
- Ehsan Amin
  - Institute of Neural and Sensory Physiology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Matthew J. Wolf
  - Department of Medicine and Robert M. Berne Cardiovascular Research Center, University of Virginia, Charlottesville, VA 22908, USA
- Jeffry J. Saucerman
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Kristina Lorenz
  - Institute of Pharmacology and Toxicology, University of Würzburg, Leibniz Institute for Analytical Sciences, 97078 Würzburg, Germany
- Mohammad Reza Ahmadian
  - Institute of Biochemistry and Molecular Biology II, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
8
Analysis and prediction of protein stability based on interaction network, gene ontology, and KEGG pathway enrichment scores. Biochim Biophys Acta Proteins Proteom 2023; 1871:140889. [PMID: 36610583 DOI: 10.1016/j.bbapap.2023.140889] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 10/29/2022] [Revised: 12/18/2022] [Accepted: 01/02/2023] [Indexed: 01/06/2023]
Abstract
Metabolic stability of proteins plays a vital role in various dedicated cellular processes. Traditional methods of measuring the metabolic stability are time-consuming and expensive. Therefore, we developed a more efficient computational approach to understand the protein dynamic action mechanisms in biological process networks. In this study, we collected 341 short-lived proteins and 824 non-short-lived proteins from U2OS; 342 short-lived proteins and 821 non-short-lived proteins from HEK293T; 424 short-lived proteins and 1153 non-short-lived proteins from HCT116; and 384 short-lived proteins and 992 non-short-lived proteins from RPE1. The proteins were encoded by GO and KEGG enrichment scores based on the genes and their neighbors in STRING, resulting in 20,681 GO term features and 297 KEGG pathway features. We also incorporated the protein interaction information from STRING into the features and obtained 19,247 node features. Boruta and mRMR methods were used for feature filtering, and IFS method was used to obtain the best feature subsets and create the models with the highest performance. The present study identified 42 features that did not appear in previous studies and classified them into eight groups according to their functional annotation. By reviewing the literature, we found that the following three functional groups were critical in determining the stability of proteins: synaptic transmission, post-translational modifications, and cell fate determination. These findings may serve as a valuable reference for developing drugs that target protein stability.
9
Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods. Biomed Res Int 2023; 2023:5333361. [PMID: 36644165 PMCID: PMC9833906 DOI: 10.1155/2023/5333361] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 09/03/2022] [Revised: 12/15/2022] [Accepted: 12/15/2022] [Indexed: 01/06/2023]
Abstract
Long-term cigarette smoking causes various human diseases, including respiratory disease, cancer, and gastrointestinal (GI) disorders. Alterations in gene expression and variable splicing processes induced by smoking are associated with the development of diseases. This study applied advanced machine learning methods to identify the isoforms with important roles in distinguishing current smokers from former smokers, based on isoform expression profiles of current and former smokers collected in one previous study. These isoforms were treated as features, which were first analyzed by the Boruta method to select features highly correlated with the target variables. Then, the selected features were evaluated by four feature ranking algorithms, resulting in four feature lists. The incremental feature selection method was applied to each list to obtain the optimal feature subsets and build high-performance classification models. Furthermore, a series of classification rules was extracted using the decision tree with the highest performance. Finally, the plausibility of the mined isoforms (features) and classification rules was verified by reviewing previous research. Features such as isoforms ENST00000464835 (expressed by LRRN3), ENST00000622663 (expressed by SASH1), and ENST00000284311 (expressed by GPR15), and pathways (cytotoxicity mediated by natural killer cell and cytokine-cytokine receptor interaction) revealed by the enrichment analysis, were highly relevant to smoking response, suggesting the robustness of our analysis pipeline.
10
Wu C, Chen L. A model with deep analysis on a large drug network for drug classification. Math Biosci Eng 2023; 20:383-401. [PMID: 36650771 DOI: 10.3934/mbe.2023018] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Indexed: 06/17/2023]
Abstract
Drugs are an important means to treat various diseases. They are classified into several classes to indicate their properties and effects. Those in the same class always share some important features. The Kyoto Encyclopedia of Genes and Genomes (KEGG) DRUG recently reported a new drug classification system that classifies drugs into 14 classes. Correct identification of the class for any possible drug-like compound is helpful to roughly determine its effects for a particular type of disease. Experiments could be conducted to confirm such latent effects, thus accelerating the procedures for discovering novel drugs. In this study, this classification system was investigated. For the first time, a classification model was proposed to assign one of the classes in the system to any given drug. Rather than traditional fingerprint features, which indicate essential drug properties alone and are very popular in investigating drug-related problems, drugs were represented by novel features derived from a large drug network via a well-known network embedding algorithm called Node2vec. These features abstracted the drug associations generated from their essential properties, and they describe each drug against the background of all other drugs. As class sizes differed greatly, the synthetic minority over-sampling technique (SMOTE) was employed to tackle the imbalance problem. The balanced dataset was fed into a support vector machine to build the model. The 10-fold cross-validation results suggested the excellent performance of the model. This model was also superior to models using other drug features, including those generated by another network embedding algorithm and fingerprint features. Furthermore, this model provided more balanced performance across all classes than the model built without SMOTE.
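The abstract above names SMOTE as the remedy for class imbalance. As a rough sketch of the core idea, the block below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. It is a simplified, standard-library-only illustration: the function name, parameters, and toy points are assumptions, and it does not reproduce the authors' pipeline or any particular library's implementation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points built from a list of minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical 2-D feature vectors).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=5)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies, which is what distinguishes this approach from simply duplicating samples.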
Affiliation(s)
- Chenhao Wu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
11
Identification of Methylation Signatures and Rules for Sarcoma Subtypes by Machine Learning Methods. Biomed Res Int 2022; 2022:5297235. [PMID: 36619306 PMCID: PMC9812612 DOI: 10.1155/2022/5297235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/28/2022] [Accepted: 12/08/2022] [Indexed: 12/31/2022]
Abstract
Sarcoma, the second most common type of solid tumor in children and adolescents, has a wide variety of subtypes that are often not properly diagnosed at an early stage, leading to late metastases and causing serious loss of life and property to patients and families. It exhibits a high degree of heterogeneity at the cellular, molecular, and epigenetic levels, where DNA methylation has been proposed to play a role in the diagnosis of sarcoma subtypes. Thus, this study aimed to find potential biomarkers at the DNA methylation level to distinguish different sarcoma subtypes. A machine learning process was designed to analyse sarcoma samples, each of which was represented by a large number of methylation sites. Irrelevant sites were removed using the Boruta method, and the remaining sites related to the target variables were kept for further analyses. Afterward, three feature ranking methods (LASSO, LightGBM, and MCFS) were adopted to rank these features, and six classification models were constructed by combining incremental feature selection with two classification algorithms (decision tree and random forest). Among these models, the RF models outperformed the DT models under all three ranking conditions. The specific expression of genes obtained from the annotation of highly correlated methylation site features, such as PRKAR1B, INPP5A, and GLI3, has been reported in the literature to be associated with sarcoma. Moreover, the quantitative rules obtained by the decision tree algorithm helped us to understand the essential differences between various sarcoma types and to classify sarcoma subtypes, providing a new means of clinical identification and of determining new therapeutic targets.
12
Identification of Transcriptome Biomarkers for Severe COVID-19 with Machine Learning Methods. Biomolecules 2022; 12:1735. [PMID: 36551164 PMCID: PMC9775121 DOI: 10.3390/biom12121735] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/15/2022] [Revised: 11/18/2022] [Accepted: 11/18/2022] [Indexed: 11/24/2022] Open
Abstract
The rapid spread of COVID-19 has become a major concern for people's lives and health all around the world. COVID-19 patients in various phases and of various severities require individualized treatment, given that different patients may develop different symptoms. In this study, we employed machine learning methods to discover biomarkers that may accurately classify COVID-19 across disease states and severities. The blood gene expression profiles from 50 COVID-19 patients without intensive care, 50 COVID-19 patients with intensive care, 10 non-COVID-19 individuals without intensive care, and 16 non-COVID-19 individuals with intensive care were analyzed. Boruta was first used to remove irrelevant gene features from the expression profiles, and then the minimum redundancy maximum relevance method was applied to rank the remaining features. The resulting ranked feature list was fed into the incremental feature selection method to discover the essential genes and build powerful classifiers. The molecular mechanisms of some biomarker genes were discussed in light of recent studies, and the biological functions enriched among the essential genes were examined. Our findings imply that genes including UBE2C, PCLAF, CDK1, CCNB1, MND1, APOBEC3G, TRAF3IP3, CD48, and GZMA play key roles in defining the different states and severities of COVID-19. Thus, a new point of reference is provided for understanding the disease's etiology and facilitating precise therapy.
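Incremental feature selection, named in the abstract above, can be pictured as follows: given a ranked feature list, build a classifier on the top 1, top 2, ..., top N features and keep the prefix that scores best. The sketch below is a minimal standard-library illustration under stated assumptions: the ranking is supplied by hand (Boruta and mRMR are not implemented), and the nearest-centroid classifier and leave-one-out accuracy are stand-ins chosen for brevity, not the classifiers or validation scheme used in the study. All names and the toy data are hypothetical.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))

    return min(sorted(centroids), key=sq_dist)


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Score the top-k ranked features for k = 1..N; return (best_k, scores)."""
    scores = []
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        X_k = [[row[c] for c in cols] for row in X]
        scores.append(loo_accuracy(X_k, y))
    best_k = scores.index(max(scores)) + 1  # smallest k reaching the top score
    return best_k, scores


# Toy data: column 0 separates the classes, columns 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 7.0, 3.0],
     [1.0, 2.0, 8.0], [1.1, 6.0, 2.0], [1.2, 0.0, 7.0]]
y = [0, 0, 0, 1, 1, 1]
best_k, scores = incremental_feature_selection(X, y, ranked_features=[0, 1, 2])
print(best_k, scores)
```

Taking the smallest prefix that reaches the top score is one reasonable tie-breaking choice, since it favours a shorter biomarker list when several prefixes perform equally well.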
13
Identifying MicroRNA Markers That Predict COVID-19 Severity Using Machine Learning Methods. Life (Basel) 2022; 12:1964. [PMID: 36556329 PMCID: PMC9784129 DOI: 10.3390/life12121964] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/27/2022] [Revised: 11/21/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2022]
Abstract
Individuals with the SARS-CoV-2 infection may experience a wide range of symptoms, from being asymptomatic to having a mild fever and cough to a severe respiratory impairment that results in death. MicroRNA (miRNA), which plays a role in the antiviral effects of SARS-CoV-2 infection, has the potential to be used as a novel marker to distinguish between patients who have various COVID-19 clinical severities. In the current study, the existing blood expression profiles reported in two previous studies were combined for deep analyses. The final profiles contained 1444 miRNAs in 375 patients from six categories, which were as follows: 30 patients with mild COVID-19 symptoms, 81 patients with moderate COVID-19 symptoms, 30 non-COVID-19 patients with mild symptoms, 137 patients with severe COVID-19 symptoms, 31 non-COVID-19 patients with severe symptoms, and 66 healthy controls. An efficient computational framework containing four feature selection methods (LASSO, LightGBM, MCFS, and mRMR) and four classification algorithms (DT, KNN, RF, and SVM) was designed to screen clinical miRNA markers, and a high-precision RF model with a 0.780 weighted F1 was constructed. Some miRNAs, including miR-24-3p, whose differential expression was discovered in patients with acute lung injury complications brought on by severe COVID-19, and miR-148a-3p, differentially expressed against SARS-CoV-2 structural proteins, were identified, thereby suggesting the effectiveness and accuracy of our framework. Meanwhile, we extracted classification rules based on the DT model for the quantitative representation of the role of miRNA expression in differentiating COVID-19 patients with different severities. The search for novel biomarkers that could predict the severity of the disease could aid in the clinical diagnosis of COVID-19 and in exploring the specific mechanisms of the complications caused by SARS-CoV-2 infection. Moreover, new therapeutic targets for the disease may be found.
14
Lu J, Meng M, Zhou X, Ding S, Feng K, Zeng Z, Huang T, Cai YD. Identification of COVID-19 severity biomarkers based on feature selection on single-cell RNA-Seq data of CD8+ T cells. Front Genet 2022; 13:1053772. [PMID: 36437952 PMCID: PMC9682094 DOI: 10.3389/fgene.2022.1053772] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/26/2022] [Accepted: 10/27/2022] [Indexed: 07/30/2023] Open
Abstract
The global COVID-19 outbreak has become a major public health problem. COVID-19 virus infection triggers a complex immune response. CD8+ T cells, in particular, play an essential role in controlling the severity of the disease. However, the mechanism by which CD8+ T cells regulate COVID-19 remains poorly investigated. In this study, single-cell gene expression profiles from three CD8+ T cell subtypes (effector, memory, and naive T cells) were downloaded. Each cell subtype included three disease states, namely, acute COVID-19, convalescent COVID-19, and unexposed individuals. The profiles of each cell subtype were analyzed individually in the same way. Irrelevant features in the profiles were first excluded by the Boruta method. The remaining features for each CD8+ T cell subtype were further analyzed by the Max-Relevance and Min-Redundancy, Monte Carlo feature selection, and light gradient boosting machine methods to obtain three feature lists. These lists were then fed into the incremental feature selection method to determine the optimal features for each cell subtype. Their corresponding genes may be latent biomarkers of COVID-19 severity. Genes such as ZFP36, DUSP1, TCR, and IL7R can be confirmed to play an immune regulatory role in COVID-19 infection and recovery. The results of functional enrichment analysis revealed that these important genes may be associated with immune functions, such as response to cAMP, response to virus, T cell receptor complex, T cell activation, and T cell differentiation. This study further established distinct gene expression patterns, represented by classification rules, for the three COVID-19 states and constructed several efficient classifiers to distinguish COVID-19 severity. The findings of this study provide new insights into the biological processes by which CD8+ T cells regulate the immune response.
Affiliation(s)
- Jian Lu
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Mei Meng
  - State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- XianChao Zhou
  - State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Zhenbing Zeng
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

15
Li H, Wang D, Zhou X, Ding S, Guo W, Zhang S, Li Z, Huang T, Cai YD. Characterization of spleen and lymph node cell types via CITE-seq and machine learning methods. Front Mol Neurosci 2022; 15:1033159. [PMID: 36311013 PMCID: PMC9608858 DOI: 10.3389/fnmol.2022.1033159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 09/26/2022] [Indexed: 11/13/2022]
Abstract
The spleen and lymph nodes are important functional organs of the human immune system. Identifying the cell types of the spleen and lymph nodes helps in understanding the mechanisms of the immune system. However, the cell types of the spleen and lymph nodes are highly diverse in the human body. Therefore, in this study, we employed a series of machine learning algorithms to computationally analyze the cell types of the spleen and lymph nodes based on single-cell CITE-seq sequencing data. A total of 28,211 cells (training vs. test = 14,435 vs. 13,776) involving 24 cell types were collected for this study. The training dataset was analyzed by Boruta and then by minimum redundancy maximum relevance (mRMR), resulting in an mRMR feature list. This list was fed into the incremental feature selection (IFS) method, incorporating four classification algorithms (deep forest, random forest, K-nearest neighbor, and decision tree). Some essential features were discovered, and the deep forest with its optimal features achieved the best performance. A group of related proteins (CD4, TCRb, CD103, CD43, and CD23) and genes (Nkg7 and Thy1) contributing to the classification of spleen and lymph node cell types were analyzed. Furthermore, the classification rules yielded by the decision tree were also provided and analyzed. These findings may provide helpful information for deepening our understanding of the diversity of cell types.
Affiliation(s)
- Hao Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Deling Wang
  - State Key Laboratory of Oncology in South China, Department of Radiology, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Xianchao Zhou
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Institutes for Biological Sciences (SIBS), Shanghai Jiao Tong University School of Medicine (SJTUSM), Chinese Academy of Sciences (CAS), Shanghai, China
- Shiqi Zhang
  - Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Zhandong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai

16
Lu J, Li J, Ren J, Ding S, Zeng Z, Huang T, Cai YD. Functional and embedding feature analysis for pan-cancer classification. Front Oncol 2022; 12:979336. [PMID: 36248961 PMCID: PMC9559388 DOI: 10.3389/fonc.2022.979336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022]
Abstract
With the increasing number of people suffering from cancer, this illness has become a major health problem worldwide. Exploring the biological functions and signaling pathways of carcinogenesis is essential for cancer detection and research. In this study, a mutation dataset for eleven cancer types was first obtained from a web-based resource called cBioPortal for Cancer Genomics, followed by extracting 21,049 features from three aspects: relationship to GO and KEGG (enrichment features), mutated genes learned by word2vec (text features), and protein-protein interaction network analyzed by node2vec (network features). Irrelevant features were then excluded using the Boruta feature filtering method, and the retained relevant features were ranked by four feature selection methods (least absolute shrinkage and selection operator, minimum redundancy maximum relevance, Monte Carlo feature selection and light gradient boosting machine) to generate four feature-ranked lists. Incremental feature selection was used to determine the optimal number of features based on these feature lists to build the optimal classifiers and derive interpretable classification rules. The results of four feature-ranking methods were integrated to identify key functional pathways, such as olfactory transduction (hsa04740) and colorectal cancer (hsa05210), and the roles of these functional pathways in cancers were discussed in reference to literature. Overall, this machine learning-based study revealed the altered biological functions of cancers and provided a reference for the mechanisms of different cancers.
Affiliation(s)
- Jian Lu
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- JiaRui Li
  - Advanced Research Computing, University of British Columbia, Vancouver, BC, Canada
- Jingxin Ren
  - School of Life Sciences, Shanghai University, Shanghai, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Zhenbing Zeng
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Zhenbing Zeng; Tao Huang; Yu-Dong Cai

17
Jian F, Huang F, Zhang YH, Huang T, Cai YD. Identifying anal and cervical tumorigenesis-associated methylation signaling with machine learning methods. Front Oncol 2022; 12:998032. [PMID: 36249027 PMCID: PMC9557006 DOI: 10.3389/fonc.2022.998032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022]
Abstract
Cervical and anal carcinoma are neoplastic diseases with various intraepithelial neoplasia stages. The underlying mechanisms for cancer initiation and progression have not been fully revealed. DNA methylation has been shown to be aberrantly regulated during tumorigenesis in anal and cervical carcinoma, revealing the important roles of DNA methylation signaling as a biomarker to distinguish cancer stages in clinics. In this research, several machine learning methods were used to analyze the methylation profiles on anal and cervical carcinoma samples, which were divided into three classes representing various stages of tumor progression. Advanced feature selection methods, including Boruta, LASSO, LightGBM, and MCFS, were used to select methylation features that are highly correlated with cancer progression. Some methylation probes including cg01550828 and its corresponding gene RNF168 have been reported to be associated with human papilloma virus-related anal cancer. As for biomarkers for cervical carcinoma, cg27012396 and its functional gene HDAC4 were confirmed to regulate the glycolysis and survival of hypoxic tumor cells in cervical carcinoma. Furthermore, we developed effective classifiers for identifying various tumor stages and derived classification rules that reflect the quantitative impact of methylation on tumorigenesis. The current study identified methylation signals associated with the development of cervical and anal carcinoma at qualitative and quantitative levels using advanced machine learning methods.
Affiliation(s)
- Fangfang Jian
  - Department of Obstetrics & Gynecology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai

18
Liu Z, Meng M, Ding S, Zhou X, Feng K, Huang T, Cai YD. Identification of methylation signatures and rules for predicting the severity of SARS-CoV-2 infection with machine learning methods. Front Microbiol 2022; 13:1007295. [PMID: 36212830 PMCID: PMC9537378 DOI: 10.3389/fmicb.2022.1007295] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/30/2022] [Accepted: 09/01/2022] [Indexed: 11/17/2022]
Abstract
Patients with SARS-CoV-2 infections of differing severity have different clinical manifestations and require different treatments. Mild or moderate patients usually recover with conventional medical treatment, but severe patients require prompt professional treatment. Thus, stratifying infected patients for targeted treatment is meaningful. A computational workflow was designed in this study to identify key blood methylation features and rules that can distinguish the severity of SARS-CoV-2 infection. First, the methylation features in the profile were analyzed by the Monte Carlo feature selection method to generate a ranked feature list. Next, this ranked feature list was fed into the incremental feature selection method to determine the optimal features for different classification algorithms, thereby building optimal classifiers. The selected key features were analyzed by functional enrichment to detect their biofunctional information. Furthermore, a set of rules was established by a white-box algorithm, the decision tree, to uncover the different methylation patterns associated with each severity of SARS-CoV-2 infection. Some genes (PARP9, MX1, IRF7), corresponding to essential methylation sites, and rules were validated by published academic literature. Overall, this study contributes to revealing potential expression features and provides a reference for patient stratification. Physicians can prioritize and allocate health and medical resources for COVID-19 patients based on their predicted severe clinical outcomes.
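The incremental feature selection step described in this abstract can be sketched as follows: nested subsets of a ranked feature list are evaluated one after another, and the subset giving the best classifier is kept. This is a minimal illustration only. It assumes a 1-nearest-neighbour classifier scored by leave-one-out accuracy in place of the classifiers and evaluation scheme used in the paper, and the names `loo_accuracy`, `incremental_feature_selection`, and the toy data are hypothetical.

```python
def loo_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the columns listed in feature_idx."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for j in range(len(X)):
            if i == j:
                continue  # hold out sample i
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in feature_idx)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)

def incremental_feature_selection(X, y, ranked_features, step=1):
    """Score the top step, 2*step, ... ranked features and return
    (best_subset, best_score, curve)."""
    curve = []
    best_subset, best_score = [], -1.0
    for k in range(step, len(ranked_features) + 1, step):
        subset = ranked_features[:k]
        score = loo_accuracy(X, y, subset)
        curve.append((k, score))
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score, curve

# Toy data (not from the study): column 0 separates the classes,
# column 2 is mild noise, column 1 is strong noise.
X = [
    [0.0, 5.0, 0.3], [0.1, -4.0, 0.1], [0.2, 3.0, 0.4],
    [1.0, -5.0, 0.2], [1.1, 4.0, 0.3], [1.2, -3.0, 0.1],
]
y = [0, 0, 0, 1, 1, 1]
ranked = [0, 2, 1]  # assumed output of a feature-ranking method

best_subset, best_score, curve = incremental_feature_selection(X, y, ranked)
print(best_subset, best_score)
```

In this toy case the single top-ranked feature already classifies every sample correctly, and adding the noisy third feature lowers the score, which is the behaviour the method is designed to detect.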
Affiliation(s)
- Zhiyang Liu
  - School of Life Sciences, Changchun Sci-Tech University, Changchun, China
- Mei Meng
  - State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- ShiJian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- XiaoChao Zhou
  - State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai

19
Subcellular Localization Prediction of Human Proteins Using Multifeature Selection Methods. BioMed Research International 2022; 2022:3288527. [PMID: 36132086 PMCID: PMC9484878 DOI: 10.1155/2022/3288527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]
Abstract
Subcellular localization prediction attempts to assign proteins to one of the cell compartments that perform specific biological functions. Finding the link between proteins, biological functions, and subcellular localization is an effective way to investigate the general organization of living cells in a systematic manner. However, determining the subcellular localization of proteins by traditional experimental approaches is difficult. Here, protein–protein interaction networks, functional enrichment on gene ontology and pathway, and a set of proteins having confirmed subcellular localization were applied to build prediction models for human protein subcellular localizations. To build an effective predictive model, we employed a variety of robust machine learning algorithms, including Boruta feature selection, minimum redundancy maximum relevance, Monte Carlo feature selection, and LightGBM. Then, the incremental feature selection method with random forest and support vector machine was used to discover the essential features. Furthermore, 38 key features were determined by integrating the results of different feature selection methods, which may provide critical insights into the subcellular location of proteins. The biological relevance of these features to subcellular localization was discussed with reference to recent publications. In summary, our computational framework can help advance the understanding of subcellular localization prediction techniques and provide a new perspective to investigate the patterns of protein subcellular localization and their biological importance.

20
Yang L, Zhang YH, Huang F, Li Z, Huang T, Cai YD. Identification of protein–protein interaction associated functions based on gene ontology and KEGG pathway. Front Genet 2022; 13:1011659. [PMID: 36171880 PMCID: PMC9511048 DOI: 10.3389/fgene.2022.1011659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022]
Abstract
Protein–protein interactions (PPIs) are extremely important for gaining mechanistic insights into the functional organization of the proteome. The resolution of PPI functions can help in the identification of novel diagnostic and therapeutic targets with medical utility, thus facilitating the development of new medications. However, the traditional methods for resolving PPI functions are mainly experimental methods, such as co-immunoprecipitation, pull-down assays, cross-linking, label transfer, and far-Western blot analysis, that are not only expensive but also time-consuming. In this study, we constructed an integrated feature selection scheme for the large-scale selection of the relevant functions of PPIs by using the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotations of PPI participants. First, we encoded the proteins in each PPI with their gene ontologies and KEGG pathways. Then, the encoded protein features were refined as features of both positive and negative PPIs. Subsequently, Boruta was used for the initial filtering of features to obtain 5684 features. Three feature ranking algorithms, namely, least absolute shrinkage and selection operator, light gradient boosting machine, and max-relevance and min-redundancy, were applied to evaluate feature importance. Finally, the top-ranked features derived from multiple datasets were comprehensively evaluated, and the intersection of results mined by three feature ranking algorithms was taken to identify the features with high correlation with PPIs. Some functional terms were identified in our study, including cytokine–cytokine receptor interaction (hsa04060), intrinsic component of membrane (GO:0031224), and protein-binding biological process (GO:0005515). Our newly proposed integrated computational approach offers a novel perspective of the large-scale mining of biological functions linked to PPI.
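The final step described above, taking the intersection of the top-ranked features from several ranking algorithms, can be illustrated with a short sketch. The function name `consensus_features`, the cutoff `top_k`, and the placeholder feature labels are assumptions made for illustration; the abstract does not state the cutoff or ordering the authors used.

```python
def consensus_features(rankings, top_k):
    """Return the features found in the top_k of every ranking,
    ordered by mean rank position (ties broken alphabetically)."""
    top_sets = [set(r[:top_k]) for r in rankings]
    shared = set.intersection(*top_sets)

    def mean_rank(feature):
        return sum(r.index(feature) for r in rankings) / len(rankings)

    return sorted(shared, key=lambda f: (mean_rank(f), f))

# Placeholder rankings standing in for the LASSO, LightGBM, and mRMR
# outputs (labels are hypothetical, not real GO or KEGG terms).
lasso_rank = ["a", "b", "c", "d"]
lightgbm_rank = ["c", "a", "e", "b"]
mrmr_rank = ["b", "c", "a", "f"]

shared_top3 = consensus_features([lasso_rank, lightgbm_rank, mrmr_rank], top_k=3)
print(shared_top3)
```

Requiring agreement across all rankings trades recall for robustness: a feature favoured by only one method is dropped even if that method ranks it first.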
Affiliation(s)
- Lili Yang
  - Measurement Biotechnique Research Center, School of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- ZhanDong Li
  - Measurement Biotechnique Research Center, School of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai

21
Lu S, Wang H, Zhang J. Identification of uveitis-associated functions based on the feature selection analysis of gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment scores. Front Mol Neurosci 2022; 15:1007352. [PMID: 36157069 PMCID: PMC9493498 DOI: 10.3389/fnmol.2022.1007352] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/30/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022]
Abstract
Uveitis is a typical type of eye inflammation affecting the middle layer of the eye (i.e., the uvea) and can lead to blindness in middle-aged and young people. Therefore, a comprehensive study determining the disease susceptibility and the underlying mechanisms of uveitis initiation and progression is urgently needed for the development of effective treatments. In the present study, 108 uveitis-related genes are collected on the basis of literature mining, and 17,560 other human genes are collected from the Ensembl database and treated as non-uveitis genes. Uveitis- and non-uveitis-related genes are then encoded by gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment scores based on the genes and their neighbors in STRING, resulting in 20,681 GO term features and 297 KEGG pathway features. Subsequently, we identify functions and biological processes that can distinguish uveitis-related genes from other human genes by using an integrated feature selection method, which incorporates a feature filtering method (Boruta) and four feature importance assessment methods (i.e., LASSO, LightGBM, MCFS, and mRMR). Some essential GO terms and KEGG pathways related to uveitis, such as GO:0001841 (neural tube formation), hsa04612 (antigen processing and presentation in human beings), and GO:0043379 (memory T cell differentiation), are identified. The plausibility of the association of the mined functional features with uveitis is verified on the basis of the literature. Overall, several advanced machine learning methods are used in the current study to uncover specific functions of uveitis and provide a theoretical foundation for the clinical treatment of uveitis.
Affiliation(s)
- Shiheng Lu
  - Department of Ophthalmology, Shanghai Eye Disease Prevention and Treatment Center, Shanghai Eye Hospital, Shanghai, China
  - Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China
  - Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
  - National Clinical Research Center for Eye Diseases, Shanghai, China
  - Shanghai Engineering Research Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
- Hui Wang
  - Department of Orthopedics, Shanghai Yangpu Hospital of Traditional Chinese Medicine, Shanghai, China
- Jian Zhang
  - Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China
  - Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
  - National Clinical Research Center for Eye Diseases, Shanghai, China
  - Shanghai Engineering Research Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
- *Correspondence: Shiheng Lu; Jian Zhang

22
Identification of Human Cell Cycle Phase Markers Based on Single-Cell RNA-Seq Data by Using Machine Learning Methods. BioMed Research International 2022; 2022:2516653. [PMID: 36004205 PMCID: PMC9393965 DOI: 10.1155/2022/2516653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/29/2022] [Revised: 07/25/2022] [Accepted: 07/29/2022] [Indexed: 12/17/2022]
Abstract
The cell cycle is composed of a series of ordered, highly regulated processes through which a cell grows, duplicates its genome, and eventually divides into two daughter cells. According to the complex changes in cell structure and biosynthesis, the cell cycle is divided into four phases: gap 1 (G1), DNA synthesis (S), gap 2 (G2), and mitosis (M). Determining which cell cycle phase a cell is in is critical to research on cancer development and on drugs that target the cell cycle. However, current detection methods have the following problems: (1) they are complicated and time consuming to perform, and (2) they cannot detect the cell cycle on a large scale. Rapid developments in single-cell technology have made dissecting cells on a large scale possible with unprecedented resolution. In the present research, we construct efficient classifiers and identify essential gene biomarkers based on single-cell RNA sequencing data through Boruta and three feature ranking algorithms (namely, mRMR, MCFS, and SHAP by LightGBM) by utilizing four advanced classification algorithms. Meanwhile, we mine a series of classification rules that can distinguish different cell cycle phases. Collectively, we have provided a novel method for determining the cell cycle phase and identified new potential cell cycle-related genes, thereby contributing to the understanding of the processes that regulate the cell cycle.

23
Song J, Huang F, Chen L, Feng K, Jian F, Huang T, Cai YD. Identification of methylation signatures associated with CAR T cell in B-cell acute lymphoblastic leukemia and non-Hodgkin’s lymphoma. Front Oncol 2022; 12:976262. [PMID: 36033519 PMCID: PMC9402909 DOI: 10.3389/fonc.2022.976262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022]
Abstract
CD19-targeted CAR T cell immunotherapy has exceptional efficacy for the treatment of B-cell malignancies. B-cell acute lymphocytic leukemia and non-Hodgkin’s lymphoma are two common B-cell malignancies with high recurrence rate and are refractory to cure. Although CAR T-cell immunotherapy overcomes the limitations of conventional treatments for such malignancies, failure of treatment and tumor recurrence remain common. In this study, we searched for important methylation signatures to differentiate CAR-transduced and untransduced T cells from patients with acute lymphoblastic leukemia and non-Hodgkin’s lymphoma. First, we used three feature ranking methods, namely, Monte Carlo feature selection, light gradient boosting machine, and least absolute shrinkage and selection operator, to rank all methylation features in order of their importance. Then, the incremental feature selection method was adopted to construct efficient classifiers and filter the optimal feature subsets. Some important methylated genes, namely, SERPINB6, ANK1, PDCD5, DAPK2, and DNAJB6, were identified. Furthermore, the classification rules for distinguishing different classes were established, which can precisely describe the role of methylation features in the classification. Overall, we applied advanced machine learning approaches to the high-throughput data, investigating the mechanism of CAR T cells to establish the theoretical foundation for modifying CAR T cells.
Affiliation(s)
- Jiwei Song
  - College of Life Science, Changchun Sci-Tech University, Shuangyang, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Fangfang Jian
  - Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai

24
Li H, Huang F, Liao H, Li Z, Feng K, Huang T, Cai YD. Identification of COVID-19-Specific Immune Markers Using a Machine Learning Method. Front Mol Biosci 2022; 9:952626. [PMID: 35928229 PMCID: PMC9344575 DOI: 10.3389/fmolb.2022.952626] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/25/2022] [Accepted: 06/21/2022] [Indexed: 01/08/2023]
Abstract
Notably, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has a tight relationship with the immune system. Human resistance to COVID-19 infection comprises two stages. The first stage is immune defense, while the second stage is extensive inflammation. This process is further divided into innate and adaptive immunity during the immune defense phase. These two stages involve various immune cells, including CD4+ T cells, CD8+ T cells, monocytes, dendritic cells, B cells, and natural killer cells. Various immune cells are involved and make up the complex and unique immune system response to COVID-19, providing characteristics that set it apart from other respiratory infectious diseases. In the present study, we identified cell markers for differentiating COVID-19 from common inflammatory responses, non-COVID-19 severe respiratory diseases, and healthy populations based on single-cell profiling of the gene expression of six immune cell types by using Boruta and mRMR feature selection methods. Some features such as IFI44L in B cells, S100A8 in monocytes, and NCR2 in natural killer cells are involved in the innate immune response of COVID-19. Other features such as ZFP36L2 in CD4+ T cells can regulate the inflammatory process of COVID-19. Subsequently, the IFS method was used to determine the best feature subsets and classifiers in the six immune cell types for two classification algorithms. Furthermore, we established the quantitative rules used to distinguish the disease status. The results of this study can provide theoretical support for a more in-depth investigation of COVID-19 pathogenesis and intervention strategies.
Affiliation(s)
- Hao Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Feiming Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Huiping Liao
  - Ophthalmology and Optometry Medical School, Shandong University of Traditional Chinese Medicine, Jinan, China
- Zhandong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Kaiyan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
*Correspondence: Tao Huang; Yu-Dong Cai
|
25
|
Zhang YH, Li ZD, Zeng T, Chen L, Huang T, Cai YD. Screening gene signatures for clinical response subtypes of lung transplantation. Mol Genet Genomics 2022; 297:1301-1313. [PMID: 35780439 DOI: 10.1007/s00438-022-01918-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/21/2021] [Accepted: 06/12/2022] [Indexed: 11/30/2022]
Abstract
The lung is the most important organ in the human respiratory system, and its normal function is essential for human beings. Under certain pathological conditions, normal lung function can no longer be maintained, and lung transplantation is generally applied to ease patients' breathing and prolong their lives. However, several risk factors exist during and after lung transplantation, including bleeding, infection, and transplant rejection. In particular, transplant rejection is difficult to predict or prevent and leads to the most dangerous complications in patients undergoing lung transplantation. Given that the most common monitoring and validation methods for lung transplant rejection may take a long time and have low reproducibility, new technologies and methods are required to improve the efficacy and accuracy of rejection monitoring after lung transplantation. A previous study established the gene expression profiles of patients who underwent lung transplantation but did not provide a tool to predict lung transplantation responses. Here, a deeper investigation was conducted on these profiling data. A computational framework incorporating several machine learning algorithms, such as feature selection methods and classification algorithms, was built to establish an effective prediction model that sorts patients into clinical subgroups corresponding to different rejection responses after lung transplantation. The framework also screened essential genes with functional enrichments and created quantitative rules for distinguishing patients with different rejection responses to lung transplantation. The outcome of this contribution could provide guidelines for the clinical treatment of each rejection subtype and help reveal the complicated rejection mechanisms of lung transplantation.
Affiliation(s)
- Yu-Hang Zhang
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Zhan Dong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, 130052, China
- Tao Zeng
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China
|
26
|
Analysis of Lymphoma-Related Genes with Gene Ontology and Kyoto Encyclopedia of Genes and Genomes Enrichment. BioMed Research International 2022; 2022:8503511. [PMID: 35795312 PMCID: PMC9251090 DOI: 10.1155/2022/8503511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Revised: 05/10/2022] [Accepted: 05/25/2022] [Indexed: 11/18/2022]
Abstract
Lymphoma is a serious malignant tumor that contains more than 70 different types and seriously endangers the body's lymphatic system. The lymphatic system is the regulatory center of the immune system and is important in the immune response to foreign antigens and tumors. Studies showed that multiple genetic variants are associated with lymphoma but determining the pathogenic mechanisms remains a challenge. In the present study, we first applied the Gene Ontology (GO) and KEGG pathway enrichment analyses of lymphoma-associated and lymphoma-nonassociated genes. Next, the Boruta and max-relevance and min-redundancy feature selection methods were performed to filter and rank features. Then, features preselected and ranked using the incremental feature selection method were applied for the decision tree model to identify the best GO terms and KEGG pathways and extract classification rules. Results indicate that our predicted features, such as B-cell activation, negative regulation of protein processing, negative regulation of mast cell cytokine production, and natural killer cell-mediated cytotoxicity, are associated with the biological process of lymphoma, consistent with those of recent publications. This study provides a new perspective for future research on the molecular mechanisms of lymphoma.
|
27
|
Li H, Zhang S, Chen L, Pan X, Li Z, Huang T, Cai YD. Identifying Functions of Proteins in Mice With Functional Embedding Features. Front Genet 2022; 13:909040. [PMID: 35651937 PMCID: PMC9149260 DOI: 10.3389/fgene.2022.909040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/31/2022] [Accepted: 04/28/2022] [Indexed: 12/02/2022] Open
Abstract
In current biology, exploring the biological functions of proteins is important. Given the large number of proteins in some organisms, exploring their functions one by one through traditional experiments is impossible. Therefore, developing quick and reliable methods for identifying protein functions is necessary. The considerable accumulation of protein knowledge and recent developments in computer science provide an alternative way to complete this task, namely, designing computational methods. Several efforts have been made in this field. Most previous methods have adopted protein sequence features or directly used the linkage from a protein–protein interaction (PPI) network. In this study, we proposed some novel multi-label classifiers, which adopted new embedding features to represent proteins. These features were derived from functional domains and a PPI network via word embedding and network embedding, respectively. The minimum redundancy maximum relevance method was used to assess the features, generating a feature list. Incremental feature selection, incorporating RAndom k-labELsets to construct multi-label classifiers, used this list to construct two optimum classifiers, corresponding to two key measurements: accuracy and exact match. These two classifiers had good performance, and they were superior to classifiers that used features extracted by traditional methods.
Affiliation(s)
- Hao Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- ShiQi Zhang
  - Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- ZhanDong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
|
28
|
Li Z, Pan X, Cai YD. Identification of Type 2 Diabetes Biomarkers From Mixed Single-Cell Sequencing Data With Feature Selection Methods. Front Bioeng Biotechnol 2022; 10:890901. [PMID: 35721855 PMCID: PMC9201257 DOI: 10.3389/fbioe.2022.890901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 04/04/2022] [Indexed: 11/18/2022] Open
Abstract
Diabetes is the most common disease and a major threat to human health. Type 2 diabetes (T2D) makes up about 90% of all cases. With the development of high-throughput sequencing technologies, more and more of the fundamental pathogenesis of T2D at the genetic and transcriptomic levels has been revealed. Recent single-cell sequencing can further reveal the cellular heterogeneity of complex diseases in an unprecedented way. To explore the molecular essence of T2D across multiple cell types, we investigated the expression profiles of more than 1,600 single cells (949 cells from T2D patients and 651 cells from normal controls) and identified differential expression profiles and transcriptomic characteristics that can distinguish these two groups of cells at the single-cell level. The expression profile was analyzed by several machine learning algorithms, including Monte Carlo feature selection, support vector machine, and repeated incremental pruning to produce error reduction (RIPPER). On one hand, some T2D-associated genes (MTND4P24, MTND2P28, and LOC100128906) were discovered. On the other hand, we revealed novel potential pathogenic mechanisms in the form of rules. These mechanisms are induced by newly recognized genes and are neglected by traditional bulk sequencing techniques. In particular, the newly identified T2D genes were shown to follow specific quantitative rules with potential for diabetes prediction, and such rules further indicated several potential functional crosstalks involved in T2D.
Affiliation(s)
- Zhandong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Xiaoyong Pan
  - Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
*Correspondence: Yu-Dong Cai
|
29
|
Li Z, Huang F, Chen L, Huang T, Cai YD. Identifying In Vitro Cultured Human Hepatocytes Markers with Machine Learning Methods Based on Single-Cell RNA-Seq Data. Front Bioeng Biotechnol 2022; 10:916309. [PMID: 35706505 PMCID: PMC9189284 DOI: 10.3389/fbioe.2022.916309] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Accepted: 05/11/2022] [Indexed: 01/12/2023] Open
Abstract
Cell transplantation is an effective method for compensating for the loss of liver function and improving patient survival. However, given that hepatocytes cultivated in vitro have diverse developmental processes and physiological features, obtaining hepatocytes that can properly function in vivo is difficult. In the present study, we present an advanced computational analysis of single-cell transcriptional profiling to resolve the heterogeneity of the hepatocyte differentiation process in vitro and to mine biomarkers at different periods of differentiation. We obtained a batch of compressed and effective classification features with the Boruta method and ranked them using the Max-Relevance and Min-Redundancy method. Some key genes were identified during the in vitro culture of hepatocytes, including CD147, which not only regulates terminally differentiated cells in the liver but also affects cell differentiation. PPIA, which encodes a CD147 ligand, also appeared in the identified gene list, and the combination of the two proteins mediated multiple biological pathways. Other genes, such as TMSB10, TMEM176B, and CD63, which are involved in the maturation and differentiation of hepatocytes and assist different hepatic cell types in performing their roles, were also identified. Then, several classifiers were trained and evaluated to obtain optimal classifiers and optimal feature subsets, using three classification algorithms (random forest, k-nearest neighbor, and decision tree) and the incremental feature selection method. The best random forest classifier, with a Matthews correlation coefficient of 0.940, was constructed to distinguish different hepatic cell types. Finally, classification rules were created for quantitatively describing hepatic cell types.
In summary, this study provided potential targets for cell transplantation-associated liver disease treatment strategies by elucidating the process and mechanism of hepatocyte development at both qualitative and quantitative levels.
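The Matthews correlation coefficient (MCC) cited in this abstract condenses a confusion matrix into one value between -1 and 1. The sketch below covers only the binary case. The study distinguishes several hepatic cell types, which would require the multi-class generalization. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def matthews_corrcoef_binary(tp, tn, fp, fn):
    """Binary MCC from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention
    for the case where a whole row or column of the matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
example_mcc = matthews_corrcoef_binary(tp=90, tn=85, fp=15, fn=10)
print(round(example_mcc, 3))
```

Unlike plain accuracy, MCC stays near 0 for a classifier that only predicts the majority class, which is one reason it is often preferred for imbalanced data.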
Affiliation(s)
- ZhanDong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
*Correspondence: Tao Huang; Yu-Dong Cai
|
30
|
Huang F, Chen L, Guo W, Zhou X, Feng K, Huang T, Cai Y. Identifying COVID-19 Severity-Related SARS-CoV-2 Mutation Using a Machine Learning Method. Life (Basel) 2022; 12:life12060806. [PMID: 35743837 PMCID: PMC9225528 DOI: 10.3390/life12060806] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/12/2022] [Revised: 05/22/2022] [Accepted: 05/25/2022] [Indexed: 12/22/2022] Open
Abstract
SARS-CoV-2 shows great evolutionary capacity through a high frequency of genomic variation during transmission. Evolved SARS-CoV-2 often demonstrates resistance to previous vaccines and can cause poor clinical status in patients. Mutations in the SARS-CoV-2 genome involve mutations in structural and nonstructural proteins, and some of these proteins such as spike proteins have been shown to be directly associated with the clinical status of patients with severe COVID-19 pneumonia. In this study, we collected genome-wide mutation information of virulent strains and the severity of COVID-19 pneumonia in patients varying depending on their clinical status. Important protein mutations and untranslated region mutations were extracted using machine learning methods. First, through Boruta and four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, max-relevance and min-redundancy, and Monte Carlo feature selection), mutations that were highly correlated with the clinical status of the patients were screened out and sorted in four feature lists. Some mutations such as D614G and V1176F were shown to be associated with viral infectivity. Moreover, previously unreported mutations such as A320V of nsp14 and I164ILV of nsp14 were also identified, which suggests their potential roles. We then applied the incremental feature selection method to each feature list to construct efficient classifiers, which can be directly used to distinguish the clinical status of COVID-19 patients. Meanwhile, four sets of quantitative rules were set up, which can help us to more intuitively understand the role of each mutation in differentiating the clinical status of COVID-19 patients. Identified key mutations linked to virologic properties will help better understand the mechanisms of infection and will aid in the development of antiviral treatments.
Affiliation(s)
- Feiming Huang
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200025, China
- Xianchao Zhou
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai 200025, China
- Kaiyan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510060, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yudong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
Correspondence: T.H.; Y.C.; Tel.: +86-21-54923269 (T.H.); +86-21-66136132 (Y.C.)
|
31
|
Li Z, Guo W, Ding S, Chen L, Feng K, Huang T, Cai YD. Identifying Key MicroRNA Signatures for Neurodegenerative Diseases With Machine Learning Methods. Front Genet 2022; 13:880997. [PMID: 35528544 PMCID: PMC9068882 DOI: 10.3389/fgene.2022.880997] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/22/2022] [Accepted: 03/30/2022] [Indexed: 01/28/2023] Open
Abstract
Neurodegenerative diseases, including Alzheimer's disease (AD), Parkinson's disease, and many other disease types, cause cognitive dysfunctions such as dementia via the progressive loss of structure or function of the body's neurons. However, the etiology of these diseases remains unknown, and diagnosing less common cognitive disorders such as vascular dementia (VaD) remains a challenge. In this work, we developed a machine-learning-based technique to distinguish between normal control (NC), AD, VaD, dementia with Lewy bodies, and mild cognitive impairment at the microRNA (miRNA) expression level. First, unnecessary miRNA features in the miRNA expression profiles were removed using the Boruta feature selection method, and the retained features were sorted using minimum redundancy maximum relevance and Monte Carlo feature selection to provide two ranked feature lists. The incremental feature selection method was used to construct a series of feature subsets from these feature lists, and the random forest and PART classifiers were trained on the sample data represented by these feature subsets. On the basis of the performance of these classifiers with different numbers of features, the best feature subsets and classifiers were identified, and the classification rules were retrieved from the optimal PART classifiers. Finally, the link between candidate miRNA features, including hsa-miR-3184-5p, hsa-miR-6088, and hsa-miR-4649, and neurodegenerative diseases was confirmed using recently published research, laying the groundwork for more research on miRNAs in neurodegenerative diseases for the diagnosis of cognitive impairment and the understanding of potential pathogenic mechanisms.
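The incremental feature selection step in this abstract can be sketched as a loop over growing prefixes of a ranked feature list, keeping the prefix that scores best. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It swaps in a toy nearest-centroid classifier and leave-one-out accuracy for the random forest and PART classifiers named in the abstract, and the data, function names, and ranking are hypothetical.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def loo_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy using only the columns in feature_idx."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[j] for j in feature_idx]
                   for k, row in enumerate(X) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        x = [X[i][j] for j in feature_idx]
        correct += nearest_centroid_predict(train_X, train_y, x) == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-1, top-2, ... feature subsets; keep the best one."""
    best_k, best_score, scores = 0, -1.0, []
    for k in range(1, len(ranked_features) + 1):
        score = loo_accuracy(X, y, ranked_features[:k])
        scores.append(score)
        if score > best_score:  # ties keep the smaller subset
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score, scores


# Hypothetical toy data: column 0 separates the classes, columns 1-2 are noise.
X = [
    [0.0, 5.0, 1.0], [0.2, -5.0, 3.0], [0.1, 4.0, -2.0],
    [1.0, -4.0, 2.0], [1.2, 5.0, -1.0], [0.9, -5.0, 0.0],
]
y = ["A", "A", "A", "B", "B", "B"]
ranked = [0, 1, 2]  # assumed output of a ranking method such as mRMR
selected, best, scores = incremental_feature_selection(X, y, ranked)
print(selected, best)
```

In practice the ranking would come from a method such as minimum redundancy maximum relevance or Monte Carlo feature selection, and the score would come from cross-validating the chosen classifier.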
Affiliation(s)
- ZhanDong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
- ShiJian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
|
32
|
Li Z, Mei Z, Ding S, Chen L, Li H, Feng K, Huang T, Cai YD. Identifying Methylation Signatures and Rules for COVID-19 With Machine Learning Methods. Front Mol Biosci 2022; 9:908080. [PMID: 35620480 PMCID: PMC9127386 DOI: 10.3389/fmolb.2022.908080] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/30/2022] [Accepted: 04/27/2022] [Indexed: 11/13/2022] Open
Abstract
The occurrence of coronavirus disease 2019 (COVID-19) has become a serious challenge to global public health. Definitive and effective treatments for COVID-19 are still lacking, and targeted antiviral drugs are not available. In addition, viruses can regulate host innate immunity and antiviral processes through the epigenome to promote viral self-replication and disease progression. In this study, we first analyzed the methylation dataset of COVID-19 using the Monte Carlo feature selection method to obtain a feature list. This feature list was subjected to the incremental feature selection method combined with a decision tree algorithm to extract key biomarkers and to build effective classification models and classification rules that can remarkably distinguish patients with and without COVID-19. EPSTI1, NACAP1, SHROOM3, C19ORF35, and MX1, as the essential features, play important roles in the infection and immune response to the novel coronavirus. The six significant rules extracted from the optimal classifier quantitatively explained the expression pattern of COVID-19. Therefore, these findings validated that our method can distinguish COVID-19 at the methylation level and provide guidance for the diagnosis and treatment of COVID-19.
Affiliation(s)
- Zhandong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Zi Mei
  - Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Hao Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Kaiyan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
*Correspondence: Tao Huang; Yu-Dong Cai
|
33
|
Li Z, Guo W, Zeng T, Yin J, Feng K, Huang T, Cai YD. Detecting Brain Structure-Specific Methylation Signatures and Rules for Alzheimer's Disease. Front Neurosci 2022; 16:895181. [PMID: 35585924 PMCID: PMC9108872 DOI: 10.3389/fnins.2022.895181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2022] [Accepted: 04/11/2022] [Indexed: 01/01/2023] Open
Abstract
Alzheimer's disease (AD) is a progressive disease that leads to irreversible behavioral changes, erratic emotions, and loss of motor skills. These conditions make caring for people with AD hard or almost impossible. Multiple internal and external pathological factors may affect or even trigger the initiation and progression of AD. DNA methylation is one of the most effective regulatory mechanisms during AD pathogenesis, and pathological methylation alterations may differ across the brain structures of people with AD. Although multiple loci associated with AD initiation and progression have been identified, the spatial distribution patterns of AD-associated DNA methylation in the brain have not been clarified. We applied multiple machine learning algorithms to systematic methylation profiles from different structural brain regions. First, the profile of each brain region was analyzed by the Boruta feature filtering method. Some important methylation features were extracted and further analyzed by the max-relevance and min-redundancy method, resulting in a feature list. Then, the incremental feature selection method, incorporating several classification algorithms, used this list to identify candidate AD-associated methylation loci with structural specificity and to establish a group of quantitative rules revealing the effects of DNA methylation in various brain regions (i.e., four brain structures) on AD pathogenesis. Furthermore, some efficient classifiers based on essential methylation sites were proposed to identify AD samples. Results revealed that methylation alterations in different brain structures contribute differently to AD pathogenesis. This study further illustrates the complex pathological mechanisms of AD.
Affiliation(s)
- ZhanDong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Tao Zeng
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Jie Yin
  - Cancer Institute, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Department of Human Genetics, Institute of Genetics, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
|
34
|
Zhou X, Ding S, Wang D, Chen L, Feng K, Huang T, Li Z, Cai Y. Identification of Cell Markers and Their Expression Patterns in Skin Based on Single-Cell RNA-Sequencing Profiles. Life (Basel) 2022; 12:life12040550. [PMID: 35455041 PMCID: PMC9025372 DOI: 10.3390/life12040550] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/10/2022] [Revised: 03/27/2022] [Accepted: 04/04/2022] [Indexed: 12/19/2022] Open
Abstract
Atopic dermatitis and psoriasis are members of a family of inflammatory skin disorders. Cellular immune responses in skin tissues contribute to the development of these diseases. However, their underlying immune mechanisms remain to be fully elucidated. We developed a computational pipeline for analyzing the single-cell RNA-sequencing profiles of the Human Cell Atlas skin dataset to investigate the pathological mechanisms of skin diseases. First, we applied the maximum relevance criterion and the Boruta feature selection method to exclude irrelevant gene features from the single-cell gene expression profiles of inflammatory skin disease samples and healthy controls. The retained gene features were ranked by using the Monte Carlo feature selection method on the basis of their importance, and a feature list was compiled. This list was then introduced into the incremental feature selection method that combined the decision tree and random forest algorithms to extract important cell markers and thus build excellent classifiers and decision rules. These cell markers and their expression patterns have been analyzed and validated in recent studies and are potential therapeutic and diagnostic targets for skin diseases because their expression affects the pathogenesis of inflammatory skin diseases.
Collapse
Affiliation(s)
- Xianchao Zhou
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Deling Wang
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Department of Medical Imaging, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Kaiyan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Zhandong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun 130052, China
- Yudong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Correspondence: T.H., Z.L., Y.C.; Tel.: +86-21-54923269 (T.H.); +86-21-66136132 (Y.C.)
|
35
|
Li Z, Wang D, Liao H, Zhang S, Guo W, Chen L, Lu L, Huang T, Cai YD. Exploring the Genomic Patterns in Human and Mouse Cerebellums Via Single-Cell Sequencing and Machine Learning Method. Front Genet 2022; 13:857851. [PMID: 35309141 PMCID: PMC8930846 DOI: 10.3389/fgene.2022.857851] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/19/2022] [Accepted: 02/09/2022] [Indexed: 12/29/2022]
Abstract
In mammals, the cerebellum plays an important role in movement control. Cellular research reveals that the cerebellum contains a variety of cell subtypes, including Golgi, granule, interneuron, and unipolar brush cells. The functional characteristics of cerebellar cells differ considerably among mammalian species, reflecting the potential development and evolution of the nervous system. In this study, we aimed to identify the transcriptional differences between human and mouse cerebellum in four cerebellar cell subtypes by using single-cell sequencing data and machine learning methods. Sequencing data from a total of 321,387 single cells were used. These cells comprised four cell types: Golgi (5,048, 1.57%), granule (250,307, 77.88%), interneuron (60,526, 18.83%), and unipolar brush (5,506, 1.72%) cells. Our results showed that, using gene expression profiles as features, the optimal classification models achieved very high, even perfect, performance for Golgi, granule, interneuron, and unipolar brush cells, suggesting a remarkable difference between the genomic profiles of human and mouse. Furthermore, a group of related genes and rules contributing to the classification was identified, which might provide helpful information for deepening the understanding of cerebellar cell heterogeneity and evolution.
Affiliation(s)
- ZhanDong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Deling Wang
  - Department of Radiology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- HuiPing Liao
  - Eye Institute of Shandong University of Traditional Chinese Medicine, Jinan, China
- ShiQi Zhang
  - Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Lin Lu
  - Department of Radiology, Columbia University Medical Center, New York, NY, United States
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Lin Lu; Tao Huang; Yu-Dong Cai
|