1
Bagolini A, Di Novo NG, Pedrotti S, Valt M, Collini C, Pugno NM, Lorenzelli L. Beveled microneedles with channel for transdermal injection and sampling, fabricated with minimal steps and standard MEMS technology. LAB ON A CHIP 2024. [PMID: 39665277 DOI: 10.1039/d4lc00880d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2024]
Abstract
Microneedles hold the potential for enabling shallow skin penetration applications in which biomarkers are extracted from the interstitial fluid (ISF) and drugs are injected in a painless and effective manner. For this purpose, the needles must have an inner channel. Channeled needles with several tip geometries have been demonstrated using custom silicon microtechnology. However, none of the proposed fabrication sequences is compatible with mass production based on mature, standard microfabrication techniques. Furthermore, ISF extraction has also been demonstrated with channeled needles, but under poorly controlled conditions and over long periods of time, the latter being impractical for medical use. A range of factors may impede or slow ISF extraction, and these require controlled experiments. In this work we address these challenges in terms of microfabrication sequence design, tip geometry design, and experimental validation under controlled conditions. We report the development and fabrication of a silicon channeled microneedle array using conventional, industrial micromechanical processes. With only 2 lithography steps, a hypodermic needle tip profile is achieved. Using the fabricated microneedles, fluid extraction is tested on chicken skin mock-ups. Extraction tests are carried out by inducing a controlled pressure gradient between the two ends of the microneedle channels, generated either by loading the chip or by applying vacuum to the chip's backside. The extraction of more than 1 μL of fluid in 20 minutes is demonstrated with a maximum applied pressure gradient of 500 mbar. A correlation between extraction rate efficiency and needle density is observed for both short and long extraction times. These results provide the first demonstration of in vitro interstitial fluid collection under controlled experimental conditions using silicon hollow microneedles fabricated with standard micro electro mechanical systems (MEMS) technology and minimal steps. Based on the obtained data, pressure load and vacuum are compared as drivers for ISF extraction, according to modelling and controlled experiments.
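As a back-of-the-envelope reading of the reported figures, the sketch below converts "more than 1 μL in 20 minutes" into a mean extraction rate and expresses the 500 mbar gradient in SI units. The function names are hypothetical. The calculation assumes a constant rate over the whole interval, which the abstract does not claim, and because the reported volume is "more than" 1 μL the result is a lower bound.

```python
# Hypothetical helpers: unit arithmetic for the reported extraction figures.
# Assumes a constant extraction rate over the interval (a simplification).

MBAR_TO_PA = 100.0  # 1 mbar = 100 Pa


def mean_flow_rate_ul_per_min(volume_ul: float, minutes: float) -> float:
    """Average volumetric rate in microlitres per minute."""
    if minutes <= 0:
        raise ValueError("duration must be positive")
    return volume_ul / minutes


def mbar_to_kpa(pressure_mbar: float) -> float:
    """Convert millibar to kilopascal."""
    return pressure_mbar * MBAR_TO_PA / 1000.0


if __name__ == "__main__":
    rate = mean_flow_rate_ul_per_min(1.0, 20.0)
    print(f"{rate:.3f} uL/min = {rate * 60:.1f} uL/h")  # 0.050 uL/min = 3.0 uL/h
    print(f"{mbar_to_kpa(500.0):.1f} kPa")  # 50.0 kPa
```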
Affiliation(s)
- Alvise Bagolini
- Sensors and Devices Center, Bruno Kessler Foundation, Via Sommarive 18, 38123 Trento, Italy
- Nicolò G Di Novo
- Laboratory of Bioinspired, Bionic, Nano, Meta, Materials & Mechanics, Department of Civil, Environmental and Mechanical Engineering, University of Trento, Via Mesiano, 77, 38123 Trento, Italy
- Severino Pedrotti
- Sensors and Devices Center, Bruno Kessler Foundation, Via Sommarive 18, 38123 Trento, Italy
- Matteo Valt
- Sensors and Devices Center, Bruno Kessler Foundation, Via Sommarive 18, 38123 Trento, Italy
- Cristian Collini
- Sensors and Devices Center, Bruno Kessler Foundation, Via Sommarive 18, 38123 Trento, Italy
- Nicola M Pugno
- Laboratory of Bioinspired, Bionic, Nano, Meta, Materials & Mechanics, Department of Civil, Environmental and Mechanical Engineering, University of Trento, Via Mesiano, 77, 38123 Trento, Italy
- School of Engineering and Materials Science, Queen Mary University of London, Mile End Road, London E1 4NS, UK
- Leandro Lorenzelli
- Sensors and Devices Center, Bruno Kessler Foundation, Via Sommarive 18, 38123 Trento, Italy

2
Xu Y, Zhang W, Zheng X, Cai X. Combining Global-Constrained Concept Factorization and a Regularized Gaussian Graphical Model for Clustering Single-Cell RNA-seq Data. Interdiscip Sci 2024; 16:1-15. [PMID: 37815679 DOI: 10.1007/s12539-023-00587-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 09/14/2023] [Accepted: 09/17/2023] [Indexed: 10/11/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology is one of the most cost-effective ways to uncover transcriptomic heterogeneity. With the rapid rise of this technology, enormous amounts of scRNA-seq data have been produced. Because of the high dimensionality, noise, sparsity, and missing features of the available scRNA-seq data, accurately clustering these data for downstream analysis is a significant challenge. Many computational methods have been designed to address this issue; nevertheless, the efficacy of the available methods is still inadequate. In addition, most similarity-based methods require the number of clusters as input, which is difficult to determine in real applications. In this study, we developed a novel computational method, named GCFG, for clustering scRNA-seq data by considering both global and local information. This method characterizes the global properties of the data by applying concept factorization and uses a regularized Gaussian graphical model to evaluate the local embedding relationships of the data. To learn the cell-cell similarity matrix, we integrated the two components and developed an iterative optimization algorithm. Single cells are then categorized by applying Louvain, a modularity-based community detection algorithm, to the similarity matrix. The behavior of GCFG is assessed on 14 real scRNA-seq datasets in terms of accuracy (ACC) and adjusted Rand index (ARI), and comparisons with 17 other competitive methods suggest that GCFG is effective and robust.
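ARI measures agreement between a predicted clustering and reference cell labels, corrected for chance. The sketch below computes it from pair counts over the contingency table, using the standard adjusted-for-chance formulation. It is an illustration only, not the authors' evaluation code. The function name is hypothetical, and the convention of returning 1.0 for degenerate inputs is an assumption.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    # Contingency counts n_ij and the marginal cluster sizes a_i, b_j.
    pair_counts = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings trivial): treat as perfect agreement.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    # Same partition with swapped cluster IDs scores 1.0.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

ARI depends only on which cells are grouped together, not on the cluster IDs. That property matters when a method such as Louvain chooses the number and naming of communities itself.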
Affiliation(s)
- Yaxin Xu
- School of Sciences, East China Jiaotong University, Nanchang, 330013, China
- Wei Zhang
- School of Sciences, East China Jiaotong University, Nanchang, 330013, China
- Xiaoying Zheng
- Operations Research and Planning Department, Naval University of Engineering, Wuhan, 430033, China
- Xianxian Cai
- School of Sciences, East China Jiaotong University, Nanchang, 330013, China

3
Guan W, Zhang N, Bains A, Martinez A, LiWang PJ. Sustained Delivery of the Antiviral Protein Griffithsin and Its Adhesion to a Biological Surface by a Silk Fibroin Scaffold. MATERIALS (BASEL, SWITZERLAND) 2023; 16:5547. [PMID: 37629837 PMCID: PMC10456748 DOI: 10.3390/ma16165547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 08/02/2023] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
The protein Griffithsin (Grft) is a lectin that tightly binds to high-mannose glycosylation sites on viral surfaces. This property allows Grft to potently inhibit many viruses, including HIV-1. The major route of HIV infection is through sexual activity, so an important tool for reducing the risk of infection would be a film that could be inserted vaginally or rectally to inhibit transmission of the virus. We have previously shown that silk fibroin can encapsulate, stabilize, and release various antiviral proteins, including Grft. However, for broad utility as a prevention method, it would be useful for an insertable film to adhere to the mucosal surface so that it remains for several days or weeks to provide longer-term protection from infection. We show here that silk fibroin can be formulated with adhesive properties using the nontoxic polymer hydroxypropyl methylcellulose (HPMC) and glycerol, and that the resulting silk scaffold can both adhere to biological surfaces and release Grft over the course of at least one week. This work advances the possible use of silk fibroin as an anti-viral insertable device to prevent infection by sexually transmitted viruses, including HIV-1.
Affiliation(s)
- Wenyan Guan
- Materials and Biomaterials Science and Engineering, University of California Merced, 5200 North Lake Rd., Merced, CA 95343, USA
- Ning Zhang
- Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao 266101, China
- Arjan Bains
- Chemistry and Biochemistry, University of California Merced, 5200 North Lake Rd., Merced, CA 95343, USA
- Airam Martinez
- Department of Bioengineering, University of California Merced, 5200 North Lake Rd., Merced, CA 95343, USA
- Patricia J. LiWang
- Molecular Cell Biology, Health Sciences Research Institute, University of California Merced, 5200 North Lake Rd., Merced, CA 95343, USA

4
Ma QL, Huang FM, Guo W, Feng KY, Huang T, Cai YD. Machine Learning Classification of Time since BNT162b2 COVID-19 Vaccination Based on Array-Measured Antibody Activity. Life (Basel) 2023; 13:1304. [PMID: 37374086 DOI: 10.3390/life13061304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 05/26/2023] [Accepted: 05/29/2023] [Indexed: 06/29/2023]
Abstract
Vaccines trigger an immunological response that involves B and T cells, with B cells producing antibodies. SARS-CoV-2 immunity weakens over time after vaccination. Discovering key changes in antigen-reactive antibodies over time after vaccination could help improve vaccine efficiency. In this study, we collected data on blood antibody levels in a cohort of healthcare workers vaccinated against COVID-19 and obtained 73 antigens in samples from four groups defined by the time since vaccination: 104 unvaccinated healthcare workers, 534 healthcare workers within 60 days after vaccination, 594 healthcare workers between 60 and 180 days after vaccination, and 141 healthcare workers more than 180 days after vaccination. Our work was a reanalysis of data originally collected at the University of California, Irvine. These data were obtained in Orange County, California, USA, with collection commencing in December 2020. The British (B.1.1.7), South African (B.1.351), and Brazilian/Japanese (P.1) variants were the most prevalent strains during the sampling period. An efficient machine learning based framework containing four feature selection methods (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and maximum relevance minimum redundancy) and four classification algorithms (decision tree, k-nearest neighbor, random forest, and support vector machine) was designed to select essential antibodies against specific antigens. Several efficient classifiers with a weighted F1 value of around 0.75 were constructed. The antigen microarray used to measure coronavirus antibody levels features ten distinct SARS-CoV-2 antigens, comprising various segments of both the nucleocapsid protein (NP) and the spike protein (S). This study revealed that S1 + S2, S1.mFcTag, S1.HisTag, S1, S2, Spike.RBD.His.Bac, Spike.RBD.rFc, and S1.RBD.mFc were the most highly ranked among all features, where S1 and S2 are the subunits of Spike and the suffixes represent the tagging information of different recombinant proteins. Meanwhile, classification rules were obtained from the optimal decision tree to quantitatively explain the roles of antigens in the classification. This study identified antibodies associated with decreased clinical immunity based on populations with different time spans after vaccination. These antibodies have important implications for maintaining long-term immunity to SARS-CoV-2.
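The four groups here differ greatly in size (104 to 594 individuals). A weighted F1 value therefore averages the per-class F1 scores weighted by each class's support, i.e. its number of true samples. The sketch below shows that computation under the usual definition. The function name is hypothetical, and it is not the authors' evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    support = Counter(y_true)        # true samples per class
    predicted = Counter(y_pred)      # predicted samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = 0.0
    for cls, n_true in support.items():
        tp = correct[cls]
        precision = tp / predicted[cls] if predicted[cls] else 0.0
        recall = tp / n_true
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += n_true * f1         # weight each class by its support
    return total / len(y_true)


if __name__ == "__main__":
    print(round(weighted_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]), 3))  # 0.733
```

Classes that appear only in the predictions have zero support and therefore contribute nothing. This matches the common "weighted" averaging convention.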
Affiliation(s)
- Qing-Lan Ma
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Fei-Ming Huang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
- Kai-Yan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China

5
Ren J, Zhang Y, Guo W, Feng K, Yuan Y, Huang T, Cai YD. Identification of Genes Associated with the Impairment of Olfactory and Gustatory Functions in COVID-19 via Machine-Learning Methods. Life (Basel) 2023; 13:798. [PMID: 36983953 PMCID: PMC10051382 DOI: 10.3390/life13030798] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/11/2023] [Revised: 03/10/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
Coronavirus disease 2019 (COVID-19), a severe respiratory disease, affects many parts of the body, and approximately 20-85% of patients exhibit functional impairment of the senses of smell and taste, some of whom even experience permanent loss of these senses. These symptoms are not life-threatening but severely affect patients' quality of life and increase the risk of depression and anxiety. The pathological mechanisms of these symptoms have not been fully identified. In the current study, we aimed to identify important expression-level biomarkers associated with the loss of taste or smell mediated by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, and to suggest potential pathogenetic mechanisms of these COVID-19 complications. We designed a machine-learning-based approach to analyze the transcriptomes of 577 COVID-19 patient samples, including 84 samples with a decreased ability to taste or smell and 493 samples without impairment. Each sample was represented by 58,929 gene expression levels. The features were analyzed and sorted by three feature selection methods (least absolute shrinkage and selection operator, light gradient boosting machine, and Monte Carlo feature selection). The optimal feature sets were obtained through incremental feature selection using two classification algorithms: decision tree (DT) and random forest (RF). The top genes identified by these multiple methods (H3-5, NUDT5, and AOC1) are involved in olfactory and gustatory impairments. Meanwhile, a high-performance RF classifier was developed, and three sets of quantitative rules describing the impairment of olfactory and gustatory functions were obtained from the optimal DT classifiers. In summary, this study provides a new computational analysis and suggests candidate biomarkers (genes and rules) for predicting olfactory and gustatory impairment caused by COVID-19 complications.
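Incremental feature selection (IFS), as used in this and several of the following studies, takes a ranked feature list and evaluates successively larger top-k prefixes with a classifier. It then keeps the best-scoring prefix. The sketch below captures that loop. The `evaluate` callback stands in for something like cross-validated DT or RF performance and is an assumption, as are the placeholder feature names and scores. Published IFS runs often step k in larger increments rather than one at a time.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Evaluate the top-1, top-2, ... prefixes of a ranked list and return
    the prefix with the highest score (ties keep the smaller set)."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)  # e.g. cross-validated classifier score
        if score > best_score:    # strict '>' keeps the smaller set on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    # Hypothetical stand-in scores for prefixes of length 1..4.
    toy_scores = {1: 0.61, 2: 0.70, 3: 0.74, 4: 0.73}
    features = ["gene_A", "gene_B", "gene_C", "gene_D"]
    subset, score = incremental_feature_selection(
        features, lambda s: toy_scores[len(s)])
    print(subset, score)  # ['gene_A', 'gene_B', 'gene_C'] 0.74
```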
Affiliation(s)
- Jingxin Ren
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Yuhang Zhang
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
- Ye Yuan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China

6
Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods. BIOMED RESEARCH INTERNATIONAL 2023; 2023:5333361. [PMID: 36644165 PMCID: PMC9833906 DOI: 10.1155/2023/5333361] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 09/03/2022] [Revised: 12/15/2022] [Accepted: 12/15/2022] [Indexed: 01/06/2023]
Abstract
Long-term cigarette smoking causes various human diseases, including respiratory disease, cancer, and gastrointestinal (GI) disorders. Alterations in gene expression and alternative splicing induced by smoking are associated with the development of these diseases. This study applied advanced machine learning methods to identify the isoforms with important roles in distinguishing current smokers from former smokers, based on isoform expression profiles from current and former smokers collected in a previous study. These isoforms were treated as features and were first analyzed by the Boruta method to select features highly correlated with the target variable. The selected features were then evaluated by four feature ranking algorithms, yielding four feature lists. The incremental feature selection method was applied to each list to obtain the optimal feature subsets and build high-performance classification models. Furthermore, a series of classification rules was extracted from the best-performing decision tree. Finally, the rationality of the mined isoforms (features) and classification rules was verified by reviewing previous research. Features such as isoforms ENST00000464835 (expressed by LRRN3), ENST00000622663 (expressed by SASH1), and ENST00000284311 (expressed by GPR15), together with pathways revealed by enrichment analysis (natural killer cell-mediated cytotoxicity and cytokine-cytokine receptor interaction), were highly relevant to smoking response, suggesting the robustness of our analysis pipeline.
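Boruta's core idea is to compare each real feature's importance against "shadow" copies whose values have been shuffled. Shuffling destroys any relation to the target, so a feature is kept only if it beats the best shadow. The sketch below is a deliberately simplified, single-round version of that idea. It swaps in |Pearson r| as a stand-in importance where Boruta uses random-forest importance, and it omits Boruta's repeated iterations and statistical test. It is not the Boruta package, and both function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """|Pearson r|; returns 0.0 when either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_filter(columns: List[List[float]], target: Sequence[float],
                  seed: int = 0) -> List[int]:
    """Single-round, Boruta-style filter: keep the indices of features whose
    stand-in importance beats the best importance among shuffled shadows."""
    rng = random.Random(seed)
    shadows = []
    for col in columns:
        shadow = list(col)
        rng.shuffle(shadow)  # breaks any relation to the target
        shadows.append(shadow)
    threshold = max(abs_pearson(s, target) for s in shadows)
    return [i for i, col in enumerate(columns)
            if abs_pearson(col, target) > threshold]
```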

7
Wu C, Chen L. A model with deep analysis on a large drug network for drug classification. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:383-401. [PMID: 36650771 DOI: 10.3934/mbe.2023018] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 06/17/2023]
Abstract
Drugs are an important means of treating various diseases. They are classified into several classes to indicate their properties and effects, and drugs in the same class generally share some important features. The Kyoto Encyclopedia of Genes and Genomes (KEGG) DRUG database recently reported a new drug classification system that classifies drugs into 14 classes. Correctly identifying the class of any possible drug-like compound helps to roughly determine its effects on a particular type of disease. Experiments could then be conducted to confirm such latent effects, thus accelerating the discovery of novel drugs. In this study, this classification system was investigated, and a classification model was proposed, for the first time, to assign one of the classes in the system to any given drug. Unlike traditional fingerprint features, which encode only intrinsic drug properties and are very popular in drug-related studies, the drugs were represented by novel features derived from a large drug network via a well-known network embedding algorithm called Node2vec. These features abstract the drug associations generated from their essential properties and describe each drug against the background of all drugs. Because class sizes differed greatly, the synthetic minority over-sampling technique (SMOTE) was employed to tackle the imbalance problem. The balanced dataset was fed into a support vector machine to build the model. The 10-fold cross-validation results indicated excellent performance of the model. This model was also superior to models using other drug features, including those generated by another network embedding algorithm and fingerprint features. Furthermore, this model provided more balanced performance across all classes than a model without SMOTE.
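SMOTE balances classes by creating synthetic minority samples rather than duplicating existing ones. For a randomly chosen minority sample, it picks one of its k nearest minority-class neighbours and places a new point at a random position on the segment between them. The pure-Python sketch below shows that interpolation step on plain feature vectors, such as Node2vec embeddings might be. The function name and defaults are hypothetical. In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would normally be used instead.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    new_points = smote([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_new=3, k=2)
    print(new_points)  # three points on the diagonal between existing samples
```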
Affiliation(s)
- Chenhao Wu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China

8
Ren J, Zhou X, Guo W, Feng K, Huang T, Cai YD. Identification of Methylation Signatures and Rules for Sarcoma Subtypes by Machine Learning Methods. BIOMED RESEARCH INTERNATIONAL 2022; 2022:5297235. [PMID: 36619306 PMCID: PMC9812612 DOI: 10.1155/2022/5297235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/28/2022] [Accepted: 12/08/2022] [Indexed: 12/31/2022]
Abstract
Sarcoma, the second most common type of solid tumor in children and adolescents, has a wide variety of subtypes that are often not properly diagnosed at an early stage, leading to late metastases and causing serious loss of life and property for patients and families. It exhibits a high degree of heterogeneity at the cellular, molecular, and epigenetic levels, and DNA methylation has been proposed to play a role in the diagnosis of sarcoma subtypes. Thus, this study aimed to find potential biomarkers at the DNA methylation level to distinguish different sarcoma subtypes. A machine learning process was designed to analyze sarcoma samples, each represented by a large number of methylation sites. Irrelevant sites were removed using the Boruta method, and the remaining sites related to the target variable were kept for further analyses. Afterward, three feature ranking methods (LASSO, LightGBM, and MCFS) were adopted to rank these features, and six classification models were constructed by combining incremental feature selection with two classification algorithms (decision tree and random forest). Among these models, the RF model outperformed the DT model under all three ranking conditions. The specific expression of genes annotated from highly correlated methylation-site features, such as PRKAR1B, INPP5A, and GLI3, has been associated with sarcoma in previous publications. Moreover, the quantitative rules obtained by the decision tree algorithm helped us understand the essential differences between sarcoma types and classify sarcoma subtypes, providing a new means of clinical identification and of determining new therapeutic targets.
Affiliation(s)
- Jingxin Ren
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- XianChao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
- KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China

9
Jin JQ, Wu D, Spencer R, Elhage KG, Liu J, Davis M, Hakimi M, Kumar S, Huang ZM, Bhutani T, Liao W. Biologic insights from single-cell studies of psoriasis and psoriatic arthritis. Expert Opin Biol Ther 2022; 22:1449-1461. [PMID: 36317702 DOI: 10.1080/14712598.2022.2142465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
INTRODUCTION Psoriasis (PSO) and psoriatic arthritis (PSA) represent a large burden of global inflammatory disease, but sustained treatment response and early diagnosis remain challenging. Both conditions arise from complex immune cell dysregulation. Single-cell techniques, including single-cell RNA sequencing (scRNA-seq), have revolutionized our understanding of pathogenesis by illuminating heterogeneous cell populations and their interactions. AREAS COVERED We discuss the transcriptional profiles and cellular interactions unique to PSO/PSA affecting T cells, myeloid cells, keratinocytes, innate lymphoid cells, and stromal cells. We also review advances, limitations, and future challenges associated with single-cell studies. EXPERT OPINION Following analyses of 22 single-cell studies, several themes emerged. A small subpopulation of cells can have a large impact on disease pathogenesis. Multiple cell types identified via scRNA-seq play supporting roles in PSO pathogenesis, contrary to the traditional paradigm focusing on IL-23/IL-17 signaling among dendritic cells and T cells. Immune cell states are dynamic, with psoriatic subpopulations aberrantly re-activating and differentiating into inflammatory phenotypes depending on surrounding signaling cues. Comparison of circulating immune cells with resident skin/joint cells has uncovered specific T cell clonotypes associated with the disease. Finally, machine learning models demonstrate great promise in identifying biomarkers to diagnose clinically ambiguous rashes and PSA at earlier stages.
Affiliation(s)
- Joy Q Jin
- Department of Medicine, UCSF School of Medicine, San Francisco, CA, USA
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- David Wu
- Department of Medicine, UCSF School of Medicine, San Francisco, CA, USA
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Riley Spencer
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Kareem G Elhage
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Jared Liu
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Mitchell Davis
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Marwa Hakimi
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Sugandh Kumar
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Zhi-Ming Huang
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Tina Bhutani
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Wilson Liao
- Department of Dermatology, University of California at San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California at San Francisco, San Francisco, CA, USA

10
Li X, Zhou X, Ding S, Chen L, Feng K, Li H, Huang T, Cai YD. Identification of Transcriptome Biomarkers for Severe COVID-19 with Machine Learning Methods. Biomolecules 2022; 12:1735. [PMID: 36551164 PMCID: PMC9775121 DOI: 10.3390/biom12121735] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/15/2022] [Revised: 11/18/2022] [Accepted: 11/18/2022] [Indexed: 11/24/2022]
Abstract
The rapid spread of COVID-19 has become a major concern for people's lives and health around the world. COVID-19 patients in different phases and of different severities require individualized treatment, given that different patients may develop different symptoms. In this study, we employed machine learning methods to discover biomarkers that may accurately classify COVID-19 across disease states and severities. Blood gene expression profiles from 50 COVID-19 patients without intensive care, 50 COVID-19 patients with intensive care, 10 non-COVID-19 individuals without intensive care, and 16 non-COVID-19 individuals with intensive care were analyzed. Boruta was first used to remove irrelevant gene features from the expression profiles, and then the minimum redundancy maximum relevance (mRMR) method was applied to sort the remaining features. The resulting ranked feature list was fed into the incremental feature selection method to discover essential genes and build powerful classifiers. The molecular mechanisms of some biomarker genes were discussed in light of recent studies, and the biological functions enriched among the essential genes were examined. Our findings imply that genes including UBE2C, PCLAF, CDK1, CCNB1, MND1, APOBEC3G, TRAF3IP3, CD48, and GZMA play key roles in defining the different states and severities of COVID-19. Thus, a new point of reference is provided for understanding the disease's etiology and facilitating precise therapy.
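Minimum redundancy maximum relevance ranks features greedily. At each step it picks the remaining feature whose mutual information (MI) with the class label, minus its average MI with the features already chosen, is largest. This is the "difference" form of the criterion. The sketch below implements that loop with a plug-in MI estimate for discrete values. Continuous expression data would first need discretization, which is an assumption not described in the abstract. The function names are hypothetical, and this is not the mRMR implementation the authors used.

```python
from collections import Counter
from math import log
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr_rank(features: Dict[str, Sequence[Hashable]],
              target: Sequence[Hashable]) -> List[str]:
    """Greedy mRMR ordering: relevance to the target minus mean redundancy
    with the features already selected."""
    remaining = dict(features)
    ranked: List[str] = []
    while remaining:
        def score(name: str) -> float:
            relevance = mutual_information(remaining[name], target)
            if not ranked:
                return relevance
            redundancy = sum(mutual_information(remaining[name], features[s])
                             for s in ranked) / len(ranked)
            return relevance - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        del remaining[best]
    return ranked


if __name__ == "__main__":
    feats = {"f_a": [0, 0, 1, 1], "f_a_copy": [0, 0, 1, 1], "f_b": [0, 1, 0, 1]}
    print(mrmr_rank(feats, [0, 1, 2, 3]))  # the redundant copy is ranked last
```

The toy example shows the point of the redundancy term. `f_a_copy` is exactly as relevant as `f_a`, but once `f_a` is selected it adds no new information, so the complementary feature `f_b` is ranked ahead of it.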
Affiliation(s)
- Xiaohong Li
- School of Biological and Food Engineering, Jilin Engineering Normal University, Changchun 130052, China
- Xianchao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shijian Ding
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
- Hao Li
- School of Biological and Food Engineering, Jilin Engineering Normal University, Changchun 130052, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China

11
Ren J, Guo W, Feng K, Huang T, Cai Y. Identifying MicroRNA Markers That Predict COVID-19 Severity Using Machine Learning Methods. Life (Basel) 2022; 12:1964. [PMID: 36556329 PMCID: PMC9784129 DOI: 10.3390/life12121964] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/27/2022] [Revised: 11/21/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2022]
Abstract
Individuals infected with SARS-CoV-2 may experience a wide range of symptoms, from being asymptomatic, to a mild fever and cough, to severe respiratory impairment that results in death. MicroRNA (miRNA), which plays a role in the antiviral effects of SARS-CoV-2 infection, has the potential to be used as a novel marker to distinguish between patients with different COVID-19 clinical severities. In the current study, existing blood expression profiles reported in two previous studies were combined for in-depth analysis. The final profiles contained 1444 miRNAs in 375 individuals from six categories: 30 patients with mild COVID-19 symptoms, 81 patients with moderate COVID-19 symptoms, 30 non-COVID-19 patients with mild symptoms, 137 patients with severe COVID-19 symptoms, 31 non-COVID-19 patients with severe symptoms, and 66 healthy controls. An efficient computational framework containing four feature selection methods (LASSO, LightGBM, MCFS, and mRMR) and four classification algorithms (DT, KNN, RF, and SVM) was designed to screen clinical miRNA markers, and a high-precision RF model with a weighted F1 score of 0.780 was constructed. Several miRNAs were identified, including miR-24-3p, whose differential expression was discovered in patients with acute lung injury complications brought on by severe COVID-19, and miR-148a-3p, which is differentially expressed against SARS-CoV-2 structural proteins, suggesting the effectiveness and accuracy of our framework. Meanwhile, we extracted classification rules from the DT model to quantitatively represent the role of miRNA expression in differentiating COVID-19 patients of different severities. The search for novel biomarkers that could predict disease severity could aid the clinical diagnosis of COVID-19 and the exploration of the specific mechanisms of complications caused by SARS-CoV-2 infection. Moreover, new therapeutic targets for the disease may be found.
Affiliation(s)
- Jingxin Ren
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
- Kaiyan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yudong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China

12
Lu J, Meng M, Zhou X, Ding S, Feng K, Zeng Z, Huang T, Cai YD. Identification of COVID-19 severity biomarkers based on feature selection on single-cell RNA-Seq data of CD8+ T cells. Front Genet 2022; 13:1053772. [PMID: 36437952 PMCID: PMC9682094 DOI: 10.3389/fgene.2022.1053772] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/26/2022] [Accepted: 10/27/2022] [Indexed: 07/30/2023] [Open Access]
Abstract
The global COVID-19 outbreak has become a major public health problem. SARS-CoV-2 infection triggers a complex immune response, in which CD8+ T cells play an essential role in controlling disease severity. However, the mechanisms by which CD8+ T cells regulate COVID-19 remain poorly investigated. In this study, single-cell gene expression profiles of three CD8+ T cell subtypes (effector, memory, and naive T cells) were downloaded. Each subtype covered three states: acute COVID-19, convalescent COVID-19, and unexposed individuals. The profiles of each subtype were analyzed separately with the same procedure. Irrelevant features were first excluded by the Boruta method. The remaining features for each subtype were then ranked by Max-Relevance and Min-Redundancy, Monte Carlo feature selection, and light gradient boosting machine, yielding three ranked feature lists. These lists were fed into the incremental feature selection method to determine the optimal features for each subtype. The genes corresponding to these features may serve as biomarkers of COVID-19 severity. Genes such as ZFP36, DUSP1, TCR, and IL7R were confirmed to play immune regulatory roles in COVID-19 infection and recovery. Functional enrichment analysis revealed that these genes may be associated with immune functions such as response to cAMP, response to virus, T cell receptor complex, T cell activation, and T cell differentiation. This study further derived classification rules representing the gene expression patterns of the three COVID-19 states and constructed several efficient classifiers to distinguish COVID-19 severity. These findings provide new insights into how CD8+ T cells regulate the immune response.
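Incremental feature selection (IFS), as described above, works through a ranked feature list in order. For each k, it evaluates a classifier on the top-k features and keeps the k with the best score. The following is a minimal sketch. It assumes a caller-supplied `evaluate(features)` callback that returns a cross-validated score. The function name, the `step` parameter and the toy scorer are hypothetical, and preferring the smaller subset on ties is an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 1,
) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Score the top-k prefix of a ranked list for k = step, 2*step, ...

    Returns (best_k, best_score, history). If len(ranked_features) is not a
    multiple of step, the final partial prefix is skipped.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    best_k, best_score = 0, float("-inf")
    history = []
    for k in range(step, len(ranked_features) + 1, step):
        score = evaluate(ranked_features[:k])  # caller decides: e.g. a CV score
        history.append((k, score))
        if score > best_score:  # strict '>' keeps the smallest k on ties
            best_k, best_score = k, score
    return best_k, best_score, history


# Toy scorer (hypothetical): rewards informative features, penalizes subset size.
informative = {"f1", "f2", "f4"}


def toy_score(subset):
    return sum(f in informative for f in subset) - 0.1 * len(subset)


best_k, best_score, history = incremental_feature_selection(
    ["f1", "f2", "f3", "f4", "f5"], toy_score
)
print(best_k, round(best_score, 2))  # 4 2.6
```

In practice, `evaluate` would train a classifier such as a random forest on the given features and return a cross-validated score such as MCC or weighted F1.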
Affiliation(s)
- Jian Lu
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
- Mei Meng
  - State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- XianChao Zhou
  - State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Zhenbing Zeng
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

13
Li H, Wang D, Zhou X, Ding S, Guo W, Zhang S, Li Z, Huang T, Cai YD. Characterization of spleen and lymph node cell types via CITE-seq and machine learning methods. Front Mol Neurosci 2022; 15:1033159. [PMID: 36311013 PMCID: PMC9608858 DOI: 10.3389/fnmol.2022.1033159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 09/26/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
The spleen and lymph nodes are important functional organs of the human immune system, and identifying their cell types helps in understanding how the immune system works. However, the cell types of the spleen and lymph nodes are highly diverse. In this study, we therefore employed a series of machine learning algorithms to analyze these cell types computationally, based on single-cell CITE-seq sequencing data. A total of 28,211 cells (14,435 for training and 13,776 for testing) covering 24 cell types were collected. The training dataset was analyzed first by Boruta and then by minimum redundancy maximum relevance (mRMR), yielding an mRMR-ranked feature list. This list was fed into the incremental feature selection (IFS) method with four classification algorithms (deep forest, random forest, K-nearest neighbor, and decision tree). Several essential features were discovered, and deep forest with its optimal features achieved the best performance. A group of proteins (CD4, TCRb, CD103, CD43, and CD23) and genes (Nkg7 and Thy1) contributing to the classification of spleen and lymph node cell types were analyzed. The classification rules yielded by the decision tree were also provided and analyzed. These findings may help deepen our understanding of cell type diversity.
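mRMR ranks features greedily. At each step it picks the feature whose relevance to the class label (mutual information) minus its average redundancy with the already selected features is largest. The following is a minimal sketch for discrete (for example, binned) features using this difference form. `mrmr_rank` and `mutual_information` are hypothetical helpers. Published mRMR implementations also differ in how they discretize continuous CITE-seq values, and some use a quotient instead of a difference.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Greedy mRMR ranking (difference form) of discrete features.

    features: dict mapping feature name -> list of discrete values.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], set(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(
            mutual_information(features[f], features[s]) for s in selected
        ) / len(selected)
        return relevance[f] - redundancy

    while remaining:
        best = max(sorted(remaining), key=score)  # sorting makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: x1 and x2 each carry half of the label; x1_copy duplicates x1.
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
x2 = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [2 * a + b for a, b in zip(x1, x2)]
print(mrmr_rank({"x1": x1, "x1_copy": list(x1), "x2": x2}, labels))
# ['x1', 'x2', 'x1_copy']
```

The duplicate `x1_copy` is exactly as relevant as `x1`, but it drops to last place because it is fully redundant with `x1`. This is the behavior that separates mRMR from a pure relevance ranking.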
Affiliation(s)
- Hao Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Deling Wang
  - State Key Laboratory of Oncology in South China, Department of Radiology, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Xianchao Zhou
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Institutes for Biological Sciences (SIBS), Shanghai Jiao Tong University School of Medicine (SJTUSM), Chinese Academy of Sciences (CAS), Shanghai, China
- Shiqi Zhang
  - Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Zhandong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

*Correspondence: Tao Huang; Yu-Dong Cai

14
Lu J, Li J, Ren J, Ding S, Zeng Z, Huang T, Cai YD. Functional and embedding feature analysis for pan-cancer classification. Front Oncol 2022; 12:979336. [PMID: 36248961 PMCID: PMC9559388 DOI: 10.3389/fonc.2022.979336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
As the number of people suffering from cancer increases, the disease has become a major health problem worldwide. Exploring the biological functions and signaling pathways of carcinogenesis is essential for cancer detection and research. In this study, a mutation dataset covering eleven cancer types was first obtained from the web-based resource cBioPortal for Cancer Genomics. A total of 21,049 features were then extracted from three aspects: relationships to GO terms and KEGG pathways (enrichment features), representations of mutated genes learned by word2vec (text features), and the protein-protein interaction network analyzed by node2vec (network features). Irrelevant features were excluded with the Boruta feature filtering method. The retained features were ranked by four feature selection methods (least absolute shrinkage and selection operator, minimum redundancy maximum relevance, Monte Carlo feature selection, and light gradient boosting machine) to generate four ranked feature lists. Incremental feature selection was applied to these lists to determine the optimal number of features, build the optimal classifiers, and derive interpretable classification rules. The results of the four feature ranking methods were integrated to identify key functional pathways, such as olfactory transduction (hsa04740) and colorectal cancer (hsa05210). The roles of these pathways in cancer were discussed with reference to the literature. Overall, this machine learning-based study revealed altered biological functions in cancers and provides a reference for studying the mechanisms of different cancers.
Affiliation(s)
- Jian Lu
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
- JiaRui Li
  - Advanced Research Computing, University of British Columbia, Vancouver, BC, Canada
- Jingxin Ren
  - School of Life Sciences, Shanghai University, Shanghai, China
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Zhenbing Zeng
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

15
Jian F, Huang F, Zhang YH, Huang T, Cai YD. Identifying anal and cervical tumorigenesis-associated methylation signaling with machine learning methods. Front Oncol 2022; 12:998032. [PMID: 36249027 PMCID: PMC9557006 DOI: 10.3389/fonc.2022.998032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Cervical and anal carcinomas are neoplastic diseases that progress through several intraepithelial neoplasia stages, and the mechanisms underlying their initiation and progression have not been fully revealed. DNA methylation has been shown to be aberrantly regulated during tumorigenesis in anal and cervical carcinoma. This highlights the potential of DNA methylation signals as clinical biomarkers for distinguishing cancer stages. In this research, several machine learning methods were used to analyze methylation profiles of anal and cervical carcinoma samples, which were divided into three classes representing different stages of tumor progression. Advanced feature selection methods, including Boruta, LASSO, LightGBM, and MCFS, were used to select methylation features highly correlated with cancer progression. Some of the selected methylation probes, including cg01550828 and its corresponding gene RNF168, have been reported to be associated with human papillomavirus-related anal cancer. Among the cervical carcinoma biomarkers, cg27012396 and its functional gene HDAC4 were confirmed to regulate glycolysis and the survival of hypoxic tumor cells in cervical carcinoma. Furthermore, we developed effective classifiers for identifying the tumor stages and derived classification rules that reflect the quantitative impact of methylation on tumorigenesis. Overall, the current study used advanced machine learning methods to identify methylation signals associated with the development of cervical and anal carcinoma at both qualitative and quantitative levels.
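LASSO, one of the selectors above, fits a linear model with an L1 penalty by minimizing (1/2n)‖y − Xw‖² + α‖w‖₁. The penalty drives the weights of uninformative features exactly to zero. Ranking features by the magnitude of their weights is one common way to turn the fit into a feature ranking. The following is a minimal coordinate-descent sketch in plain Python. It assumes numeric, centered features and a numeric target, and it fits no intercept. The function names and toy data are hypothetical, and the study's exact LASSO configuration is not described here.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X is a list of n rows with p columns, assumed centered (no intercept is fitted).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # accumulates x_j . (partial residual without feature j)
            z = 0.0    # accumulates x_j . x_j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho, z = rho / n, z / n  # both averaged over the n samples
            w[j] = soft_threshold(rho, alpha) / z if z else 0.0
    return w


# Toy data: y depends only on the first (centered) feature.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
w = lasso_coordinate_descent(X, y, alpha=0.1)
print([round(c, 3) for c in w])  # [1.9, 0.0]
```

The true weight of 2 is shrunk by α to 1.9, and the irrelevant second feature is set exactly to zero. That exact-zero behavior is what makes LASSO usable as a feature selector.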
Affiliation(s)
- Fangfang Jian
  - Department of Obstetrics & Gynecology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

*Correspondence: Tao Huang; Yu-Dong Cai

16
Liu Z, Meng M, Ding S, Zhou X, Feng K, Huang T, Cai YD. Identification of methylation signatures and rules for predicting the severity of SARS-CoV-2 infection with machine learning methods. Front Microbiol 2022; 13:1007295. [PMID: 36212830 PMCID: PMC9537378 DOI: 10.3389/fmicb.2022.1007295] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/30/2022] [Accepted: 09/01/2022] [Indexed: 11/17/2022] [Open Access]
Abstract
Patients infected with SARS-CoV-2 show different clinical manifestations and require different treatments depending on disease severity. Patients with mild or moderate disease usually recover with conventional medical treatment, whereas severe patients require prompt specialized care. Stratifying infected patients for targeted treatment is therefore valuable. In this study, a computational workflow was designed to identify key blood methylation features and rules that distinguish the severity of SARS-CoV-2 infection. First, the methylation features in the profiles were ranked with the Monte Carlo feature selection method, generating a ranked feature list. Next, this list was fed into the incremental feature selection method to determine the optimal features for different classification algorithms and to build the optimal classifiers. The selected key features were subjected to functional enrichment analysis to reveal their biological functions. Furthermore, a set of rules was derived with a white-box algorithm, the decision tree, to uncover the methylation patterns associated with different severities of SARS-CoV-2 infection. Some genes corresponding to essential methylation sites (PARP9, MX1, and IRF7), as well as some of the rules, were supported by the published literature. Overall, this study helps reveal potential expression features and provides a reference for patient stratification. This could help physicians prioritize and allocate medical resources for COVID-19 patients based on their predicted risk of severe clinical outcomes.
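A decision tree is called a white-box algorithm because every root-to-leaf path can be read as an IF-THEN rule. This is the usual way rules like those described above are derived. The following is a minimal sketch of that conversion. It assumes a hypothetical nested-dict tree format with `feature`, `threshold`, `left`, `right` and `label` keys. The probe names, thresholds and labels in the example are invented placeholders, not results from the study.

```python
def extract_rules(node, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path of a binary tree."""
    if "label" in node:  # leaf: emit the conditions accumulated along the path
        yield list(conditions), node["label"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    yield from extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))


def format_rule(conditions, label):
    body = " AND ".join(conditions) or "TRUE"  # a lone leaf has no conditions
    return f"IF {body} THEN {label}"


# Hypothetical tree over two made-up methylation probes (not results from the study).
tree = {
    "feature": "probe_A", "threshold": 0.42,
    "left": {"label": "mild"},
    "right": {
        "feature": "probe_B", "threshold": 0.7,
        "left": {"label": "moderate"},
        "right": {"label": "severe"},
    },
}
for conditions, label in extract_rules(tree):
    print(format_rule(conditions, label))
# IF probe_A <= 0.42 THEN mild
# IF probe_A > 0.42 AND probe_B <= 0.7 THEN moderate
# IF probe_A > 0.42 AND probe_B > 0.7 THEN severe
```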
Affiliation(s)
- Zhiyang Liu
  - School of Life Sciences, Changchun Sci-Tech University, Changchun, China
- Mei Meng
  - State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- ShiJian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- XiaoChao Zhou
  - State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

*Correspondence: Tao Huang; Yu-Dong Cai

17
Yang L, Zhang YH, Huang F, Li Z, Huang T, Cai YD. Identification of protein–protein interaction associated functions based on gene ontology and KEGG pathway. Front Genet 2022; 13:1011659. [PMID: 36171880 PMCID: PMC9511048 DOI: 10.3389/fgene.2022.1011659] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/04/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Protein–protein interactions (PPIs) are essential for gaining mechanistic insights into the functional organization of the proteome. Resolving PPI functions can help identify novel diagnostic and therapeutic targets, thus facilitating the development of new medications. However, PPI functions have traditionally been resolved with experimental methods, which are both expensive and time-consuming. These include co-immunoprecipitation, pull-down assays, cross-linking, label transfer, and far-Western blot analysis. In this study, we constructed an integrated feature selection scheme for the large-scale selection of PPI-relevant functions. The scheme uses the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotations of the interacting proteins. First, we encoded the proteins in each PPI with their GO terms and KEGG pathways. Then, the encoded protein features were converted into features of both positive and negative PPIs. Boruta was subsequently used for initial feature filtering, retaining 5684 features. Three feature ranking algorithms, namely least absolute shrinkage and selection operator, light gradient boosting machine, and max-relevance and min-redundancy, were applied to evaluate feature importance. Finally, the top-ranked features derived from multiple datasets were comprehensively evaluated. The intersection of the results from the three ranking algorithms was taken to identify features highly correlated with PPIs. The identified functional terms include cytokine–cytokine receptor interaction (hsa04060), intrinsic component of membrane (GO:0031224), and protein binding (GO:0005515). Our integrated computational approach offers a new perspective on the large-scale mining of biological functions linked to PPIs.
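The last step above keeps only the features that all three ranking algorithms place near the top of their lists. The following is a minimal sketch of that intersection. It assumes each ranking is a list ordered from most to least important and that a fixed cut-off `top_n` defines "top-ranked". The abstract does not state how the cut-off was chosen. The helper name and the term identifiers `t1` to `t6` are hypothetical.

```python
def top_ranked_intersection(rankings, top_n):
    """Features in the top_n of every ranked list, ordered as in the first list."""
    if not rankings:
        return []
    common = set.intersection(*(set(r[:top_n]) for r in rankings))
    return [f for f in rankings[0][:top_n] if f in common]


# Hypothetical rankings from three methods (most important first).
lasso = ["t1", "t2", "t3", "t4"]
lightgbm = ["t2", "t1", "t5", "t3"]
mrmr = ["t3", "t2", "t1", "t6"]
print(top_ranked_intersection([lasso, lightgbm, mrmr], top_n=3))  # ['t1', 't2']
```

Raising `top_n` loosens the consensus: with `top_n=4`, `t3` also survives. The choice of cut-off therefore directly controls how many terms are reported.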
Affiliation(s)
- Lili Yang
  - Measurement Biotechnique Research Center, School of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- ZhanDong Li
  - Measurement Biotechnique Research Center, School of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

*Correspondence: Tao Huang; Yu-Dong Cai

18
Lu S, Wang H, Zhang J. Identification of uveitis-associated functions based on the feature selection analysis of gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment scores. Front Mol Neurosci 2022; 15:1007352. [PMID: 36157069 PMCID: PMC9493498 DOI: 10.3389/fnmol.2022.1007352] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/30/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Uveitis is a common type of eye inflammation that affects the middle layer of the eye (the uvea) and can lead to blindness in young and middle-aged people. A comprehensive study of disease susceptibility and of the mechanisms underlying uveitis initiation and progression is therefore urgently needed to develop effective treatments. In the present study, 108 uveitis-related genes are collected through literature mining. A further 17,560 human genes are collected from the Ensembl database and treated as non-uveitis genes. Both gene groups are then encoded by gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment scores, computed from each gene and its neighbors in STRING. This results in 20,681 GO term features and 297 KEGG pathway features. Subsequently, we identify functions and biological processes that distinguish uveitis-related genes from other human genes. For this we use an integrated feature selection method that combines a feature filtering method (Boruta) with four feature importance assessment methods (LASSO, LightGBM, MCFS, and mRMR). Several essential GO terms and KEGG pathways related to uveitis are identified, such as GO:0001841 (neural tube formation), hsa04612 (antigen processing and presentation in humans), and GO:0043379 (memory T cell differentiation). The plausibility of the associations between the mined functional features and uveitis is verified against the literature. Overall, the current study uses several advanced machine learning methods to uncover uveitis-specific functions and to provide a theoretical foundation for the clinical treatment of uveitis.
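One common way to compute an enrichment score for a gene is to take the gene together with its network neighbors (here, its STRING neighbors). The score then tests whether that set overlaps a GO term or KEGG pathway more than chance would predict, expressed as -log10 of a hypergeometric p-value. The abstract does not give the exact formula, so the sketch below is an assumption about the general technique, not the authors' definition. The function names and toy gene identifiers are hypothetical.

```python
import math


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(draws, successes)
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(population, draws)


def enrichment_score(gene_set, term_genes, background_size):
    """-log10 hypergeometric p-value for the overlap of gene_set with term_genes.

    Both sets are assumed to be subsets of a background of background_size genes.
    """
    overlap = len(gene_set & term_genes)
    p_value = hypergeom_sf(overlap, background_size, len(term_genes), len(gene_set))
    return -math.log10(p_value)


# Toy example: a gene plus two neighbors, all three annotated to a 3-gene term,
# in a background of 10 genes, so p = 1 / C(10, 3) = 1/120.
neighborhood = {"g1", "g2", "g3"}
term = {"g1", "g2", "g3"}
print(round(enrichment_score(neighborhood, term, 10), 3))  # 2.079
```

Computing one such score per GO term or KEGG pathway for every gene yields a numeric feature vector per gene. This is the general shape of the 20,681 + 297 features described above.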
Affiliation(s)
- Shiheng Lu
  - Department of Ophthalmology, Shanghai Eye Disease Prevention and Treatment Center, Shanghai Eye Hospital, Shanghai, China
  - Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China
  - Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
  - National Clinical Research Center for Eye Diseases, Shanghai, China
  - Shanghai Engineering Research Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
- Hui Wang
  - Department of Orthopedics, Shanghai Yangpu Hospital of Traditional Chinese Medicine, Shanghai, China
- Jian Zhang
  - Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China
  - Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
  - National Clinical Research Center for Eye Diseases, Shanghai, China
  - Shanghai Engineering Research Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China

*Correspondence: Shiheng Lu; Jian Zhang

19
Identification of Human Cell Cycle Phase Markers Based on Single-Cell RNA-Seq Data by Using Machine Learning Methods. BIOMED RESEARCH INTERNATIONAL 2022; 2022:2516653. [PMID: 36004205 PMCID: PMC9393965 DOI: 10.1155/2022/2516653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/29/2022] [Revised: 07/25/2022] [Accepted: 07/29/2022] [Indexed: 12/17/2022]
Abstract
The cell cycle is a series of ordered, highly regulated processes through which a cell grows, duplicates its genome, and eventually divides into two daughter cells. Based on the complex changes in cell structure and biosynthesis, the cell cycle is divided into four phases: gap 1 (G1), DNA synthesis (S), gap 2 (G2), and mitosis (M). Determining which phase a cell is in is critical for research on cancer development and for drugs that target the cell cycle. However, current detection methods have two problems: (1) they are complicated and time-consuming to perform, and (2) they cannot determine cell cycle phases on a large scale. Rapid developments in single-cell technology have made it possible to dissect cells on a large scale with unprecedented resolution. In the present research, we construct efficient classifiers and identify essential gene biomarkers from single-cell RNA sequencing data. To do so, we use Boruta, three feature ranking algorithms (mRMR, MCFS, and SHAP values from LightGBM), and four advanced classification algorithms. We also mine a series of classification rules that distinguish the cell cycle phases. Collectively, we provide a novel method for determining cell cycle phase and identify new potential cell cycle-related genes, thereby contributing to the understanding of the processes that regulate the cell cycle.
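Boruta, the filter applied here and in several other studies listed above, compares each real feature with "shadow" copies whose values are randomly shuffled. A feature that beats the best shadow significantly more often than chance (judged by a binomial test) is confirmed, and one that rarely does is rejected. The following is a simplified sketch of that idea in plain Python. Real Boruta uses random-forest importances and iteratively drops features once they are decided. To stay self-contained, this sketch swaps in absolute Pearson correlation as the importance measure. All names and the toy data are hypothetical.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_like(features, target, n_iter=20, alpha=0.05, seed=0):
    """Label each feature 'confirmed', 'rejected' or 'tentative' (simplified Boruta)."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        shadow_max = 0.0  # best importance among shuffled copies of all features
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_corr(shadow, target))
        for name, values in features.items():
            if abs_corr(values, target) > shadow_max:
                hits[name] += 1
    decisions = {}
    for name, h in hits.items():
        if binom_sf(h, n_iter) < alpha:  # beats shadows more often than chance
            decisions[name] = "confirmed"
        elif binom_sf(n_iter - h, n_iter) < alpha:  # i.e. P(X <= h) < alpha
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


target = list(range(20))
features = {"signal": [2 * t + 1 for t in target], "constant": [5] * 20}
print(boruta_like(features, target))  # {'signal': 'confirmed', 'constant': 'rejected'}
```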
20
Song J, Huang F, Chen L, Feng K, Jian F, Huang T, Cai YD. Identification of methylation signatures associated with CAR T cell in B-cell acute lymphoblastic leukemia and non-hodgkin’s lymphoma. Front Oncol 2022; 12:976262. [PMID: 36033519 PMCID: PMC9402909 DOI: 10.3389/fonc.2022.976262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
CD19-targeted CAR T cell immunotherapy is exceptionally effective in treating B-cell malignancies. B-cell acute lymphoblastic leukemia and non-Hodgkin’s lymphoma are two common B-cell malignancies with high recurrence rates that are difficult to cure. Although CAR T cell immunotherapy overcomes the limitations of conventional treatments for these malignancies, treatment failure and tumor recurrence remain common. In this study, we searched for important methylation signatures that differentiate CAR-transduced from untransduced T cells in patients with acute lymphoblastic leukemia and non-Hodgkin’s lymphoma. First, we used three feature ranking methods to rank all methylation features by importance: Monte Carlo feature selection, light gradient boosting machine, and least absolute shrinkage and selection operator. Then, the incremental feature selection method was adopted to construct efficient classifiers and identify the optimal feature subsets. Several important methylated genes were identified, namely SERPINB6, ANK1, PDCD5, DAPK2, and DNAJB6. Furthermore, classification rules for distinguishing the classes were established; these rules describe the role of methylation features in the classification. Overall, we applied advanced machine learning approaches to high-throughput methylation data to investigate the mechanisms of CAR T cells and to establish a theoretical foundation for modifying them.
Affiliation(s)
- Jiwei Song
  - College of Life Science, Changchun Sci-Tech University, Shuangyang, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Fangfang Jian
  - Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

*Correspondence: Tao Huang; Yu-Dong Cai

21
Li H, Huang F, Liao H, Li Z, Feng K, Huang T, Cai YD. Identification of COVID-19-Specific Immune Markers Using a Machine Learning Method. Front Mol Biosci 2022; 9:952626. [PMID: 35928229 PMCID: PMC9344575 DOI: 10.3389/fmolb.2022.952626] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/25/2022] [Accepted: 06/21/2022] [Indexed: 01/08/2023] [Open Access]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is closely tied to the immune system. Human defense against COVID-19 comprises two stages: the first is immune defense, and the second is extensive inflammation. The immune defense stage is further divided into innate and adaptive immunity. Both stages involve various immune cells, including CD4+ T cells, CD8+ T cells, monocytes, dendritic cells, B cells, and natural killer cells. Together these cells make up a complex and distinctive immune response to COVID-19 that sets it apart from other respiratory infectious diseases. In the present study, we applied the Boruta and mRMR feature selection methods to single-cell gene expression profiles of six immune cell types. The aim was to identify cell markers that differentiate COVID-19 from common inflammatory responses, non-COVID-19 severe respiratory diseases, and healthy populations. Some features, such as IFI44L in B cells, S100A8 in monocytes, and NCR2 in natural killer cells, are involved in the innate immune response to COVID-19. Other features, such as ZFP36L2 in CD4+ T cells, can regulate the inflammatory process of COVID-19. Subsequently, the incremental feature selection (IFS) method was used with two classification algorithms to determine the best feature subsets and classifiers for each of the six immune cell types. Furthermore, we established quantitative rules to distinguish disease status. These results can provide theoretical support for more in-depth investigation of COVID-19 pathogenesis and intervention strategies.
Affiliation(s)
- Hao Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Feiming Huang
- School of Life Sciences, Shanghai University, Shanghai, China
- Huiping Liao
- Ophthalmology and Optometry Medical School, Shandong University of Traditional Chinese Medicine, Jinan, China
- Zhandong Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai
|
22
|
Li Z, Huang F, Chen L, Huang T, Cai YD. Identifying In Vitro Cultured Human Hepatocytes Markers with Machine Learning Methods Based on Single-Cell RNA-Seq Data. Front Bioeng Biotechnol 2022; 10:916309. [PMID: 35706505 PMCID: PMC9189284 DOI: 10.3389/fbioe.2022.916309] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/09/2022] [Accepted: 05/11/2022] [Indexed: 01/12/2023]
Abstract
Cell transplantation is an effective method for compensating for the loss of liver function and improving patient survival. However, because hepatocytes cultivated in vitro have diverse developmental processes and physiological features, obtaining hepatocytes that function properly in vivo is difficult. In the present study, we present an advanced computational analysis of single-cell transcriptional profiles to resolve the heterogeneity of the in vitro hepatocyte differentiation process and to mine biomarkers at different periods of differentiation. We obtained a compact set of effective classification features with the Boruta method and ranked them using the Max-Relevance and Min-Redundancy (mRMR) method. Several key genes were identified during the in vitro culture of hepatocytes, including CD147, which not only regulates terminally differentiated cells in the liver but also affects cell differentiation. PPIA, which encodes a CD147 ligand, also appeared in the identified gene list, and the two proteins together mediate multiple biological pathways. Other genes, such as TMSB10, TMEM176B, and CD63, which are involved in the maturation and differentiation of hepatocytes and assist different hepatic cell types in performing their roles, were also identified. Then, using three classification algorithms (random forest, k-nearest neighbor, and decision tree) and the incremental feature selection method, several classifiers were trained and evaluated to obtain optimal classifiers and optimal feature subsets. The best classifier, a random forest with a Matthews correlation coefficient (MCC) of 0.940, was constructed to distinguish different hepatic cell types. Finally, classification rules were created to describe hepatic cell types quantitatively.
In summary, this study provided potential targets for liver disease treatment strategies based on cell transplantation by elucidating the process and mechanism of hepatocyte development at both qualitative and quantitative levels.
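The 0.940 figure above is a Matthews correlation coefficient over several hepatic cell types, so it relies on a multiclass generalization of MCC; the abstract does not say which variant was used. The Python sketch below computes the common count-based multiclass form (the one scikit-learn's `matthews_corrcoef` uses for more than two classes), which reduces to the usual binary MCC for two classes. The code is not taken from the study, and the cell-type labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for any number of classes.

    Count form: c = correct predictions, s = number of samples,
    t[k] = true occurrences of class k, p[k] = predictions of class k.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)
    c = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    numerator = c * s - sum(true_counts[k] * pred_counts[k] for k in classes)
    denom_pred = s * s - sum(pred_counts[k] ** 2 for k in classes)
    denom_true = s * s - sum(true_counts[k] ** 2 for k in classes)
    if denom_pred == 0 or denom_true == 0:
        return 0.0  # a single class on either side leaves MCC undefined; report 0
    return numerator / sqrt(denom_pred * denom_true)


# Hypothetical labels for six cells: one cholangiocyte is misclassified as stromal.
y_true = ["hepatocyte", "hepatocyte", "cholangiocyte", "cholangiocyte", "stromal", "stromal"]
y_pred = ["hepatocyte", "hepatocyte", "cholangiocyte", "stromal", "stromal", "stromal"]
print(round(multiclass_mcc(y_true, y_pred), 3))  # 0.783
```

Unlike plain accuracy (5/6 here), MCC also penalizes the skew in predicted class counts, which is why it is a common choice for unbalanced cell-type classification.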
Affiliation(s)
- ZhanDong Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- FeiMing Huang
- School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
|
23
|
Huang F, Chen L, Guo W, Zhou X, Feng K, Huang T, Cai Y. Identifying COVID-19 Severity-Related SARS-CoV-2 Mutation Using a Machine Learning Method. Life (Basel) 2022; 12:806. [PMID: 35743837 PMCID: PMC9225528 DOI: 10.3390/life12060806] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/12/2022] [Revised: 05/22/2022] [Accepted: 05/25/2022] [Indexed: 12/22/2022]
Abstract
SARS-CoV-2 shows great evolutionary capacity through a high frequency of genomic variation during transmission. Evolved SARS-CoV-2 strains often show resistance to previous vaccines and can worsen patients' clinical status. Mutations in the SARS-CoV-2 genome affect both structural and nonstructural proteins, and mutations in some of these proteins, such as the spike protein, have been shown to be directly associated with the clinical status of patients with severe COVID-19 pneumonia. In this study, we collected genome-wide mutation information for virulent strains together with the severity of COVID-19 pneumonia in the corresponding patients, as graded by their clinical status. Important protein mutations and untranslated region mutations were extracted using machine learning methods. First, using Boruta and four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, max-relevance and min-redundancy, and Monte Carlo feature selection), we screened out mutations that were highly correlated with the clinical status of the patients and sorted them into four feature lists. Some mutations, such as D614G and V1176F, were shown to be associated with viral infectivity. Moreover, previously unreported mutations, such as A320V of nsp14 and I164ILV of nsp14, were also identified, suggesting potential roles for them. We then applied the incremental feature selection method to each feature list to construct efficient classifiers that can be used directly to distinguish the clinical status of COVID-19 patients. In addition, four sets of quantitative rules were established, which give a more intuitive picture of the role of each mutation in differentiating the clinical status of COVID-19 patients. The identified key mutations linked to virologic properties will help to better understand the mechanisms of infection and will aid the development of antiviral treatments.
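The incremental feature selection step described above evaluates nested subsets of a ranked feature list (top 1, top 2, and so on) and keeps the subset size with the best score. The Python sketch below illustrates that loop. To stay self-contained it swaps in a 1-nearest-neighbour classifier scored by leave-one-out accuracy; the abstract does not name the classifiers, validation scheme, or metric used in the study, and the function names and toy data here are hypothetical.

```python
from math import dist


def loo_1nn_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen feature columns."""
    correct = 0
    for i, row in enumerate(rows):
        query = [row[j] for j in feature_idx]
        best_label, best_d = None, float("inf")
        for k, other in enumerate(rows):
            if k == i:
                continue  # hold out the sample being classified
            d = dist(query, [other[j] for j in feature_idx])
            if d < best_d:
                best_d, best_label = d, labels[k]
        correct += best_label == labels[i]
    return correct / len(rows)


def incremental_feature_selection(rows, labels, ranked_idx):
    """Score the top-1, top-2, ..., top-n features of a ranking; return (best_k, best_score, curve)."""
    curve = [loo_1nn_accuracy(rows, labels, ranked_idx[:k]) for k in range(1, len(ranked_idx) + 1)]
    best = max(range(len(curve)), key=lambda i: curve[i])  # first maximum, i.e. the smallest subset
    return best + 1, curve[best], curve


# Hypothetical data: column 0 separates the classes, column 2 weakly, column 1 is large-scale noise.
rows = [
    [0.0, 5.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.2, 5.0, 0.0],
    [1.0, 0.0, 0.1],
    [1.1, 5.0, 0.1],
    [1.2, 0.0, 0.1],
]
labels = ["A", "A", "A", "B", "B", "B"]
best_k, best_score, curve = incremental_feature_selection(rows, labels, ranked_idx=[0, 2, 1])
print(best_k, best_score)  # 1 1.0
```

On this toy data the score is 1.0 for the top one and top two features and drops to 4/6 once the noisy column is added, so the loop keeps the single top-ranked feature. Taking the first maximum means ties favour the smaller, cheaper subset.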
Affiliation(s)
- Feiming Huang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200025, China
- Xianchao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai 200025, China
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510060, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yudong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
|