1
Ren J, Gao Q, Zhou X, Chen L, Guo W, Feng K, Hu J, Huang T, Cai YD. Identification of gene and protein signatures associated with long-term effects of COVID-19 on the immune system after patient recovery by analyzing single-cell multi-omics data using a machine learning approach. Vaccine 2024; 42:126253. [PMID: 39182316 DOI: 10.1016/j.vaccine.2024.126253] [Received: 01/08/2024] [Revised: 08/17/2024] [Accepted: 08/17/2024] [Indexed: 08/27/2024]
Abstract
Viral infections significantly impact the immune system, and this impact persists until recovery. However, the influence of severe acute respiratory syndrome coronavirus 2 infection on the homeostatic immune status and secondary immune response in recovered patients remains unclear. To investigate these persistent alterations, we employed five feature-ranking algorithms (LASSO, MCFS, RF, CatBoost, and XGBoost), incremental feature selection, the synthetic minority oversampling technique, and two classification algorithms (decision tree and k-nearest neighbors) to analyze multi-omics data (surface proteins and transcriptome) from coronavirus disease 2019 (COVID-19) recovered patients and healthy controls post-influenza vaccination. The single-cell multi-omics dataset was divided into five subsets corresponding to five immune cell subtypes: B cells, CD4+ T cells, CD8+ T cells, monocytes, and natural killer cells. Each cell was represented by 28,402 scRNA-seq (RNA) features, 3 Hash Tag Oligo (HTO) features, 138 Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE) features, and 23,569 Single Cell Transform (SCT) features. Several multi-omics markers were identified, and effective classifiers were constructed. Our findings indicate a distinct immune status in COVID-19 recovered patients, characterized by low expression of a ribosomal protein (RPS26) and high expression of immune cell surface proteins (CD33, CD48). Notably, TMEM176B, a membrane protein, was highly expressed in monocytes of COVID-19 convalescent patients. These observations aid in discerning molecular differences among immune cell subtypes and contribute to understanding the prolonged effects of COVID-19 on the immune system, which is valuable for treating infectious diseases like COVID-19.
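The synthetic minority oversampling technique named in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `smote`, the neighbour count `k`, the seed, and the toy two-feature "cells" are all assumptions chosen for illustration. The idea shown is the usual one: a synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class cells described by two features each.
minority_cells = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_cells = smote(minority_cells, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class.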
Affiliation(s)
- JingXin Ren
  - School of Life Sciences, Shanghai University, Shanghai 200444, China.
- Qian Gao
  - Department of Pharmacy, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China.
- XianChao Zhou
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
- Jerry Hu
  - Department of Natural Sciences and Mathematics, College of Natural and Applied Science, University of Houston - Victoria, Victoria, TX 77901, USA.
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China.
2
Zhang L, Cai R, Wang C, Liu J, Kuang Z, Wang H. Prediction of Multiple Degenerative Diseases Based on DNA Methylation in a Co-Physiology Mechanisms Perspective. Int J Mol Sci 2024; 25:9514. [PMID: 39273460 PMCID: PMC11395594 DOI: 10.3390/ijms25179514] [Received: 07/18/2024] [Revised: 08/26/2024] [Accepted: 08/27/2024] [Indexed: 09/15/2024] Open
Abstract
Degenerative diseases oftentimes occur within the continuous process of aging, and the corresponding clinical manifestations may be neurodegeneration, neoplastic diseases, or various human complex diseases. DNA methylation provides the opportunity to explore aging and degenerative diseases as epigenetic traits. It has already been applied to age prediction and disease diagnosis. It has been shown that various degenerative diseases share co-physiology mechanisms with each other, clues of which may be gained from studying the aging process. Here, we endeavor to predict the risk of degenerative diseases in an aging-relevant comorbid mechanism perspective. Firstly, an epigenetic clock method was implemented based on a multi-scale convolutional neural network, and a Shapley feature attribution analysis was applied to discover the aging-related CpG sites. Then, these sites were further screened to a smaller subset composed of 196 sites by using biomics analysis according to their biological functions and mechanisms. Finally, we constructed a multilayer perceptron (MLP)-based degenerative disease risk prediction model, Mlp-DDR, which was well trained and tested to accurately classify nine degenerative diseases. Recent studies also suggest that DNA methylation plays a significant role in conditions like osteoporosis and osteoarthritis, broadening the potential applications of our model. This approach significantly advances the ability to understand degenerative diseases and represents a substantial shift from traditional diagnostic methods. Despite the promising results, limitations regarding model complexity and dataset diversity suggest directions for future research, including the development of tissue-specific epigenetic clocks and the inclusion of a wider range of diseases.
Affiliation(s)
- Li Zhang
  - College of Computer Science and Engineering, Changchun University of Technology, Changchun 130051, China
- Ruirui Cai
  - School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
- Chencai Wang
  - College of Computer Science and Engineering, Changchun University of Technology, Changchun 130051, China
- Jialong Liu
  - College of Computer Science and Engineering, Changchun University of Technology, Changchun 130051, China
- Zhejun Kuang
  - School of Cyber Security, School of Computer Science and Technology, Changchun University, Changchun 130022, China
- Han Wang
  - School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
3
Park MY, Lee S, Kim HH, Jeong SH, Abusaliya A, Bhosale PB, Seong JK, Park KI, Heo JD, Ahn M, Kim HW, Kim GS. Correlation with Apoptosis Process through RNA-Seq Data Analysis of Hep3B Hepatocellular Carcinoma Cells Treated with Glehnia littoralis Extract (GLE). Int J Mol Sci 2024; 25:9462. [PMID: 39273406 PMCID: PMC11394729 DOI: 10.3390/ijms25179462] [Received: 07/02/2024] [Revised: 08/27/2024] [Accepted: 08/29/2024] [Indexed: 09/15/2024] Open
Abstract
Glehnia littoralis is a perennial herb found in coastal sand dunes throughout East Asia. This herb has been reported to have hepatoprotective, immunomodulatory, antioxidant, antibacterial, antifungal, anti-inflammatory, and anticancer activities, and it may be effective against hepatocellular carcinoma (HCC). However, this effect has not yet been established through gene-level RNA-seq analysis. We therefore sought to identify target genes in the cell death process by using RNA-seq to analyze the transcriptome of Hep3B cells, an HCC cell line, treated with Glehnia littoralis extract (GLE). Hep3B cells were treated with GLE, and an MTT assay was performed. The cells were then treated with GLE at a set concentration of 300 μg/mL and incubated for 24 h, followed by RNA isolation and sequencing, and plots were generated from the resulting data. The MTT assay showed cell death in GLE-treated Hep3B cells, with an IC50 of about 300 μg/mL. Plots of the RNA-seq data from Hep3B cells treated with 300 μg/mL GLE indicated a tendency toward the apoptotic process. Flow cytometry with annexin V/propidium iodide (PI) staining verified the apoptosis of Hep3B cells treated with GLE. Increases and decreases in the differentially expressed genes (DEGs) involved in the apoptotic process were thereby confirmed. The five most upregulated genes were GADD45B, DDIT3, GADD45G, CHAC1, and PPP1R15A. The five most downregulated genes were SGK1, CX3CL1, ZC3H12A, IER3, and HNF1A. In summary, we investigated the RNA-seq dataset of GLE-treated cells to identify potential targets that may be involved in the apoptotic process in HCC. These targets may aid in the identification and management of HCC.
Affiliation(s)
- Min-Yeong Park
  - Research Institute of Life Science and College of Veterinary Medicine, Gyeongsang National University, Gazwa, Jinju 52828, Republic of Korea
- Sujin Lee
  - Research Institute of Molecular Alchemy, Gyeongsang National University, 501, Jinju-daero, Jinju 52828, Republic of Korea
- Hun-Hwan Kim
  - Research Institute of Life Science and College of Veterinary Medicine, Gyeongsang National University, Gazwa, Jinju 52828, Republic of Korea
- Se-Hyo Jeong
  - Research Institute of Life Science and College of Veterinary Medicine, Gyeongsang National University, Gazwa, Jinju 52828, Republic of Korea
- Abuyaseer Abusaliya
  - Research Institute of Life Science and College of Veterinary Medicine, Gyeongsang National University, Gazwa, Jinju 52828, Republic of Korea
- Pritam Bhangwan Bhosale
  - Research Institute of Life Science and College of Veterinary Medicine, Gyeongsang National University, Gazwa, Jinju 52828, Republic of Korea
- Je-Kyung Seong
  - Laboratory of Developmental Biology and Genomics, BK21 PLUS Program for Creative Veterinary Science Research, Research Institute for Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul 08826, Republic of Korea
- Kwang-Il Park
  - Research Institute of Life Science and College of Veterinary Medicine, Gyeongsang National University, Gazwa, Jinju 52828, Republic of Korea
- Jeong-Doo Heo
  - Biological Resources Research Group, Gyeongnam Department of Environment Toxicology and Chemistry, Korea Institute of Toxicology, 17 Jegok-gil, Jinju 52834, Republic of Korea
- Meejung Ahn
  - Department of Animal Science, College of Life Science, Sangji University, Wonju 26339, Republic of Korea
- Hyun-Wook Kim
  - Division of Animal Bioscience and Integrated Biotechnology, Jinju 52725, Republic of Korea
- Gon-Sup Kim
  - Research Institute of Life Science and College of Veterinary Medicine, Gyeongsang National University, Gazwa, Jinju 52828, Republic of Korea
4
Ma Q, Zhang YH, Guo W, Feng K, Huang T, Cai YD. Machine Learning in Identifying Marker Genes for Congenital Heart Diseases of Different Cardiac Cell Types. Life (Basel) 2024; 14:1032. [PMID: 39202774 PMCID: PMC11355424 DOI: 10.3390/life14081032] [Received: 05/13/2024] [Revised: 07/31/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024] Open
Abstract
Congenital heart disease (CHD) represents a spectrum of inborn heart defects influenced by genetic and environmental factors. This study advances the field by analyzing gene expression profiles in 21,034 cardiac fibroblasts, 73,296 cardiomyocytes, and 35,673 endothelial cells, utilizing single-cell level analysis and machine learning techniques. Six conditions were investigated for each cardiac cell type: dilated cardiomyopathy (DCM), donor hearts (used as healthy controls), hypertrophic cardiomyopathy (HCM), heart failure with hypoplastic left heart syndrome (HF_HLHS), neonatal hypoplastic left heart syndrome (Neo_HLHS), and tetralogy of Fallot (TOF). Each cell sample was represented by 29,266 gene features. These features were first analyzed by six feature-ranking algorithms, resulting in several feature lists. Then, these lists were fed into incremental feature selection, containing two classification algorithms, to extract essential gene features and classification rules and build efficient classifiers. The identified essential genes can be potential CHD markers in different cardiac cell types. For instance, LASSO identified key genes specific to various heart cell types in CHD subtypes. FOXO3 was found to be up-regulated in cardiac fibroblasts for both dilated and hypertrophic cardiomyopathy. In cardiomyocytes, distinct genes such as TMTC1, ART3, ARHGAP24, SHROOM3, and XIST were linked to dilated cardiomyopathy, neonatal hypoplastic left heart syndrome, hypertrophic cardiomyopathy, heart failure with hypoplastic left heart syndrome, and tetralogy of Fallot, respectively. Endothelial cell analysis further revealed COL25A1, NFIB, and KLF7 as significant genes for dilated cardiomyopathy, hypertrophic cardiomyopathy, and tetralogy of Fallot. LightGBM, CatBoost, MCFS, RF, and XGBoost further delineated key genes for specific CHD subtypes, demonstrating the efficacy of machine learning in identifying CHD-specific genes. Additionally, this study developed quantitative rules for representing the gene expression patterns related to CHDs. This research underscores the potential of machine learning in unraveling the molecular complexities of CHD and establishes a foundation for future mechanism-based studies.
Affiliation(s)
- Qinglan Ma
  - School of Life Sciences, Shanghai University, Shanghai 200444, China;
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA;
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China;
- Kaiyan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China;
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China;
5
Ding S, Liao H, Huang F, Chen L, Guo W, Feng K, Huang T, Cai YD. Analyzing domain features of small proteins using a machine-learning method. Proteomics 2024; 24:e2300302. [PMID: 38258387 DOI: 10.1002/pmic.202300302] [Received: 08/06/2023] [Revised: 01/14/2024] [Accepted: 01/15/2024] [Indexed: 01/24/2024]
Abstract
Small proteins (SPs) are a unique group of proteins that play crucial roles in many important biological processes. Exploring the biological function of SPs is necessary. In this study, the InterPro tool and the maximum correlation method were utilized to analyze functional domains of SPs. The purpose was to identify important functional domains that can indicate the essential differences between small and large protein sequences. First, the small and large proteins were represented by their functional domains via a one-hot scheme. Then, the MaxRel method was adopted to evaluate the relationships between each domain and the target variable, indicating small or large protein. The top 36 domain features were selected for further investigation. Among them, 14 were deemed to be highly related to SPs because they were annotated to SPs more frequently than large proteins. We found the involvement of functional domains, such as ubiquitin-conjugating enzyme/RWD-like, nuclear transport factor 2 domain, and alpha subunit of guanine nucleotide-binding protein (G-protein) in regulating the biological function of SPs. The involvement of these domains has been confirmed by other recent studies. Our findings indicate that protein functional domains may regulate small protein-related functions and predict their biological activity.
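The one-hot encoding and relevance ranking described in this abstract can be sketched as follows. This is an illustration only, under the assumption that relevance is scored by mutual information between each binary domain feature and the small/large label (as in the max-relevance half of mRMR); the abstract does not state the exact scoring used. The domain identifiers, proteins, and labels are hypothetical toy data.

```python
from collections import Counter
from math import log2


def one_hot(proteins, domains):
    """Encode each protein as a 0/1 vector over the domain vocabulary."""
    return [[1 if d in p else 0 for d in domains] for p in proteins]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical toy data: domain annotations per protein, 1 = small, 0 = large.
domains = ["DOM_A", "DOM_B", "DOM_C"]
proteins = [{"DOM_A"}, {"DOM_A", "DOM_C"}, {"DOM_B"}, {"DOM_B", "DOM_C"}]
labels = [1, 1, 0, 0]

matrix = one_hot(proteins, domains)
scores = {
    d: mutual_information([row[j] for row in matrix], labels)
    for j, d in enumerate(domains)
}
# Domains ordered from most to least relevant to the small/large label.
ranking = sorted(domains, key=lambda d: scores[d], reverse=True)
```

In this toy set, `DOM_A` and `DOM_B` each determine the label and score 1 bit, while `DOM_C` is independent of the label and scores 0, so it ranks last.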
Affiliation(s)
- ShiJian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
6
Rostami F, Tavakol Hamedani Z, Sadoughi A, Mehrabadi M, Kouhkan F. PDL1 targeting by miR-138-5p amplifies anti-tumor immunity and Jurkat cells survival in non-small cell lung cancer. Sci Rep 2024; 14:13542. [PMID: 38866824 PMCID: PMC11169246 DOI: 10.1038/s41598-024-62064-5] [Received: 08/19/2023] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open
Abstract
Non-small cell lung cancer (NSCLC) accounts for over 80% of lung cancer cases and has a poor prognosis. Over the past decade, immunotherapy based on immune checkpoint inhibitors has emerged as a promising approach for NSCLC treatment. Cancers evade the immune system through the PD-1/PD-L1 axis, which prevents T cells from responding fully and confronting the tumor. Recently, miR-138 was reported as a PD-L1 regulator in NSCLC. However, its inhibitory impact on T-cell exhaustion has not been characterized. The present study aims to impair PD-L1 (B7-H1) expression in adenocarcinoma cell lines using miR-138-5p and to determine how it prevents Jurkat cell exhaustion. To this end, 18 highly significant dysregulated miRNAs, including hsa-miR-138, and a CD274-mRNA network were first detected in NSCLC by bioinformatics analysis. Moreover, our study revealed that a high level of miR-138-5p could cause significant changes in A549/Calu6 cells, such as PD-L1 downregulation and altered proliferation and mortality rates. We also simulated cancer environmental conditions by culturing Jurkat cells and NSCLC cell lines under the influence of stimulatory cytokines to show how miR-138-5p promotes Jurkat cell survival by targeting the PD-L1/PD-1 pathway.
Affiliation(s)
- Fatemeh Rostami
  - Stem Cell Technology Research Center (STRC), Iran University of Medical Science (IUMS), P.O. Box: 15856-36473, Tehran, 15856-36473, Iran
- Azadeh Sadoughi
  - Department of Biology, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Marzieh Mehrabadi
  - Stem Cell Technology Research Center (STRC), Iran University of Medical Science (IUMS), P.O. Box: 15856-36473, Tehran, 15856-36473, Iran
- Fatemeh Kouhkan
  - Stem Cell Technology Research Center (STRC), Iran University of Medical Science (IUMS), P.O. Box: 15856-36473, Tehran, 15856-36473, Iran.
7
Qiu Y, Huang T, Cai YD. Review of predicting protein stability changes upon variations. Proteomics 2024; 24:e2300371. [PMID: 38643379 DOI: 10.1002/pmic.202300371] [Received: 01/30/2024] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/22/2024]
Abstract
Forecasting alterations in protein stability caused by variations holds immense importance. Improving the thermal stability of proteins is important for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases containing protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Various publicly available databases for protein stability prediction are introduced. Furthermore, state-of-the-art computational approaches for anticipating protein stability changes due to variants are reviewed. Each method's types of features, base algorithm, and prediction results are also detailed. Additionally, some experimental approaches for verifying the prediction results of computational methods are introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models for future research directions.
Affiliation(s)
- Yiling Qiu
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
8
Li L, Huang F, Zhang YH, Cai YD. Identifying allergic-rhinitis-associated genes with random-walk-based method in PPI network. Comput Biol Med 2024; 175:108495. [PMID: 38697003 DOI: 10.1016/j.compbiomed.2024.108495] [Received: 01/19/2024] [Revised: 03/21/2024] [Accepted: 04/21/2024] [Indexed: 05/04/2024]
Abstract
Allergic rhinitis is a common allergic disease with a complex pathogenesis and many unresolved issues. Studies have shown that the incidence of allergic rhinitis is closely related to genetic factors, and research on the related genes could help further understand its pathogenesis and develop new treatment methods. In this study, 446 allergic rhinitis-related genes were obtained from the DisGeNET database. The protein-protein interaction network was searched using the random-walk-with-restart algorithm with these 446 genes as seed nodes to assess the linkages between other genes and allergic rhinitis. Then, this result was further examined by three screening tests (permutation, interaction, and enrichment tests), which aimed to select genes that have strong and special associations with allergic rhinitis. Finally, 52 novel genes were obtained. The functional enrichment test confirmed their relationships to the biological processes and pathways related to allergic rhinitis. Furthermore, some genes were extensively analyzed to uncover their special or latent associations with allergic rhinitis, including IRAK2 and MAPK, which are involved in the pathogenesis of allergic rhinitis and the inhibition of allergic inflammation via the p38-MAPK pathway, respectively. The newly found genes may help subsequent investigations into the underlying molecular mechanisms of allergic rhinitis and the development of effective treatments.
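The random-walk-with-restart step can be sketched on a toy network. This is an illustration under stated assumptions, not the authors' implementation: the restart probability of 0.8, the convergence tolerance, the unweighted graph, and the node names (`SEED`, `A` to `D`) are all hypothetical. At each iteration the walker returns to the seed set with probability `restart` and otherwise moves to a uniformly chosen neighbour; the stationary probabilities score how closely each node is linked to the seeds.

```python
def random_walk_with_restart(adj, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Return stationary visiting probabilities for every node in adj.

    adj maps each node to the set of its neighbours (undirected graph,
    every node must have at least one neighbour)."""
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seed distribution.
        nxt = {v: restart * p0[v] for v in nodes}
        # Remaining mass is spread evenly over each node's neighbours.
        for v in nodes:
            share = (1 - restart) * p[v] / len(adj[v])
            for u in adj[v]:
                nxt[u] += share
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: one seed gene and four candidate genes.
network = {
    "SEED": {"A", "B"},
    "A": {"SEED", "B"},
    "B": {"SEED", "A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
scores = random_walk_with_restart(network, seeds={"SEED"})
```

Candidates would then be ranked by `scores`; in this toy graph the node two steps from the seed (`C`) scores above the node three steps away (`D`).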
Affiliation(s)
- Lin Li
  - Department of Otolaryngology and Head&neck, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi Medical Center, Nanjing Medical University, Wuxi, 214023, China; Department of Otolaryngology and Head&neck, China-Japan Union Hospital, Jilin University, Changchun, 130033, China.
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China.
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China.
9
Zhang YH, Huang F, Li J, Shen W, Chen L, Feng K, Huang T, Cai YD. Identification of Protein-Protein Interaction Associated Functions Based on Gene Ontology. Protein J 2024; 43:477-486. [PMID: 38436837 DOI: 10.1007/s10930-024-10180-6] [Accepted: 01/07/2024] [Indexed: 03/05/2024]
Abstract
Protein-protein interactions (PPIs) involve the physical or functional contact between two or more proteins. Generally, proteins that interact with each other have special relationships. Some previous studies have reported that gene ontology (GO) terms are related to the determination of PPIs, suggesting special patterns in the GO terms of proteins in PPIs. In this study, we explored the special GO term patterns in human PPIs to uncover the underlying functional mechanism of PPIs. Experimentally validated human PPIs were retrieved from the STRING database and termed positive samples. Additionally, we randomly paired proteins occurring in positive samples, yielding a large number of negative samples. For each GO term pair, we counted the positive samples in which one protein was annotated by one GO term of the pair and the other protein by the other term. The corresponding number for negative samples was also counted and then adjusted to account for the large gap between the numbers of positive and negative samples. The difference between these two numbers and its ratio relative to the number for positive samples were calculated. This ratio provided a precise evaluation of the occurrence of GO term pairs in positive versus negative samples, indicating the latent GO term patterns for PPIs. Our analysis unveiled several nuclear biological processes, including gene transcription, cell proliferation, and nutrient metabolism, as key biological functions. Interactions between major proliferative or metabolic GO terms consistently correspond with significantly reported PPIs in recent literature.
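The counting scheme in this abstract can be sketched as below. This is a hedged illustration, not the published procedure: the abstract does not say how the negative count is adjusted, so the sketch assumes it is rescaled by the ratio of positive to negative sample counts. The protein names, GO labels, and sample lists are hypothetical.

```python
from collections import Counter
from itertools import product


def go_pair_counts(samples, annotations):
    """Count, for each unordered GO term pair, the protein pairs supporting it."""
    counts = Counter()
    for a, b in samples:
        # A set ensures each sample contributes at most once per GO term pair.
        term_pairs = {
            tuple(sorted((ta, tb)))
            for ta, tb in product(annotations[a], annotations[b])
        }
        counts.update(term_pairs)
    return counts


def relative_ratios(positive, negative, annotations):
    """(positive count - adjusted negative count) / positive count per GO pair."""
    pos = go_pair_counts(positive, annotations)
    neg = go_pair_counts(negative, annotations)
    # Assumed adjustment: rescale negative counts to the positive sample size.
    scale = len(positive) / len(negative)
    return {pair: (n - neg[pair] * scale) / n for pair, n in pos.items()}


# Hypothetical annotations and samples.
annotations = {
    "P1": {"GO:A"},
    "P2": {"GO:B"},
    "P3": {"GO:A"},
    "P4": {"GO:B", "GO:C"},
}
positive = [("P1", "P2"), ("P3", "P2")]
negative = [("P1", "P4"), ("P3", "P4"), ("P1", "P3"), ("P2", "P4")]
ratios = relative_ratios(positive, negative, annotations)
```

Here the pair (`GO:A`, `GO:B`) appears in both positive samples and in two of the four negative samples, so its adjusted negative count is 1 and its ratio is (2 - 1) / 2 = 0.5; ratios near 1 would mark GO term pairs that occur almost only among interacting proteins.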
Affiliation(s)
- Yu-Hang Zhang
  - School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China
- JiaBo Li
  - School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, People's Republic of China
- WenFeng Shen
  - School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai, 201209, People's Republic of China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, 510507, People's Republic of China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China.
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China.
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.
10
Ren J, Gao Q, Zhou X, Chen L, Guo W, Feng K, Huang T, Cai YD. Identification of key gene expression associated with quality of life after recovery from COVID-19. Med Biol Eng Comput 2024; 62:1031-1048. [PMID: 38123886 DOI: 10.1007/s11517-023-02988-8] [Received: 07/06/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023]
Abstract
Post-acute sequelae of COVID-19 (PASC) is a persistent complication of severe acute respiratory syndrome coronavirus 2 infection that includes symptoms, such as fatigue, cognitive impairment, and respiratory distress. These symptoms severely affect the quality of life of patients after their recovery from COVID-19. In this study, a group of machine learning algorithms analyzed the whole blood RNA-seq data from patients with different PASC levels. The purpose of this analysis was to identify the gene markers associated with PASC and the special expression patterns for different PASC levels. By comparing the quality of life of patients after the acute phase of COVID-19 and before the disease, samples in the dataset were divided into three groups, namely, "Better," "The Same," and "Worse." Each patient was represented by the expression levels of 58,929 genes. The machine learning-based workflow included six feature-ranking algorithms, incremental feature selection (IFS), and four classification algorithms. The feature ranking algorithms were in charge of assessing feature importance, whereas IFS with classification algorithms were used to extract essential genes and to construct efficient classifiers and classification rules. The expression of top genes in the results was associated with the immune response to viral infection, which is supported by the published literature. For example, patients with low CCDC18 expression and high CPED1 expression had good quality of life, whereas those with low CDC16 expression had poor quality of life.
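The incremental feature selection (IFS) step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' workflow: it assumes a feature ranking is already available, uses a 1-nearest-neighbour classifier scored by leave-one-out accuracy as a stand-in for the classification algorithms, and runs on hypothetical expression values. Only the group labels "Better" and "Worse" come from the abstract.

```python
def loo_accuracy_1nn(samples, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature indices."""
    correct = 0
    for i, x in enumerate(samples):
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: sum((x[f] - samples[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(samples)


def incremental_feature_selection(samples, labels, ranked_features):
    """Evaluate the top-1, top-2, ... feature subsets and keep the best one."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        score = loo_accuracy_1nn(samples, labels, ranked_features[:k])
        if score > best_score:  # ties keep the smaller subset
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Hypothetical expression values: feature 0 separates the groups, 1 and 2 are noise.
samples = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 9.0],
    [0.9, 5.1, 9.1],
    [1.0, 1.1, 1.1],
]
labels = ["Better", "Better", "Worse", "Worse"]
ranked = [0, 1, 2]  # assumed output of a feature-ranking algorithm

selected, accuracy = incremental_feature_selection(samples, labels, ranked)
```

On this toy data the top-ranked feature alone classifies every sample correctly, and adding the noisy features lowers accuracy, so IFS keeps the single-feature subset.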
Collapse
Affiliation(s)
- JingXin Ren
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| | - Qian Gao
- Department of Pharmacy, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
| | - XianChao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, 200030, China
| | - KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, 510507, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, China.
11
|
Junaid M, Lu H, Li Y, Liu Y, Din AU, Qi Z, Xiong Y, Yan J. Novel Synergistic Probiotic Intervention: Transcriptomic and Metabolomic Analysis Reveals Ameliorative Effects on Immunity, Gut Barrier, and Metabolism of Mice during Salmonella typhimurium Infection. Genes (Basel) 2024; 15:435. [PMID: 38674370 PMCID: PMC11050207 DOI: 10.3390/genes15040435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 03/25/2024] [Accepted: 03/27/2024] [Indexed: 04/28/2024] Open
Abstract
Salmonella typhimurium (S. typhimurium), a prevalent cause of foodborne infection, induces significant changes in the host transcriptome and metabolome. The lack of therapeutics with minimal or no side effects prompts the scientific community to explore alternative therapies. This study investigates the therapeutic potential of a probiotic mixture comprising Lactobacillus acidophilus (L. acidophilus 1.3251) and Lactobacillus plantarum (L. plantarum 9513) against S. typhimurium, utilizing transcriptome and metabolomic analyses, a novel approach that has not been previously documented. Twenty-four SPF-BALB/c mice were divided into four groups: control negative group (CNG); positive control group (CPG); probiotic-supplemented non-challenged group (LAPG); and probiotic-supplemented Salmonella-challenged group (LAPST). An RNA-sequencing analysis of small intestinal (ileum) tissue revealed 2907 upregulated and 394 downregulated DEGs in the LAPST vs. CPG group. A functional analysis of DEGs highlighted their significantly altered gene ontology (GO) terms related to metabolism, gut integrity, cellular development, and immunity (p ≤ 0.05). The KEGG analysis showed that differentially expressed genes (DEGs) in the LAPST group were primarily involved in pathways related to gut integrity, immunity, and metabolism, such as MAPK, PI3K-Akt, AMPK, the tryptophan metabolism, the glycine, serine, and threonine metabolism, ECM-receptor interaction, and others. Additionally, the fecal metabolic analysis identified 1215 upregulated and 305 downregulated metabolites in the LAPST vs. CPG group, implying their involvement in KEGG pathways including bile secretion, propanoate metabolism, arginine and proline metabolism, amino acid biosynthesis, and protein digestion and absorption, which are vital for maintaining barrier integrity, immunity, and metabolism. 
In conclusion, these findings suggest that the administration of a probiotic mixture improves immunity, maintains gut homeostasis and barrier integrity, and enhances metabolism in Salmonella infection.
Affiliation(s)
- Muhammad Junaid
- Medical College, Guangxi University, Nanning 530004, China; (M.J.); (H.L.); (Y.L.); (Y.L.); (Z.Q.)
| | - Hongyu Lu
- Medical College, Guangxi University, Nanning 530004, China; (M.J.); (H.L.); (Y.L.); (Y.L.); (Z.Q.)
| | - Yixiang Li
- Medical College, Guangxi University, Nanning 530004, China; (M.J.); (H.L.); (Y.L.); (Y.L.); (Z.Q.)
| | - Yu Liu
- Medical College, Guangxi University, Nanning 530004, China; (M.J.); (H.L.); (Y.L.); (Y.L.); (Z.Q.)
| | - Ahmad Ud Din
- Plants for Human Health Institute, North Carolina State University, 600 Laureate Way, Kannapolis, NC 28081, USA
| | - Zhongquan Qi
- Medical College, Guangxi University, Nanning 530004, China; (M.J.); (H.L.); (Y.L.); (Y.L.); (Z.Q.)
| | - Yi Xiong
- Guangxi Center for Animals Disease Control and Prevention, Nanning 530004, China
| | - Jianhua Yan
- Medical College, Guangxi University, Nanning 530004, China; (M.J.); (H.L.); (Y.L.); (Y.L.); (Z.Q.)
12
|
Ma Q, Chen L, Feng K, Guo W, Huang T, Cai YD. Exploring Prognostic Gene Factors in Breast Cancer via Machine Learning. Biochem Genet 2024:10.1007/s10528-024-10712-w. [PMID: 38383836 DOI: 10.1007/s10528-024-10712-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 01/21/2024] [Indexed: 02/23/2024]
Abstract
Breast cancer remains the most prevalent cancer in women. To date, its underlying molecular mechanisms have not been fully uncovered. Determining the relevant gene factors is important for improving our understanding of breast cancer, because it can link specific gene expression to tumor staging. However, knowledge in this regard is still far from complete. Thus, this study aimed to address these knowledge gaps by analyzing existing gene expression profile data from 3149 breast cancer samples, where each sample was represented by the expression of 19,644 genes and classified into Nottingham histological grade (NHG) classes (Grade 1, 2, and 3). To this end, a machine learning-based framework was designed. First, the profile data were analyzed using seven feature ranking algorithms to evaluate the importance of features (genes). Seven feature lists were generated, each of which sorted features by importance as evaluated from a particular aspect. Then, the incremental feature selection method was applied to each list to determine essential features for classification and to build efficient classifiers. Consequently, overlapping genes, that is, genes identified as important by multiple feature ranking algorithms, such as AURKA, CBX2, and MYBL2, were deemed potentially related to breast cancer malignancy and prognosis. In addition, the study formulated classification rules that reflect distinct gene expression patterns for the three NHG classes. Some genes and rules were analyzed and supported by recent literature, providing new references for studying breast cancer.
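The incremental feature selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nearest-centroid classifier, and the leave-one-out scoring are all assumptions chosen so that the example runs with the Python standard library alone. The idea shown is only the general one: given a ranked feature list, evaluate a classifier on the top 1, top 2, ..., top N features and keep the smallest prefix with the best score.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Labels = Sequence[Hashable]


def nearest_centroid_loo_accuracy(X: Matrix, y: Labels) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (illustrative stand-in)."""
    correct = 0
    for i in range(len(X)):
        # Accumulate per-class sums and counts from every sample except sample i.
        sums: dict = {}
        counts: dict = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(row))
            for k, value in enumerate(row):
                acc[k] += value
            counts[label] = counts.get(label, 0) + 1

        def squared_distance(label: Hashable) -> float:
            return sum(
                (X[i][k] - sums[label][k] / counts[label]) ** 2
                for k in range(len(X[i]))
            )

        # Predict the class whose centroid is closest to the held-out sample.
        predicted = min(sums, key=squared_distance)
        correct += predicted == y[i]
    return correct / len(X)


def incremental_feature_selection(
    X: Matrix,
    y: Labels,
    ranked_features: List[int],
    score: Callable[[Matrix, Labels], float] = nearest_centroid_loo_accuracy,
) -> Tuple[List[int], float]:
    """Return the smallest top-k prefix of ranked_features with the best score."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        columns = ranked_features[:k]
        X_k = [[row[c] for c in columns] for row in X]
        current = score(X_k, y)
        # Strict comparison keeps the smallest subset when scores tie.
        if current > best_score:
            best_k, best_score = k, current
    return ranked_features[:best_k], best_score
```

In practice the ranking would come from one of the feature ranking algorithms and the scoring function from a cross-validated classifier; both are left as parameters here for that reason.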
Affiliation(s)
- QingLan Ma
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
| | - KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, 510507, China
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, 200030, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, China.
13
|
Chen L, Xu J, Zhou Y. PDATC-NCPMKL: Predicting drug's Anatomical Therapeutic Chemical (ATC) codes based on network consistency projection and multiple kernel learning. Comput Biol Med 2024; 169:107862. [PMID: 38150886 DOI: 10.1016/j.compbiomed.2023.107862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/19/2023] [Accepted: 12/17/2023] [Indexed: 12/29/2023]
Abstract
The development and discovery of new drugs is time-consuming and requires substantial human and material resources. Therefore, discovering novel effects of existing drugs is an important alternative that can accelerate the process of designing "new" drugs. The Anatomical Therapeutic Chemical (ATC) classification system recommended by the World Health Organization (WHO) is a basic research area in this regard. A novel ATC code for an existing drug suggests novel effects of that drug. Some computational models have been proposed to predict drug-ATC code associations. However, their performance is not very high, and there is still room for improvement. In this study, a new recommendation system (named PDATC-NCPMKL), which incorporates network consistency projection and multiple kernel learning, was designed to identify drug-ATC code associations. For drugs and for ATC codes, several kernels were constructed and then fused by a multiple kernel learning method and an additional kernel integration scheme. To enhance performance, the drug-ATC code association adjacency matrix was reformulated by a variant of weighted K nearest known neighbors (WKNKN). The reformulated adjacency matrix and the drug and ATC code kernels were fed into network consistency projection to generate the association score matrix. The proposed recommendation system was tested on ATC codes at the second, third and fourth levels of the drug ATC classification system using ten-fold cross-validation. The results indicated that all AUROC and AUPR values were close to or exceeded 0.96. Such performance was higher than that of some existing computational models. Additional tests were conducted to demonstrate the utility of adjacency matrix reformulation and to analyze the importance of the drug and ATC code kernels.
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China.
| | - Jing Xu
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China.
| | - Yubin Zhou
- Department of Thoracic Surgery, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, 610072, China.
14
|
Ren J, Zhou X, Huang K, Chen L, Guo W, Feng K, Huang T, Cai YD. Identification of key genes associated with persistent immune changes and secondary immune activation responses induced by influenza vaccination after COVID-19 recovery by machine learning methods. Comput Biol Med 2024; 169:107883. [PMID: 38157776 DOI: 10.1016/j.compbiomed.2023.107883] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/06/2023] [Revised: 11/27/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024]
Abstract
COVID-19 is hypothesized to exert enduring effects on the immune systems of patients, leading to alterations in immune-related gene expression. This study aimed to scrutinize the persistent implications of SARS-CoV-2 infection on gene expression and its influence on subsequent immune activation responses. We designed a machine learning-based approach to analyze transcriptomic data from both healthy individuals and patients who had recovered from COVID-19. Patients were categorized based on their influenza vaccination status and then compared with healthy controls. The first sample set encompassed 86 blood samples from healthy controls and 72 blood samples from recovered COVID-19 patients prior to influenza vaccination. The second sample set included 123 blood samples from healthy controls and 106 blood samples from recovered COVID-19 patients who had been vaccinated against influenza. For each sample, the dataset captured expression levels of 17,060 genes. These two sample sets were first analyzed by seven feature ranking algorithms, yielding seven feature lists for each dataset. Then, each list was fed into the incremental feature selection method, incorporating three classic classification algorithms, to extract essential genes and classification rules and to build efficient classifiers. The genes and rules were analyzed in this study. The main findings were that NEXN and ZNF354A were highly expressed in recovered COVID-19 patients, whereas MKI67 and GZMB were highly expressed in patients with secondary immune activation post-COVID-19 recovery. These pivotal genes could provide valuable insights for future health monitoring of COVID-19 patients and guide the creation of continued treatment regimens.
Affiliation(s)
- Jingxin Ren
- School of Life Sciences, Shanghai University, Shanghai, 200444, China.
| | - XianChao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
| | - Ke Huang
- School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China.
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China.
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, 200030, China.
| | - KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, 510507, China.
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China; CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, China.
15
|
Chen L, Zhang C, Xu J. PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes. BMC Bioinformatics 2024; 25:50. [PMID: 38291384 PMCID: PMC10829269 DOI: 10.1186/s12859-024-05665-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 01/22/2024] [Indexed: 02/01/2024] Open
Abstract
BACKGROUND Enzymes play an irreplaceable and important role in maintaining the lives of living organisms. The Enzyme Commission (EC) number of an enzyme indicates its essential functions. Correct identification of the first digit (family class) of the EC number for a given enzyme has been a hot topic over the past twenty years. Several previous methods adopted functional domain composition to represent enzymes. However, this leads to the curse of dimensionality, thereby reducing the efficiency of the methods. In addition, most previous methods can only deal with enzymes belonging to one family class, although several enzymes in fact belong to two or more family classes. RESULTS In this study, a fast and efficient multi-label classifier, named PredictEFC, was designed. To construct this classifier, a novel feature extraction scheme was designed for processing the functional domain information of enzymes; it counts the distribution of each functional domain entry across the seven family classes in the training dataset. Based on this scheme, each training or test enzyme was encoded into a 7-dimensional vector by fusing its functional domain information with these statistical results. Random k-labelsets (RAKEL) was adopted to build the classifier, with random forest selected as the base classification algorithm. Two tenfold cross-validation runs on the training dataset showed that the accuracy of PredictEFC can reach 0.8493 and 0.8370. The independent tests on two datasets yielded accuracy values of 0.9118 and 0.8777. CONCLUSION The performance of PredictEFC was slightly lower than that of the classifier directly using functional domain composition. However, its efficiency was sharply improved: the running time was less than one-tenth of that of the classifier directly using functional domain composition.
In addition, PredictEFC was superior to the classifiers using traditional dimensionality reduction methods and to some previous methods, and this classifier can be adapted to predict enzyme family classes of other species. Finally, a web server available at http://124.221.158.221/ was set up for easy usage.
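The feature extraction idea in this abstract (counting how each functional domain entry is distributed across the seven family classes, then encoding an enzyme as a 7-dimensional vector) can be sketched as follows. The abstract does not specify how the per-domain statistics are fused, so the averaging of normalized per-domain distributions below is an assumption, as are the function names and the toy domain identifiers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

N_CLASSES = 7  # seven enzyme family classes (first digit of the EC number)


def count_domain_classes(
    training: Iterable[Tuple[Set[str], Set[int]]]
) -> Dict[str, List[int]]:
    """Count, for each domain entry, how often it occurs in each family class.

    Each training item is (domain entries of the enzyme, its class indices 0..6);
    an enzyme may carry several class labels (multi-label setting).
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * N_CLASSES)
    for domains, classes in training:
        for domain in domains:
            for cls in classes:
                counts[domain][cls] += 1
    return dict(counts)


def encode_enzyme(domains: Set[str], counts: Dict[str, List[int]]) -> List[float]:
    """Encode an enzyme as a 7-dimensional vector.

    Assumed fusion rule: average the normalized class distributions of the
    enzyme's domains that were seen in training; unseen domains are ignored.
    """
    vector = [0.0] * N_CLASSES
    known = [d for d in domains if d in counts]
    for domain in known:
        total = sum(counts[domain])
        for cls in range(N_CLASSES):
            vector[cls] += counts[domain][cls] / total
    if known:
        vector = [value / len(known) for value in vector]
    return vector
```

The resulting fixed-length vectors could then be passed to any multi-label learner; the point of the sketch is only that the vector length stays at seven regardless of how many distinct domain entries exist.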
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China.
| | - Chenyu Zhang
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
| | - Jing Xu
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
16
|
Rakhshaninejad M, Fathian M, Shirkoohi R, Barzinpour F, Gandomi AH. Refining breast cancer biomarker discovery and drug targeting through an advanced data-driven approach. BMC Bioinformatics 2024; 25:33. [PMID: 38253993 PMCID: PMC10810249 DOI: 10.1186/s12859-024-05657-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024] Open
Abstract
Breast cancer remains a major public health challenge worldwide. The identification of accurate biomarkers is critical for the early detection and effective treatment of breast cancer. This study utilizes an integrative machine learning approach to analyze breast cancer gene expression data for superior biomarker and drug target discovery. Gene expression datasets, obtained from the GEO database, were merged post-preprocessing. From the merged dataset, differential expression analysis between breast cancer and normal samples revealed 164 differentially expressed genes. Meanwhile, a separate gene expression dataset revealed 350 differentially expressed genes. Additionally, the BGWO_SA_Ens algorithm, integrating binary grey wolf optimization and simulated annealing with an ensemble classifier, was employed on gene expression datasets to identify predictive genes including TOP2A, AKR1C3, EZH2, MMP1, EDNRB, S100B, and SPP1. From over 10,000 genes, BGWO_SA_Ens identified 1404 in the merged dataset (F1 score: 0.981, PR-AUC: 0.998, ROC-AUC: 0.995) and 1710 in the GSE45827 dataset (F1 score: 0.965, PR-AUC: 0.986, ROC-AUC: 0.972). The intersection of DEGs and BGWO_SA_Ens selected genes revealed 35 superior genes that were consistently significant across methods. Enrichment analyses uncovered the involvement of these superior genes in key pathways such as AMPK, Adipocytokine, and PPAR signaling. Protein-protein interaction network analysis highlighted subnetworks and central nodes. Finally, a drug-gene interaction investigation revealed connections between superior genes and anticancer drugs. Collectively, the machine learning workflow identified a robust gene signature for breast cancer, illuminated their biological roles, interactions and therapeutic associations, and underscored the potential of computational approaches in biomarker discovery and precision oncology.
Affiliation(s)
- Morteza Rakhshaninejad
- Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
| | - Mohammad Fathian
- Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran.
| | - Reza Shirkoohi
- Cancer Biology Research Center, Cancer Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Keshavarz Boulevard, Tehran, 1419733141, Tehran, Iran
| | - Farnaz Barzinpour
- Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
| | - Amir H Gandomi
- Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, 2007, NSW, Australia
- University Research and Innovation Center (EKIK), Óbuda University, Budapest, 1034, Hungary
17
|
Ren JX, Chen L, Guo W, Feng KY, Cai YD, Huang T. Patterns of Gene Expression Profiles Associated with Colorectal Cancer in Colorectal Mucosa by Using Machine Learning Methods. Comb Chem High Throughput Screen 2024; 27:2921-2934. [PMID: 37957897 DOI: 10.2174/0113862073266300231026103844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 09/11/2023] [Accepted: 09/30/2023] [Indexed: 11/15/2023]
Abstract
BACKGROUND Colorectal cancer (CRC) has a very high incidence and lethality rate and is one of the most dangerous cancer types. Timely diagnosis can effectively reduce the incidence of colorectal cancer. Changes in para-cancerous tissues may serve as an early signal for tumorigenesis. Comparison of the differences in gene expression between para-cancerous and normal mucosa can help in the diagnosis of CRC and understanding the mechanisms of development. OBJECTIVES This study aimed to identify specific genes at the level of gene expression, which are expressed in normal mucosa and may be predictive of CRC risk. METHODS A machine learning approach was used to analyze transcriptomic data in 459 samples of normal colonic mucosal tissue from 322 CRC cases and 137 non-CRC, in which each sample contained 28,706 gene expression levels. The genes were ranked using four ranking methods based on importance estimation (LASSO, LightGBM, MCFS, and mRMR) and four classification algorithms (decision tree [DT], K-nearest neighbor [KNN], random forest [RF], and support vector machine [SVM]) were combined with incremental feature selection [IFS] methods to construct a prediction model with excellent performance. RESULT The top-ranked genes, namely, HOXD12, CDH1, and S100A12, were associated with tumorigenesis based on previous studies. CONCLUSION This study summarized four sets of quantitative classification rules based on the DT algorithm, providing clues for understanding the microenvironmental changes caused by CRC. According to the rules, the effect of CRC on normal mucosa can be determined.
Affiliation(s)
- Jing Xin Ren
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, 200030, China
| | - Kai Yan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, 510507, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
18
|
Chen L, Qu R, Liu X. Improved multi-label classifiers for predicting protein subcellular localization. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:214-236. [PMID: 38303420 DOI: 10.3934/mbe.2024010] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/03/2024]
Abstract
Protein functions are closely related to their subcellular locations. At present, the prediction of protein subcellular locations is one of the most important problems in protein science. The evident defects of traditional methods make it urgent to design methods with high efficiency and low costs. To date, lots of computational methods have been proposed. However, this problem is far from being completely solved. Recently, some multi-label classifiers have been proposed to identify subcellular locations of human, animal, Gram-negative bacterial and eukaryotic proteins. These classifiers adopted the protein features derived from gene ontology information. Although they provided good performance, they can be further improved by adopting more powerful machine learning algorithms. In this study, four improved multi-label classifiers were set up for identification of subcellular locations of the above four protein types. The random k-labelsets (RAKEL) algorithm was used to tackle proteins with multiple locations, and random forest was used as the basic prediction engine. All classifiers were tested by jackknife test, indicating their high performance. Comparisons with previous classifiers further confirmed the superiority of the proposed classifiers.
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Ruyun Qu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Xintong Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
19
|
Jia W, Guo A, Bian W, Zhang R, Wang X, Shi L. Integrative deep learning framework predicts lipidomics-based investigation of preservatives on meat nutritional biomarkers and metabolic pathways. Crit Rev Food Sci Nutr 2023:1-15. [PMID: 38127336 DOI: 10.1080/10408398.2023.2295016] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/23/2023]
Abstract
Preservatives are added as antimicrobial agents to extend the shelf life of meat. Adding preservatives to meat products can affect their flavor and nutrition. This review clarifies the effects of preservatives on metabolic pathways and network molecular transformations in meat products based on lipidomics, metabolomics and proteomics analyses. Preservatives change the nutrient content of meat products via altering ionic strength and pH to influence enzyme activity. Ionic strength in salt triggers muscle triglyceride hydrolysis by causing phosphorylation and lipid droplet splitting in adipose tissue hormone-sensitive lipase and triglyceride lipase. DisoLipPred exploiting deep recurrent networks and transfer learning can predict the lipid binding trend of each amino acid in the disordered region of input protein sequences, which could provide omics analyses of biomarkers metabolic pathways in meat products. While conventional meat quality assessment tools are unable to elucidate the intrinsic mechanisms and pathways of variables in the influences of preservatives on the quality of meat products, the promising application of omics techniques in food analysis and discovery through multimodal learning prediction algorithms of neural networks (e.g., deep neural network, convolutional neural network, artificial neural network) will drive the meat industry to develop new strategies for food spoilage prevention and control.
Affiliation(s)
- Wei Jia
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
- Agricultural Product Processing and Inspection Center, Shaanxi Testing Institute of Product Quality Supervision, Xi'an, Shaanxi, China
- Agricultural Product Quality Research Center, Shaanxi Research Institute of Agricultural Products Processing Technology, Xi'an, China
- Food Safety Testing Center, Shaanxi Sky Pet Biotechnology Co., Ltd, Xi'an, China
| | - Aiai Guo
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
| | - Wenwen Bian
- Agricultural Product Processing and Inspection Center, Shaanxi Testing Institute of Product Quality Supervision, Xi'an, Shaanxi, China
| | - Rong Zhang
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
| | - Xin Wang
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
| | - Lin Shi
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
20
|
Ghorbel M, Zribi I, Haddaji N, Siddiqui AJ, Bouali N, Brini F. Genome-Wide Identification and Expression Analysis of Catalase Gene Families in Triticeae. PLANTS (BASEL, SWITZERLAND) 2023; 13:11. [PMID: 38202319 PMCID: PMC10781083 DOI: 10.3390/plants13010011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/03/2023] [Accepted: 12/11/2023] [Indexed: 01/12/2024]
Abstract
Aerobic metabolism in plants results in the production of hydrogen peroxide (H2O2), a significant and comparatively stable non-radical reactive oxygen species (ROS). H2O2 is a signaling molecule that regulates particular physiological and biological processes (the cell cycle, photosynthesis, plant growth and development, and plant responses to environmental challenges) at low concentrations. Plants may experience oxidative stress and ultimately die from cell death if excess H2O2 builds up. Triticum dicoccoides, Triticum urartu, and Triticum spelta are different ancient wheat species that present different interesting characteristics, and their importance is becoming more and more clear. In fact, due to their interesting nutritive health, flavor, and nutritional values, as well as their resistance to different parasites, the cultivation of these species is increasingly important. Thus, it is important to understand the mechanisms of plant tolerance to different biotic and abiotic stresses by studying different stress-induced gene families such as catalases (CAT), which are important H2O2-metabolizing enzymes found in plants. Here, we identified seven CAT-encoding genes (TdCATs) in Triticum dicoccoides, four genes in Triticum urartu (TuCATs), and eight genes in Triticum spelta (TsCATs). The accuracy of the newly identified wheat CAT gene members in different wheat genomes is confirmed by the gene structures, phylogenetic relationships, protein domains, and subcellular location analyses discussed in this article. In fact, our analysis showed that the identified genes harbor the following two conserved domains: a catalase domain (pfam00199) and a catalase-related domain (pfam06628). Phylogenetic analyses showed that the identified wheat CAT proteins were present in an analogous form in durum wheat and bread wheat. Moreover, the identified CAT proteins were located essentially in the peroxisome, as revealed by in silico analyses. 
Interestingly, analyses of CAT promoters in those species revealed the presence of different cis elements related to plant development, maturation, and plant responses to different environmental stresses. According to RT-qPCR, Triticum CAT genes showed distinctive expression patterns in the studied organs and in response to different treatments (salt, heat, cold, mannitol, and ABA). This study completed a thorough analysis of the CAT genes in Triticeae, which advances our knowledge of CAT genes and establishes a framework for further functional analyses of the wheat gene family.
Affiliation(s)
- Mouna Ghorbel
- Department of Biology, College of Sciences, University of Hail, P.O. Box 2440, Ha’il City 81451, Saudi Arabia; (M.G.); (N.H.); (A.J.S.); (N.B.)
- Ikram Zribi
- Laboratory of Biotechnology and Plant Improvement, Center of Biotechnology of Sfax, P.O. Box 1177, Sfax 3018, Tunisia;
- Najla Haddaji
- Department of Biology, College of Sciences, University of Hail, P.O. Box 2440, Ha’il City 81451, Saudi Arabia; (M.G.); (N.H.); (A.J.S.); (N.B.)
- Arif Jamal Siddiqui
- Department of Biology, College of Sciences, University of Hail, P.O. Box 2440, Ha’il City 81451, Saudi Arabia; (M.G.); (N.H.); (A.J.S.); (N.B.)
- Nouha Bouali
- Department of Biology, College of Sciences, University of Hail, P.O. Box 2440, Ha’il City 81451, Saudi Arabia; (M.G.); (N.H.); (A.J.S.); (N.B.)
- Faiçal Brini
- Laboratory of Biotechnology and Plant Improvement, Center of Biotechnology of Sfax, P.O. Box 1177, Sfax 3018, Tunisia;
|
21
|
Lin X, Ma Q, Chen L, Guo W, Huang Z, Huang T, Cai YD. Identifying genes associated with resistance to KRAS G12C inhibitors via machine learning methods. Biochim Biophys Acta Gen Subj 2023; 1867:130484. [PMID: 37805078 DOI: 10.1016/j.bbagen.2023.130484] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/29/2023] [Revised: 10/02/2023] [Accepted: 10/04/2023] [Indexed: 10/09/2023]
Abstract
BACKGROUND Targeted therapy has revolutionized cancer treatment, greatly improving patient outcomes and quality of life. Lung cancer, specifically non-small cell lung cancer, is frequently driven by the G12C mutation at the KRAS locus. The development of KRAS inhibitors has been a breakthrough in the field of cancer research, given the crucial role of KRAS mutations in driving tumor growth and progression. However, in over half of patients the cancer bypasses inhibition and shows limited response to treatment. The mechanisms underlying tumor cell resistance to this treatment remain poorly understood. METHODS To address this gap in knowledge, we conducted a study aiming to elucidate the differences between tumor cells that respond positively to KRAS (G12C) inhibitor therapy and those that do not. Specifically, we analyzed single-cell gene expression profiles from KRAS G12C-mutant tumor cell models (H358, H2122, and SW1573) treated with the KRAS G12C inhibitor ARS-1620, which contained 4297 cells that continued to proliferate under treatment and 3315 cells that became quiescent. Each cell was represented by the expression levels of 8687 genes. We then designed an innovative machine learning-based framework, incorporating seven feature ranking algorithms and four classification algorithms, to identify essential genes and establish quantitative rules. RESULTS Our analysis identified some top-ranked genes, including H2AFZ, CKS1B, TUBA1B, RRM2, and BIRC5, that are known to be associated with the progression of multiple cancers. CONCLUSION These genes were relevant to tumor cell resistance to targeted therapy. This study provides important insights into the molecular mechanisms underlying tumor cell resistance to KRAS inhibitor treatment.
Affiliation(s)
- Xiandong Lin
- Laboratory of Radiation Oncology and Radiobiology, Clinical Oncology School of Fujian Medical University and Fujian Cancer Hospital, Fuzhou 350014, China.
- QingLan Ma
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
- Zhiyi Huang
- College of Chemistry, Fuzhou University, Fuzhou 350000, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
|
22
|
Zhang X, Sun Y, Qi H, Feng J, Hou W, Liu Y. Comparative metabolomics study on areca nut from China and Southeast Asia (Thailand and Indonesia). PHYTOCHEMICAL ANALYSIS : PCA 2023; 34:1022-1035. [PMID: 37813812 DOI: 10.1002/pca.3293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 09/02/2023] [Accepted: 09/21/2023] [Indexed: 10/11/2023]
Abstract
INTRODUCTION Areca nut is an economic crop and an important component in traditional Chinese medicine (TCM) and ethnomedicine. The crop is rich in alkaloids and flavonoids. Most previous studies have focused on the chemical components, especially alkaloids, in crops from certain areca nut-producing areas. OBJECTIVE The purpose of this study was to compare areca nut seeds from two main cultivation areas, identify differential metabolites, and evaluate seed quality in different production areas. METHODS A widely targeted metabolomics method based on ultrahigh-performance liquid chromatography coupled with triple quadrupole mass spectrometry (UHPLC-QQQ-MS), combined with the TCM systems pharmacology (TCMSP) database and multivariate statistical analysis, was used in this study to maximise the differentiation between quality characteristics of areca nut seeds from China and Southeast Asian regions. RESULTS Altogether, 1031 metabolites were identified in areca nut seeds; by querying the TCMSP database, 375 metabolites were identified as the main active ingredients. Moreover, the metabolic profiles of areca nut seeds from China (ASCN) and Southeast Asia (ASSA) exhibited significant differences, mainly reflected in 318 compounds. The relative content of 146 metabolites in ASCN was significantly higher than that in ASSA. Through Kyoto Encyclopedia of Genes and Genomes (KEGG) comparative analysis, areca nut seed metabolites from Chinese production areas were found to cover a wider range of metabolic pathways. CONCLUSION The areca nut seeds from these cultivation areas possess many metabolites that are beneficial for health, including alkaloids, amino acids, phenolic acids, and lipids. Thus, compared with ASSA, ASCN have a higher medicinal value. This study provides a direction for the subsequent development and utilisation of areca nut seeds.
Affiliation(s)
- Xiaojuan Zhang
- Hainan Provincial Key Laboratory of Resources Conservation and Development of Southern Medicine, Hainan Branch of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Haikou, China
- Yuanyuan Sun
- Key Laboratory of Bioactive Substances and Resources Utilization of Chinese Herbal Medicine, Ministry of Education & National Engineering Laboratory for Breeding of Endangered Medicinal Materials, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Huasha Qi
- Hainan Provincial Key Laboratory of Resources Conservation and Development of Southern Medicine, Hainan Branch of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Haikou, China
- Jian Feng
- Hainan Provincial Key Laboratory of Resources Conservation and Development of Southern Medicine, Hainan Branch of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Haikou, China
- Wencheng Hou
- Hainan Provincial Key Laboratory of Resources Conservation and Development of Southern Medicine, Hainan Branch of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Haikou, China
- Yangyang Liu
- Hainan Provincial Key Laboratory of Resources Conservation and Development of Southern Medicine, Hainan Branch of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Haikou, China
- Key Laboratory of Bioactive Substances and Resources Utilization of Chinese Herbal Medicine, Ministry of Education & National Engineering Laboratory for Breeding of Endangered Medicinal Materials, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
|
23
|
Liu X, Zhao F, Wang X, Chen S, Qu J, Sang Y. Prediction and validation of enzymatic degradation of aflatoxin M1: Genomics and proteomics analysis of Bacillus pumilus E-1-1-1 enzymes. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 900:165720. [PMID: 37482353 DOI: 10.1016/j.scitotenv.2023.165720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 07/19/2023] [Accepted: 07/20/2023] [Indexed: 07/25/2023]
Abstract
Aflatoxins are a class of highly toxic mycotoxins. Aflatoxin M1 (AFM1) is a hydroxylated metabolite of aflatoxin B1 with comparable toxicity and is more commonly found in milk. In this study, whole genome sequencing was conducted on Bacillus pumilus E-1-1-1, a strain with AFM1 degradation ability isolated from the feces of 38 kinds of animals. Bacterial genome sequencing indicated that a total of 3445 sequences were annotated in 23 different clusters of orthologous groups (COG) categories. Then, the potential AFM1 degradation proteins were verified by proteomics; the properties of these proteins were further explored, including molecular weight, hydrophobicity, predicted secondary structure, and three-dimensional structure. Bacterial genome sequencing combined with proteomics showed that eight genes were the most capable of degrading AFM1, comprising three catalases, one superoxide dismutase, and four peroxidases, and these were selected for cloning. These eight genes with AFM1 degrading capacity were successfully expressed. These results indicated that AFM1 can be degraded by Bacillus pumilus E-1-1-1 proteins and that most of the degrading proteins were oxidoreductases.
Affiliation(s)
- Xiaoyu Liu
- College of Food Science and Technology, Hebei Agricultural University, 289 Lingyusi Road, Baoding, Hebei 071001, PR China
- Fangkun Zhao
- College of Food Science and Technology, Hebei Agricultural University, 289 Lingyusi Road, Baoding, Hebei 071001, PR China.
- Xianghong Wang
- College of Food Science and Technology, Hebei Agricultural University, 289 Lingyusi Road, Baoding, Hebei 071001, PR China
- Shuiping Chen
- College of Food Science and Technology, Hebei Agricultural University, 289 Lingyusi Road, Baoding, Hebei 071001, PR China
- Jingyi Qu
- College of Food Science and Technology, Hebei Agricultural University, 289 Lingyusi Road, Baoding, Hebei 071001, PR China
- Yaxin Sang
- College of Food Science and Technology, Hebei Agricultural University, 289 Lingyusi Road, Baoding, Hebei 071001, PR China.
|
24
|
Chen L, Zhao X. PCDA-HNMP: Predicting circRNA-disease association using heterogeneous network and meta-path. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:20553-20575. [PMID: 38124565 DOI: 10.3934/mbe.2023909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Increasing amounts of experimental studies have shown that circular RNAs (circRNAs) play important regulatory roles in human diseases through interactions with related microRNAs (miRNAs). CircRNAs have become new potential disease biomarkers and therapeutic targets. Predicting circRNA-disease association (CDA) is of great significance for exploring the pathogenesis of complex diseases, which can improve the diagnosis level of diseases and promote the targeted therapy of diseases. However, determination of CDAs through traditional clinical trials is usually time-consuming and expensive. Computational methods are now alternative ways to predict CDAs. In this study, a new computational method, named PCDA-HNMP, was designed. For obtaining informative features of circRNAs and diseases, a heterogeneous network was first constructed, which defined circRNAs, mRNAs, miRNAs and diseases as nodes and associations between them as edges. Then, a deep analysis was conducted on the heterogeneous network by extracting meta-paths connecting to circRNAs (diseases), thereby mining hidden associations between various circRNAs (diseases). These associations constituted the meta-path-induced networks for circRNAs and diseases. The features of circRNAs and diseases were derived from the aforementioned networks via mashup. On the other hand, miRNA-disease associations (mDAs) were employed to improve the model's performance. miRNA features were yielded from the meta-path-induced networks on miRNAs and circRNAs, which were constructed from the meta-paths connecting miRNAs and circRNAs in the heterogeneous network. A concatenation operation was adopted to build the features of CDAs and mDAs. Such representations of CDAs and mDAs were fed into XGBoost to set up the model. The five-fold cross-validation yielded an area under the curve (AUC) of 0.9846, which was better than those of some existing state-of-the-art methods. 
The employment of mDAs enhanced the model's performance, and the importance analysis on meta-path-induced networks showed that networks produced by the meta-paths containing validated CDAs provided the most important contributions.
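The idea of a meta-path-induced network can be illustrated with a small sketch. It is not the PCDA-HNMP implementation: it only counts circRNA-miRNA-circRNA meta-path instances (shared miRNAs) between pairs of circRNAs, the function name and the toy association matrix are hypothetical, and the mashup embedding and XGBoost steps from the abstract are not reproduced.

```python
def meta_path_counts(adj):
    """Given a binary circRNA-by-miRNA association matrix, count the
    circRNA-miRNA-circRNA meta-path instances between every pair of
    circRNAs (the number of shared miRNAs), i.e. adj multiplied by its transpose."""
    n = len(adj)
    return [[sum(a * b for a, b in zip(adj[i], adj[j])) for j in range(n)]
            for i in range(n)]


# Hypothetical toy associations: rows are circRNAs c0..c2, columns are miRNAs m0..m2.
circ_mirna = [
    [1, 1, 0],  # c0 is associated with m0 and m1
    [0, 1, 1],  # c1 is associated with m1 and m2
    [0, 0, 1],  # c2 is associated with m2
]
counts = meta_path_counts(circ_mirna)
print(counts)
```

Here the off-diagonal entries act as edge weights of the induced circRNA network: c0 and c1 are linked through one shared miRNA, while c0 and c2 share none.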
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Xiaoyu Zhao
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
|
25
|
Pavlova N, Traykovska M, Penchovsky R. Targeting FMN, TPP, SAM-I, and glmS Riboswitches with Chimeric Antisense Oligonucleotides for Completely Rational Antibacterial Drug Development. Antibiotics (Basel) 2023; 12:1607. [PMID: 37998809 PMCID: PMC10668854 DOI: 10.3390/antibiotics12111607] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/17/2023] [Revised: 11/05/2023] [Accepted: 11/07/2023] [Indexed: 11/25/2023] Open
Abstract
Antimicrobial drug resistance has emerged as a significant challenge in contemporary medicine due to the proliferation of numerous bacterial strains resistant to all existing antibiotics. Meanwhile, riboswitches have emerged as promising targets for discovering antibacterial drugs. Riboswitches are regulatory elements in certain bacterial mRNAs that can bind to specific molecules and control gene expression via transcriptional termination, prevention of translation, or mRNA destabilization. By targeting riboswitches, we aim to develop innovative strategies to combat antibiotic-resistant bacteria and enhance the efficacy of antibacterial treatments. This convergence of challenges and opportunities underscores the ongoing quest to revolutionize medical approaches against evolving bacterial threats. For the first time, this innovative review describes the rational design and applications of chimeric antisense oligonucleotides as antibacterial agents targeting four riboswitches selected based on genome-wide bioinformatic analyses. The antisense oligonucleotides are coupled with the cell-penetrating oligopeptide pVEC, which penetrates Gram-positive and Gram-negative bacteria and specifically targets glmS, FMN, TPP, and SAM-I riboswitches in Staphylococcus aureus, Listeria monocytogenes, and Escherichia coli. The average antibiotic dosage of antisense oligonucleotides that inhibits 80% of bacterial growth is around 700 nM (4.5 μg/mL). Antisense oligonucleotides do not exhibit toxicity in human cell lines at this concentration. The results demonstrate that these riboswitches are suitable targets for antibacterial drug development using antisense oligonucleotide technology. The approach is fully rational because selecting suitable riboswitch targets and designing ASOs that target them are based on predefined criteria. The approach can be used to develop narrow or broad-spectrum antibiotics against multidrug-resistant bacterial strains for a short time. 
The approach is easily adaptable to new resistance using targeted NGS technology.
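The two dosage figures quoted in this abstract (700 nM and 4.5 μg/mL) are linked by the molar mass of the oligonucleotide conjugate. As a quick consistency check, the implied molar mass M can be worked out as follows; M is inferred here from the two quoted values and is not stated in the abstract.

```latex
\[
M = \frac{c_{\text{mass}}}{c_{\text{molar}}}
  = \frac{4.5 \times 10^{-3}\ \mathrm{g/L}}{700 \times 10^{-9}\ \mathrm{mol/L}}
  \approx 6.4 \times 10^{3}\ \mathrm{g/mol}
\]
```

The conversion uses 4.5 μg/mL = 4.5 mg/L, so the two figures agree for a conjugate of roughly 6.4 kDa.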
Affiliation(s)
- Robert Penchovsky
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University “St. Kliment Ohridski”, 8 Dragan Tzankov Blvd., 1164 Sofia, Bulgaria
|
26
|
Yang Y, Zhang Y, Ren J, Feng K, Li Z, Huang T, Cai Y. Identification of Colon Immune Cell Marker Genes Using Machine Learning Methods. Life (Basel) 2023; 13:1876. [PMID: 37763280 PMCID: PMC10532943 DOI: 10.3390/life13091876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/24/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Immune cell infiltration that occurs at the site of colon tumors influences the course of cancer. Different immune cell compositions in the microenvironment lead to different immune responses and different therapeutic effects. This study analyzed single-cell RNA sequencing data from normal colon with the aim of screening genetic markers of 25 candidate immune cell types and revealing quantitative differences between them. The dataset contains 25 classes of immune cells, 41,650 cells in total, and each cell is represented by the expression levels of 22,164 genes. The data were fed into a machine learning-based workflow. Five feature ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, minimum redundancy maximum relevance, and random forest) were first used to analyze the importance of gene features, yielding five feature lists. Then, incremental feature selection and two classification algorithms (decision tree and random forest) were combined to filter the most important genetic markers from each list. For different immune cell subtypes, their marker genes, such as KLRB1 in CD4 T cells, RPL30 in B cell IgA plasma cells, and JCHAIN in IgG-producing B cells, were identified. They were confirmed to be differentially expressed in different immune cells and involved in immune processes. In addition, quantitative rules were summarized by using the decision tree algorithm to distinguish candidate immune cell types. These results provide a reference for exploring the cell composition of the colon cancer microenvironment and for clinical immunotherapy.
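The incremental feature selection step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the ranked feature list is taken as given, a leave-one-out 1-nearest-neighbor classifier stands in for the decision tree and random forest used in the study, and all function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     features: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for j in range(len(X)):
            if i == j:
                continue
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features) -> Tuple[List[int], float]:
    """Score the top-1, top-2, ... prefixes of a ranked feature list and
    return the shortest prefix that reaches the best score."""
    best_subset, best_score = [], -1.0
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = loo_1nn_accuracy(X, y, subset)
        if score > best_score:  # strict ">" keeps the shortest best prefix
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
subset, score = incremental_feature_selection(X, y, ranked_features=[0, 1])
print(subset, score)
```

On this toy data the top-1 prefix already classifies every cell correctly, and adding the noisy second feature lowers the score, so only feature 0 is retained.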
Affiliation(s)
- Yong Yang
- Qianwei Hospital of Jilin Province, Changchun 130012, China;
- Yuhang Zhang
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA;
- Jingxin Ren
- School of Life Sciences, Shanghai University, Shanghai 200444, China;
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China;
- Zhandong Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun 130052, China;
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yudong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China;
|
27
|
Tian S, Liu Q, Qu J, Yang M, Ma Q, Liu J, Shao P, Liu Y. Whole-Transcriptome Analysis on the Leaves of Rosa chinensis Jacq. under Exposure to Polycyclic Aromatic Hydrocarbons. TOXICS 2023; 11:610. [PMID: 37505575 PMCID: PMC10386715 DOI: 10.3390/toxics11070610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 07/10/2023] [Accepted: 07/11/2023] [Indexed: 07/29/2023]
Abstract
The leaves of plants can be recommended as a cheap and sustainable environmental protection tool to mitigate highly toxic PAHs in the ambient environment because they can serve as a reactor to remove ambient PAHs. Although previous studies have demonstrated that PAHs exhibit toxicological features, our knowledge about how ambient PAHs influence the leaves that act as such a reactor is limited. In this study, 1-year-old Rosa chinensis Jacq. with good growth potential was selected as a model plant. The leaves of Rosa chinensis Jacq. were exposed to 16 types of PAHs in the environmental concentration exposure group (0.1 μg L⁻¹) and high-concentration exposure group (5 μg L⁻¹) for seven days. For comparison, leaves of Rosa chinensis Jacq. exposed to de-ionized water served as the control group. During the exposure period, the physiological parameters of the leaves, including chlorophyll value, water content, temperature, and nitrogen, were monitored using a chlorophyll meter. After 7 days of exposure, the leaves in the control and exposure groups were collected and used for whole-transcriptome analysis. Our results demonstrate that significant differentially expressed genes were observed in the leaves of Rosa chinensis Jacq. exposed to individual PAHs at 5 μg L⁻¹ compared to the control group. According to bioinformatic analyses, these differentially expressed genes were involved in seven main pathways. In contrast, PAHs at environmentally relevant concentrations had negligible impacts on the physiological parameters and the gene transcription levels of the leaves of Rosa chinensis Jacq. Our results may provide direct evidence supporting the use of terrestrial trees to remove ambient PAHs without concern about the risk that PAHs at environmentally relevant concentrations pose to the leaves of terrestrial plants.
Affiliation(s)
- Shili Tian
- Beijing Center for Physical and Chemical Analysis, Institute of Analysis and Testing, Beijing Academy of Science and Technology, Beijing 100089, China
- Qingyang Liu
- College of Biology and the Environment, Nanjing Forestry University, Nanjing 210037, China
- Jingming Qu
- Beijing Center for Physical and Chemical Analysis, Institute of Analysis and Testing, Beijing Academy of Science and Technology, Beijing 100089, China
- Ming Yang
- Beijing Center for Physical and Chemical Analysis, Institute of Analysis and Testing, Beijing Academy of Science and Technology, Beijing 100089, China
- Qiaoyun Ma
- Beijing Center for Physical and Chemical Analysis, Institute of Analysis and Testing, Beijing Academy of Science and Technology, Beijing 100089, China
- Jia Liu
- Beijing Center for Physical and Chemical Analysis, Institute of Analysis and Testing, Beijing Academy of Science and Technology, Beijing 100089, China
- Peng Shao
- Beijing Center for Physical and Chemical Analysis, Institute of Analysis and Testing, Beijing Academy of Science and Technology, Beijing 100089, China
- Yanju Liu
- Beijing Center for Physical and Chemical Analysis, Institute of Analysis and Testing, Beijing Academy of Science and Technology, Beijing 100089, China
|
28
|
Huang T, Li Y. Current progress, challenges, and future perspectives of language models for protein representation and protein design. Innovation (N Y) 2023; 4:100446. [PMID: 37485078 PMCID: PMC10362512 DOI: 10.1016/j.xinn.2023.100446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Accepted: 05/18/2023] [Indexed: 07/25/2023] Open
Abstract
The sequence-structure-function paradigm of proteins is the basis of molecular biology. What is the underlying mechanism of the correspondence between sequence and structure/function? We reviewed the methods for protein representation and protein design. With these protein representation models, we can accurately predict many properties of proteins, such as stability and binding affinity. ProGen, Chroma, RFdiffusion, SCUBA, and other protein design models have demonstrated how human-designed artificial proteins can have desired biological functions. Protein design will revolutionize drug development, and more efficient artificial enzymes that break down industrial waste or plastics will contribute to carbon neutrality. We also discussed the three greatest challenges of protein design in the future and possible solutions.
Affiliation(s)
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yixue Li
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Guangzhou Laboratory, Guangzhou 510005, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200433, China
|
29
|
Ren JX, Gao Q, Zhou XC, Chen L, Guo W, Feng KY, Lu L, Huang T, Cai YD. Identification of Gene Markers Associated with COVID-19 Severity and Recovery in Different Immune Cell Subtypes. BIOLOGY 2023; 12:947. [PMID: 37508378 PMCID: PMC10376631 DOI: 10.3390/biology12070947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 06/20/2023] [Accepted: 06/29/2023] [Indexed: 07/30/2023]
Abstract
As COVID-19 develops, dynamic changes occur in the patient's immune system. Changes in molecular levels in different immune cells can reflect the course of COVID-19. This study aims to uncover the molecular characteristics of different immune cell subpopulations at different stages of COVID-19. We designed a machine learning workflow to analyze scRNA-seq data of three immune cell types (B, T, and myeloid cells) in four levels of COVID-19 severity/outcome. The datasets for the three cell types included 403,700 B-cell, 634,595 T-cell, and 346,547 myeloid cell samples. The cells of each type were divided into four groups (control, convalescence, progression mild/moderate, and progression severe/critical), and each cell was represented by 27,943 gene features. A feature analysis procedure was applied to the data of each cell type. Irrelevant features were first excluded according to their relevance to the target variable measured by mutual information. Then, four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and max-relevance and min-redundancy) were adopted to analyze the remaining features, resulting in four feature lists. These lists were fed into the incremental feature selection, incorporating three classification algorithms (decision tree, k-nearest neighbor, and random forest) to extract key gene features and construct classifiers with superior performance. The results confirmed that genes such as PFN1, RPS26, and FTH1 played important roles in SARS-CoV-2 infection. These findings provide a useful reference for the understanding of the ongoing effect of COVID-19 development on the immune system.
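The mutual-information filter that this abstract applies before feature ranking can be sketched as below. The sketch assumes features that have already been discretized (expression values in scRNA-seq data are continuous, so a binning step would be needed first), and the threshold, function names, and toy genes are hypothetical rather than taken from the study.

```python
import math
from collections import Counter


def mutual_information(xs, ys) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def filter_features(columns, labels, threshold):
    """Keep the names of features whose MI with the labels exceeds the threshold."""
    return [name for name, values in columns.items()
            if mutual_information(values, labels) > threshold]


labels = [0, 0, 1, 1]
columns = {
    "gene_a": [0, 0, 1, 1],  # perfectly informative: MI = ln 2
    "gene_b": [0, 1, 0, 1],  # independent of the label: MI = 0
}
kept = filter_features(columns, labels, threshold=0.01)
print(kept)
```

Only features that pass this filter would then be handed to the ranking algorithms, which keeps the later, more expensive steps from processing clearly uninformative genes.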
Affiliation(s)
- Jing-Xin Ren
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Qian Gao
- Department of Pharmacy, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Xiao-Chao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai 200025, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
- Kai-Yan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
- Lin Lu
- Department of Radiology, Columbia University Medical Center, New York, NY 10032, USA
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
|
30
|
Chen L, Chen K, Zhou B. Inferring drug-disease associations by a deep analysis on drug and disease networks. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:14136-14157. [PMID: 37679129 DOI: 10.3934/mbe.2023632] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 09/09/2023]
Abstract
Drugs, which treat various diseases, are essential for human health. However, developing new drugs is quite laborious, time-consuming, and expensive. Although investments into drug development have greatly increased over the years, the number of drug approvals each year remains quite low. Drug repositioning is deemed an effective means to accelerate the procedures of drug development because it can discover novel effects of existing drugs. Numerous computational methods have been proposed in drug repositioning, some of which were designed as binary classifiers that can predict drug-disease associations (DDAs). Negative sample selection is a common weakness of such methods. In this study, a novel reliable negative sample selection scheme, named RNSS, is presented, which can screen out reliable pairs of drugs and diseases with low probabilities of being actual DDAs. This scheme considers information from the k-neighbors of a drug in a drug network, including their associations to diseases and the drug. Then, a scoring system was set up to evaluate pairs of drugs and diseases. To test the utility of the RNSS, three classic classification algorithms (random forest, Bayes network, and nearest neighbor algorithm) were employed to build classifiers using negative samples selected by the RNSS. The cross-validation results suggested that such classifiers provided a nearly perfect performance and were significantly superior to those using some traditional and previous negative sample selection schemes.
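The general idea of neighbor-based negative sample selection can be sketched as follows. This is a deliberately simplified stand-in and not the RNSS scoring system, whose details are not given in the abstract: here an unlabeled drug-disease pair is scored by the fraction of the drug's k nearest neighbors that are known to be associated with the disease, and pairs at or below a hypothetical cutoff are treated as reliable negatives. All names and the toy network are assumptions.

```python
def neighbor_support(drug, disease, neighbors, known):
    """Fraction of the drug's k nearest neighbors that have a known
    association with the disease."""
    nbrs = neighbors[drug]
    return sum((n, disease) in known for n in nbrs) / len(nbrs)


def reliable_negatives(drugs, diseases, neighbors, known, max_score=0.0):
    """Unlabeled (drug, disease) pairs whose neighbor support is at most max_score."""
    return [(d, s) for d in drugs for s in diseases
            if (d, s) not in known
            and neighbor_support(d, s, neighbors, known) <= max_score]


# Hypothetical toy network with k = 2 neighbors per drug.
drugs = ["d1", "d2", "d3"]
diseases = ["flu", "asthma"]
neighbors = {"d1": ["d2", "d3"], "d2": ["d1", "d3"], "d3": ["d1", "d2"]}
known = {("d1", "flu"), ("d2", "flu")}
negatives = reliable_negatives(drugs, diseases, neighbors, known)
print(negatives)
```

In this toy case the pair (d3, flu) is not selected as a negative, because both neighbors of d3 treat flu and the pair is therefore a plausible undiscovered association.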
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Kaiyu Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Bo Zhou
- Shanghai University of Medicine & Health Sciences, Shanghai 201318, China
| |
|
31
|
Ma QL, Huang FM, Guo W, Feng KY, Huang T, Cai YD. Machine Learning Classification of Time since BNT162b2 COVID-19 Vaccination Based on Array-Measured Antibody Activity. Life (Basel) 2023; 13:1304. [PMID: 37374086 DOI: 10.3390/life13061304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 05/26/2023] [Accepted: 05/29/2023] [Indexed: 06/29/2023] Open
Abstract
Vaccines trigger an immunological response that includes B and T cells, with B cells producing antibodies. SARS-CoV-2 immunity weakens over time after vaccination. Discovering key changes in antigen-reactive antibodies over time after vaccination could help improve vaccine efficiency. In this study, we collected data on blood antibody levels in a cohort of healthcare workers vaccinated for COVID-19 and obtained 73 antigens in samples from four groups according to the duration after vaccination, including 104 unvaccinated healthcare workers, 534 healthcare workers within 60 days after vaccination, 594 healthcare workers between 60 and 180 days after vaccination, and 141 healthcare workers over 180 days after vaccination. Our work was a reanalysis of the data originally collected at Irvine University. These data were obtained in Orange County, California, USA, with the collection process commencing in December 2020. The British variant (B.1.1.7), South African variant (B.1.351), and Brazilian/Japanese variant (P.1) were the most prevalent strains during the sampling period. An efficient machine learning-based framework containing four feature selection methods (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and maximum relevance minimum redundancy) and four classification algorithms (decision tree, k-nearest neighbor, random forest, and support vector machine) was designed to select essential antibodies against specific antigens. Several efficient classifiers with a weighted F1 value around 0.75 were constructed. The antigen microarray used for identifying antibody levels in the coronavirus features ten distinct SARS-CoV-2 antigens, comprising various segments of both nucleocapsid protein (NP) and spike protein (S).
This study revealed that S1 + S2, S1.mFcTag, S1.HisTag, S1, S2, Spike.RBD.His.Bac, Spike.RBD.rFc, and S1.RBD.mFc were most highly ranked among all features, where S1 and S2 are the subunits of Spike, and the suffixes represent the tagging information of different recombinant proteins. Meanwhile, the classification rules were obtained from the optimal decision tree to explain quantitatively the roles of antigens in the classification. This study identified antibodies associated with decreased clinical immunity based on populations with different time spans after vaccination. These antibodies have important implications for maintaining long-term immunity to SARS-CoV-2.
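The weighted F1 value reported above can be illustrated with a small worked example. The following is a minimal sketch, assuming the usual definition (per-class F1 averaged with weights proportional to each class's share of the true labels); the abstract does not spell out the formula, and the labels `"a"` and `"b"` are toy placeholders, not the study's four vaccination groups.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels (hypothetical): three samples of class "a", one of class "b".
example_true = ["a", "a", "a", "b"]
example_pred = ["a", "a", "b", "b"]
example_score = weighted_f1(example_true, example_pred)
```

Because the weights follow class frequency, a large class such as the 594-sample group would dominate the score more than the 104-sample group would.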
Affiliation(s)
- Qing-Lan Ma
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Fei-Ming Huang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
| | - Kai-Yan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| |
|
32
|
Ma Q, Huang F, Guo W, Feng K, Huang T, Cai Y. Identification of Phase-Separation-Protein-Related Function Based on Gene Ontology by Using Machine Learning Methods. Life (Basel) 2023; 13:1306. [PMID: 37374089 DOI: 10.3390/life13061306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 05/06/2023] [Accepted: 05/30/2023] [Indexed: 06/29/2023] Open
Abstract
Phase-separation proteins (PSPs) are a class of proteins that play a role in liquid-liquid phase separation, a mechanism that mediates the formation of membraneless compartments in cells. Identifying phase-separation proteins and their associated functions could provide insights into cellular biology and the development of diseases, such as neurodegenerative diseases and cancer. Here, PSPs and non-PSPs that have been experimentally validated in earlier studies were gathered as positive and negative samples. Each protein's corresponding Gene Ontology (GO) terms were extracted and used to create a 24,907-dimensional binary vector. The purpose was to extract essential GO terms that can describe essential functions of PSPs and, at the same time, to build efficient classifiers that identify PSPs with these GO terms. To this end, the incremental feature selection computational framework and an integrated feature analysis scheme, containing categorical boosting, least absolute shrinkage and selection operator, light gradient-boosting machine, extreme gradient boosting, and permutation feature importance, were used to build efficient classifiers and identify GO terms with classification-related importance. A set of random forest (RF) classifiers with F1 scores over 0.960 was established to distinguish PSPs from non-PSPs. A number of GO terms that are crucial for distinguishing between PSPs and non-PSPs were found, including GO:0003723, which is related to a biological process involving RNA binding; GO:0016020, which is related to membrane formation; and GO:0045202, which is related to the function of synapses. This study offered recommendations for future research aimed at determining the functional roles of PSPs in cellular processes by developing efficient RF classifiers and identifying the representative GO terms related to PSPs.
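The binary GO-term representation described above can be sketched as follows. This is a minimal illustration only, assuming each position of the vector corresponds to one GO term and is set to 1 when the protein carries that annotation; the three-term vocabulary stands in for the 24,907 terms, and `protein_A` and `protein_B` with their annotations are hypothetical.

```python
def encode_go_terms(protein_terms, vocabulary):
    """Return one 0/1 vector per protein; position i is 1 if the
    protein is annotated with vocabulary[i]."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = {}
    for protein, terms in protein_terms.items():
        vec = [0] * len(vocabulary)
        for term in terms:
            if term in index:
                vec[index[term]] = 1
        vectors[protein] = vec
    return vectors


# Tiny stand-in vocabulary built from GO IDs named in the abstract.
vocabulary = ["GO:0003723", "GO:0016020", "GO:0045202"]
# Hypothetical annotations for two made-up proteins.
annotations = {
    "protein_A": {"GO:0003723", "GO:0045202"},
    "protein_B": {"GO:0016020"},
}
vectors = encode_go_terms(annotations, vocabulary)
```

Each resulting vector can then serve as one sample for the feature-ranking and classification steps.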
Affiliation(s)
- Qinglan Ma
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - FeiMing Huang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
| | - KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yudong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| |
|
33
|
Xu Y, Ma Q, Ren J, Chen L, Guo W, Feng K, Zeng Z, Huang T, Cai Y. Using Machine Learning Methods in Identifying Genes Associated with COVID-19 in Cardiomyocytes and Cardiac Vascular Endothelial Cells. Life (Basel) 2023; 13:1011. [PMID: 37109540 PMCID: PMC10146712 DOI: 10.3390/life13041011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 04/02/2023] [Accepted: 04/08/2023] [Indexed: 04/29/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19) not only damages the respiratory system but also imposes strain on the cardiovascular system. Vascular endothelial cells and cardiomyocytes play an important role in cardiac function. The aberrant expression of genes in vascular endothelial cells and cardiomyocytes can lead to cardiovascular diseases. In this study, we sought to explain the influence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection on the gene expression levels of vascular endothelial cells and cardiomyocytes. We designed an advanced machine learning-based workflow to analyze the gene expression profile data of vascular endothelial cells and cardiomyocytes from patients with COVID-19 and healthy controls. An incremental feature selection method with a decision tree was used in building efficient classifiers and summarizing quantitative classification genes and rules. Some key genes that exert important effects on cardiac function, such as MALAT1, MT-CO1, and CD36, were extracted from the gene expression matrix of 104,182 cardiomyocytes, including 12,007 cells from patients with COVID-19 and 92,175 cells from healthy controls, and 22,438 vascular endothelial cells, including 10,812 cells from patients with COVID-19 and 11,626 cells from healthy controls. The findings reported in this study may provide insights into the effect of COVID-19 on cardiac cells and further explain the pathogenesis of COVID-19, and they may facilitate the identification of potential therapeutic targets.
Affiliation(s)
- Yaochen Xu
- Department of Mathematics, School of Sciences, Shanghai University, Shanghai 200444, China
| | - Qinglan Ma
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Jingxin Ren
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
| | - Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
| | - Zhenbing Zeng
- Department of Mathematics, School of Sciences, Shanghai University, Shanghai 200444, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yudong Cai
- Department of Mathematics, School of Sciences, Shanghai University, Shanghai 200444, China
| |
|
34
|
Li H, Ma Q, Ren J, Guo W, Feng K, Li Z, Huang T, Cai YD. Immune responses of different COVID-19 vaccination strategies by analyzing single-cell RNA sequencing data from multiple tissues using machine learning methods. Front Genet 2023; 14:1157305. [PMID: 37007947 PMCID: PMC10065150 DOI: 10.3389/fgene.2023.1157305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 03/07/2023] [Indexed: 03/19/2023] Open
Abstract
Multiple types of COVID-19 vaccines have been shown to be highly effective in preventing SARS-CoV-2 infection and in reducing post-infection symptoms. Almost all of these vaccines induce systemic immune responses, but differences in immune responses induced by different vaccination regimens are evident. This study aimed to reveal the differences in immune gene expression levels of different target cells under different vaccine strategies after SARS-CoV-2 infection in hamsters. A machine learning-based process was designed to analyze single-cell transcriptomic data of different cell types from the blood, lung, and nasal mucosa of hamsters infected with SARS-CoV-2, including B and T cells from the blood and nasal cavity, macrophages from the lung and nasal cavity, and alveolar epithelial and lung endothelial cells. The cohort was divided into five groups: non-vaccinated (control), 2*adenovirus (two doses of adenovirus vaccine), 2*attenuated (two doses of attenuated virus vaccine), 2*mRNA (two doses of mRNA vaccine), and mRNA/attenuated (primed by mRNA vaccine, boosted by attenuated vaccine). All genes were ranked using five feature ranking methods (LASSO, LightGBM, Monte Carlo feature selection, mRMR, and permutation feature importance). Some key genes that contributed to the analysis of immune changes, such as RPS23, DDX5, PFN1 in immune cells, and IRF9 and MX1 in tissue cells, were screened. Afterward, the five ranked feature lists were fed into the incremental feature selection framework, which contained two classification algorithms (decision tree [DT] and random forest [RF]), to construct optimal classifiers and generate quantitative rules. Results showed that RF classifiers provided relatively higher performance than DT classifiers, whereas the DT classifiers provided quantitative rules that indicated specific gene expression levels under different vaccine strategies.
These findings may help us to develop better protective vaccination programs and new vaccines.
Affiliation(s)
- Hao Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, China
| | - Qinglan Ma
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Jingxin Ren
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Institutes for Biological Sciences (SIBS), Shanghai Jiao Tong University School of Medicine (SJTUSM), Chinese Academy of Sciences (CAS), Shanghai, China
| | - Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
| | - Zhandong Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
|
35
|
Li J, Ren J, Liao H, Guo W, Feng K, Huang T, Cai YD. Identification of dynamic gene expression profiles during sequential vaccination with ChAdOx1/BNT162b2 using machine learning methods. Front Microbiol 2023; 14:1138674. [PMID: 37007526 PMCID: PMC10063797 DOI: 10.3389/fmicb.2023.1138674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 03/01/2023] [Indexed: 03/19/2023] Open
Abstract
To date, COVID-19 remains a serious global public health problem. Vaccination against SARS-CoV-2 has been adopted by many countries as an effective coping strategy. The strength of the body's immune response in the face of viral infection correlates with the number of vaccinations and the duration of vaccination. In this study, we aimed to identify specific genes that may trigger and control the immune response to COVID-19 under different vaccination scenarios. A machine learning-based approach was designed to analyze the blood transcriptomes of 161 individuals who were classified into six groups according to the dose and timing of inoculations, including I-D0, I-D2-4, I-D7 (day 0, days 2–4, and day 7 after the first dose of ChAdOx1, respectively) and II-D0, II-D1-4, II-D7-10 (day 0, days 1–4, and days 7–10 after the second dose of BNT162b2, respectively). Each sample was represented by the expression levels of 26,364 genes. The first dose was ChAdOx1, whereas the second dose was mainly BNT162b2 (only four individuals received a second dose of ChAdOx1). The groups were treated as labels and the genes as features. Several machine learning algorithms were employed to analyze this classification problem. In detail, five feature ranking algorithms (Lasso, LightGBM, MCFS, mRMR, and PFI) were first applied to evaluate the importance of each gene feature, resulting in five feature lists. Then, the lists were fed into the incremental feature selection method with four classification algorithms to extract essential genes and classification rules and to build optimal classifiers. The essential genes, namely, NRF2, RPRD1B, NEU3, SMC5, and TPX2, have been previously associated with immune response. This study also summarized expression rules that describe different vaccination scenarios to help determine the molecular mechanism of vaccine-induced antiviral immunity.
Affiliation(s)
- Jing Li
- School of Computer Science, Baicheng Normal University, Baicheng, Jilin, China
| | - JingXin Ren
- School of Life Sciences, Shanghai University, Shanghai, China
| | | | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
| | - KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
| | - Tao Huang
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai
| |
|
36
|
Ren J, Zhang Y, Guo W, Feng K, Yuan Y, Huang T, Cai YD. Identification of Genes Associated with the Impairment of Olfactory and Gustatory Functions in COVID-19 via Machine-Learning Methods. Life (Basel) 2023; 13:798. [PMID: 36983953 PMCID: PMC10051382 DOI: 10.3390/life13030798] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 01/11/2023] [Revised: 03/10/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19), a severe respiratory disease, affects many parts of the body, and approximately 20-85% of patients exhibit functional impairment of the senses of smell and taste, some of whom even experience the permanent loss of these senses. These symptoms are not life-threatening but severely affect patients' quality of life and increase the risk of depression and anxiety. The pathological mechanisms of these symptoms have not been fully identified. In the current study, we aimed to identify the important biomarkers at the expression level associated with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection-mediated loss of taste or olfactory ability, and we have suggested the potential pathogenetic mechanisms of COVID-19 complications. We designed a machine-learning-based approach to analyze the transcriptome of 577 COVID-19 patient samples, including 84 COVID-19 samples with a decreased ability to taste or smell and 493 COVID-19 samples without impairment. Each sample was represented by 58,929 gene expression levels. The features were analyzed and sorted by three feature selection methods (least absolute shrinkage and selection operator, light gradient boosting machine, and Monte Carlo feature selection). The optimal feature sets were obtained through incremental feature selection using two classification algorithms: decision tree (DT) and random forest (RF). The top genes identified by these multiple methods (H3-5, NUDT5, and AOC1) are involved in olfactory and gustatory impairments. Meanwhile, a high-performance RF classifier was developed in this study, and three sets of quantitative rules that describe the impairment of olfactory and gustatory functions were obtained based on the optimal DT classifiers. In summary, this study provides a new computational analysis and suggests latent biomarkers (genes and rules) for predicting olfactory and gustatory impairment caused by COVID-19 complications.
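The incremental feature selection step mentioned above can be sketched in a few lines. This is an illustrative sketch, not the authors' pipeline: it assumes a feature ranking is already available, grows the feature set one ranked feature at a time, and keeps the prefix with the best score. A leave-one-out nearest-centroid classifier is swapped in for the DT and RF classifiers named in the abstract so that the example needs only the standard library; the data, labels, and ranking are made up.

```python
def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared distance from held-out sample to a class centroid.
            return sum(
                (X[i][f] - sums[label][k] / counts[label]) ** 2
                for k, f in enumerate(features)
            )

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def incremental_feature_selection(X, y, ranked, evaluate=nearest_centroid_accuracy):
    """Score the top-k ranked features for k = 1..n; return the first
    prefix that reaches the best score, together with that score."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked) + 1):
        score = evaluate(X, y, ranked[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked[:best_k], best_score


# Toy data (hypothetical): feature 0 separates the classes, 1 and 2 are noise.
X = [
    [0.0, 3.0, -2.0],
    [0.1, -4.0, 5.0],
    [0.2, 1.0, 0.5],
    [1.0, 2.5, -1.0],
    [1.1, -3.0, 4.0],
    [1.2, 0.0, 1.5],
]
y = ["A", "A", "A", "B", "B", "B"]
ranked = [0, 2, 1]  # assumed output of some feature-ranking method
selected, score = incremental_feature_selection(X, y, ranked)
```

With tens of thousands of ranked genes, evaluating every prefix is expensive, so in practice the prefix sizes are often sampled at a coarser step; that choice is not stated in the abstract.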
Affiliation(s)
- Jingxin Ren
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Yuhang Zhang
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China
| | - Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
| | - Ye Yuan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| |
|
37
|
Xu Y, Huang F, Guo W, Feng K, Zhu L, Zeng Z, Huang T, Cai YD. Characterization of chromatin accessibility patterns in different mouse cell types using machine learning methods at single-cell resolution. Front Genet 2023; 14:1145647. [PMID: 36936430 PMCID: PMC10014730 DOI: 10.3389/fgene.2023.1145647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/06/2023] Open
Abstract
Chromatin accessibility is a generic property of the eukaryotic genome, which refers to the degree of physical compaction of chromatin. Recent studies have shown that chromatin accessibility is cell type dependent, indicating chromatin heterogeneity across cell lines and tissues. The identification of markers used to distinguish cell types at the chromosome level is important to understand cell function and classify cell types. In the present study, we investigated transcriptionally active chromosome segments identified by sci-ATAC-seq at single-cell resolution, including 69,015 cells belonging to 77 different cell types. Each cell was represented by existence status on 20,783 genes that were obtained from 436,206 active chromosome segments. The gene features were deeply analyzed by Boruta, resulting in 3897 genes, which were ranked in a list by Monte Carlo feature selection. This list was further analyzed by the incremental feature selection (IFS) method, yielding essential genes, classification rules, and an efficient random forest (RF) classifier. To improve the performance of the optimal RF classifier, its features were further processed by an autoencoder, light gradient boosting machine, and the IFS method. The final RF classifier, with an MCC of 0.838, was constructed. Some marker genes, such as H2-Dmb2, which is specifically expressed in antigen-presenting cells (e.g., dendritic cells or macrophages), and Tenm2, which is specifically expressed in T cells, were identified in this study. Our analysis revealed numerous potential epigenetic modification patterns that are unique to particular cell types, thereby advancing knowledge of the critical functions of chromatin accessibility in cell processes.
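For readers unfamiliar with the MCC (Matthews correlation coefficient) quoted above, the following sketch computes a multiclass generalization from true and predicted labels. The abstract does not say which formulation was used; this is the commonly used multiclass form, which reduces to the familiar binary MCC when there are two classes. The labels in the example are toy values, not the study's 77 cell types.

```python
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    c   = number of correct predictions
    s   = number of samples
    t_k = how often class k truly occurs
    p_k = how often class k is predicted
    """
    labels = sorted(set(y_true) | set(y_pred))
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_k = [sum(1 for t in y_true if t == lab) for lab in labels]
    p_k = [sum(1 for p in y_pred if p == lab) for lab in labels]
    numerator = c * s - sum(t * p for t, p in zip(t_k, p_k))
    denominator = sqrt(
        (s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k))
    )
    # Convention: return 0.0 when the denominator vanishes.
    return numerator / denominator if denominator else 0.0


# Toy binary case: one positive sample is missed.
binary_mcc = multiclass_mcc([1, 1, 0, 0], [1, 0, 0, 0])
# Toy three-class case with perfect predictions.
perfect_mcc = multiclass_mcc(list("aabbcc"), list("aabbcc"))
```

Unlike plain accuracy, this score accounts for how predictions are distributed across all classes, which matters when cell types differ greatly in size.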
Affiliation(s)
- Yaochen Xu
- Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
| | - FeiMing Huang
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
| | - KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
| | - Lin Zhu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Zhenbing Zeng
- Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
- *Correspondence: Zhenbing Zeng; Tao Huang; Yu-Dong Cai
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- *Correspondence: Zhenbing Zeng; Tao Huang; Yu-Dong Cai
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Zhenbing Zeng; Tao Huang; Yu-Dong Cai
| |
|