1
Oliveira MF, de Albuquerque Neto MC, Leite TS, Alves PAA, Lima SVC, Silva RO. Performance evaluate of different chemometrics formalisms used for prostate cancer diagnosis by NMR-based metabolomics. Metabolomics 2023; 20:8. [PMID: 38127222 DOI: 10.1007/s11306-023-02067-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 11/16/2023] [Indexed: 12/23/2023]
Abstract
INTRODUCTION In general, two characteristics are always present in NMR-based metabolomics studies: (1) they are assays that aim to classify samples into different groups, and (2) the number of samples is smaller than the number of features (chemical shifts). Imbalanced datasets are also common, owing to the sampling method and/or inclusion criteria. These situations can cause overfitting. However, appropriate feature selection and classification methods can help to mitigate this issue. OBJECTIVES To investigate the performance of metabolomics models built by combining feature selectors (or no feature selection) with classification algorithms, and to use the best-performing model as an NMR-based metabolomic method for prostate cancer diagnosis. METHODS We evaluated the performance of NMR-based metabolomics models for prostate cancer diagnosis using seven feature selectors and five classification formalisms. We also obtained metabolomics models without feature selection. In this study, thirty-eight volunteers with a positive diagnosis of prostate cancer and twenty-three healthy volunteers were enrolled. RESULTS The thirty-eight models obtained were evaluated using AUROC, accuracy, sensitivity, specificity, and kappa index values. The best result was obtained when a Genetic Algorithm was combined with Linear Discriminant Analysis, giving 0.92 sensitivity, 0.83 specificity, and 0.88 accuracy. CONCLUSION The results show that choosing a proper feature selection method, classification model, and resampling method can avoid overfitting in a small metabolomic dataset. Furthermore, this approach could decrease the number of biopsies and optimize patient follow-up. 1H NMR-based metabolomics promises to be a non-invasive tool in prostate cancer diagnosis.
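
The evaluation metrics named in this abstract can all be derived from a 2x2 confusion matrix. The sketch below is illustrative only: the function name `diagnostic_metrics` and the counts are hypothetical, chosen so that the class totals match the cohort sizes (38 patients, 23 controls). They are not taken from the paper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)   # fraction of true cases that were detected
    specificity = tn / (tn + fp)   # fraction of controls that were correctly rejected
    accuracy = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, computed from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts (assumption): 38 cancer cases and 23 healthy controls.
metrics = diagnostic_metrics(tp=35, fn=3, tn=19, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa corrects the observed accuracy for chance agreement, which is why it is often reported alongside accuracy for imbalanced cohorts such as this one.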
Affiliation(s)
- Márcio Felipe Oliveira
  - Metabonomics and Chemometrics Laboratory, Fundamental Chemistry Department, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes, s/n, Cidade Universitária, Recife, Pernambuco, Brazil
  - Fundamental Chemistry Department, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes, s/n, Cidade Universitária, Recife, Pernambuco, Brazil
- Moacir Cavalcante de Albuquerque Neto
  - Surgery Department, Clinics Hospital, Urology Clinic, Universidade Federal de Pernambuco, Av. Professor Luis Freire, s/n. Cidade Universitária, Recife, Pernambuco, Brazil
- Thiago Siqueira Leite
  - Surgery Department, Clinics Hospital, Urology Clinic, Universidade Federal de Pernambuco, Av. Professor Luis Freire, s/n. Cidade Universitária, Recife, Pernambuco, Brazil
- Paulo André Araújo Alves
  - Surgery Department, Clinics Hospital, Urology Clinic, Universidade Federal de Pernambuco, Av. Professor Luis Freire, s/n. Cidade Universitária, Recife, Pernambuco, Brazil
- Salvador Vilar Correia Lima
  - Surgery Department, Clinics Hospital, Urology Clinic, Universidade Federal de Pernambuco, Av. Professor Luis Freire, s/n. Cidade Universitária, Recife, Pernambuco, Brazil
- Ricardo Oliveira Silva
  - Metabonomics and Chemometrics Laboratory, Fundamental Chemistry Department, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes, s/n, Cidade Universitária, Recife, Pernambuco, Brazil
2
Chu ZC, Cong T, Zhao JY, Zhang J, Lou ZY, Gao Y, Tang X. The identification of hub-methylated differentially expressed genes in osteoarthritis patients is based on epigenomic and transcriptomic data. Front Med (Lausanne) 2023; 10:1219830. [PMID: 37465641 PMCID: PMC10351907 DOI: 10.3389/fmed.2023.1219830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 06/16/2023] [Indexed: 07/20/2023] Open
Abstract
Introduction Osteoarthritis (OA) is a common degenerative joint disorder and a major global public health burden. According to the existing literature, osteoarthritis is related to epigenetic changes, which are important for diagnosing and treating the disease early. Through early targeted treatment, the costly treatments and poor prognosis associated with advanced osteoarthritis can be avoided. Methods This study combined gene differential expression analysis and weighted gene co-expression network analysis (WGCNA) of the transcriptome with epigenome microarray data to discover the hub genes of OA. We obtained 2 microarray datasets (GSE114007, GSE73626) from the Gene Expression Omnibus (GEO). R software was used to identify differentially expressed genes (DEGs) and differentially methylated genes (DMGs). Using WGCNA to analyze the relationships between modules and phenotypes, we found that the blue module (MEblue) has the strongest phenotypic connection with OA (cor = 0.92, p = 4e-16). The hub genes for OA, also referred to as the hub methylated differentially expressed genes, were identified by matching the MEblue module to the differentially methylated differentially expressed genes. Furthermore, this study used gene set variation analysis (GSVA) to identify specific signaling pathways associated with the hub genes. qRT-PCR and western blotting assays were used to confirm the expression levels of the hub genes in OA patients and healthy controls. Results Three hub genes were discovered: HTRA1, P2RY6, and RCAN1. GSVA showed that high HTRA1 expression was mainly enriched in epithelial-mesenchymal transition and apical junction; high expression of P2RY6 was mainly enriched in the peroxisome, coagulation, and epithelial-mesenchymal transition; and high expression of RCAN1 was mainly enriched in epithelial-mesenchymal transition, TGF-β signaling, and glycolysis. The results of the qRT-PCR and western blotting assays were consistent with these findings.
Discussion The three genes tested may cause articular cartilage degeneration by inducing chondrocyte hypertrophy, regulating extracellular matrix accumulation, and enhancing the macrophage pro-inflammatory response, resulting in the onset and progression of osteoarthritis. They can provide new ideas for targeted treatment of osteoarthritis.
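
The module-phenotype step described in the Methods amounts to correlating each module eigengene with a trait vector and picking the module with the strongest correlation. The sketch below shows that idea with a hand-written Pearson correlation. The eigengene values, the second module name `MEbrown`, and the sample size are hypothetical and serve only to make the example runnable; they are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical trait vector: 0 = healthy control, 1 = OA.
trait = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical module eigengenes (one value per sample).
eigengenes = {
    "MEblue": [-0.40, -0.35, -0.30, -0.38, 0.33, 0.41, 0.36, 0.33],
    "MEbrown": [0.10, -0.20, 0.30, -0.10, 0.05, -0.15, 0.20, -0.20],
}

correlations = {name: pearson(values, trait) for name, values in eigengenes.items()}
# The module most strongly associated with the phenotype, regardless of sign.
best_module = max(correlations, key=lambda name: abs(correlations[name]))
print(best_module, round(correlations[best_module], 2))
```

In practice this calculation is done by the WGCNA R package; the Python version is given only to make the logic explicit.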
Affiliation(s)
- Zhen-Chen Chu
  - Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
  - Dalian Medical University, Dalian, Liaoning, China
- Ting Cong
  - Dalian Medical University, Dalian, Liaoning, China
  - Department of Anesthesiology, Second Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Jian-Yu Zhao
  - Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Jian Zhang
  - Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
  - Dalian Medical University, Dalian, Liaoning, China
- Zhi-Yuan Lou
  - Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Yang Gao
  - Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Xin Tang
  - Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
3
Ma S, Peng P, Duan Z, Fan Y, Li X. Predicting the Progress of Tuberculosis by Inflammatory Response-Related Genes Based on Multiple Machine Learning Comprehensive Analysis. J Immunol Res 2023; 2023:7829286. [PMID: 37228444 PMCID: PMC10205410 DOI: 10.1155/2023/7829286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 03/04/2023] [Accepted: 04/20/2023] [Indexed: 05/27/2023] Open
Abstract
Background Tuberculosis (TB), caused by the bacterium Mycobacterium tuberculosis, affects approximately one-quarter of the global population and is considered one of the most lethal infectious diseases worldwide. Preventing latent tuberculosis infection (LTBI) from progressing into active tuberculosis (ATB) is crucial for controlling and eradicating TB. Unfortunately, currently available biomarkers have limited effectiveness in identifying subpopulations that are at risk of developing ATB. Hence, it is imperative to develop advanced molecular tools for TB risk stratification. Methods The TB datasets were downloaded from the GEO database. Three machine learning models, namely LASSO, RF, and SVM-RFE, were used to identify the key characteristic genes related to inflammation during the progression of LTBI to ATB. The expression and diagnostic accuracy of these characteristic genes were subsequently verified. These genes were then used to develop diagnostic nomograms. In addition, single-cell expression clustering analysis, immune cell expression clustering analysis, GSVA, immune cell correlation, and immune checkpoint correlation of the characteristic genes were conducted. Furthermore, the upstream shared miRNA was predicted, and a miRNA-gene network was constructed. Candidate drugs were also analyzed and predicted. Results In comparison to LTBI, a total of 96 upregulated and 26 downregulated genes related to the inflammatory response were identified in ATB. The characteristic genes demonstrated excellent diagnostic performance and significant correlation with many immune cells and immune checkpoints. The results of the miRNA-gene network analysis suggested a potential role of hsa-miR-3163 in the molecular mechanism of LTBI progressing into ATB. Moreover, retinoic acid may offer a potential avenue for the prevention of LTBI progression to ATB and for the treatment of ATB.
Conclusion Our research has identified key inflammatory response-related genes that are characteristic of LTBI progression to ATB, and hsa-miR-3163 as a significant node in the molecular mechanism of this progression. Our analyses have demonstrated the excellent diagnostic performance of these characteristic genes and their significant correlation with many immune cells and immune checkpoints. The CD274 immune checkpoint presents a promising target for the prevention and treatment of ATB. Furthermore, our findings suggest that retinoic acid may have a role in preventing LTBI from progressing to ATB and in treating ATB. This study provides a new perspective for the differential diagnosis of LTBI and ATB and may uncover potential inflammatory immune mechanisms, biomarkers, therapeutic targets, and effective drugs in the progression of LTBI into ATB.
Affiliation(s)
- Shuai Ma
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443000, China
  - College of Basic Medical Science, China Three Gorges University, Yichang 443000, China
- Peifei Peng
  - Department of Geriatrics, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhihao Duan
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443000, China
  - College of Basic Medical Science, China Three Gorges University, Yichang 443000, China
- Yifeng Fan
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443000, China
  - College of Basic Medical Science, China Three Gorges University, Yichang 443000, China
- Xinzhi Li
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443000, China
  - College of Basic Medical Science, China Three Gorges University, Yichang 443000, China
4
Li J, Ren J, Liao H, Guo W, Feng K, Huang T, Cai YD. Identification of dynamic gene expression profiles during sequential vaccination with ChAdOx1/BNT162b2 using machine learning methods. Front Microbiol 2023; 14:1138674. [PMID: 37007526 PMCID: PMC10063797 DOI: 10.3389/fmicb.2023.1138674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 03/01/2023] [Indexed: 03/19/2023] Open
Abstract
To date, COVID-19 remains a serious global public health problem. Vaccination against SARS-CoV-2 has been adopted by many countries as an effective coping strategy. The strength of the body’s immune response in the face of viral infection correlates with the number of vaccinations and the duration of vaccination. In this study, we aimed to identify specific genes that may trigger and control the immune response to COVID-19 under different vaccination scenarios. A machine learning-based approach was designed to analyze the blood transcriptomes of 161 individuals who were classified into six groups according to the dose and timing of inoculations, including I-D0, I-D2-4, I-D7 (day 0, days 2–4, and day 7 after the first dose of ChAdOx1, respectively) and II-D0, II-D1-4, II-D7-10 (day 0, days 1–4, and days 7–10 after the second dose of BNT162b2, respectively). Each sample was represented by the expression levels of 26,364 genes. The first dose was ChAdOx1, whereas the second dose was mainly BNT162b2 (only four individuals received a second dose of ChAdOx1). The groups were treated as labels and the genes as features. Several machine learning algorithms were employed to analyze this classification problem. In detail, five feature ranking algorithms (Lasso, LightGBM, MCFS, mRMR, and PFI) were first applied to evaluate the importance of each gene feature, resulting in five feature lists. Then, the lists were fed into the incremental feature selection method with four classification algorithms to extract essential genes and classification rules and to build optimal classifiers. The essential genes, namely, NRF2, RPRD1B, NEU3, SMC5, and TPX2, have been previously associated with the immune response. This study also summarized expression rules that describe different vaccination scenarios to help determine the molecular mechanism of vaccine-induced antiviral immunity.
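
Incremental feature selection, as generally described, takes a ranked feature list, trains a classifier on the top 1, top 2, ..., top k features, and keeps the prefix with the best score. The sketch below illustrates this loop under loud assumptions: the nearest-centroid classifier with leave-one-out evaluation is a stand-in (the abstract does not name the four classifiers), and the gene names `geneA`, `geneB`, `geneC` and all expression values are hypothetical.

```python
def nearest_centroid_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `features`."""
    correct = 0
    for i, (sample, label) in enumerate(zip(samples, labels)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (other, other_label) in enumerate(zip(samples, labels)):
            if j == i:
                continue
            vec = sums.setdefault(other_label, [0.0] * len(features))
            for k, feature in enumerate(features):
                vec[k] += other[feature]
            counts[other_label] = counts.get(other_label, 0) + 1

        def distance(cls):
            # Squared Euclidean distance from the held-out sample to a class centroid.
            return sum(
                (sample[feature] - sums[cls][k] / counts[cls]) ** 2
                for k, feature in enumerate(features)
            )

        predicted = min(sums, key=distance)
        correct += predicted == label
    return correct / len(samples)


def incremental_feature_selection(samples, labels, ranked_features):
    """Evaluate the top-k prefixes of a ranked feature list and return the best prefix."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        score = nearest_centroid_accuracy(samples, labels, ranked_features[:k])
        if score > best_score:  # ties keep the smaller feature set
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Hypothetical expression values for two of the vaccination groups.
samples = [
    {"geneA": 1.0, "geneB": 0.3, "geneC": 2.0},
    {"geneA": 1.2, "geneB": 0.5, "geneC": 0.1},
    {"geneA": 0.9, "geneB": 0.4, "geneC": 1.5},
    {"geneA": 5.0, "geneB": 0.6, "geneC": 1.9},
    {"geneA": 5.1, "geneB": 0.2, "geneC": 0.2},
    {"geneA": 4.8, "geneB": 0.5, "geneC": 1.4},
]
labels = ["I-D0", "I-D0", "I-D0", "II-D0", "II-D0", "II-D0"]
ranked = ["geneA", "geneB", "geneC"]  # output of a hypothetical feature-ranking step

selected, score = incremental_feature_selection(samples, labels, ranked)
print(selected, score)
```

Keeping the smallest prefix on ties favors compact gene sets, which is one reasonable design choice; the study may have used a different tie-breaking rule.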
Affiliation(s)
- Jing Li
  - School of Computer Science, Baicheng Normal University, Baicheng, Jilin, China
- JingXin Ren
  - School of Life Sciences, Shanghai University, Shanghai, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - *Correspondence: Tao Huang; Yu-Dong Cai
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
  - *Correspondence: Tao Huang; Yu-Dong Cai
5
Jia J, Lei R, Qin L, Wu G, Wei X. iEnhancer-DCSV: Predicting enhancers and their strength based on DenseNet and improved convolutional block attention module. Front Genet 2023; 14:1132018. [PMID: 36936423 PMCID: PMC10014624 DOI: 10.3389/fgene.2023.1132018] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/26/2022] [Accepted: 02/13/2023] [Indexed: 03/06/2023] Open
Abstract
Enhancers play a crucial role in controlling gene transcription and expression. Therefore, bioinformatics places great emphasis on predicting enhancers and their strength. It is vital to create quick and accurate computational techniques because conventional biomedical tests take too long and are too expensive. This paper proposed a new predictor called iEnhancer-DCSV built on a modified densely connected convolutional network (DenseNet) and an improved convolutional block attention module (CBAM). Sequences were encoded using one-hot and nucleotide chemical property (NCP) encodings. DenseNet was used to extract advanced features from the raw encoding. The channel attention and spatial attention modules were used to evaluate the significance of the advanced features, which were then fed into a fully connected neural network to yield the prediction probabilities. Finally, ensemble learning was applied to the final classification results via voting. According to the experimental results on the test set, the first layer of enhancer recognition achieved an accuracy of 78.95%, and the Matthews correlation coefficient value was 0.5809. The second layer of enhancer strength prediction achieved an accuracy of 80.70%, and the Matthews correlation coefficient value was 0.6609. The iEnhancer-DCSV method can be found at https://github.com/leirufeng/iEnhancer-DCSV. It is easy to obtain the desired results without using the complex mathematical formulas involved.
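
The two encodings mentioned above can be combined into a 7-dimensional vector per nucleotide. The sketch below is a minimal illustration, not the authors' code: the one-hot column order (A, C, G, T) is an assumption, and the NCP triples follow a commonly used convention (ring structure, functional group, hydrogen bonding) that the paper may or may not use in exactly this form. The function name `encode` is hypothetical.

```python
# One-hot encoding; the column order A, C, G, T is an assumption.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Nucleotide chemical property (NCP) encoding in a commonly used convention:
# (purine ring, amino group, weak hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode(sequence):
    """Return one 7-dimensional tuple (one-hot + NCP) per nucleotide."""
    return [ONE_HOT[base] + NCP[base] for base in sequence.upper()]


encoded = encode("ACGT")
for base, vector in zip("ACGT", encoded):
    print(base, vector)
```

A sequence of length L therefore becomes an L x 7 matrix, which is the kind of raw input a convolutional network such as DenseNet can consume.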
Affiliation(s)
- Jianhua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
  - *Correspondence: Jianhua Jia; Rufeng Lei
- Rufeng Lei
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
  - *Correspondence: Jianhua Jia; Rufeng Lei
- Lulu Qin
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- Genqiang Wu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- Xin Wei
  - Business School, Jiangxi Institute of Fashion Technology, Nanchang, China
6
Wu C, Chen L. A model with deep analysis on a large drug network for drug classification. Mathematical Biosciences and Engineering: MBE 2023; 20:383-401. [PMID: 36650771 DOI: 10.3934/mbe.2023018] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Indexed: 06/17/2023]
Abstract
Drugs are an important means to treat various diseases. They are classified into several classes to indicate their properties and effects. Those in the same class always share some important features. The Kyoto Encyclopedia of Genes and Genomes (KEGG) DRUG recently reported a new drug classification system that classifies drugs into 14 classes. Correct identification of the class for any possible drug-like compound is helpful to roughly determine its effects for a particular type of disease. Experiments could then be conducted to confirm such latent effects, thus accelerating the discovery of novel drugs. In this study, this classification system was investigated. For the first time, a classification model was proposed to assign one of the classes in the system to any given drug. Unlike traditional fingerprint features, which indicate essential drug properties alone and are very popular in investigating drug-related problems, drugs were represented by novel features derived from a large drug network via a well-known network embedding algorithm called Node2vec. These features abstracted the drug associations generated from their essential properties, and they gave an overview of each drug against the background of all drugs. As class sizes differed greatly, the synthetic minority over-sampling technique (SMOTE) was employed to tackle the imbalance problem. The balanced dataset was fed into a support vector machine to build the model. The 10-fold cross-validation results suggested excellent performance of the model. This model was also superior to models using other drug features, including those generated by another network embedding algorithm and fingerprint features. Furthermore, this model provided more balanced performance across all classes than the model without SMOTE.
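
The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified, pure-Python illustration of that idea, not the implementation used in the study. The function name `smote`, its parameters, and the two-dimensional feature vectors are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic samples by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (point for point in minority if point is not base),
            key=lambda point: sum((a - b) ** 2 for a, b in zip(point, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical embedding vectors for a small drug class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
synthetic = smote(minority, n_synthetic=4)
for point in synthetic:
    print(point)
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays within the region the original samples occupy.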
Affiliation(s)
- Chenhao Wu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
7
Zhao L, Gao J, Chen G, Huang C, Kong W, Feng Y, Zhen G. Mitochondria dysfunction in airway epithelial cells is associated with type 2-low asthma. Front Genet 2023; 14:1186317. [PMID: 37152983 PMCID: PMC10160377 DOI: 10.3389/fgene.2023.1186317] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/14/2023] [Accepted: 04/10/2023] [Indexed: 05/09/2023] Open
Abstract
Background: Type 2 (T2)-low asthma can be severe and corticosteroid-resistant. Airway epithelial cells play a pivotal role in the development of asthma, and mitochondria dysfunction is involved in the pathogenesis of asthma. However, the role of epithelial mitochondria dysfunction in T2-low asthma remains unknown. Methods: Differentially expressed genes (DEGs) were identified using the Gene Expression Omnibus (GEO) dataset GSE4302, which originated from airway epithelial brushings from T2-high (n = 22) and T2-low asthma patients (n = 20). Gene set enrichment analysis (GSEA) was used to analyze the biological pathways that differ between T2-low and T2-high asthma. T2-low asthma related genes were identified using weighted gene co-expression network analysis (WGCNA). The mitochondria-related genes (Mito-RGs) were taken from the Molecular Signatures Database (MSigDB). T2-low asthma related mitochondria (T2-low-Mito) DEGs were obtained by intersecting the DEGs, T2-low asthma related genes, and Mito-RGs. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to further explore the potential function of the T2-low-Mito DEGs. In addition, the hub genes were identified by protein-protein interaction (PPI) analysis, and the expression of the hub genes was verified in another GEO dataset, GSE67472, and in bronchial brushings from patients recruited at Tongji Hospital. Results: Six hundred and ninety-two DEGs, including 107 downregulated genes and 585 upregulated genes, were identified in airway epithelial brushings from T2-high and T2-low asthma patients included in the GSE4302 dataset. GSEA showed that mitochondrial ATP synthesis coupled electron transport is involved in T2-low asthma. Nine hundred and four T2-low asthma related genes were identified using WGCNA. Twenty-two T2-low-Mito DEGs were obtained by intersecting the DEGs, T2-low asthma related genes, and Mito-RGs.
The GO enrichment analysis of the T2-low-Mito DEGs showed significant enrichment of mitochondrial respiratory chain complex assembly and the respiratory electron transport chain. A PPI network was constructed using the 22 T2-low-Mito DEGs, and five hub genes, ATP5G1, UQCR10, NDUFA3, TIMM10, and NDUFAB1, were identified. Moreover, the expression of these hub genes was validated in another GEO dataset and in our cohort of asthma patients. Conclusion: This study suggests that mitochondria dysfunction contributes to T2-low asthma.
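
The intersection step that yields the T2-low-Mito DEGs is a three-way set intersection. In the sketch below, the three hub genes shown are taken from the abstract (hub genes are, by construction, members of all three lists), while the `geneX...` entries are hypothetical placeholders added so that each list contains genes the others lack.

```python
# Hypothetical, heavily shortened gene lists; only the three named hub genes come from the abstract.
degs = {"ATP5G1", "UQCR10", "NDUFA3", "geneX1", "geneX2"}
wgcna_genes = {"ATP5G1", "UQCR10", "NDUFA3", "geneX2", "geneX3"}
mito_genes = {"ATP5G1", "UQCR10", "NDUFA3", "geneX3", "geneX4"}

# Genes that are differentially expressed, T2-low related, and mitochondria related.
t2_low_mito_degs = degs & wgcna_genes & mito_genes
print(sorted(t2_low_mito_degs))
```

Genes shared by only two of the lists, such as the placeholder `geneX2`, are excluded, which is what makes the resulting candidate set small (22 genes in the study).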
Affiliation(s)
- Lu Zhao
  - Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Key Laboratory of Respiratory Diseases, National Health Commission of People’s Republic of China, Wuhan, China
- Jiali Gao
  - Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Key Laboratory of Respiratory Diseases, National Health Commission of People’s Republic of China, Wuhan, China
- Gongqi Chen
  - Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Key Laboratory of Respiratory Diseases, National Health Commission of People’s Republic of China, Wuhan, China
- Chunli Huang
  - Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Key Laboratory of Respiratory Diseases, National Health Commission of People’s Republic of China, Wuhan, China
- Weiqiang Kong
  - Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Key Laboratory of Respiratory Diseases, National Health Commission of People’s Republic of China, Wuhan, China
- Yuchen Feng
  - Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Key Laboratory of Respiratory Diseases, National Health Commission of People’s Republic of China, Wuhan, China
  - *Correspondence: Yuchen Feng; Guohua Zhen
- Guohua Zhen
  - Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Key Laboratory of Respiratory Diseases, National Health Commission of People’s Republic of China, Wuhan, China
  - *Correspondence: Yuchen Feng; Guohua Zhen
8
Li S, Liu Y, Liu M, Wang L, Li X. Comprehensive bioinformatics analysis reveals biomarkers of DNA methylation-related genes in varicose veins. Front Genet 2022; 13:1013803. [PMID: 36506327 PMCID: PMC9732536 DOI: 10.3389/fgene.2022.1013803] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/07/2022] [Accepted: 11/09/2022] [Indexed: 11/27/2022] Open
Abstract
Background: Varicose veins (VV) are a common clinical condition, and patients show no obvious symptoms in the early stages. DNA methylation plays a key role in VV by regulating gene expression. However, the molecular mechanism underlying methylation regulation in VV remains unclear. Methods: The mRNA and methylation data of VV and normal samples were obtained from the Gene Expression Omnibus (GEO) database. Methylation-regulated genes (MRGs) between VV and normal samples were intersected with VV-associated genes (VVGs) obtained by weighted gene co-expression network analysis (WGCNA) to obtain VV-associated MRGs (VV-MRGs). Their ability to predict disease was assessed using receiver operating characteristic (ROC) curves. Biomarkers were then screened using a random forest model (RF), support vector machine model (SVM), and generalized linear model (GLM). Next, gene set enrichment analysis (GSEA) was performed to explore the functions of the biomarkers. Furthermore, we predicted their drug targets and constructed a competing endogenous RNA (ceRNA) network and a drug target network. Finally, we verified their mRNA expression using quantitative real-time polymerase chain reaction (qRT-PCR). Results: A total of three VV-MRGs, namely Wnt1-inducible signaling pathway protein 2 (WISP2), cysteine-rich intestinal protein 1 (CRIP1), and odd-skipped related 1 (OSR1), were identified from the overlap of VVGs and MRGs. The areas under the ROC curves (AUCs) for these three VV-MRGs were greater than 0.8. RF was confirmed as the optimal diagnostic model, and WISP2, CRIP1, and OSR1 were regarded as biomarkers. GSEA showed that WISP2, CRIP1, and OSR1 were associated with oxidative phosphorylation, extracellular matrix (ECM), and respiratory system functions. Furthermore, we found that lncRNA MIR17HG can regulate OSR1 by binding to hsa-miR-21-5p and that PAX2 might treat VV by targeting OSR1.
Finally, qRT-PCR results showed that the mRNA expression of the three genes was consistent with the results of the datasets. Conclusion: This study identified WISP2, CRIP1, and OSR1 as biomarkers of VV through comprehensive bioinformatics analysis, and preliminarily explored the DNA methylation-related molecular mechanism in VV, which might be important for VV diagnosis and exploration of potential molecular mechanisms.
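
The AUC used above to assess each gene can be read as the probability that a randomly chosen disease sample scores higher than a randomly chosen normal sample. The sketch below computes it directly from that pairwise definition. The function name `roc_auc` and the expression values are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical expression levels of one candidate gene: 1 = VV sample, 0 = normal sample.
scores = [2.1, 3.4, 2.9, 3.8, 1.2, 1.9, 2.5, 1.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.4f}")
```

For a gene that is downregulated in disease, the same function applies after negating the scores, since only the ranking matters.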
Affiliation(s)
- Shengyu Li
  - Department of Vascular Surgery, Tianjin First Central Hospital, Tianjin, China
  - *Correspondence: Shengyu Li; Xiaofeng Li
- Yuehan Liu
  - Department of Functional Examination, Beijing Aerospace General Hospital, Beijing, China
- Mingming Liu
  - Department of Vascular Surgery, Tianjin First Central Hospital, Tianjin, China
- Lizhao Wang
  - Department of Vascular Surgery, Tianjin First Central Hospital, Tianjin, China
- Xiaofeng Li
  - Department of Vascular Surgery, Tianjin First Central Hospital, Tianjin, China
  - *Correspondence: Shengyu Li; Xiaofeng Li
9
Han M, Chen Z, He P, Li Z, Chen Q, Tong Z, Wang M, Du H, Zhang H. YgiM may act as a trigger in the sepsis caused by Klebsiella pneumoniae through the membrane-associated ceRNA network. Front Genet 2022; 13:973145. [PMID: 36212144 PMCID: PMC9537587 DOI: 10.3389/fgene.2022.973145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Accepted: 09/07/2022] [Indexed: 11/27/2022] Open
Abstract
Sepsis is a disease that can cause high mortality. In E. coli, the inner membrane protein YgiM, encoded by the gene ygiM, can target the eukaryotic peroxisome. The peroxisome is a membrane-enclosed organelle associated with ROS metabolism and has been reported to play a key role in immune responses and inflammation during the development of sepsis. Klebsiella pneumoniae (K. pneumoniae) is one of the important pathogens causing sepsis. However, the function of the gene vk055_4013, which is highly homologous to ygiM of E. coli, has not been demonstrated in K. pneumoniae. In this study, we prepared ΔygiM of K. pneumoniae ATCC43816 and found that the deletion of ygiM did not affect bacterial growth or mouse mortality in the mouse infection model. Interestingly, ΔygiM not only resulted in reduced bacterial resistance to macrophages, but also attenuated pathological manifestations in mouse organs. Furthermore, based on data from the Gene Expression Omnibus, the expression profiles of micro RNAs (miRNAs) and messenger RNAs (mRNAs) in the serum of 44 patients with sepsis caused by K. pneumoniae infection were analyzed, and 11 differentially expressed miRNAs (DEmiRNAs) and 8 differentially expressed mRNAs (DEmRNAs) associated with membrane function were found. Finally, the membrane-associated competing endogenous RNA (ceRNA) network was constructed. In this ceRNA network, DEmiRNAs (hsa-miR-7108-5p, hsa-miR-6780a-5p, hsa-miR-6756-5p, hsa-miR-4433b-3p, hsa-miR-3652, hsa-miR-342-3p, hsa-miR-32-5p) and their potential downstream target DEmRNAs (VNN1, CEACAM8, PGLYRP1) were verified in cell models infected with wild type and ΔygiM K. pneumoniae, respectively. Taken together, YgiM may trigger the sepsis caused by K. pneumoniae via membrane-associated ceRNAs. This study provides new insights into the role of YgiM in the process of K. pneumoniae-induced sepsis.
Affiliation(s)
- Mingxiao Han
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Zhihao Chen
  - Department of Orthopedics, The Second Affiliated Hospital of Soochow University, Suzhou, China
  - Department of Musculoskeletal Oncology, Sun Yat-sen University Cancer Center, Guangzhou, China
  - State Key Laboratory of Oncology in Southern China, Collaborative Innovation Center of Cancer Medicine, Guangzhou, China
- Ping He
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Soochow University, Suzhou, China
  - Department of Clinical Laboratory, Sichuan Province Science City Hospital, Chengdu, China
- Ziyuan Li
  - Department of Orthopedics, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Qi Chen
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Zelei Tong
  - Department of Orthopedics, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Min Wang
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Hong Du
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Haifang Zhang
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Soochow University, Suzhou, China
*Correspondence: Haifang Zhang; Hong Du
|
10
|
A Ferroptosis-Related LncRNA Signature Associated with Prognosis, Tumor Immune Environment, and Genome Instability in Hepatocellular Carcinoma. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:6284540. [PMID: 36035299 PMCID: PMC9410853 DOI: 10.1155/2022/6284540] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/11/2021] [Revised: 07/13/2022] [Accepted: 07/25/2022] [Indexed: 11/18/2022]
Abstract
Background Ferroptosis is an iron-dependent form of cell death. In this study, we identified ferroptosis-related long noncoding RNAs (FRlncRNAs) to investigate their association with hepatocellular carcinoma (HCC) in prognosis, tumor immune environment, and genome instability. Methods Transcriptome profile data of HCC were retrieved from a public database. FRlncRNAs were identified by co-expression analysis. Patients were randomly divided into training and test cohorts. Univariate Cox analysis and Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression were performed to construct a risk model. Patients were divided into high- and low-risk groups based on the risk model. AUC and C index were used to assess the risk model. Survival analysis, immune status, and genome instability were compared between the two groups. Results Sixteen FRlncRNAs were identified and used to construct an FRlncRNA signature for the risk model. The Kaplan-Meier analysis revealed that patients in the high-risk group had poorer overall survival than patients in the low-risk group. The area under curve of the risk model was 0.879, 0.809, and 0.757 in the training cohort and 0.635, 0.688, and 0.739 in the test cohort at 1, 2, and 3 years, respectively. The risk model was an independent prognostic predictor and showed excellent prediction of prognosis compared with clinicopathological features. For the differentially expressed ferroptosis-related genes, many enriched metabolic pathways were identified in the functional enrichment analysis. Immune cells such as CD8+ T cells, macrophages M1, natural killer cells, and B cells, which may be associated with antitumor immune responses, differed between the high- and low-risk groups. Genome instability based on the risk model was also explored. A total of 61 genes were differently mutated between the two risk groups, and among them, TP53, HECW2, TRIM66, MCTP2, and KIAA1551 had the most significant mutation frequency differences. 
Conclusion The FRlncRNA signature is closely related with overall survival, tumor immune environment, and genome instability in HCC.
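The risk-model step described in this abstract can be illustrated with a minimal sketch. It assumes the usual form of a LASSO Cox signature, a weighted sum of expression values, and it assumes a median cutoff for the high- and low-risk split; the abstract does not state the cutoff, and the gene names and coefficients used with these functions would be hypothetical.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Weighted sum of expression values per patient.

    expression: list of dicts mapping feature name -> expression value.
    coefficients: dict mapping feature name -> model coefficient (hypothetical values).
    """
    return [
        sum(coefficients[name] * row[name] for name in coefficients)
        for row in expression
    ]


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk using the median score as cutoff (an assumption)."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]
```

In practice the coefficients would come from a fitted Cox model, and the groups would then be compared with a survival analysis such as Kaplan-Meier curves.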
|
11
|
Song J, Huang F, Chen L, Feng K, Jian F, Huang T, Cai YD. Identification of methylation signatures associated with CAR T cell in B-cell acute lymphoblastic leukemia and non-hodgkin’s lymphoma. Front Oncol 2022; 12:976262. [PMID: 36033519 PMCID: PMC9402909 DOI: 10.3389/fonc.2022.976262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022]
Abstract
CD19-targeted CAR T cell immunotherapy has exceptional efficacy for the treatment of B-cell malignancies. B-cell acute lymphocytic leukemia and non-Hodgkin’s lymphoma are two common B-cell malignancies with high recurrence rate and are refractory to cure. Although CAR T-cell immunotherapy overcomes the limitations of conventional treatments for such malignancies, failure of treatment and tumor recurrence remain common. In this study, we searched for important methylation signatures to differentiate CAR-transduced and untransduced T cells from patients with acute lymphoblastic leukemia and non-Hodgkin’s lymphoma. First, we used three feature ranking methods, namely, Monte Carlo feature selection, light gradient boosting machine, and least absolute shrinkage and selection operator, to rank all methylation features in order of their importance. Then, the incremental feature selection method was adopted to construct efficient classifiers and filter the optimal feature subsets. Some important methylated genes, namely, SERPINB6, ANK1, PDCD5, DAPK2, and DNAJB6, were identified. Furthermore, the classification rules for distinguishing different classes were established, which can precisely describe the role of methylation features in the classification. Overall, we applied advanced machine learning approaches to the high-throughput data, investigating the mechanism of CAR T cells to establish the theoretical foundation for modifying CAR T cells.
Affiliation(s)
- Jiwei Song
  - College of Life Science, Changchun Sci-Tech University, Shuangyang, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Fangfang Jian
  - Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
*Correspondence: Tao Huang; Yu-Dong Cai
|
12
|
A Novel Multistep Iterative Technique for Models in Medical Sciences with Complex Dynamics. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:7656451. [PMID: 35936367 PMCID: PMC9352491 DOI: 10.1155/2022/7656451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 06/16/2022] [Accepted: 06/27/2022] [Indexed: 11/24/2022]
Abstract
This paper proposes a three-step iterative technique for solving nonlinear equations from medical science. We designed the proposed technique by blending the well-known Newton's method with an existing two-step technique. The method needs only five evaluations per iteration: three for the given function and two for its first derivatives. As a result, the novel approach converges faster than many existing techniques. We investigated several models of applied medical science in both scalar and vector versions, including population growth, blood rheology, and neurophysiology. Finally, some complex-valued polynomials are shown as polynomiographs to visualize the convergence zones.
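The abstract does not give the update formulas, so the following is only an illustrative sketch of a generic three-step Newton-type iteration, not the paper's scheme. It happens to use the same evaluation budget per iteration (three function values and two derivative values) by reusing the derivative from the second step in the third; the function name and parameters are assumptions made for this example.

```python
def three_step_newton(f, df, x0, tol=1e-12, max_iter=50):
    """Generic three-step Newton-type iteration for a scalar equation f(x) = 0.

    Illustrative only: two Newton steps followed by a third step that
    reuses the derivative from the second step (a "frozen" derivative).
    Assumes df does not vanish at the iterates.
    """
    x = x0
    for _ in range(max_iter):
        y = x - f(x) / df(x)        # step 1: Newton step from x
        dfy = df(y)                 # second (and last) derivative evaluation
        z = y - f(y) / dfy          # step 2: Newton step from y
        x_new = z - f(z) / dfy      # step 3: reuse derivative at y
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For example, `three_step_newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5)` converges to the square root of 2 within a few iterations.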
|
13
|
Li H, Zhang S, Chen L, Pan X, Li Z, Huang T, Cai YD. Identifying Functions of Proteins in Mice With Functional Embedding Features. Front Genet 2022; 13:909040. [PMID: 35651937 PMCID: PMC9149260 DOI: 10.3389/fgene.2022.909040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/31/2022] [Accepted: 04/28/2022] [Indexed: 12/02/2022]
Abstract
In current biology, exploring the biological functions of proteins is important. Given the large number of proteins in some organisms, exploring their functions one by one through traditional experiments is impossible. Therefore, developing quick and reliable methods for identifying protein functions is necessary. The considerable accumulation of protein knowledge and recent developments in computer science provide an alternative way to complete this task, that is, designing computational methods. Several efforts have been made in this field. Most previous methods have adopted protein sequence features or directly used the linkage from a protein–protein interaction (PPI) network. In this study, we proposed some novel multi-label classifiers, which adopted new embedding features to represent proteins. These features were derived from functional domains and a PPI network via word embedding and network embedding, respectively. The minimum redundancy maximum relevance method was used to assess the features, generating a feature list. Incremental feature selection, incorporating RAndom k-labELsets to construct multi-label classifiers, used this list to construct two optimum classifiers, corresponding to two key measurements: accuracy and exact match. These two classifiers performed well and were superior to classifiers that used features extracted by traditional methods.
Affiliation(s)
- Hao Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- ShiQi Zhang
  - Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- ZhanDong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
|
14
|
Li Z, Pan X, Cai YD. Identification of Type 2 Diabetes Biomarkers From Mixed Single-Cell Sequencing Data With Feature Selection Methods. Front Bioeng Biotechnol 2022; 10:890901. [PMID: 35721855 PMCID: PMC9201257 DOI: 10.3389/fbioe.2022.890901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 04/04/2022] [Indexed: 11/18/2022]
Abstract
Diabetes is one of the most common diseases and a major threat to human health. Type 2 diabetes (T2D) makes up about 90% of all cases. With the development of high-throughput sequencing technologies, more and more of the fundamental pathogenesis of T2D at the genetic and transcriptomic levels has been revealed. Recent single-cell sequencing can further reveal the cellular heterogeneity of complex diseases in an unprecedented way. To explore the molecular essence of T2D across multiple cell types, we investigated the expression profiling of more than 1,600 single cells (949 cells from T2D patients and 651 cells from normal controls) and identified the differential expression profiles and characteristics at the transcriptomic level that can distinguish these two groups of cells at the single-cell level. The expression profile was analyzed by several machine learning algorithms, including Monte Carlo feature selection, support vector machine, and repeated incremental pruning to produce error reduction (RIPPER). On one hand, some T2D-associated genes (MTND4P24, MTND2P28, and LOC100128906) were discovered. On the other hand, we revealed novel potential pathogenic mechanisms in the form of rules. They are induced by newly recognized genes and neglected by traditional bulk sequencing techniques. In particular, the newly identified T2D genes were shown to follow specific quantitative rules with diabetes prediction potential, and these rules further indicated several potential functional crosstalks involved in T2D.
Affiliation(s)
- Zhandong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Xiaoyong Pan
  - Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
*Correspondence: Yu-Dong Cai
|
15
|
CNNLSTMac4CPred: A Hybrid Model for N4-Acetylcytidine Prediction. Interdiscip Sci 2022; 14:439-451. [PMID: 35106702 DOI: 10.1007/s12539-021-00500-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/25/2021] [Revised: 12/04/2021] [Accepted: 12/13/2021] [Indexed: 12/23/2022]
Abstract
N4-Acetylcytidine (ac4C) is a highly conserved and widespread post-transcriptional RNA modification that plays versatile roles in cellular processes. Owing to the limitations of techniques and knowledge, large-scale identification of ac4C is still a challenging task. RNA sequences are like sentences containing semantics in natural language. Inspired by the semantics of language, we proposed a hybrid model for ac4C prediction. The model used long short-term memory and a convolutional neural network to extract the semantic features hidden in the sequences. The semantic features and two traditional features (k-nucleotide frequencies and pseudo tri-tuple nucleotide composition) were combined to represent ac4C or non-ac4C sequences. eXtreme Gradient Boosting was used as the learning algorithm. Five-fold cross-validation over the training set consisting of 1160 ac4C and 10,855 non-ac4C sequences obtained an area under the receiver operating characteristic curve (AUROC) of 0.9004, and the independent test over 469 ac4C and 4343 non-ac4C sequences reached an AUROC of 0.8825. The model obtained a sensitivity of 0.6474 in the five-fold cross-validation and 0.6290 in the independent test, outperforming two state-of-the-art methods. The semantic features alone performed better than k-nucleotide frequencies and pseudo tri-tuple nucleotide composition, implying that ac4C sequences carry semantic information. The proposed hybrid model was implemented as a user-friendly web server that is freely available to the scientific community: http://47.113.117.61/ac4c/ . The presented model and tool are beneficial for identifying ac4C on a large scale.
|
16
|
Ran B, Chen L, Li M, Han Y, Dai Q. Drug-Drug Interactions Prediction Using Fingerprint Only. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:7818480. [PMID: 35586666 PMCID: PMC9110191 DOI: 10.1155/2022/7818480] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 03/16/2022] [Accepted: 04/21/2022] [Indexed: 12/27/2022]
Abstract
Combination drug therapy is an efficient way to treat complicated diseases. Drug-drug interaction (DDI) is an important research topic in this therapy, as patient safety is a concern when two or more drugs are taken at the same time. Traditionally, in vitro experiments and clinical trials are common ways to determine DDIs. However, these methods cannot meet the requirements of large-scale tests. Developing computational methods for predicting DDIs is an alternative. Although several methods have been proposed, they always need several types of drug information, limiting their applications. In this study, we proposed a simple computational method to predict DDIs. In this method, drugs were represented by their fingerprint features, which are the most widely used features in investigating drug-related problems. These features were refined by three models, namely the addition, subtraction, and Hadamard models, to generate the representation of DDIs. The powerful classification algorithm random forest was selected to build the classifier. The results of two types of tenfold cross-validation on the classifier indicated good performance for discovering novel DDIs among known drugs and acceptable performance for identifying DDIs between known drugs and unknown drugs or among unknown drugs. Although the classifier adopted a simple scheme to represent DDIs, it was still superior to other methods, which adopted features generated by some advanced computer algorithms. Furthermore, a user-friendly web server, named DDIPF (http://106.14.164.77:5004/DDIPF/), was developed to implement the classifier.
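The three ways of combining two drug fingerprints into one pair representation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary fingerprints stored as equal-length lists, and it assumes the subtraction model takes the absolute difference so that the result does not depend on drug order, which the abstract does not specify. The function name is hypothetical.

```python
def combine_fingerprints(fp_a, fp_b, mode):
    """Combine two equal-length fingerprint vectors into one pair feature vector.

    mode: 'addition', 'subtraction' (absolute difference, an assumption),
    or 'hadamard' (element-wise product).
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    if mode == "addition":
        return [a + b for a, b in zip(fp_a, fp_b)]
    if mode == "subtraction":
        return [abs(a - b) for a, b in zip(fp_a, fp_b)]
    if mode == "hadamard":
        return [a * b for a, b in zip(fp_a, fp_b)]
    raise ValueError(f"unknown mode: {mode}")
```

Each combined vector would then serve as the input features of a classifier such as a random forest.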
Affiliation(s)
- Bing Ran
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Meijing Li
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Yujuan Han
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Qi Dai
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
|
17
|
Li X, Lu L, Chen L. Identification of protein functions in mouse with a label space partition method. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:3820-3842. [PMID: 35341276 DOI: 10.3934/mbe.2022176] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 06/14/2023]
Abstract
Proteins are very important for almost all living creatures because they participate in most complicated and essential biological processes. Determining the functions of given proteins is one of the most essential problems in protein science. Such determination can be conducted through traditional experiments. However, experimental methods are always time-consuming and costly. In recent years, computational methods have provided useful aid in identifying protein functions. This study presented a new multi-label classifier for identifying functions of mouse proteins. Because of the large number of functional types, which were termed labels in the classification procedure, a label space partition method was employed to divide the labels into several partitions. On each partition, a multi-label classifier was constructed. The classifiers based on all partitions were integrated into the proposed classifier. The cross-validation results showed that the proposed classifier performed well. Classifiers with label partition were superior to those without label partition or with random label partition.
Affiliation(s)
- Xuan Li
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lin Lu
  - Department of Radiology, Columbia University Medical Center, New York 10032, USA
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
|
18
|
Predicting RNA 5-Methylcytosine Sites by Using Essential Sequence Features and Distributions. BIOMED RESEARCH INTERNATIONAL 2022; 2022:4035462. [PMID: 35071593 PMCID: PMC8776474 DOI: 10.1155/2022/4035462] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 09/15/2021] [Revised: 12/07/2021] [Accepted: 12/22/2021] [Indexed: 12/15/2022]
Abstract
Methylation is one of the most common and important modifications in biological systems and is mediated by multiple enzymes. Recent studies have shown that methylation is widespread in different RNA molecules. RNA methylation modifications are of various kinds, such as 5-methylcytosine (m5C). However, the functions of individual methylation sites remain to be elucidated. Testing of all methylation sites relies heavily on high-throughput sequencing technology, which is expensive and labor-intensive. Thus, computational prediction approaches could serve as a substitute. In this study, multiple machine learning models were used to predict possible RNA m5C sites on the basis of mRNA sequences in human and mouse. Each site was represented by several features derived from k-mers of an RNA subsequence centered on the site. The powerful max-relevance and min-redundancy (mRMR) feature selection method was employed to analyse these features. The resulting feature list was fed into the incremental feature selection method, incorporating four classification algorithms, to build efficient models. Furthermore, the sites related to the features used in the models were also investigated.
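Incremental feature selection, as used in this and several neighbouring abstracts, can be sketched in a few lines. The sketch assumes that a ranked feature list is already available (for example from mRMR, which is not implemented here) and swaps in a nearest-centroid classifier with leave-one-out evaluation as a stand-in for the classification algorithms the authors used; all function names are hypothetical.

```python
def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given columns."""
    correct = 0
    for i in range(len(X)):
        # Accumulate per-class sums over all samples except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for t, c in enumerate(cols):
                acc[t] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum(
                (X[i][c] - sums[label][t] / counts[label]) ** 2
                for t, c in enumerate(cols)
            )

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def incremental_feature_selection(X, y, ranked):
    """Evaluate the top-1, top-2, ... ranked features and keep the best prefix."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        acc = loo_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # strict comparison keeps the smallest best subset
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc
```

The strict comparison means that ties are resolved in favour of the shorter feature prefix, which is one possible design choice rather than a rule stated in the abstract.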
|
19
|
Ding S, Li H, Zhang YH, Zhou X, Feng K, Li Z, Chen L, Huang T, Cai YD. Identification of Pan-Cancer Biomarkers Based on the Gene Expression Profiles of Cancer Cell Lines. Front Cell Dev Biol 2021; 9:781285. [PMID: 34917619 PMCID: PMC8669964 DOI: 10.3389/fcell.2021.781285] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/22/2021] [Accepted: 11/16/2021] [Indexed: 12/12/2022]
Abstract
There are many types of cancer. Although they share some hallmarks, such as proliferation and metastasis, they are still very different from many perspectives. They grow in different organs or tissues. Does each cancer have a unique gene expression pattern that makes it different from other cancer types? Since the Cancer Genome Atlas (TCGA) project, there have been more and more pan-cancer studies. Researchers want to obtain robust gene expression signatures from pan-cancer patients. However, there is large variance among cancer patients due to heterogeneity, and the sample size needed for robust results would be too large to recruit. In this study, we tried another approach to obtain robust pan-cancer biomarkers by using cell line data to reduce the variance. We applied several advanced computational methods to analyze the Cancer Cell Line Encyclopedia (CCLE) gene expression profiles, which included 988 cell lines from 20 cancer types. Two feature selection methods, Boruta and max-relevance and min-redundancy, were applied to the cell line gene expression data one after the other, generating a feature list. This list was fed into the incremental feature selection method, incorporating one classification algorithm, to extract biomarkers and construct optimal classifiers and decision rules. The optimal classifiers provided good performance and can be useful tools for identifying cell lines from different cancer types, whereas the biomarkers (e.g. NCKAP1, TNFRSF12A, LAMB2, FKBP9, PFN2, TOM1L1) and rules identified in this work may provide a meaningful and precise reference for differentiating multiple types of cancer and contribute to the personalized treatment of tumors.
Affiliation(s)
- ShiJian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Hao Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- XianChao Zhou
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- ZhanDong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
|
20
|
Wang P, Zhang G, Yu ZG, Huang G. A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites. Front Genet 2021; 12:752732. [PMID: 34764983 PMCID: PMC8576272 DOI: 10.3389/fgene.2021.752732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 09/20/2021] [Indexed: 11/29/2022]
Abstract
Knowledge about protein-protein interactions is beneficial in understanding cellular mechanisms. Protein-protein interactions are usually determined according to their protein-protein interaction sites. Due to the limitations of current techniques, it is still a challenging task to detect protein-protein interaction sites. In this article, we presented a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model served as a feature extractor to remove redundant information from protein sequences. The Extreme Gradient Boosting algorithm was used to construct a classifier for predicting protein-protein interaction sites. The DeepPPISP-XGB achieved the following results: area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and area under the precision-recall curve of 0.339, being competitive with the state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.
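The evaluation measures quoted in this abstract can be made concrete with a small sketch. It computes the AUROC from its pairwise-ranking definition (the fraction of positive-negative pairs in which the positive sample scores higher, with ties counted as one half) and recall at a fixed threshold; the 0.5 default threshold and the function names are assumptions for illustration and do not come from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def recall(labels, scores, threshold=0.5):
    """Fraction of true positives whose score reaches the threshold."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    if not positives:
        raise ValueError("need at least one positive sample")
    return sum(1 for s in positives if s >= threshold) / len(positives)
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; a rank-based computation is the usual choice for large datasets.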
Affiliation(s)
- Pan Wang
  - School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Guiyang Zhang
  - School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Zu-Guo Yu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Guohua Huang
  - School of Electrical Engineering, Shaoyang University, Shaoyang, China
|