1
Pradhan UK, Mahapatra A, Naha S, Gupta A, Parsad R, Gahlaut V, Rath SN, Meher PK. ASPTF: A computational tool to predict abiotic stress-responsive transcription factors in plants by employing machine learning algorithms. Biochim Biophys Acta Gen Subj 2024; 1868:130597. [PMID: 38490467 DOI: 10.1016/j.bbagen.2024.130597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 02/26/2024] [Accepted: 03/10/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND: Abiotic stresses pose a serious threat to the growth and yield of crop plants. Several studies suggest that in plants, transcription factors (TFs) are important regulators of gene expression, especially when it comes to coping with abiotic stresses. Therefore, it is crucial to identify TFs associated with abiotic stress response for breeding abiotic stress-tolerant crop cultivars.
METHODS: Based on a machine learning framework, a computational model was developed to predict TFs associated with abiotic stress response in plants. To numerically encode TF sequences, four distinct sequence-derived features were generated. The prediction was performed using ten shallow learning and four deep learning algorithms. Feature selection techniques were also employed so that prediction used the more pertinent and informative features.
RESULTS: Using the features chosen by the light-gradient boosting machine-variable importance measure (LGBM-VIM), the LGBM achieved the highest cross-validation performance metrics (accuracy: 86.81%, auROC: 92.98%, and auPRC: 94.03%). The proposed model (LGBM prediction method + LGBM-VIM selected features) was further evaluated on an independent test dataset, where the accuracy, auROC, and auPRC were 81.98%, 90.65%, and 91.30%, respectively.
CONCLUSIONS: To facilitate the adoption of the proposed strategy by users, the approach was implemented as a prediction server called ASPTF, accessible at https://iasri-sg.icar.gov.in/asptf/. The developed approach and the corresponding web application are anticipated to supplement experimental methods in the identification of TFs responsive to abiotic stress in plants.
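As a reading aid for the metrics quoted in this abstract, the following is a minimal sketch of how accuracy and auROC can be computed from predicted scores. It is not the ASPTF pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def au_roc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("auROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = stress-responsive TF, 0 = non-responsive.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"auROC    = {au_roc(labels, scores):.4f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be preferable for large datasets.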
Affiliation(s)
- Upendra Kumar Pradhan
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Anuradha Mahapatra
  - Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar 751003, Odisha, India
- Sanchita Naha
  - Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Ajit Gupta
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Rajender Parsad
  - ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Vijay Gahlaut
  - University Centre for Research & Development, Chandigarh University, Mohali, Punjab, India.
- Surya Narayan Rath
  - Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar 751003, Odisha, India
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
2
Powell RT, Rinkenbaugh AL, Guo L, Cai S, Shao J, Zhou X, Zhang X, Jeter-Jones S, Fu C, Qi Y, Baameur Hancock F, White JB, Stephan C, Davies PJ, Moulder S, Symmans WF, Chang JT, Piwnica-Worms H. Targeting neddylation and sumoylation in chemoresistant triple negative breast cancer. NPJ Breast Cancer 2024; 10:37. [PMID: 38802426 PMCID: PMC11130334 DOI: 10.1038/s41523-024-00644-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 05/09/2024] [Indexed: 05/29/2024] Open
Abstract
Triple negative breast cancer (TNBC) accounts for 15-20% of breast cancer cases in the United States. Systemic neoadjuvant chemotherapy (NACT), with or without immunotherapy, is the current standard of care for patients with early-stage TNBC. However, up to 70% of TNBC patients have significant residual disease once NACT is completed, which is associated with a high risk of developing recurrence within two to three years of surgical resection. To identify targetable vulnerabilities in chemoresistant TNBC, we generated longitudinal patient-derived xenograft (PDX) models from TNBC tumors before and after patients received NACT. We then compiled transcriptomes and drug response profiles for all models. Transcriptomic analysis identified the enrichment of aberrant protein homeostasis pathways in models from post-NACT tumors relative to pre-NACT tumors. This observation correlated with increased sensitivity in vitro to inhibitors targeting the proteasome, heat shock proteins, and neddylation pathways. Pevonedistat, a drug annotated as a NEDD8-activating enzyme (NAE) inhibitor, was prioritized for validation in vivo and demonstrated efficacy as a single agent in multiple PDX models of TNBC. Pharmacotranscriptomic analysis identified a pathway-level correlation between pevonedistat activity and post-translational modification (PTM) machinery, particularly involving neddylation and sumoylation targets. Elevated levels of both NEDD8 and SUMO1 were observed in models exhibiting a favorable response to pevonedistat compared to those with a less favorable response in vivo. Moreover, a correlation emerged between the expression of neddylation-regulated pathways and tumor response to pevonedistat, indicating that targeting these PTM pathways may prove effective in combating chemoresistant TNBC.
Affiliation(s)
- Reid T Powell
  - Center for Translational Cancer Research, Institute of Bioscience and Technology Texas A&M Health Science Center, Houston, TX, USA
- Amanda L Rinkenbaugh
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Lei Guo
  - Center for Translational Cancer Research, Institute of Bioscience and Technology Texas A&M Health Science Center, Houston, TX, USA
- Shirong Cai
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jiansu Shao
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Xinhui Zhou
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Xiaomei Zhang
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sabrina Jeter-Jones
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Chunxiao Fu
  - Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yuan Qi
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Faiza Baameur Hancock
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jason B White
  - Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Clifford Stephan
  - Center for Translational Cancer Research, Institute of Bioscience and Technology Texas A&M Health Science Center, Houston, TX, USA
- Peter J Davies
  - Center for Translational Cancer Research, Institute of Bioscience and Technology Texas A&M Health Science Center, Houston, TX, USA
- Stacy Moulder
  - Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Eli Lilly and Company, Indianapolis, IN, USA
- W Fraser Symmans
  - Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jeffrey T Chang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Department of Integrative Biology and Pharmacology, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Helen Piwnica-Worms
  - Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
3
Lin S, Wei C, Wei Y, Fan J. Construction and verification of an endoplasmic reticulum stress-related prognostic model for endometrial cancer based on WGCNA and machine learning algorithms. Front Oncol 2024; 14:1362891. [PMID: 38725627 PMCID: PMC11079237 DOI: 10.3389/fonc.2024.1362891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 04/11/2024] [Indexed: 05/12/2024] Open
Abstract
Background: Endoplasmic reticulum (ER) stress arises from the accumulation of misfolded or unfolded proteins within the cell and is intricately linked to the initiation and progression of various tumors and their therapeutic strategies. However, the precise role of ER stress in uterine corpus endometrial cancer (UCEC) remains unclear.
Methods: Data on patients with UCEC and control subjects were obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Using differential expression analysis and Weighted Gene Co-expression Network Analysis (WGCNA), we identified pivotal differentially expressed ER stress-related genes (DEERGs). Further validation of the significance of these genes in UCEC was achieved through consensus clustering and bioinformatic analyses. Using Cox regression analysis and several machine learning algorithms (least absolute shrinkage and selection operator [LASSO], eXtreme Gradient Boosting [XGBoost], support vector machine recursive feature elimination [SVM-RFE], and Random Forest), hub DEERGs associated with patient prognosis were effectively identified. Based on the four identified hub genes, a prognostic model and nomogram were constructed. Additionally, a drug sensitivity analysis and in vitro validation experiments were performed.
Results: A total of 94 DEERGs were identified in patients with UCEC and healthy controls. Consensus clustering analysis revealed significant differences in prognosis, typical immune checkpoints, and tumor microenvironments between the subtypes. Using Cox regression analysis and machine learning, four hub DEERGs, MYBL2, RADX, RUSC2, and CYP46A1, were identified to construct a prognostic model. The reliability of the model was validated using receiver operating characteristic (ROC) curves. Decision curve analysis (DCA) demonstrated the superior predictive ability of the nomogram in terms of 3- and 5-year survival, compared with that of other clinical indicators. Drug sensitivity analysis revealed increased sensitivity to dactinomycin, docetaxel, selumetinib, and trametinib in the low-risk group. The expressions of RADX, RUSC2, and CYP46A1 were downregulated, whereas that of MYBL2 was upregulated in UCEC tissues, as demonstrated by reverse transcription-quantitative polymerase chain reaction (RT-qPCR) and immunofluorescence assays.
Conclusion: This study developed a stable and accurate prognostic model based on multiple bioinformatics analyses, which can be used to assess the prognosis of UCEC. This model may contribute to future research on the risk stratification of patients with UCEC and the formulation of novel treatment strategies.
Affiliation(s)
- Shanshan Lin
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Changqiang Wei
  - Department of Prenatal Diagnosis, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Yiyun Wei
  - Department of Prenatal Diagnosis, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Jiangtao Fan
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
4
Wang SS, Hall ML, Lee E, Kim SC, Ramesh N, Lee SH, Jang JY, Bold RJ, Ku JL, Hwang CI. Whole-genome bisulfite sequencing identifies stage- and subtype-specific DNA methylation signatures in pancreatic cancer. iScience 2024; 27:109414. [PMID: 38532888 PMCID: PMC10963232 DOI: 10.1016/j.isci.2024.109414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 02/03/2024] [Accepted: 02/29/2024] [Indexed: 03/28/2024] Open
Abstract
In pancreatic ductal adenocarcinoma (PDAC), no recurrent metastasis-specific mutation has been found, suggesting that epigenetic mechanisms, such as DNA methylation, are the major contributors of late-stage disease progression. Here, we performed the first whole-genome bisulfite sequencing (WGBS) on mouse and human PDAC organoid models to identify stage-specific and molecular subtype-specific DNA methylation signatures. With this approach, we identified thousands of differentially methylated regions (DMRs) that can distinguish between the stages and molecular subtypes of PDAC. Stage-specific DMRs are associated with genes related to nervous system development and cell-cell adhesions, and are enriched in promoters and bivalent enhancers. Subtype-specific DMRs showed hypermethylation of GATA6 foregut endoderm transcriptional networks in the squamous subtype and hypermethylation of EMT transcriptional networks in the progenitor subtype. These results indicate that aberrant DNA methylation contributes to both PDAC progression and subtype differentiation, resulting in significant and reoccurring DNA methylation patterns with diagnostic and prognostic potential.
Affiliation(s)
- Sarah S. Wang
  - Department of Microbiology and Molecular Genetics, College of Biological Sciences, University of California Davis, Davis, CA 95616, USA
- Madison L. Hall
  - Department of Microbiology and Molecular Genetics, College of Biological Sciences, University of California Davis, Davis, CA 95616, USA
- EunJung Lee
  - Department of Microbiology and Molecular Genetics, College of Biological Sciences, University of California Davis, Davis, CA 95616, USA
- Soon-Chan Kim
  - Department of Biomedical Sciences, Korean Cell Line Bank, Laboratory of Cell Biology and Cancer Research Institute, Seoul National University College of Medicine, Seoul, South Korea
- Neha Ramesh
  - Department of Microbiology and Molecular Genetics, College of Biological Sciences, University of California Davis, Davis, CA 95616, USA
- Sang Hyub Lee
  - Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, South Korea
- Jin-Young Jang
  - Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, South Korea
- Richard J. Bold
  - Division of Surgical Oncology, Department of Surgery, University of California, Davis, Sacramento, CA, USA
  - University of California Davis Comprehensive Cancer Center, Sacramento, CA, USA
- Ja-Lok Ku
  - Department of Biomedical Sciences, Korean Cell Line Bank, Laboratory of Cell Biology and Cancer Research Institute, Seoul National University College of Medicine, Seoul, South Korea
- Chang-Il Hwang
  - Department of Microbiology and Molecular Genetics, College of Biological Sciences, University of California Davis, Davis, CA 95616, USA
  - University of California Davis Comprehensive Cancer Center, Sacramento, CA, USA
5
Meher PK, Sahu TK, Gupta A, Kumar A, Rustgi S. ASRpro: A machine-learning computational model for identifying proteins associated with multiple abiotic stress in plants. The Plant Genome 2024; 17:e20259. [PMID: 36098562 DOI: 10.1002/tpg2.20259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 08/10/2022] [Indexed: 06/15/2023]
Abstract
One of the thrust areas of research in plant breeding is to develop crop cultivars with enhanced tolerance to abiotic stresses. Thus, identifying abiotic stress-responsive genes (SRGs) and proteins is important for plant breeding research. However, identifying such genes via established genetic approaches is laborious and resource intensive. Although transcriptome profiling has remained a reliable method of SRG identification, it is species specific. Additionally, identifying multistress-responsive genes using gene expression studies is cumbersome. These limitations underscore the need for a computational method to identify the genes associated with different abiotic stresses. In this work, we aimed to develop a computational model for identifying genes responsive to six abiotic stresses: cold, drought, heat, light, oxidative, and salt. The predictions were performed using support vector machine (SVM), random forest, adaptive boosting (ADB), and extreme gradient boosting (XGB), where the autocross covariance (ACC) and K-mer compositional features were used as input. With ACC, K-mer, and ACC + K-mer compositional features, overall accuracies of ∼60-77%, ∼75-86%, and ∼61-78%, respectively, were obtained using the SVM algorithm with fivefold cross-validation. The SVM also achieved higher accuracy than the other three algorithms. The proposed model was also assessed with an independent dataset and obtained an accuracy consistent with cross-validation. The proposed model is the first of its kind and is expected to serve the requirement of experimental biologists; however, the prediction accuracy was modest. Given its importance for the research community, the online prediction application, ASRpro, is made freely available (https://iasri-sg.icar.gov.in/asrpro/) for predicting abiotic SRGs and proteins.
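To make the K-mer compositional features mentioned in this abstract concrete, here is a minimal sketch of one common way to compute them. It is an illustration, not the ASRpro implementation: the 20-letter amino acid alphabet, the choice k = 2, the function name, and the example sequence are all assumptions.

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids (swap in "ACGT" for nucleotides).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=2, alphabet=AMINO_ACIDS):
    """Return the relative frequency of every possible k-mer, in a fixed order.

    The output has len(alphabet) ** k entries. Windows containing symbols
    outside the alphabet (for example 'X') are skipped.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Hypothetical toy sequence, used only to show the shape of the output.
features = kmer_composition("MKVLAAGMKV", k=2)
print(len(features))  # 20 ** 2 possible dipeptides
```

A fixed-length vector of this kind can be passed to any of the classifiers named in the abstract, since every sequence maps to the same number of features regardless of its length.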
Affiliation(s)
- Ajit Gupta
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anuj Kumar
  - Dep. of Microbiology and Immunology, Dalhousie Univ., Halifax, Nova Scotia, Canada
  - Laboratory of Immunity, Shantou Univ. Medical College, Shantou, PRC
- Sachin Rustgi
  - Dep. of Plant and Environmental Sciences, Pee Dee Research and Education Centre, Clemson Univ., Florence, SC, USA
6
Huang P, Song Y, Yang Y, Bai F, Li N, Liu D, Li C, Li X, Gou W, Zong L. Identification and verification of diagnostic biomarkers based on mitochondria-related genes related to immune microenvironment for preeclampsia using machine learning algorithms. Front Immunol 2024; 14:1304165. [PMID: 38259465 PMCID: PMC10800455 DOI: 10.3389/fimmu.2023.1304165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 12/14/2023] [Indexed: 01/24/2024] Open
Abstract
Preeclampsia is one of the leading causes of maternal and fetal morbidity and mortality worldwide. Preeclampsia is linked to mitochondrial dysfunction as a contributing factor in its progression. This study aimed to develop a novel diagnostic model based on mitochondria-related genes (MRGs) for preeclampsia using machine learning and to further investigate the association between MRGs and the immune infiltration landscape in preeclampsia. In this research, we analyzed the GSE75010 dataset and screened 552 DE-MRGs between preeclampsia samples and normal samples. Enrichment assays indicated that the 552 DE-MRGs were mainly related to energy metabolism pathways and several different diseases. Then, we performed LASSO and SVM-RFE and identified three critical diagnostic genes for preeclampsia: CPOX, DEGS1, and SH3BP5. In addition, we developed a novel diagnostic model using these three genes, and its diagnostic value was confirmed in the GSE44711 and GSE75010 datasets and in our cohorts. Importantly, the results of RT-PCR confirmed that the expressions of CPOX, DEGS1, and SH3BP5 were distinctly increased in preeclampsia samples compared with normal samples. The results of the CIBERSORT algorithm revealed a striking dissimilarity between the immune cells found in preeclampsia samples and those found in normal samples. In addition, we found that the levels of SH3BP5 were closely associated with several immune cells, highlighting its potential involvement in the immune microenvironment of preeclampsia. Overall, this study has provided a novel diagnostic model and diagnostic genes for preeclampsia while also revealing the association between MRGs and immune infiltration. These findings offer valuable insights for further research and treatment of preeclampsia.
Affiliation(s)
- Pu Huang
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
- Yuchun Song
  - Department of Gynecology and Obstetrics, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, Shandong, China
- Yu Yang
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
- Feiyue Bai
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
- Na Li
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
- Dan Liu
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
- Chunfang Li
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
- Xuelan Li
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
- Wenli Gou
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
- Lu Zong
  - Department of Obstetrics & Gynecology, the First Affiliated Hospital of Xi’an Jiaotong University, Xian, Shaanxi, China
7
Mouat JS, Li S, Myint SS, Laufer BI, Lupo PJ, Schraw JM, Woodhouse JP, de Smith AJ, LaSalle JM. Epigenomic signature of major congenital heart defects in newborns with Down syndrome. Hum Genomics 2023; 17:92. [PMID: 37803336 PMCID: PMC10559462 DOI: 10.1186/s40246-023-00540-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/02/2023] [Indexed: 10/08/2023] Open
Abstract
BACKGROUND: Congenital heart defects (CHDs) affect approximately half of individuals with Down syndrome (DS), but the molecular reasons for incomplete penetrance are unknown. Previous studies have largely focused on identifying genetic risk factors associated with CHDs in individuals with DS, but comprehensive studies of the contribution of epigenetic marks are lacking. We aimed to identify and characterize DNA methylation differences from newborn dried blood spots (NDBS) of DS individuals with major CHDs compared to DS individuals without CHDs.
METHODS: We used the Illumina EPIC array and whole-genome bisulfite sequencing (WGBS) to quantitate DNA methylation for 86 NDBS samples from the California Biobank Program: (1) 45 DS-CHD (27 female, 18 male) and (2) 41 DS non-CHD (27 female, 14 male). We analyzed global CpG methylation and identified differentially methylated regions (DMRs) in DS-CHD versus DS non-CHD comparisons (both sex-combined and sex-stratified) corrected for sex, age of blood collection, and cell-type proportions. CHD DMRs were analyzed for enrichment in CpG and genic contexts, chromatin states, and histone modifications by genomic coordinates and for gene ontology enrichment by gene mapping. DMRs were also tested in a replication dataset and compared to methylation levels in DS versus typical development (TD) WGBS NDBS samples.
RESULTS: We found global CpG hypomethylation in DS-CHD males compared to DS non-CHD males, which was attributable to elevated levels of nucleated red blood cells and not seen in females. At a regional level, we identified 58, 341, and 3938 CHD-associated DMRs in the Sex Combined, Females Only, and Males Only groups, respectively, and used machine learning algorithms to select 19 Males Only loci that could distinguish CHD from non-CHD. DMRs in all comparisons were enriched for gene exons, CpG islands, and bivalent chromatin and mapped to genes enriched for terms related to cardiac and immune functions. Lastly, a greater percentage of CHD-associated DMRs than background regions were differentially methylated in DS versus TD samples.
CONCLUSIONS: A sex-specific signature of DNA methylation was detected in NDBS of DS-CHD compared to DS non-CHD individuals. This supports the hypothesis that epigenetics can reflect the variability of phenotypes in DS, particularly CHDs.
Affiliation(s)
- Julia S Mouat
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA, USA
  - Perinatal Origins of Disparities Center, University of California, Davis, CA, USA
  - Genome Center, University of California, Davis, CA, USA
  - MIND Institute, University of California, Davis, CA, USA
- Shaobo Li
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Swe Swe Myint
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Benjamin I Laufer
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA, USA
  - Perinatal Origins of Disparities Center, University of California, Davis, CA, USA
  - Genome Center, University of California, Davis, CA, USA
  - MIND Institute, University of California, Davis, CA, USA
- Philip J Lupo
  - Division of Hematology-Oncology, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Jeremy M Schraw
  - Division of Hematology-Oncology, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- John P Woodhouse
  - Division of Hematology-Oncology, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Adam J de Smith
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Janine M LaSalle
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA, USA.
  - Perinatal Origins of Disparities Center, University of California, Davis, CA, USA.
  - Genome Center, University of California, Davis, CA, USA.
  - MIND Institute, University of California, Davis, CA, USA.
8
Morabito F, Adornetto C, Monti P, Amaro A, Reggiani F, Colombo M, Rodriguez-Aldana Y, Tripepi G, D’Arrigo G, Vener C, Torricelli F, Rossi T, Neri A, Ferrarini M, Cutrona G, Gentile M, Greco G. Genes selection using deep learning and explainable artificial intelligence for chronic lymphocytic leukemia predicting the need and time to therapy. Front Oncol 2023; 13:1198992. [PMID: 37719021 PMCID: PMC10501728 DOI: 10.3389/fonc.2023.1198992] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/02/2023] [Accepted: 07/31/2023] [Indexed: 09/19/2023] Open
Abstract
Analyzing gene expression profiles (GEP) through artificial intelligence provides meaningful insight into cancer disease. This study introduces DeepSHAP Autoencoder Filter for Genes Selection (DSAF-GS), a novel deep learning and explainable artificial intelligence-based approach for feature selection in genomics-scale data. DSAF-GS exploits the autoencoder's reconstruction capabilities without changing the original feature space, enhancing the interpretation of the results. Explainable artificial intelligence is then used to select the informative genes for chronic lymphocytic leukemia (CLL) prognosis of 217 cases from a GEP database comprising roughly 20,000 genes. The model for prognosis prediction achieved an accuracy of 86.4%, a sensitivity of 85.0%, and a specificity of 87.5%. According to the proposed approach, predictions were strongly influenced by CEACAM19 and PIGP, moderately influenced by MKL1 and GNE, and poorly influenced by other genes. The 10 most influential genes were selected for further analysis. Among them, FADD, FIBP, FIBP, GNE, IGF1R, MKL1, PIGP, and SLC39A6 were identified in the Reactome pathway database as involved in signal transduction, transcription, protein metabolism, immune system, cell cycle, and apoptosis. Moreover, according to the network model of the 3D protein-protein interaction (PPI) explored using the NetworkAnalyst tool, FADD, FIBP, IGF1R, QTRT1, GNE, SLC39A6, and MKL1 appear coupled into a complex network. Finally, all 10 selected genes showed a predictive power on time to first treatment (TTFT) in univariate analyses on a basic prognostic model including IGHV mutational status, del(11q) and del(17p), NOTCH1 mutations, β2-microglobulin, Rai stage, and B-lymphocytosis known to predict TTFT in CLL. However, only the IGF1R (hazard ratio [HR] 1.41, 95% CI 1.08-1.84, P=0.013), COL28A1 (HR 0.32, 95% CI 0.10-0.97, P=0.045), and QTRT1 (HR 7.73, 95% CI 2.48-24.04, P<0.001) genes were significantly associated with TTFT in multivariable analyses when combined with the prognostic factors of the basic model, ultimately increasing the Harrell's c-index and the explained variation to 78.6% (versus 76.5% of the basic prognostic model) and 52.6% (versus 42.2% of the basic prognostic model), respectively. The goodness of model fit was also enhanced (χ2 = 20.1, P=0.002), indicating improved performance over the basic prognostic model. In conclusion, DSAF-GS identified a group of significant genes for CLL prognosis, suggesting future directions for bio-molecular research.
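Harrell's c-index, reported in this abstract, is the fraction of comparable patient pairs in which the patient with the higher predicted risk has the earlier event. The sketch below illustrates that definition in a simplified form. It is not the authors' analysis: the function name and toy data are assumptions, and pairs with tied event times are simply treated as not comparable.

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's concordance index.

    times:       observed follow-up time per patient
    events:      1 if the event (e.g. first treatment) was observed, 0 if censored
    risk_scores: model output, higher meaning earlier expected event

    A pair (i, j) is comparable when patient i had an observed event strictly
    before time j. It is concordant when i also has the higher risk score;
    tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.6, 0.2]

c_index = harrell_c_index(times, events, risk_scores)
print(f"c-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 78.6% versus 76.5% comparison in the abstract should be read.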
Collapse
Affiliation(s)
| | - Carlo Adornetto
- Department of Mathematics and Computer Science, University of Calabria, Cosenza, Italy
| | - Paola Monti
- Mutagenesis and Cancer Prevention Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
| | - Adriana Amaro
- Tumor Epigenetics Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
| | - Francesco Reggiani
- Tumor Epigenetics Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
| | - Monica Colombo
- Molecular Pathology Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
| | | | - Giovanni Tripepi
- Consiglio Nazionale delle Ricerche, Istituto di Fisiologia Clinica del Consiglio Nazionale delle Ricerche (CNR), Reggio Calabria, Italy
| | - Graziella D’Arrigo
- Consiglio Nazionale delle Ricerche, Istituto di Fisiologia Clinica del Consiglio Nazionale delle Ricerche (CNR), Reggio Calabria, Italy
| | - Claudia Vener
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| | - Federica Torricelli
- Laboratory of Translational Research, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
| | - Teresa Rossi
- Laboratory of Translational Research, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
| | - Antonino Neri
- Scientific Directorate, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
| | - Manlio Ferrarini
- Unità Operativa (UO) Molecular Pathology, Ospedale Policlinico San Martino Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Genoa, Italy
| | - Giovanna Cutrona
- Molecular Pathology Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
| | - Massimo Gentile
- Hematology Unit, Department of Onco-Hematology, Azienda Ospedaliera (A.O.) of Cosenza, Cosenza, Italy
- Department of Pharmacy and Health and Nutritional Sciences, University of Calabria, Cosenza, Italy
| | - Gianluigi Greco
- Department of Mathematics and Computer Science, University of Calabria, Cosenza, Italy
9
Mouat JS, Li S, Myint SS, Laufer BI, Lupo PJ, Schraw JM, Woodhouse JP, de Smith AJ, LaSalle JM. Epigenomic signature of major congenital heart defects in newborns with Down syndrome. medRxiv 2023:2023.05.02.23289417. [PMID: 37205408 PMCID: PMC10187438 DOI: 10.1101/2023.05.02.23289417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Background Congenital heart defects (CHDs) affect approximately half of individuals with Down syndrome (DS) but the molecular reasons for incomplete penetrance are unknown. Previous studies have largely focused on identifying genetic risk factors associated with CHDs in individuals with DS, but comprehensive studies of the contribution of epigenetic marks are lacking. We aimed to identify and characterize DNA methylation differences from newborn dried blood spots (NDBS) of DS individuals with major CHDs compared to DS individuals without CHDs. Methods We used the Illumina EPIC array and whole-genome bisulfite sequencing (WGBS) to quantitate DNA methylation for 86 NDBS samples from the California Biobank Program: 1) 45 DS-CHD (27 female, 18 male) and 2) 41 DS non-CHD (27 female, 14 male). We analyzed global CpG methylation and identified differentially methylated regions (DMRs) in DS-CHD vs DS non-CHD comparisons (both sex-combined and sex-stratified) corrected for sex, age of blood collection, and cell type proportions. CHD DMRs were analyzed for enrichment in CpG and genic contexts, chromatin states, and histone modifications by genomic coordinates and for gene ontology enrichment by gene mapping. DMRs were also tested in a replication dataset and compared to methylation levels in DS vs typical development (TD) WGBS NDBS samples. Results We found global CpG hypomethylation in DS-CHD males compared to DS non-CHD males, which was attributable to elevated levels of nucleated red blood cells and not seen in females. At a regional level, we identified 58, 341, and 3,938 CHD-associated DMRs in the Sex Combined, Females Only, and Males Only groups, respectively, and used machine learning algorithms to select 19 Males Only loci that could distinguish CHD from non-CHD. DMRs in all comparisons were enriched for gene exons, CpG islands, and bivalent chromatin and mapped to genes enriched for terms related to cardiac and immune functions. 
Lastly, a greater percentage of CHD-associated DMRs than background regions were differentially methylated in DS vs TD samples. Conclusions A sex-specific signature of DNA methylation was detected in NDBS of DS-CHD compared to DS non-CHD individuals. This supports the hypothesis that epigenetics can reflect the variability of phenotypes in DS, particularly CHDs.
Affiliation(s)
- Julia S Mouat
- Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA USA
- Perinatal Origins of Disparities Center, University of California, Davis, CA USA
- Genome Center, University of California, Davis, CA USA
- MIND Institute, University of California, Davis, CA USA
| | - Shaobo Li
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, CA USA
| | - Swe Swe Myint
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, CA USA
| | - Benjamin I Laufer
- Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA USA
- Perinatal Origins of Disparities Center, University of California, Davis, CA USA
- Genome Center, University of California, Davis, CA USA
- MIND Institute, University of California, Davis, CA USA
| | - Philip J Lupo
- Division of Hematology-Oncology, Department of Pediatrics, Baylor College of Medicine, Houston, TX USA
| | - Jeremy M Schraw
- Division of Hematology-Oncology, Department of Pediatrics, Baylor College of Medicine, Houston, TX USA
| | - John P Woodhouse
- Division of Hematology-Oncology, Department of Pediatrics, Baylor College of Medicine, Houston, TX USA
| | - Adam J de Smith
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, CA USA
| | - Janine M LaSalle
- Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA USA
- Perinatal Origins of Disparities Center, University of California, Davis, CA USA
- Genome Center, University of California, Davis, CA USA
- MIND Institute, University of California, Davis, CA USA
10
Huang AA, Huang SY. Computation of the distribution of model accuracy statistics in machine learning: Comparison between analytically derived distributions and simulation-based methods. Health Sci Rep 2023; 6:e1214. [PMID: 37091362 PMCID: PMC10119581 DOI: 10.1002/hsr2.1214] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/24/2023] [Revised: 03/16/2023] [Accepted: 03/20/2023] [Indexed: 04/25/2023] Open
Abstract
Background and Aims Machine-learning techniques are increasingly used across fields. To accurately evaluate the efficacy of novel modeling methods, it is necessary to critically evaluate the model metrics used, such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). For commonly used model metrics, we proposed the use of analytically derived distributions (ADDs) and compared them with simulation-based approaches. Methods A retrospective cohort study was conducted using the England National Health Services Heart Disease Prediction Cohort. Four machine learning models (XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boost) were used. The distributions of the model metrics and covariate gain statistics were empirically derived using bootstrap simulation (N = 10,000). The ADDs were created from analytic formulas from the covariates to describe the distribution of the model metrics and compared with those of bootstrap simulation. Results XGBoost was the best-performing model, with the highest AUROC and the highest aggregate score across six other model metrics. Based on the Anderson-Darling test, the distribution of the model metrics created from bootstrap did not significantly deviate from a normal distribution. The variance created from the ADD led to smaller SDs than those derived from bootstrap simulation, whereas the rest of the distribution was not statistically significantly different. Conclusions ADDs allow for cross-study comparison of model metrics, which is usually done with bootstrapping that relies on simulations that cannot be replicated by the reader.
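The comparison this abstract describes, an empirical bootstrap distribution of a metric against a closed-form one, can be illustrated for accuracy alone. This is a hedged sketch rather than the authors' ADD formulas: it assumes the binomial standard error sqrt(p(1-p)/n) as the analytic counterpart, and every name and value in it is illustrative.

```python
import math
import random
import statistics


def bootstrap_accuracy(y_true, y_pred, n_boot=1000, seed=0):
    """Resample cases with replacement and return one accuracy per replicate."""
    rng = random.Random(seed)  # fixed seed so the simulation can be repeated
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(correct)
    estimates = []
    for _ in range(n_boot):
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    return estimates


def analytic_accuracy_sd(y_true, y_pred):
    """Binomial standard error of accuracy (an assumed stand-in for an ADD)."""
    n = len(y_true)
    p = sum(int(t == p_) for t, p_ in zip(y_true, y_pred)) / n
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical labels: 100 cases, 80 classified correctly.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 40 + [0] * 10 + [0] * 40 + [1] * 10

boot = bootstrap_accuracy(y_true, y_pred)
print("bootstrap SD:", statistics.stdev(boot))
print("analytic SD: ", analytic_accuracy_sd(y_true, y_pred))
```

Fixing the seed makes this particular simulation repeatable, but a reader still needs the case-level predictions to rerun it, whereas the analytic value needs only the accuracy and the sample size.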
Affiliation(s)
- Alexander A. Huang
- Northwestern University Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
| | - Samuel Y. Huang
- Virginia Commonwealth School of Medicine, Virginia Commonwealth University, Richmond, Virginia, USA
11
Sahoo B, Pinnix Z, Sims S, Zelikovsky A. Identifying Biomarkers Using Support Vector Machine to Understand the Racial Disparity in Triple-Negative Breast Cancer. J Comput Biol 2023; 30:502-517. [PMID: 36716280 PMCID: PMC10325814 DOI: 10.1089/cmb.2022.0422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023] Open
Abstract
With its aggressive behavior and heterogeneous tumor biology, triple-negative breast cancer (TNBC) is a type of breast cancer known for its poor clinical outcome. The lack of estrogen, progesterone, and human epidermal growth factor receptors in TNBC tumors leads to fewer treatment options in the clinic. The incidence of TNBC is higher, and clinical outcomes are worse, in African American (AA) women than in European American (EA) women. The significant factors responsible for the racial disparity in TNBC are socioeconomic lifestyle and tumor biology. The current study considered open-source gene expression data from TNBC samples annotated with racial information. We applied a state-of-the-art Support Vector Machine (SVM) classifier with a recursive feature elimination approach to the gene expression data to identify significant biomarkers deregulated in AA women and EA women. We also included Spearman's rho and Ward's linkage method in our feature selection workflow. Our proposed method generates 24 features/genes that can classify the AA and EA samples with 98% accuracy. We also performed Kaplan-Meier analysis and the log-rank test on the 24 features/genes. Of the 24 genes, we discuss only 2, KLK10 and LRRC37A2, for the correlation between their deregulated expression and cancer progression with a poor survival rate. We believe that further improvement of our method with a larger number of RNA-seq gene expression samples will provide more accurate insight into racial disparity in TNBC.
Affiliation(s)
- Bikram Sahoo
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
| | - Zandra Pinnix
- Department of Biology and Marine Biology, University of North Carolina at Wilmington, Wilmington, North Carolina, USA
| | - Seth Sims
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
12
Pradhan UK, Meher PK, Naha S, Rao AR, Gupta A. ASLncR: a novel computational tool for prediction of abiotic stress-responsive long non-coding RNAs in plants. Funct Integr Genomics 2023; 23:113. [PMID: 37000299 DOI: 10.1007/s10142-023-01040-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 03/23/2023] [Accepted: 03/24/2023] [Indexed: 04/01/2023]
Abstract
Abiotic stresses are detrimental to plant growth and development and have a major negative impact on crop yields. A growing body of evidence indicates that a large number of long non-coding RNAs (lncRNAs) are key to many abiotic stress responses. Thus, identifying abiotic stress-responsive lncRNAs is essential in crop breeding programs in order to develop crop cultivars resistant to abiotic stresses. In this study, we have developed the first machine learning-based computational model for predicting abiotic stress-responsive lncRNAs. The lncRNA sequences that were responsive and non-responsive to abiotic stresses served as the two classes of the dataset for binary classification using the machine learning algorithms. The training dataset was created using 263 stress-responsive and 263 non-stress-responsive sequences, whereas the independent test set consists of 101 sequences from both classes. As machine learning models accept only numeric data, k-mer features of sizes 1 to 6 were used to represent lncRNAs in numeric form. To select important features, four different feature selection strategies were utilized. Among the seven learning algorithms, the support vector machine (SVM) achieved the highest cross-validation accuracy with the selected feature sets. The observed 5-fold cross-validation accuracy, AU-ROC, and AU-PRC were 68.84, 72.78, and 75.86%, respectively. Furthermore, the robustness of the developed model (SVM with the selected features) was evaluated using an independent test dataset, where the overall accuracy, AU-ROC, and AU-PRC were 76.23, 87.71, and 88.49%, respectively. The developed computational approach was also implemented in an online prediction tool, ASLncR, accessible at https://iasri-sg.icar.gov.in/aslncr/. The proposed computational model and the developed prediction tool are believed to supplement the existing effort for the identification of abiotic stress-responsive lncRNAs in plants.
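The k-mer encoding mentioned in this abstract turns each sequence into a fixed-length numeric vector by counting every possible subsequence of length 1 up to some maximum. The following is a minimal sketch under stated assumptions, not the ASLncR implementation: the function name is hypothetical, counts are normalized to frequencies within each k, U is mapped to T, and windows containing other characters are ignored.

```python
from itertools import product


def kmer_features(sequence, k_max=6, alphabet="ACGT"):
    """Return a dict mapping every k-mer (k = 1..k_max) to its frequency."""
    seq = sequence.upper().replace("U", "T")  # assumption: treat RNA as DNA letters
    features = {}
    for k in range(1, k_max + 1):
        # Start from zero so absent k-mers still get a feature column.
        counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skips windows with ambiguous bases such as N
                counts[kmer] += 1
        total = sum(counts.values())
        for kmer, count in counts.items():
            features[kmer] = count / total if total else 0.0
    return features


# Hypothetical toy sequence; real lncRNAs are far longer.
vector = kmer_features("ACGUACGU", k_max=6)
print(len(vector))  # 5460
```

With k from 1 to 6 the vector has 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 entries, far more than the 526 training sequences, which is one reason a feature selection step is needed before fitting a classifier.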
Affiliation(s)
- Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India.
| | - Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | | | - Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
13
Pradhan UK, Meher PK, Naha S, Rao AR, Kumar U, Pal S, Gupta A. ASmiR: a machine learning framework for prediction of abiotic stress-specific miRNAs in plants. Funct Integr Genomics 2023; 23:92. [PMID: 36939943 DOI: 10.1007/s10142-023-01014-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Revised: 01/18/2023] [Accepted: 03/06/2023] [Indexed: 03/21/2023]
Abstract
Abiotic stresses have become a major challenge in recent years due to their pervasive nature and shocking impacts on plant growth, development, and quality. MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of specific abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational model for prediction of miRNAs associated with four specific abiotic stresses, namely cold, drought, heat and salt. Pseudo K-tuple nucleotide compositional features with k-mer sizes 1 to 5 were used to represent miRNAs in numeric form. A feature selection strategy was employed to select important features. With the selected feature sets, the support vector machine (SVM) achieved the highest cross-validation accuracy in all four abiotic stress conditions. The highest cross-validated prediction accuracies in terms of area under the precision-recall curve were 90.15, 90.09, 87.71, and 89.25% for cold, drought, heat and salt, respectively. Overall prediction accuracies for the independent dataset were 84.57, 80.62, 80.38 and 82.78% for the four stresses, respectively. The SVM was also seen to outperform different deep learning models for prediction of abiotic stress-responsive miRNAs. To implement our method with ease, an online prediction server "ASmiR" has been established at https://iasri-sg.icar.gov.in/asmir/. The proposed computational model and the developed prediction tool are believed to supplement the existing effort for identification of specific abiotic stress-responsive miRNAs in plants.
Affiliation(s)
- Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India.
| | - Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | | | - Upendra Kumar
- Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, 125004, India
| | - Soumen Pal
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
14
Bockorny B, Muthuswamy L, Huang L, Hadisurya M, Lim CM, Tsai LL, Gill RR, Wei JL, Bullock AJ, Grossman JE, Besaw RJ, Narasimhan S, Tao WA, Perea S, Sawhney MS, Freedman SD, Hidalgo M, Iliuk A, Muthuswamy SK. A Large-Scale Proteomics Resource of Circulating Extracellular Vesicles for Biomarker Discovery in Pancreatic Cancer. medRxiv 2023:2023.03.13.23287216. [PMID: 36993200 PMCID: PMC10055460 DOI: 10.1101/2023.03.13.23287216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Pancreatic cancer has the worst prognosis of all common tumors. Earlier cancer diagnosis could increase survival rates and better assessment of metastatic disease could improve patient care. As such, there is an urgent need to develop biomarkers to diagnose this deadly malignancy earlier. Analyzing circulating extracellular vesicles (cEVs) using 'liquid biopsies' offers an attractive approach to diagnose and monitor disease status. However, it is important to differentiate EV-associated proteins enriched in patients with pancreatic ductal adenocarcinoma (PDAC) from those with benign pancreatic diseases such as chronic pancreatitis and intraductal papillary mucinous neoplasm (IPMN). To meet this need, we combined the novel EVtrap method for highly efficient isolation of EVs from plasma and conducted proteomics analysis of samples from 124 individuals, including patients with PDAC, benign pancreatic diseases and controls. On average, 912 EV proteins were identified per 100μL of plasma. EVs containing high levels of PDCD6IP, SERPINA12 and RUVBL2 were associated with PDAC compared to the benign diseases in both discovery and validation cohorts. EVs with PSMB4, RUVBL2 and ANKAR were associated with metastasis, and those with CRP, RALB and CD55 correlated with poor clinical prognosis. Finally, we validated a 7-EV protein PDAC signature against a background of benign pancreatic diseases that yielded an 89% prediction accuracy for the diagnosis of PDAC. To our knowledge, our study represents the largest proteomics profiling of circulating EVs ever conducted in pancreatic cancer and provides a valuable open-source atlas to the scientific community with a comprehensive catalogue of novel cEVs that may assist in the development of biomarkers and improve the outcomes of patients with PDAC.
Affiliation(s)
- Bruno Bockorny
- Division of Medical Oncology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | | | - Ling Huang
- Henry Ford Cancer Institute, Detroit, MI, USA
| | - Marco Hadisurya
- Department of Biochemistry, Purdue University, West Lafayette, IN, USA
| | | | - Leo L. Tsai
- Harvard Medical School, Boston, MA, USA
- Department of Radiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Ritu R. Gill
- Harvard Medical School, Boston, MA, USA
- Department of Radiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Jesse L. Wei
- Harvard Medical School, Boston, MA, USA
- Department of Radiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Andrea J. Bullock
- Division of Medical Oncology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | | | - Robert J. Besaw
- Division of Medical Oncology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | | | - W. Andy Tao
- Department of Biochemistry, Purdue University, West Lafayette, IN, USA
| | - Sofia Perea
- Division of Medical Oncology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Mandeep S. Sawhney
- Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Beth Israel Deaconess Medical Center, Boston, MA
| | - Steven D. Freedman
- Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Beth Israel Deaconess Medical Center, Boston, MA
| | - Manuel Hidalgo
- Division of Hematology-Oncology, Weill Cornell Medical College, New York, NY, USA
- New York-Presbyterian Hospital, New York, NY, USA
| | - Anton Iliuk
- Tymora Analytical Operations, West Lafayette, IN, USA
15
Zhao T, Zhu G, Dubey HV, Flaherty P. Identification of significant gene expression changes in multiple perturbation experiments using knockoffs. Brief Bioinform 2023; 24:bbad084. [PMID: 36892174 PMCID: PMC10025447 DOI: 10.1093/bib/bbad084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 02/13/2023] [Indexed: 03/10/2023] Open
Abstract
Large-scale multiple perturbation experiments have the potential to reveal a more detailed understanding of the molecular pathways that respond to genetic and environmental changes. A key question in these studies is which gene expression changes are important for the response to the perturbation. This problem is challenging because (i) the functional form of the nonlinear relationship between gene expression and the perturbation is unknown and (ii) identification of the most important genes is a high-dimensional variable selection problem. To deal with these challenges, we present here a method based on the model-X knockoffs framework and Deep Neural Networks to identify significant gene expression changes in multiple perturbation experiments. This approach makes no assumptions on the functional form of the dependence between the responses and the perturbations and it enjoys finite sample false discovery rate control for the selected set of important gene expression responses. We apply this approach to the Library of Integrated Network-Based Cellular Signature data sets which is a National Institutes of Health Common Fund program that catalogs how human cells globally respond to chemical, genetic and disease perturbations. We identified important genes whose expression is directly modulated in response to perturbation with anthracycline, vorinostat, trichostatin-a, geldanamycin and sirolimus. We compare the set of important genes that respond to these small molecules to identify co-responsive pathways. Identification of which genes respond to specific perturbation stressors can provide better understanding of the underlying mechanisms of disease and advance the identification of new drug targets.
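The false discovery rate control mentioned in this abstract comes from the selection step of the knockoff filter: each feature gets a statistic W that is large and positive when the real feature matters more than its knockoff copy, and a data-dependent threshold decides which features to keep. The sketch below shows only that thresholding step (the standard "knockoff+" rule); it assumes the W statistics are already available, does not reproduce the authors' deep-network pipeline, and uses illustrative names and made-up values.

```python
def knockoff_plus_threshold(w_stats, fdr=0.1):
    """Smallest t with (1 + #{W <= -t}) / max(1, #{W >= t}) <= fdr."""
    candidates = sorted({abs(w) for w in w_stats if w != 0})
    for t in candidates:
        negatives = sum(1 for w in w_stats if w <= -t)
        positives = sum(1 for w in w_stats if w >= t)
        if (1 + negatives) / max(positives, 1) <= fdr:
            return t
    return float("inf")  # no threshold meets the target: select nothing


def select_features(w_stats, fdr=0.1):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w_stats, fdr)
    return [j for j, w in enumerate(w_stats) if w >= t]


# Hypothetical statistics for ten genes.
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 0.3, -0.2, 0.1]
print(select_features(w, fdr=0.2))  # [0, 1, 2, 3, 4, 5]
```

The negative statistics act as an estimate of how many false positives sit above the threshold, so the rule needs no assumption about the functional form linking expression to the perturbation.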
Affiliation(s)
- Tingting Zhao
- Department of Information Systems and Analytics, College of Business, Bryant University, Smithfield, 02917, RI, USA
- Center for Health and Behavioral Sciences, Bryant University, Smithfield, 02917, RI, USA
| | - Guangyu Zhu
- Department of Computer Science and Statistics, University of Rhode Island, Kingston, 02881, RI, USA
| | - Harsh Vardhan Dubey
- Department of Mathematics & Statistics, University of Massachusetts Amherst, Amherst, 01003, MA, USA
| | - Patrick Flaherty
- Department of Mathematics & Statistics, University of Massachusetts Amherst, Amherst, 01003, MA, USA
16
Pradhan UK, Meher PK, Naha S, Pal S, Gupta A, Parsad R. PlDBPred: a novel computational model for discovery of DNA binding proteins in plants. Brief Bioinform 2023; 24:6840070. [PMID: 36416116 DOI: 10.1093/bib/bbac483] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/26/2022] [Revised: 10/10/2022] [Accepted: 10/11/2022] [Indexed: 11/24/2022] Open
Abstract
DNA-binding proteins (DBPs) play crucial roles in numerous cellular processes including nucleotide recognition, transcriptional control and the regulation of gene expression. The majority of existing computational techniques for identifying DBPs are mainly applicable to human and mouse datasets. Even though some models have been tested on Arabidopsis, they produce poor accuracy when applied to other plant species. Therefore, it is imperative to develop an effective computational model for predicting plant DBPs. In this study, we developed a comprehensive computational model for plant-specific DBP identification. Five shallow learning and six deep learning models were initially used for prediction, where shallow learning methods outperformed deep learning algorithms. In particular, the support vector machine achieved the highest repeated 5-fold cross-validation performance, with 94.0% area under the receiver operating characteristic curve (AUC-ROC) and 93.5% area under the precision-recall curve (AUC-PR). With an independent dataset, the developed approach secured 93.8% AUC-ROC and 94.6% AUC-PR. When compared with state-of-the-art existing tools on an independent dataset, the proposed model achieved much higher accuracy. Overall, the results suggest that the developed computational model is more efficient and reliable than the existing models for the prediction of DBPs in plants. For the convenience of the majority of experimental scientists, the developed prediction server PlDBPred is publicly accessible at https://iasri-sg.icar.gov.in/pldbpred/. The source code is also provided at https://iasri-sg.icar.gov.in/pldbpred/source_code.php for prediction using a large-size dataset.
Affiliation(s)
- Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi-110012, India
| | - Soumen Pal
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi-110012, India
| | - Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi-110012, India
17
Hamraz M, Ali A, Mashwani WK, Aldahmani S, Khan Z. Feature selection for high dimensional microarray gene expression data via weighted signal to noise ratio. PLoS One 2023; 18:e0284619. [PMID: 37098036 PMCID: PMC10128961 DOI: 10.1371/journal.pone.0284619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 04/04/2023] [Indexed: 04/26/2023] Open
Abstract
Feature selection in high dimensional gene expression datasets reduces not only the dimension of the data but also the execution time and computational cost of the underlying classifier. The current study introduces a novel feature selection method called weighted signal to noise ratio (WSNR), which exploits feature weights based on support vectors together with the signal to noise ratio, with the objective of identifying the most informative genes in high dimensional classification problems. The combination of two state-of-the-art procedures enables the extraction of the most informative genes. The corresponding weights of these procedures are then multiplied and arranged in decreasing order. A larger weight indicates a feature's greater discriminatory power in classifying the tissue samples into their true classes. The current method is validated on eight gene expression datasets. Moreover, results of the proposed method (WSNR) are also compared with four well-known feature selection methods. We found that WSNR outperforms the other competing methods on 6 out of 8 datasets. Box-plots and bar-plots of the results of the proposed method and all the other methods are also constructed. The proposed method is further assessed on simulated data. Simulation analysis reveals that WSNR outperforms all the other methods included in the study.
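The ranking rule this abstract describes, multiplying a support-vector-based weight by a signal to noise ratio and sorting in decreasing order, can be sketched in a few lines. The abstract does not give the exact formulas, so this example makes two assumptions: the per-gene weights are supplied by the caller (for instance, absolute coefficients from a linear SVM fitted elsewhere), and the signal to noise ratio takes the common form |mean_a - mean_b| / (sd_a + sd_b). All names and data are hypothetical.

```python
import statistics


def signal_to_noise(values_a, values_b):
    """|difference of class means| divided by the sum of class SDs."""
    mean_gap = abs(statistics.fmean(values_a) - statistics.fmean(values_b))
    noise = statistics.stdev(values_a) + statistics.stdev(values_b)
    return mean_gap / noise if noise else 0.0


def rank_by_weighted_snr(class_a, class_b, weights):
    """Rank genes by |weight| * SNR, highest first.

    class_a, class_b -- lists of samples, each sample a list of gene values
    weights          -- one externally computed weight per gene (assumed input)
    """
    scores = []
    for g, weight in enumerate(weights):
        snr = signal_to_noise([s[g] for s in class_a], [s[g] for s in class_b])
        scores.append(abs(weight) * snr)
    order = sorted(range(len(weights)), key=lambda g: scores[g], reverse=True)
    return order, scores


# Hypothetical expression values: 3 samples per class, 3 genes.
tumor = [[1.0, 5.0, 2.0], [1.2, 5.1, 8.0], [0.8, 4.9, 5.0]]
normal = [[3.0, 5.0, 4.0], [3.2, 5.2, 6.0], [2.8, 4.8, 5.0]]
order, scores = rank_by_weighted_snr(tumor, normal, weights=[0.5, 0.9, 0.1])
print(order[0])  # 0
```

In the toy data the second gene has the largest weight but no separation between classes, so the product pushes it down the ranking, which is the kind of behavior that combining the two criteria is meant to produce.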
Affiliation(s)
- Muhammad Hamraz
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Amjad Ali
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Wali Khan Mashwani
- Institute of Numerical Sciences, Kohat University of Science and Technology, Kohat, Pakistan
| | - Saeed Aldahmani
- Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, UAE
| | - Zardad Khan
- Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, UAE
18
Zarei Ghobadi M, Emamzadeh R, Teymoori-Rad M, Afsaneh E. Exploration of blood-derived coding and non-coding RNA diagnostic immunological panels for COVID-19 through a co-expressed-based machine learning procedure. Front Immunol 2022; 13:1001070. [DOI: 10.3389/fimmu.2022.1001070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative virus of the pandemic coronavirus disease 2019 (COVID-19). Evaluating the immunological factors and other implicated processes underlying the progression of COVID-19 is essential for the recognition and then the design of efficacious therapies. Therefore, we analyzed RNAseq data obtained from PBMCs of COVID-19 patients to explore coding and non-coding RNA diagnostic immunological panels. For this purpose, we integrated multiple RNAseq datasets and analyzed them overall as well as by disease state, including severe and non-severe conditions. Afterward, we utilized a co-expressed-based machine learning procedure comprising weighted gene co-expression analysis and differential gene expression analysis as the filter phase and recursive feature elimination-support vector machine as the wrapper phase. This procedure led to the identification of two modules containing 5 and 84 genes, which are mostly involved in cell dysregulation and innate immune suppression, respectively. Moreover, the role of vitamin D in regulating some classifiers was highlighted. Further analysis disclosed the role of discriminant miRNAs including miR-197-3p, miR-150-5p, miR-340-5p, miR-122-5p, miR-1307-3p, miR-34a-5p, miR-98-5p and their target genes comprising GAN, VWC2, TNFRSF6B, and CHST3 in the metabolic pathways. These classifiers differentiate the final fate of infection toward severe or non-severe COVID-19. The identified classifier genes and miRNAs may help in the proper design of therapeutic procedures considering their involvement in the immune and metabolic pathways.
|
19
|
Chen Z, Shi J, Zhang Y, Zhang J, Li S, Guan L, Jia G. Screening of Serum Biomarkers of Coal Workers' Pneumoconiosis by Metabolomics Combined with Machine Learning Strategy. Int J Environ Res Public Health 2022; 19:ijerph19127051. [PMID: 35742299 PMCID: PMC9222502 DOI: 10.3390/ijerph19127051] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2022] [Revised: 06/04/2022] [Accepted: 06/07/2022] [Indexed: 12/03/2022]
Abstract
Pneumoconiosis remains one of the most serious global occupational diseases. However, effective treatments are lacking, and early detection is crucial for disease prevention. This study aimed to explore serum biomarkers of occupational coal workers’ pneumoconiosis (CWP) by high-throughput metabolomics, combined with a machine learning strategy for precision screening. A case–control study was conducted in Beijing, China, involving 150 pneumoconiosis patients at different stages and 120 healthy controls. Metabolomics found a total of 68 differential metabolites between the CWP group and the control group. Then, potential biomarkers of CWP were screened from these differential metabolites by three machine learning methods. The four most important differential metabolites were identified as benzamide, terazosin, propylparaben and N-methyl-2-pyrrolidone. However, after adjusting for confounding factors, including age, smoking, drinking and chronic diseases, only one metabolite, propylparaben, was significantly correlated with CWP. The more severe the CWP, the higher the serum propylparaben content. Moreover, the receiver operating characteristic (ROC) curve of propylparaben showed good sensitivity and specificity as a biomarker of CWP. Therefore, it was demonstrated that the serum metabolite profiles of CWP patients changed significantly and that the serum metabolites represented by propylparaben were good biomarkers of CWP.
Affiliation(s)
- Zhangjian Chen
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing 100191, China
- Jiaqi Shi
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing 100191, China
- Yi Zhang
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing 100191, China
- Jiahe Zhang
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing 100191, China
- Shuqiang Li
  - Department of Occupational Disease, Peking University Third Hospital, Beijing 100191, China
- Li Guan
  - Department of Occupational Disease, Peking University Third Hospital, Beijing 100191, China
  - Corresponding author
- Guang Jia
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing 100191, China
  - Corresponding author
|
20
|
Ma L, Gong J, Zhao M, Kong X, Gao P, Jiang Y, Liu Y, Feng X, Si S, Cao Y. A Novel Stool Methylation Test for the Non-Invasive Screening of Gastric and Colorectal Cancer. Front Oncol 2022; 12:860701. [PMID: 35419280 PMCID: PMC8995552 DOI: 10.3389/fonc.2022.860701] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/23/2022] [Accepted: 03/02/2022] [Indexed: 11/13/2022]
Abstract
Background Because of poor compliance or low sensitivity, existing diagnostic approaches are unable to provide an efficient diagnosis of patients with gastric and colorectal cancer. Here, we developed the ColoCaller test, which simultaneously detects the methylation status of the SDC2, TFPI2, WIF1, and NDRG4 genes in stool DNA, to optimize the screening of gastric and colorectal cancer in high-risk populations. Methods A total of 217 stool samples from patients with gastrointestinal cancer and from patients with negative endoscopy were prospectively collected, complete with preoperative and postoperative clinical data from patients. The methylation of these samples was detected using ColoCaller, which was designed by selecting CpGs with a two-step screening strategy, and was interpreted using a prediction model built using libSVM to evaluate its clinical value for gastric and colorectal cancer screening. Results Compared to pathological diagnosis, the sensitivity and specificity of the ColoCaller test in 217 stool DNA samples were 95.56% and 91.86%, respectively, for colorectal cancer, and 67.5% and 97.81%, respectively, for gastric cancer. The detection limit was as low as 1% in 8 ng of DNA. Conclusion In this study, we developed and established a new test, ColoCaller, which can be used as a screening tool or as an auxiliary diagnostic approach in high-risk populations with gastric and colorectal cancer to promote timely diagnosis and treatment.
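As a reading aid, the sensitivity and specificity quoted above are the kind of figures obtained from a 2x2 comparison of test calls against the pathological diagnosis. The sketch below shows only that arithmetic; the function name and the counts are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true cancers that the test flags
    specificity = tn / (tn + fp)  # share of cancer-free samples that the test clears
    return sensitivity, specificity


# Hypothetical counts for illustration only (not the ColoCaller data).
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

Reporting both values matters for a screening test: sensitivity depends only on the diseased samples and specificity only on the disease-free ones, so neither changes with the case mix of the cohort.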
Affiliation(s)
- Liang Ma
  - Department of Clinical Laboratory, China-Japan Friendship Hospital, Beijing, China
- Jian Gong
  - Department of Research and Development, Apexbio Biotechnology (Suzhou) Co., Ltd., Suzhou, China
- Meimei Zhao
  - Department of Clinical Laboratory, China-Japan Friendship Hospital, Beijing, China
- Xiaomu Kong
  - Department of Clinical Laboratory, China-Japan Friendship Hospital, Beijing, China
- Peng Gao
  - Department of Clinical Laboratory, China-Japan Friendship Hospital, Beijing, China
- Yongwei Jiang
  - Department of Clinical Laboratory, China-Japan Friendship Hospital, Beijing, China
- Yi Liu
  - Department of Clinical Laboratory, China-Japan Friendship Hospital, Beijing, China
- Xiaoyan Feng
  - Department of Research and Development, Apexbio Biotechnology (Suzhou) Co., Ltd., Suzhou, China
- Shuang Si
  - Department of General Surgery, China-Japan Friendship Hospital, Beijing, China
- Yongtong Cao
  - Department of Clinical Laboratory, China-Japan Friendship Hospital, Beijing, China
|
21
|
Tan MS, Cheah PL, Chin AV, Looi LM, Chang SW. A review on omics-based biomarkers discovery for Alzheimer's disease from the bioinformatics perspectives: Statistical approach vs machine learning approach. Comput Biol Med 2021; 139:104947. [PMID: 34678481 DOI: 10.1016/j.compbiomed.2021.104947] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/21/2021] [Revised: 10/12/2021] [Accepted: 10/12/2021] [Indexed: 12/26/2022]
Abstract
Alzheimer's Disease (AD) is a neurodegenerative disease that affects cognition and is the most common cause of dementia in the elderly. As the number of elderly individuals increases globally, the incidence and prevalence of AD are expected to increase. At present, AD is diagnosed clinically, according to accepted criteria. The essential elements in the diagnosis of AD include a patient's history, a physical examination and neuropsychological testing, in addition to appropriate investigations such as neuroimaging. The omics-based approach is an emerging field of study that may not only aid in the diagnosis of AD but also facilitate the exploration of factors that influence the development of the disease. Omics techniques, including genomics, transcriptomics, proteomics and metabolomics, may reveal the pathways that lead to neuronal death and identify biomolecular markers associated with AD. This will further facilitate an understanding of AD neuropathology. In this review, omics-based approaches that were implemented in studies on AD were assessed from a bioinformatics perspective. Current state-of-the-art statistical and machine learning approaches used in the single omics analysis of AD were compared based on correlations of variants, differential expression, functional analysis and network analysis. This was followed by a review of the approaches used in the integration and analysis of multi-omics data for AD. The strengths and limitations of multi-omics analysis methods were explored and the issues and challenges associated with omics studies of AD were highlighted. Lastly, future studies in this area of research were justified.
Affiliation(s)
- Mei Sze Tan
  - Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Phaik-Leng Cheah
  - Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Ai-Vyrn Chin
  - Division of Geriatric Medicine, Department of Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Lai-Meng Looi
  - Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Siow-Wee Chang
  - Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
|
22
|
Zhang Y, Wei X, Cao C, Yu F, Li W, Zhao G, Wei H, Zhang F, Meng P, Sun S, Lammi MJ, Guo X. Identifying discriminative features for diagnosis of Kashin-Beck disease among adolescents. BMC Musculoskelet Disord 2021; 22:801. [PMID: 34537022 PMCID: PMC8449456 DOI: 10.1186/s12891-021-04514-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/05/2021] [Accepted: 07/07/2021] [Indexed: 11/23/2022]
Abstract
INTRODUCTION Kashin-Beck disease (KBD) damages multiple joints and presents with variable clinical symptoms, which makes its diagnosis challenging for clinical practitioners. However, it is still unclear which clinical features of KBD are most informative for diagnosing the disease among adolescents. METHODS We first manually extracted 26 candidate features, including clinical manifestations and pathological changes on X-ray images, from 400 KBD and 400 non-KBD adolescents. With these features, we applied four classification methods, i.e., random forest algorithms (RFA), artificial neural networks (ANNs), support vector machines (SVMs) and linear regression (LR), together with four feature selection methods, i.e., RFA, minimum redundancy maximum relevance (mRMR), support vector machine recursive feature elimination (SVM-RFE) and Relief. The diagnostic performance of the different classification models was evaluated by sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC). RESULTS Our results demonstrated that 10 of the 26 features were consistently more discriminative, regardless of the classification model and feature selection method chosen. These ten discriminative features were alterations of the distal end of the phalanges, the metaphysis and the carpals, together with the clinical manifestations of ankle joint movement limitation, enlarged finger joints, flexion of the distal part of the fingers, elbow joint movement limitation, squatting limitation, deformed finger joints and wrist joint movement limitation. CONCLUSIONS The ten selected discriminative features could provide a fast, effective diagnostic standard for adolescents with KBD.
Affiliation(s)
- Yanan Zhang
  - School of Public Health, Xi'an Jiaotong University, Key Laboratory of Trace Elements and Endemic Diseases, National Health Commission of the People's Republic of China, Xi'an, Shaanxi, P.R. China
- Xiaoli Wei
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi, P.R. China
- Chunxia Cao
  - Institute of Disaster Medicine, Tianjin University, Tianjin, P.R. China
- Fangfang Yu
  - Department of Health Statistics, College of Public Health, Zhengzhou University, Zhengzhou, P.R. China
- Wenrong Li
  - School of Public Health, Xi'an Jiaotong University, Key Laboratory of Trace Elements and Endemic Diseases, National Health Commission of the People's Republic of China, Xi'an, Shaanxi, P.R. China
  - Department of Medical Imaging, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, P.R. China
- Guanghui Zhao
  - Xi'an Honghui Hospital, Health Science Center of Xi'an Jiaotong University, Xi'an, Shaanxi, P.R. China
- Haiyan Wei
  - School of Public Health, Xi'an Jiaotong University, Key Laboratory of Trace Elements and Endemic Diseases, National Health Commission of the People's Republic of China, Xi'an, Shaanxi, P.R. China
- Feng'e Zhang
  - School of Public Health, Xi'an Jiaotong University, Key Laboratory of Trace Elements and Endemic Diseases, National Health Commission of the People's Republic of China, Xi'an, Shaanxi, P.R. China
- Peilin Meng
  - School of Public Health, Xi'an Jiaotong University, Key Laboratory of Trace Elements and Endemic Diseases, National Health Commission of the People's Republic of China, Xi'an, Shaanxi, P.R. China
- Shiquan Sun
  - School of Public Health, Xi'an Jiaotong University, Key Laboratory of Trace Elements and Endemic Diseases, National Health Commission of the People's Republic of China, Xi'an, Shaanxi, P.R. China
- Mikko Juhani Lammi
  - School of Public Health, Xi'an Jiaotong University, Key Laboratory of Trace Elements and Endemic Diseases, National Health Commission of the People's Republic of China, Xi'an, Shaanxi, P.R. China
  - Department of Integrative Medical Biology, University of Umeå, 90187, Umeå, Sweden
- Xiong Guo
  - School of Public Health, Xi'an Jiaotong University, Key Laboratory of Trace Elements and Endemic Diseases, National Health Commission of the People's Republic of China, Xi'an, Shaanxi, P.R. China
|
23
|
Greco FA, McKee AC, Kowall NW, Hanlon EB. Near-Infrared Optical Spectroscopy In Vivo Distinguishes Subjects with Alzheimer's Disease from Age-Matched Controls. J Alzheimers Dis 2021; 82:791-802. [PMID: 34092628 PMCID: PMC8385529 DOI: 10.3233/jad-201021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Background: Medical imaging methods such as PET and MRI aid clinical assessment of Alzheimer’s disease (AD). Less expensive, less technically demanding, and more widely deployable technologies are needed to expand objective screening for diagnosis, treatment, and research. We previously reported brain tissue near-infrared optical spectroscopy (NIR) in vitro indicating the potential to meet this need. Objective: To determine whether completely non-invasive, clinical, NIR in vivo can distinguish AD patients from age-matched controls and to show the potential of NIR as a clinical screen and monitor of therapeutic efficacy. Methods: NIR spectra were acquired in vivo. Three groups were studied: autopsy-confirmed AD, control and mild cognitive impairment (MCI). A feature selection approach using the first derivative of the intensity normalized spectra was used to discover spectral regions that best distinguished “AD-alone” (i.e., without other significant neuropathology) from controls. The approach was then applied to other autopsy-confirmed AD cases and to clinically diagnosed MCI cases. Results: Two regions about 860 and 895 nm completely separate AD patients from controls and differentiate MCI subjects according to the degree of impairment. The 895 nm feature is more important in separating MCI subjects from controls (ratio-of-weights: 1.3); the 860 nm feature is more important for distinguishing MCI from AD (ratio-of-weights: 8.2). Conclusion: These results form a proof of the concept that near-infrared spectroscopy can detect and classify diseased and normal human brain in vivo. A clinical trial is needed to determine whether the two features can track disease progression and monitor potential therapeutic interventions.
Affiliation(s)
- Frank A Greco
  - VA Bedford Healthcare System, Medical Research & Development Service, Bedford, MA, USA
- Ann C McKee
  - VA Bedford Healthcare System, Medical Research & Development Service, Bedford, MA, USA
  - VA Boston Healthcare System, Neurology Service, Boston, MA, USA
  - Boston University School of Medicine, Alzheimer's Disease Center, and Chronic Traumatic Encephalopathy Center, Boston, MA, USA
  - Boston University School of Medicine, Department of Pathology and Laboratory Medicine, and Department of Neurology, Boston, MA, USA
- Neil W Kowall
  - VA Boston Healthcare System, Neurology Service, Boston, MA, USA
  - Boston University School of Medicine, Alzheimer's Disease Center, and Chronic Traumatic Encephalopathy Center, Boston, MA, USA
  - Boston University School of Medicine, Department of Pathology and Laboratory Medicine, and Department of Neurology, Boston, MA, USA
- Eugene B Hanlon
  - VA Bedford Healthcare System, Medical Research & Development Service, Bedford, MA, USA
|
24
|
Chen Z, Han S, Zhang J, Zheng P, Liu X, Zhang Y, Jia G. Metabolomics screening of serum biomarkers for occupational exposure of titanium dioxide nanoparticles. Nanotoxicology 2021; 15:832-849. [PMID: 33961536 DOI: 10.1080/17435390.2021.1921872] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022]
Abstract
Although nanotoxicology studies have shown that respiratory exposure to titanium dioxide nanoparticles (TiO2 NPs) could induce adverse health effects, few biomarkers associated with occupational exposure to TiO2 NPs have been reported. The purpose of this study is to screen serum biomarkers among workers occupationally exposed to TiO2 NPs using metabolomics. Compared with the control group, a total of 296 serum metabolites were differentially expressed in the TiO2 NPs-exposed group, of which the relative expression of 265 metabolites increased and that of the remaining 31 decreased. Three machine learning methods, random forest (RF), support vector machines (SVM) and Boruta, screened eight potential biomarkers and simultaneously selected one metabolite, liquoric acid. After multiple linear regression analysis to adjust for confounding factors such as gender, age, BMI, smoking and drinking, occupational exposure to TiO2 NPs remained significantly related to the relative expression of the eight potential biomarkers. Meanwhile, the receiver operating characteristic (ROC) curves of these potential biomarkers showed good sensitivity and specificity. These potential biomarkers were related to lipid peroxidation and had a biological basis for occupational exposure to TiO2 NPs. Therefore, it was demonstrated that the serum metabolites represented by liquoric acid were good biomarkers of occupational exposure to TiO2 NPs.
Affiliation(s)
- Zhangjian Chen
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing, China
- Shuo Han
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing, China
- Jiahe Zhang
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing, China
- Pai Zheng
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing, China
- Xiaodong Liu
  - Beijing Institute of Occupational Disease Prevention and Treatment, Beijing, China
- Yuanyuan Zhang
  - Beijing Institute of Occupational Disease Prevention and Treatment, Beijing, China
- Guang Jia
  - Department of Occupational and Environmental Health Sciences, School of Public Health, Peking University, Beijing, China
|
25
|
Integrated meta-analysis and machine learning approach identifies acyl-CoA thioesterase with other novel genes responsible for biofilm development in Staphylococcus aureus. Infect Genet Evol 2021; 88:104702. [PMID: 33388440 DOI: 10.1016/j.meegid.2020.104702] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/23/2020] [Revised: 12/24/2020] [Accepted: 12/29/2020] [Indexed: 02/08/2023]
Abstract
Biofilm forming Staphylococcus aureus is a major threat to the health-care industry. It is important to understand the differences between planktonic and biofilm growth forms in the pathogen since conventional treatments targeting the planktonic forms are not effective against biofilms. The current study conducts a meta-analysis of three public transcriptomic profiles to examine the differences in gene expression between the planktonic and biofilm states of S. aureus using random-effects modeling. Mean effect sizes were calculated for 2847 genes among which 726 differentially expressed genes were taken for further analysis. Major genes that are discriminatory between the two conditions were mined using supervised learning techniques and validated by high-accuracy classifiers. Ten different feature selection algorithms were applied and used to rank the most important genes in S. aureus biofilms. Finally, an optimal set of 36 genes are presented as candidate genes in biofilm formation or development while throwing light on the novel roles of an acyl-CoA thioesterase enzyme and 10 hypothetical proteins in biofilms. The relevance of the identified gene set was further validated by building five different classification models using SVM, RF, kNN, NB and DT algorithms that were compared with models built from other relevant gene sets and by reviewing the functional role of 25 previously known genes in biofilm development. The study combines meta-analysis of differential expression with supervised machine learning strategies and feature selection for the first time to identify and validate a discriminatory set of genes important in biofilms of S. aureus. The functional roles of the identified genes predicted to be important in biofilms are further scrutinized and can be considered as a signature target list to develop anti-biofilm therapeutics in S. aureus.
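The abstract says only that mean effect sizes were obtained by random-effects modeling and does not name the estimator. As an illustration, the sketch below pools one gene's effect sizes across studies with the DerSimonian-Laird estimator, which is an assumption made here rather than a detail from the study; the function name and input values are likewise hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Pool per-study effect sizes under a random-effects model.

    Returns (pooled_effect, tau2), where tau2 is the estimated
    between-study variance (DerSimonian-Laird, truncated at zero).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need matching lists with at least two studies")
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2


# Hypothetical effect sizes and variances for one gene in three studies.
pooled, tau2 = dersimonian_laird([0.8, 1.1, 0.5], [0.04, 0.09, 0.05])
print(pooled, tau2)
```

When the studies agree closely, tau2 is truncated to zero and the result reduces to the ordinary inverse-variance (fixed-effect) average.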
|
26
|
Hamraz M, Gul N, Raza M, Khan DM, Khalil U, Zubair S, Khan Z. Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments. PeerJ Comput Sci 2021; 7:e562. [PMID: 34141889 PMCID: PMC8176540 DOI: 10.7717/peerj-cs.562] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Accepted: 05/04/2021] [Indexed: 05/10/2023]
Abstract
In this paper, a novel feature selection method for microarray gene expression datasets, called Robust Proportional Overlapping Score (RPOS), is proposed; it uses a robust measure of dispersion, the Median Absolute Deviation (MAD). This method robustly identifies the most discriminative genes by considering the overlapping scores of the gene expression values for binary class problems. Genes with a high degree of overlap between classes are discarded and the ones that discriminate between the classes are selected. The results of the proposed method are compared with five state-of-the-art gene selection methods based on classification error, Brier score, and sensitivity, using eleven gene expression datasets. Classification of observations for different sets of selected genes by the proposed method is carried out by three different classifiers, i.e., random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box-plots and stability scores of the results are also shown in this paper. The results reveal that in most cases the proposed method outperforms the other methods.
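To make the idea of a MAD-based overlap concrete, the sketch below computes the median absolute deviation and a simple overlap measure between the median ± MAD intervals of two classes. This is an illustrative simplification and not the RPOS formula, which the abstract does not give; the function names and the choice of interval are assumptions.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    values = list(values)
    m = median(values)
    return median(abs(x - m) for x in values)


def robust_interval(values):
    """Return the interval (median - MAD, median + MAD) for one class."""
    values = list(values)
    m, d = median(values), mad(values)
    return m - d, m + d


def interval_overlap(class_a, class_b):
    """Overlap of the two robust intervals as a fraction of their joint span.

    0.0 means the intervals are disjoint (a discriminative gene);
    1.0 means they coincide (an uninformative gene).
    """
    lo_a, hi_a = robust_interval(class_a)
    lo_b, hi_b = robust_interval(class_b)
    overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    return overlap / span if span > 0 else 1.0


# Hypothetical expression values of one gene in two classes.
print(interval_overlap([1, 2, 3], [10, 11, 12]))
```

Because the median and MAD ignore the magnitude of extreme values, a single outlying sample barely moves the interval, which is the robustness property the abstract appeals to.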
Affiliation(s)
- Muhammad Hamraz
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Naz Gul
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mushtaq Raza
  - Department of Computer Sciences, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Dost Muhammad Khan
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Umair Khalil
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Seema Zubair
  - Department of Mathematics, Statistics and Computer Science, University of Agriculture Peshawar, Peshawar, Pakistan
- Zardad Khan
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
|
27
|
Yousef M, Kumar A, Bakir-Gungor B. Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data. Entropy (Basel) 2020; 23:E2. [PMID: 33374969 PMCID: PMC7821996 DOI: 10.3390/e23010002] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/27/2020] [Revised: 12/14/2020] [Accepted: 12/16/2020] [Indexed: 12/19/2022]
Abstract
In the last two decades, there have been massive advancements in high throughput technologies, which resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. It is possible to unravel biomarkers by comparing the gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, drug A vs. drug B, etc. This problem refers to a well-studied problem in the machine learning domain, i.e., the feature selection problem. In biological data analysis, most of the computational feature selection methodologies were taken from other fields, without considering the nature of the biological data. Thus, integrative approaches that utilize the biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes considering both the statistical metrics that are applied to the gene expression data, and the biological background information which is provided as external datasets. One of the main goals of this review is to explore the existing methods that integrate different types of information in order to improve the identification of the biomolecular signatures of diseases and the discovery of new potential targets for treatment. These integrative approaches are expected to aid the prediction, diagnosis, and treatment of diseases, as well as to enlighten us on disease state dynamics, mechanisms of their onset and progression. The integration of various types of biological information will necessitate the development of novel techniques for integration and data analysis. Another aim of this review is to boost the bioinformatics community to develop new approaches for searching and determining significant groups/clusters of features based on one or more biological grouping functions.
Affiliation(s)
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat 13206, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat 13206, Israel
- Abhishek Kumar
  - Institute of Bioinformatics, International Technology Park, Bangalore 560066, India
  - Manipal Academy of Higher Education (MAHE), Manipal 576104, India
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri 38080, Turkey
|
28
|
Laufer BI, Hwang H, Jianu JM, Mordaunt CE, Korf IF, Hertz-Picciotto I, LaSalle JM. Low-pass whole genome bisulfite sequencing of neonatal dried blood spots identifies a role for RUNX1 in Down syndrome DNA methylation profiles. Hum Mol Genet 2020; 29:3465-3476. [PMID: 33001180 PMCID: PMC7788293 DOI: 10.1093/hmg/ddaa218] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/13/2020] [Revised: 09/16/2020] [Accepted: 09/25/2020] [Indexed: 12/17/2022]
Abstract
Neonatal dried blood spots (NDBS) are a widely banked sample source that enables retrospective investigation into early life molecular events. Here, we performed low-pass whole genome bisulfite sequencing (WGBS) of 86 NDBS DNA to examine early life Down syndrome (DS) DNA methylation profiles. DS represents an example of genetics shaping epigenetics, as multiple array-based studies have demonstrated that trisomy 21 is characterized by genome-wide alterations to DNA methylation. By assaying over 24 million CpG sites, thousands of genome-wide significant (q < 0.05) differentially methylated regions (DMRs) that distinguished DS from typical development and idiopathic developmental delay were identified. Machine learning feature selection refined these DMRs to 22 loci. The DS DMRs mapped to genes involved in neurodevelopment, metabolism, and transcriptional regulation. Based on comparisons with previous DS methylation studies and reference epigenomes, the hypermethylated DS DMRs were significantly (q < 0.05) enriched across tissues while the hypomethylated DS DMRs were significantly (q < 0.05) enriched for blood-specific chromatin states. A ~28 kb block of hypermethylation was observed on chromosome 21 in the RUNX1 locus, which encodes a hematopoietic transcription factor whose binding motif was the most significantly enriched (q < 0.05) overall and specifically within the hypomethylated DMRs. Finally, we also identified DMRs that distinguished DS NDBS based on the presence or absence of congenital heart disease (CHD). Together, these results not only demonstrate the utility of low-pass WGBS on NDBS samples for epigenome-wide association studies, but also provide new insights into the early life mechanisms of epigenomic dysregulation resulting from trisomy 21.
Affiliation(s)
- Benjamin I Laufer
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA 95616, USA
  - Genome Center, University of California, Davis, CA 95616, USA
  - MIND Institute, University of California, Davis, CA 95616, USA
- Hyeyeon Hwang
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA 95616, USA
  - Genome Center, University of California, Davis, CA 95616, USA
  - MIND Institute, University of California, Davis, CA 95616, USA
- Julia M Jianu
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA 95616, USA
  - Genome Center, University of California, Davis, CA 95616, USA
  - MIND Institute, University of California, Davis, CA 95616, USA
- Charles E Mordaunt
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA 95616, USA
  - Genome Center, University of California, Davis, CA 95616, USA
  - MIND Institute, University of California, Davis, CA 95616, USA
- Ian F Korf
  - Genome Center, University of California, Davis, CA 95616, USA
  - Department of Molecular and Cellular Biology, College of Biological Sciences, University of California, Davis, CA 95616, USA
- Irva Hertz-Picciotto
  - MIND Institute, University of California, Davis, CA 95616, USA
  - Department of Public Health Sciences, School of Medicine, University of California, Davis, CA 95616, USA
- Janine M LaSalle
  - Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, CA 95616, USA
  - Genome Center, University of California, Davis, CA 95616, USA
  - MIND Institute, University of California, Davis, CA 95616, USA
|