1
Discovering common pathogenetic processes between COVID-19 and diabetes mellitus by differential gene expression pattern analysis. Brief Bioinform 2021; 22:bbab262. [PMID: 34260684 PMCID: PMC8344483 DOI: 10.1093/bib/bbab262] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/04/2021] [Revised: 05/28/2021] [Accepted: 06/21/2021] [Indexed: 01/08/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by the newly discovered coronavirus SARS-CoV-2. Increased severity of COVID-19 has been observed in patients with diabetes mellitus (DM). This study aimed to identify transcriptional signatures, regulators and pathways common to COVID-19 and DM. We integrated human whole-genome transcriptomic datasets from COVID-19 and DM, followed by functional assessment with gene ontology (GO) and pathway analyses. In peripheral blood mononuclear cells (PBMCs), 32 differentially expressed genes (DEGs) were commonly upregulated in COVID-19 and type 2 diabetes (T2D), while 10 DEGs were commonly downregulated. For type 1 diabetes (T1D), 21 DEGs were commonly upregulated and 29 DEGs were commonly downregulated in COVID-19 and T1D. Moreover, 35 DEGs were commonly upregulated in SARS-CoV-2-infected pancreas organoids and T2D islets, while 14 were commonly downregulated. Several GO terms were shared between COVID-19 and DM. Prediction of the putative transcription factors involved in the upregulation of genes in COVID-19 and DM implicated RELA in both PBMCs and pancreas. Here, for the first time, we have characterized the biological processes and pathways commonly dysregulated in COVID-19 and DM, which could be used in the near future to design personalized treatment for COVID-19 patients with DM as a comorbidity.
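The overlap step described in this abstract, finding genes regulated in the same direction in two conditions, can be illustrated with plain set intersections. This is a minimal sketch, not the authors' pipeline: the function name `common_degs`, the gene symbols and the fold-change values are hypothetical placeholders, and each input is assumed to be an already filtered DEG table.

```python
def common_degs(degs_a, degs_b):
    """Return genes regulated in the same direction in both DEG tables.

    Each argument maps a gene symbol to its log2 fold change and is
    assumed to contain only genes that already passed significance filters.
    """
    up_a = {g for g, lfc in degs_a.items() if lfc > 0}
    up_b = {g for g, lfc in degs_b.items() if lfc > 0}
    down_a = {g for g, lfc in degs_a.items() if lfc < 0}
    down_b = {g for g, lfc in degs_b.items() if lfc < 0}
    return {"up": sorted(up_a & up_b), "down": sorted(down_a & down_b)}


# Hypothetical placeholder data, not values from the study.
covid = {"GENE_A": 2.1, "GENE_B": -1.4, "GENE_C": 1.2, "GENE_D": -0.9}
t2d = {"GENE_A": 1.3, "GENE_B": -2.0, "GENE_C": -1.1, "GENE_E": 1.8}

shared = common_degs(covid, t2d)
print(shared)
```

A gene such as `GENE_C`, which changes in opposite directions in the two conditions, is deliberately excluded from both lists.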
2
rMisbeta: A robust missing value imputation approach in transcriptomics and metabolomics data. Comput Biol Med 2021; 138:104911. [PMID: 34634637 DOI: 10.1016/j.compbiomed.2021.104911] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/20/2021] [Revised: 09/25/2021] [Accepted: 09/25/2021] [Indexed: 12/14/2022]
Abstract
Transcriptomics and metabolomics data often contain missing values or outliers due to limitations of the data acquisition techniques. Most statistical methods require complete datasets for downstream analysis. A number of missing value imputation methods have been developed using the classical mean and variance based on maximum likelihood estimators, which are not robust against outliers. Consequently, the performance of these methods deteriorates in the presence of outliers. Hence, precise imputation of missing values and handling of outliers are both important and must be addressed together. Therefore, in this paper, we developed a robust iterative approach using robust estimators based on the minimum beta divergence method, which simultaneously imputes missing values and handles outliers. We investigated the performance of the proposed method in comparison with six frequently used missing value imputation methods, namely Zero, KNN, robust SVD, EM, random forest (RF) and weighted least square approach (WLSA), through feature selection using both simulated and real datasets. Ten performance indices were used to identify the optimal method: Frobenius norm (FOBN), accuracy (ACC), sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), detection rate (DR), misclassification error rate (MER), the area under the ROC curve (AUC) and computational runtime. Evaluation on both simulated and real data suggests that the proposed method outperforms the other traditional methods across various rates of outliers and missing values. In the absence of outliers, the suggested approach performs almost as well as the other methods. The proposed method is accurate, simple, and requires less computational time than the other methods. Therefore, we recommend applying the proposed procedure for large-scale transcriptomics and metabolomics data analysis.
The computational tool has been implemented in an R package, which is publicly available from https://CRAN.R-project.org/package=rMisbeta.
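To make the idea of outlier-resistant imputation concrete, the sketch below estimates a mean and variance with exponentially down-weighted observations, then replaces missing values and strongly down-weighted values with the robust mean. It is a simplified univariate illustration under stated assumptions, not the rMisbeta algorithm or its API: the weight form `exp(-beta/2 * z^2)`, the `(1 + beta)` variance correction, the median/MAD starting values, and the names `beta_weighted_estimates`, `impute_feature` and `weight_cutoff` are all choices made here for illustration.

```python
import math
import statistics


def beta_weighted_estimates(values, beta=0.2, n_iter=50):
    """Iteratively reweighted mean and variance (illustrative assumption)."""
    mu = statistics.median(values)
    mad = statistics.median([abs(v - mu) for v in values])
    var = max((1.4826 * mad) ** 2, 1e-12)  # robust starting variance
    for _ in range(n_iter):
        # Points far from the current mean receive weights close to zero.
        weights = [math.exp(-0.5 * beta * (v - mu) ** 2 / var) for v in values]
        total = sum(weights)
        if total == 0.0:
            break
        mu = sum(w * v for w, v in zip(weights, values)) / total
        spread = sum(w * (v - mu) ** 2 for w, v in zip(weights, values)) / total
        var = max((1 + beta) * spread, 1e-12)
    return mu, var


def impute_feature(row, beta=0.2, weight_cutoff=0.1):
    """Replace None and low-weight (outlying) values with the robust mean."""
    observed = [v for v in row if v is not None]
    mu, var = beta_weighted_estimates(observed, beta)
    filled = []
    for v in row:
        if v is None:
            filled.append(mu)
            continue
        weight = math.exp(-0.5 * beta * (v - mu) ** 2 / var)
        filled.append(mu if weight < weight_cutoff else v)
    return filled


# Hypothetical intensities for one feature: one outlier and one missing value.
row = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0, None]
filled = impute_feature(row)
print(filled)
```

Because the outlier's weight is effectively zero, it does not pull the estimated mean away from the bulk of the data, which is the property a classical mean lacks.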
3
A network-based systems biology approach for identification of shared Gene signatures between male and female in COVID-19 datasets. INFORMATICS IN MEDICINE UNLOCKED 2021; 25:100702. [PMID: 34423108 PMCID: PMC8372456 DOI: 10.1016/j.imu.2021.100702] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/15/2021] [Revised: 08/12/2021] [Accepted: 08/13/2021] [Indexed: 12/14/2022] Open
Abstract
The novel coronavirus (SARS-CoV-2), the cause of COVID-19, has spread rapidly and now affects more than 150 countries worldwide. SARS-CoV-2 mainly affects the human respiratory system and can lead to serious illness or even death in the presence of comorbidities. However, most people infected with SARS-CoV-2 show mild to moderate symptoms, and no medication is suggested. Still, drugs developed for other diseases have been used to treat COVID-19. Nevertheless, the absence of vaccines and proper drugs against SARS-CoV-2 has increased the mortality rate. Although sex is a risk factor for COVID-19, none of the studies has considered it when identifying biomarkers from RNA-seq count datasets. Men are more likely to develop severe symptoms with different comorbidities and show greater mortality than women. From this standpoint, we aimed to identify shared gene signatures between males and females from the human COVID-19 RNA-seq count dataset of peripheral blood cells using a robust voom approach. We identified 1341 overlapping differentially expressed genes (DEGs) between the male and female datasets. Gene ontology (GO) annotation and pathway enrichment analysis revealed that the DEGs are involved in various biological process (BP) categories, such as nucleosome assembly, DNA conformation change and DNA packaging, and in KEGG pathways such as cell cycle, ECM-receptor interaction and progesterone-mediated oocyte maturation. Ten hub proteins (UBC, KIAA0101, APP, CDK1, SUMO2, SP1, FN1, CDK2, E2F1, and TP53) were identified using PPI network analysis. The top three miRNAs (mir-17-5p, mir-20a-5p, mir-93-5p) and TFs (PPARG, E2F1 and KLF5) were also uncovered. Finally, the top ten significant drugs (roscovitine, curcumin, simvastatin, fulvestrant, troglitazone, alvocidib, L-alanine, tamoxifen, serine, and doxorubicin) were retrieved by drug repurposing analysis of the overlapping DEGs; these might be therapeutic agents for COVID-19.
4
Bioinformatics and machine learning approach identifies potential drug targets and pathways in COVID-19. Brief Bioinform 2021; 22:6220170. [PMID: 33839760 PMCID: PMC8083354 DOI: 10.1093/bib/bbab120] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 12/29/2020] [Revised: 02/15/2021] [Accepted: 03/13/2021] [Indexed: 12/12/2022] Open
Abstract
The current coronavirus disease 2019 (COVID-19) pandemic has caused massive loss of life. Clinical trials of vaccines and drugs are being conducted around the world; however, no effective drug is yet available for COVID-19. Identification of key genes and perturbed pathways in COVID-19 may uncover potential drug targets and biomarkers. We aimed to identify key gene modules and hub targets involved in COVID-19. We analyzed SARS-CoV-2-infected peripheral blood mononuclear cell (PBMC) transcriptomic data through gene coexpression analysis. We identified 1520 and 1733 differentially expressed genes (DEGs) from the GSE152418 and CRA002390 PBMC datasets, respectively (FDR < 0.05). We found four key gene modules and a hub gene signature based on module membership (MMhub) statistics and protein-protein interaction (PPI) networks (PPIhub). Functional annotation by enrichment analysis of the genes in these modules showed that the DEGs were enriched in immune and inflammatory response biological processes. Pathway analysis revealed that the hub genes were enriched in the IL-17 signaling and cytokine-cytokine receptor interaction pathways. We then demonstrated the classification performance of the hub genes (PLK1, AURKB, AURKA, CDK1, CDC20, KIF11, CCNB1, KIF2C, DTL and CDC6), with accuracy >0.90 suggesting their biomarker potential. The regulatory network analysis showed transcription factors and microRNAs that target these hub genes. Finally, drug-gene interaction analysis suggested amsacrine, BRD-K68548958, naproxol, palbociclib and teniposide as the top-scored repurposed drugs. The identified biomarkers and pathways might be therapeutic targets for COVID-19.
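Hub selection from a PPI network is often done by ranking nodes on a centrality measure. The sketch below uses node degree, which is an assumption made here for illustration; the abstract does not state which topological measure defined its PPIhub set. The function name `hub_genes` and the node labels `P1` to `P5` are hypothetical.

```python
from collections import Counter


def hub_genes(edges, top_k):
    """Rank nodes of an undirected interaction network by degree."""
    # Drop self-loops and duplicate edges listed in either orientation.
    unique_edges = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for a, b in unique_edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so that ties are deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical placeholder network, not interactions from the study.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P3", "P1"), ("P4", "P5"),
]

hubs = hub_genes(edges, top_k=2)
print(hubs)
```

Other centrality measures, such as betweenness, can rank nodes differently, so the choice of measure should be reported alongside any hub list.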
5
Integrative transcriptomics analysis of lung epithelial cells and identification of repurposable drug candidates for COVID-19. Eur J Pharmacol 2020; 887:173594. [PMID: 32971089 PMCID: PMC7505772 DOI: 10.1016/j.ejphar.2020.173594] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 05/17/2020] [Revised: 09/09/2020] [Accepted: 09/21/2020] [Indexed: 12/14/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) disease, more commonly known as COVID-19, has emerged as a global pandemic. A couple of treatment methods exist for COVID-19; however, well-established drugs and vaccines are urgently needed. New drug discovery is a tremendous challenge; repurposing existing drugs could reduce the time and expense compared with de novo drug development. In this study, we aimed to decode the molecular signatures and pathways of host cells in response to SARS-CoV-2 and to rapidly identify repurposable drugs using bioinformatics and network biology strategies. We analyzed available transcriptomic RNA-seq COVID-19 data to identify differentially expressed genes (DEGs). We detected 177 DEGs specific to COVID-19, of which 122 were upregulated and 55 were downregulated compared to control (FDR<0.05 and logFC ≥ 1). The DEGs were significantly involved in the immune and inflammatory response. Pathway analysis revealed that the DEGs were found in influenza A, measles, cytokine signaling in the immune system, interleukin-4, interleukin-13, interleukin-17 signaling, and TNF signaling pathways. Protein-protein interaction analysis showed 10 hub genes (BIRC3, ICAM1, IRAK2, MAP3K8, S100A8, SOCS3, STAT5A, TNF, TNFAIP3, TNIP1). The regulatory network analysis showed significant transcription factors (TFs) that target DEGs, namely FOXC1, GATA2, YY1, FOXL1, NFKB1. Finally, drug repositioning analysis was performed with these 10 hub genes, and three drugs were validated in silico with molecular docking. The transcriptomic signatures, molecular pathways, and regulatory biomolecules shed light on candidate biomarkers and drug targets that have potential roles in managing COVID-19. ICAM1 and TNFAIP3 were the key hubs that demonstrated good binding affinities with the repurposed drug candidates.
Dabrafenib, radicicol, and AT-7519 were the top-scored repurposed drugs and showed efficient docking results when tested with the hub genes. The identified drugs should be further evaluated in molecular-level wet-lab experiments prior to clinical studies in the treatment of COVID-19.
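The DEG thresholds quoted in this abstract (FDR<0.05 and logFC ≥ 1) can be illustrated with a small filter. The sketch assumes that FDR means Benjamini-Hochberg adjusted p-values and that the fold-change cutoff applies to the absolute log fold change, since downregulated genes are also reported; neither detail is stated in the abstract. The function names and the example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and downregulated DEG lists.

    `results` maps gene -> (log fold change, raw p-value).
    """
    genes = list(results)
    fdr = benjamini_hochberg([results[g][1] for g in genes])
    up, down = [], []
    for gene, q in zip(genes, fdr):
        lfc = results[gene][0]
        if q < fdr_cutoff and abs(lfc) >= lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical placeholder statistics, not values from the study.
results = {
    "GENE_A": (2.3, 0.0001),
    "GENE_B": (-1.7, 0.004),
    "GENE_C": (0.4, 0.003),
    "GENE_D": (1.5, 0.20),
    "GENE_E": (-1.1, 0.03),
}

up, down = select_degs(results)
print(up, down)
```

Here `GENE_C` is significant but fails the fold-change cutoff, while `GENE_D` passes the fold-change cutoff but is not significant after adjustment.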
6
Identification of molecular signatures and pathways to identify novel therapeutic targets in Alzheimer's disease: Insights from a systems biomedicine perspective. Genomics 2019; 112:1290-1299. [PMID: 31377428 DOI: 10.1016/j.ygeno.2019.07.018] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 04/06/2019] [Revised: 07/01/2019] [Accepted: 07/30/2019] [Indexed: 12/20/2022]
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disease characterized by the accumulation of amyloid plaques and neurofibrillary tangles in the brain. However, there are no peripheral biomarkers available that can detect AD onset. This study aimed to identify the molecular signatures in AD through an integrative analysis of blood gene expression data. We used two microarray datasets (GSE4226 and GSE4229) comparing peripheral blood transcriptomes of AD patients and controls to identify differentially expressed genes (DEGs). Gene set and protein overrepresentation analysis, protein-protein interaction (PPI), DEG-transcription factor (TF) interaction, DEG-microRNA (miRNA) interaction, protein-drug interaction, and protein subcellular localization analyses were performed on DEGs common to the datasets. We identified 25 common DEGs between the two datasets. Integration of genome-scale transcriptome datasets with biomolecular networks revealed hub genes (NOL6, ATF3, TUBB, UQCRC1, CASP2, SND1, VCAM1, BTF3, VPS37B), common transcription factors (FOXC1, GATA2, NFIC, PPARG, USF2, YY1) and miRNAs (mir-20a-5p, mir-93-5p, mir-16-5p, let-7b-5p, mir-708-5p, mir-24-3p, mir-26b-5p, mir-17-5p, mir-193-3p, mir-186-5p). Evaluation of histone modifications revealed that hub genes possess several histone modification sites associated with AD. Protein-drug interactions revealed 10 compounds that affect the identified AD candidate biomolecules, including anti-neoplastic agents (Vinorelbine, Vincristine, Vinblastine, Epothilone D, Epothilone B, CYT997, and ZEN-012), a dermatological (Podofilox) and an immunosuppressive agent (Colchicine). The subcellular localization of molecular signatures varied, including nuclear, plasma membrane and cytosolic proteins. In the present study, we identified blood-cell-derived molecular signatures that might be useful as candidate peripheral biomarkers in AD.
We also identified potential drugs and epigenetic data associated with these molecules that may be useful in designing therapeutic approaches to ameliorate AD.
7
Discovering Biomarkers and Pathways Shared by Alzheimer's Disease and Ischemic Stroke to Identify Novel Therapeutic Targets. MEDICINA (KAUNAS, LITHUANIA) 2019; 55:E191. [PMID: 31121943 PMCID: PMC6572146 DOI: 10.3390/medicina55050191] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/14/2019] [Revised: 03/20/2019] [Accepted: 05/17/2019] [Indexed: 12/21/2022]
Abstract
Background and objectives: Alzheimer's disease (AD) is a progressive neurodegenerative disease that results in severe dementia. Ischemic stroke (IS) is one of the risk factors for AD, but the molecular mechanisms that underlie IS and AD are not well understood. We thus aimed to identify common molecular biomarkers and pathways in IS and AD that can help predict the progression of these diseases and provide clues to important pathological mechanisms. Materials and Methods: We analyzed microarray gene expression datasets of IS and AD. To obtain robust results, combinatorial statistical methods were used to analyze the datasets, and 26 transcripts (22 unique genes) were identified that were abnormally expressed in both IS and AD. Results: Gene Ontology (GO) and KEGG pathway analyses indicated that these 26 common dysregulated transcripts were associated with several altered molecular pathways: alcoholism, MAPK signaling, glycine metabolism, serine metabolism, and threonine metabolism. Further protein-protein interaction (PPI) analysis revealed the pathway hub proteins PDE9A, GNAO1, DUSP16, NTRK2, PGAM2, MAG, and TXLNA. Transcriptional and post-transcriptional components were then identified, and significant transcription factors (SPIB, SMAD3, and SOX2) were found. Conclusions: Protein-drug interaction analysis revealed that PDE9A interacts with the drugs caffeine, γ-glutamyl glycine, and 3-isobutyl-1-methyl-7H-xanthine. Thus, we identified novel putative links between pathological processes in IS and AD at the transcript level, and identified possible mechanistic and gene expression links between IS and AD.
8
Identification of Prognostic Biomarker Signatures and Candidate Drugs in Colorectal Cancer: Insights from Systems Biology Analysis. MEDICINA (KAUNAS, LITHUANIA) 2019; 55:20. [PMID: 30658502 PMCID: PMC6359148 DOI: 10.3390/medicina55010020] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 11/30/2018] [Revised: 12/23/2018] [Accepted: 01/14/2019] [Indexed: 12/17/2022]
Abstract
Background and objectives: Colorectal cancer (CRC) is the second most common cause of cancer-related death in the world, but early diagnosis improves survival in CRC. This report aimed to identify molecular biomarker signatures in CRC. Materials and Methods: We analyzed two microarray datasets (GSE35279 and GSE21815) from the Gene Expression Omnibus (GEO) to identify mutual differentially expressed genes (DEGs). We integrated DEGs with protein–protein interaction and transcriptional/post-transcriptional regulatory networks to identify reporter signaling and regulatory molecules; utilized functional overrepresentation and pathway enrichment analyses to elucidate their roles in biological processes and molecular pathways; performed survival analyses to evaluate their prognostic performance; and applied drug repositioning analyses through Connectivity Map (CMap) and geneXpharma tools to hypothesize possible drug candidates targeting reporter molecules. Results: A total of 727 upregulated and 99 downregulated DEGs were detected. The PI3K/Akt signaling, Wnt signaling, extracellular matrix (ECM) interaction, and cell cycle were identified as significantly enriched pathways. Ten hub proteins (ADNP, CCND1, CD44, CDK4, CEBPB, CENPA, CENPH, CENPN, MYC, and RFC2), 10 transcription factors (ETS1, ESR1, GATA1, GATA2, GATA3, AR, YBX1, FOXP3, E2F4, and PRDM14) and two microRNAs (miRNAs) (miR-193b-3p and miR-615-3p) were detected as reporter molecules. The survival analyses through Kaplan–Meier curves indicated remarkable performance of reporter molecules in the estimation of survival probability in CRC patients. In addition, several drug candidates including anti-neoplastic and immunomodulating agents were repositioned. Conclusions: This study presents biomarker signatures at protein and RNA levels with prognostic capability in CRC.
We think that the molecular signatures and candidate drugs presented in this study might be useful in future studies aimed at developing accurate diagnostic and/or prognostic biomarker screens and efficient therapeutic strategies in CRC.
9
Robust Feature Selection Approach for Patient Classification using Gene Expression Data. Bioinformation 2017; 13:327-332. [PMID: 29162964 PMCID: PMC5680713 DOI: 10.6026/97320630013327] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/26/2017] [Revised: 09/11/2017] [Accepted: 09/12/2017] [Indexed: 11/23/2022] Open
Abstract
Patient classification through feature selection (FS) based on gene expression data (GED) has become popular in the research community. The t-test is a well-known statistical FS method in GED analysis. However, it produces more false positives and lower accuracy for small sample sizes or in the presence of outliers. To overcome the shortcomings of the t-test with small sample sizes, SAM has been applied to GED, but it is highly sensitive to outliers. Recently, robust SAM using the minimum β-divergence estimators has overcome the problems of the classical t-test and SAM, and it has been successfully applied to the identification of differentially expressed (DE) genes. However, it has not been applied to classification. Therefore, in this paper, we employ robust SAM as a feature selection approach along with classifiers for patient classification. We compare the performance of robust SAM with that of the classical t-test and SAM using four popular classifiers (LDA, KNN, SVM and naive Bayes) on both simulated and real gene expression datasets. The results from simulation and real data analysis confirm that the four classifiers perform better with robust SAM than with the classical t-test and SAM. From a real colon cancer dataset, we identified 21 additional DE genes using robust SAM that were not identified by the classical t-test or SAM. To reveal the biological functions and pathways of these 21 genes, we performed KEGG pathway enrichment analysis and found that these genes are involved in some important cancer-related pathways.
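The baseline that this abstract compares against, t-test based feature selection, can be sketched as ranking genes by the absolute value of a two-sample t statistic and keeping the top few as classifier inputs. The example below uses Welch's unequal-variance form, which is a choice made here; it does not implement SAM or the robust SAM of the paper. The function names and expression values are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0  # both groups are constant; treat as uninformative
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_features(expr, labels, top_k):
    """Return the top_k genes ranked by |t| between classes 1 and 0."""
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group1, group0))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical expression values for six samples (three per class).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "GENE_C": [5.0, 4.0, 6.0, 5.5, 4.5, 6.5],
}

selected = rank_features(expr, labels, top_k=2)
print(selected)
```

A single extreme value inflates the sample variance and shrinks the t statistic, which is the outlier sensitivity that motivates robust alternatives.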
10
Serum and Plasma Metabolomic Biomarkers for Lung Cancer. Bioinformation 2017; 13:202-208. [PMID: 28729763 PMCID: PMC5512859 DOI: 10.6026/97320630013202] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/19/2017] [Accepted: 06/05/2017] [Indexed: 01/22/2023] Open
Abstract
Metabolomic biomarker detection is very important for drug discovery and early prediction of lung cancer. The mortality rate can be decreased if cancer is predicted at an early stage. Recent diagnostic techniques for lung cancer are not prognostic. However, if we know which metabolites show considerably different intensity levels between cancer subjects and control subjects, it becomes easier to diagnose the disease early and to discover drugs. Therefore, in this paper we identified the influential metabolites for lung cancer in plasma and serum blood samples, and also identified the biomarkers that will be helpful for early disease prediction as well as for drug discovery. To identify the influential metabolites, we considered a parametric test (Student's t-test) and a nonparametric test (Kruskal-Wallis test). We also categorized the up-regulated and down-regulated metabolites using a heatmap plot and identified the biomarkers using a support vector machine (SVM) classifier and pathway analysis. From our analysis, we obtained 27 influential (p-value<0.05) metabolites from the plasma samples and 13 influential (p-value<0.05) metabolites from the serum samples. According to the importance plot from the SVM classifier, pathway analysis and correlation network analysis, we declared 4 metabolites (taurine, aspartic acid, glutamine and pyruvic acid) as plasma biomarkers and 3 metabolites (aspartic acid, taurine and inosine) as serum biomarkers.
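The nonparametric screening step mentioned in this abstract can be illustrated with a two-group Kruskal-Wallis test. This is a minimal sketch under stated assumptions: it handles exactly two groups, uses average ranks for ties without the tie correction factor, and obtains the p-value from the chi-square distribution with one degree of freedom through `math.erfc`. The function names and intensity values are hypothetical.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H statistic and p-value for two groups (no tie correction)."""
    ranks = average_ranks(list(a) + list(b))
    n = len(ranks)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b)) - 3 * (n + 1)
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Hypothetical intensities of one metabolite in cancer and control subjects.
cancer = [5.1, 6.3, 5.8, 6.0, 5.5]
control = [3.2, 4.1, 3.8, 4.4, 3.5]

h_stat, p_value = kruskal_two_groups(cancer, control)
print(round(h_stat, 3), p_value < 0.05)
```

With more than two groups the statistic generalizes, but the closed-form p-value used here no longer applies because the degrees of freedom change.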