1
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most available datasets contain only a small number of samples, which leads to low statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets such as microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. It was validated in two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
Affiliation(s)
- Olga Fajarda
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal.
- João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
- Sara Duarte-Pereira
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal.
- Raquel M Silva
- Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal.
2
Foster GJ, Sievert MAC, Button-Simons K, Vendrely KM, Romero-Severson J, Ferdig MT. Cyclical regression covariates remove the major confounding effect of cyclical developmental gene expression with strain-specific drug response in the malaria parasite Plasmodium falciparum. BMC Genomics 2022; 23:180. [PMID: 35247977 PMCID: PMC8897900 DOI: 10.1186/s12864-021-08281-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/17/2020] [Accepted: 12/24/2021] [Indexed: 12/21/2022]
Abstract
Background The cyclical nature of gene expression in the intraerythrocytic development cycle (IDC) of the malaria parasite, Plasmodium falciparum, confounds the accurate detection of specific transcriptional differences, such as those provoked by the development of drug resistance. In lab-based studies, P. falciparum cultures are synchronized to remove this confounding factor, but the rapid detection of emerging resistance to artemisinin therapies requires rapid analysis of transcriptomes extracted directly from clinical samples. Here we propose the use of cyclical regression covariates (CRC) to eliminate the major confounding effect of developmentally driven transcriptional changes in clinical samples. We show that eliminating this confounding factor reduces both Type I and Type II errors, and we demonstrate the effectiveness of this approach using a published dataset of 1043 transcriptomes extracted directly from patient blood samples with different patient clearance times after treatment with artemisinin. Results We apply this method to two publicly available datasets and demonstrate its ability to reduce the confounding of differences in transcript levels due to misaligned intraerythrocytic development time. Adjusting the clinical dataset of 1043 transcriptomes with CRC results in the detection of fewer functional categories than previously reported from the same dataset adjusted using other methods. We also detect mostly the same functional categories, but observe fewer genes within these categories. Finally, the CRC method identifies genes in a functional category that was absent from the results when the dataset was adjusted using other methods. Analysis of differential gene expression in clinical samples that vary broadly in developmental stage resulted in the detection of far fewer transcripts in fewer functional categories while, at the same time, identifying genes in two functional categories not present in the unadjusted data analysis.
These differences are consistent with the expectation that CRC reduces both false positives and false negatives with the largest effect on datasets from samples with greater variance in developmental stage. Conclusions Cyclical regression covariates have immediate application to parasite transcriptome sequencing directly from clinical blood samples and to cost-constrained in vitro experiments. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08281-y.
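The covariate adjustment described above can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on three assumptions. First, each sample already has an estimated developmental age in hours. Second, the IDC is treated as a 48-hour cycle. Third, one sine/cosine pair serves as the cyclical covariates. The function names and the `CYCLE_HOURS` constant are hypothetical. Each gene is regressed on an intercept, the two cyclical terms and the factor of interest, so the factor's coefficient is estimated with the developmental signal held fixed.

```python
import numpy as np

# Assumption: the P. falciparum IDC is modelled as a 48-hour cycle.
CYCLE_HOURS = 48.0


def cyclical_design(t_hours, group, period=CYCLE_HOURS):
    """Design matrix: intercept, sin/cos of developmental time, factor of interest."""
    t = np.asarray(t_hours, dtype=float)
    angle = 2.0 * np.pi * t / period
    return np.column_stack([
        np.ones_like(t),                 # intercept
        np.sin(angle),                   # cyclical covariate 1
        np.cos(angle),                   # cyclical covariate 2
        np.asarray(group, dtype=float),  # factor of interest (e.g. slow vs fast clearance)
    ])


def group_effect_adjusted(expr, t_hours, group, period=CYCLE_HOURS):
    """Per-gene coefficient on `group` after accounting for the cyclical developmental signal.

    expr has shape (n_samples, n_genes); one least-squares fit is solved for all genes at once.
    """
    X = cyclical_design(t_hours, group, period)
    beta, *_ = np.linalg.lstsq(X, np.asarray(expr, dtype=float), rcond=None)
    return beta[3]  # row 3 holds the coefficient on the factor of interest
```

When the two groups differ in developmental stage, a naive difference of group means picks up the stage difference as if it were a group effect. The regression coefficient does not, which mirrors the confounding the covariates are meant to remove.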
3
Fajarda O, Duarte-Pereira S, Silva RM, Oliveira JL. Merging microarray studies to identify a common gene expression signature to several structural heart diseases. BioData Min 2020; 13:8. [PMID: 32670412 PMCID: PMC7346458 DOI: 10.1186/s13040-020-00217-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/18/2020] [Accepted: 06/05/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND Heart disease is the leading cause of death worldwide. Knowing a gene expression signature in heart disease can lead to the development of more efficient diagnosis and treatments that may prevent premature deaths. A large amount of microarray data is available in public repositories and can be used to identify differentially expressed genes. However, most microarray datasets contain only a small number of samples, so several datasets have to be merged to obtain more reliable results, which is a challenging task. The identification of differentially expressed genes is commonly done using statistical methods. Nonetheless, these methods rely on an arbitrary threshold to select the differentially expressed genes, and there is no consensus on the values that should be used. RESULTS Nine publicly available microarray datasets from studies of different heart diseases were merged to form a dataset composed of 689 samples and 8354 features. Subsequently, the adjusted p-value and fold change were determined, and by combining a set of adjusted p-value cutoffs with a list of different fold change thresholds, 12 sets of differentially expressed genes were obtained. To select the set of differentially expressed genes with the best accuracy in classifying samples from patients with heart diseases versus samples from patients with no heart condition, the random forest algorithm was used. A set of 62 differentially expressed genes having a classification accuracy of approximately 95% was identified. CONCLUSIONS We identified a gene expression signature common to different cardiac diseases and supported our findings by showing the involvement of these genes in the pathophysiology of the heart. The approach used in this study is suitable for the identification of gene expression signatures and can be extended to different diseases.
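The threshold-combination step described above can be sketched as follows, under stated assumptions. The cutoff values are hypothetical, since the abstract reports only that 12 combinations were used; 3 p-value cutoffs × 4 fold-change thresholds is one way to reach that number. `score_fn` is a placeholder for the random forest classification accuracy used in the study. In practice it would train and evaluate a classifier on the samples restricted to each candidate gene set.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical cutoffs: the abstract gives the number of combinations (12), not the values.
P_CUTOFFS = (0.05, 0.01, 0.001)
LOG2FC_THRESHOLDS = (0.5, 1.0, 1.5, 2.0)

GeneStats = Dict[str, Tuple[float, float]]  # gene -> (adjusted p-value, log2 fold change)
SetKey = Tuple[float, float]                # (adjusted p-value cutoff, |log2FC| threshold)


def candidate_gene_sets(stats: GeneStats,
                        p_cutoffs: Sequence[float] = P_CUTOFFS,
                        fc_thresholds: Sequence[float] = LOG2FC_THRESHOLDS) -> Dict[SetKey, List[str]]:
    """Build one differentially expressed gene set per (p cutoff, fold-change threshold) pair."""
    sets: Dict[SetKey, List[str]] = {}
    for p_cut, fc_cut in product(p_cutoffs, fc_thresholds):
        sets[(p_cut, fc_cut)] = sorted(
            gene for gene, (padj, log2fc) in stats.items()
            if padj < p_cut and abs(log2fc) >= fc_cut
        )
    return sets


def select_signature(sets: Dict[SetKey, List[str]],
                     score_fn: Callable[[List[str]], float]) -> Tuple[SetKey, List[str], float]:
    """Return the non-empty set with the highest score; ties keep the first set encountered."""
    scored = [(key, genes, score_fn(genes)) for key, genes in sets.items() if genes]
    if not scored:
        raise ValueError("no threshold combination selected any gene")
    return max(scored, key=lambda item: item[2])
```

Keeping the scorer as a parameter separates the two stages the abstract describes. The statistical stage decides which genes pass each threshold pair, and the supervised stage decides which resulting set classifies best.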
Affiliation(s)
- Olga Fajarda
- IEETA/DETI, University of Aveiro, Aveiro, 3810-193 Portugal
- Sara Duarte-Pereira
- IEETA/DETI, University of Aveiro, Aveiro, 3810-193 Portugal
- Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, 3810-193 Portugal
- Raquel M. Silva
- IEETA/DETI, University of Aveiro, Aveiro, 3810-193 Portugal
- Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, 3810-193 Portugal
- Current Address: Universidade Católica Portuguesa, Faculdade de Medicina Dentária, CIIS-Centro de Investigação Interdisciplinar em Saúde, Campus de Viseu, Viseu, 3504-505 Portugal
4
Wang Y, Chen YJ, Xiang C, Jiang GW, Xu YD, Yin LM, Zhou DD, Liu YY, Yang YQ. Discovery of potential asthma targets based on the clinical efficacy of Traditional Chinese Medicine formulas. J Ethnopharmacol 2020; 252:112635. [PMID: 32004629 DOI: 10.1016/j.jep.2020.112635] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/19/2019] [Revised: 01/23/2020] [Accepted: 01/24/2020] [Indexed: 06/10/2023]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE Standard therapy for asthma, a highly heterogeneous disease, is primarily based on bronchodilators and immunosuppressive drugs, which confer short-term symptomatic relief but not a cure. It is difficult to discover novel bronchodilators, although potential new targets are emerging. Traditional Chinese Medicine (TCM) formulas have been used to treat asthma for more than 2000 years and form the basis of representative asthma treatments. AIM OF THE STUDY Based on the efficacy of TCM formulas, proteins bound by anti-asthmatic herbal compounds are potential targets for asthma therapy. This analysis aims to provide new drug targets and discovery strategies for asthma therapy. MATERIALS AND METHODS A list of candidate herbs for asthma was selected from the classical formulas (CFs) of TCM for the treatment of wheezing or dyspnea recorded in Treatise on Cold Damage and Miscellaneous Diseases (TCDMD) and from modern herbal formulas identified in the SAPHRON TCM Database using the keywords "wheezing" or "dyspnea". Compounds in the selected herbs and compounds that directly bind target proteins were acquired by searching the Herbal Ingredients' Targets Database (HITD), TCM Data Bank (TCMDB) and TCM Integrated Database (TCMID). Therapeutic targets of conventional medicine (CM) for asthma were collected by searching the Therapeutic Target Database (TTD), with DrugBank and PubMed as supplements. Finally, the enriched gene ontology (GO) terms of the targets were obtained using the Database for Annotation, Visualization and Integrated Discovery (DAVID), and protein-protein interaction (PPI) networks were constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING). The effects of two selected TCM compounds, kaempferol and ginkgolide A, on cellular resistance in human airway smooth muscle cells (ASMCs) and on pulmonary resistance in a mouse model were investigated.
RESULTS The list of 32 candidate herbs for asthma was selected from 10 CFs for the treatment of wheezing or dyspnea recorded in TCDMD and 1037 modern herbal formulas obtained from the SAPHRON TCM Database. A total of 130 compounds from the 32 selected herbs, and 68 herbal compounds that directly bind target proteins, were acquired from HITD and TCMDB. Eighty-eight therapeutic targets of CM for asthma were collected by searching TTD, with PubMed as a supplement. DAVID and STRING analyses showed that the targets of TCM formulas are primarily related to the cytochrome P450 (CYP) family, transient receptor potential (TRP) channels, matrix metalloproteinases (MMPs) and ribosomal proteins. Both TCM formulas and CM act on the same types of targets or signaling pathways, such as G protein-coupled receptors (GPCRs), steroid hormone receptors (SHRs) and the JAK-STAT signaling pathway. The proteins directly targeted by herbal compounds, TRPM8, TRPA1, TRPV3, CYP1B1, CYP2B6, CYP1A2, CYP3A4, CYP1A1, PPARA, PPARD, NR1I2, MMP1, MMP2, ESR1, ESR2, RPLP0, RPLP1 and RPLP2, are potential targets for asthma therapy. In vitro results showed that kaempferol (1 × 10⁻² mM) and ginkgolide A (1 × 10⁻⁵ mM) significantly increased the cell index (P < 0.05 vs. histamine, n = 3) and therefore relaxed human ASMCs. In vivo results showed that kaempferol (145 μg/kg) and ginkgolide A (205 μg/kg) significantly reduced pulmonary resistance (P < 0.05 vs. methacholine, n = 6). CONCLUSION Potential target discovery for asthma treatment based on the clinical effectiveness of TCM is a feasible strategy.
Affiliation(s)
- Yu Wang
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China; Experiment Center for Science and Technology, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Yan-Jiao Chen
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Cheng Xiang
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Guang-Wei Jiang
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Yu-Dong Xu
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Lei-Miao Yin
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Dong-Dong Zhou
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Yan-Yan Liu
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Yong-Qing Yang
- International Union Laboratory on Acupuncture Based Target Discovery, International Joint Laboratory on Acupuncture Neuro-immunology, Shanghai Research Institute of Acupuncture and Meridian, Yue Yang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China.
5
Aliper A, Jellen L, Cortese F, Artemov A, Karpinsky-Semper D, Moskalev A, Swick AG, Zhavoronkov A. Towards natural mimetics of metformin and rapamycin. Aging (Albany NY) 2017; 9:2245-2268. [PMID: 29165314 PMCID: PMC5723685 DOI: 10.18632/aging.101319] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 07/06/2017] [Accepted: 11/02/2017] [Indexed: 12/14/2022]
Abstract
Aging is now at the forefront of major challenges faced globally, creating an immediate need for safe, wide-scale interventions to reduce the burden of chronic disease and extend human healthspan. Metformin and rapamycin are two FDA-approved mTOR inhibitors proposed for this purpose, exhibiting significant anti-cancer and anti-aging properties beyond their current clinical applications. However, each faces obstacles to approval for off-label, prophylactic use due to adverse effects. Here, we initiate an effort to identify nutraceuticals (safer, naturally occurring compounds) that mimic the anti-aging effects of metformin and rapamycin without adverse effects. We applied several bioinformatic approaches and deep learning methods to the Library of Integrated Network-based Cellular Signatures (LINCS) dataset to map the gene- and pathway-level signatures of metformin and rapamycin and to screen for matches among over 800 natural compounds. We then predicted the safety of each compound with an ensemble of deep neural network classifiers. The analysis revealed many novel candidate metformin and rapamycin mimetics, including allantoin and ginsenoside (metformin), epigallocatechin gallate and isoliquiritigenin (rapamycin), and withaferin A (both). Four relatively unexplored compounds also scored well with rapamycin. This work revealed promising candidates for future experimental validation while demonstrating the applications of powerful screening methods for this and similar endeavors.
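The idea of screening compounds for signature matches can be illustrated with a deliberately simple stand-in. It is not the LINCS pipeline, pathway-level analysis or deep learning models used in the study. Each signature is assumed to be a mapping from gene to a differential-expression score, and candidate compounds are ranked by cosine similarity to a reference drug's signature over the genes the two share. The function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumption: a signature maps gene -> differential-expression score (e.g. log2 fold change).
Signature = Dict[str, float]


def cosine_similarity(a: Signature, b: Signature) -> float:
    """Cosine similarity over genes present in both signatures (0.0 if none overlap)."""
    shared = a.keys() & b.keys()
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = math.sqrt(sum(a[g] ** 2 for g in shared))
    norm_b = math.sqrt(sum(b[g] ** 2 for g in shared))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_mimetics(reference: Signature,
                  candidates: Dict[str, Signature]) -> List[Tuple[str, float]]:
    """Rank hypothetical candidate compounds from most to least similar to the reference."""
    scores = [(name, cosine_similarity(reference, sig)) for name, sig in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A score near 1 suggests a candidate shifts the shared genes in the same direction as the reference drug. A score near -1 indicates the opposite effect, and compounds with no overlapping genes score 0.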
Affiliation(s)
- Alexander Aliper
- Insilico Medicine, Inc, Research Department, Baltimore, MD 21218, USA
- Leslie Jellen
- Insilico Medicine, Inc, Research Department, Baltimore, MD 21218, USA
- Franco Cortese
- Biogerontology Research Foundation, Research Department, Oxford, United Kingdom
- Department of Biomedical and Molecular Science, Queen's University School of Medicine, Queen's University, Kingston, ON K7L 3N6, Canada
- Artem Artemov
- Insilico Medicine, Inc, Research Department, Baltimore, MD 21218, USA
- Alexey Moskalev
- Laboratory of Molecular Radiobiology and Gerontology, Institute of Biology of Komi Science Center of Ural Branch of Russian Academy of Sciences, Syktyvkar, 167982, Russia
- Alex Zhavoronkov
- Insilico Medicine, Inc, Research Department, Baltimore, MD 21218, USA
- Biogerontology Research Foundation, Research Department, Oxford, United Kingdom