1
|
Rosati D, Palmieri M, Brunelli G, Morrione A, Iannelli F, Frullanti E, Giordano A. Differential gene expression analysis pipelines and bioinformatic tools for the identification of specific biomarkers: A review. Comput Struct Biotechnol J 2024; 23:1154-1168. [PMID: 38510977 PMCID: PMC10951429 DOI: 10.1016/j.csbj.2024.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2023] [Revised: 02/20/2024] [Accepted: 02/20/2024] [Indexed: 03/22/2024] Open
Abstract
In recent years, the role of bioinformatics and computational biology together with omics techniques and transcriptomics has gained tremendous importance in biomedicine and healthcare, particularly for the identification of biomarkers for precision medicine and drug discovery. Differential gene expression (DGE) analysis is one of the most used techniques for RNA-sequencing (RNA-seq) data analysis. This tool, which is typically used in various RNA-seq data processing applications, allows the identification of differentially expressed genes across two or more sample sets. Functional enrichment analyses can then be performed to annotate and contextualize the resulting gene lists. These studies provide valuable information about disease-causing biological processes and can help in identifying molecular targets for novel therapies. This review focuses on differential gene expression (DGE) analysis pipelines and bioinformatic techniques commonly used to identify specific biomarkers and discuss the advantages and disadvantages of these techniques.
Collapse
Affiliation(s)
- Diletta Rosati
- Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
- Cancer Genomics & Systems Biology Lab, Dept. of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
- Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Italy
| | - Maria Palmieri
- Cancer Genomics & Systems Biology Lab, Dept. of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
- Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Italy
| | - Giulia Brunelli
- Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Italy
| | - Andrea Morrione
- Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, Department of Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
| | - Francesco Iannelli
- Laboratory of Molecular Microbiology and Biotechnology, Department of Medical Biotechnologies, University of Siena, Siena, Italy
| | - Elisa Frullanti
- Cancer Genomics & Systems Biology Lab, Dept. of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
- Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Italy
| | - Antonio Giordano
- Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
- Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, Department of Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA
| |
Collapse
|
2
|
Zhang S, Jiang Z, Zeng P. Incorporating genetic similarity of auxiliary samples into eGene identification under the transfer learning framework. J Transl Med 2024; 22:258. [PMID: 38461317 PMCID: PMC10924384 DOI: 10.1186/s12967-024-05053-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2023] [Accepted: 03/01/2024] [Indexed: 03/11/2024] Open
Abstract
BACKGROUND The term eGene has been applied to define a gene whose expression level is affected by at least one independent expression quantitative trait locus (eQTL). It is both theoretically and empirically important to identify eQTLs and eGenes in genomic studies. However, standard eGene detection methods generally focus on individual cis-variants and cannot efficiently leverage useful knowledge acquired from auxiliary samples into target studies. METHODS We propose a multilocus-based eGene identification method called TLegene by integrating shared genetic similarity information available from auxiliary studies under the statistical framework of transfer learning. We apply TLegene to eGene identification in ten TCGA cancers which have an explicit relevant tissue in the GTEx project, and learn genetic effect of variant in TCGA from GTEx. We also adopt TLegene to the Geuvadis project to evaluate its usefulness in non-cancer studies. RESULTS We observed substantial genetic effect correlation of cis-variants between TCGA and GTEx for a larger number of genes. Furthermore, consistent with the results of our simulations, we found that TLegene was more powerful than existing methods and thus identified 169 distinct candidate eGenes, which was much larger than the approach that did not consider knowledge transfer across target and auxiliary studies. Previous studies and functional enrichment analyses provided empirical evidence supporting the associations of discovered eGenes, and it also showed evidence of allelic heterogeneity of gene expression. Furthermore, TLegene identified more eGenes in Geuvadis and revealed that these eGenes were mainly enriched in cells EBV transformed lymphocytes tissue. CONCLUSION Overall, TLegene represents a flexible and powerful statistical method for eGene identification through transfer learning of genetic similarity shared across auxiliary and target studies.
Collapse
Affiliation(s)
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
Collapse
|
3
|
Xu H, Shao Z, Zhang S, Liu X, Zeng P. How can childhood maltreatment affect post-traumatic stress disorder in adult: Results from a composite null hypothesis perspective of mediation analysis. Front Psychiatry 2023; 14:1102811. [PMID: 36970281 PMCID: PMC10033829 DOI: 10.3389/fpsyt.2023.1102811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/19/2022] [Accepted: 02/20/2023] [Indexed: 03/11/2023] Open
Abstract
BackgroundA greatly growing body of literature has revealed the mediating role of DNA methylation in the influence path from childhood maltreatment to psychiatric disorders such as post-traumatic stress disorder (PTSD) in adult. However, the statistical method is challenging and powerful mediation analyses regarding this issue are lacking.MethodsTo study how the maltreatment in childhood alters long-lasting DNA methylation changes which further affect PTSD in adult, we here carried out a gene-based mediation analysis from a perspective of composite null hypothesis in the Grady Trauma Project (352 participants and 16,565 genes) with childhood maltreatment as exposure, multiple DNA methylation sites as mediators, and PTSD or its relevant scores as outcome. We effectively addressed the challenging issue of gene-based mediation analysis by taking its composite null hypothesis testing nature into consideration and fitting a weighted test statistic.ResultsWe discovered that childhood maltreatment could substantially affected PTSD or PTSD-related scores, and that childhood maltreatment was associated with DNA methylation which further had significant roles in PTSD and these scores. Furthermore, using the proposed mediation method, we identified multiple genes within which DNA methylation sites exhibited mediating roles in the influence path from childhood maltreatment to PTSD-relevant scores in adult, with 13 for Beck Depression Inventory and 6 for modified PTSD Symptom Scale, respectively.ConclusionOur results have the potential to confer meaningful insights into the biological mechanism for the impact of early adverse experience on adult diseases; and our proposed mediation methods can be applied to other similar analysis settings.
Collapse
Affiliation(s)
- Haibo Xu
- Center for Mental Health Education and Research, Xuzhou Medical University, Xuzhou, China
- School of Management, Xuzhou Medical University, Xuzhou, China
- *Correspondence: Haibo Xu,
| | - Zhonghe Shao
- Department of Epidemiology and Biostatistics, Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Xin Liu
- Center for Mental Health Education and Research, Xuzhou Medical University, Xuzhou, China
- School of Management, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Ping Zeng,
| |
Collapse
|
4
|
Survival Risk Prediction of Esophageal Cancer Based on the Kohonen Network Clustering Algorithm and Kernel Extreme Learning Machine. MATHEMATICS 2022. [DOI: 10.3390/math10091367] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
Accurate prediction of the survival risk level of patients with esophageal cancer is significant for the selection of appropriate treatment methods. It contributes to improving the living quality and survival chance of patients. However, considering that the characteristics of blood index vary with individuals on the basis of their ages, personal habits and living environment etc., a unified artificial intelligence prediction model is not precisely adequate. In order to enhance the precision of the model on the prediction of esophageal cancer survival risk, this study proposes a different model based on the Kohonen network clustering algorithm and the kernel extreme learning machine (KELM), aiming to classifying the tested population into five catergories and provide better efficiency with the use of machine learning. Firstly, the Kohonen network clustering method was used to cluster the patient samples and five types of samples were obtained. Secondly, patients were divided into two risk levels based on 5-year net survival. Then, the Taylor formula was used to expand the theory to analyze the influence of different activation functions on the KELM modeling effect, and conduct experimental verification. RBF was selected as the activation function of the KELM. Finally, the adaptive mutation sparrow search algorithm (AMSSA) was used to optimize the model parameters. The experimental results were compared with the methods of the artificial bee colony optimized support vector machine (ABC-SVM), the three layers of random forest (TLRF), the gray relational analysis–particle swarm optimization support vector machine (GP-SVM) and the mixed-effects Cox model (Cox-LMM). The results showed that the prediction model proposed in this study had certain advantages in terms of prediction accuracy and running time, and could provide support for medical personnel to choose the treatment mode of esophageal cancer patients.
Collapse
|
5
|
Lu H, Wei Y, Jiang Z, Zhang J, Wang T, Huang S, Zeng P. Integrative eQTL-weighted hierarchical Cox models for SNP-set based time-to-event association studies. J Transl Med 2021; 19:418. [PMID: 34627275 PMCID: PMC8502405 DOI: 10.1186/s12967-021-03090-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2021] [Accepted: 09/26/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Integrating functional annotations into SNP-set association studies has been proven a powerful analysis strategy. Statistical methods for such integration have been developed for continuous and binary phenotypes; however, the SNP-set integrative approaches for time-to-event or survival outcomes are lacking. METHODS We here propose IEHC, an integrative eQTL (expression quantitative trait loci) hierarchical Cox regression, for SNP-set based survival association analysis by modeling effect sizes of genetic variants as a function of eQTL via a hierarchical manner. Three p-values combination tests are developed to examine the joint effects of eQTL and genetic variants after a novel decorrelated modification of statistics for the two components. An omnibus test (IEHC-ACAT) is further adapted to aggregate the strengths of all available tests. RESULTS Simulations demonstrated that the IEHC joint tests were more powerful if both eQTL and genetic variants contributed to association signal, while IEHC-ACAT was robust and often outperformed other approaches across various simulation scenarios. When applying IEHC to ten TCGA cancers by incorporating eQTL from relevant tissues of GTEx, we revealed that substantial correlations existed between the two types of effect sizes of genetic variants from TCGA and GTEx, and identified 21 (9 unique) cancer-associated genes which would otherwise be missed by approaches not incorporating eQTL. CONCLUSION IEHC represents a flexible, robust, and powerful approach to integrate functional omics information to enhance the power of identifying association signals for the survival risk of complex human cancers.
Collapse
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Yongyue Wei
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, 211166, Jiangsu, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Jinhui Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
Collapse
|
6
|
Shao Z, Wang T, Zhang M, Jiang Z, Huang S, Zeng P. IUSMMT: Survival mediation analysis of gene expression with multiple DNA methylation exposures and its application to cancers of TCGA. PLoS Comput Biol 2021; 17:e1009250. [PMID: 34464378 PMCID: PMC8437300 DOI: 10.1371/journal.pcbi.1009250] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2021] [Revised: 09/13/2021] [Accepted: 07/06/2021] [Indexed: 02/07/2023] Open
Abstract
Effective and powerful survival mediation models are currently lacking. To partly fill such knowledge gap, we particularly focus on the mediation analysis that includes multiple DNA methylations acting as exposures, one gene expression as the mediator and one survival time as the outcome. We proposed IUSMMT (intersection-union survival mixture-adjusted mediation test) to effectively examine the existence of mediation effect by fitting an empirical three-component mixture null distribution. With extensive simulation studies, we demonstrated the advantage of IUSMMT over existing methods. We applied IUSMMT to ten TCGA cancers and identified multiple genes that exhibited mediating effects. We further revealed that most of the identified regions, in which genes behaved as active mediators, were cancer type-specific and exhibited a full mediation from DNA methylation CpG sites to the survival risk of various types of cancers. Overall, IUSMMT represents an effective and powerful alternative for survival mediation analysis; our results also provide new insights into the functional role of DNA methylation and gene expression in cancer progression/prognosis and demonstrate potential therapeutic targets for future clinical practice.
Collapse
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Meng Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
| |
Collapse
|
7
|
Huang L, Yu X, Jiang Z, Zeng P. Novel Autophagy-Related Gene Signature Investigation for Patients With Oral Squamous Cell Carcinoma. Front Genet 2021; 12:673319. [PMID: 34220946 PMCID: PMC8248343 DOI: 10.3389/fgene.2021.673319] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2021] [Accepted: 05/26/2021] [Indexed: 12/26/2022] Open
Abstract
The correlation between autophagy defects and oral squamous cell carcinoma (OSCC) has been previously studied, but only based on a limited number of autophagy-related genes in cell lines or animal models. The aim of the present study was to analyze differentially expressed autophagy-related genes through The Cancer Genome Atlas (TCGA) database to explore enriched pathways and potential biological function. Based on TCGA database, a signature composed of four autophagy-related genes (CDKN2A, NKX2-3, NRG3, and FADD) was established by using multivariate Cox regression models and two Gene Expression Omnibus datasets were applied for external validation. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to study the function of autophagy-related genes and their pathways. The most significant GO and KEGG pathways were enriched in several key pathways that were related to the progression of autophagy and OSCC. Furthermore, a prognostic risk score was constructed based on the four genes; patients were then divided into two groups (i.e., high risk and low risk) in terms of the median of risk score. Prognosis of the two groups and results showed that patients at the low-risk group had a much better prognosis than those at the high-risk group, regardless of whether they were in the training datasets or validation datasets. Multivariate Cox regression results indicated that the risk score of the autophagy-related gene signatures could greatly predict the prognosis of patients after controlling for several clinical covariates. The findings of the present study revealed that autophagy-related gene signatures play an important role in OSCC and are potential prognostic biomarkers and therapeutic targets.
Collapse
Affiliation(s)
- Lihong Huang
- Department of Biostatistics, Zhongshan Hospital, Fudan University, Shanghai, China
| | - Xinghao Yu
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
| | - Zhou Jiang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| |
Collapse
|
8
|
Zhang J, Lu H, Zhang S, Wang T, Zhao H, Guan F, Zeng P. Leveraging Methylation Alterations to Discover Potential Causal Genes Associated With the Survival Risk of Cervical Cancer in TCGA Through a Two-Stage Inference Approach. Front Genet 2021; 12:667877. [PMID: 34149809 PMCID: PMC8206792 DOI: 10.3389/fgene.2021.667877] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Accepted: 04/19/2021] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Multiple genes were previously identified to be associated with cervical cancer; however, the genetic architecture of cervical cancer remains unknown and many potential causal genes are yet to be discovered. METHODS To explore potential causal genes related to cervical cancer, a two-stage causal inference approach was proposed within the framework of Mendelian randomization, where the gene expression was treated as exposure, with methylations located within the promoter regions of genes serving as instrumental variables. Five prediction models were first utilized to characterize the relationship between the expression and methylations for each gene; then, the methylation-regulated gene expression (MReX) was obtained and the association was evaluated via Cox mixed-effect model based on MReX. We further implemented the aggregated Cauchy association test (ACAT) combination to take advantage of respective strengths of these prediction models while accounting for dependency among the p-values. RESULTS A total of 14 potential causal genes were discovered to be associated with the survival risk of cervical cancer in TCGA when the five prediction models were separately employed. The total number of potential causal genes was brought to 23 when conducting ACAT. Some of the newly discovered genes may be novel (e.g., YJEFN3, SPATA5L1, IMMP1L, C5orf55, PPIP5K2, ZNF330, CRYZL1, PPM1A, ESCO2, ZNF605, ZNF225, ZNF266, FICD, and OSTC). Functional analyses showed that these genes were enriched in tumor-associated pathways. Additionally, four genes (i.e., COL6A1, SYDE1, ESCO2, and GIPC1) were differentially expressed between tumor and normal tissues. CONCLUSION Our study discovered promising candidate genes that were causally associated with the survival risk of cervical cancer and thus provided new insights into the genetic etiology of cervical cancer.
Collapse
Affiliation(s)
- Jinhui Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Haojie Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Shuo Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Huashuo Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Fengjun Guan
- Department of Pediatrics, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| |
Collapse
|