1
Wang X, Lian Q, Dong H, Xu S, Su Y, Wu X. Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data. Genomics Proteomics Bioinformatics 2024; 22:qzae014. [PMID: 39049508 DOI: 10.1093/gpbjnl/qzae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Revised: 06/20/2023] [Accepted: 06/25/2023] [Indexed: 07/27/2024]
Abstract
Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA sequencing (RNA-seq) data, where it helps to decipher single-cell heterogeneity and cell type-specific variability by incorporating prior knowledge from functional gene sets. Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is a powerful technique for interrogating single-cell chromatin-based gene regulation, and genes or gene sets with dynamic regulatory potentials can be regarded as cell type-specific markers, as in single-cell RNA-seq (scRNA-seq). However, few GSS tools are specifically designed for scATAC-seq, and the applicability and performance of RNA-seq GSS tools on scATAC-seq data remain to be investigated. Here, we systematically benchmarked ten GSS tools, including four bulk RNA-seq tools, five scRNA-seq tools, and one scATAC-seq method. First, using matched scATAC-seq and scRNA-seq datasets, we found that the performance of GSS tools on scATAC-seq data was comparable to that on scRNA-seq, suggesting their applicability to scATAC-seq. Then, the performance of different GSS tools was extensively evaluated using up to ten scATAC-seq datasets. Moreover, we evaluated the impact of gene activity conversion, dropout imputation, and gene set collections on the results of GSS. Results show that dropout imputation can significantly improve the performance of almost all GSS tools, while the impact of gene activity conversion methods or gene set collections on GSS performance depends more on the GSS tool or dataset. Finally, we provide practical guidelines for choosing appropriate preprocessing methods and GSS tools in different application scenarios.
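To make per-cell gene set scoring concrete, here is a minimal sketch of one generic scheme. It assumes a cells × genes gene-activity matrix has already been derived from scATAC-seq (the gene activity conversion step described above). Each gene is z-scored across cells, and a cell's score is the mean z-score of the set's genes. This is an illustrative assumption, not any of the ten benchmarked tools, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Standardize each gene (column) across cells; constant genes become 0."""
    n_genes = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n_genes)]
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [
        [(row[j] - m) / s if s > 0 else 0.0 for j, (m, s) in enumerate(stats)]
        for row in matrix
    ]


def gene_set_scores(matrix, genes, gene_set):
    """Hypothetical per-cell score: mean z-scored activity of set genes present in the matrix."""
    idx = [i for i, g in enumerate(genes) if g in gene_set]
    if not idx:
        raise ValueError("no genes from the set are present in the matrix")
    z = zscore_columns(matrix)
    return [mean(row[i] for i in idx) for row in z]


# Three cells x three genes of (made-up) gene activity values.
activity = [[5, 0, 1], [0, 4, 1], [5, 4, 1]]
scores = gene_set_scores(activity, ["A", "B", "C"], {"A", "B"})
```

In this toy matrix, the third cell has high activity for both set genes and receives the highest score. Z-scoring first keeps genes with large raw activity ranges from dominating the average. Real GSS tools differ in exactly these normalization and aggregation choices.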
Affiliation(s)
- Xi Wang
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Department of Automation, Xiamen University, Xiamen 361005, China
- Qiwei Lian
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Department of Automation, Xiamen University, Xiamen 361005, China
- Haoyu Dong
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Shuo Xu
- Department of Automation, Xiamen University, Xiamen 361005, China
- Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
- Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
2
Barakat A, Munro G, Heegaard AM. Finding new analgesics: Computational pharmacology faces drug discovery challenges. Biochem Pharmacol 2024; 222:116091. [PMID: 38412924 DOI: 10.1016/j.bcp.2024.116091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 01/10/2024] [Accepted: 02/23/2024] [Indexed: 02/29/2024]
Abstract
Despite the worldwide prevalence and huge burden of pain, pain remains undertreated. Currently used analgesics have several limitations regarding their efficacy and safety. The discovery of analgesics with a novel mechanism of action has faced multiple challenges, including a limited understanding of the biological processes underpinning pain and analgesia and poor animal-to-human translation. Computational pharmacology is currently being employed to address these challenges. In this review, we discuss the theory, methods, and applications of computational pharmacology in pain research. Computational pharmacology encompasses a wide variety of theoretical concepts and practical methodological approaches, with the overall aim of gaining biological insight through data acquisition and analysis. Data are acquired from patients or animal models with pain or analgesic treatment, at different levels of biological organization (molecular, cellular, physiological, and behavioral). Distinct algorithms can then be used to analyze and integrate these data. This helps to identify biological molecules and processes associated with the pain phenotype, build quantitative models of pain signaling, and extract features that translate between humans and animals. However, computational pharmacology has several limitations, and its predictions can yield false-positive and false-negative findings. Therefore, computational predictions must be validated experimentally before solid conclusions are drawn. We also discuss several case studies that combine and integrate computational tools with experimental pain research tools to meet drug discovery challenges.
Affiliation(s)
- Ahmed Barakat
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Department of Pharmacology and Toxicology, Faculty of Pharmacy, Assiut University, Assiut, Egypt.
- Anne-Marie Heegaard
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
3
Leri J, Liu J, Kelly M, Kertes DA. A preliminary investigation of epigenome-wide DNA methylation and temperament during infancy. Dev Psychobiol 2024; 66:e22475. [PMID: 38470455 DOI: 10.1002/dev.22475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 02/05/2024] [Accepted: 02/06/2024] [Indexed: 03/13/2024]
Abstract
This study provides preliminary evidence for an epigenetic architecture of infant temperament. At 12 months of age, in 67 mother-infant dyads, blood was collected and assayed for DNA methylation, and maternally reported infant temperament was assessed using the Infant Behavior Questionnaire. Epigenome-wide analyses showed that the higher-order temperament dimensions Surgency and Negative Affect were associated with DNA methylation. The epigenetic signatures of Surgency and Negative Affect were situated at genes involved in synaptic signaling and plasticity. Although replication is required, these results are consistent with a biologically based model of temperament, create new avenues for hypothesis-driven research into epigenetic pathways that underlie individual differences in temperament, and demonstrate that infant temperament has a widespread epigenetic signature in the methylome.
Affiliation(s)
- John Leri
- Department of Psychology, University of Florida, Gainesville, Florida, USA
- Jingwen Liu
- Department of Psychology, University of Florida, Gainesville, Florida, USA
- Maria Kelly
- Department of Pediatrics, University of Florida, Gainesville, Florida, USA
- Darlene A Kertes
- Department of Psychology, University of Florida, Gainesville, Florida, USA
- UF Genetics Institute, University of Florida, Gainesville, USA
4
Hui TX, Kasim S, Aziz IA, Fudzee MFM, Haron NS, Sutikno T, Hassan R, Mahdin H, Sen SC. Robustness evaluations of pathway activity inference methods on gene expression data. BMC Bioinformatics 2024; 25:23. [PMID: 38216898 PMCID: PMC10785356 DOI: 10.1186/s12859-024-05632-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 01/02/2024] [Indexed: 01/14/2024]
Abstract
BACKGROUND With the exponential growth of high-throughput technologies, multiple pathway analysis methods have been proposed to estimate pathway activities from gene expression profiles. These pathway activity inference methods can be divided into two main categories: non-Topology-Based (non-TB) and Pathway Topology-Based (PTB) methods. Although some review and survey articles have discussed the topic from different aspects, there is a lack of systematic assessment and comparison of the robustness of these approaches. RESULTS This study presents comprehensive robustness evaluations of seven widely used pathway activity inference methods using six cancer datasets, based on two assessments. The first assessment investigates the robustness of the pathway activities inferred by each method, while the second assesses the robustness of the risk-active pathways and genes predicted by these methods. The mean reproducibility power and the total number of identified informative pathways and genes were evaluated. In the first assessment, the mean reproducibility power of the pathway activity inference methods generally decreased as the number of selected pathways increased. Entropy-based Directed Random Walk (e-DRW) distinctly outperformed the other methods, exhibiting the greatest reproducibility power across all cancer datasets. In contrast, the second assessment showed that no method provided satisfactory results across datasets. CONCLUSION Nevertheless, PTB methods generally appear to perform better than non-TB methods in producing greater reproducibility power and identifying potential cancer markers.
Affiliation(s)
- Tay Xin Hui
- Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Shahreen Kasim
- Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia.
- Izzatdin Abdul Aziz
- Computer and Information Sciences Department (CISD), Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Malaysia
- Mohd Farhan Md Fudzee
- Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Nazleeni Samiha Haron
- Computer and Information Sciences Department (CISD), Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Malaysia
- Tole Sutikno
- Department of Electrical Engineering, Universitas Ahmad Dahlan (UAD), 55166, Yogyakarta, Indonesia
- Rohayanti Hassan
- Faculty of Electrical Engineering, Universiti Teknologi Malaysia (UTM), 81310, Johor Bahru, Malaysia
- Hairulnizam Mahdin
- Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Seah Choon Sen
- Faculty of Computing, Universiti Teknologi Malaysia (UTM), 81310, Johor Bahru, Malaysia
5
Hakobyan S, Stepanyan A, Nersisyan L, Binder H, Arakelyan A. PSF toolkit: an R package for pathway curation and topology-aware analysis. Front Genet 2023; 14:1264656. [PMID: 37680201 PMCID: PMC10482229 DOI: 10.3389/fgene.2023.1264656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 08/09/2023] [Indexed: 09/09/2023]
Abstract
Most high-throughput genomic data analysis pipelines currently rely on over-representation or gene set enrichment analysis (ORA/GSEA) approaches for functional analysis. In contrast, topology-based pathway analysis methods, which offer a more biologically informed perspective by incorporating interaction and topology information, have remained underutilized and inaccessible due to various limiting factors. These methods rely heavily on the quality of pathway topologies and often use predefined topologies from databases without assessing their correctness. To address these issues and make topology-aware pathway analysis more accessible and flexible, we introduce the PSF (Pathway Signal Flow) toolkit R package. Our toolkit integrates pathway curation and topology-based analysis, providing interactive and command-line tools that facilitate pathway importation, correction, and modification from diverse sources. This enables users to perform topology-based pathway signal flow analysis in both interactive and command-line modes. To showcase the toolkit's usability, we curated 36 KEGG signaling pathways and conducted several use-case studies, comparing our method with ORA and the topology-based signaling pathway impact analysis (SPIA) method. The results demonstrate that the algorithm can effectively identify ORA-enriched pathways while providing more detailed branch-level information. Moreover, in contrast to the SPIA method, it offers the advantage of being cut-off free and less susceptible to the variability caused by selection thresholds. By combining pathway curation and topology-based analysis, the PSF toolkit enhances the quality, flexibility, and accessibility of topology-aware pathway analysis. Researchers can now easily import pathways from various sources, correct and modify them as needed, and perform detailed topology-based pathway signal flow analysis.
In summary, our PSF toolkit offers an integrated solution that addresses the limitations of current topology-based pathway analysis methods. By providing interactive and command-line tools for pathway curation and topology-based analysis, we empower researchers to conduct comprehensive pathway analyses across a wide range of applications.
Affiliation(s)
- Siras Hakobyan
- Bioinformatics Group, Institute of Molecular Biology, Armenian National Academy of Sciences, Yerevan, Armenia
- Armenian Bioinformatics Institute (ABI), Yerevan, Armenia
- Hans Binder
- Armenian Bioinformatics Institute, Yerevan, Armenia
- Interdisciplinary Centre for Bioinformatics, University of Leipzig, Leipzig, Germany
- Arsen Arakelyan
- Bioinformatics Group, Institute of Molecular Biology, Armenian National Academy of Sciences, Yerevan, Armenia
- Russian-Armenian University, Yerevan, Armenia
6
Zhao K, Rhee SY. Interpreting omics data with pathway enrichment analysis. Trends Genet 2023; 39:308-319. [PMID: 36750393 DOI: 10.1016/j.tig.2023.01.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/25/2022] [Revised: 11/24/2022] [Accepted: 01/13/2023] [Indexed: 02/09/2023]
Abstract
Pathway enrichment analysis is indispensable for interpreting omics datasets and generating hypotheses. However, the foundations of enrichment analysis remain elusive to many biologists. Here, we discuss best practices in interpreting different types of omics data using pathway enrichment analysis and highlight the importance of considering intrinsic features of various types of omics data. We further explain major components that influence the outcomes of a pathway enrichment analysis, including defining background sets and choosing reference annotation databases. To improve reproducibility, we describe how to standardize reporting methodological details in publications. This article aims to serve as a primer for biologists to leverage the wealth of omics resources and motivate bioinformatics tool developers to enhance the power of pathway enrichment analysis.
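Over-representation analysis, one common form of pathway enrichment analysis, is often scored with a one-sided hypergeometric test. The sketch below is a generic illustration, not the authors' code, and the function name `ora_pvalue` is hypothetical. It shows why defining the background set matters: the query genes and the pathway are both restricted to the background before counting, so changing the background changes the background size N, the pathway size K, the query size n, and the overlap k.

```python
from math import comb


def ora_pvalue(query, pathway, background):
    """One-sided hypergeometric p-value P(X >= k) for pathway over-representation.

    Query and pathway genes outside the background are discarded first.
    """
    background = set(background)
    query = set(query) & background
    pathway = set(pathway) & background
    N, K, n = len(background), len(pathway), len(query)
    k = len(query & pathway)  # observed overlap
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


pathway = [f"g{i}" for i in range(5)]
query = [f"g{i}" for i in range(4)]
p_small_bg = ora_pvalue(query, pathway, [f"g{i}" for i in range(20)])
p_large_bg = ora_pvalue(query, pathway, [f"g{i}" for i in range(100)])
```

In this example, a four-gene query lies entirely inside a five-gene pathway. Widening the background from 20 to 100 genes lowers the p-value from 5/4845 (about 0.001) to 5/3921225 (about 1.3e-6). This is one reason to report the background set alongside enrichment results.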
Affiliation(s)
- Kangmei Zhao
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
- Seung Yon Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
7
Griffin AT, Vlahos LJ, Chiuzan C, Califano A. NaRnEA: An Information Theoretic Framework for Gene Set Analysis. Entropy (Basel) 2023; 25:e25030542. [PMID: 36981431 PMCID: PMC10048242 DOI: 10.3390/e25030542] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/08/2022] [Revised: 03/03/2023] [Accepted: 03/13/2023] [Indexed: 05/26/2023]
Abstract
Gene sets are being increasingly leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric analytical-Rank-based Enrichment Analysis (NaRnEA) to facilitate accurate and robust gene set analysis with an optimal null model derived using the information-theoretic Principle of Maximum Entropy. By measuring the differential activity of ~2500 transcriptional regulatory proteins based on the differential expression of each protein's transcriptional targets between primary tumors and normal tissue samples in three cohorts from The Cancer Genome Atlas (TCGA), we demonstrate that NaRnEA critically improves upon two widely used gene set analysis methods: Gene Set Enrichment Analysis (GSEA) and analytical-Rank-based Enrichment Analysis (aREA). We show that the NaRnEA-inferred differential protein activity is significantly correlated with differential protein abundance inferred from independent, phenotype-matched mass spectrometry data in the Clinical Proteomic Tumor Analysis Consortium (CPTAC), confirming the statistical and biological accuracy of our approach. Additionally, our analysis crucially demonstrates that the sample-shuffling empirical null models leveraged by GSEA and aREA for gene set analysis are overly conservative, a shortcoming that is avoided by the newly developed Maximum Entropy analytical null model employed by NaRnEA.
Affiliation(s)
- Aaron T. Griffin
- Medical Scientist Training Program, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Lukas J. Vlahos
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Codruta Chiuzan
- Department of Biostatistics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Andrea Califano
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA
- JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
8
Rai SN, Das S, Pan J, Mishra DC, Fu XA. Multigroup prediction in lung cancer patients and comparative controls using signature of volatile organic compounds in breath samples. PLoS One 2022; 17:e0277431. [PMID: 36449484 PMCID: PMC9710764 DOI: 10.1371/journal.pone.0277431] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2022] [Accepted: 10/26/2022] [Indexed: 12/05/2022]
Abstract
Early detection of lung cancer is crucial for increasing survival rates among patients. The presence of carbonyl volatile organic compounds (VOCs) in exhaled breath can play a vital role in the early detection of lung cancer. Identifying these VOC markers in breath samples through innovative statistical and machine learning techniques is an important task in lung cancer research. Therefore, we proposed an experimental approach for generating VOC molecular concentration data using unique silicon microreactor technology, and for identifying and characterizing the key VOCs relevant to lung cancer detection through statistical and machine learning algorithms. We reported several informative VOCs and tested their effectiveness in multi-group classification of patients. Our analytical results indicated that seven key VOCs, including C4H8O2, C13H22O, C11H22O, C2H4O2, C7H14O, C6H12O, and C5H8O, are sufficient to detect lung cancer patients with a higher mean classification accuracy (92%) and a lower standard error (0.03) than other combinations. In other words, the molecular concentrations of these VOCs in exhaled breath samples were able to discriminate patients with lung cancer (n = 156) from healthy smoker and nonsmoker controls (n = 193) and from patients with benign pulmonary nodules (n = 65). The quantification of carbonyl VOC profiles from breath samples and the identification of crucial VOCs through our experimental approach pave the way for non-invasive lung cancer detection. Further, our experimental and analytical approach to quantitative VOC analysis in breath samples may be extended to other diseases, including COVID-19 detection.
Affiliation(s)
- Shesh N. Rai
- Biostatistics and Bioinformatics Facility, Brown Cancer Center, University of Louisville, Louisville, KY, United States of America
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY, United States of America
- Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY, United States of America
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, United States of America
- Biostatistics and Informatics Facility, Center for Integrative Environmental Research Sciences, University of Louisville, Louisville, KY, United States of America
- Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY, United States of America
- Samarendra Das
- Biostatistics and Bioinformatics Facility, Brown Cancer Center, University of Louisville, Louisville, KY, United States of America
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY, United States of America
- ICAR-Directorate of Foot and Mouth Disease, Arugul, Bhubaneswar, Odisha, India
- International Centre for Foot and Mouth Disease, Arugul, Bhubaneswar, Odisha, India
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, India
- Jianmin Pan
- Biostatistics and Bioinformatics Facility, Brown Cancer Center, University of Louisville, Louisville, KY, United States of America
- Dwijesh C. Mishra
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, India
- Xiao-An Fu
- Department of Chemical Engineering, University of Louisville, Louisville, KY, United States of America
9
Afridi MS, Javed MA, Ali S, De Medeiros FHV, Ali B, Salam A, Sumaira, Marc RA, Alkhalifah DHM, Selim S, Santoyo G. New opportunities in plant microbiome engineering for increasing agricultural sustainability under stressful conditions. Front Plant Sci 2022; 13:899464. [PMID: 36186071 PMCID: PMC9524194 DOI: 10.3389/fpls.2022.899464] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 03/18/2022] [Accepted: 08/08/2022] [Indexed: 07/30/2023]
Abstract
Plant microbiome (or phytomicrobiome) engineering (PME) is an anticipated but untapped alternative strategy that could be exploited for plant growth, health and productivity under different environmental conditions. It has been proven that the phytomicrobiome makes crucial contributions to plant health, pathogen control and tolerance under drastic environmental (a)biotic constraints. With a focus on plant health and safety, in this article we address the fundamental role of the plant microbiome and its insights into plant health and productivity. We also explore the potential of the plant microbiome under environmental restrictions and propose improving microbial functions that can support better plant growth and production. Recognizing the crucial role of plant-associated microbial communities, we propose how the associated microbial actions could be enhanced to improve plant growth-promoting mechanisms, with a particular emphasis on plant beneficial fungi. Additionally, we suggest possible plant strategies to adapt to a harsh environment by manipulating plant microbiomes. However, our current understanding of the microbiome is still in its infancy, and the major perturbations, such as anthropocentric actions, are not fully understood. Therefore, this work highlights the importance of manipulating the beneficial plant microbiome to create more sustainable agriculture, particularly under different environmental stressors.
Affiliation(s)
- Muhammad Ammar Javed
- Institute of Industrial Biotechnology, Government College University, Lahore, Pakistan
- Sher Ali
- Department of Food Engineering, Faculty of Animal Science and Food Engineering, University of São Paulo (USP), São Paulo, Brazil
- Baber Ali
- Department of Plant Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Abdul Salam
- Zhejiang Key Laboratory of Crop Germplasm, Department of Agronomy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Sumaira
- Department of Biotechnology, Quaid-i-Azam University, Islamabad, Pakistan
- Romina Alina Marc
- Food Engineering Department, Faculty of Food Science and Technology, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Cluj-Napoca, Romania
- Dalal Hussien M. Alkhalifah
- Department of Biology, College of Science, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
- Samy Selim
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Jouf University, Sakaka, Saudi Arabia
- Gustavo Santoyo
- Instituto de Investigaciones Químico-Biológicas, Universidad Michoacana de San Nicolás de Hidalgo, Morelia, Mexico
10
Das S, Rai A, Rai SN. Differential Expression Analysis of Single-Cell RNA-Seq Data: Current Statistical Approaches and Outstanding Challenges. Entropy (Basel) 2022; 24:e24070995. [PMID: 35885218 PMCID: PMC9315519 DOI: 10.3390/e24070995] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2022] [Revised: 06/25/2022] [Accepted: 07/09/2022] [Indexed: 01/11/2023]
Abstract
With the advent of single-cell RNA sequencing (scRNA-seq), it is possible to measure the expression dynamics of genes at the single-cell level. A single scRNA-seq experiment can generate expression data for thousands of genes across up to millions of cells. Differential expression analysis is the primary downstream analysis of such data; it identifies gene markers for cell type detection and also provides inputs to other secondary analyses. Many statistical approaches for differential expression analysis have been reported in the literature. Here, we critically discuss the statistical principles underlying these approaches and divide them into six major classes: generalized linear, generalized additive, hurdle, mixture, two-class parametric, and non-parametric approaches. We also succinctly discuss the limitations specific to each class of approaches and how they are addressed by subsequent classes. We identify a number of challenges that must be addressed to develop the next class of innovative approaches. Furthermore, we emphasize the methodological challenges involved in differential expression analysis of scRNA-seq data that researchers must address to draw maximum benefit from this recent single-cell technology. This study will serve as a guide for genome researchers and experimental biologists to objectively select options for their analysis.
Affiliation(s)
- Samarendra Das
- ICAR-Directorate of Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India
- International Centre for Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India
- Anil Rai
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Shesh N. Rai
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Biostatistics and Bioinformatics Facility, Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
- Biostatistics and Informatics Facility, Center for Integrative Environmental Health Sciences, University of Louisville, Louisville, KY 40202, USA
- Data Analysis and Sample Management Facility, The University of Louisville Super Fund Center, University of Louisville, Louisville, KY 40202, USA
- Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA
- Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY 40202, USA
11
Cerulo L, Pagnotta SM. massiveGST: A Mann-Whitney-Wilcoxon Gene-Set Test Tool That Gives Meaning to Gene-Set Enrichment Analysis. Entropy (Basel) 2022; 24:e24050739. [PMID: 35626622 PMCID: PMC9140214 DOI: 10.3390/e24050739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 05/16/2022] [Accepted: 05/19/2022] [Indexed: 01/27/2023]
Abstract
Gene-set enrichment analysis is the key methodology for obtaining biological information from statistical results in transcriptomic space. Since its introduction, gene-set enrichment analysis methods have obtained more reliable results and a wider range of applications. Great attention has been devoted to global tests, whereas competitive methods have been largely ignored, although they appear more flexible because they are independent of the source of the gene profiles. We analyzed the properties of the Mann–Whitney–Wilcoxon test, a competitive method, and adapted its interpretation in the context of enrichment analysis by introducing a Normalized Enrichment Score that summarizes two interpretations: a probability estimate and a location index. Two implementations are presented and compared with relevant methods from the literature: an R package and an online web tool. Both allow users to obtain tabular and graphical results, with attention to reproducible research.
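A competitive test of this kind compares the gene-level statistics of genes inside a set with those of all genes outside it. The sketch below computes three quantities:

- the standard Mann–Whitney–Wilcoxon U statistic, using midranks for ties;
- its normalized form U/(n1·n2), which estimates the probability that a gene in the set has a larger statistic than a gene outside it, with ties counted as one half;
- a two-sided normal-approximation p-value without tie correction.

This is a plain-Python illustration under those assumptions. It does not claim that the normalized U matches massiveGST's Normalized Enrichment Score exactly, and the function names are hypothetical.

```python
from math import erfc, sqrt


def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mww_gene_set_test(gene_stats, gene_set):
    """Competitive MWW test of genes in gene_set versus all other genes.

    gene_stats maps gene -> gene-level statistic (e.g. a t-statistic).
    Returns (U, U / (n1 * n2), two-sided normal-approximation p-value).
    """
    genes = list(gene_stats)
    ranks = midranks([gene_stats[g] for g in genes])
    in_set = [g in gene_set for g in genes]
    n1 = sum(in_set)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need genes both inside and outside the set")
    rank_sum = sum(r for r, flag in zip(ranks, in_set) if flag)
    u = rank_sum - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, u / (n1 * n2), p
```

When every gene in the set outranks every gene outside it, the normalized U is 1. When the set sits entirely at the bottom, it is 0. A value of 0.5 indicates no shift in location.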
Affiliation(s)
- Luigi Cerulo
- Department of Science and Technology, Università degli Studi del Sannio, 82100 Benevento, Italy
- Bioinformatics Lab, Biogem, Molecular Biology and Genetics Research Institute, 83031 Ariano Irpino, Italy
- Stefano Maria Pagnotta
- Department of Science and Technology, Università degli Studi del Sannio, 82100 Benevento, Italy
12
Lett BM, Kirkpatrick BW. Identifying genetic variants and pathways influencing daughter averages for twinning in North American Holstein cattle and evaluating the potential for genomic selection. J Dairy Sci 2022; 105:5972-5984. [PMID: 35525609 DOI: 10.3168/jds.2021-21238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Accepted: 03/04/2022] [Indexed: 11/19/2022]
Abstract
Multiple births in dairy cattle are detrimental both economically, for producers, and for animal health. The genetics of twinning is complex, and several quantitative trait loci (QTL) regions have been associated with increased twinning. To identify variants associated with this trait, calving records from 2 time periods were used to estimate daughter averages for twinning for Holstein bulls. Multiple analyses were conducted and compared, including GWAS, genomic prediction, and gene set enrichment analysis for pathway detection. Although pathway analysis yielded few pathways of interest that were congruent between data sets, it did identify two. Both pathways are linked to the strong candidate region on BTA11 found by the genome-wide association analysis across data sets. This region does not overlap with previously identified QTL regions for twinning or ovulation rate in cattle. The most strongly associated SNPs were upstream of 2 candidate genes, LHCGR and FSHR, which are involved in folliculogenesis. Genomic prediction showed moderate accuracy (correlation = 0.43) when predicting genomic breeding values for bulls with estimates from calving records from 2010 to 2016. Future analysis of the region on BTA11 and of the relationship between the candidate genes could improve this accuracy.
Affiliation(s)
- Beth M Lett
- Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison 53706
- Brian W Kirkpatrick
- Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison 53706
13
Li J, Zhang Z, Guo K, Wu S, Guo C, Zhang X, Wang Z. Identification of a key glioblastoma candidate gene, FUBP3, based on weighted gene co-expression network analysis. BMC Neurol 2022; 22:139. [PMID: 35413821 PMCID: PMC9004042 DOI: 10.1186/s12883-022-02661-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/15/2021] [Accepted: 03/16/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Glioblastoma multiforme (GBM) is the most common aggressive malignant brain tumor. However, the molecular mechanism of glioblastoma formation is still poorly understood. To identify candidate genes that may be connected to glioma growth and development, weighted gene co-expression network analysis (WGCNA) was performed to construct a gene co-expression network and relate gene sets to clinical characteristics. We also explored the function of the key candidate gene. METHODS Two GBM datasets were selected from GEO Datasets. R was used to identify differentially expressed genes. WGCNA was performed to construct a gene co-expression network in the GEO glioblastoma samples. A custom Venn diagram website was used to find the intersecting genes. The GEPIA website was used for survival analysis to determine the significant gene, FUBP3. Overall survival (OS), disease-specific survival (DSS), and progression-free interval (PFI) analyses based on the UCSC Cancer Genomics Browser were performed to verify the significance of FUBP3. Immunohistochemistry was performed to evaluate the expression of FUBP3 in glioblastoma and adjacent normal tissue. KEGG and GO enrichment analyses were used to reveal possible functions of FUBP3. Microenvironment analysis was used to explore the relationship between FUBP3 and immune infiltration, and immunohistochemistry was performed to verify its results. RESULTS GSE70231 and GSE108474 were selected from GEO Datasets, and 715 and 694 differentially expressed genes (DEGs) were identified in GSE70231 and GSE108474, respectively. WGCNA then identified the most downregulated gene modules of each dataset, from which 659 (GSE70231) and 3915 (GSE108474) module genes were selected. Five intersecting genes (FUBP3, DAD1, CLIC1, ABR, and DNM1) were identified with a Venn diagram. FUBP3 was then identified as the only significant gene by survival analysis on the GEPIA website.
OS, DSS, and PFI analyses verified the significance of FUBP3. Immunohistochemical analysis revealed FUBP3 expression in GBM and adjacent normal tissue. KEGG and GO analyses uncovered possible functions of FUBP3 in GBM. Tumor microenvironment analysis showed that FUBP3 may be connected to immune infiltration, and immunohistochemistry identified a positive correlation between FUBP3 and immune cells (CD4+ T cells, CD8+ T cells, and macrophages). CONCLUSION FUBP3 is associated with immune surveillance in GBM, indicating that it has a major impact on GBM development and progression. Therefore, interventions targeting FUBP3 and its regulatory pathway may offer a new approach to GBM treatment.
Affiliation(s)
- Jianmin Li
- Department of Neurosurgery, Binzhou Medical University Hospital, Binzhou, Shandong Province, People's Republic of China
- Zhao Zhang
- Department of Neurosurgery, Binzhou Medical University Hospital, Binzhou, Shandong Province, People's Republic of China
- Ke Guo
- Department of Neurosurgery, Binzhou Medical University Hospital, Binzhou, Shandong Province, People's Republic of China
- Shuhua Wu
- Department of Pathology, Binzhou Medical University Hospital, Binzhou, Shandong Province, China
- Chong Guo
- Department of Neurosurgery, Binzhou Medical University Hospital, Binzhou, Shandong Province, People's Republic of China
- Xinfan Zhang
- Department of Neurosurgery, Binzhou Medical University Hospital, Binzhou, Shandong Province, People's Republic of China
- Zi Wang
- Department of Neurosurgery, Binzhou Medical University Hospital, Binzhou, Shandong Province, People's Republic of China
14
Marczyk M, Macioszek A, Tobiasz J, Polanska J, Zyla J. Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies. Front Genet 2021; 12:767358. [PMID: 34956320 PMCID: PMC8696167 DOI: 10.3389/fgene.2021.767358] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/30/2021] [Accepted: 11/10/2021] [Indexed: 11/13/2022]
Abstract
A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which may lie within the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be aggregated at the gene level. A widely used practice is to keep only the most relevant SNP per gene; however, other integration methods could be applied here. Also, the nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of different integration methods and of linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). The association of SNPs with phenotype was calculated using a modified McNemar's test. Results for SNPs mapped to the same gene were integrated using the Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. GSA results on GWAS data were compared with those obtained on gene expression data. Comparing all evaluation metrics across GSA algorithms, integration methods, and LD correction, we highlighted CERNO and MAGENTA with Stouffer integration as the most efficient. Applying LD correction increased the prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer integration was combined with LD correction, sensitivity and reproducibility also improved. In specific combinations, either integration method was beneficial compared with the minimum p-value method.
The correlation between GSA results at the genomic and transcriptomic levels was highest when Stouffer integration was combined with LD correction. We thoroughly evaluated the performance of different approaches to GSA in GWAS to guide the selection of the most effective combinations. We showed that LD correction and Stouffer integration can increase the performance of enrichment analysis, and we encourage the use of these techniques.
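The three gene-level integration rules compared in this study can be sketched in a few lines of Python. This is a minimal illustration, assuming independent SNP-level p-values strictly between 0 and 1 and unweighted Stouffer integration; it deliberately omits the LD correction whose effect the study measures (SNPs in LD are correlated, which violates the independence assumption both combination rules rely on), and the function names are hypothetical rather than taken from the authors' code. Fisher's statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its survival function has a closed form, so no SciPy dependency is needed.

```python
import math
from statistics import NormalDist


def fisher_combine(pvals: list[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)  # X / 2
    # Chi-square survival function for 2k df: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer_combine(pvals: list[float]) -> float:
    """Unweighted Stouffer: Z = sum(Phi^-1(1 - p_i)) / sqrt(k); one-sided p = 1 - Phi(Z)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return nd.cdf(-z)


def min_p(pvals: list[float]) -> float:
    """Minimum p-value per gene (no adjustment for the number of SNPs applied here)."""
    return min(pvals)
```

For a gene with two SNPs at p = 0.01 each, both combined p-values fall below the per-SNP minimum of 0.01 (about 1.0 × 10⁻³ for Fisher and about 5 × 10⁻⁴ for Stouffer). If the two SNPs are in strong LD they carry largely redundant evidence, and this gain is overstated, which is the situation an LD correction is meant to address.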
Affiliation(s)
- Michal Marczyk
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Yale Cancer Center, Yale School of Medicine, New Haven, CT, United States
- Agnieszka Macioszek
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Tobiasz
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Polanska
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Zyla
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
15
Rocca A, Kholodenko BN. Can Systems Biology Advance Clinical Precision Oncology? Cancers (Basel) 2021; 13:6312. [PMID: 34944932 PMCID: PMC8699328 DOI: 10.3390/cancers13246312] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/04/2021] [Accepted: 12/10/2021] [Indexed: 12/13/2022]
Abstract
Precision oncology is perceived as a way forward in treating individual cancer patients. However, knowing a tumor's particular mutations is not enough for optimal therapeutic treatment, because cancer genotype-phenotype relationships are nonlinear and dynamic. Systems biology studies biological processes at the systems level, using an array of techniques ranging from statistical methods to network reconstruction and analysis to mathematical modeling. Its goal is to reconstruct the complex and often counterintuitive dynamic behavior of biological systems and to quantitatively predict their responses to environmental perturbations. In this paper, we review the impact of systems biology on precision oncology. We show examples of how the analysis of signal transduction networks makes it possible to dissect resistance to targeted therapies and to inform the choice of targeted drug combinations based on tumor molecular alterations. Patient-specific biomarkers based on dynamical models of signaling networks can have greater prognostic value than conventional biomarkers. These examples support systems biology models as valuable tools for advancing clinical and translational oncological research.
Affiliation(s)
- Andrea Rocca
- Hygiene and Public Health, Local Health Unit of Romagna, 47121 Forlì, Italy
- Boris N. Kholodenko
- Systems Biology Ireland, School of Medicine, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
- Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520, USA
16
Toward modeling metabolic state from single-cell transcriptomics. Mol Metab 2021; 57:101396. [PMID: 34785394 PMCID: PMC8829761 DOI: 10.1016/j.molmet.2021.101396] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/05/2021] [Revised: 10/21/2021] [Accepted: 11/09/2021] [Indexed: 12/31/2022]
Abstract
Background Single-cell metabolic studies bring new insights into cellular function that often cannot be captured by other omics layers. Metabolic information has wide applicability, for example in studying cellular heterogeneity, understanding drug mechanisms, and developing biomarkers. However, single-cell metabolic measurements are limited by insufficient scalability and sensitivity as well as by their resource intensity, and they cannot currently be performed in parallel with measurements of transcript state, which is commonly used to identify cell types. Nevertheless, because omics layers are strongly intertwined, metabolic predictions can be made from data on more easily measured omics layers combined with prior knowledge of metabolic networks. Scope of Review We summarize the current state of single-cell metabolic measurement and modeling approaches, motivating the use of computational techniques. We review three main classes of computational methods used to predict single-cell metabolism: pathway-level analysis, constraint-based modeling, and kinetic modeling. We describe the unique challenges that arise when transitioning from bulk to single-cell modeling. Finally, we propose potential model extensions and computational methods that could be leveraged to achieve these goals. Major Conclusions Single-cell metabolic modeling is a rising field that provides a new perspective for understanding cellular functions. The presented modeling approaches vary in their input requirements and assumptions, scalability, modeled metabolic layers, and newly gained insights. We believe that the use of prior metabolic knowledge will lead to more robust predictions and will pave the way for mechanistic and interpretable machine-learning models. Single-cell RNA sequencing and prior metabolic knowledge enable metabolic predictions. Compared with bulk modeling, single-cell modeling brings unique insights and challenges.
Computational modeling approaches differ in applicability and in the insights they provide. The use of prior metabolic knowledge paves the way for mechanistic machine learning.
17
Gerstner N, Kehl T, Lenhof K, Eckhart L, Schneider L, Stöckel D, Backes C, Meese E, Keller A, Lenhof HP. GeneTrail: A Framework for the Analysis of High-Throughput Profiles. Front Mol Biosci 2021; 8:716544. [PMID: 34604304 PMCID: PMC8481803 DOI: 10.3389/fmolb.2021.716544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2021] [Accepted: 09/01/2021] [Indexed: 12/05/2022]
Abstract
Experimental high-throughput techniques, like next-generation sequencing or microarrays, are nowadays routinely applied to create detailed molecular profiles of cells. In general, these platforms generate high-dimensional and noisy data sets. For their analysis, powerful bioinformatics tools are required to gain novel insights into the biological processes under investigation. Here, we present an overview of the GeneTrail tool suite that offers rich functionality for the analysis and visualization of (epi-)genomic, transcriptomic, miRNomic, and proteomic profiles. Our framework enables the analysis of standard bulk, time-series, and single-cell measurements and includes various state-of-the-art methods to identify potentially deregulated biological processes and to detect driving factors within those deregulated processes. We highlight the capabilities of our web service with an analysis of a single-cell COVID-19 data set that demonstrates its potential for uncovering complex molecular mechanisms. GeneTrail can be accessed freely and without login requirements at http://genetrail.bioinf.uni-sb.de.
Affiliation(s)
- Nico Gerstner
- Center for Bioinformatics, Saarland Informatics Campus, Saarbrücken, Germany
- Tim Kehl
- Center for Bioinformatics, Saarland Informatics Campus, Saarbrücken, Germany
- Kerstin Lenhof
- Center for Bioinformatics, Saarland Informatics Campus, Saarbrücken, Germany
- Lea Eckhart
- Center for Bioinformatics, Saarland Informatics Campus, Saarbrücken, Germany
- Lara Schneider
- Center for Bioinformatics, Saarland Informatics Campus, Saarbrücken, Germany
- Daniel Stöckel
- Healthcare Digital & Data, Merck Healthcare KGaA, Darmstadt, Germany
- Christina Backes
- Center for Bioinformatics, Saarland Informatics Campus, Saarbrücken, Germany
- Eckart Meese
- Department of Human Genetics, Saarland University, Homburg, Germany
- Andreas Keller
- Center for Bioinformatics, Saarland Informatics Campus, Saarbrücken, Germany
- Chair for Clinical Bioinformatics, Saarland University, Saarbrücken, Germany
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, United States
- Hans-Peter Lenhof
- Center for Bioinformatics, Saarland Informatics Campus, Saarbrücken, Germany
18
Karimi MR, Karimi AH, Abolmaali S, Sadeghi M, Schmitz U. Prospects and challenges of cancer systems medicine: from genes to disease networks. Brief Bioinform 2021; 23:6361045. [PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/20/2022]
Abstract
It is becoming evident that holistic perspectives on cancer are crucial for deciphering the overwhelming complexity of tumors. Single-layer analysis of genome-wide data has greatly contributed to our understanding of cellular systems and their perturbations. However, fundamental gaps in our knowledge persist and hamper the design of effective interventions. It is more apparent than ever that cancer should be viewed not only as a disease of the genome but as a disease of the cellular system. Integrative multilayer approaches are emerging as powerful assets in our efforts to achieve systemic views of cancer biology. Herein, we provide a comprehensive review of the approaches, methods, and technologies that can serve to achieve systemic perspectives of cancer. We start with genome-wide single-layer omics analyses of cellular systems and move on to multilayer integrative approaches, providing in-depth descriptions of proteogenomics and network-based data analysis. Proteogenomics is a remarkable example of how integrating multiple levels of information can reduce our blind spots and increase the accuracy and reliability of our interpretations; network-based data analysis is a major approach for data interpretation and a robust scaffold for data integration and modeling. Overall, this review aims to increase cross-field awareness of the approaches and challenges in the omics-based study of cancer and to facilitate the necessary shift toward holistic approaches.
Affiliation(s)
- Mehdi Sadeghi
- Department of Cell & Molecular Biology, Semnan University, Semnan, Iran
- Ulf Schmitz
- Department of Molecular & Cell Biology, James Cook University, Townsville, QLD 4811, Australia
19
Mapping gene and gene pathways associated with coronary artery disease: a CARDIoGRAM exome and multi-ancestry UK biobank analysis. Sci Rep 2021; 11:16461. [PMID: 34385509 PMCID: PMC8361107 DOI: 10.1038/s41598-021-95637-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/26/2020] [Accepted: 07/28/2021] [Indexed: 02/07/2023]
Abstract
Coronary artery disease (CAD) genome-wide association studies (GWAS) typically focus on single nucleotide variants (SNVs), and many potentially associated SNVs fail to reach the GWAS significance threshold. We performed gene- and pathway-based association (GBA) tests on publicly available summary data from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis consortium Exome (n = 120,575) and the multi-ancestry Pan-UK Biobank study (n = 442,574), using versatile gene-based association study (VEGAS2) and Multi-marker analysis of genomic annotation (MAGMA), to identify novel genes and pathways associated with CAD. We included only exonic SNVs and excluded regulatory regions. VEGAS2 and MAGMA ranked genes and pathways based on aggregated SNV test statistics. We used Bonferroni-corrected gene and pathway significance thresholds of 3.0 × 10⁻⁶ and 1.0 × 10⁻⁵, respectively. We also report the top one percent of ranked genes and pathways. We identified 17 top enriched genes, four of which (PCSK9, FAM177, LPL, ARGEF26) reached statistical significance (p ≤ 3.0 × 10⁻⁶) with both GBA tests in both GWAS studies. In addition, our analyses identified ten genes (DUSP13, KCNJ11, CD300LF/RAB37, SLCO1B1, LRRFIP1, QSER1, UBR2, MOB3C, MST1R, and ABCC8) with previously unreported associations with CAD, although none of the single-SNV associations within these genes were genome-wide significant. Among the top 1% of non-lipid pathways, we detected pathways regulating coagulation, inflammation, neuronal aging, and wound healing.
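A Bonferroni threshold is a family-wise error rate α divided by the number of tests m (threshold = α / m), so the two thresholds quoted in this abstract can be inverted to the number of tests they imply. Assuming α = 0.05, which the abstract does not state, they correspond to roughly 16,700 gene-level and 5,000 pathway-level tests. The short Python check below makes that inversion explicit; the helper names are hypothetical.

```python
ALPHA = 0.05  # assumed family-wise error rate; not given in the abstract


def bonferroni_threshold(num_tests: int, alpha: float = ALPHA) -> float:
    """Per-test significance threshold alpha / m."""
    return alpha / num_tests


def implied_num_tests(threshold: float, alpha: float = ALPHA) -> float:
    """Number of tests m implied by a Bonferroni threshold: m = alpha / threshold."""
    return alpha / threshold


print(round(implied_num_tests(3.0e-6)))  # gene-level threshold -> 16667
print(round(implied_num_tests(1.0e-5)))  # pathway-level threshold -> 5000
```

If the authors used a different α, the implied test counts scale proportionally.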
20
Das S, Rai SN. Statistical Approach of Gene Set Analysis with Quantitative Trait Loci for Crop Gene Expression Studies. ENTROPY (BASEL, SWITZERLAND) 2021; 23:945. [PMID: 34441085 PMCID: PMC8391627 DOI: 10.3390/e23080945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Revised: 07/19/2021] [Accepted: 07/21/2021] [Indexed: 11/16/2022]
Abstract
Genome-wide expression profiling is a powerful genomic technology for quantifying the expression dynamics of genes across a genome. In gene expression studies, gene set analysis has become the first choice for gaining insights into the biology underlying diseases or stresses in plants. It also reduces the complexity of the statistical analysis and enhances the explanatory power of the results obtained from the primary differential expression analysis. Gene set analysis approaches are well developed for microarray and RNA-seq gene expression data. These approaches mainly focus on analyzing gene sets defined by gene ontology or pathway annotation data. However, in plant biology, such methods may not establish any formal relationship between genotypes and phenotypes, as most traits are quantitative and controlled by polygenes. Existing Quantitative Trait Loci (QTL)-based gene set analysis approaches focus only on over-representation analysis of the selected genes, ignoring their associated gene scores. We therefore developed an innovative statistical approach, GSQSeq, to analyze gene sets with trait-enriched QTL data. This approach takes the associated differential expression scores of genes into account when analyzing gene sets. The performance of the method was tested on five crop gene expression datasets obtained from real crop gene expression studies. Our results indicated that trait-specific analysis of gene sets was more robust and successful with the proposed approach than with existing techniques. Further, the method provides a valuable platform for integrating gene expression data with QTL data.
Affiliation(s)
- Samarendra Das
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Shesh N. Rai
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Department of Pharmacology and Toxicology, University of Louisville, Louisville, KY 40202, USA
- Alcohol Research Center, University of Louisville, Louisville, KY 40202, USA
- Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
21
Gogolev YV, Ahmar S, Akpinar BA, Budak H, Kiryushkin AS, Gorshkov VY, Hensel G, Demchenko KN, Kovalchuk I, Mora-Poblete F, Muslu T, Tsers ID, Yadav NS, Korzun V. OMICs, Epigenetics, and Genome Editing Techniques for Food and Nutritional Security. PLANTS (BASEL, SWITZERLAND) 2021; 10:1423. [PMID: 34371624 PMCID: PMC8309286 DOI: 10.3390/plants10071423] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/13/2021] [Revised: 06/30/2021] [Accepted: 07/07/2021] [Indexed: 12/22/2022]
Abstract
The incredible success of crop breeding and agricultural innovation in the last century greatly contributed to the Green Revolution, which significantly increased yields and ensured food security despite the population explosion. However, new challenges such as rapid climate change, deteriorating soil, and the accumulation of pollutants require much faster responses and more effective solutions than traditional breeding can provide. Further prospects for increasing agricultural efficiency are undoubtedly associated with incorporating into breeding strategies new knowledge obtained with high-throughput technologies, together with future tools for designing new plant genomes and predicting the desired phenotype. This article provides an overview of the current state of research in these areas, as well as of the study of soil and plant microbiomes and the prospective use of their potential in the new field of microbiome engineering. For genomic and phenomic predictions, we also propose an integrated approach that combines high-density genotyping and high-throughput phenotyping techniques, which can improve the prediction accuracy of quantitative traits in crop species.
Affiliation(s)
- Yuri V. Gogolev
- Federal Research Center Kazan Scientific Center of Russian Academy of Sciences, Kazan Institute of Biochemistry and Biophysics, 420111 Kazan, Russia
- Federal Research Center Kazan Scientific Center of Russian Academy of Sciences, Laboratory of Plant Infectious Diseases, 420111 Kazan, Russia
- Sunny Ahmar
- Institute of Biological Sciences, University of Talca, 1 Poniente 1141, Talca 3460000, Chile
- Hikmet Budak
- Montana BioAg Inc., Missoula, MT 59802, USA
- Alexey S. Kiryushkin
- Laboratory of Cellular and Molecular Mechanisms of Plant Development, Komarov Botanical Institute of the Russian Academy of Sciences, 197376 Saint Petersburg, Russia
- Vladimir Y. Gorshkov
- Federal Research Center Kazan Scientific Center of Russian Academy of Sciences, Kazan Institute of Biochemistry and Biophysics, 420111 Kazan, Russia
- Federal Research Center Kazan Scientific Center of Russian Academy of Sciences, Laboratory of Plant Infectious Diseases, 420111 Kazan, Russia
- Goetz Hensel
- Centre for Plant Genome Engineering, Institute of Plant Biochemistry, Heinrich-Heine-University, 40225 Dusseldorf, Germany
- Centre of the Region Haná for Biotechnological and Agricultural Research, Czech Advanced Technology and Research Institute, Palacký University Olomouc, 78371 Olomouc, Czech Republic
- Kirill N. Demchenko
- Laboratory of Cellular and Molecular Mechanisms of Plant Development, Komarov Botanical Institute of the Russian Academy of Sciences, 197376 Saint Petersburg, Russia
- Igor Kovalchuk
- Department of Biological Sciences, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
- Freddy Mora-Poblete
- Institute of Biological Sciences, University of Talca, 1 Poniente 1141, Talca 3460000, Chile
- Tugdem Muslu
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Istanbul, Turkey
- Ivan D. Tsers
- Federal Research Center Kazan Scientific Center of Russian Academy of Sciences, Laboratory of Plant Infectious Diseases, 420111 Kazan, Russia
- Narendra Singh Yadav
- Department of Biological Sciences, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
- Viktor Korzun
- Federal Research Center Kazan Scientific Center of Russian Academy of Sciences, Laboratory of Plant Infectious Diseases, 420111 Kazan, Russia
- KWS SAAT SE & Co. KGaA, Grimsehlstr. 31, 37555 Einbeck, Germany
22
Qayed A, Han D. High-dimensional covariance matrices tests for analyzing multi-tumor gene expression data. Stat Methods Med Res 2021; 30:1904-1916. [PMID: 34232835 DOI: 10.1177/09622802211009257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
When multiple sets of measurements are collected per subject in microarray data, gene set analysis requires characterizing intra-subject variation in the gene expression profiles. For each subject, the data can be written as a matrix whose rows index the different subsets of gene expression (e.g. multiple tumor types) and whose columns index the genes. To test the assumption of intra-subject (tumor) variation, we present tests of multi-set sphericity and multi-set identity of covariance structures across subjects (tumor types). We demonstrate through both theoretical and empirical studies that the tests have good properties. We applied the proposed tests to data from The Cancer Genome Atlas (TCGA) and tested the covariance structures of gene expression across several tumor types.
Affiliation(s)
- Abdullah Qayed
- School of Mathematical Sciences, Department of Statistics, Shanghai Jiao Tong University, Shanghai, China
- Dong Han
- School of Mathematical Sciences, Department of Statistics, Shanghai Jiao Tong University, Shanghai, China
23
Differential Expression of a Panel of Ten CNTN1-Associated Genes during Prostate Cancer Progression and the Predictive Properties of the Panel Towards Prostate Cancer Relapse. Genes (Basel) 2021; 12:genes12020257. [PMID: 33578925 PMCID: PMC7916715 DOI: 10.3390/genes12020257] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/19/2020] [Revised: 01/29/2021] [Accepted: 02/01/2021] [Indexed: 12/18/2022]
Abstract
Contactin 1 (CNTN1) is a new oncogenic protein in prostate cancer (PC), but its impact on PC remains incompletely understood. We observed CNTN1 upregulation in LNCaP cell-derived castration-resistant PCs (CRPC) and CNTN1-mediated enhancement of LNCaP cell proliferation. CNTN1 overexpression in LNCaP cells resulted in enrichment of the CREIGHTON_ENDOCRINE_THERAPY_RESISTANCE_3 gene set, which is linked to endocrine therapy resistance in breast cancer. The leading-edge (LE) genes (n = 10) of this enrichment consist of four genes with limited prior knowledge in PC and six genes novel to PC. These LE genes display differential expression during PC initiation, metastatic progression, and CRPC development, and they predict PC relapse following curative therapies with a hazard ratio (HR) of 2.72 (95% confidence interval (CI) 1.96–3.77, p = 1.77 × 10⁻⁹) in The Cancer Genome Atlas (TCGA) PanCancer cohort (n = 492) and an HR of 2.72 (95% CI 1.84–4.01, p = 4.99 × 10⁻⁷) in the Memorial Sloan Kettering Cancer Center (MSKCC) cohort (n = 140). The LE gene panel stratifies patients into high-, moderate-, and low-risk groups for PC relapse in both cohorts. Additionally, the gene panel robustly predicts poor overall survival in clear cell renal cell carcinoma (ccRCC, p = 1.13 × 10⁻¹¹), consistent with ccRCC and PC both being urogenital cancers. Collectively, we report multiple CNTN1-related genes relevant to PC and their value as biomarkers for predicting PC relapse.
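Because hazard ratios are estimated on the log scale, a reported HR and its 95% CI can be cross-checked against the reported p-value: the standard error of ln HR is (ln upper - ln lower) / (2 × 1.96), and the Wald statistic is ln HR divided by that standard error. The minimal Python sketch below applies this to the two cohorts, assuming a symmetric Wald interval on the log scale; the abstract does not say which test produced its p-values (a log-rank or likelihood-ratio test would give somewhat different numbers), and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_from_ci(hr: float, lower: float, upper: float, level: float = 0.95) -> tuple[float, float, float]:
    """Recover (SE of ln HR, Wald z, two-sided p) from a hazard ratio and its CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = 2 * NormalDist().cdf(-abs(z))  # lower tail avoids 1 - cdf precision loss
    return se, z, p


for cohort, (hr, lo, hi) in {"TCGA": (2.72, 1.96, 3.77), "MSKCC": (2.72, 1.84, 4.01)}.items():
    se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{cohort}: SE={se:.3f} z={z:.2f} p={p:.1e}")
```

The Wald approximations come out at about 2 × 10⁻⁹ (TCGA) and 5 × 10⁻⁷ (MSKCC), the same order of magnitude as the reported 1.77 × 10⁻⁹ and 4.99 × 10⁻⁷, which suggests that the reported intervals and p-values are mutually consistent.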
24
Manechini JRV, Santos PHDS, Romanel E, Brito MDS, Scarpari MS, Jackson S, Pinto LR, Vicentini R. Transcriptomic Analysis of Changes in Gene Expression During Flowering Induction in Sugarcane Under Controlled Photoperiodic Conditions. FRONTIERS IN PLANT SCIENCE 2021; 12:635784. [PMID: 34211482 PMCID: PMC8239368 DOI: 10.3389/fpls.2021.635784] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/30/2020] [Accepted: 04/12/2021] [Indexed: 05/11/2023]
Abstract
Flowering is of utmost relevance to the agricultural productivity of the sugarcane bioeconomy, but data and knowledge on the genetic mechanisms underlying its photoperiodic induction are still scarce. An understanding of the molecular mechanisms that regulate the transition from vegetative to reproductive growth in sugarcane could enable better control of flowering for breeding. This study aimed to investigate the transcriptome of +1 mature leaves of a sugarcane cultivar subjected to florally inductive and non-inductive photoperiodic treatments in order to identify gene expression patterns and molecular regulatory modules. We identified 7,083 differentially expressed (DE) genes, of which 5,623 showed significant identity to other plant genes. Functional group analysis showed differential regulation of important metabolic pathways involved in plant development, such as those of plant hormones (i.e., cytokinin, gibberellin, and abscisic acid), light reactions, and photorespiration. Gene Ontology enrichment analysis revealed upregulation of processes and functions related to the response to abiotic stress, photoprotection, photosynthesis, light harvesting, and pigment biosynthesis, whereas important categories related to plant growth and vegetative development, such as plant organ morphogenesis, shoot system development, macromolecule metabolic process, and lignin biosynthesis, were downregulated. In addition, of 76 sugarcane transcripts considered putative orthologs of flowering genes from other plants (such as Arabidopsis thaliana, Oryza sativa, and Sorghum bicolor), 21 were DE. Nine DE genes related to flowering and response to photoperiod were analyzed in either mature or spindle leaves at two developmental stages, corresponding to the early stage of induction and to inflorescence primordia formation. Finally, we report a set of flowering-induced long non-coding RNAs and describe their level of conservation in other crops; many of them showed expression patterns correlated with those in the functionally grouped gene network.
Affiliation(s)
- João Ricardo Vieira Manechini
- Laboratório de Biologia de Sistemas, Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil
- Paulo Henrique da Silva Santos
- Departamento de Genética e Melhoramento de Plantas, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual de São Paulo (UNESP), Jaboticabal, Brazil
- Elisson Romanel
- Laboratório de Genômica de Plantas e Bioenergia (PGEMBL), Departamento de Biotecnologia, Escola de Engenharia de Lorena (EEL), Universidade de São Paulo (USP), Lorena, Brazil
- Michael dos Santos Brito
- Instituto de Ciência e Tecnologia, Universidade Federal de São Paulo (UNIFESP), São José dos Campos, Brazil
- Stephen Jackson
- School of Life Sciences, The University of Warwick, Coventry, United Kingdom
- Luciana Rossini Pinto
- Departamento de Genética e Melhoramento de Plantas, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual de São Paulo (UNESP), Jaboticabal, Brazil
- Centro de Cana, Instituto Agronômico de Campinas (IAC), Ribeirão Preto, Brazil
- Renato Vicentini
- Laboratório de Biologia de Sistemas, Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil
- *Correspondence: Renato Vicentini,
25
Maleki F, Ovens K, Hogan DJ, Kusalik AJ. Gene Set Analysis: Challenges, Opportunities, and Future Research. Front Genet 2020; 11:654. [PMID: 32695141 PMCID: PMC7339292 DOI: 10.3389/fgene.2020.00654] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 02/01/2020] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Gene set analysis methods are widely used to provide insight into high-throughput gene expression data, and many such methods are available. These methods rely on various assumptions and have different requirements, strengths, and weaknesses. In this paper, we classify gene set analysis methods based on their components, describe the underlying requirements and assumptions of each class, and provide directions for future research in developing and evaluating gene set analysis methods.