1
Farr JN, Saul D, Doolittle ML, Kaur J, Rowsey JL, Vos SJ, Froemming MN, Lagnado AB, Zhu Y, Weivoda M, Ikeno Y, Pignolo RJ, Niedernhofer LJ, Robbins PD, Jurk D, Passos JF, LeBrasseur NK, Tchkonia T, Kirkland JL, Monroe DG, Khosla S. Local senolysis in aged mice only partially replicates the benefits of systemic senolysis. J Clin Invest 2023; 133:e162519. [PMID: 36809340 PMCID: PMC10104901 DOI: 10.1172/jci162519] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 06/13/2022] [Accepted: 02/16/2023] [Indexed: 02/23/2023]
Abstract
Clearance of senescent cells (SnCs) can prevent several age-related pathologies, including bone loss. However, the local versus systemic roles of SnCs in mediating tissue dysfunction remain unclear. Thus, we developed a mouse model (p16-LOX-ATTAC) that allowed for inducible SnC elimination (senolysis) in a cell-specific manner and compared the effects of local versus systemic senolysis during aging using bone as a prototype tissue. Specific removal of Sn osteocytes prevented age-related bone loss at the spine, but not the femur, by improving bone formation without affecting osteoclasts or marrow adipocytes. By contrast, systemic senolysis prevented bone loss at the spine and femur and not only improved bone formation, but also reduced osteoclast and marrow adipocyte numbers. Transplantation of SnCs into the peritoneal cavity of young mice caused bone loss and also induced senescence in distant host osteocytes. Collectively, our findings provide proof-of-concept evidence that local senolysis has health benefits in the context of aging, but, importantly, that local senolysis only partially replicates the benefits of systemic senolysis. Furthermore, we establish that SnCs, through their senescence-associated secretory phenotype (SASP), lead to senescence in distant cells. Therefore, our study indicates that optimizing senolytic drugs may require systemic instead of local SnC targeting to extend healthy aging.
Affiliation(s)
- Joshua N. Farr
  - Robert and Arlene Kogod Center on Aging
  - Division of Endocrinology
  - Department of Physiology and Biomedical Engineering, and
- Dominik Saul
  - Robert and Arlene Kogod Center on Aging
  - Division of Endocrinology
- Japneet Kaur
  - Robert and Arlene Kogod Center on Aging
  - Division of Endocrinology
- Stephanie J. Vos
  - Robert and Arlene Kogod Center on Aging
  - Division of Endocrinology
- Anthony B. Lagnado
  - Robert and Arlene Kogod Center on Aging
  - Department of Physiology and Biomedical Engineering, and
- Yi Zhu
  - Robert and Arlene Kogod Center on Aging
  - Department of Physiology and Biomedical Engineering, and
- Megan Weivoda
  - Department of Hematology, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
- Yuji Ikeno
  - Department of Pathology and Laboratory Medicine, University of Texas Health Science Center, San Antonio, Texas, USA
- Robert J. Pignolo
  - Robert and Arlene Kogod Center on Aging
  - Department of Physiology and Biomedical Engineering, and
  - Department of Medicine, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
- Laura J. Niedernhofer
  - Institute on the Biology of Aging and Metabolism, Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Paul D. Robbins
  - Institute on the Biology of Aging and Metabolism, Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Diana Jurk
  - Robert and Arlene Kogod Center on Aging
  - Department of Physiology and Biomedical Engineering, and
- João F. Passos
  - Robert and Arlene Kogod Center on Aging
  - Department of Physiology and Biomedical Engineering, and
- Nathan K. LeBrasseur
  - Robert and Arlene Kogod Center on Aging
  - Department of Physiology and Biomedical Engineering, and
  - Department of Physical Medicine and Rehabilitation, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
- David G. Monroe
  - Robert and Arlene Kogod Center on Aging
  - Division of Endocrinology
- Sundeep Khosla
  - Robert and Arlene Kogod Center on Aging
  - Division of Endocrinology
  - Department of Physiology and Biomedical Engineering, and
2
Amini P, Hajihosseini M, Pyne S, Dinu I. Geographically weighted linear combination test for gene-set analysis of a continuous spatial phenotype as applied to intratumor heterogeneity. Front Cell Dev Biol 2023; 11:1065586. [PMID: 36998245 PMCID: PMC10044624 DOI: 10.3389/fcell.2023.1065586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Accepted: 02/22/2023] [Indexed: 03/11/2023]
Abstract
Background: The impact of gene-sets on a spatial phenotype is not necessarily uniform across different locations of cancer tissue. This study introduces a computational platform, GWLCT, for combining gene-set analysis with spatial data modeling to provide a new statistical test for location-specific association of phenotypes and molecular pathways in spatial single-cell RNA-seq data collected from an input tumor sample.
Methods: The main advantage of GWLCT is that it goes beyond global significance, allowing the association between the gene-set and the phenotype to vary across the tumor space. At each location, the most significant linear combination is found using a geographically weighted shrunken covariance matrix and a kernel function. Whether a fixed or an adaptive bandwidth is used is determined by a cross-validation procedure. Our proposed method is compared to the global version of the linear combination test (LCT) and to bulk and random-forest-based gene-set enrichment analyses, using data created by the Visium Spatial Gene Expression technique on an invasive breast cancer tissue sample as well as 144 different simulation scenarios.
Results: In an illustrative example, the new geographically weighted linear combination test, GWLCT, identifies the cancer hallmark gene-sets that are significantly associated at each location with the five spatially continuous phenotypic contexts in the tumors defined by different well-known markers of cancer-associated fibroblasts. Scan statistics revealed clustering in the number of significant gene-sets. A spatial heatmap of combined significance over all selected gene-sets is also produced. Extensive simulation studies demonstrate that our proposed approach outperforms other methods in the considered scenarios, especially when the spatial association increases.
Conclusion: Our proposed approach considers the spatial covariance of gene expression to detect the most significant gene-sets affecting a continuous phenotype. It reveals spatially detailed information in tissue space and can thus play a key role in understanding the contextual heterogeneity of cancer cells.
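The geographically weighted covariance step described in the Methods can be illustrated with a minimal Python sketch. This is not the GWLCT implementation: the Gaussian kernel, the fixed `bandwidth`, the shrinkage weight `shrink`, the function names, and the toy data are all assumptions chosen for illustration, and shrinking off-diagonal entries toward zero is only one possible shrinkage target.

```python
import math

def gaussian_weights(coords, focus, bandwidth):
    """Kernel weight of every spot relative to the focal location (assumed Gaussian kernel)."""
    return [math.exp(-0.5 * (math.dist(c, focus) / bandwidth) ** 2) for c in coords]

def weighted_shrunken_cov(X, weights, shrink=0.2):
    """Kernel-weighted covariance of the columns of X (spots x genes),
    with off-diagonal entries multiplied by (1 - shrink)."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise weights to sum to 1
    p = len(X[0])
    mean = [sum(wi * row[j] for wi, row in zip(w, X)) for j in range(p)]
    cov = [[sum(wi * (row[a] - mean[a]) * (row[b] - mean[b]) for wi, row in zip(w, X))
            for b in range(p)] for a in range(p)]
    return [[cov[a][b] if a == b else (1.0 - shrink) * cov[a][b] for b in range(p)]
            for a in range(p)]

# Hypothetical toy data: three spots on a line, two genes per spot.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
expr = [[1.0, 2.0], [2.0, 4.0], [100.0, -50.0]]

# Covariance seen from the first spot versus an equally weighted (global) covariance.
local_cov = weighted_shrunken_cov(expr, gaussian_weights(coords, (0.0, 0.0), bandwidth=1.0))
global_cov = weighted_shrunken_cov(expr, [1.0, 1.0, 1.0])
```

In this toy example the two genes are positively associated near the first spot but negatively associated once the distant third spot is given equal weight, which is the kind of location-specific behaviour a global test would average away.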
Affiliation(s)
- Payam Amini
  - Department of Biostatistics, School of Public Health, Iran University of Medical Sciences, Tehran, Iran
  - School of Medicine, Keele University, Keele, Staffordshire, United Kingdom
- Morteza Hajihosseini
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
  - Stanford Department of Urology, Center for Academic Medicine, Palo Alto, CA, United States
- Saumyadipta Pyne
  - Health Analytics Network, Pittsburgh, PA, United States
  - University of California, Santa Barbara, Santa Barbara, CA, United States
  - *Correspondence: Saumyadipta Pyne; Irina Dinu
- Irina Dinu
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
  - *Correspondence: Saumyadipta Pyne; Irina Dinu
3
Maghsoudi Z, Nguyen H, Tavakkoli A, Nguyen T. A comprehensive survey of the approaches for pathway analysis using multi-omics data integration. Brief Bioinform 2022; 23:6761962. [PMID: 36252928 PMCID: PMC9677478 DOI: 10.1093/bib/bbac435] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/28/2022] [Revised: 08/26/2022] [Accepted: 09/08/2022] [Indexed: 02/07/2023]
Abstract
Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to better interpretability of its results and its higher statistical power compared with the gene-level statistics. A plethora of pathway analysis methods that utilize multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views into the same problem, different approaches are employed in aggregating these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques and integration strategies. A comprehensive assessment of each method's practicality, and a thorough discussion of the strengths and drawbacks of each technique will be provided. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed for future development.
Affiliation(s)
- Zeynab Maghsoudi
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Ha Nguyen
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Alireza Tavakkoli
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Tin Nguyen
  - Corresponding author: Tin Nguyen, Department of Computer Science and Engineering, University of Nevada, Reno, NV, USA. Tel.: +1-775-784-6619
4
Doolittle ML, Saul D, Kaur J, Rowsey JL, Eckhardt B, Vos S, Grain S, Kroupova K, Ruan M, Weivoda M, Oursler MJ, Farr JN, Monroe DG, Khosla S. Skeletal Effects of Inducible ERα Deletion in Osteocytes in Adult Mice. J Bone Miner Res 2022; 37:1750-1760. [PMID: 35789113 PMCID: PMC9474695 DOI: 10.1002/jbmr.4644] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/04/2022] [Revised: 06/28/2022] [Accepted: 07/02/2022] [Indexed: 11/12/2022]
Abstract
Estrogen is known to regulate bone metabolism in both women and men, but substantial gaps remain in our knowledge of estrogen and estrogen receptor alpha (ERα) regulation of adult bone metabolism. Studies using global ERα-knockout mice were confounded by high circulating sex-steroid levels, and osteocyte/osteoblast-specific ERα deletion may be confounded by ERα effects on growth versus the adult skeleton. Thus, we developed mice expressing the tamoxifen-inducible CreERT2 in osteocytes using the 8-kilobase (kb) Dmp1 promoter (Dmp1CreERT2). These mice were crossed with ERαfl/fl mice to create ERαΔOcy mice, permitting inducible osteocyte-specific ERα deletion in adulthood. After intermittent tamoxifen treatment of adult 4-month-old mice for 1 month, female, but not male, ERαΔOcy mice exhibited reduced spine bone volume fraction (BV/TV; -20.1%, p = 0.004) accompanied by decreased trabecular bone formation rate (-18.9%, p = 0.0496) and serum P1NP levels (-38.9%, p = 0.014). Periosteal (+65.6%, p = 0.004) and endocortical (+64.1%, p = 0.003) expansion were higher in ERαΔOcy mice compared to control (Dmp1CreERT2) mice at the tibial diaphysis, reflecting the known effects of estrogen to inhibit periosteal apposition and promote endocortical formation. Increases in Sost (2.1-fold, p = 0.001) messenger RNA (mRNA) levels were observed in trabecular bone at the spine in ERαΔOcy mice, consistent with previous reports that estrogen deficiency is associated with increased circulating sclerostin as well as bone SOST mRNA levels in humans. Further, the biological consequences of increased Sost expression were reflected in significant overall downregulation in panels of osteoblast and Wnt target genes in osteocyte-enriched bones from ERαΔOcy mice. These findings thus establish that osteocytic ERα is critical for estrogen action in female, but not male, adult bone metabolism. Moreover, the reduction in bone formation accompanied by increased Sost, decreased osteoblast, and decreased Wnt target gene expression in ERαΔOcy mice provides a direct link in vivo between ERα and Wnt signaling. © 2022 American Society for Bone and Mineral Research (ASBMR).
Affiliation(s)
- Madison L. Doolittle
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Dominik Saul
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Japneet Kaur
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Jennifer L. Rowsey
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Brittany Eckhardt
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Stephanie Vos
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Sarah Grain
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Kveta Kroupova
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
  - University Hospital Hradec Kralove and the Faculty of Medicine in Hradec Kralove, Czech Republic
- Ming Ruan
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Megan Weivoda
  - Robert and Arlene Kogod Center on Aging and Division of Hematology, Mayo Clinic College of Medicine, Rochester, MN
- Merry Jo Oursler
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Joshua N. Farr
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- David G. Monroe
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
- Sundeep Khosla
  - Robert and Arlene Kogod Center on Aging and Division of Endocrinology, Mayo Clinic College of Medicine, Rochester, MN
5
Djordjilović V, Chiogna M. Searching for a source of difference in graphical models. J Multivariate Anal 2022. [DOI: 10.1016/j.jmva.2022.104973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
6
Xu S, Wang J, Li J, Wang Y, Wang Z. System dynamics research of non-adaptive evacuation psychology in toxic gas leakage emergencies of chemical park. J Loss Prev Process Ind 2021. [DOI: 10.1016/j.jlp.2021.104556] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/27/2022]
7
Abstract
Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on the identification of differentially expressed gene sets in a given phenotype.
Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly co-related pathways.
Methods: We apply co-inertia analysis to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods.
Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
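The CIA objective stated above, maximizing the squared covariance between projections of two gene sets, can be sketched in its simplest form. The following Python sketch is an illustration under assumptions, not the GSCoA code: it uses identity row and column weights, divides by n rather than n - 1, finds only the first pair of axes, and uses power iteration on the cross-covariance matrix with no safeguard for a zero cross-covariance. Function names and toy data are hypothetical.

```python
import math

def center_columns(M):
    """Subtract each column mean (samples in rows, genes in columns)."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[row[j] - means[j] for j in range(len(means))] for row in M]

def first_coinertia_axes(X, Y, iters=200):
    """Unit vectors u, v maximising cov(Xu, Yv)^2, via power iteration
    on the cross-covariance matrix C = Xc^T Yc / n."""
    Xc, Yc = center_columns(X), center_columns(Y)
    n, p, q = len(Xc), len(Xc[0]), len(Yc[0])
    C = [[sum(Xc[i][a] * Yc[i][b] for i in range(n)) / n for b in range(q)]
         for a in range(p)]
    v = [1.0 / math.sqrt(q)] * q
    cov = 0.0
    for _ in range(iters):
        u = [sum(C[a][b] * v[b] for b in range(q)) for a in range(p)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(C[a][b] * u[a] for a in range(p)) for b in range(q)]
        cov = math.sqrt(sum(x * x for x in v))  # covariance between the two projections
        v = [x / cov for x in v]
    return u, v, cov

# Hypothetical toy data: four samples measured on two gene sets of two genes each.
set_a = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
set_b = [[2.0, 1.0], [4.0, 1.0], [6.0, 1.0], [8.0, 1.0]]
u, v, cov = first_coinertia_axes(set_a, set_b)
```

Here only the first gene of each set varies, so both axes load entirely on that gene and the maximized covariance is 2.5 (squared covariance 6.25).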
Affiliation(s)
- Chen-An Tsai
  - Department of Agronomy, National Taiwan University, Taipei, Taiwan
- James J. Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR 72079, United States
8
Kaspi A, Ziemann M. mitch: multi-contrast pathway enrichment for multi-omics and single-cell profiling data. BMC Genomics 2020; 21:447. [PMID: 32600408 PMCID: PMC7325150 DOI: 10.1186/s12864-020-06856-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/17/2019] [Accepted: 06/19/2020] [Indexed: 11/13/2022]
Abstract
BACKGROUND Inference of biological pathway activity via gene set enrichment analysis is frequently used in the interpretation of clinical and other omics data. With the proliferation of new omics profiling approaches and ever-growing size of data sets generated, there is a lack of tools available to perform and visualise gene set enrichments in analyses involving multiple contrasts. RESULTS To address this, we developed mitch, an R package for multi-contrast gene set enrichment analysis. It uses a rank-MANOVA statistical approach to identify sets of genes that exhibit joint enrichment across multiple contrasts. Its unique visualisation features enable the exploration of enrichments in up to 20 contrasts. We demonstrate the utility of mitch with case studies spanning multi-contrast RNA expression profiling, integrative multi-omics, tool benchmarking and single-cell RNA sequencing. Using simulated data we show that mitch has similar accuracy to state of the art tools for single-contrast enrichment analysis, and superior accuracy in identifying multi-contrast enrichments. CONCLUSION mitch is a versatile tool for rapidly and accurately identifying and visualising gene set enrichments in multi-contrast omics data. Mitch is available from Bioconductor ( https://bioconductor.org/packages/mitch ).
Affiliation(s)
- Antony Kaspi
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC, 3052, Australia
  - Department of Medical Biology, University of Melbourne, 1G Royal Parade, Parkville, VIC, 3052, Australia
- Mark Ziemann
  - School of Life and Environmental Sciences, Deakin University, Geelong, Australia
9
Clavel J, Morlon H. Reliable Phylogenetic Regressions for Multivariate Comparative Data: Illustration with the MANOVA and Application to the Effect of Diet on Mandible Morphology in Phyllostomid Bats. Syst Biol 2020; 69:927-943. [DOI: 10.1093/sysbio/syaa010] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/11/2019] [Revised: 02/02/2020] [Accepted: 02/07/2020] [Indexed: 11/12/2022]
Abstract
Understanding what shapes species phenotypes over macroevolutionary timescales from comparative data often requires studying the relationship between phenotypes and putative explanatory factors or testing for differences in phenotypes across species groups. In phyllostomid bats, for example, is mandible morphology associated with diet preferences? Performing such analyses depends upon reliable phylogenetic regression techniques and associated tests (e.g., phylogenetic Generalized Least Squares, pGLS, and phylogenetic analyses of variance and covariance, pANOVA, pANCOVA). While these tools are well established for univariate data, their multivariate counterparts are lagging behind. This is particularly true for high-dimensional phenotypic data, such as morphometric data. Here, we implement much-needed likelihood-based multivariate pGLS, pMANOVA, and pMANCOVA, and use a recently developed penalized-likelihood framework to extend their application to the difficult case when the number of traits $p$ approaches or exceeds the number of species $n$. We then focus on the pMANOVA and use intensive simulations to assess the performance of the approach as $p$ increases, under various levels of phylogenetic signal and correlations between the traits, phylogenetic structure in the predictors, and under various types of phenotypic differences across species groups. We show that our approach outperforms available alternatives under all circumstances, with greater power to detect phenotypic differences across species groups when they exist, and a lower risk of improperly detecting nonexistent differences. Finally, we provide an empirical illustration of our pMANOVA on a geometric-morphometric data set describing mandible morphology in phyllostomid bats along with data on their diet preferences. Overall, our results show significant differences between ecological groups. Our approach, implemented in the R package mvMORPH and illustrated in a tutorial for end-users, provides efficient multivariate phylogenetic regression tools for understanding what shapes phenotypic differences across species. [Generalized least squares; high-dimensional data sets; multivariate phylogenetic comparative methods; penalized likelihood; phenomics; phyllostomid bats; phylogenetic MANOVA; phylogenetic regression.]
Affiliation(s)
- Julien Clavel
  - Institut de Biologie de l’École Normale Supérieure (IBENS), École Normale Supérieure, Paris Sciences et Lettres (PSL) Research University, CNRS UMR 8197, INSERM U1024, 46 rue d’Ulm, F-75005 Paris, France
  - Life Sciences Department, The Natural History Museum, Cromwell Road, London SW7 5BD, UK
  - Univ Lyon, Laboratoire d’Ecologie des Hydrosystèmes Naturels et Anthropisés, UMR CNRS 5023, Université Claude Bernard Lyon 1, ENTPE, Boulevard du 11 Novembre 1918, F-69622 Villeurbanne Cedex, France
- Hélène Morlon
  - Institut de Biologie de l’École Normale Supérieure (IBENS), École Normale Supérieure, Paris Sciences et Lettres (PSL) Research University, CNRS UMR 8197, INSERM U1024, 46 rue d’Ulm, F-75005 Paris, France
10
Combination of Ensembles of Regularized Regression Models with Resampling-Based Lasso Feature Selection in High Dimensional Data. Mathematics 2020. [DOI: 10.3390/math8010110] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/29/2023]
Abstract
In high-dimensional data, the performance of classifiers depends largely on the selection of important features. Most individual classifiers combined with existing feature selection (FS) methods do not perform well for highly correlated data. Obtaining important features using an FS method and selecting the best-performing classifier is a challenging task in high-throughput data. In this article, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and ensembles of regularized regression models (ERRM) capable of handling data with high correlation structures. The ERRM boosts the prediction accuracy with the top-ranked features obtained from RLFS. The RLFS utilizes the lasso penalty with the sure independence screening (SIS) condition to select the top k ranked features. The ERRM includes five individual penalty-based classifiers: LASSO, adaptive LASSO (ALASSO), elastic net (ENET), smoothly clipped absolute deviations (SCAD), and minimax concave penalty (MCP). It was built on the idea of bagging and rank aggregation. Through simulation studies and an application to smokers’ cancer gene expression data, we demonstrated that the proposed combination of ERRM with RLFS achieved superior performance in terms of accuracy and geometric mean.
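The resampling idea behind RLFS can be sketched as follows. This is a simplified illustration, not the authors' procedure: it fits a plain lasso regression by coordinate descent on bootstrap resamples and ranks features by how often they receive a nonzero coefficient. The SIS condition, the classification setting, and the ERRM ensemble are omitted, and the function names, the penalty `lam`, and the simulated data are assumptions.

```python
import random

def soft_threshold(value, lam):
    """Soft-thresholding operator used by the lasso update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_cd(X, y, lam, sweeps=100):
    """Lasso coefficients for (1/2n)||y - Xb||^2 + lam * ||b||_1 on centred data,
    fitted by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals for beta = 0
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            z = sum(r[j] ** 2 for r in Xc) / n
            if z == 0.0:
                continue  # constant feature in this resample
            # covariance of feature j with the partial residual
            rho = sum(r[j] * e for r, e in zip(Xc, resid)) / n + z * beta[j]
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [e - delta * r[j] for r, e in zip(Xc, resid)]
                beta[j] = new
    return beta

def selection_frequency(X, y, lam, n_resamples=50, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]
```

Features can then be ranked by their selection frequency, and the top k passed on to whichever classifiers are being ensembled.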
11
Zhang M, Zhou C, He Y, Liu B. Data-adaptive test for high-dimensional multivariate analysis of variance problem. Aust N Z J Stat 2018. [DOI: 10.1111/anzs.12246] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Mingjuan Zhang
  - School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai 201209, China
- Cheng Zhou
  - Department of Statistics, School of Management, Fudan University, Shanghai 200433, China
- Yong He
  - School of Statistics, Shandong University of Finance and Economics, Jinan 250014, Shandong, China
- Bin Liu
  - Department of Statistics, School of Management, Fudan University, Shanghai 200433, China
12
Baldascino E, Di Cristina G, Tedesco P, Hobbs C, Shaw TJ, Ponte G, Andrews PLR. The Gastric Ganglion of Octopus vulgaris: Preliminary Characterization of Gene- and Putative Neurochemical-Complexity, and the Effect of Aggregata octopiana Digestive Tract Infection on Gene Expression. Front Physiol 2017; 8:1001. [PMID: 29326594 PMCID: PMC5736919 DOI: 10.3389/fphys.2017.01001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/09/2017] [Accepted: 11/20/2017] [Indexed: 12/19/2022]
Abstract
The gastric ganglion is the largest visceral ganglion in cephalopods. It is connected to the brain and is implicated in regulation of digestive tract functions. Here we have investigated the neurochemical complexity (through in silico gene expression analysis and immunohistochemistry) of the gastric ganglion in Octopus vulgaris and tested whether the expression of a selected number of genes was influenced by the magnitude of digestive tract parasitic infection by Aggregata octopiana. Novel evidence was obtained for putative peptide and non-peptide neurotransmitters in the gastric ganglion: cephalotocin, corticotrophin releasing factor, FMRFamide, gamma amino butyric acid, 5-hydroxytryptamine, molluscan insulin-related peptide 3, peptide PRQFV-amide, and tachykinin-related peptide. Receptors for cholecystokininA and cholecystokininB, and orexin2 were also identified in this context for the first time. We report evidence for acetylcholine, dopamine, noradrenaline, octopamine, small cardioactive peptide related peptide, and receptors for cephalotocin and octopressin, confirming previous publications. The effects of Aggregata observed here extend those previously described by showing effects on the gastric ganglion; in animals with a higher level of infection, genes implicated in inflammation (NFκB, fascin, serpinB10 and the toll-like 3 receptor) increased their relative expression, but TNF-α gene expression was lower as was expression of other genes implicated in oxidative stress (i.e., superoxide dismutase, peroxiredoxin 6, and glutathione peroxidase). Elevated Aggregata levels in the octopuses corresponded to an increase in the expression of the cholecystokininA receptor and the small cardioactive peptide-related peptide. In contrast, we observed decreased relative expression of cephalotocin, dopamine β-hydroxylase, peptide PRQFV-amide, and tachykinin-related peptide genes. 
A discussion is provided on (i) potential roles of the various molecules in food intake regulation and digestive tract motility control and (ii) the difference in relative gene expression in the gastric ganglion in octopus with relatively high and low parasitic loads and the similarities to changes in the enteric innervation of mammals with digestive tract parasites. Our results provide additional data to the described neurochemical complexity of O. vulgaris gastric ganglion.
Affiliation(s)
- Elena Baldascino
  - Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Napoli, Italy
- Giulia Di Cristina
  - Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Napoli, Italy
- Perla Tedesco
  - Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Napoli, Italy
- Carl Hobbs
  - Wolfson Centre for Age-Related Diseases, King's College London, London, United Kingdom
- Tanya J. Shaw
  - Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom
- Giovanna Ponte
  - Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Napoli, Italy
  - Association for Cephalopod Research - CephRes, Napoli, Italy
- Paul L. R. Andrews
  - Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Napoli, Italy
  - Association for Cephalopod Research - CephRes, Napoli, Italy
13
Xia Y. Testing and support recovery of multiple high-dimensional covariance matrices with false discovery rate control. TEST-SPAIN 2017. [DOI: 10.1007/s11749-017-0533-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
14
Hu Z, Dong K, Dai W, Tong T. A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix. Int J Biostat 2017; 13. [PMID: 28953454 DOI: 10.1515/ijb-2017-0013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/07/2017] [Accepted: 08/16/2017] [Indexed: 11/15/2022]
Abstract
The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.
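Why the determinant is hard to estimate when the dimension is close to or exceeds the sample size can be seen in a small Python sketch. It is only an illustration under assumptions: the abstract does not name the eight estimators compared, so the shrink-toward-the-diagonal estimator, the weight `lam`, and the function names below are hypothetical stand-ins. The log-determinant is computed from a Cholesky factor, which fails when the matrix is not positive definite.

```python
import math

def sample_cov(X):
    """Unbiased sample covariance of the columns of X (samples in rows)."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in X) / (n - 1)
             for b in range(p)] for a in range(p)]

def shrink_to_diagonal(S, lam):
    """Multiply off-diagonal entries by (1 - lam); lam = 1 gives a diagonal matrix."""
    p = len(S)
    return [[S[a][b] if a == b else (1.0 - lam) * S[a][b] for b in range(p)]
            for a in range(p)]

def log_det_cholesky(S):
    """log det(S) = 2 * sum(log L_ii), where S = L L^T is the Cholesky factorisation."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(p))

# Hypothetical toy data: two samples, three variables, so the sample covariance is singular.
data = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
S = sample_cov(data)
S_shrunk = shrink_to_diagonal(S, lam=0.5)
```

With these data `log_det_cholesky(S)` raises `ValueError` because the sample covariance has rank one, whereas the shrunken matrix is positive definite with determinant 4.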
|
15
|
You N, Wang X. An empirical Bayes method for robust variance estimation in detecting DEGs using microarray data. J Bioinform Comput Biol 2017; 15:1750020. [PMID: 28893113 DOI: 10.1142/s0219720017500202] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Microarray technology is widely used to identify differentially expressed genes because of its high-throughput capability. The number of replicated microarray chips in each group is usually small. Borrowing information across different genes is an efficient way to improve parameter estimation, which suffers from the limited sample size. In this paper, we use a hierarchical model to describe the dispersion of gene expression profiles and model the variance through the gene expression level via a link function. A heuristic algorithm is proposed to estimate the hyper-parameters and link function. The differentially expressed genes are identified using a multiple testing procedure. Compared to SAM and LIMMA, our proposed method shows significantly greater detection power while the false discovery rate is controlled.
Affiliation(s)
- Na You
- 1 School of Mathematics, Southern China Center for Statistical Science, Sun Yat-sen University, Guangzhou 510275, P. R. China
| | - Xueqin Wang
- 1 School of Mathematics, Southern China Center for Statistical Science, Sun Yat-sen University, Guangzhou 510275, P. R. China
| |
|
16
|
Abstract
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data stays difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as common approach; which fundamentally implies genes or proteins as independent units. The independent assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed by the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. 
We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. Pathway analysis is a common approach to quickly access the pathways being regulated in the experiments. There are numerous statistics to perform pathway analysis; most of them assume that the genes or proteins are independent of each other for statistical ease. This assumption, however, is unrealistic to the real biological system and may cause false positives in practice. A standard way to address this issue is to measure the associations among genes or proteins. Unfortunately, the estimation of associations requires sufficient sample size, which is usually not available for proteomic data produced by mass spectrometry. In this study, we propose a T2-statistic, which estimates the associations among gene products, to perform pathway analysis for quantitative proteomic data. Instead of calculating the associations directly from data, we use the confidence scores retrieved from protein-protein interaction databases. We also design an integrating procedure to reserve pathways of sufficient evidence as a regulated pathway group. We compare the proposed T2-statistic to other popular statistics using five published experimental datasets, and the T2-statistic yields more accurate descriptions in agreement with the discussion of the original papers.
|
17
|
Zhuo B, Jiang D. MEACA: efficient gene-set interpretation of expression data using mixed models. [DOI: 10.1101/106781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
Abstract
Competitive gene-set analysis, or enrichment analysis, is widely used for functional interpretation of gene expression data. It tests a known category (e.g. pathway) of genes for enriched differential expression signals. Current methods do not properly capture inter-gene correlations and heterogeneity, resulting in mis-calibration and power loss. We propose MEACA, a new gene-set method based on mixed-effects models. MEACA flexibly incorporates unknown heterogeneity and correlations across genes, and does not need time-consuming permutations. Compared to existing methods, MEACA substantially improves type 1 error control and power in widely ranging scenarios. Real data applications demonstrate MEACA’s ability to recover biologically meaningful relationships.
|
18
|
Zhang L, Wang L, Tian P, Tian S. Identification of Genes Discriminating Multiple Sclerosis Patients from Controls by Adapting a Pathway Analysis Method. PLoS One 2016; 11:e0165543. [PMID: 27846233 PMCID: PMC5112852 DOI: 10.1371/journal.pone.0165543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2016] [Accepted: 09/13/2016] [Indexed: 11/18/2022] Open
Abstract
The focus of analyzing data from microarray experiments has shifted from the identification of associated individual genes to that of associated biological pathways or gene sets. In bioinformatics, a feature selection algorithm is usually used to cope with the high dimensionality of microarray data. In addition to those algorithms that use the biological information contained within a gene set a priori to facilitate the process of feature selection, various gene set analysis methods can be applied directly or modified readily for the purpose of feature selection. The significance analysis of microarray to gene-set reduction analysis (SAM-GSR) algorithm, a novel direction of gene set analysis, is one such method. Here, we explore the feature selection property of SAM-GSR and provide a modification to better achieve the goal of feature selection. In a multiple sclerosis (MS) microarray data application, both SAM-GSR and our modification of SAM-GSR perform well. Our results show that SAM-GSR can indeed carry out feature selection, and that modified SAM-GSR outperforms SAM-GSR. Given that pathway information is far from complete, a statistical method capable of constructing biologically meaningful gene networks is of interest. Consequently, both SAM-GSR algorithms will be continuously re-evaluated in our future work, and thus better characterized.
Affiliation(s)
- Lei Zhang
- College of Life Science, Jilin University, 2699 Qianjin Street, Changchun, Jilin, China, 130012
- Department of Neurology, The Second Hospital of Jilin University, 218 Ziqiang Street, Changchun, Jilin, China, 130041
| | - Linlin Wang
- College of Life Science, Jilin University, 2699 Qianjin Street, Changchun, Jilin, China, 130012
| | - Pu Tian
- College of Life Science, Jilin University, 2699 Qianjin Street, Changchun, Jilin, China, 130012
| | - Suyan Tian
- Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin, China, 130021
| |
|
19
|
Walker JA. Monte Carlo simulation of OLS and linear mixed model inference of phenotypic effects on gene expression. PeerJ 2016; 4:e2575. [PMID: 27761350 PMCID: PMC5068350 DOI: 10.7717/peerj.2575] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2016] [Accepted: 09/15/2016] [Indexed: 12/02/2022] Open
Abstract
Background Self-contained tests estimate and test the association between a phenotype and mean expression level in a gene set defined a priori. Many self-contained gene set analysis methods have been developed but the performance of these methods for phenotypes that are continuous rather than discrete and with multiple nuisance covariates has not been well studied. Here, I use Monte Carlo simulation to evaluate the performance of both novel and previously published (and readily available via R) methods for inferring effects of a continuous predictor on mean expression in the presence of nuisance covariates. The motivating data are a high-profile dataset which was used to show opposing effects of hedonic and eudaimonic well-being (or happiness) on the mean expression level of a set of genes that has been correlated with social adversity (the CTRA gene set). The original analysis of these data used a linear model (GLS) of fixed effects with correlated error to infer effects of Hedonia and Eudaimonia on mean CTRA expression. Methods The standardized effects of Hedonia and Eudaimonia on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS) linear models and generalized estimating equation (GEE) models. The OLS estimates were tested using O’Brien’s OLS test, Anderson’s permutation $r_F^2$-test, two permutation F-tests (including GlobalAncova), and a rotation z-test (Roast). The GEE estimates were tested using a Wald test with robust standard errors. The performance (Type I, II, S, and M errors) of all tests was investigated using a Monte Carlo simulation of data explicitly modeled on the re-analyzed dataset. Results GLS estimates are inconsistent between data sets, and, in each dataset, at least one coefficient is large and highly statistically significant. By contrast, effects estimated by OLS or GEE are very small, especially relative to the standard errors. Bootstrap and permutation GLS distributions suggest that the GLS results in downward biased standard errors and inflated coefficients. The Monte Carlo simulation of error rates shows highly inflated Type I error from the GLS test and slightly inflated Type I error from the GEE test. By contrast, Type I error for all OLS tests are at the nominal level. The permutation F-tests have ∼1.9X the power of the other OLS tests. This increased power comes at a cost of high sign error (∼10%) if tested on small effects. Discussion The apparently replicated pattern of well-being effects on gene expression is most parsimoniously explained as “correlated noise” due to the geometry of multiple regression. The GLS for fixed effects with correlated error, or any linear mixed model for estimating fixed effects in designs with many repeated measures or outcomes, should be used cautiously because of the inflated Type I and M error. By contrast, all OLS tests perform well, and the permutation F-tests have superior performance, including moderate power for very small effects.
Affiliation(s)
- Jeffrey A Walker
- Department of Biological Sciences, University of Southern Maine , Portland , ME , United States
| |
|
20
|
Hsueh HM, Tsai CA. Gene set analysis using sufficient dimension reduction. BMC Bioinformatics 2016; 17:74. [PMID: 26852017 PMCID: PMC4744442 DOI: 10.1186/s12859-016-0928-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2015] [Accepted: 02/01/2016] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND Gene set analysis (GSA) aims to evaluate the association between the expression of biological pathways, or a priori defined gene sets, and a particular phenotype. Numerous GSA methods have been proposed to assess the enrichment of sets of genes. However, most methods are developed with respect to a specific alternative scenario, such as a differential mean pattern or a differential coexpression. Moreover, a very limited number of methods can handle either binary, categorical, or continuous phenotypes. In this paper, we develop two novel GSA tests, called SDRs, based on the sufficient dimension reduction technique, which aims to capture sufficient information about the relationship between genes and the phenotype. The advantages of our proposed methods are that they allow for categorical and continuous phenotypes, and they are also able to identify a variety of enriched gene sets. RESULTS Through simulation studies, we compared the type I error and power of SDRs with existing GSA methods for binary, triple, and continuous phenotypes. We found that SDR methods adequately control the type I error rate at the pre-specified nominal level, and they have a satisfactory power to detect gene sets with differential coexpression and to test non-linear associations between gene sets and a continuous phenotype. In addition, the SDR methods were compared with seven widely-used GSA methods using two real microarray datasets for illustration. CONCLUSIONS We concluded that the SDR methods outperform the others because of their flexibility with regard to handling different kinds of phenotypes and their power to detect a wide range of alternative scenarios. Our real data analysis highlights the differences between GSA methods for detecting enriched gene sets.
Affiliation(s)
- Huey-Miin Hsueh
- Department of Statistics, National Chengchi University, Zhinan Road, Taipei, 116, Taiwan.
| | - Chen-An Tsai
- Department of Agronomy, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei, 106, Taiwan.
| |
|
21
|
Su YC, Gauderman WJ, Berhane K, Lewinger JP. Adaptive Set-Based Methods for Association Testing. Genet Epidemiol 2015; 40:113-22. [PMID: 26707371 DOI: 10.1002/gepi.21950] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2014] [Revised: 11/02/2015] [Accepted: 11/17/2015] [Indexed: 12/31/2022]
Abstract
With a typical sample size of a few thousand subjects, a single genome-wide association study (GWAS) using traditional one single nucleotide polymorphism (SNP)-at-a-time methods can only detect genetic variants conferring a sizable effect on disease risk. Set-based methods, which analyze sets of SNPs jointly, can detect variants with smaller effects acting within a gene, a pathway, or other biologically relevant sets. Although self-contained set-based methods (those that test sets of variants without regard to variants not in the set) are generally more powerful than competitive set-based approaches (those that rely on comparison of variants in the set of interest with variants not in the set), there is no consensus as to which self-contained methods are best. In particular, several self-contained set tests have been proposed to directly or indirectly "adapt" to the a priori unknown proportion and distribution of effects of the truly associated SNPs in the set, which is a major determinant of their power. A popular adaptive set-based test is the adaptive rank truncated product (ARTP), which seeks the set of SNPs that yields the best-combined evidence of association. We compared the standard ARTP, several ARTP variations we introduced, and other adaptive methods in a comprehensive simulation study to evaluate their performance. We used permutations to assess significance for all the methods and thus provide a level playing field for comparison. We found the standard ARTP test to have the highest power across our simulations followed closely by the global model of random effects (GMRE) and a least absolute shrinkage and selection operator (LASSO)-based test.
Affiliation(s)
- Yu-Chen Su
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - William James Gauderman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - Kiros Berhane
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - Juan Pablo Lewinger
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| |
|
22
|
Lee S, Lim J, Sohn I, Jung SH, Park CK. Two sample test for high-dimensional partially paired data. J Appl Stat 2015. [DOI: 10.1080/02664763.2015.1014890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
23
|
An adaptive test for the mean vector in large-p-small-n problems. Comput Stat Data Anal 2015. [DOI: 10.1016/j.csda.2015.03.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
24
|
Engel J, Blanchet L, Bloemen B, van den Heuvel LP, Engelke UHF, Wevers RA, Buydens LMC. Regularized MANOVA (rMANOVA) in untargeted metabolomics. Anal Chim Acta 2015; 899:1-12. [PMID: 26547490 DOI: 10.1016/j.aca.2015.06.042] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2015] [Revised: 06/09/2015] [Accepted: 06/11/2015] [Indexed: 12/14/2022]
Abstract
Many advanced metabolomics experiments currently lead to data where a large number of response variables were measured while one or several factors were changed. Often the number of response variables vastly exceeds the sample size and well-established techniques such as multivariate analysis of variance (MANOVA) cannot be used to analyze the data. ANOVA simultaneous component analysis (ASCA) is an alternative to MANOVA for analysis of metabolomics data from an experimental design. In this paper, we show that ASCA assumes that none of the metabolites are correlated and that they all have the same variance. Because of these assumptions, ASCA may relate the wrong variables to a factor. This reduces the power of the method and hampers interpretation. We propose an improved model that is essentially a weighted average of the ASCA and MANOVA models. The optimal weight is determined in a data-driven fashion. Compared to ASCA, this method assumes that variables can correlate, leading to a more realistic view of the data. Compared to MANOVA, the model is also applicable when the number of samples is (much) smaller than the number of variables. These advantages are demonstrated by means of simulated and real data examples. The source code of the method is available from the first author upon request, and at the following github repository: https://github.com/JasperE/regularized-MANOVA.
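The idea of a weighted average between the two models can be pictured as a convex combination of two covariance estimates. One is the full covariance matrix, as MANOVA uses. The other is a scaled identity matrix, which matches the ASCA assumption of uncorrelated variables with equal variance. The sketch below is an illustration under that reading of the abstract, not the authors' implementation. The function name, the weight `delta`, and the toy matrix are hypothetical, and the weight is set by hand here rather than chosen in a data-driven way.

```python
def shrink_covariance(S, delta):
    """Return (1 - delta) * S + delta * (mean variance) * I.

    delta = 0 keeps the full covariance (MANOVA-like);
    delta = 1 gives uncorrelated variables with a common variance (ASCA-like).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    p = len(S)
    mean_variance = sum(S[j][j] for j in range(p)) / p
    return [
        [
            (1.0 - delta) * S[j][k] + (delta * mean_variance if j == k else 0.0)
            for k in range(p)
        ]
        for j in range(p)
    ]


# Hypothetical 2 x 2 covariance matrix used only for illustration.
S = [[2.0, 1.0],
     [1.0, 4.0]]

half_way = shrink_covariance(S, 0.5)
asca_like = shrink_covariance(S, 1.0)
manova_like = shrink_covariance(S, 0.0)

print(half_way)
print(asca_like)
```

For any `delta` greater than zero the combined matrix is positive definite whenever the mean variance is positive, which is why an estimate of this form remains usable when there are fewer samples than variables.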
Affiliation(s)
- J Engel
- Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands; Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
| | - L Blanchet
- Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands; Department of Biochemistry, Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
| | - B Bloemen
- Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands
| | - L P van den Heuvel
- Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
| | - U H F Engelke
- Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
| | - R A Wevers
- Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
| | - L M C Buydens
- Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands.
| |
|
25
|
Affiliation(s)
- Insha Ullah
- Institute of Management Sciences; Kohat University of Science and Technology; Kohat 26000 Pakistan
| | - Beatrix Jones
- Institute of Natural & Mathematical Sciences; Massey University; Albany Campus, Private Bag 102904, North Shore Auckland 0745 New Zealand
| |
|
26
|
Djordjilović V, Chiogna M, Massa MS, Romualdi C. Graphical modeling for gene set analysis: A critical appraisal. Biom J 2015; 57:852-66. [PMID: 26149206 DOI: 10.1002/bimj.201300287] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2013] [Revised: 03/13/2015] [Accepted: 03/17/2015] [Indexed: 11/08/2022]
Abstract
Current demand for understanding the behavior of groups of related genes, combined with the greater availability of data, has led to an increased focus on statistical methods in gene set analysis. In this paper, we aim to perform a critical appraisal of the methodology based on graphical models developed in Massa et al. (2010) that uses pathway signaling networks as a starting point to develop statistically sound procedures for gene set analysis. We pay attention to the potential of the methodology with respect to the organizational aspects of dealing with such complex but highly informative starting structures, that is pathways. We focus on three themes: the translation of a biological pathway into a graph suitable for modeling, the role of shrinkage when more genes than samples are obtained, the evaluation of respondence of the statistical models to the biological expectations. To study the impact of shrinkage, two simulation studies will be run. To evaluate the biological expectation we will use data from a network with known behavior that offer the possibility of carrying out a realistic check of respondence of the model to changes in the experimental conditions.
Affiliation(s)
- Vera Djordjilović
- Department of Statistical Sciences, University of Padua, via Cesare Battisti 241, 35121 Padova, Italy
| | - Monica Chiogna
- Department of Statistical Sciences, University of Padua, via Cesare Battisti 241, 35121 Padova, Italy
| | - M Sofia Massa
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, United Kingdom
| | - Chiara Romualdi
- Department of Biology, University of Padua, Via Ugo Bassi 58/B, 35121 Padova, Italy
| |
|
27
|
Khodakarim S, Tabatabaei SM, AlaviMajd H. The multivariate nonparametric methods for identifying gene sets with differential expression. Gene 2014; 552:18-23. [PMID: 25194897 DOI: 10.1016/j.gene.2014.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2014] [Revised: 08/28/2014] [Accepted: 09/01/2014] [Indexed: 10/24/2022]
Abstract
BACKGROUND Gene Set Analysis (GSA) identifies differentially expressed gene sets across different phenotypes. The results of published papers in this field are inconsistent and there is no consensus on the best method. In this paper two new methods, in comparison to the previous ones, are introduced for GSA. METHODS The MMGSA and MRGSA methods based on multivariate nonparametric techniques were presented. The implementation of five GSA methods (Hotelling's T², Globaltest, Abs_Cat, Med_Cat and Rs_Cat) and the novel methods to detect differential gene expression between phenotypes were compared using simulated and real microarray data sets. RESULTS In a real dataset, the results showed that the powers of MMGSA and MRGSA were as good as those of Globaltest and Tsai. The MRGSA method did not perform well in the simulation dataset. CONCLUSIONS The Globaltest method is the best method in the real or simulation datasets. The performance of MMGSA in the simulation dataset is good for small gene sets. The GLS methods do not perform well in the simulated data, except the Med_Cat method for large gene sets.
Affiliation(s)
- Soheila Khodakarim
- Faculty of Public Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | | | - Hamid AlaviMajd
- Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
| |
|
28
|
Afsari B, Geman D, Fertig EJ. Learning dysregulated pathways in cancers from differential variability analysis. Cancer Inform 2014; 13:61-7. [PMID: 25392694 PMCID: PMC4218688 DOI: 10.4137/cin.s14066] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2014] [Revised: 08/13/2014] [Accepted: 08/14/2014] [Indexed: 12/16/2022] Open
Abstract
Analysis of gene sets can implicate activity in signaling pathways that is responsible for cancer initiation and progression, but is not discernible from the analysis of individual genes. Multiple methods and software packages have been developed to infer pathway activity from expression measurements for set of genes targeted by that pathway. Broadly, three major methodologies have been proposed: over-representation, enrichment, and differential variability. Both over-representation and enrichment analyses are effective techniques to infer differentially regulated pathways from gene sets with relatively consistent differentially expressed (DE) genes. Specifically, these algorithms aggregate statistics from each gene in the pathway. However, they overlook multivariate patterns related to gene interactions and variations in expression. Therefore, the analysis of differential variability of multigene expression patterns can be essential to pathway inference in cancers. The corresponding methodologies and software packages for such multivariate variability analysis of pathways are reviewed here. We also introduce a new, computationally efficient algorithm, expression variation analysis (EVA), which has been implemented along with a previously proposed algorithm, Differential Rank Conservation (DIRAC), in an open source R package, gene set regulation (GSReg). EVA inferred similar pathways as DIRAC at reduced computational costs. Moreover, EVA also inferred different dysregulated pathways than those identified by enrichment analysis.
Affiliation(s)
- Bahman Afsari
- Postdoctoral Fellow, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
| | - Donald Geman
- Professor, Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
| | - Elana J Fertig
- Assistant Professor, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
| |
|
29
|
Abstract
BACKGROUND The big data moniker is nowhere better deserved than in describing the ever-increasing size and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and high-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or for overcoming clustering difficulties in large biological and medical datasets. RESULTS In this study, three topic model-derived clustering methods (highest probable topic assignment, feature selection, and feature extraction) are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. CONCLUSION Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods yield clustering improvements for the three different data types. The resulting clusters represent true groupings and subgroupings in the data more faithfully than those of traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
|
30
|
|
31
|
Minas C, Montana G. Distance-based analysis of variance: Approximate inference. Stat Anal Data Min 2014. [DOI: 10.1002/sam.11227] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
|
32
|
MAVTgsa: an R package for gene set (enrichment) analysis. BioMed Research International 2014; 2014:346074. [PMID: 25101274 PMCID: PMC4101957 DOI: 10.1155/2014/346074] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/02/2014] [Revised: 05/27/2014] [Accepted: 06/04/2014] [Indexed: 11/18/2022]
Abstract
Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression with respect to either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identifying different types of gene set significance modules has not previously been developed. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions, up- and downregulation, when studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or that are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
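As a rough illustration of how FDR q-values can be derived from a list of gene set P values, the following Python sketch applies the Benjamini-Hochberg step-up adjustment. This is an assumption made for illustration: the abstract does not say which FDR procedure MAVTgsa uses, and the gene set names and P values below are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical gene set P values, for illustration only.
gene_set_p = {"set_A": 0.01, "set_B": 0.04, "set_C": 0.03, "set_D": 0.20}
q = dict(zip(gene_set_p, bh_qvalues(list(gene_set_p.values()))))
```

Gene sets whose q-value falls below a chosen threshold (for example 0.05) would then be reported as significant at that false discovery rate.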
|
33
|
Wang X, Pyne S, Dinu I. Gene set enrichment analysis for multiple continuous phenotypes. BMC Bioinformatics 2014; 15:260. [PMID: 25086605 PMCID: PMC4129103 DOI: 10.1186/1471-2105-15-260] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/21/2014] [Accepted: 07/25/2014] [Indexed: 12/02/2022] Open
Abstract
Background Gene set analysis (GSA) methods test the association of sets of genes with phenotypes in gene expression microarray studies. While GSA methods for a single binary or categorical phenotype abound, little attention has been paid to the case of a continuous phenotype, and there is no method to accommodate multiple correlated continuous phenotypes. Results We propose here an extension of the linear combination test (LCT) to multiple continuous phenotypes, incorporating correlations among the gene expressions of functionally related gene sets as well as correlations among multiple phenotypes. Further, we extend our new method to a nonlinear version, referred to as the nonlinear combination test (NLCT), to test potential nonlinear associations of gene sets with multiple phenotypes. A simulation study and a real microarray example demonstrate the practical aspects of the proposed methods. Conclusion The proposed approaches are effective in controlling type I errors and powerful in testing associations between gene sets and multiple continuous phenotypes. Both are computationally effective. Naively (univariately) analyzing a group of multiple correlated phenotypes could be dangerous. R-codes to perform LCT and NLCT for multiple continuous phenotypes are available at http://www.ualberta.ca/~yyasui/homepage.html. Electronic supplementary material The online version of this article (doi:10.1186/1471-2105-15-260) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaoming Wang
- School of Public Health, University of Alberta, Edmonton, AB T6G 1C9, Canada.
|
34
|
Martini P, Sales G, Calura E, Cagnin S, Chiogna M, Romualdi C. timeClip: pathway analysis for time course data without replicates. BMC Bioinformatics 2014; 15 Suppl 5:S3. [PMID: 25077979 PMCID: PMC4095003 DOI: 10.1186/1471-2105-15-s5-s3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023] Open
Abstract
Background Time-course gene expression experiments are useful tools for exploring biological processes. In this type of experiment, gene expression changes are monitored over time. Unfortunately, replication of time series is still costly, and long time courses usually do not have replicates. Many approaches have been proposed to deal with this data structure, but none of them in the field of pathway analysis. Pathway analyses have acquired great relevance in helping the interpretation of gene expression data. Several methods have been proposed to this aim, from classical enrichment to the more complex topological analysis that gains power from the topology of the pathway. None of them was devised to identify temporal variations in time course data. Results Here we present timeClip, a topology-based pathway analysis specifically tailored to long time series without replicates. timeClip combines dimension reduction techniques and graph decomposition theory to explore and identify the portion of pathways that is most time-dependent. In the first step, timeClip selects the time-dependent pathways; in the second step, the most time-dependent portions of these pathways are highlighted. We used timeClip on simulated data and on a benchmark dataset from a mouse muscle regeneration model. Our approach shows good performance in different simulated settings. On the real dataset, we identify 76 time-dependent pathways, most of which are known to be involved in the regeneration process. Focusing on the 'mTOR signaling pathway', we highlight the timing of key processes of muscle regeneration: from the early pathway activation through growth factor signals to the late burst of protein production needed for fiber regeneration. Conclusions timeClip represents an improvement in the field of time-dependent pathway analysis. It allows one to isolate and dissect pathways characterized by time-dependent components. Furthermore, using timeClip on a mouse muscle regeneration dataset, we were able to characterize the process of muscle fiber regeneration with its correct timing.
|
35
|
Soneson C, Fontes M. Incorporation of gene exchangeabilities improves the reproducibility of gene set rankings. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2012.07.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
36
|
Manjanatha MG, Bishop ME, Pearce MG, Kulkarni R, Lyn-Cook LE, Ding W. Genotoxicity of doxorubicin in F344 rats by combining the comet assay, flow-cytometric peripheral blood micronucleus test, and pathway-focused gene expression profiling. Environmental and Molecular Mutagenesis 2014; 55:24-34. [PMID: 24155181 DOI: 10.1002/em.21822] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/25/2013] [Revised: 09/18/2013] [Accepted: 09/18/2013] [Indexed: 06/02/2023]
Abstract
Doxorubicin (DOX) is an antineoplastic drug effective against many human malignancies. DOX's clinical efficacy is greatly limited by severe cardiotoxicity. To evaluate whether DOX is genotoxic in the heart, ~7-week-old male F344 rats were administered 1, 2, and 3 mg/kg bw DOX intravenously at 0, 24, 48, and 69 hr, and Comet assays in heart, liver, kidney, and testis and a micronucleus (MN) assay in peripheral blood (PB) erythrocytes using flow cytometry were conducted. Rats were euthanized at 72 hr; PB was removed for the MN assay, and single cells were isolated from multiple tissues for the Comet assays. None of the doses of DOX induced significant DNA damage in any of the tissues examined by the alkaline Comet assay. In contrast, the glycosylase enzyme-modified Comet assay showed a significant dose-dependent increase in oxidative DNA damage in the cardiac tissue (P ≤ 0.05). In the liver, only the top dose induced a significant increase in oxidative DNA damage (P ≤ 0.05). Histopathology showed no severe cardiotoxicity, but non-neoplastic lesions were present in both untreated and treated samples. Severe toxicity likely occurred in the bone marrow because no viable reticulocytes could be screened for the MN assay. Gene expression profiling of the heart tissues showed a significant alteration in the expression of 11 DNA damage and repair genes. These results suggest that DOX is genotoxic in the heart and that the DNA damage may be induced primarily via the production of reactive oxygen species.
Affiliation(s)
- Mugimane G Manjanatha
- Division of Genetic and Molecular Toxicology, US Food and Drug Administration, National Center for Toxicological Research, Jefferson, Arkansas
|
37
|
|
38
|
Lu TP, Chuang EY, Chen JJ. Identification of reproducible gene expression signatures in lung adenocarcinoma. BMC Bioinformatics 2013; 14:371. [PMID: 24369726 PMCID: PMC3877965 DOI: 10.1186/1471-2105-14-371] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/12/2013] [Accepted: 12/20/2013] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Lung cancer is the leading cause of cancer-related death worldwide. Tremendous research efforts have been devoted to improving treatment procedures, but the average five-year overall survival rates are still less than 20%. Many biomarkers have been identified for predicting survival; challenges arise, however, in translating the findings into clinical practice due to their inconsistency and irreproducibility. In this study, we proposed an approach by identifying predictive genes through pathways. RESULTS The microarrays from Shedden et al. were used as the training set, and the log-rank test was performed to select potential signature genes. We focused on 24 cancer-related pathways from 4 biological databases. A scoring scheme was developed by the Cox hazard regression model, and patients were divided into two groups based on the medians. Subsequently, their predictability and generalizability were evaluated by the 2-fold cross-validation and a resampling test in 4 independent datasets, respectively. A set of 16 genes related to apoptosis execution was demonstrated to have good predictability as well as generalizability in more than 700 lung adenocarcinoma patients and was reproducible in 4 independent datasets. This signature set was shown to have superior performances compared to 6 other published signatures. Furthermore, the corresponding risk scores derived from the set were found to associate with the efficacy of the anti-cancer drug ZD-6474 targeting EGFR. CONCLUSIONS In summary, we presented a new approach to identify reproducible survival predictors for lung adenocarcinoma, and the identified genes may serve as both prognostic and predictive biomarkers in the future.
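The scoring step described above (a risk score built from Cox regression coefficients, followed by a split at the median) can be sketched in Python roughly as follows. The coefficient values, gene names, and patient data are hypothetical placeholders rather than values from the study, and fitting the Cox model itself is outside the scope of this sketch.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical Cox coefficients and expression values, for illustration only.
coefficients = {"gene1": 0.8, "gene2": -0.5}
patients = {
    "p1": {"gene1": 2.0, "gene2": 1.0},
    "p2": {"gene1": 0.5, "gene2": 2.0},
    "p3": {"gene1": 1.5, "gene2": 0.0},
    "p4": {"gene1": 0.0, "gene2": 1.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Patients whose score equals the median are assigned to the low-risk group here; that tie rule is a choice made for the sketch, not something stated in the abstract.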
Affiliation(s)
- James J Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, Food and Drug Administration Jefferson, Little Rock, Arkansas, USA.
|
39
|
|
40
|
Assessment of gene set analysis methods based on microarray data. Gene 2013; 534:383-9. [PMID: 24012817 DOI: 10.1016/j.gene.2013.08.063] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/14/2013] [Revised: 07/23/2013] [Accepted: 08/20/2013] [Indexed: 11/21/2022]
Abstract
Gene set analysis (GSA) incorporates biological information into statistical knowledge to identify gene sets differentially expressed between two or more phenotypes. It allows us to gain insight into the functional working mechanisms of cells beyond the detection of differentially expressed gene sets. To evaluate the competence of GSA approaches, three self-contained GSA approaches with different statistical methods were chosen (Category, Globaltest, and Hotelling's T²), and their power to identify expression differences was assessed using simulated and real microarray data. Category does not account for the correlation structure, while the other two do. R and Bioconductor were used to perform these methods, which were applied to venous thromboembolism and acute lymphoblastic leukemia microarray data. The results of the three GSAs showed that the competence of these methods depends on the distribution of gene expression in a dataset. It is therefore very important to assess the distribution of gene expression data before choosing a GSA method to identify gene sets differentially expressed between phenotypes. In addition, assessment of common genes among significant gene sets indicated that there was significant agreement between the results of GSA and the findings of biologists.
|
41
|
Yemini E, Jucikas T, Grundy LJ, Brown AE, Schafer WR. A database of Caenorhabditis elegans behavioral phenotypes. Nat Methods 2013; 10:877-9. [PMID: 23852451 PMCID: PMC3962822 DOI: 10.1038/nmeth.2560] [Citation(s) in RCA: 188] [Impact Index Per Article: 17.1] [Received: 01/07/2013] [Accepted: 06/01/2013] [Indexed: 11/10/2022]
Abstract
Using low-cost automated tracking microscopes, we have generated a behavioral database for 305 Caenorhabditis elegans strains, including 76 mutants with no previously described phenotype. The growing database currently consists of 9,203 short videos segmented to extract behavior and morphology features, and these videos and feature data are available online for further analysis. The database also includes summary statistics for 702 measures with statistical comparisons to wild-type controls so that phenotypes can be identified and understood by users.
Affiliation(s)
- Eviatar Yemini
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, United Kingdom
- Tadas Jucikas
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, United Kingdom
- Laura J. Grundy
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, United Kingdom
- André E.X. Brown
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, United Kingdom
- William R. Schafer
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, United Kingdom
|
42
|
Abstract
A major consideration in multitrait analysis is which traits should be jointly analyzed. As a common strategy, multitrait analysis is performed either on pairs of traits or on all traits. To fully exploit the power of multitrait analysis, we propose variable selection to choose a subset of informative traits for multitrait quantitative trait locus (QTL) mapping. The proposed method is very useful for achieving optimal statistical power for QTL identification and for disclosing the most relevant traits. It is also a practical strategy for taking advantage of multitrait analysis when the number of traits under consideration is too large, making the usual multivariate analysis of all traits challenging. We study the impact of selection bias and the use of permutation tests in the context of variable selection and develop a powerful implementation procedure of variable selection for genome scanning. We demonstrate the proposed method and selection procedure in a backcross population, using both simulated and real data. The extension to other experimental mapping populations is straightforward.
|
43
|
Dinu I, Wang X, Kelemen LE, Vatanpour S, Pyne S. Linear combination test for gene set analysis of a continuous phenotype. BMC Bioinformatics 2013; 14:212. [PMID: 23815123 PMCID: PMC3717275 DOI: 10.1186/1471-2105-14-212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/05/2013] [Accepted: 06/13/2013] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Gene set analysis (GSA) methods test the association of sets of genes with a phenotype in gene expression microarray studies. Many GSA methods have been proposed, especially methods for use with a binary phenotype. Equally important, if not more so, is the ability to test the enrichment of a gene signature or pathway against the continuous phenotypes that are routinely observed in, for example, clinicopathological measurements. It is not always easy or meaningful to dichotomize continuous phenotypes into two classes, and attempting to do so may lead to the inaccurate classification of samples, which would affect the downstream enrichment analysis. In the present study, we have built on recent efforts to incorporate correlation structure within gene sets and pathways into the GSA test statistic. To address the issue of continuous phenotypes directly without the need for artificial discrete classification, and thus increase the power of the test while ensuring computational efficiency and rigor, new GSA methods that can incorporate a covariance matrix estimator for a continuous phenotype may present an effective approach. RESULTS We have designed a new method by extending the GSA approach called Linear Combination Test (LCT) from a binary to a continuous phenotype. Simulation studies and a real microarray dataset were used to compare the proposed LCT for a continuous phenotype, a modification of LCT (referred to as LCT2), and two publicly available GSA methods for continuous phenotypes. CONCLUSIONS We found that the LCT methods performed better than the other two GSA methods; however, this finding should be understood in the context of our specific simulation studies and the real microarray dataset that were used to compare the methods. Free R-codes to perform LCT for binary and continuous phenotypes are available at http://www.ualberta.ca/~yyasui/homepage.html. The R-code to perform LCT for a continuous phenotype is available as Additional file 1.
Affiliation(s)
- Irina Dinu
- School of Public Health, University of Alberta, Edmonton, Alberta T6G 1C9, Canada.
|
44
|
Soheila K, Hamid A, Farid Z, Mostafa RT, Nasrin DN, Syyed-Mohammad T, Vahide T. Comparison of univariate and multivariate gene set analysis in acute lymphoblastic leukemia. Asian Pac J Cancer Prev 2013; 14:1629-33. [PMID: 23679247 DOI: 10.7314/apjcp.2013.14.3.1629] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gene set analysis (GSA) incorporates biological with statistical knowledge to identify gene sets that are differentially expressed between two or more phenotypes. MATERIALS AND METHODS In this paper, gene sets differentially expressed between acute lymphoblastic leukaemia (ALL) with BCR-ABL and ALL with no observed cytogenetic abnormalities were determined by GSA methods. BCR-ABL is an abnormal gene found in some people with ALL. RESULTS The results of the two GSAs showed that the Category test identified 30 gene sets differentially expressed between the two phenotypes, while Hotelling's T2 discovered just 19 gene sets. In addition, assessment of common genes among significant gene sets showed that there was high agreement between the results of GSA and the findings of biologists. The performance of these methods was also compared on simulated and ALL data. CONCLUSIONS The results on simulated data indicated that, as the correlation between gene pairs increased, the multivariate (Hotelling's T2) test showed a decrease in the type I error rate and an increase in power, in contrast to the univariate (Category) test.
Affiliation(s)
- Khodakarim Soheila
- Department of Epidemiology, Faculty of Public Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
45
|
Daub JT, Hofer T, Cutivet E, Dupanloup I, Quintana-Murci L, Robinson-Rechavi M, Excoffier L. Evidence for polygenic adaptation to pathogens in the human genome. Mol Biol Evol 2013; 30:1544-58. [PMID: 23625889 DOI: 10.1093/molbev/mst080] [Citation(s) in RCA: 149] [Impact Index Per Article: 13.5] [Indexed: 12/20/2022] Open
Abstract
Most approaches aiming at finding genes involved in adaptive events have focused on the detection of outlier loci, which resulted in the discovery of individually "significant" genes with strong effects. However, a collection of small effect mutations could have a large effect on a given biological pathway that includes many genes, and such a polygenic mode of adaptation has not been systematically investigated in humans. We propose here to evidence polygenic selection by detecting signals of adaptation at the pathway or gene set level instead of analyzing single independent genes. Using a gene-set enrichment test to identify genome-wide signals of adaptation among human populations, we find that most pathways globally enriched for signals of positive selection are either directly or indirectly involved in immune response. We also find evidence for long-distance genotypic linkage disequilibrium, suggesting functional epistatic interactions between members of the same pathway. Our results show that past interactions with pathogens have elicited widespread and coordinated genomic responses, and suggest that adaptation to pathogens can be considered as a primary example of polygenic selection.
Affiliation(s)
- Josephine T Daub
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Berne, Berne, Switzerland.
|
46
|
Väremo L, Nielsen J, Nookaew I. Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Res 2013; 41:4378-91. [PMID: 23444143 PMCID: PMC3632109 DOI: 10.1093/nar/gkt111] [Citation(s) in RCA: 513] [Impact Index Per Article: 46.6] [Indexed: 12/27/2022] Open
Abstract
Gene set analysis (GSA) is used to elucidate genome-wide data, in particular transcriptome data. A multitude of methods have been proposed for this step of the analysis, and many of them have been compared and evaluated. Unfortunately, there is no consolidated opinion regarding which methods should be preferred, and the variety of available GSA software and implementations poses a difficulty for the end-user who wants to try out different methods. To address this, we have developed the R package Piano, which collects a range of GSA methods into the same system for the benefit of the end-user. Furthermore, we refine the GSA workflow by using modifications of the gene-level statistics. This enables us to divide the resulting gene set P-values into three classes, describing different aspects of gene expression directionality at the gene set level. We use our fully implemented workflow to investigate the impact of the individual components of GSA by using microarray and RNA-seq data. The results show that the evaluated methods are globally similar and that the major separation correlates well with our defined directionality classes. As a consequence, we suggest using a consensus scoring approach based on multiple GSA runs. In combination with the directionality classes, this constitutes a more thorough basis for an enriched biological interpretation.
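One simple way to picture a consensus score over multiple GSA runs is to rank the gene sets within each run and average the ranks. The Python sketch below does this under stated assumptions: mean-rank aggregation with average ranks for ties is chosen here for illustration and is not claimed to be the exact scheme implemented in Piano, and the method names and P values are hypothetical.

```python
def average_ranks(pvalues):
    """Rank gene sets by p-value (1 = most significant); ties share the mean rank."""
    ordered = sorted(pvalues, key=pvalues.get)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of gene sets tied with position i.
        while j + 1 < len(ordered) and pvalues[ordered[j + 1]] == pvalues[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            result[name] = shared
        i = j + 1
    return result


def consensus(runs):
    """Mean rank of each gene set across all runs (lower = more consistently significant)."""
    per_run = [average_ranks(run) for run in runs.values()]
    names = per_run[0].keys()
    return {n: sum(r[n] for r in per_run) / len(per_run) for n in names}


# Hypothetical gene set P values from three GSA methods, for illustration only.
runs = {
    "method_1": {"set_A": 0.001, "set_B": 0.20, "set_C": 0.03},
    "method_2": {"set_A": 0.010, "set_B": 0.04, "set_C": 0.04},
    "method_3": {"set_A": 0.020, "set_B": 0.50, "set_C": 0.01},
}
consensus_scores = consensus(runs)
```

The sketch assumes every run reports the same gene sets; a gene set missing from one run would raise a `KeyError`.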
Affiliation(s)
- Leif Väremo
- Department of Chemical and Biological Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
|
47
|
Brown AEX, Yemini EI, Grundy LJ, Jucikas T, Schafer WR. A dictionary of behavioral motifs reveals clusters of genes affecting Caenorhabditis elegans locomotion. Proc Natl Acad Sci U S A 2013; 110:791-6. [PMID: 23267063 PMCID: PMC3545781 DOI: 10.1073/pnas.1211447110] [Citation(s) in RCA: 130] [Impact Index Per Article: 11.8] [Indexed: 11/18/2022] Open
Abstract
Visible phenotypes based on locomotion and posture have played a critical role in understanding the molecular basis of behavior and development in Caenorhabditis elegans and other model organisms. However, it is not known whether these human-defined features capture the most important aspects of behavior for phenotypic comparison or whether they are sufficient to discover new behaviors. Here we show that four basic shapes, or eigenworms, previously described for wild-type worms, also capture mutant shapes, and that this representation can be used to build a dictionary of repetitive behavioral motifs in an unbiased way. By measuring the distance between each individual's behavior and the elements in the motif dictionary, we create a fingerprint that can be used to compare mutants to wild type and to each other. This analysis has revealed phenotypes not previously detected by real-time observation and has allowed clustering of mutants into related groups. Behavioral motifs provide a compact and intuitive representation of behavioral phenotypes.
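The fingerprint idea in this abstract (measuring how close an individual's behavior comes to each element of a motif dictionary) can be sketched in Python as below. This is a simplified illustration under assumptions: behavior is reduced to a one-dimensional time series, Euclidean distance over sliding windows is used, and the motifs and series are made-up toy values, whereas the study works with multi-dimensional eigenworm projections.

```python
import math


def window_distance(window, motif):
    """Euclidean distance between a window of the series and a motif of equal length."""
    return math.sqrt(sum((w - m) ** 2 for w, m in zip(window, motif)))


def fingerprint(series, motifs):
    """For each motif, the smallest distance to any same-length window of the series.

    Assumes the series is at least as long as every motif.
    """
    result = []
    for motif in motifs:
        k = len(motif)
        windows = (series[i:i + k] for i in range(len(series) - k + 1))
        result.append(min(window_distance(w, motif) for w in windows))
    return result


# Toy motif dictionary and behavior series, for illustration only.
motifs = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
wild_type = [0.0, 1.0, 0.0, 1.0, 0.0]
mutant = [1.0, 1.0, 1.0, 1.0, 2.0]
fp_wt = fingerprint(wild_type, motifs)
fp_mut = fingerprint(mutant, motifs)
```

Each fingerprint is a vector with one entry per motif, so two individuals or strains can be compared by comparing these vectors, which is the basis for the clustering the abstract mentions.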
Affiliation(s)
- André E. X. Brown
- Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Eviatar I. Yemini
- Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Laura J. Grundy
- Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Tadas Jucikas
- Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- William R. Schafer
- Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
|
48
|
Emmert-Streib F, Tripathi S, de Matos Simoes R. Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods. Biol Direct 2012; 7:44. [PMID: 23227854 PMCID: PMC3769148 DOI: 10.1186/1745-6150-7-44] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 07/30/2012] [Accepted: 10/01/2012] [Indexed: 12/22/2022] Open
Abstract
High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data and provide links to software implementations and tools and address also the general problem of multiple hypotheses testing. Further, we provide recommendations for the selection of such analysis methods.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Queen's University Belfast, Belfast, UK.
|
49
|
Random forests-based differential analysis of gene sets for gene expression data. Gene 2012; 518:179-86. [PMID: 23219997 DOI: 10.1016/j.gene.2012.11.034] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 09/14/2012] [Accepted: 11/27/2012] [Indexed: 01/14/2023]
Abstract
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and the classification of patients. In this study, we propose a method of gene set analysis in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses.
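The empirical p-value for a classifier's error rate can be illustrated with a label-permutation sketch in Python. Several simplifications are assumed here and should not be read as the authors' procedure: a leave-one-out nearest-class-mean rule on a single made-up score stands in for the random forest and its OOB error, label permutation stands in for the unspecified resampling method, and the data, permutation count, and seed are arbitrary.

```python
import random


def loo_error(values, labels):
    """Leave-one-out error of a nearest-class-mean rule (stand-in for the OOB error)."""
    errors = 0
    classes = sorted(set(labels))
    for i, (x, y) in enumerate(zip(values, labels)):
        means = {}
        for c in classes:
            # Class mean computed without the held-out sample i.
            rest = [v for j, (v, l) in enumerate(zip(values, labels)) if l == c and j != i]
            if rest:
                means[c] = sum(rest) / len(rest)
        predicted = min(means, key=lambda c: abs(x - means[c]))
        errors += predicted != y
    return errors / len(values)


def empirical_p_value(values, labels, n_perm=200, seed=0):
    """Share of label permutations whose error is as low as the observed error."""
    rng = random.Random(seed)
    observed = loo_error(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_error(values, shuffled) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up gene set summary scores for two phenotype groups, for illustration only.
set_scores = [0.1, 0.3, 0.2, 0.4, 2.1, 2.4, 1.9, 2.2]
phenotypes = ["ctrl"] * 4 + ["case"] * 4
observed_error, p_value = empirical_p_value(set_scores, phenotypes)
```

A small p-value indicates that an error rate as low as the observed one is rarely reached when the phenotype labels carry no information, which is the sense in which a gene set would be called discriminatory.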
|
50
|
Yu T, Bai Y. Analyzing LC/MS metabolic profiling data in the context of existing metabolic networks. ACTA ACUST UNITED AC 2012; 1:83-91. [PMID: 24010053 DOI: 10.2174/2213235x11301010084] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
Metabolic profiling is the unbiased detection and quantification of low molecular-weight metabolites in a living system. It is rapidly developing in biological and translational research, contributing to disease mechanism elucidation, environmental chemical surveillance, biomarker detection, and health outcome prediction. Recent developments in experimental and computational technology allow more and more known metabolites to be detected and quantified from complex samples. As the coverage of the metabolic network improves, it has become feasible to examine metabolic profiling data from a systems perspective, i.e., interpreting the data and performing statistical inference in the context of pathways and genome-scale metabolic networks. A number of methods have recently been developed in this area, but much improvement in algorithms and databases is still needed. In this review, we survey some methods for the analysis of metabolic profiling data based on metabolic networks.
Affiliation(s)
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
|