1
Gisby JS, Buang NB, Papadaki A, Clarke CL, Malik TH, Medjeral-Thomas N, Pinheiro D, Mortimer PM, Lewis S, Sandhu E, McAdoo SP, Prendecki MF, Willicombe M, Pickering MC, Botto M, Thomas DC, Peters JE. Multi-omics identify falling LRRC15 as a COVID-19 severity marker and persistent pro-thrombotic signals in convalescence. Nat Commun 2022; 13:7775. [PMID: 36522333 PMCID: PMC9753891 DOI: 10.1038/s41467-022-35454-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/29/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022] Open
Abstract
Patients with end-stage kidney disease (ESKD) are at high risk of severe COVID-19. Here, we perform longitudinal blood sampling of ESKD haemodialysis patients with COVID-19, collecting samples pre-infection, serially during infection, and after clinical recovery. Using plasma proteomics, and RNA-sequencing and flow cytometry of immune cells, we identify transcriptomic and proteomic signatures of COVID-19 severity, and find distinct temporal molecular profiles in patients with severe disease. Supervised learning reveals that the plasma proteome is a superior indicator of clinical severity than the PBMC transcriptome. We show that a decreasing trajectory of plasma LRRC15, a proposed co-receptor for SARS-CoV-2, is associated with a more severe clinical course. We observe that two months after the acute infection, patients still display dysregulated gene expression related to vascular, platelet and coagulation pathways, including PF4 (platelet factor 4), which may explain the prolonged thrombotic risk following COVID-19.
Affiliation(s)
- Jack S Gisby
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- Norzawani B Buang
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- Artemis Papadaki
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- Candice L Clarke
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
  - Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- Talat H Malik
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- Nicholas Medjeral-Thomas
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
  - Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- Damiola Pinheiro
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- Paige M Mortimer
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- Shanice Lewis
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- Eleanor Sandhu
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
  - Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- Stephen P McAdoo
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
  - Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- Maria F Prendecki
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
  - Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- Michelle Willicombe
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
  - Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- Matthew C Pickering
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- Marina Botto
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
- David C Thomas
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
  - Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- James E Peters
  - Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College London, London, UK
2
Reanalysis and integration of public microarray datasets reveals novel host genes modulated in leprosy. Mol Genet Genomics 2020; 295:1355-1368. [PMID: 32661593 DOI: 10.1007/s00438-020-01705-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/06/2020] [Accepted: 07/01/2020] [Indexed: 01/24/2023]
Abstract
Due to multiple hypothesis testing with often limited sample size, microarrays and other-omics technologies can sometimes produce irreproducible findings. Complementary to better experimental design, reanalysis and integration of gene expression datasets may help overcome reproducibility issues by identifying consistent differentially expressed genes from independent studies. In this work, after a systematic search, nine microarray datasets evaluating host gene expression in leprosy were reanalyzed and the information was integrated to strengthen evidence of differential expression for several genes. Our results are relevant in prioritizing genes and pathways for further investigation, whether in functional studies or in biomarker discovery. Reanalysis of individual datasets revealed several differentially expressed genes (DEGs) in accordance with original reports. Then, five integration methods (P value and effect size based) were tested. In the end, random-effects model and ratio association were selected as the main methods to pinpoint DEGs. Overall, classic pathways were found corroborating previous findings and validating this approach. Also, we identified some novel DEG involved especially with skin development processes (AQP3, AKR1C3, CYP27B1, LTB, VDR) and keratinocyte biology (CSTA, DSG1, KRT14, KRT5, PKP1, IVL), both still poorly understood in leprosy context. In addition, here we provide aggregated evidence towards some gene candidates that should be prioritized in further leprosy research, as they are likely important in immunopathogenesis. Altogether, these data are useful in better understanding host responses to the disease and, at the same time, provide a list of potential host biomarkers that could be useful in complementing leprosy diagnosis based on transcriptional levels.
3
Replicable and Coupled Changes in Innate and Adaptive Immune Gene Expression in Two Case-Control Studies of Blood Microarrays in Major Depressive Disorder. Biol Psychiatry 2018; 83:70-80. [PMID: 28688579 PMCID: PMC5720346 DOI: 10.1016/j.biopsych.2017.01.021] [Citation(s) in RCA: 140] [Impact Index Per Article: 23.3] [Received: 06/22/2016] [Revised: 01/08/2017] [Accepted: 01/12/2017] [Indexed: 12/22/2022]
Abstract
BACKGROUND Peripheral inflammation is often associated with major depressive disorder (MDD), and immunological biomarkers of depression remain a focus of investigation. METHODS We used microarray data on whole blood from two independent case-control studies of MDD: the GlaxoSmithKline-High-Throughput Disease-specific target Identification Program [GSK-HiTDiP] study (113 patients and 57 healthy control subjects) and the Janssen-Brain Resource Company study (94 patients and 100 control subjects). Genome-wide differential gene expression analysis (18,863 probes) resulted in a p value for each gene in each study. A Bayesian method identified the largest p-value threshold (q = .025) associated with twice the number of genes differentially expressed in both studies compared with the number of coincidental case-control differences expected by chance. RESULTS A total of 165 genes were differentially expressed in both studies with concordant direction of fold change. The 90 genes overexpressed (or UP genes) in MDD were significantly enriched for immune response to infection, were concentrated in a module of the gene coexpression network associated with innate immunity, and included clusters of genes with correlated expression in monocytes, monocyte-derived dendritic cells, and neutrophils. In contrast, the 75 genes underexpressed (or DOWN genes) in MDD were associated with the adaptive immune response and included clusters of genes with correlated expression in T cells, natural killer cells, and erythroblasts. Consistently, the MDD patients with overexpression of UP genes also had underexpression of DOWN genes (correlation > .70 in both studies). CONCLUSIONS MDD was replicably associated with proinflammatory activation of the peripheral innate immune system, coupled with relative inactivation of the adaptive immune system, indicating the potential of transcriptional biomarkers for immunological stratification of patients with depression.
4
Li X, Wang X, Xiao G. A comparative study of rank aggregation methods for partial and top ranked lists in genomic applications. Brief Bioinform 2017; 20:178-189. [PMID: 28968705 PMCID: PMC6357556 DOI: 10.1093/bib/bbx101] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/13/2017] [Indexed: 02/05/2023] Open
Abstract
Rank aggregation (RA), the process of combining multiple ranked lists into a single ranking, has played an important role in integrating information from individual genomic studies that address the same biological question. In previous research, attention has been focused on aggregating full lists. However, partial and/or top ranked lists are prevalent because of the great heterogeneity of genomic studies and limited resources for follow-up investigation. To be able to handle such lists, some ad hoc adjustments have been suggested in the past, but how RA methods perform on them (after the adjustments) has never been fully evaluated. In this article, a systematic framework is proposed to define different situations that may occur based on the nature of individually ranked lists. A comprehensive simulation study is conducted to examine the performance characteristics of a collection of existing RA methods that are suitable for genomic applications under various settings simulated to mimic practical situations. A non-small cell lung cancer data example is provided for further comparison. Based on our numerical results, general guidelines about which methods perform the best/worst, and under what conditions, are provided. Also, we discuss key factors that substantially affect the performance of the different methods.
Affiliation(s)
- Xue Li
  - Department of Statistical Science at Southern Methodist University, Dallas, TX
- Xinlei Wang
  - Department of Statistical Science at Southern Methodist University, Dallas, TX. Corresponding author: Xinlei Wang, Department of Statistical Science, Southern Methodist University, 3225 Daniel Avenue, P O Box 750332, Dallas, Texas 75275, USA. Tel: 214-768-2459; Fax: (214) 768-4035
- Guanghua Xiao
  - Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX
5
Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, Cairns J, Wingett SW, Várnai C, Thiecke MJ, Burden F, Farrow S, Cutler AJ, Rehnström K, Downes K, Grassi L, Kostadima M, Freire-Pritchett P, Wang F, Stunnenberg HG, Todd JA, Zerbino DR, Stegle O, Ouwehand WH, Frontini M, Wallace C, Spivakov M, Fraser P. Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters. Cell 2016; 167:1369-1384.e19. [PMID: 27863249 PMCID: PMC5123897 DOI: 10.1016/j.cell.2016.09.037] [Citation(s) in RCA: 648] [Impact Index Per Article: 81.0] [Received: 06/03/2016] [Revised: 09/06/2016] [Accepted: 09/22/2016] [Indexed: 12/20/2022]
Abstract
Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.
Affiliation(s)
- Biola M Javierre
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Oliver S Burren
  - JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK
- Steven P Wilder
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Roman Kreuzhuber
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, UK; Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
- Steven M Hill
  - MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
- Sven Sewitz
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Jonathan Cairns
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Steven W Wingett
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Csilla Várnai
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Michiel J Thiecke
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Frances Burden
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
- Samantha Farrow
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
- Antony J Cutler
  - JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK
- Karola Rehnström
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
- Kate Downes
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
- Luigi Grassi
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
- Myrto Kostadima
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, UK; Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
- Paula Freire-Pritchett
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Fan Wang
  - MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
- Hendrik G Stunnenberg
  - Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University Nijmegen, Geert Grooteplein Zuid 30, 6525 GA Nijmegen, the Netherlands
- John A Todd
  - JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK
- Daniel R Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Oliver Stegle
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Willem H Ouwehand
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QQ, UK; Department of Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK
- Mattia Frontini
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK; British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QQ, UK
- Chris Wallace
  - JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK; MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK; Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 0SP, UK
- Mikhail Spivakov
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Peter Fraser
  - Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
6
Arima S, Liseo B, Mariani F, Tardella L. Exploiting blank spots for model-based background correction in discovering genes with DNA array data. Stat Model 2011. [DOI: 10.1177/1471082x1001100201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022]
Abstract
Motivated by a real data set deriving from a study on the genetic determinants of the behavior of Mycobacterium tuberculosis (MTB) hosted in macrophage, we take advantage of the presence of control spots and illustrate modelling issues for background correction and the ensuing empirical findings resulting from a Bayesian hierarchical approach to the problem of detecting differentially expressed genes. We prove the usefulness of a fully integrated approach where background correction and normalization are embedded in a single model-based framework, creating a new tailored model to account for the peculiar features of DNA array data where null expressions are planned by design. We also advocate the use of an alternative normalization device resulting from a suitable reparameterization. The new model is validated by using both simulated and our MTB data. This work suggests that the presence of a substantial fraction of exact null expressions might be the effect of an imperfect background calibration and shows how this can be suitably re-calibrated with the information coming from control spots. The proposed idea can be extended to all experiments in which a subset of genes whose expression levels can be ascribed mainly to background noise is planned by design.
Affiliation(s)
- Serena Arima
  - Dipartimento di metodi e modelli per l’economia, il territorio e la finanza, Sapienza Università di Roma, via del Castro Laurenziano 9, Roma, 00161, Italy
- Brunero Liseo
  - Dipartimento di metodi e modelli per l’economia, il territorio e la finanza, Sapienza Università di Roma, Italy
- Luca Tardella
  - Dipartimento di Statistica, Sapienza Università di Roma, Italy
7
Blangiardo M, Cassese A, Richardson S. sdef: an R package to synthesize lists of significant features in related experiments. BMC Bioinformatics 2010; 11:270. [PMID: 20487547 PMCID: PMC3239329 DOI: 10.1186/1471-2105-11-270] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/05/2010] [Accepted: 05/20/2010] [Indexed: 11/16/2022] Open
Abstract
Background In microarray studies researchers are often interested in the comparison of relevant quantities between two or more similar experiments, involving different treatments, tissues, or species. Typically each experiment reports measures of significance (e.g. p-values) or other measures that rank its features (e.g genes). Our objective is to find a list of features that are significant in all experiments, to be further investigated. In this paper we present an R package called sdef, that allows the user to quantify the evidence of communality between the experiments using previously proposed statistical methods based on the ranked lists of p-values. sdef implements two approaches that address this objective: the first is a permutation test of the maximal ratio of observed to expected common features under the hypothesis of independence between the experiments. The second approach, set in a Bayesian framework, is more flexible as it takes into account the uncertainty on the number of genes differentially expressed in each experiment. Results We used sdef to re-analyze publicly available data i) on Type 2 diabetes susceptibility in mice on liver and skeletal muscle (two experiments); ii) on molecular similarities between mammalian sexes (three experiments). For the first example, we found between 68 and 104 genes commonly perturbed between the two tissues, using the two methods described above, and enrichment of the inflammation pathways, which are related to obesity and diabetes. For the second example, looking at three lists of features, we found 110 genes commonly perturbed between the three tissues, using the same two methods, and enrichment on genes involved in cell development. Conclusions sdef is an R package that provides researchers with an easy and powerful methodology to find lists of features commonly perturbed in two or more experiments to be further investigated. 
The package is provided with plots and tables to help the user visualize and interpret the results. The Windows, Linux and MacOS versions of the package, together with the documentation are available on the website http://cran.r-project.org/web/packages/sdef/index.html.
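The first approach in this abstract, a permutation test on the maximal ratio of observed to expected common features, can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the sdef R package or its API: the function names, the grid of p-value thresholds, and the choice to shuffle the second p-value list to simulate independence are all hypothetical choices made for this example.

```python
import random


def ratio_observed_expected(p1, p2, q):
    """Ratio of observed to expected common features at p-value threshold q.

    p1 and p2 are p-values for the same features in two experiments.
    Under independence, the expected overlap is o1 * o2 / n.
    """
    n = len(p1)
    in1 = [p <= q for p in p1]
    in2 = [p <= q for p in p2]
    o1, o2 = sum(in1), sum(in2)
    if o1 == 0 or o2 == 0:
        return 0.0  # no features selected in one list, so no overlap to assess
    common = sum(a and b for a, b in zip(in1, in2))
    return common / (o1 * o2 / n)


def max_ratio(p1, p2, thresholds):
    """Largest observed/expected ratio over a grid of thresholds."""
    return max(ratio_observed_expected(p1, p2, q) for q in thresholds)


def permutation_pvalue(p1, p2, thresholds, n_perm=1000, seed=0):
    """Permutation p-value for the maximal ratio (illustrative scheme).

    Shuffling p2 breaks any association between the two experiments while
    keeping each list's marginal distribution of p-values.
    """
    rng = random.Random(seed)
    observed = max_ratio(p1, p2, thresholds)
    shuffled = list(p2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_ratio(p1, shuffled, thresholds) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With eight features of which the same two pass a 0.05 threshold in both experiments, the expected overlap under independence is 2 × 2 / 8 = 0.5, so the ratio is 4.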
Affiliation(s)
- Marta Blangiardo
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK
8
Rouam S, Moreau T, Broët P. Identifying common prognostic factors in genomic cancer studies: a novel index for censored outcomes. BMC Bioinformatics 2010; 11:150. [PMID: 20334636 PMCID: PMC2863163 DOI: 10.1186/1471-2105-11-150] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 07/21/2009] [Accepted: 03/24/2010] [Indexed: 01/12/2023] Open
Abstract
Background With the growing number of public repositories for high-throughput genomic data, it is of great interest to combine the results produced by independent research groups. Such a combination allows the identification of common genomic factors across multiple cancer types and provides new insights into the disease process. In the framework of the proportional hazards model, classical procedures, which consist of ranking genes according to the estimated hazard ratio or the p-value obtained from a test statistic of no association between survival and gene expression level, are not suitable for gene selection across multiple genomic datasets with different sample sizes. We propose a novel index for identifying genes with a common effect across heterogeneous genomic studies designed to remain stable whatever the sample size and which has a straightforward interpretation in terms of the percentage of separability between patients according to their survival times and gene expression measurements. Results The simulations results show that the proposed index is not substantially affected by the sample size of the study and the censoring. They also show that its separability performance is higher than indices of predictive accuracy relying on the likelihood function. A simulated example illustrates the good operating characteristics of our index. In addition, we demonstrate that it is linked to the score statistic and possesses a biologically relevant interpretation. The practical use of the index is illustrated for identifying genes with common effects across eight independent genomic cancer studies of different sample sizes. The meta-selection allows the identification of four genes (ESPL1, KIF4A, HJURP, LRIG1) that are biologically relevant to the carcinogenesis process and have a prognostic impact on survival outcome across various solid tumors. 
Conclusion The proposed index is a promising tool for identifying factors having a prognostic impact across a collection of heterogeneous genomic datasets of various sizes.
Affiliation(s)
- Sigrid Rouam
  - Computational and Mathematical Biology, Genome Institute of Singapore, Singapore, Singapore
9
Jeffries CD, Ward WO, Perkins DO, Wright FA. Discovering collectively informative descriptors from high-throughput experiments. BMC Bioinformatics 2009; 10:431. [PMID: 20021653 PMCID: PMC2813853 DOI: 10.1186/1471-2105-10-431] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/09/2009] [Accepted: 12/18/2009] [Indexed: 01/07/2023] Open
Abstract
Background Improvements in high-throughput technology and its increasing use have led to the generation of many highly complex datasets that often address similar biological questions. Combining information from these studies can increase the reliability and generalizability of results and also yield new insights that guide future research. Results This paper describes a novel algorithm called BLANKET for symmetric analysis of two experiments that assess informativeness of descriptors. The experiments are required to be related only in that their descriptor sets intersect substantially and their definitions of case and control are consistent. From resulting lists of n descriptors ranked by informativeness, BLANKET determines shortlists of descriptors from each experiment, generally of different lengths p and q. For any pair of shortlists, four numbers are evident: the number of descriptors appearing in both shortlists, in exactly one shortlist, or in neither shortlist. From the associated contingency table, BLANKET computes Right Fisher Exact Test (RFET) values used as scores over a plane of possible pairs of shortlist lengths [1,2]. BLANKET then chooses a pair or pairs with RFET score less than a threshold; the threshold depends upon n and shortlist length limits and represents a quality of intersection achieved by less than 5% of random lists. Conclusions Researchers seek within a universe of descriptors some minimal subset that collectively and efficiently predicts experimental outcomes. Ideally, any smaller subset should be insufficient for reliable prediction and any larger subset should have little additional accuracy. As a method, BLANKET is easy to conceptualize and presents only moderate computational complexity. Many existing databases could be mined using BLANKET to suggest optimal sets of predictive descriptors.
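The scoring step described in this abstract, a right-tailed Fisher exact test over a plane of shortlist lengths (p, q), can be illustrated as follows. This is a minimal sketch and not the authors' BLANKET implementation: the function names are hypothetical, both rankings are assumed to cover the same n descriptors, and the threshold calibration against random lists is left out.

```python
from math import comb


def right_fisher_exact(n, p, q, k):
    """Right-tailed Fisher exact test value, P(overlap >= k).

    Two shortlists of lengths p and q are drawn from n descriptors; under
    random ranking, their overlap follows a hypergeometric distribution.
    """
    upper = min(p, q)
    tail = sum(comb(p, i) * comb(n - p, q - i) for i in range(k, upper + 1))
    return tail / comb(n, q)


def rfet_plane(rank1, rank2, max_len):
    """Score every pair of shortlist lengths (p, q) up to max_len.

    rank1 and rank2 list the same descriptors, most informative first.
    Returns a dict mapping (p, q) to the RFET value of the shortlist overlap.
    """
    n = len(rank1)
    scores = {}
    for p in range(1, max_len + 1):
        top1 = set(rank1[:p])
        for q in range(1, max_len + 1):
            k = len(top1 & set(rank2[:q]))
            scores[(p, q)] = right_fisher_exact(n, p, q, k)
    return scores
```

Small RFET values mark pairs of shortlists that overlap more than random rankings would. In a fuller treatment, the cut-off would be derived from n and the shortlist length limits, as the abstract describes.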
Affiliation(s)
- Clark D Jeffries
  - Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC, USA
10
Bernard-Pierrot I, Gruel N, Stransky N, Vincent-Salomon A, Reyal F, Raynal V, Vallot C, Pierron G, Radvanyi F, Delattre O. Characterization of the recurrent 8p11-12 amplicon identifies PPAPDC1B, a phosphatase protein, as a new therapeutic target in breast cancer. Cancer Res 2008; 68:7165-75. [PMID: 18757432 DOI: 10.1158/0008-5472.can-08-1360] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.4] [Indexed: 11/16/2022]
Abstract
The 8p11-12 chromosome region is one of the regions most frequently amplified in breast carcinoma (10-15% of cases). Several genes within this region have been identified as candidate oncogenes, as they are both amplified and overexpressed. However, very few studies have explored the role of these genes in cell transformation, with the aim of identifying valuable therapeutic targets. An analysis of comparative genomic hybridization array and expression profiling data for a series of 152 ductal breast carcinomas and 21 cell lines identified five genes (LSM1, BAG4, DDHD2, PPAPDC1B, and WHSC1L1) within the amplified region as consistently overexpressed due to an increased gene copy number. The use of small interfering RNA to knock down the expression of each of these genes showed the major role played by two genes, PPAPDC1B and WHSC1L1, in regulating the survival and transformation of two different cell lines harboring the 8p amplicon. The role of these two genes in cell survival and cell transformation was also confirmed by long-term knockdown expression studies using short hairpin RNAs. The potential of PPAPDC1B, which encodes a transmembrane phosphatase, as a therapeutic target was further shown by the strong inhibition of growth of breast tumor xenografts displaying 8p11-12 amplification induced by the silencing of PPAPDC1B. The oncogenic properties of PPAPDC1B were further shown by its ability to transform NIH-3T3 fibroblasts, inducing their anchorage-independent growth. Finally, microarray experiments on PPAPDC1B knockdown indicated that this gene interfered with multiple cell signaling pathways, including the Janus-activated kinase-signal transducer and activator of transcription, mitogen-activated protein kinase, and protein kinase C pathways. PPAPDC1B may also potentiate the estrogen receptor pathway by down-regulating DUSP22.
11
Comparing the characteristics of gene expression profiles derived by univariate and multivariate classification methods. Stat Appl Genet Mol Biol 2008; 7:Article7. [PMID: 18312212 DOI: 10.2202/1544-6115.1307] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
One application of gene expression arrays is to derive molecular profiles, i.e., sets of genes, which discriminate well between two classes of samples, for example between tumour types. Users are confronted with a multitude of classification methods of varying complexity that can be applied to this task. To help decide which method to use in a given situation, we compare important characteristics of a range of classification methods, including simple univariate filtering, penalised likelihood methods and the random forest. Classification accuracy is an important characteristic, but the biological interpretability of molecular profiles is also important. This implies both parsimony and stability, in the sense that profiles should not vary much when there are slight changes in the training data. We perform a random resampling study to compare these characteristics between the methods and across a range of profile sizes. We measure stability by adopting the Jaccard index to assess the similarity of resampled molecular profiles. We carry out a case study on five well-established cancer microarray data sets, for two of which we have the benefit of being able to validate the results in an independent data set. The study shows that those methods which produce parsimonious profiles generally result in better prediction accuracy than methods which don't include variable selection. For very small profile sizes, the sparse penalised likelihood methods tend to result in more stable profiles than univariate filtering while maintaining similar predictive performance.
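The stability measure described in this abstract can be illustrated with a minimal Python sketch. It assumes each resampled molecular profile is available as a set of gene identifiers. The abstract does not say how the pairwise Jaccard values are aggregated, so averaging over all pairs is an assumption made here, and the names `jaccard` and `mean_pairwise_jaccard` as well as the toy gene labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty profiles are treated as identical (a convention chosen here).
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(profiles):
    """Average Jaccard index over all pairs of resampled profiles.

    Averaging over pairs is an assumption; the abstract only states that
    the Jaccard index is used to assess profile similarity.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy profiles selected on three hypothetical resamples of the training data.
profiles = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3"},
]
stability = mean_pairwise_jaccard(profiles)
print(round(stability, 3))
```

A value near 1 indicates that nearly the same genes are selected on every resample, while a value near 0 indicates that the profile changes substantially with small changes in the training data.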
12
Nielsen HB, Mundy J, Willenbrock H. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes. PLoS One 2007; 2:e676. [PMID: 17668056 PMCID: PMC1924877 DOI: 10.1371/journal.pone.0000676] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 03/12/2007] [Accepted: 06/21/2007] [Indexed: 01/07/2023] Open
Abstract
The systematic comparison of transcriptional responses of organisms is a powerful tool in functional genomics. For example, mutants may be characterized by comparing their transcript profiles to those obtained in other experiments querying the effects on gene expression of many experimental factors including treatments, mutations and pathogen infections. Similarly, drugs may be discovered by the relationship between the transcript profiles effectuated or impacted by a candidate drug and by the target disease. The integration of such data enables systems biology to predict the interplay between experimental factors affecting a biological system. Unfortunately, direct comparisons of gene expression profiles obtained in independent, publicly available microarray experiments are typically compromised by substantial, experiment-specific biases. Here we suggest a novel yet conceptually simple approach for deriving 'Functional Association(s) by Response Overlap' (FARO) between microarray gene expression studies. The transcriptional response is defined by the set of differentially expressed genes independent from the magnitude or direction of the change. This approach overcomes the limited comparability between studies that is typical for methods that rely on correlation in gene expression. We apply FARO to a compendium of 242 diverse Arabidopsis microarray experimental factors, including phyto-hormones, stresses and pathogens, growth conditions/stages, tissue types and mutants. We also use FARO to confirm and further delineate the functions of Arabidopsis MAP kinase 4 in disease and stress responses. Furthermore, we find that a large, well-defined set of genes responds in opposing directions to different stress conditions and predict the effects of different stress combinations. This demonstrates the usefulness of our approach for exploiting public microarray data to derive biologically meaningful associations between experimental factors. 
Finally, our results indicate that FARO is more powerful in associating mutants in common pathways than existing methods such as co-expression analysis.
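The response-overlap idea in this abstract can be sketched in Python as follows. Each experimental factor is reduced to the set of genes it changes, ignoring magnitude and direction, and two factors are compared by the size of their shared set. The abstract does not state how overlap significance is scored, so the hypergeometric tail probability used here is an assumption, and the function name `overlap_pvalue`, the gene labels, and the universe size are hypothetical.

```python
from math import comb


def overlap_pvalue(response_a, response_b, n_genes):
    """Probability of an overlap at least as large as observed.

    Assumes a hypergeometric null: response_b is a random draw of
    len(response_b) genes from a universe of n_genes, of which
    len(response_a) belong to response_a. This scoring choice is an
    assumption, not taken from the abstract.
    """
    a, b = len(response_a), len(response_b)
    if a > n_genes or b > n_genes:
        raise ValueError("response sets cannot exceed the gene universe")
    k = len(set(response_a) & set(response_b))
    total = comb(n_genes, b)
    tail = sum(
        comb(a, i) * comb(n_genes - a, b - i)
        for i in range(k, min(a, b) + 1)
    )
    return tail / total


# Differentially expressed genes for two hypothetical experimental factors,
# recorded without the sign or size of the change.
drought = {"g1", "g2", "g3"}
mutant = {"g2", "g3", "g4"}

p = overlap_pvalue(drought, mutant, n_genes=10)
print(len(drought & mutant), round(p, 4))
```

Because only set membership is compared, two studies with different platforms or scaling can still be matched, which is the property the abstract contrasts with correlation-based comparison.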
Affiliation(s)
- Henrik Bjørn Nielsen
- Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark, Kongens Lyngby, Denmark.