201
Amar D, Hait T, Izraeli S, Shamir R. Integrated analysis of numerous heterogeneous gene expression profiles for detecting robust disease-specific biomarkers and proposing drug targets. Nucleic Acids Res 2015; 43:7779-89. [PMID: 26261215 PMCID: PMC4652780 DOI: 10.1093/nar/gkv810] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/22/2015] [Revised: 07/23/2015] [Accepted: 07/29/2015] [Indexed: 12/18/2022]
Abstract
Genome-wide expression profiling has revolutionized biomedical research; vast amounts of expression data from numerous studies of many diseases are now available. Making the best use of this resource in order to better understand disease processes and treatment remains an open challenge. In particular, disease biomarkers detected in case-control studies suffer from low reliability and are only weakly reproducible. Here, we present a systematic integrative analysis methodology to overcome these shortcomings. We assembled and manually curated more than 14,000 expression profiles spanning 48 diseases and 18 expression platforms. We show that when studying a particular disease, judicious utilization of profiles from other diseases and information on disease hierarchy improves classification quality, avoids overoptimistic evaluation of that quality, and enhances disease-specific biomarker discovery. This approach yielded specific biomarkers for 24 of the analyzed diseases. We demonstrate how to combine these biomarkers with large-scale interaction, mutation and drug target data, forming a highly valuable disease summary that suggests novel directions in disease understanding and drug repurposing. Our analysis also estimates the number of samples required to reach a desired level of biomarker stability. This methodology can greatly improve the exploitation of the mountain of expression profiles for better disease analysis.
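The abstract mentions estimating how many samples are needed to reach a desired level of biomarker stability. The sketch below shows one common way to quantify such stability and is not the authors' procedure: it assumes stability is the mean pairwise Jaccard overlap of top-k gene lists selected from repeated, class-balanced random subsamples, with genes ranked by the absolute difference in class means. The function names and the ranking score are hypothetical choices made for illustration.

```python
import itertools
import random
from statistics import mean


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| of two gene sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def top_k_genes(expr, labels, idx, k):
    """Rank genes (rows of expr) by |mean(cases) - mean(controls)| over the samples in idx.

    labels: 1 for case, 0 for control. Ties are broken by gene index. Returns k gene indices.
    """
    cases = [i for i in idx if labels[i] == 1]
    controls = [i for i in idx if labels[i] == 0]
    scores = []
    for g, row in enumerate(expr):
        diff = mean(row[i] for i in cases) - mean(row[i] for i in controls)
        scores.append((-abs(diff), g))  # negative so that sorting puts the largest |diff| first
    return [g for _, g in sorted(scores)[:k]]


def stability(expr, labels, n_per_class, k, reps=20, seed=0):
    """Mean pairwise Jaccard overlap of top-k lists from `reps` balanced subsamples.

    Each subsample draws n_per_class cases and n_per_class controls without replacement,
    so both classes are always present.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    lists = []
    for _ in range(reps):
        idx = rng.sample(cases, n_per_class) + rng.sample(controls, n_per_class)
        lists.append(top_k_genes(expr, labels, idx, k))
    return mean(jaccard(a, b) for a, b in itertools.combinations(lists, 2))
```

Repeating `stability` for increasing `n_per_class` and taking the smallest value whose stability exceeds a chosen target (the target itself is an arbitrary choice) gives a rough sample-size estimate under these assumptions.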
Affiliation(s)
- David Amar
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Tom Hait
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Shai Izraeli
- Department of Pediatric Hematology-Oncology, Safra Children's Hospital, Sheba Medical Center, Tel Hashomer, Ramat Gan 52620, Israel; Sackler School of Medicine, Tel-Aviv University, Tel Aviv 69978, Israel
- Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
202
Parallel molecular routes to cold adaptation in eight genera of New Zealand stick insects. Sci Rep 2015; 5:13965. [PMID: 26355841 PMCID: PMC4564816 DOI: 10.1038/srep13965] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 03/17/2015] [Accepted: 08/12/2015] [Indexed: 12/20/2022]
Abstract
The acquisition of physiological strategies to tolerate novel thermal conditions allows organisms to exploit new environments. As a result, thermal tolerance is a key determinant of the global distribution of biodiversity, yet the constraints on its evolution are not well understood. Here we investigate parallel evolution of cold tolerance in New Zealand stick insects, an endemic radiation containing three montane-occurring species. Using a phylogeny constructed from 274 orthologous genes, we show that stick insects have independently colonized montane environments at least twice. We compare supercooling point and survival of internal ice formation among ten species from eight genera, and identify both freeze tolerance and freeze avoidance in separate montane lineages. Freeze tolerance is also verified in both lowland and montane populations of a single, geographically widespread, species. Transcriptome sequencing following cold shock identifies a set of structural cuticular genes that are both differentially regulated and under positive sequence selection in each species. However, while cuticular proteins in general are associated with cold shock across the phylogeny, the specific genes at play differ among species. Thus, while processes related to cuticular structure are consistently associated with adaptation for cold, this may not be the consequence of shared ancestral genetic constraints.
203
Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery. MICROARRAYS 2015; 4:389-406. [PMID: 27600230 PMCID: PMC4996376 DOI: 10.3390/microarrays4030389] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 07/06/2015] [Revised: 08/16/2015] [Accepted: 08/17/2015] [Indexed: 01/24/2023]
Abstract
The diagnostic and prognostic potential of the vast quantity of publicly available microarray data has driven the development of methods for integrating data from different microarray platforms. Cross-platform integration, when appropriately implemented, has been shown to improve the reproducibility and robustness of gene signature biomarkers. Microarray platform integration can be conceptually divided into approaches that perform early-stage integration (cross-platform normalization) and those that perform late-stage data integration (meta-analysis). A growing number of statistical methods and associated software for platform integration are available to the user; however, an understanding of their comparative performance and potential pitfalls is critical for their best implementation. In this review we provide evidence-based, practical guidance to researchers performing cross-platform integration, particularly with the objective of discovering biomarkers.
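To make the early-versus-late distinction concrete, here is a minimal sketch of the simplest form of early-stage integration: standardizing each gene to zero mean and unit variance within each platform before pooling samples. Published cross-platform normalization methods (for example ComBat, XPN or DWD) are more sophisticated; this per-platform z-scoring is only a baseline, and the nested-dictionary input format and function name are assumptions for illustration.

```python
from statistics import mean, pstdev


def standardize_per_platform(batches):
    """Early-stage integration baseline (sketch).

    batches: {platform name: {gene: [expression value per sample]}}.
    Each gene is z-scored within each platform, then samples are concatenated
    platform by platform (in sorted platform order). Only genes measured on
    every platform are kept; a constant gene on a platform becomes all zeros.
    """
    shared = set.intersection(*(set(genes) for genes in batches.values()))
    merged = {g: [] for g in sorted(shared)}
    for platform in sorted(batches):
        for g in merged:
            vals = batches[platform][g]
            mu, sd = mean(vals), pstdev(vals)
            merged[g].extend((v - mu) / sd if sd > 0 else 0.0 for v in vals)
    return merged
```

One caveat: if the proportion of cases differs between platforms, per-platform centering also removes part of the real biological signal, which is one reason more careful methods exist.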
204
K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data. BIOMED RESEARCH INTERNATIONAL 2015; 2015:918954. [PMID: 26339652 PMCID: PMC4538770 DOI: 10.1155/2015/918954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/05/2014] [Accepted: 12/18/2014] [Indexed: 01/23/2023]
Abstract
With modern technologies such as microarrays, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A first step towards elucidating hidden patterns and understanding such massive data is the application of clustering techniques. Nonlinear relations, which, in contrast to linear correlations, have mostly gone unexploited, are prevalent in high-throughput data. In many cases, nonlinear relations can model biological relationships more precisely and reflect critical patterns in biological systems. Using the general dependency measure Distance Based on Conditional Ordered List (DCOL), which we introduced previously, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not affect the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm but also performed significantly better than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results.
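The DCOL measure underlying K-profiles can be sketched in a few lines. The version below follows our reading of DCOL as described in the authors' earlier work (order the samples by one variable, then sum the absolute differences between consecutive values of the other), so treat the exact definition as an assumption. The permutation test is a generic illustration, not the paper's built-in testing procedure, and both function names are hypothetical.

```python
import random


def dcol(x, y):
    """DCOL-style dissimilarity (sketch): order the samples by x, then sum the absolute
    differences between consecutive y values. A small value means y changes smoothly
    as x increases, whether the relation is linear or not."""
    ys = [yi for _, yi in sorted(zip(x, y), key=lambda pair: pair[0])]
    return sum(abs(b - a) for a, b in zip(ys, ys[1:]))


def dcol_pvalue(x, y, n_perm=999, seed=0):
    """Permutation P-value: fraction of random shufflings of y whose DCOL is at least
    as small as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = dcol(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if dcol(x, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because DCOL only compares neighbours in the x ordering, it can pick up a U-shaped relation such as y = x² over a symmetric range of x, where the Pearson correlation is close to zero.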
205
Martinez-Ledesma E, Verhaak RGW, Treviño V. Identification of a multi-cancer gene expression biomarker for cancer clinical outcomes using a network-based algorithm. Sci Rep 2015. [PMID: 26202601 PMCID: PMC5378879 DOI: 10.1038/srep11966] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 12/12/2022]
Abstract
Cancer types are commonly classified by histopathology and more recently through molecular characteristics such as gene expression, mutations, copy number variations, and epigenetic alterations. These molecular characterizations have led to the proposal of prognostic biomarkers for many cancer types. Nevertheless, most of these biomarkers have been proposed for a specific cancer type or even specific subtypes. Although more challenging, it is useful to identify biomarkers that can be applied for multiple types of cancer. Here, we have used a network-based exploration approach to identify a multi-cancer gene expression biomarker highly connected by ESR1, PRKACA, LRP1, JUN and SMAD2 that can be predictive of clinical outcome in 12 types of cancer from The Cancer Genome Atlas (TCGA) repository. The gene signature of this biomarker is highly supported by cancer literature, biological terms, and prognostic power in other cancer types. Additionally, the signature does not seem to be highly associated with specific mutations or copy number alterations. Comparisons with cancer-type specific and other multi-cancer biomarkers in TCGA and other datasets showed that the performance of the proposed multi-cancer biomarker is superior, making the proposed approach and multi-cancer biomarker potentially useful in research and clinical settings.
Affiliation(s)
- Emmanuel Martinez-Ledesma
- [1] Grupo de Enfoque e Investigación en Bioinformática, Departamento de Investigación e Innovación, Escuela Nacional de Medicina, Tecnológico de Monterrey, Monterrey, Nuevo León 64849, México; [2] Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Roeland G W Verhaak
- [1] Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA; [2] Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Victor Treviño
- Grupo de Enfoque e Investigación en Bioinformática, Departamento de Investigación e Innovación, Escuela Nacional de Medicina, Tecnológico de Monterrey, Monterrey, Nuevo León 64849, México
206
Wang X, Ning Y, Guo X. Integrative meta-analysis of differentially expressed genes in osteoarthritis using microarray technology. Mol Med Rep 2015; 12:3439-3445. [PMID: 25975828 PMCID: PMC4526045 DOI: 10.3892/mmr.2015.3790] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 08/17/2014] [Accepted: 04/22/2015] [Indexed: 01/15/2023]
Abstract
The aim of the present study was to identify differentially expressed (DE) genes in patients with osteoarthritis (OA), and biological processes associated with changes in gene expression that occur in this disease. Using the INMEX (integrative meta-analysis of expression data) software tool, a meta-analysis of publicly available microarray Gene Expression Omnibus (GEO) datasets of OA was performed. Gene Ontology (GO) enrichment analysis was performed in order to detect enriched functional attributes based on gene-associated GO terms. Three GEO datasets, containing 137 patients with OA and 52 healthy controls, were included in the meta-analysis. The analysis identified 85 genes that were consistently differentially expressed in OA (30 genes were upregulated and 55 genes were downregulated). The upregulated gene with the lowest P-value (P=5.36E-07) was S-phase kinase-associated protein 2, E3 ubiquitin protein ligase (SKP2). The downregulated gene with the lowest P-value (P=4.42E-09) was Proline rich 5 like (PRR5L). Among the 210 GO terms that were associated with the set of DE genes, the two most significant enrichments were observed in the GO categories 'Immune response', with a P-value of 0.000129438, and 'Immune effector process', with a P-value of 0.000288619. The current meta-analysis identified genes that were consistently DE in OA, in addition to biological pathways associated with changes in gene expression that occur during OA, which may provide insight into the molecular mechanisms underlying the pathogenesis of this disease.
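INMEX supports more than one way of combining evidence across datasets (for example, combining P-values or combining effect sizes), and the abstract does not say which produced the reported P-values. As one illustration only, Fisher's method for combining independent per-dataset P-values for a single gene can be written without any statistics library, because the chi-square distribution it uses always has an even number of degrees of freedom. The function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent P-values (all in (0, 1]) for one gene with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For even degrees of freedom
    the upper-tail probability has the closed form
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    so no statistics library is required.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, k):
        term *= half_x / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half_x) * total
```

Fisher's method ignores the direction of change, so a gene that is up in one dataset and down in another can still reach a small combined P-value; a separate sign-consistency check and a multiple-testing correction across all genes (for example Benjamini-Hochberg) would still be needed.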
Affiliation(s)
- Xi Wang
- School of Public Health, Xi'an Jiaotong University Health Science Center, Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, Xi'an, Shaanxi 710061, P.R. China
- Yujie Ning
- School of Public Health, Xi'an Jiaotong University Health Science Center, Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, Xi'an, Shaanxi 710061, P.R. China
- Xiong Guo
- School of Public Health, Xi'an Jiaotong University Health Science Center, Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, Xi'an, Shaanxi 710061, P.R. China
207
Shaar-Moshe L, Hübner S, Peleg Z. Identification of conserved drought-adaptive genes using a cross-species meta-analysis approach. BMC PLANT BIOLOGY 2015; 15:111. [PMID: 25935420 PMCID: PMC4417316 DOI: 10.1186/s12870-015-0493-6] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 02/16/2015] [Accepted: 04/16/2015] [Indexed: 05/22/2023]
Abstract
BACKGROUND: Drought is the major environmental stress threatening crop-plant productivity worldwide. Identification of new genes and metabolic pathways involved in plant adaptation to progressive drought stress at the reproductive stage is of great interest for agricultural research. RESULTS: We developed a novel Cross-Species meta-Analysis of progressive Drought stress at the reproductive stage (CSA:Drought) to identify key drought-adaptive genes and mechanisms and to test their evolutionary conservation. Empirically defined filtering criteria were used to facilitate a robust integration of 17 deposited microarray experiments (148 arrays) of Arabidopsis, rice, wheat and barley. By prioritizing consistency over intensity, our approach was able to identify 225 differentially expressed genes shared across studies and taxa. Gene Ontology enrichment and pathway analyses classified the shared genes into functional categories involved predominantly in metabolic processes (e.g. amino acid and carbohydrate metabolism), regulatory function (e.g. protein degradation and transcription) and response to stimulus. We further investigated drought-related cis-acting elements in the shared gene promoters, and the evolutionary conservation of the shared genes. The universal nature of the identified drought-adaptive genes was further validated in a fifth species, Brachypodium distachyon, which was not included in the meta-analysis. qPCR analysis of 27 randomly selected shared orthologs showed expression patterns similar to those found by CSA:Drought. In accordance, morpho-physiological characterization of progressive drought stress in B. distachyon highlighted the key role of osmotic adjustment as an evolutionarily conserved drought-adaptive mechanism. CONCLUSIONS: Our CSA:Drought strategy highlights major drought-adaptive genes and metabolic pathways that were only partially, if at all, reported in the original studies included in the meta-analysis. These genes include a group of unclassified genes that could be involved in novel drought adaptation mechanisms. The identified shared genes can provide a useful resource for subsequent research to better understand the mechanisms involved in drought adaptation across species and can serve as a potential set of molecular biomarkers for progressive drought experiments.
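The phrase "prioritizing consistency over intensity" can be illustrated with a simple vote-counting filter: keep a gene when enough studies call it differentially expressed in the same direction, regardless of the size of the change. This is a hypothetical sketch, not the CSA:Drought filtering criteria (which the abstract describes only as empirically defined); it also assumes that genes from different species have already been mapped to shared ortholog identifiers.

```python
from collections import defaultdict


def consistent_genes(study_calls, min_studies):
    """Vote-counting filter (hypothetical sketch).

    study_calls: one dict per study, mapping gene (or ortholog-group) ID to +1 for
    up-regulation or -1 for down-regulation, containing only genes called DE there.
    A gene is kept when at least `min_studies` studies agree on its direction and
    no study reports the opposite direction; fold-change size is ignored.
    """
    votes = defaultdict(lambda: {1: 0, -1: 0})
    for calls in study_calls:
        for gene, direction in calls.items():
            votes[gene][direction] += 1
    kept = {}
    for gene, v in votes.items():
        for direction in (1, -1):
            if v[direction] >= min_studies and v[-direction] == 0:
                kept[gene] = direction
    return kept
```

A gene with a modest but reproducible change passes this filter, while a gene with a very large change in a single study does not, which is the trade-off the abstract describes.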
Affiliation(s)
- Lidor Shaar-Moshe
- The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, Rehovot, 7610001, Israel.
- Sariel Hübner
- The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, Rehovot, 7610001, Israel.
- Present address: Department of Botany, University of British Columbia, Vancouver, BC, Canada.
- Zvi Peleg
- The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, Rehovot, 7610001, Israel.
208
Wang RS, Maron BA, Loscalzo J. Systems medicine: evolution of systems biology from bench to bedside. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2015; 7:141-61. [PMID: 25891169 DOI: 10.1002/wsbm.1297] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 12/19/2014] [Revised: 03/04/2015] [Accepted: 03/06/2015] [Indexed: 12/11/2022]
Abstract
High-throughput experimental techniques for generating genomes, transcriptomes, proteomes, metabolomes, and interactomes have provided unprecedented opportunities to interrogate biological systems and human diseases on a global level. Systems biology integrates the mass of heterogeneous high-throughput data with predictive computational modeling to understand biological functions as system-level properties. Most human diseases are biological states caused by multiple components of perturbed pathways and regulatory networks rather than by individual failing components. Systems biology not only facilitates basic biological research but also provides new avenues through which to understand human diseases, identify diagnostic biomarkers, and develop disease treatments. At the same time, systems biology seeks to assist in drug discovery, drug optimization, drug combinations, and drug repositioning by investigating the molecular mechanisms of action of drugs at the systems level. Indeed, systems biology is evolving into systems medicine, a new discipline that aims to offer new approaches for addressing the diagnosis and treatment of major human diseases uniquely, effectively, and with personalized precision.
Affiliation(s)
- Rui-Sheng Wang
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Bradley A Maron
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Department of Cardiology, Veterans Affairs Boston Healthcare System, West Roxbury, MA, USA
- Joseph Loscalzo
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
209
Roux J, Rosikiewicz M, Robinson-Rechavi M. What to compare and how: Comparative transcriptomics for Evo-Devo. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2015; 324:372-82. [PMID: 25864439 PMCID: PMC4949521 DOI: 10.1002/jez.b.22618] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 11/07/2014] [Accepted: 02/19/2015] [Indexed: 12/30/2022]
Abstract
Evolutionary developmental biology has grown historically from the capacity to relate patterns of evolution in anatomy to patterns of evolution of expression of specific genes, whether between very distantly related species, or very closely related species or populations. Scaling up such studies by taking advantage of modern transcriptomics brings promising improvements, allowing us to estimate the overall impact and molecular mechanisms of convergence, constraint or innovation in anatomy and development. But it also presents major challenges, including the computational definitions of anatomical homology and of organ function, the criteria for the comparison of developmental stages, the annotation of transcriptomics data to proper anatomical and developmental terms, and the statistical methods to compare transcriptomic data between species to highlight significant conservation or changes. In this article, we review these challenges, and the ongoing efforts to address them, which are emerging from bioinformatics work on ontologies, evolutionary statistics, and data curation, with a focus on their implementation in the context of the development of our database Bgee (http://bgee.org).
Affiliation(s)
- Julien Roux
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Human Genetics, University of Chicago, Chicago, Illinois
- Marta Rosikiewicz
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
210
Insel PA, Wilderman A, Zambon AC, Snead AN, Murray F, Aroonsakool N, McDonald DS, Zhou S, McCann T, Zhang L, Sriram K, Chinn AM, Michkov AV, Lynch RM, Overland AC, Corriden R. G Protein-Coupled Receptor (GPCR) Expression in Native Cells: "Novel" endoGPCRs as Physiologic Regulators and Therapeutic Targets. Mol Pharmacol 2015; 88:181-7. [PMID: 25737495 DOI: 10.1124/mol.115.098129] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 01/23/2015] [Accepted: 03/02/2015] [Indexed: 12/24/2022]
Abstract
G protein-coupled receptors (GPCRs), the largest family of signaling receptors in the human genome, are also the largest class of targets of approved drugs. Are the optimal GPCRs (in terms of efficacy and safety) currently targeted therapeutically? Especially given the large number (~120) of orphan GPCRs (which lack known physiologic agonists), it is likely that previously unrecognized GPCRs, especially orphan receptors, regulate cell function and can be therapeutic targets. Knowledge is limited regarding the diversity and identity of GPCRs that are activated by endogenous ligands and that native cells express. Here, we review approaches to define GPCR expression in tissues and cells and results from studies using these approaches. We identify problems with the available data and suggest future ways to identify and validate the physiologic and therapeutic roles of previously unrecognized GPCRs. We propose that a particularly useful approach to identify functionally important GPCRs with therapeutic potential will be to focus on receptors that show selective increases in expression in diseased cells from patients and experimental animals.
Affiliation(s)
- Paul A Insel, Andrea Wilderman, Alexander C Zambon, Aaron N Snead, Fiona Murray, Nakon Aroonsakool, Daniel S McDonald, Shu Zhou, Thalia McCann, Lingzhi Zhang, Krishna Sriram, Amy M Chinn, Alexander V Michkov, Rebecca M Lynch, Aaron C Overland, Ross Corriden
- Departments of Pharmacology (P.A.I., A.W., A.C.Z., A.N.S., N.A., D.S.M., S.Z., T.M., L.Z., K.S., A.M.C., A.V.M., R.M.L., A.C.O., R.C.) and Medicine (P.A.I., F.M.), University of California, San Diego, La Jolla, California
211
Marakhonov A, Sadovskaya N, Antonov I, Baranova A, Skoblov M. Analysis of discordant Affymetrix probesets casts serious doubt on idea of microarray data reutilization. BMC Genomics 2014; 15 Suppl 12:S8. [PMID: 25563078 PMCID: PMC4303952 DOI: 10.1186/1471-2164-15-s12-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
Background: Affymetrix microarray technology allows one to investigate the expression of thousands of genes simultaneously under a variety of conditions. In the popular U133A microarray platform, the expression of 37% of genes is measured by more than one probeset. Discordant expression observed for two different probesets that match the same gene is a widespread phenomenon that is usually underestimated, ignored or disregarded. Results: Here we evaluate the prevalence of discordant expression in data collected using the Affymetrix HG-U133A microarray platform. In U133A, about 30% of genes annotated by two different probesets demonstrate a substantial correlation between independently measured expression values. To our surprise, sorting the probesets according to the nature of the discrepancy in their expression levels allowed the classification of the respective genes according to their fundamental functional properties, including an observed enrichment for tissue-specific transcripts and alternatively spliced variants. On the other hand, the absence of discrepancies in probesets that simultaneously match several different genes allowed us to pinpoint non-expressed pseudogenes and gene groups with highly correlated expression patterns. Nevertheless, in many cases, the nature of the discordant expression of two probesets that match the same transcript remains unexplained. It is possible that these probesets report differently regulated sets of transcripts or, in the best-case scenario, two different sets of transcripts that represent the same gene. Conclusion: The majority of absolute gene expression values collected using Affymetrix microarrays may not be suitable for typical interpretative downstream analysis.
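A basic check for the kind of discordance discussed above is to correlate, across samples, the expression values of every pair of probesets assigned to the same gene. The sketch below flags pairs whose Pearson correlation falls below a cutoff; the 0.5 cutoff, the input layout, the function names and the example probeset names in the test are assumptions for illustration, not values or tools from the study.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length expression vectors (NaN if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def flag_discordant(probesets_by_gene, threshold=0.5):
    """Return (gene, probeset1, probeset2, r) for same-gene probeset pairs with r < threshold.

    probesets_by_gene: {gene: {probeset ID: [expression value per sample]}}.
    Pairs with an undefined correlation (a constant probeset) are skipped.
    """
    flagged = []
    for gene, probes in probesets_by_gene.items():
        for p1, p2 in combinations(sorted(probes), 2):
            r = pearson(probes[p1], probes[p2])
            if not math.isnan(r) and r < threshold:
                flagged.append((gene, p1, p2, round(r, 3)))
    return flagged
```

Following the abstract's interpretation, a flagged pair may reflect different transcript variants rather than noise, so a low correlation is a prompt for closer inspection, not proof of a faulty probeset.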
212
Cahan P, Li H, Morris SA, Lummertz da Rocha E, Daley GQ, Collins JJ. CellNet: network biology applied to stem cell engineering. Cell 2014; 158:903-915. [PMID: 25126793 DOI: 10.1016/j.cell.2014.07.020] [Citation(s) in RCA: 377] [Impact Index Per Article: 37.7] [Received: 01/14/2014] [Revised: 05/28/2014] [Accepted: 07/17/2014] [Indexed: 02/07/2023]
Abstract
Somatic cell reprogramming, directed differentiation of pluripotent stem cells, and direct conversions between differentiated cell lineages represent powerful approaches to engineer cells for research and regenerative medicine. We have developed CellNet, a network biology platform that more accurately assesses the fidelity of cellular engineering than existing methodologies and generates hypotheses for improving cell derivations. Analyzing expression data from 56 published reports, we found that cells derived via directed differentiation more closely resemble their in vivo counterparts than products of direct conversion, as reflected by the establishment of target cell-type gene regulatory networks (GRNs). Furthermore, we discovered that directly converted cells fail to adequately silence expression programs of the starting population and that the establishment of unintended GRNs is common to virtually every cellular engineering paradigm. CellNet provides a platform for quantifying how closely engineered cell populations resemble their target cell type and a rational strategy to guide enhanced cellular engineering.
Affiliation(s)
- Patrick Cahan
- Stem Cell Transplantation Program, Division of Pediatric Hematology and Oncology, Manton Center for Orphan Disease Research, Howard Hughes Medical Institute, Boston Children's Hospital and Dana Farber Cancer Institute, Boston, MA 02115, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Harvard Stem Cell Institute, Cambridge, MA 02138, USA
- Hu Li
- Center for Individualized Medicine, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA
- Samantha A Morris
- Stem Cell Transplantation Program, Division of Pediatric Hematology and Oncology, Manton Center for Orphan Disease Research, Howard Hughes Medical Institute, Boston Children's Hospital and Dana Farber Cancer Institute, Boston, MA 02115, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Harvard Stem Cell Institute, Cambridge, MA 02138, USA
- Edroaldo Lummertz da Rocha
- Howard Hughes Medical Institute, Department of Biomedical Engineering and Center of Synthetic Biology, Boston University, Boston, MA 02215, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Graduate Program in Materials Science and Engineering, Federal University of Santa Catarina, 88040-900 Florianópolis, Brazil
- George Q Daley
- Stem Cell Transplantation Program, Division of Pediatric Hematology and Oncology, Manton Center for Orphan Disease Research, Howard Hughes Medical Institute, Boston Children's Hospital and Dana Farber Cancer Institute, Boston, MA 02115, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Harvard Stem Cell Institute, Cambridge, MA 02138, USA.
- James J Collins
- Howard Hughes Medical Institute, Department of Biomedical Engineering and Center of Synthetic Biology, Boston University, Boston, MA 02215, USA.
213
Faisal A, Peltonen J, Georgii E, Rung J, Kaski S. Toward computational cumulative biology by combining models of biological datasets. PLoS One 2014; 9:e113053. [PMID: 25427176 PMCID: PMC4245117 DOI: 10.1371/journal.pone.0113053] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/04/2014] [Accepted: 10/17/2014] [Indexed: 11/21/2022] Open
Abstract
A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
Affiliation(s)
- Ali Faisal
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland
- Jaakko Peltonen
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland
- Elisabeth Georgii
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland
- Johan Rung
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Samuel Kaski
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
214
Okamura Y, Aoki Y, Obayashi T, Tadaka S, Ito S, Narise T, Kinoshita K. COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Res 2014; 43:D82-6. [PMID: 25392420 PMCID: PMC4383961 DOI: 10.1093/nar/gku1163] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Indexed: 01/14/2023] Open
Abstract
The COXPRESdb (http://coxpresdb.jp) provides gene coexpression relationships for animal species. Here, we report the updates of the database, mainly focusing on the following two points. First, we added RNAseq-based gene coexpression data for three species (human, mouse and fly), and substantially increased the number of microarray experiments for nine species. The increased amount of expression data from multiple platforms could enhance the reliability of coexpression data. Second, we refined the data assessment procedures, both for each coexpressed gene list and for the overall performance of a platform. The assessment of coexpressed gene lists now uses more reasonable P-values derived from platform-specific null distributions. These developments greatly reduced pseudo-predictions for directly associated genes, thus improving the reliability of coexpression data for designing new experiments and discussing experimental results.
Affiliation(s)
- Yasunobu Okamura
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Yuichi Aoki
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Takeshi Obayashi
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Shu Tadaka
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Satoshi Ito
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Takafumi Narise
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Kengo Kinoshita
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan; Institute of Development, Aging, and Cancer, Tohoku University, Sendai 980-8575, Japan; Tohoku Medical Megabank Organization, Tohoku University, Sendai 980-8573, Japan
215
Jazayeri SM, Melgarejo-Muñoz LM, Romero HM. RNA-seq: a glance at technologies and methodologies. Acta Biológica Colombiana 2014. [DOI: 10.15446/abc.v20n2.43639] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/17/2023] Open
216
Kolesnikov N, Hastings E, Keays M, Melnichuk O, Tang YA, Williams E, Dylag M, Kurbatova N, Brandizi M, Burdett T, Megy K, Pilicheva E, Rustici G, Tikhonov A, Parkinson H, Petryszak R, Sarkans U, Brazma A. ArrayExpress update--simplifying data submissions. Nucleic Acids Res 2014; 43:D1113-6. [PMID: 25361974 PMCID: PMC4383899 DOI: 10.1093/nar/gku1057] [Citation(s) in RCA: 499] [Impact Index Per Article: 49.9] [Indexed: 11/13/2022] Open
Abstract
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42 000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.
Affiliation(s)
- Nikolay Kolesnikov
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Emma Hastings
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Maria Keays
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Olga Melnichuk
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Y Amy Tang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Eleanor Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Miroslaw Dylag
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Natalja Kurbatova
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Marco Brandizi
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Karyn Megy
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Ekaterina Pilicheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Gabriella Rustici
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK; School of Biological Sciences, Cambridge Systems Biology Centre, Tennis Court Road, Cambridge, CB2 1QR, UK
- Andrew Tikhonov
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Robert Petryszak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Ugis Sarkans
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
217
Agarwal P, Parida SK, Mahto A, Das S, Mathew IE, Malik N, Tyagi AK. Expanding frontiers in plant transcriptomics in aid of functional genomics and molecular breeding. Biotechnol J 2014; 9:1480-92. [PMID: 25349922 DOI: 10.1002/biot.201400063] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 05/12/2014] [Revised: 09/02/2014] [Accepted: 10/01/2014] [Indexed: 12/30/2022]
Abstract
The transcript pool of a plant part, under any given condition, is a collection of mRNAs that will pave the way for a biochemical reaction of the plant to stimuli. Over the past decades, transcriptome study has advanced from Northern blotting to RNA sequencing (RNA-seq), through other techniques, of which real-time quantitative polymerase chain reaction (PCR) and microarray are the most significant ones. The questions being addressed by such studies have also matured from the study of a solitary process to expression atlases and marker-assisted genetic enhancement. Not only have the genes and networks involved in various developmental processes of plant parts been elucidated, but stress-tolerance genes have also been highlighted. The transcriptome of a plant with altered expression of a target gene has given information about the downstream genes. Marker information has been used for breeding improved varieties. Fortunately, the data generated by transcriptome analysis have been made freely available for ample utilization and comparison. The review discusses this wide variety of transcriptome data being generated in plants, which includes developmental stages, abiotic and biotic stress, effect of altered gene expression, as well as comparative transcriptomics, with a special emphasis on microarray and RNA-seq. Such data can be used to determine the regulatory gene networks, which can subsequently be utilized for generating improved plant varieties.
Affiliation(s)
- Pinky Agarwal
- National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, India
218
Wang HQ, Zheng CH, Zhao XM. jNMFMA: a joint non-negative matrix factorization meta-analysis of transcriptomics data. Bioinformatics 2014; 31:572-80. [PMID: 25411328 DOI: 10.1093/bioinformatics/btu679] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION: The tremendous amount of omics data being accumulated poses a pressing challenge: meta-analyzing heterogeneous data to mine new biological knowledge. Most existing methods deal with each gene independently, often resulting in high false positive rates when detecting differentially expressed genes (DEGs). To our knowledge, little or no effort has been devoted to methods that consider the dependence structures underlying transcriptomics data for DEG identification in a meta-analysis context. RESULTS: This article proposes a new meta-analysis method for identifying DEGs based on joint non-negative matrix factorization (jNMFMA). We mathematically extend non-negative matrix factorization (NMF) to a joint version (jNMF), which is used to simultaneously decompose multiple transcriptomics data matrices into one common submatrix plus multiple individual submatrices. With jNMF, the dependence structures underlying transcriptomics data can be interrogated and utilized, while the high-dimensional transcriptomics data are mapped into a low-dimensional space spanned by metagenes that represent hidden biological signals. jNMFMA finally identifies DEGs as genes that are associated with differentially expressed metagenes. The ability to extract dependence structures makes jNMFMA more efficient and robust at identifying DEGs in a meta-analysis context. Furthermore, jNMFMA is flexible enough to identify DEGs that are consistent among various types of omics data, e.g. gene expression and DNA methylation. Experimental results on both simulation data and real-world cancer data demonstrate the effectiveness of jNMFMA and its superior performance over other popular approaches. AVAILABILITY AND IMPLEMENTATION: R code for jNMFMA is available for non-commercial use via http://micblab.iim.ac.cn/Download/. CONTACT: hqwang@ustc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hong-Qiang Wang
- Machine Intelligence and Computational Biology Lab, Hefei Institutes of Physical Science, Chinese Academy of Science, Hefei 230031, China, College of Electrical Engineering and Automation, Anhui University, Hefei 230031, China and Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Chun-Hou Zheng
- Machine Intelligence and Computational Biology Lab, Hefei Institutes of Physical Science, Chinese Academy of Science, Hefei 230031, China, College of Electrical Engineering and Automation, Anhui University, Hefei 230031, China and Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Xing-Ming Zhao
- Machine Intelligence and Computational Biology Lab, Hefei Institutes of Physical Science, Chinese Academy of Science, Hefei 230031, China, College of Electrical Engineering and Automation, Anhui University, Hefei 230031, China and Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
219
Casanova JL, Conley ME, Seligman SJ, Abel L, Notarangelo LD. Guidelines for genetic studies in single patients: lessons from primary immunodeficiencies. J Exp Med 2014; 211:2137-49. [PMID: 25311508 PMCID: PMC4203950 DOI: 10.1084/jem.20140520] [Citation(s) in RCA: 180] [Impact Index Per Article: 18.0] [Indexed: 01/17/2023]
Abstract
Casanova and colleagues discuss the importance of single-patient genetic studies in the discovery of novel primary immunodeficiencies and offer insight into the standards and criteria that should accompany these studies. Can genetic and clinical findings made in a single patient be considered sufficient to establish a causal relationship between genotype and phenotype? We report that up to 49 of the 232 monogenic etiologies (21%) of human primary immunodeficiencies (PIDs) were initially reported in single patients. The ability to incriminate single-gene inborn errors in immunodeficient patients results from the relative ease in validating the disease-causing role of the genotype by in-depth mechanistic studies demonstrating the structural and functional consequences of the mutations using blood samples. The candidate genotype can be causally connected to a clinical phenotype using cellular (leukocytes) or molecular (plasma) substrates. The recent advent of next generation sequencing (NGS), with whole exome and whole genome sequencing, induced pluripotent stem cell (iPSC) technology, and gene editing technologies—including in particular the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 technology—offer new and exciting possibilities for the genetic exploration of single patients not only in hematology and immunology but also in other fields. We propose three criteria for deciding if the clinical and experimental data suffice to establish a causal relationship based on only one case. The patient’s candidate genotype must not occur in individuals without the clinical phenotype. Experimental studies must indicate that the genetic variant impairs, destroys, or alters the expression or function of the gene product (or two genetic variants for compound heterozygosity). 
The causal relationship between the candidate genotype and the clinical phenotype must be confirmed via a relevant cellular phenotype, or by default via a relevant animal phenotype. When supported by satisfaction of rigorous criteria, the report of single patient–based discovery of Mendelian disorders should be encouraged, as it can provide the first step in the understanding of a group of human diseases, thereby revealing crucial pathways underlying physiological and pathological processes.
Affiliation(s)
- Jean-Laurent Casanova
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065; Howard Hughes Medical Institute, New York, NY 10065; Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, 75015 Paris, France; Paris Descartes University, Imagine Institute, 75015 Paris, France; Pediatric Hematology-Immunology Unit, Necker Hospital for Sick Children, 75015 Paris, France
- Mary Ellen Conley
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Stephen J Seligman
- Department of Microbiology and Immunology, New York Medical College, Valhalla, NY 10595
- Laurent Abel
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065; Howard Hughes Medical Institute, New York, NY 10065; Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, 75015 Paris, France; Paris Descartes University, Imagine Institute, 75015 Paris, France
- Luigi D Notarangelo
- Division of Immunology, Boston Children's Hospital, Boston, MA 02115; Department of Pediatrics and Pathology, Harvard Medical School, Boston, MA 02115
220
221
Jin L, Tu J, Jia J, An W, Tan H, Cui Q, Li Z. Drug-repurposing identified the combination of Trolox C and Cytisine for the treatment of type 2 diabetes. J Transl Med 2014; 12:153. [PMID: 24885253 PMCID: PMC4047784 DOI: 10.1186/1479-5876-12-153] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/15/2014] [Accepted: 05/27/2014] [Indexed: 11/22/2022] Open
Abstract
Background Drug-induced gene expression datasets (for example, the Connectivity Map, CMap) represent a valuable resource for drug repurposing, a class of methods for identifying novel indications for approved drugs. Recently, CMap-based methods have been successfully applied to identify drugs for a number of diseases. However, few gene expression-based methods are currently available for repurposing drug combinations. Increasing evidence has shown that drug combinations may be valid for novel indications. Method For this purpose, we presented a simple CMap-based scoring system to predict novel indications for the combination of two drugs. We then confirmed the effectiveness of the predicted drug combination in an animal model of type 2 diabetes. Results We applied the presented scoring system to type 2 diabetes and identified a candidate combination of two drugs, Trolox C and Cytisine. Finally, we confirmed that the predicted combined drugs are effective for the treatment of type 2 diabetes. Conclusion The presented scoring system represents a novel method for drug repurposing, which could help to greatly extend the space of drugs.
Affiliation(s)
- Huanran Tan
- Department of Pharmacology, Peking University Health Science Center, Beijing 100191, China.
222
Kohonen P, Ceder R, Smit I, Hongisto V, Myatt G, Hardy B, Spjuth O, Grafström R. Cancer biology, toxicology and alternative methods development go hand-in-hand. Basic Clin Pharmacol Toxicol 2014; 115:50-8. [PMID: 24779563 DOI: 10.1111/bcpt.12257] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/09/2013] [Accepted: 04/21/2014] [Indexed: 12/13/2022]
Abstract
Toxicological research faces the challenge of integrating knowledge from diverse fields and novel technological developments generally in the biological and medical sciences. We discuss herein the fact that the multiple facets of cancer research, including discovery related to mechanisms, treatment and diagnosis, overlap many up and coming interest areas in toxicology, including the need for improved methods and analysis tools. Common to both disciplines, in vitro and in silico methods serve as alternative investigation routes to animal studies. Knowledge on cancer development helps in understanding the relevance of chemical toxicity studies in cell models, and many bioinformatics-based cancer biomarker discovery tools are also applicable to computational toxicology. Robotics-aided, cell-based, high-throughput screening, microscale immunostaining techniques and gene expression profiling analyses are common tools in cancer research, and when sequentially combined, form a tiered approach to structured safety evaluation of thousands of environmental agents, novel chemicals or engineered nanomaterials. Comprehensive tumour data collections in databases have been translated into clinically useful data, and this concept serves as template for computer-driven evaluation of toxicity data into meaningful results. Future 'cancer research-inspired knowledge management' of toxicological data will aid the translation of basic discovery results and chemicals- and materials-testing data to information relevant to human health and environmental safety.
Affiliation(s)
- Pekka Kohonen
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
223
Immonen E, Snook RR, Ritchie MG. Mating system variation drives rapid evolution of the female transcriptome in Drosophila pseudoobscura. Ecol Evol 2014; 4:2186-201. [PMID: 25360260 PMCID: PMC4201433 DOI: 10.1002/ece3.1098] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 04/01/2014] [Accepted: 04/08/2014] [Indexed: 12/31/2022] Open
Abstract
Interactions between the sexes are believed to be a potent source of selection on sex-specific evolution. The way in which sexual interactions influence male investment is much studied, but effects on females are more poorly understood. To address this deficiency, we examined gene expression in virgin female Drosophila pseudoobscura following 100 generations of mating system manipulations in which we either elevated polyandry or enforced monandry. Gene expression evolution following mating system manipulation resulted in 14% of the transcriptome of virgin females being altered. Polyandrous females elevated expression of a greater number of genes normally enriched in ovaries and associated with mitosis and meiosis, which might reflect female investment into reproductive functions. Monandrous females showed a greater number of genes normally enriched for expression in somatic tissues, including the head and gut and associated with visual perception and metabolism, respectively. By comparing our data with a previous study of sex differences in gene expression in this species, we found that the majority of the genes that are differentially expressed between females of the selection treatments show female-biased expression in the wild-type population. A striking exception is genes associated with male-specific reproductive tissues (in D. melanogaster), which are upregulated in polyandrous females. Our results provide experimental evidence for a role of sex-specific selection arising from differing sexual interactions with males in promoting rapid evolution of the female transcriptome.
Affiliation(s)
- Elina Immonen
- School of Biology, University of St Andrews, Dyers Brae House, St Andrews, Fife, KY16 9TH, U.K.; Department of Ecology and Genetics (Animal Ecology), Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, Uppsala, 752 36, Sweden
- Rhonda R Snook
- Animal & Plant Sciences, University of Sheffield, Alfred Denny Building, Sheffield, S10 2TN, U.K.
- Michael G Ritchie
- School of Biology, University of St Andrews, Dyers Brae House, St Andrews, Fife, KY16 9TH, U.K.
224
225
Mikalsen SO. Proteomics made more accessible. Proteomics 2014; 14:989-90. [DOI: 10.1002/pmic.201400064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/26/2014] [Accepted: 03/11/2014] [Indexed: 11/09/2022]
Affiliation(s)
- Svein-Ole Mikalsen
- Faculty of Natural and Health Sciences, Department of Science and Technology; University of the Faroe Islands; Faroe Islands
226
Buckberry S, Bent SJ, Bianco-Miotto T, Roberts CT. massiR: a method for predicting the sex of samples in gene expression microarray datasets. Bioinformatics 2014; 30:2084-5. [PMID: 24659105 PMCID: PMC4080740 DOI: 10.1093/bioinformatics/btu161] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/03/2022]
Abstract
Summary: High-throughput gene expression microarrays are currently the most efficient method for transcriptome-wide expression analyses. Consequently, gene expression data available through public repositories have largely been obtained from microarray experiments. However, the metadata associated with many publicly available expression microarray datasets often lacks sample sex information, therefore limiting the reuse of these data in new analyses or larger meta-analyses where the effect of sex is to be considered. Here, we present the massiR package, which provides a method for researchers to predict the sex of samples in microarray datasets. Using information from microarray probes representing Y chromosome genes, this package implements unsupervised clustering methods to classify samples into male and female groups, providing an efficient way to identify or confirm the sex of samples in mammalian microarray datasets. Availability and implementation: massiR is implemented as a Bioconductor package in R. The package and the vignette can be downloaded at bioconductor.org and are provided under a GPL-2 license. Contact: sam.buckberry@adelaide.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sam Buckberry
- The Robinson Research Institute, School of Paediatrics and Reproductive Health, The University of Adelaide, Adelaide 5005, Australia and School of Agriculture Food and Wine, The University of Adelaide, Adelaide 5005, Australia
- Stephen J Bent
- The Robinson Research Institute, School of Paediatrics and Reproductive Health, The University of Adelaide, Adelaide 5005, Australia and School of Agriculture Food and Wine, The University of Adelaide, Adelaide 5005, Australia
- Tina Bianco-Miotto
- The Robinson Research Institute, School of Paediatrics and Reproductive Health, The University of Adelaide, Adelaide 5005, Australia and School of Agriculture Food and Wine, The University of Adelaide, Adelaide 5005, Australia
- Claire T Roberts
- The Robinson Research Institute, School of Paediatrics and Reproductive Health, The University of Adelaide, Adelaide 5005, Australia and School of Agriculture Food and Wine, The University of Adelaide, Adelaide 5005, Australia
| |
227
Fahrenbach JP, Andrade J, McNally EM. The CO-Regulation Database (CORD): a tool to identify coordinately expressed genes. PLoS One 2014; 9:e90408. [PMID: 24599084 PMCID: PMC3944024 DOI: 10.1371/journal.pone.0090408] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 11/24/2013] [Accepted: 02/01/2014] [Indexed: 02/03/2023]
Abstract
Background: Meta-analysis of gene expression array databases has the potential to reveal information about gene function. The identification of gene-gene interactions may be inferred from gene expression information, but such meta-analysis is often limited to a single microarray platform. To address this limitation, we developed a gene-centered approach to analyze differential expression across thousands of gene expression experiments and created the CO-Regulation Database (CORD) to determine which genes are correlated with a queried gene. Results: Using the GEO and ArrayExpress databases, we analyzed over 120,000 group-by-group experiments from gene microarrays to determine the correlating genes for over 30,000 different genes or hypothesized genes. CORD output data are presented for sample queries focused on genes with well-known interaction networks, including p16 (CDKN2A), vimentin (VIM), and MyoD (MYOD1). CDKN2A, VIM, and MYOD1 all displayed gene correlations consistent with known interacting genes. Conclusions: We developed a facile, web-enabled program to determine gene-gene correlations across different gene expression microarray platforms. Using well-characterized genes, we illustrate how CORD's identification of co-expressed genes contributes to a better understanding of a gene's potential function. The website is found at http://cord-db.org.
Affiliation(s)
- John P. Fahrenbach
- Department of Medicine, The University of Chicago, Chicago, Illinois, United States of America
- Jorge Andrade
- Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
- Elizabeth M. McNally
- Department of Medicine, The University of Chicago, Chicago, Illinois, United States of America
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
228
Lee YH, Bae SC, Song GG. Meta-analysis of gene expression profiles to predict response to biologic agents in rheumatoid arthritis. Clin Rheumatol 2014; 33:775-82. [PMID: 24595895 DOI: 10.1007/s10067-014-2547-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/24/2013] [Revised: 02/07/2014] [Accepted: 02/19/2014] [Indexed: 10/25/2022]
Abstract
Our aim was to identify differentially expressed (DE) genes and biological processes that may help predict patient response to biologic agents for rheumatoid arthritis (RA). Using the INMEX (integrative meta-analysis of expression data) software tool, we performed a meta-analysis of publicly available microarray Gene Expression Omnibus (GEO) datasets that examined patient response to biologic therapy for RA. Three GEO datasets, containing 79 responders and 34 non-responders, were included in the meta-analysis. We identified 1,374 genes that were consistently differentially expressed in responders vs. non-responders (651 up-regulated and 723 down-regulated). The up-regulated gene with the smallest p value (p=0.000192) was ASCC2 (Activating Signal Cointegrator 1 Complex Subunit 2), and the up-regulated gene with the largest fold change (average log fold change=-0.75869, p=0.000206) was KLRC3 (Killer Cell Lectin-Like Receptor Subfamily C, Member 3). The down-regulated gene with the smallest p value (p=0.000195) was MPL (Myeloproliferative Leukemia Virus Oncogene). Among the 236 GO terms associated with the set of DE genes, the most significantly enriched was "CTP biosynthetic process" (GO:0006241; p=0.000454). Our meta-analysis identified genes that were consistently DE in responders vs. non-responders, as well as biological pathways associated with this set of genes. These results provide insight into the molecular mechanisms underlying responsiveness to biologic therapy for RA.
Affiliation(s)
- Young Ho Lee
- Division of Rheumatology, Department of Internal Medicine, Korea University Anam Hospital, Korea University College of Medicine, 126-1 5 ga, Anam-dong, Seongbuk-gu, Seoul, 136-705, Korea
229
Abstract
Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs. healthy), based on the studies' experimental designs, followed by computing the overlap between the resulting differential expression signatures. While useful, in this methodology each study yields multiple independent phenotype comparisons, and connections are established not between studies, but rather between subsets of the studies corresponding to phenotype comparisons. We propose a rank-based statistical meta-analysis framework that establishes global connections between transcriptomics studies without breaking down studies into sets of phenotype comparisons. By using a rank product method, our framework extracts global features from each study, corresponding to genes that are consistently among the most expressed or differentially expressed genes in that study. Those features are then statistically modelled via a term-frequency inverse-document frequency (TF-IDF) model, which is then used for connecting studies. Our framework is fast and parameter-free; when applied to large collections of Homo sapiens and Streptococcus pneumoniae transcriptomics studies, it performs better than similarity-based approaches in retrieving related studies, using a Medical Subject Headings gold standard. Finally, we highlight via case studies how the framework can be used to derive novel biological hypotheses regarding related studies and the genes that drive those connections. Our proposed statistical framework shows that it is possible to perform a meta-analysis of transcriptomics studies with arbitrary experimental designs by deriving global expression features rather than decomposing studies into multiple phenotype comparisons.
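A rough Python sketch of the two steps named in this abstract, rank-product feature extraction followed by TF-IDF weighting, is shown below. The details are assumptions for illustration rather than the authors' framework: each study is a hypothetical dict of gene to per-sample expression values, the top-k genes by rank product become the study's features, term frequency is binary, and studies are compared by cosine similarity of their TF-IDF vectors.

```python
import math


def rank_product_top_genes(study, k):
    """Return the k genes with the smallest rank product in one study.

    study: hypothetical dict mapping gene -> list of expression values,
    one value per sample (all lists the same length).
    """
    genes = list(study)
    n_samples = len(next(iter(study.values())))
    log_rank_sum = {g: 0.0 for g in genes}
    for j in range(n_samples):
        # Rank 1 = most highly expressed gene in sample j.
        ordered = sorted(genes, key=lambda g: study[g][j], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)
    # Rank product as the geometric mean of ranks (summed in log space).
    rp = {g: math.exp(s / n_samples) for g, s in log_rank_sum.items()}
    # Smaller rank product = consistently near the top; ties broken by name.
    return set(sorted(genes, key=lambda g: (rp[g], g))[:k])


def tfidf_vectors(feature_sets):
    """Binary-TF x IDF weights, treating each study's feature set as a document."""
    n_docs = len(feature_sets)
    doc_freq = {}
    for feats in feature_sets.values():
        for g in feats:
            doc_freq[g] = doc_freq.get(g, 0) + 1
    return {sid: {g: math.log(n_docs / doc_freq[g]) for g in feats}
            for sid, feats in feature_sets.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Genes that appear among the features of every study receive an IDF of zero, so features such as genes ranking highly in almost every study contribute little to linking studies, which is the usual motivation for TF-IDF weighting.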
230
Cremaschi P, Rovida S, Sacchi L, Lisa A, Calvi F, Montecucco A, Biamonti G, Bione S, Sacchi G. CorrelaGenes: a new tool for the interpretation of the human transcriptome. BMC Bioinformatics 2014; 15 Suppl 1:S6. [PMID: 24564370 PMCID: PMC4016313 DOI: 10.1186/1471-2105-15-s1-s6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
Background: The amount of gene expression data available in public repositories has grown exponentially in recent years, and new data mining tools are now required to transform these data into information easily accessible to biologists. Results: By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) database, we developed a new bioinformatics tool aimed at identifying genes whose expression appeared simultaneously altered in different experimental conditions, thus suggesting co-regulation or coordinated action in the same biological process. To accomplish this task, we used the 978 human GEO Curated DataSets and manually selected 2,109 pair-wise comparisons based on their biological rationale. The lists of differentially expressed genes obtained from the selected comparisons were stored in a PostgreSQL database and used as the data source for the CorrelaGenes tool. Our application uses a customized Association Rule Mining (ARM) algorithm to identify sets of genes showing expression profiles correlated with a gene of interest. The significance of the correlation is measured by coupling the Lift, a well-known standard ARM index, with the χ2 p value. The manually curated selection of the comparisons and the developed algorithm constitute a new approach in the field of gene expression profiling studies. A simulation performed on 100 randomly selected target genes allowed us to evaluate the efficiency of the procedure and to obtain preliminary data demonstrating the consistency of the results. Conclusions: The preliminary results of the simulation showed how CorrelaGenes could contribute to the characterization of molecular pathways and biological processes by integrating data obtained from other applications and available in public repositories.
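The two association measures named above can be computed from a 2×2 contingency table of comparisons. The Python sketch below is an illustration under stated assumptions, not CorrelaGenes' customized ARM algorithm: each comparison is represented as a hypothetical set of differentially expressed genes, Lift is the standard ratio P(T and G) / (P(T)·P(G)), and the p value comes from a Pearson χ2 test with one degree of freedom and no continuity correction.

```python
import math


def lift_and_chi2(comparisons, target, gene):
    """Lift and chi-square p value for co-occurrence of two genes.

    comparisons: hypothetical list of sets, each holding the differentially
    expressed genes from one pairwise comparison.
    Returns (lift, chi2 statistic, p value).
    """
    n = len(comparisons)
    # 2x2 contingency counts: n11 = both DE, n10 = target only, etc.
    n11 = sum(1 for c in comparisons if target in c and gene in c)
    n10 = sum(1 for c in comparisons if target in c and gene not in c)
    n01 = sum(1 for c in comparisons if target not in c and gene in c)
    n00 = n - n11 - n10 - n01
    n_target, n_gene = n11 + n10, n11 + n01

    # Lift = P(target and gene) / (P(target) * P(gene)).
    lift = (n11 / n) / ((n_target / n) * (n_gene / n)) if n_target and n_gene else 0.0

    # Pearson chi-square on the 2x2 table, without continuity correction.
    denom = n_target * (n01 + n00) * n_gene * (n10 + n00)
    chi2 = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0
    # Upper-tail probability for 1 degree of freedom: P(Z**2 > chi2).
    p = math.erfc(math.sqrt(chi2 / 2))
    return lift, chi2, p
```

A Lift above 1 means the two genes are called differentially expressed together more often than independence would predict; the χ2 p value indicates whether that excess is larger than chance given the number of comparisons.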
231
Valdes C, Capobianco E. Methods to detect transcribed pseudogenes: RNA-Seq discovery allows learning through features. Methods Mol Biol 2014; 1167:157-83. [PMID: 24823777 DOI: 10.1007/978-1-4939-0835-6_11] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/04/2022]
Abstract
The detection of transcripts and the measurement of their associated activity at the pseudogene scale have recently become important topics of research. As an integral part of many recent studies aimed at establishing a role for a variety of noncoding RNA structures, pseudogenes have substantially gained in popularity due to the discovery of regulatory properties and complex mechanisms of action that, while requiring further investigation, analysis, and validation, also promise to have a broad impact on human disease. Currently, there are relatively few methodologies specifically designed to detect pseudogene transcripts, and tools that either replace or integrate manual annotation procedures are very much needed. In particular, we believe it is justified to advance the computational treatment of pseudogenes at the whole-transcriptome level. Catalogs of human pseudogenes have started to be delivered through RNA-Seq technologies. However, only a limited number of transcriptomes have been covered. Furthermore, while most proposals have led to the production of a targeted algorithm, especially used for detection, few computational pipelines were designed following a comprehensive approach addressing identification and quantification of transcriptional activity within a unifying methodological frame. Given the currently incomplete evidence, the limited impact due to the lack of extensive testing, and the presence of unsolved uncertainties affecting the reproducibility of results, our motivation for the proposal of a new computational approach is high and timely. We have considered a hybrid approach, based on the assembly of a variety of computational tools, including RNA-Seq methods and machine learning applications, all applied to transcriptome data of various complexities.
Our initial strategy is to provide lists of pseudogenes to be validated against the currently known examples, in order to extend our knowledge further. An ultimate goal that is naturally linked to this work is to provide an automatic approach that analyzes transcriptomes with the goal of detecting candidate pseudogenes through characteristic features and that allows efficient and reproducible pseudogene classification models.
Affiliation(s)
- Camilo Valdes
- Center for Computational Science, University of Miami, Miami, FL, 33146, USA
232
Montero-Melendez T, Perretti M. Connections in pharmacology: innovation serving translational medicine. Drug Discov Today 2013; 19:820-3. [PMID: 24316023 DOI: 10.1016/j.drudis.2013.11.022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/13/2013] [Revised: 11/04/2013] [Accepted: 11/28/2013] [Indexed: 10/25/2022]
Abstract
There is a paucity of molecules that progress through the drug development pipeline, making the drug discovery process expensive and frustrating. Innovative approaches to drug development are therefore required to maximise opportunities. Strategies such as the Connectivity Map (CMap), which compares >7000 gene expression signatures generated from more than 1000 drugs, can produce associations between currently unrelated therapeutics, unveiling new mechanisms of action and favouring drug repositioning. Here, we discuss these opportunities that could aid the drug development process and propose rigorous publication of 'omics data with open access and data sharing. We, pharmacologists of the third millennium, must aim towards maximising knowledge in an unbiased and cost-effective manner, to deliver new drugs for the global benefit of patients.
Affiliation(s)
- Trinidad Montero-Melendez
- The William Harvey Research Institute, Barts and The London School of Medicine, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK.
- Mauro Perretti
- The William Harvey Research Institute, Barts and The London School of Medicine, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
233
Faustino RS, Arrell DK, Folmes CDL, Terzic A, Perez-Terzic C. Stem cell systems informatics for advanced clinical biodiagnostics: tracing molecular signatures from bench to bedside. Croat Med J 2013; 54:319-29. [PMID: 23986272 PMCID: PMC3760656 DOI: 10.3325/cmj.2013.54.319] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
Development of innovative high-throughput technologies has enabled a variety of molecular landscapes to be interrogated with an unprecedented degree of detail. Emergence of next-generation nucleotide sequencing methods, advanced proteomic techniques, and metabolic profiling approaches continue to produce a wealth of biological data that captures molecular frameworks underlying phenotype. The advent of these novel technologies has significant translational applications, as investigators can now explore molecular underpinnings of developmental states with a high degree of resolution. Application of these leading-edge techniques to patient samples has been successfully used to unmask nuanced molecular details of disease vs healthy tissue, which may provide novel targets for palliative intervention. To enhance such approaches, concomitant development of algorithms to reprogram differentiated cells in order to recapitulate pluripotent capacity offers a distinct advantage to advancing diagnostic methodology. Bioinformatic deconvolution of several “-omic” layers extracted from reprogrammed patient cells could, in principle, provide a means by which the evolution of individual pathology can be developmentally monitored. Significant logistic challenges face current implementation of this novel paradigm of patient treatment and care; however, several of these limitations have been successfully addressed through continuous development of cutting-edge in silico archiving and processing methods. Comprehensive elucidation of genomic, transcriptomic, proteomic, and metabolomic networks that define normal and pathological states, in combination with reprogrammed patient cells, is thus poised to become a high-value resource in modern diagnosis and prognosis of patient disease.
Affiliation(s)
- Randolph S Faustino
- C. Perez-Terzic, Mayo Clinic, 200 First Street SW, Rochester, MN, USA 55905
234
Kenakin T, Bylund DB, Toews ML, Mullane K, Winquist RJ, Williams M. Replicated, replicable and relevant-target engagement and pharmacological experimentation in the 21st century. Biochem Pharmacol 2013; 87:64-77. [PMID: 24269285 DOI: 10.1016/j.bcp.2013.10.024] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 10/29/2013] [Accepted: 10/29/2013] [Indexed: 02/06/2023]
Abstract
A pharmacological experiment is typically conducted to: i) test or expand a hypothesis regarding the potential role of a target in the mechanism(s) underlying a disease state using an existing drug or tool compound in normal and/or diseased tissue or animals; or ii) characterize and optimize a new chemical entity (NCE) targeted to modulate a specific disease-associated target to restore homeostasis as a potential drug candidate. Hypothesis testing necessitates an intellectually rigorous, null hypothesis approach that is distinct from a high-throughput fishing expedition in search of a hypothesis. In conducting an experiment, the protocol should be transparently defined along with its powering, design, appropriate statistical analysis and consideration of the anticipated outcome(s) before it is initiated. Compound-target interactions often involve the direct study of phenotype(s) unique to the target at the cell, tissue or animal/human level. However, in vivo studies are often compromised by a lack of sufficient information on the compound pharmacokinetics necessary to ensure target engagement and also by the context-free analysis of ubiquitous cellular signaling pathways downstream from the target. The use of single tool compounds/drugs at one concentration in engineered cell lines frequently results in reductionistic data that have no physiological relevance. This overview, focused on trends in the peer-reviewed literature, discusses the execution and reporting of experiments and the criteria recommended for the physiologically relevant assessment of target engagement to identify viable new drug targets and facilitate the advancement of translational studies.
Affiliation(s)
- Terry Kenakin
- Department of Pharmacology, University of North Carolina School of Medicine, Chapel Hill, NC, USA
- David B Bylund
- Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE, USA
- Myron L Toews
- Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE, USA
- Raymond J Winquist
- Department of Integrated Biology, Vertex Pharmaceuticals, Inc., Cambridge, MA, USA
- Michael Williams
- Department of Molecular Pharmacology and Biological Chemistry, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
235
Mullane K, Winquist RJ, Williams M. Translational paradigms in pharmacology and drug discovery. Biochem Pharmacol 2013; 87:189-210. [PMID: 24184503 DOI: 10.1016/j.bcp.2013.10.019] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 10/16/2013] [Accepted: 10/16/2013] [Indexed: 12/15/2022]
Abstract
The translational sciences represent the core element in enabling and utilizing the output from the biomedical sciences and in improving drug discovery metrics by reducing the attrition rate as compounds move from preclinical research to clinical proof of concept. Key to understanding the basis of disease causality and to developing therapeutics is an ability to accurately diagnose the disease and to identify and develop safe and effective therapeutics for its treatment. The former requires validated biomarkers and the latter, qualified targets. Progress has been hampered by semantic issues, specifically those that define the end product, and by scientific issues that include data reliability, an overt reductionistic cultural focus and a lack of hierarchically integrated data gathering and systematic analysis. A necessary framework for these activities is represented by the discipline of pharmacology, efforts and training in which require recognition and revitalization.
Affiliation(s)
- Kevin Mullane
- Profectus Pharma Consulting Inc., San Jose, CA, United States.
- Raymond J Winquist
- Department of Pharmacology, Vertex Pharmaceuticals Inc., Cambridge, MA, United States
- Michael Williams
- Department of Molecular Pharmacology and Biological Chemistry, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
236
Piwowar HA, Vision TJ. Data reuse and the open data citation advantage. PeerJ 2013; 1:e175. [PMID: 24109559 PMCID: PMC3792178 DOI: 10.7717/peerj.175] [Citation(s) in RCA: 206] [Impact Index Per Article: 18.7] [Received: 04/04/2013] [Accepted: 09/13/2013] [Indexed: 01/10/2023]
Abstract
Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
Affiliation(s)
- Heather A Piwowar
- National Evolutionary Synthesis Center, Durham, NC, USA; Department of Biology, Duke University, Durham, NC, USA
237
Song GG, Kim JH, Seo YH, Choi SJ, Ji JD, Lee YH. Meta-analysis of differentially expressed genes in primary Sjogren's syndrome by using microarray. Hum Immunol 2013; 75:98-104. [PMID: 24090683 DOI: 10.1016/j.humimm.2013.09.012] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 08/05/2013] [Revised: 09/11/2013] [Accepted: 09/20/2013] [Indexed: 12/16/2022]
Abstract
INTRODUCTION The purpose of this study was to identify differentially expressed (DE) genes and biological processes associated with changes in gene expression in primary Sjogren's syndrome (pSS). METHODS We performed a meta-analysis of publicly available microarray GEO datasets of pSS using the INMEX program (integrative meta-analysis of expression data). We performed Gene Ontology (GO) enrichment analyses and pathway analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG). RESULTS Three GEO datasets including 37 cases and 33 controls were available for the meta-analysis. We identified 179 genes across the studies that were consistently DE in pSS (146 up-regulated and 33 down-regulated). The up-regulated gene with the largest effect size (ES) (ES = -2.4228) was SELL (selectin L), whose product is required for the binding and subsequent rolling of leucocytes on endothelial cells to facilitate their migration into secondary lymphoid organs and inflammation sites. The most significant enrichment was in the immune response GO category (P = 2.52 × 10^-25). The most significant pathway in our KEGG analysis was Epstein-Barr virus infection (P = 9.91 × 10^-6). CONCLUSIONS Our meta-analysis identified genes that were consistently DE and biological pathways associated with gene expression changes in pSS.
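Meta-analyses that report a combined effect size (ES) per gene, as this abstract does, pool per-dataset effect sizes. The abstract does not say which combination model INMEX was run with, so the Python sketch below shows only one standard option, an inverse-variance weighted fixed-effect model; the inputs (per-dataset standardized effect sizes and their variances for a single gene) are hypothetical.

```python
import math


def fixed_effect_meta(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooling of one gene's effect sizes.

    effects: per-dataset standardized effect sizes (hypothetical values).
    variances: the sampling variance of each effect size.
    Returns (pooled effect, standard error, z score, two-sided p value).
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect size")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # Each dataset is weighted by the inverse of its variance.
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = pooled / se
    # Two-sided p value from the standard normal: P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled, se, z, p
```

A fixed-effect model assumes every dataset estimates the same underlying effect; when studies are heterogeneous, a random-effects model that adds a between-study variance term is the usual alternative.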
Affiliation(s)
- Gwan Gyu Song
- Division of Rheumatology, Department of Internal Medicine, Korea University College of Medicine, Seoul, Republic of Korea
- Jae-Hoon Kim
- Division of Rheumatology, Department of Internal Medicine, Korea University College of Medicine, Seoul, Republic of Korea
- Young Ho Seo
- Division of Rheumatology, Department of Internal Medicine, Korea University College of Medicine, Seoul, Republic of Korea
- Sung Jae Choi
- Division of Rheumatology, Department of Internal Medicine, Korea University College of Medicine, Seoul, Republic of Korea
- Jong Dae Ji
- Division of Rheumatology, Department of Internal Medicine, Korea University College of Medicine, Seoul, Republic of Korea
- Young Ho Lee
- Division of Rheumatology, Department of Internal Medicine, Korea University College of Medicine, Seoul, Republic of Korea
238
Winquist RJ, Mullane K, Williams M. The fall and rise of pharmacology--(re-)defining the discipline? Biochem Pharmacol 2013; 87:4-24. [PMID: 24070656 DOI: 10.1016/j.bcp.2013.09.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 09/07/2013] [Accepted: 09/09/2013] [Indexed: 12/19/2022]
Abstract
Pharmacology is an integrative discipline that originated from activities, now nearly 7000 years old, to identify therapeutics from natural product sources. Research in the 19th Century that focused on the Law of Mass Action (LMA) demonstrated that compound effects were dose-/concentration-dependent, eventually leading to the receptor concept, now a century old, that remains the key to understanding disease causality and drug action. As pharmacology evolved in the 20th Century through successive biochemical, molecular and genomic eras, the precision in understanding receptor function at the molecular level increased and, while providing important insights, led to an overtly reductionistic emphasis. This resulted in the generation of data lacking physiological context that ignored the LMA and was not integrated at the tissue/whole organism level. As reductionism became a primary focus in biomedical research, it led to the fall of pharmacology. However, concerns regarding the disconnect between basic research efforts and the approval of new drugs to treat 21st Century disease tsunamis, e.g., neurodegeneration, metabolic syndrome, etc., have led to the reemergence of pharmacology, its rise, often in the semantic guise of systems biology. Against a background of limited training in pharmacology, this has resulted in issues in experimental replication with a bioinformatics emphasis that often has a limited relationship to reality. The integration of newer technologies within a pharmacological context where research is driven by testable hypotheses rather than technology, together with renewed efforts in teaching pharmacology, is anticipated to improve the focus and relevance of biomedical research and lead to novel therapeutics that will contain health care costs.
Affiliation(s)
- Raymond J Winquist
- Department of Pharmacology, Vertex Pharmaceuticals Inc., Cambridge, MA, United States
- Kevin Mullane
- Profectus Pharma Consulting Inc., San Jose, CA, United States
- Michael Williams
- Department of Molecular Pharmacology and Biological Chemistry, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
239
Lee YS, Krishnan A, Zhu Q, Troyanskaya OG. Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies. Bioinformatics 2013; 29:3036-44. [PMID: 24037214 PMCID: PMC3834796 DOI: 10.1093/bioinformatics/btt529] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
Motivation: Leveraging gene expression data through large-scale integrative analyses for multicellular organisms is challenging because most samples are not fully annotated to their tissue/cell-type of origin. A computational method to classify samples using their entire gene expression profiles is needed. Such a method must be applicable across thousands of independent studies, hundreds of gene expression technologies and hundreds of diverse human tissues and cell-types. Results: We present Unveiling RNA Sample Annotation (URSA) that leverages the complex tissue/cell-type relationships and simultaneously estimates the probabilities associated with hundreds of tissues/cell-types for any given gene expression profile. URSA provides accurate and intuitive probability values for expression profiles across independent studies and outperforms other methods, irrespective of data preprocessing techniques. Moreover, without re-training, URSA can be used to classify samples from diverse microarray platforms and even from next-generation sequencing technology. Finally, we provide a molecular interpretation for the tissue and cell-type models as the biological basis for URSA’s classifications. Availability and implementation: An interactive web interface for using URSA for gene expression analysis is available at: ursa.princeton.edu. The source code is available at https://bitbucket.org/youngl/ursa_backend. Contact: ogt@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Young-suk Lee
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
240
Orsini M, Travaglione A, Capobianco E. Cancer markers: integratively annotated classification. Gene 2013; 530:257-65. [PMID: 23928109 DOI: 10.1016/j.gene.2013.07.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/11/2013] [Accepted: 07/01/2013] [Indexed: 11/15/2022]
Abstract
Translational cancer genomics research aims to ensure that experimental knowledge is subject to computational analysis, and integrated with a variety of records from omics and clinical sources. The data retrieval from such sources is not trivial, due to their redundancy and heterogeneity, and the presence of false evidence. In silico marker identification, therefore, remains a complex task that is mainly motivated by the impact that target identification from the elucidation of gene co-expression dynamics and regulation mechanisms, combined with the discovery of genotype-phenotype associations, may have for clinical validation. Based on the reuse of publicly available gene expression data, our aim is to propose cancer marker classification by integrating the prediction power of multiple annotation sources. In particular, with reference to the functional annotation for colorectal markers, we indicate a classification of markers into diagnostic and prognostic classes combined with susceptibility and risk factors.
Affiliation(s)
- M Orsini
- CRS4 Bioinformatics Laboratory, Polaris, Pula (CA), Italy
241
Faustino RS, Arrell DK, Folmes CD, Terzic A, Perez-Terzic C. Stem cell systems informatics for advanced clinical biodiagnostics: tracing molecular signatures from bench to bedside. Croat Med J 2013; 54:319-29. [PMID: 23986272 PMCID: PMC3760656 DOI: 10.3325/cmj.2013.54.319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023] Open
Abstract
Development of innovative high-throughput technologies has enabled a variety of molecular landscapes to be interrogated with an unprecedented degree of detail. Next-generation nucleotide sequencing methods, advanced proteomic techniques, and metabolic profiling approaches continue to produce a wealth of biological data that captures the molecular frameworks underlying phenotype. The advent of these novel technologies has significant translational applications, as investigators can now explore the molecular underpinnings of developmental states with a high degree of resolution. Application of these leading-edge techniques to patient samples has been successfully used to unmask nuanced molecular details of diseased vs healthy tissue, which may provide novel targets for palliative intervention. To enhance such approaches, concomitant development of algorithms to reprogram differentiated cells in order to recapitulate pluripotent capacity offers a distinct advantage to advancing diagnostic methodology. Bioinformatic deconvolution of several "-omic" layers extracted from reprogrammed patient cells could, in principle, provide a means by which the evolution of individual pathology can be developmentally monitored. Significant logistic challenges face current implementation of this novel paradigm of patient treatment and care; however, several of these limitations have been successfully addressed through continuous development of cutting-edge in silico archiving and processing methods. Comprehensive elucidation of the genomic, transcriptomic, proteomic, and metabolomic networks that define normal and pathological states, in combination with reprogrammed patient cells, is thus poised to become a high-value resource in the modern diagnosis and prognosis of patient disease.
Affiliation(s)
- Randolph S. Faustino
- Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- D. Kent Arrell
- Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- Clifford D.L. Folmes
- Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- Andre Terzic
- Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- Carmen Perez-Terzic
- Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA; Physical Medicine and Rehabilitation, Mayo Clinic College of Medicine, Rochester, MN, USA
242
Richardson JE, Reid MC. The promises and pitfalls of leveraging mobile health technology for pain care. Pain Med 2013; 14:1621-6. [PMID: 23865541 DOI: 10.1111/pme.12206] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/13/2023]
Affiliation(s)
- Joshua E Richardson
- Center for Healthcare Informatics and Policy, Weill Cornell Medical College, New York, New York, USA
243
Li JW, Bolser D, Manske M, Giorgi FM, Vyahhi N, Usadel B, Clavijo BJ, Chan TF, Wong N, Zerbino D, Schneider MV. The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability. Brief Bioinform 2013; 14:548-55. [PMID: 23793381 PMCID: PMC3771235 DOI: 10.1093/bib/bbt045] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023] Open
Abstract
Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will become accessible to increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in the quantity of sequence data. The current resources for NGS informatics are extremely fragmented, and finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of these satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing from community spirit and with the use of the Wikipedia framework, we have initiated a collaborative NGS resource: The NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for bench biologists and non-bioinformaticians. The flexibility of online material allows readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.
Affiliation(s)
- Jing-Woei Li
- School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR. Tel.: +852-39431302;
244
Loraine AE, McCormick S, Estrada A, Patel K, Qin P. RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing. Plant Physiol 2013; 162:1092-109. [PMID: 23590974 PMCID: PMC3668042 DOI: 10.1104/pp.112.211441] [Citation(s) in RCA: 162] [Impact Index Per Article: 14.7] [Received: 11/23/2012] [Accepted: 04/14/2013] [Indexed: 05/18/2023]
Abstract
Pollen grains of Arabidopsis (Arabidopsis thaliana) contain two haploid sperm cells enclosed in a haploid vegetative cell. Upon germination, the vegetative cell extrudes a pollen tube that carries the sperm to an ovule for fertilization. Knowing the identity, relative abundance, and splicing patterns of pollen transcripts will improve our understanding of pollen and allow investigation of tissue-specific splicing in plants. Most Arabidopsis pollen transcriptome studies have used the ATH1 microarray, which does not assay splice variants and lacks specific probe sets for many genes. To investigate the pollen transcriptome, we performed high-throughput sequencing (RNA-Seq) of Arabidopsis pollen and, for comparison, seedlings. Gene expression was more diverse in seedlings, and genes involved in cell wall biogenesis were highly expressed in pollen. RNA-Seq detected at least 4,172 protein-coding genes expressed in pollen, including 289 assayed only by nonspecific probe sets. Additional exons and previously unannotated 5' and 3' untranslated regions for pollen-expressed genes were revealed. We detected regions in the genome not previously annotated as expressed; 14 were tested and 12 were confirmed by polymerase chain reaction. Gapped read alignments revealed 1,908 high-confidence new splicing events supported by 10 or more spliced read alignments. Alternative splicing patterns in pollen and seedlings were highly correlated. For most alternatively spliced genes, the ratio of variants in pollen and seedlings was similar, except for some genes encoding proteins involved in RNA splicing. This study highlights the robustness of splicing patterns in plants and the importance of ongoing annotation and visualization of RNA-Seq data using interactive tools such as Integrated Genome Browser.
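Counting spliced read alignments per junction, as in the 10-read support threshold above, can be sketched from SAM-style records: each `N` operation in a CIGAR string marks a skipped reference region (an intron). This is a minimal sketch assuming alignments arrive as (chromosome, 1-based POS, CIGAR) tuples; it is not the pipeline used in the study, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# CIGAR operations as defined in the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence
REF_CONSUMING = set("MDN=X")

Junction = Tuple[str, int, int]


def introns(pos: int, cigar: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) of each skipped region, 1-based and inclusive."""
    ref = pos  # SAM POS: 1-based leftmost aligned reference base
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            yield ref, ref + n - 1
        if op in REF_CONSUMING:
            ref += n


def supported_junctions(
    alignments: Iterable[Tuple[str, int, str]], min_reads: int = 10
) -> Dict[Junction, int]:
    """Count spliced alignments per junction and keep the well-supported ones."""
    counts: Counter = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in introns(pos, cigar):
            counts[(chrom, start, end)] += 1
    return {j: c for j, c in counts.items() if c >= min_reads}


# Hypothetical reads: one junction seen 12 times, another only 3 times
reads = [("Chr1", 100, "50M200N50M")] * 12 + [("Chr1", 100, "40M500N60M")] * 3
print(supported_junctions(reads))
```

Soft clips (`S`), insertions (`I`) and hard clips (`H`) are skipped when tracking the reference coordinate, which is what keeps the intron boundaries aligned to the genome.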
Affiliation(s)
- Ann E Loraine
- Department of Bioinformatics and Genomics, University of North Carolina, Kannapolis, North Carolina 28081, USA.
245
Abstract
The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example, on the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses. We present a novel efficient algorithm for updating abundance estimates from RNA-Seq experiments on re-annotation that does not require re-analysis of the entire dataset. Our approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. We demonstrate the effectiveness of our methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, we provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised. Availability and implementation: Our methods are implemented in software called ReXpress and are freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/. Supplementary information: Supplementary data are available at Bioinformatics online.
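The partitioning idea can be illustrated with a union-find structure: transcripts that share at least one compatible read are linked, and only the connected components containing an added or deleted isoform need their abundances re-estimated. This is a hedged sketch of that general idea, not ReXpress's code; it assumes a hypothetical `read_hits` input listing, for each read (or read equivalence class), the transcripts it is compatible with under the union of the old and new annotations.

```python
from typing import Dict, Iterable, List, Set


class DisjointSet:
    """Union-find over transcript IDs, with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def affected_transcripts(read_hits: Iterable[List[str]], changed: Set[str]) -> Set[str]:
    """Return every transcript in the same component as a changed isoform."""
    ds = DisjointSet()
    for hits in read_hits:
        for t in hits:
            ds.find(t)  # register every transcript, including single-hit ones
        for t in hits[1:]:
            ds.union(hits[0], t)  # a shared read links the transcripts
    roots = {ds.find(t) for t in changed}
    return {t for t in list(ds.parent) if ds.find(t) in roots}


# Hypothetical compatibility lists; isoform "C" is the one being re-annotated
print(sorted(affected_transcripts([["A", "B"], ["B", "C"], ["D"], ["E", "F"]], {"C"})))
```

Transcripts outside the affected components keep their previous estimates, which is where the saving over a full re-run comes from in this sketch.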
Affiliation(s)
- Adam Roberts
- Department of Computer Science, University of California Berkeley, Berkeley, CA 94720, USA
246
Eijssen LMT, Jaillard M, Adriaens ME, Gaj S, de Groot PJ, Müller M, Evelo CT. User-friendly solutions for microarray quality control and pre-processing on ArrayAnalysis.org. Nucleic Acids Res 2013; 41:W71-6. [PMID: 23620278 PMCID: PMC3692049 DOI: 10.1093/nar/gkt293] [Citation(s) in RCA: 102] [Impact Index Per Article: 9.3] [Indexed: 12/17/2022] Open
Abstract
Quality control (QC) is crucial for any scientific method producing data. Applying adequate QC introduces new challenges in the genomics field, where large amounts of data are produced with complex technologies. For DNA microarrays, specific algorithms for QC and pre-processing, including normalization, have been developed by the scientific community, especially for expression chips of the Affymetrix platform. Many of these have been implemented in the statistical scripting language R and are available from the Bioconductor repository. However, their application is hampered by a lack of integrative tools that can be used by users of any experience level. To fill this gap, we developed a freely available tool for QC and pre-processing of Affymetrix gene expression results, extending, integrating and harmonizing the functionality of Bioconductor packages. The tool can be easily accessed through a wizard-like web portal at http://www.arrayanalysis.org or downloaded for local use in R. The portal provides extensive documentation, including user guides, interpretation help with real output illustrations and detailed technical documentation. It assists newcomers to the field in performing state-of-the-art QC and pre-processing while offering data analysts an integral open-source package. Providing the scientific community with this easily accessible tool will help improve data quality and promote data reuse and the adoption of standards.
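Quantile normalization is one standard pre-processing step for Affymetrix data (it is part of RMA, available in Bioconductor's affy package). The abstract does not say which normalization the portal applies by default, so the following is only a hedged, pure-Python sketch of the technique itself; the function name is hypothetical and ties are broken by probe order for simplicity.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same intensity distribution.

    Each inner list holds one array's probe intensities, with probes in the
    same order across arrays. Ties are broken by probe order (a simplification).
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same probes")
    # For each array, probe indices sorted from lowest to highest intensity
    orders = [sorted(range(n_probes), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n_probes)
    ]
    # Replace each value by the reference value at the same rank
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays with three probes each
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization every array contains exactly the reference values, only permuted according to each array's own probe ranking.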
Affiliation(s)
- Lars M T Eijssen
- Department of Bioinformatics-BiGCaT, Maastricht University, PO Box 616, 6200 MD Maastricht, The Netherlands.
247
Lahti L, Torrente A, Elo LL, Brazma A, Rung J. A fully scalable online pre-processing algorithm for short oligonucleotide microarray atlases. Nucleic Acids Res 2013; 41:e110. [PMID: 23563154 PMCID: PMC3664815 DOI: 10.1093/nar/gkt229] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/26/2023] Open
Abstract
Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available only for a few platforms, based on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales linearly with respect to sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters based on sequential hyperparameter updates at small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.
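Sequential updating of probe-level parameters in small batches can be illustrated with a conjugate inverse-gamma update of one probe's noise variance: after each batch only two numbers per probe are kept, so memory does not grow with the number of arrays, and processing batches one at a time gives the same posterior as a single pass over all data. This is a generic Bayesian sketch, assuming probe residuals (probe signal minus probeset signal) are already available and zero-mean Gaussian; it is not the authors' exact model, and the class name and prior values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProbeVariancePosterior:
    """Inverse-gamma posterior for one probe's noise variance."""

    alpha: float = 1e-3  # hypothetical weak prior shape
    beta: float = 1e-3   # hypothetical weak prior scale

    def update(self, residuals: Sequence[float]) -> None:
        """Absorb one batch of zero-mean Gaussian residuals (conjugate update)."""
        self.alpha += len(residuals) / 2.0
        self.beta += sum(r * r for r in residuals) / 2.0

    def variance_mean(self) -> float:
        """Posterior mean of the variance; finite only when alpha > 1."""
        if self.alpha <= 1.0:
            raise ValueError("posterior mean undefined for alpha <= 1")
        return self.beta / (self.alpha - 1.0)


# Two small batches of hypothetical residuals for one probe
probe = ProbeVariancePosterior()
for batch in ([1.0, -1.0], [2.0, 0.0]):
    probe.update(batch)
print(probe.alpha, probe.beta, probe.variance_mean())
```

The same two-number state could be stored for every probe on a platform, which is the property that lets this kind of update scale linearly with the number of arrays processed.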
Affiliation(s)
- Leo Lahti
- Department of Veterinary Bioscience, University of Helsinki, Agnes Sjöbergin katu 2, PO Box 66, FI-00014 University of Helsinki, Finland.
248
Abstract
Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
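Percentage citation benefits such as "9% (95% confidence interval: 5% to 13%)" are commonly read off a regression fitted to log-transformed citation counts, where a coefficient β corresponds to a (e^β − 1) × 100% change. Whether this exact transformation was used is an assumption here, since the abstract only says "multivariate regression"; the helper names are hypothetical.

```python
import math


def coefficient_to_percent(beta: float) -> float:
    """Percent change in citations implied by a log-scale coefficient."""
    return math.expm1(beta) * 100.0


def percent_to_coefficient(percent: float) -> float:
    """Log-scale coefficient implied by a percent change in citations."""
    return math.log1p(percent / 100.0)


# Coefficients implied by the reported estimate and interval bounds
for pct in (5.0, 9.0, 13.0):
    print(f"{pct:.0f}% -> beta = {percent_to_coefficient(pct):.4f}")
```

Using `expm1`/`log1p` keeps the conversion accurate for the small effects typical of such models.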
Affiliation(s)
- Heather A Piwowar
- National Evolutionary Synthesis Center, Durham, NC, USA; Department of Biology, Duke University, Durham, NC, USA
249
He H, Conrad CA, Nilsson CL, Ji Y, Schaub TM, Marshall AG, Emmett MR. Method for lipidomic analysis: p53 expression modulation of sulfatide, ganglioside, and phospholipid composition of U87 MG glioblastoma cells. Anal Chem 2007; 79:8423-30. [PMID: 17929901 DOI: 10.1021/ac071413m] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 02/02/2023]
Abstract
Lipidomics can complement genomics and proteomics by providing new insight into dynamic changes in biomembranes; however, few reports in the literature have explored, on an organism-wide scale, the functional link between nonenzymatic proteins and cellular lipids. Here, we report changes induced by adenovirus-delivered wild-type p53 gene and chemotherapy of U87 MG glioblastoma cells, a treatment known to trigger apoptosis and cell cycle arrest. We compare polar lipid changes in treated cells and control cells by use of a novel, sensitive method that employs lipid extraction, one-step liquid chromatography separation, high-resolution mass analysis, and Kendrick mass defect analysis. Nano-LC FT-ICR MS and quadrupole linear ion trap MS/MS analysis of polar lipids yields hundreds of unique assignments of glyco- and phospholipids at sub-ppm mass accuracy and high resolving power (m/Δm50% = 200,000 at m/z 400) at 1 s/scan. MS/MS data confirm molecular structures in many instances. Sulfatides are most highly modulated by wild-type p53 treatment. The treatment also leads to an increase in phospholipids such as phosphatidyl inositols, phosphatidyl serines, phosphatidyl glycerols, and phosphatidyl ethanolamines. An increase in hydroxylated phospholipids is especially noteworthy. Also, a decrease in the longer chain gangliosides, GD1 and GM1b, is observed in wild-type p53 (treated) cells.
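Kendrick mass defect analysis rescales measured masses so that one CH2 unit weighs exactly 14, which makes members of a homologous series (for example, lipids differing only in acyl-chain length) share the same defect. A minimal sketch using the monoisotopic CH2 mass of 14.01565 Da; conventions for the nominal Kendrick mass differ (rounding vs. ceiling), and rounding is an assumption here, as are the function names and the example m/z value.

```python
CH2_EXACT = 14.01565  # monoisotopic mass of CH2 (12.000000 + 2 x 1.007825)
CH2_NOMINAL = 14.0


def kendrick_mass(mass: float) -> float:
    """Rescale an IUPAC mass to the CH2-based Kendrick scale."""
    return mass * CH2_NOMINAL / CH2_EXACT


def kendrick_mass_defect(mass: float) -> float:
    """Nominal Kendrick mass (rounded) minus Kendrick mass."""
    km = kendrick_mass(mass)
    return round(km) - km


# Two hypothetical species differing by exactly one CH2 unit
base = 760.5851
print(kendrick_mass_defect(base), kendrick_mass_defect(base + CH2_EXACT))
```

Because adding CH2 shifts the Kendrick mass by exactly 14, both values agree up to floating-point error, so sorting peaks by defect groups homologues together.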
Affiliation(s)
- Huan He
- National High Magnetic Field Laboratory, Florida State University, 1800 East Paul Dirac Drive, Tallahassee, Florida 32310-4005, USA