1
|
COVID-19 susceptibility and severity risks in a cross-sectional survey of over 500 000 US adults. BMJ Open 2022; 12:e049657. [PMID: 36223959 PMCID: PMC9561492 DOI: 10.1136/bmjopen-2021-049657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
Abstract
OBJECTIVES: The enormous toll of the COVID-19 pandemic has heightened the urgency of collecting and analysing population-scale datasets in real time to monitor and better understand the evolving pandemic. The objectives of this study were to examine the relationship of risk factors to COVID-19 susceptibility and severity and to develop risk models to accurately predict COVID-19 outcomes using rapidly obtained self-reported data. DESIGN: A cross-sectional study. SETTING: AncestryDNA customers in the USA who consented to research. PARTICIPANTS: The AncestryDNA COVID-19 Study collected self-reported survey data on symptoms, outcomes, risk factors and exposures for over 563 000 adult individuals in the USA in just under 4 months, including over 4700 COVID-19 cases as measured by a self-reported positive test. RESULTS: We replicated previously reported associations between several risk factors and COVID-19 susceptibility and severity outcomes, and additionally found that differences in known exposures accounted for many of the susceptibility associations. A notable exception was elevated susceptibility for men even after adjusting for known exposures and age (adjusted OR=1.36, 95% CI=1.19 to 1.55). We also demonstrated that self-reported data can be used to build accurate risk models to predict individualised COVID-19 susceptibility (area under the curve (AUC)=0.84) and severity outcomes including hospitalisation and critical illness (AUC=0.87 and 0.90, respectively). The risk models achieved robust discriminative performance across different age, sex and genetic ancestry groups within the study. CONCLUSIONS: The results highlight the value of self-reported epidemiological data to rapidly provide public health insights into the evolving COVID-19 pandemic.
|
2
|
|
3
|
Expanded COVID-19 phenotype definitions reveal distinct patterns of genetic association and protective effects. Nat Genet 2022; 54:374-381. [DOI: 10.1038/s41588-022-01042-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/22/2021] [Accepted: 03/02/2022] [Indexed: 12/21/2022]
|
4
|
The history and geographic distribution of a KCNQ1 atrial fibrillation risk allele. Nat Commun 2021; 12:6442. [PMID: 34750360 PMCID: PMC8575962 DOI: 10.1038/s41467-021-26741-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/13/2020] [Accepted: 10/20/2021] [Indexed: 11/08/2022] Open
Abstract
The genetic architecture of atrial fibrillation (AF) encompasses low impact, common genetic variants and high impact, rare variants. Here, we characterize a high impact AF-susceptibility allele, KCNQ1 R231H, and describe its transcontinental geographic distribution and history. Induced pluripotent stem cell-derived cardiomyocytes procured from risk allele carriers exhibit abbreviated action potential duration, consistent with a gain-of-function effect. Using identity-by-descent (IBD) networks, we estimate the broad- and fine-scale population ancestry of risk allele carriers and their relatives. Analysis of ancestral migration routes reveals ancestors who inhabited Denmark in the 1700s, migrated to the Northeastern United States in the early 1800s, and traveled across the Midwest to arrive in Utah in the late 1800s. IBD/coalescent-based allele dating analysis reveals a relatively recent origin of the AF risk allele (~5000 years). Thus, our approach broadens the scope of study for disease susceptibility alleles to the context of human migration and ancestral origins.
|
5
|
Ancestry inference using reference labeled clusters of haplotypes. BMC Bioinformatics 2021; 22:459. [PMID: 34563119 PMCID: PMC8466715 DOI: 10.1186/s12859-021-04350-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/08/2021] [Accepted: 08/31/2021] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual's ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. RESULTS: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. CONCLUSIONS: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.
|
6
|
Genome-wide analysis in 756,646 individuals provides first genetic evidence that ACE2 expression influences COVID-19 risk and yields genetic risk scores predictive of severe disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021. [PMID: 33619501 PMCID: PMC7899471 DOI: 10.1101/2020.12.14.20248176] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 01/08/2023]
Abstract
SARS-CoV-2 enters host cells by binding angiotensin-converting enzyme 2 (ACE2). Through a genome-wide association study, we show that a rare variant (MAF = 0.3%, odds ratio 0.60, P = 4.5×10⁻¹³) that down-regulates ACE2 expression reduces risk of COVID-19 disease, providing human genetics support for the hypothesis that ACE2 levels influence COVID-19 risk. Further, we show that common genetic variants define a risk score that predicts severe disease among COVID-19 cases.
|
7
|
Clustering of 770,000 genomes reveals post-colonial population structure of North America. Nat Commun 2017; 8:14238. [PMID: 28169989 PMCID: PMC5309710 DOI: 10.1038/ncomms14238] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 02/24/2016] [Accepted: 12/12/2016] [Indexed: 02/06/2023] Open
Abstract
Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.
|
8
|
PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic Acids Res 2013; 42:D677-84. [PMID: 24285306 PMCID: PMC3965092 DOI: 10.1093/nar/gkt1203] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/17/2023] Open
Abstract
PortEco (http://porteco.org) aims to collect, curate and provide data and analysis tools to support basic biological research in Escherichia coli (and eventually other bacterial systems). PortEco is implemented as a ‘virtual’ model organism database that provides a single unified interface to the user, while integrating information from a variety of sources. The main focus of PortEco is to enable broad use of the growing number of high-throughput experiments available for E. coli, and to leverage community annotation through the EcoliWiki and GONUTS systems. Currently, PortEco includes curated data from hundreds of genome-wide RNA expression studies, from high-throughput phenotyping of single-gene knockouts under hundreds of annotated conditions, from chromatin immunoprecipitation experiments for tens of different DNA-binding factors and from ribosome profiling experiments that yield insights into protein expression. Conditions have been annotated with a consistent vocabulary, and data have been consistently normalized to enable users to find, compare and interpret relevant experiments. PortEco includes tools for data analysis, including clustering, enrichment analysis and exploration via genome browsers. PortEco search and data analysis tools are extensively linked to the curated gene, metabolic pathway and regulation content at its sister site, EcoCyc.
|
9
|
Abstract
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
|
10
|
Abstract
Summary: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis. Availability and Implementation: Annotare is available from http://code.google.com/p/annotare/ under the terms of the open-source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows. Contact: rshankar@stanford.edu
|
11
|
TB database 2010: overview and update. Tuberculosis (Edinb) 2010; 90:225-35. [PMID: 20488753 DOI: 10.1016/j.tube.2010.03.010] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.9] [Received: 03/15/2010] [Accepted: 03/31/2010] [Indexed: 11/28/2022]
Abstract
The Tuberculosis Database (TBDB) is an online database providing integrated access to genome sequence, expression data and literature curation for TB. TBDB currently houses genome assemblies for numerous strains of Mycobacterium tuberculosis (MTB) as well as assemblies for over 20 strains related to MTB and useful for comparative analysis. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives, including over 3000 MTB microarrays, 95 RT-PCR datasets, 2700 microarrays for human and mouse TB-related experiments, and 260 arrays for Streptomyces coelicolor. To enable wide use of these data, TBDB provides a suite of tools for searching, browsing, analyzing, and downloading the data. We provide here an overview of TBDB focusing on recent data releases and enhancements. In particular, we describe the recent release of a Global Genetic Diversity dataset for TB, support for short-read re-sequencing data, new tools for exploring gene expression data in the context of gene regulation, and the integration of a metabolic network reconstruction and BioCyc with TBDB. By integrating a wide range of genomic data with tools for their use, TBDB is a unique platform for both basic science research in TB and research into the discovery and development of TB drugs, vaccines and biomarkers.
|
12
|
Abstract
Hundreds of researchers across the world use the Stanford Microarray Database (SMD; http://smd.stanford.edu/) to store, annotate, view, analyze and share microarray data. In addition to providing registered users at Stanford access to their own data, SMD also provides access to public data, and tools with which to analyze those data, to any public user anywhere in the world. Previously, the addition of new microarray data analysis tools to SMD has been limited by available engineering resources, and in addition, the existing suite of tools did not provide a simple way to design, execute and share analysis pipelines, or to document such pipelines for the purposes of publication. To address this, we have incorporated the GenePattern software package directly into SMD, providing access to many new analysis tools, as well as a plug-in architecture that allows users to directly integrate and share additional tools through SMD. In this article, we describe our implementation of the GenePattern microarray analysis software package into the SMD code base. This extension is available with the SMD source code that is fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD with an enriched data analysis capability.
|
13
|
Abstract
The effective control of tuberculosis (TB) has been thwarted by the need for prolonged, complex and potentially toxic drug regimens, by reliance on an inefficient vaccine and by the absence of biomarkers of clinical status. The promise of the genomics era for TB control is substantial, but has been hindered by the lack of a central repository that collects and integrates genomic and experimental data about this organism in a way that can be readily accessed and analyzed. The Tuberculosis Database (TBDB) is an integrated database providing access to TB genomic data and resources, relevant to the discovery and development of TB drugs, vaccines and biomarkers. The current release of TBDB houses genome sequence data and annotations for 28 different Mycobacterium tuberculosis strains and related bacteria. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives. TBDB currently hosts data for nearly 1500 public tuberculosis microarrays and 260 arrays for Streptomyces. In addition, TBDB provides access to a suite of comparative genomics and microarray analysis software. By bringing together M. tuberculosis genome annotation and gene-expression data with a suite of analysis tools, TBDB (http://www.tbdb.org/) provides a unique discovery platform for TB research.
|
14
|
Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol 2008; 26:889-96. [PMID: 18688244 PMCID: PMC2771753 DOI: 10.1038/nbt.1411] [Citation(s) in RCA: 356] [Impact Index Per Article: 22.3] [Indexed: 11/08/2022]
Abstract
The Minimum Information for Biological and Biomedical Investigations (MIBBI) project provides a resource for those exploring the range of extant minimum information checklists and fosters coordinated development of such checklists.
|
15
|
Domain-specific data sharing in neuroscience: what do we have to learn from each other? Neuroinformatics 2008; 6:117-21. [PMID: 18473189 DOI: 10.1007/s12021-008-9019-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Accepted: 04/11/2008] [Indexed: 11/30/2022]
Abstract
Molecular biology and genomics have made notable strides in the sharing of primary data and resources. In other domains of neuroscience research, however, there has been resistance to adopting formalized strategies for data exchange, archiving, and availability. In this article, we discuss how neuroscience domains might follow the lead of molecular biology on what has been successful and what has failed in active data sharing. We consider not only the technical challenges but also the sociological concerns in making it possible. Though this is not a pain-free process, increased data availability gives scientists from multiple fields greater opportunity for novel discoveries about the brain in health and disease.
|
16
|
Repeatability of published microarray gene expression analyses. Nat Genet 2008; 41:149-55. [PMID: 19174838 DOI: 10.1038/ng.295] [Citation(s) in RCA: 355] [Impact Index Per Article: 22.2] [Received: 09/15/2008] [Accepted: 11/04/2008] [Indexed: 12/14/2022]
Abstract
Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by independent scientists. Here we evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005-2006. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. More strict publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.
|
17
|
Abstract
Background: MAGE-ML has been promoted as a standard format for describing microarray experiments and the data they produce. Two characteristics of the MAGE-ML format compromise its use as a universal standard: First, MAGE-ML files are exceptionally large – too large to be easily read by most people, and often too large to be read by most software programs. Second, the MAGE-ML standard permits many ways of representing the same information. As a result, different producers of MAGE-ML create different documents describing the same experiment and its data. Recognizing all the variants is an unwieldy software engineering task, resulting in software packages that can read and process MAGE-ML from some, but not all producers. This Tower of MAGE-ML Babel bars the unencumbered exchange of microarray experiment descriptions couched in MAGE-ML. Results: We have developed XBabelPhish – an XQuery-based technology for translating one MAGE-ML variant into another. XBabelPhish's use is not restricted to translating MAGE-ML documents. It can transform XML files independent of their DTD, XML schema, or semantic content. Moreover, it is designed to work on very large (> 200 Mb.) files, which are common in the world of MAGE-ML. Conclusion: XBabelPhish provides a way to inter-translate MAGE-ML variants for improved interchange of microarray experiment information. More generally, it can be used to transform most XML files, including very large ones that exceed the capacity of most XML tools.
|
18
|
Abstract
The Microarray Gene Expression Data (MGED) society is an international organization established in 1999 for facilitating sharing of functional genomics and proteomics array data. To facilitate microarray data sharing, the MGED society has been working to establish the relevant data standards. The three main components (which will be described in more detail later) of MGED standards are Minimum Information About a Microarray Experiment (MIAME), a document that outlines the minimum information that should be reported about a microarray experiment to enable its unambiguous interpretation and reproduction; MAGE, which consists of three parts: the Microarray Gene Expression Object Model (MAGE-OM), an XML-based document exchange format (MAGE-ML), which is derived directly from the object model, and the supporting tool kit MAGEstk; and MO, or MGED Ontology, which defines sets of common terms and annotation rules for microarray experiments, enabling unambiguous annotation and efficient queries, data analysis and data exchange without loss of meaning. We discuss here how these standards have been established, how they have evolved, and how they are used.
|
19
|
Abstract
The Stanford Tissue Microarray Database (TMAD; http://tma.stanford.edu) is a public resource for disseminating annotated tissue images and associated expression data. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. TMAD archives multi-wavelength fluorescence and bright-field images of tissue microarrays for scoring and analysis. As of July 2007, TMAD contained 205 161 images archiving 349 distinct probes on 1488 tissue microarray slides. Of these, 31 306 images for 68 probes on 125 slides have been released to the public. To date, 12 publications have been based on these raw public data. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. Source code is available to interested researchers under a no-cost license.
|
20
|
Development of the Minimum Information Specification for In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE). OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2007; 10:205-8. [PMID: 16901227 DOI: 10.1089/omi.2006.10.205] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
We describe the creation process of the Minimum Information Specification for In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE). Modeled after the existing minimum information specification for microarray data, we created a new specification for gene expression localization experiments, initially to facilitate data sharing within a consortium. After successful use within the consortium, the specification was circulated to members of the wider biomedical research community for comment and refinement. After a period of acquiring many new suggested requirements, it was necessary to enter a final phase of excluding those requirements that were deemed inappropriate as a minimum requirement for all experiments. The full specification will soon be published as a version 1.0 proposal to the community, upon which a fuller discussion must take place so that the final specification may be achieved with the involvement of the whole community.
|
21
|
The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat Biotechnol 2007; 25:1127-33. [DOI: 10.1038/nbt1347] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
|
22
|
OntologyWidget - a reusable, embeddable widget for easily locating ontology terms. BMC Bioinformatics 2007; 8:338. [PMID: 17854506 PMCID: PMC2080642 DOI: 10.1186/1471-2105-8-338] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/30/2007] [Accepted: 09/13/2007] [Indexed: 11/17/2022] Open
Abstract
Background: Biomedical ontologies are being widely used to annotate biological data in a computer-accessible, consistent and well-defined manner. However, due to their size and complexity, annotating data with appropriate terms from an ontology is often challenging for experts and non-experts alike, because there exist few tools that allow one to quickly find relevant ontology terms to easily populate a web form. Results: We have produced a tool, OntologyWidget, which allows users to rapidly search for and browse ontology terms. OntologyWidget can easily be embedded in other web-based applications. OntologyWidget is written using AJAX (Asynchronous JavaScript and XML) and has two related elements. The first is a dynamic auto-complete ontology search feature. As a user enters characters into the search box, the appropriate ontology is queried remotely for terms that match the typed-in text, and the query results populate a drop-down list with all potential matches. Upon selection of a term from the list, the user can locate this term within a generic and dynamic ontology browser, which comprises the second element of the tool. The ontology browser shows the paths from a selected term to the root as well as parent/child tree hierarchies. We have implemented web services at the Stanford Microarray Database (SMD), which provide the OntologyWidget with access to over 40 ontologies from the Open Biological Ontology (OBO) website [1]. Each ontology is updated weekly. Adopters of the OntologyWidget can either use SMD's web services, or elect to rely on their own. Deploying the OntologyWidget can be accomplished in three simple steps: (1) install Apache Tomcat [2] on one's web server, (2) download and install the OntologyWidget servlet stub that provides access to the SMD ontology web services, and (3) create an html (HyperText Markup Language) file that refers to the OntologyWidget using a simple, well-defined format.
Conclusion: We have developed OntologyWidget, an easy-to-use ontology search and display tool that can be used on any web page by creating a simple html description. OntologyWidget provides a rapid auto-complete search function paired with an interactive tree display. We have developed a web service layer that communicates between the web page interface and a database of ontology terms. We currently store 40 of the ontologies from the OBO website [1], as well as several others. These ontologies are automatically updated on a weekly basis. OntologyWidget can be used in any web-based application to take advantage of the ontologies we provide via web services or any other ontology that is provided elsewhere in the correct format. The full source code for the JavaScript and description of the OntologyWidget is available from .
|
23
|
Abstract
To avoid duplication of effort, slow adoption and inefficiency in development, those developing biological standards need to communicate more with each other, attract help from experts in the ontology/standards communities and keep focused on needs of users.
|
24
|
The Stanford Microarray Database: implementation of new analysis tools and open source release of software. Nucleic Acids Res 2006; 35:D766-70. [PMID: 17182626 PMCID: PMC1781111 DOI: 10.1093/nar/gkl1019] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.2] [Indexed: 11/15/2022] Open
Abstract
The Stanford Microarray Database (SMD; ) is a research tool and archive that allows hundreds of researchers worldwide to store, annotate, analyze and share data generated by microarray technology. SMD supports most major microarray platforms, and is MIAME-supportive and can export or import MAGE-ML. The primary mission of SMD is to be a research tool that supports researchers from the point of data generation to data publication and dissemination, but it also provides unrestricted access to analysis tools and public data from 300 publications. In addition to supporting ongoing research, SMD makes its source code fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD. In this article, we describe several data analysis tools implemented in SMD and we discuss features of our software release.
|
25
|
Abstract
The S. cerevisiae genome is the most well-characterized eukaryotic genome and one of the simplest in terms of identifying open reading frames (ORFs), yet its primary annotation has been updated continually in the decade since its initial release in 1996 (Goffeau et al., 1996). The Saccharomyces Genome Database (SGD; www.yeastgenome.org) (Hirschman et al., 2006), the community-designated repository for this reference genome, strives to ensure that the S. cerevisiae annotation is as accurate and useful as possible. At SGD, the S. cerevisiae genome sequence and annotation are treated as a working hypothesis, which must be repeatedly tested and refined. In this paper, in celebration of the tenth anniversary of the completion of the S. cerevisiae genome sequence, we discuss the ways in which the S. cerevisiae sequence and annotation have changed, consider the multiple sources of experimental and comparative data on which these changes are based, and describe our methods for evaluating, incorporating and documenting these new data.
|
26
|
A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics 2006; 7:489. [PMID: 17087822 PMCID: PMC1687205 DOI: 10.1186/1471-2105-7-489] [Citation(s) in RCA: 160] [Impact Index Per Article: 8.9] [Received: 08/21/2006] [Accepted: 11/06/2006] [Indexed: 11/18/2022] Open
Abstract
Background: Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support. Results: We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME-compliant fashion. Conclusion: MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML.
|
27
|
Abstract
The Stanford Microarray Database (SMD) is a DNA microarray research database that provides a large amount of data for public use. This chapter describes the use of the primary tools for searching, browsing, retrieving, and analyzing data available for SMD. With this introduction, researchers and students will be able to examine and analyze a large body of gene expression and other experiments. Additional tools for depositing, annotating, sharing, and analyzing data, available only to registered users, are also described. SMD is available for installation as a local database.
|
28
|
The Stanford Microarray Database accommodates additional microarray platforms and data formats. Nucleic Acids Res 2005; 33:D580-2. [PMID: 15608265 PMCID: PMC539960 DOI: 10.1093/nar/gki006] [Citation(s) in RCA: 150] [Impact Index Per Article: 7.9] [Indexed: 12/02/2022] Open
Abstract
The Stanford Microarray Database (SMD) (http://smd.stanford.edu) is a research tool for hundreds of Stanford researchers and their collaborators. In addition, SMD functions as a resource for the entire biological research community by providing unrestricted access to microarray data published by SMD users and by disseminating its source code. In addition to storing GenePix (Axon Instruments) and ScanAlyze output from spotted microarrays, SMD has recently added the ability to store, retrieve, display and analyze the complete raw data produced by several additional microarray platforms and image analysis software packages, so that we can also now accept data from Affymetrix GeneChips (MAS5/GCOS or dChip), Agilent Catalog or Custom arrays (using Agilent's Feature Extraction software) or data created by SpotReader (Niles Scientific). We have implemented software that allows us to accept MAGE-ML documents from array manufacturers and to submit MIAME-compliant data in MAGE-ML format directly to ArrayExpress and GEO, greatly increasing the ease with which data from SMD can be published adhering to accepted standards and also increasing the accessibility of published microarray data to the general public. We have introduced a new tool to facilitate data sharing among our users, so that datasets can be shared during, before or after the completion of data analysis. The latest version of the source code for the complete database package was released in November 2004 (http://smd.stanford.edu/download/), allowing researchers around the world to deploy their own installations of SMD.
|
29
|
|
30
|
Abstract
Microarray technology has been widely adopted by researchers who use both home-made microarrays and microarrays purchased from commercial vendors. Associated with the adoption of this technology has been a deluge of complex data, both from the microarrays themselves, and also in the form of associated metadata, such as gene annotation information, the properties and treatment of biological samples, and the data transformation and analysis steps taken downstream. In addition, standards for annotation and data exchange have been proposed, and are now being adopted by journals and funding agencies alike. The coupling of large quantities of complex data with extensive and complex standards requires all but the most small-scale of microarray users to have access to a robust and scalable database with various tools. In this review, we discuss some of the desirable properties of such a database, and look at the features of several freely available alternatives.
|
31
|
The novel marker, DOG1, is expressed ubiquitously in gastrointestinal stromal tumors irrespective of KIT or PDGFRA mutation status. The American Journal of Pathology 2004; 165:107-13. [PMID: 15215166 PMCID: PMC1618538 DOI: 10.1016/s0002-9440(10)63279-8] [Citation(s) in RCA: 440] [Impact Index Per Article: 22.0] [Indexed: 02/07/2023]
Abstract
We recently characterized gene expression patterns in gastrointestinal stromal tumors (GISTs) using cDNA microarrays, and found that the gene FLJ10261 (DOG1, discovered on GIST-1), encoding a hypothetical protein, was specifically expressed in GISTs. The immunoreactivity of a rabbit antiserum to synthetic DOG1 peptides was assessed on two soft tissue tumor microarrays. The tissue microarrays included 587 soft tissue tumors, with 149 GISTs, including 127 GIST cases for which the KIT and PDGFRA mutation status was known. Immunoreactivity for DOG1 was found in 136 of 139 (97.8%) of scorable GISTs. All seven GIST cases with a PDGFRA mutation were DOG1-positive, while most of these failed to react for KIT. The immunohistochemical findings were confirmed with in situ hybridization probes for DOG1, KIT, and PDGFRA. Other neoplasms in the differential diagnosis of GIST, including desmoid fibromatosis (0 of 17) and Schwannoma (0 of 3), were immunonegative for DOG1. Only 4 of 438 non-GIST cases were immunoreactive for DOG1. DOG1, a protein of unknown function, is expressed strongly on the cell surface of GISTs and is rarely expressed in other soft tissue tumors. Reactivity for DOG1 may aid in the diagnosis of GISTs, including PDGFRA mutants that fail to express KIT antigen, and lead to appropriate treatment with imatinib mesylate, an inhibitor of the KIT tyrosine kinase.
|
32
|
The novel marker, DOG1, is expressed ubiquitously in gastrointestinal stromal tumors irrespective of KIT or PDGFRA mutation status. THE AMERICAN JOURNAL OF PATHOLOGY 2004; 321:141-9. [PMID: 15215166 DOI: 10.1016/j.ydbio.2008.06.009] [Citation(s) in RCA: 181] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Received: 04/11/2008] [Revised: 06/03/2008] [Accepted: 06/04/2008] [Indexed: 11/19/2022]
|
33
|
Abstract
The Microarray Gene Expression Data Society believe that the time is right for journals to require that microarray data be deposited in public repositories, as a condition for publication.
|
34
|
Minimum information about a functional genomics experiment: the state of microarray standards and their extension to other technologies. ACTA ACUST UNITED AC 2004. [DOI: 10.1016/s1741-8372(04)02435-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]
|
35
|
Microarray databases: storage and retrieval of microarray data. Methods Mol Biol 2004; 224:235-48. [PMID: 12710676 DOI: 10.1385/1-59259-364-x:235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
36
|
The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 2003; 31:94-6. [PMID: 12519956 PMCID: PMC165525 DOI: 10.1093/nar/gkg078] [Citation(s) in RCA: 260] [Impact Index Per Article: 12.4] [Received: 08/30/2002] [Revised: 10/11/2002] [Accepted: 10/11/2002] [Indexed: 11/12/2022] Open
Abstract
The Stanford Microarray Database (SMD; http://genome-www.stanford.edu/microarray/) serves as a microarray research database for Stanford investigators and their collaborators. In addition, SMD functions as a resource for the entire scientific community, by making freely available all of its source code and providing full public access to data published by SMD users, along with many tools to explore and analyze those data. SMD currently provides public access to data from 3500 microarrays, including data from 85 publications, and this total is increasing rapidly. In this article, we describe some of SMD's newer tools for accessing public data, assessing data quality and for data analysis.
|
37
|
Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 2002. [PMID: 12058064 DOI: 10.1091/mbc.02-02-0030] [Citation(s) in RCA: 130] [Impact Index Per Article: 5.9] [Indexed: 11/11/2022] Open
Abstract
The genome-wide program of gene expression during the cell division cycle in a human cancer cell line (HeLa) was characterized using cDNA microarrays. Transcripts of >850 genes showed periodic variation during the cell cycle. Hierarchical clustering of the expression patterns revealed coexpressed groups of previously well-characterized genes involved in essential cell cycle processes such as DNA replication, chromosome segregation, and cell adhesion along with genes of uncharacterized function. Most of the genes whose expression had previously been reported to correlate with the proliferative state of tumors were found herein also to be periodically expressed during the HeLa cell cycle. However, some of the genes periodically expressed in the HeLa cell cycle do not have a consistent correlation with tumor proliferation. Cell cycle-regulated transcripts of genes involved in fundamental processes such as DNA replication and chromosome segregation seem to be more highly expressed in proliferative tumors simply because they contain more cycling cells. The data in this report provide a comprehensive catalog of cell cycle regulated genes that can serve as a starting point for functional discovery. The full dataset is available at http://genome-www.stanford.edu/Human-CellCycle/HeLa/.
|
38
|
Abstract
A single microarray can provide information on the expression of tens of thousands of genes. The amount of information generated by a microarray-based experiment is sufficiently large that no single study can be expected to mine each nugget of scientific information. As a consequence, the scale and complexity of microarray experiments require that computer software programs do much of the data processing, storage, visualization, analysis and transfer. The adoption of common standards and ontologies for the management and sharing of microarray data is essential and will provide immediate benefit to the research community.
|
39
|
|
40
|
|
41
|
|
42
|
Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 2002; 13:1977-2000. [PMID: 12058064 PMCID: PMC117619 DOI: 10.1091/mbc.02-02-0030] [Citation(s) in RCA: 1076] [Impact Index Per Article: 48.9] [Indexed: 12/14/2022] Open
|
43
|
Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res 2002; 30:69-72. [PMID: 11752257 PMCID: PMC99086 DOI: 10.1093/nar/30.1.69] [Citation(s) in RCA: 272] [Impact Index Per Article: 12.4] [Indexed: 11/14/2022] Open
Abstract
The Saccharomyces Genome Database (SGD) resources, ranging from genetic and physical maps to genome-wide analysis tools, reflect the scientific progress in identifying genes and their functions over the last decade. As emphasis shifts from identification of the genes to identification of the role of their gene products in the cell, SGD seeks to provide its users with annotations that will allow relationships to be made between gene products, both within Saccharomyces cerevisiae and across species. To this end, SGD is annotating genes to the Gene Ontology (GO), a structured representation of biological knowledge that can be shared across species. The GO consists of three separate ontologies describing molecular function, biological process and cellular component. The goal is to use published information to associate each characterized S. cerevisiae gene product with one or more GO terms from each of the three ontologies. To be useful, this must be done in a manner that allows accurate associations based on experimental evidence, modifications to GO when necessary, and careful documentation of the annotations through evidence codes for given citations. Reaching this goal is an ongoing process at SGD. For information on the current progress of GO annotations at SGD and other participating databases, as well as a description of each of the three ontologies, please visit the GO Consortium page at http://www.geneontology.org. SGD gene associations to GO can be found by visiting our site at http://genome-www.stanford.edu/Saccharomyces/.
|
44
|
Abstract
Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.
|
45
|
Abstract
In 2000, the number of completely sequenced eukaryotic genomes increased to four. The addition of Drosophila and Arabidopsis to this cohort permits additional insights into the processes that have shaped evolution. Analysis and comparisons of both completed genomes and partially sequenced genomes have already shed light on mechanisms such as gene duplication and gene loss that have long been hypothesized to be major forces in speciation. Indeed, the proportions of duplicate gene pairs in Saccharomyces, Arabidopsis, Caenorhabditis and Drosophila are high: 30%, 60%, 48% and 40%, respectively. Evidence of horizontal gene transfer, thought to be a major evolutionary force in bacteria, has been found in Arabidopsis. The release of the 'first draft' of the human genome sequence in 2000 heralds a new stage of biological study. Understanding the as-yet-unannotated human genome will be largely based on conclusions, techniques and tools developed during the analysis and comparison of the genomes of these four model organisms.
|
46
|
Saccharomyces Genome Database provides tools to survey gene expression and functional analysis data. Nucleic Acids Res 2001; 29:80-1. [PMID: 11125055 PMCID: PMC29796 DOI: 10.1093/nar/29.1.80] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
Abstract
Upon the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at genome-www.stanford.edu/Saccharomyces/.
|
47
|
Abstract
The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77-80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73-76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10-14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332-333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45-48] and can be accessed at http://genome-www.stanford.edu/microarray.
|
48
|
Abstract
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
|
49
|
|
50
|
Autonomy, justice, and disability. UCLA Law Review. University of California, Los Angeles. School of Law 2000; 47:599-651. [PMID: 16273682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
In this Article, Professor Carlos A. Ball explores the philosophical foundations for the types of rights and benefits that our society currently provides to individuals with disabilities. The concept of autonomy places on society a moral obligation to assist individuals with disabilities when their basic human functional capabilities are impaired. The exercise of this obligation entails assisting individuals with crossing a minimum threshold of functional capabilities below which it is not possible to lead autonomous lives. In making this argument, Professor Ball responds to libertarian critics who contend that notions of freedom or liberty proscribe an activist role for government in this arena. He explains how even a libertarian state redistributes wealth in order to provide for some incapacities. Professor Ball also disputes the idea that the meeting of the needs of the disabled is enough to provide moral justification for the rights and benefits provided to individuals with disabilities. The problem with the concept of needs, Professor Ball argues, is that it fails to account sufficiently for the human good of personal autonomy.
|