1
Bolduc B, Hodgkins SB, Varner RK, Crill PM, McCalley CK, Chanton JP, Tyson GW, Riley WJ, Palace M, Duhaime MB, Hough MA, Saleska SR, Sullivan MB, Rich VI. The IsoGenie database: an interdisciplinary data management solution for ecosystems biology and environmental research. PeerJ 2020. [DOI: 10.7717/peerj.9467] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022] Open
Abstract
Modern microbial and ecosystem sciences require diverse interdisciplinary teams that are often challenged in “speaking” to one another due to different languages and data product types. Here we introduce the IsoGenie Database (IsoGenieDB; https://isogenie-db.asc.ohio-state.edu/), a de novo developed data management and exploration platform, as a solution to this challenge of accurately representing and integrating heterogeneous environmental and microbial data across ecosystem scales. The IsoGenieDB is a public and private data infrastructure designed to store and query data generated by the IsoGenie Project, a ~10 year DOE-funded project focused on discovering ecosystem climate feedbacks in a thawing permafrost landscape. The IsoGenieDB provides (i) a platform for IsoGenie Project members to explore the project’s interdisciplinary datasets across scales through the inherent relationships among data entities, (ii) a framework to consolidate and harmonize the datasets needed by the team’s modelers, and (iii) a public venue that leverages the same spatially explicit, disciplinarily integrated data structure to share published datasets. The IsoGenieDB is also being expanded to cover the NASA-funded Archaea to Atmosphere (A2A) project, which scales the findings of IsoGenie to a broader suite of Arctic peatlands, via the umbrella A2A Database (A2A-DB). The IsoGenieDB’s expandability and flexible architecture allow it to serve as an example ecosystems database.
Affiliation(s)
- Benjamin Bolduc
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Ruth K. Varner
  - Earth Systems Research Center, Institute for the Study of Earth, Oceans and Space, University of New Hampshire, Durham, NH, USA
  - Department of Earth Sciences, College of Engineering and Physical Sciences, University of New Hampshire, Durham, NH, USA
- Patrick M. Crill
  - Department of Geological Sciences and Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden
- Carmody K. McCalley
  - Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, NY, USA
- Jeffrey P. Chanton
  - Department of Earth, Ocean, and Atmospheric Science, Florida State University, Tallahassee, FL, USA
- Gene W. Tyson
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- William J. Riley
  - Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Michael Palace
  - Earth Systems Research Center, Institute for the Study of Earth, Oceans and Space, University of New Hampshire, Durham, NH, USA
  - Department of Earth Sciences, College of Engineering and Physical Sciences, University of New Hampshire, Durham, NH, USA
- Melissa B. Duhaime
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Moira A. Hough
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Scott R. Saleska
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Matthew B. Sullivan
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, USA
- Virginia I. Rich
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
2
Hermida L, Poussin C, Stadler MB, Gubian S, Sewer A, Gaidatzis D, Hotz HR, Martin F, Belcastro V, Cano S, Peitsch MC, Hoeng J. Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data. BMC Genomics 2013; 14:514. [PMID: 23895370 PMCID: PMC3750322 DOI: 10.1186/1471-2164-14-514] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/22/2012] [Accepted: 07/17/2013] [Indexed: 11/18/2022] Open
Abstract
Background: High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed “contrast data” for short) in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.).
Results: To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as an analysis module as well as additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate Confero platform functionality we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.
Conclusion: Confero provides a unique and flexible platform to support downstream computational analysis facilitating biological interpretation. The system has been designed in order to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner thereby accelerating knowledge-driven research. Confero source code is freely available from http://sourceforge.net/projects/confero/.
Affiliation(s)
- Leandro Hermida
  - Philip Morris International Research & Development, Quai Jeanrenaud 5, CH-2000 Neuchatel, Switzerland.
3
Gattiker A, Hermida L, Liechti R, Xenarios I, Collin O, Rougemont J, Primig M. MIMAS 3.0 is a Multiomics Information Management and Annotation System. BMC Bioinformatics 2009; 10:151. [PMID: 19450266 PMCID: PMC2694794 DOI: 10.1186/1471-2105-10-151] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 01/09/2009] [Accepted: 05/18/2009] [Indexed: 01/08/2023] Open
Abstract
Background: DNA sequence integrity, mRNA concentrations and protein-DNA interactions have been subject to genome-wide analyses based on microarrays with ever increasing efficiency and reliability over the past fifteen years. However, very recently novel technologies for Ultra High-Throughput DNA Sequencing (UHTS) have been harnessed to study these phenomena with unprecedented precision. As a consequence, the extensive bioinformatics environment available for array data management, analysis, interpretation and publication must be extended to include these novel sequencing data types.
Description: MIMAS was originally conceived as a simple, convenient and local Microarray Information Management and Annotation System focused on GeneChips for expression profiling studies. MIMAS 3.0 enables users to manage data from high-density oligonucleotide SNP Chips, expression arrays (both 3'UTR and tiling) and promoter arrays, BeadArrays as well as UHTS data using MIAME-compliant standardized vocabulary. Importantly, researchers can export data in MAGE-TAB format and upload them to the EBI's ArrayExpress certified data repository using a one-step procedure.
Conclusion: We have vastly extended the capability of the system such that it processes the data output of six types of GeneChips (Affymetrix), two different BeadArrays for mRNA and miRNA (Illumina) and the Genome Analyzer (a popular Ultra-High Throughput DNA Sequencer, Illumina), without compromising on its flexibility and user-friendliness. MIMAS, appropriately renamed the Multiomics Information Management and Annotation System, is currently used by scientists working in approximately 50 academic laboratories and genomics platforms in Switzerland and France. MIMAS 3.0 is freely available via .
Affiliation(s)
- Alexandre Gattiker
  - Inserm, U625, GERHM; IFR-140; Université de Rennes 1, Rennes F-35042, France.
4
Mathur S, Visvanathan M, Svojanovsky S, Yoo B, Srinivas AB, Lushington GH, Smith PG. GOAPhAR: An Integrative Discovery Tool for Annotation, Pathway Analysis. THE OPEN BIOINFORMATICS JOURNAL 2009; 3:26-30. [PMID: 21132056 DOI: 10.2174/1875036200903010026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Abstract
We have developed the web-based tool GOAPhAR (Gene Ontology, Annotations and Pathways for Array Research), which integrates information from disparate sources regarding gene annotations, protein annotations, identifiers associated with probe sets, functional pathways, protein interactions, Gene Ontology, publicly available microarray datasets and tools for statistically validating clusters in microarray data. Genes of interest can be input as Affymetrix probe identifiers, GenBank, or UniGene identifiers for human, mouse or rat genomes. Results are provided in a user-friendly interface with hyperlinks to the sources of information.
Affiliation(s)
- Sachin Mathur
  - School of Computing and Engineering, University of Missouri-Kansas City, MO 64110, USA
5
Chalmel F, Primig M. The Annotation, Mapping, Expression and Network (AMEN) suite of tools for molecular systems biology. BMC Bioinformatics 2008; 9:86. [PMID: 18254954 PMCID: PMC2375118 DOI: 10.1186/1471-2105-9-86] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Received: 10/12/2007] [Accepted: 02/06/2008] [Indexed: 11/10/2022] Open
Abstract
Background: High-throughput genome biological experiments yield large and multifaceted datasets that require flexible and user-friendly analysis tools to facilitate their interpretation by life scientists. Many solutions currently exist, but they are often limited to specific steps in the complex process of data management and analysis and some require extensive informatics skills to be installed and run efficiently.
Results: We developed the Annotation, Mapping, Expression and Network (AMEN) software as a stand-alone, unified suite of tools that enables biological and medical researchers with basic bioinformatics training to manage and explore genome annotation, chromosomal mapping, protein-protein interaction, expression profiling and proteomics data. The current version provides modules for (i) uploading and pre-processing data from microarray expression profiling experiments, (ii) detecting groups of significantly co-expressed genes, and (iii) searching for enrichment of functional annotations within those groups. Moreover, the user interface is designed to simultaneously visualize several types of data such as protein-protein interaction networks in conjunction with expression profiles and cellular co-localization patterns. We have successfully applied the program to interpret expression profiling data from budding yeast, rodents and human.
Conclusion: AMEN is an innovative solution for molecular systems biological data analysis freely available under the GNU license. The program is available via a website at the Sourceforge portal which includes a user guide with concrete examples, links to external databases and helpful comments to implement additional functionalities. We emphasize that AMEN will continue to be developed and maintained by our laboratory because it has proven to be extremely useful for our genome biological research program.
Affiliation(s)
- Frédéric Chalmel
  - Institut National de la Santé et de la Recherche Médicale Unité 625, Groupe d'Etude de la Reproduction chez l'Homme et les Mammifères, Institut Fédératif de Recherche 140, F-35042 Rennes, France.
6
Wyrobek AJ, Mulvihill JJ, Wassom JS, Malling HV, Shelby MD, Lewis SE, Witt KL, Preston RJ, Perreault SD, Allen JW, DeMarini DM, Woychik RP, Bishop JB. Assessing human germ-cell mutagenesis in the Postgenome Era: a celebration of the legacy of William Lawson (Bill) Russell. ENVIRONMENTAL AND MOLECULAR MUTAGENESIS 2007; 48:71-95. [PMID: 17295306 PMCID: PMC2071946 DOI: 10.1002/em.20284] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 05/13/2023]
Abstract
Birth defects, de novo genetic diseases, and chromosomal abnormality syndromes occur in approximately 5% of all live births, and affected children suffer from a broad range of lifelong health consequences. Despite the social and medical impact of these defects, and the 8 decades of research in animal systems that have identified numerous germ-cell mutagens, no human germ-cell mutagen has been confirmed to date. There is now a growing consensus that the inability to detect human germ-cell mutagens is due to technological limitations in the detection of random mutations rather than biological differences between animal and human susceptibility. A multidisciplinary workshop responding to this challenge convened at The Jackson Laboratory in Bar Harbor, Maine. The purpose of the workshop was to assess the applicability of an emerging repertoire of genomic technologies to studies of human germ-cell mutagenesis. Workshop participants recommended large-scale human germ-cell mutation studies be conducted using samples from donors with high-dose exposures, such as cancer survivors. Within this high-risk cohort, parents and children could be evaluated for heritable changes in (a) DNA sequence and chromosomal structure, (b) repeat sequences and minisatellites, and (c) global gene expression profiles and pathways. Participants also advocated the establishment of a bio-bank of human tissue samples from donors with well-characterized exposure, including medical and reproductive histories. This mutational resource could support large-scale, multiple-endpoint studies. Additional studies could involve the examination of transgenerational effects associated with changes in imprinting and methylation patterns, nucleotide repeats, and mitochondrial DNA mutations. The further development of animal models and the integration of these with human studies are necessary to provide molecular insights into the mechanisms of germ-cell mutations and to identify prevention strategies. 
Furthermore, scientific specialty groups should be convened to review and prioritize the evidence for germ-cell mutagenicity from common environmental, occupational, medical, and lifestyle exposures. Workshop attendees agreed on the need for a full-scale assault to address key fundamental questions in human germ-cell environmental mutagenesis. These include, but are not limited to, the following: Do human germ-cell mutagens exist? What are the risks to future generations? Are some parents at higher risk than others for acquiring and transmitting germ-cell mutations? Obtaining answers to these, and other critical questions, will require strong support from relevant funding agencies, in addition to the engagement of scientists outside the fields of genomics and germ-cell mutagenesis.
Affiliation(s)
- John J. Mulvihill
  - University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma
- John S. Wassom
  - YAHSGS, LLC, Richland, Washington
  - Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Heinrich V. Malling
  - National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina
- Michael D. Shelby
  - National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina
- Kristine L. Witt
  - National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina
- R. Julian Preston
  - US Environmental Protection Agency, Research Triangle Park, North Carolina
- Sally D. Perreault
  - US Environmental Protection Agency, Research Triangle Park, North Carolina
- James W. Allen
  - US Environmental Protection Agency, Research Triangle Park, North Carolina
- David M. DeMarini
  - US Environmental Protection Agency, Research Triangle Park, North Carolina
- Jack B. Bishop
  - National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina
  - *Correspondence to: Dr. Jack B. Bishop, National Institute of Environmental Health Sciences, EC-01, PO Box 12233, Research Triangle Park, North Carolina, USA. E-mail:
7
PASSIM--an open source software system for managing information in biomedical studies. BMC Bioinformatics 2007; 8:52. [PMID: 17291344 PMCID: PMC1803798 DOI: 10.1186/1471-2105-8-52] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 07/25/2006] [Accepted: 02/09/2007] [Indexed: 11/10/2022] Open
Abstract
Background: One of the crucial aspects of day-to-day laboratory information management is collection, storage and retrieval of information about research subjects and biomedical samples. An efficient link between sample data and experiment results is absolutely imperative for a successful outcome of a biomedical study. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such LIMS indeed can bring laboratory information management to a higher level, but often implies sufficient investment of time, effort and funds, which are not always available. There is a clear need for lightweight open source systems for patient and sample information management.
Results: We present a web-based tool for submission, management and retrieval of sample and research subject data. The system secures confidentiality by separating anonymized sample information from individuals' records. It is simple and generic, and can be customised for various biomedical studies. Information can be both entered and accessed using the same web interface. User groups and their privileges can be defined. The system is open-source and is supplied with an on-line tutorial and necessary documentation. It has proven to be successful in a large international collaborative project.
Conclusion: The presented system closes the gap between the need and the availability of lightweight software solutions for managing information in biomedical studies involving human research subjects.
8
Ashbya Genome Database 3.0: a cross-species genome and transcriptome browser for yeast biologists. BMC Genomics 2007; 8:9. [PMID: 17212814 PMCID: PMC1779777 DOI: 10.1186/1471-2164-8-9] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 09/22/2006] [Accepted: 01/09/2007] [Indexed: 11/10/2022] Open
Abstract
Background: The Ashbya Genome Database (AGD) 3.0 is an innovative cross-species genome and transcriptome browser based on release 40 of the Ensembl developer environment.
Description: AGD 3.0 provides information on 4726 protein-encoding loci and 293 non-coding RNA genes present in the genome of the filamentous fungus Ashbya gossypii. A synteny viewer depicts the chromosomal location and orientation of orthologous genes in the budding yeast Saccharomyces cerevisiae. Genome-wide expression profiling data obtained with high-density oligonucleotide microarrays (GeneChips) are available for nearly all currently annotated protein-coding loci in A. gossypii and S. cerevisiae.
Conclusion: AGD 3.0 hence provides yeast- and genome biologists with comprehensive report pages including reliable DNA annotation, Gene Ontology terms associated with S. cerevisiae orthologues and RNA expression data as well as numerous links to external sources of information. The database is accessible at http://agd.vital-it.ch/.
9
Heber S, Sick B. Quality assessment of Affymetrix GeneChip data. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2006; 10:358-68. [PMID: 17069513 DOI: 10.1089/omi.2006.10.358] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Indexed: 11/12/2022]
Abstract
Affymetrix GeneChips are one of the best established microarray platforms. This powerful technique allows users to measure the expression of thousands of genes simultaneously. However, a microarray experiment is a sophisticated and time consuming endeavor with many potential sources of unwanted variation that could compromise the results if left uncontrolled. Increasing data volume and data complexity have triggered growing concern and awareness of the importance of assessing the quality of generated microarray data. In this review, we give an overview of current methods and software tools for quality assessment of Affymetrix GeneChip data. We focus on quality metrics, diagnostic plots, probe-level methods, pseudo-images, and classification methods to identify corrupted chips. We also describe RNA quality assessment methods which play an important role in challenging RNA sources like formalin embedded biopsies, laser-micro dissected samples, or single cells. No wet-lab methods are discussed in this paper.
Affiliation(s)
- Steffen Heber
  - Department of Computer Science, North Carolina State University, Raleigh, North Carolina, USA
10
Gattiker A, Niederhauser-Wiederkehr C, Moore J, Hermida L, Primig M. The GermOnline cross-species systems browser provides comprehensive information on genes and gene products relevant for sexual reproduction. Nucleic Acids Res 2006; 35:D457-62. [PMID: 17145711 PMCID: PMC1751528 DOI: 10.1093/nar/gkl957] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
We report a novel release of the GermOnline knowledgebase covering genes relevant for the cell cycle, gametogenesis and fertility. GermOnline was extended into a cross-species systems browser including information on DNA sequence annotation, gene expression and the function of gene products. The database covers eight model organisms and Homo sapiens, for which complete genome annotation data are available. The database is now built around a sophisticated genome browser (Ensembl), our own microarray information management and annotation system (MIMAS) used to extensively describe experimental data obtained with high-density oligonucleotide microarrays (GeneChips) and a comprehensive system for online editing of database entries (MediaWiki). The RNA data include results from classical microarrays as well as tiling arrays that yield information on RNA expression levels, transcript start sites and lengths as well as exon composition. Members of the research community are solicited to help GermOnline curators keep database entries on genes and gene products complete and accurate. The database is accessible at .
Affiliation(s)
- Michael Primig
  - To whom correspondence should be addressed. Tel: +41 61 267 2098; Fax: +41 61 267 3398;