1
Emissah H, Ljungquist B, Ascoli GA. Bibliometric analysis of neuroscience publications quantifies the impact of data sharing. Bioinformatics 2023; 39:btad746. [PMID: 38070153 PMCID: PMC10733721 DOI: 10.1093/bioinformatics/btad746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/01/2023] [Accepted: 12/07/2023] [Indexed: 12/19/2023] Open
Abstract
SUMMARY: Neural morphology, the branching geometry of brain cells, is an essential cellular substrate of nervous system function and pathology. Despite the accelerating production of digital reconstructions of neural morphology, the public accessibility of data remains a core issue in neuroscience. Deficiencies in the availability of existing data create redundancy of research efforts and limit synergy. We carried out a comprehensive bibliometric analysis of neural morphology publications to quantify the impact of data sharing in the neuroscience community. Our findings demonstrate that sharing digital reconstructions of neural morphology via NeuroMorpho.Org leads to a significant increase of citations to the original article, thus directly benefiting authors. The rate of data reusage remains constant for at least 16 years after sharing (the whole period analyzed), altogether nearly doubling the peer-reviewed discoveries in the field. Furthermore, the recent availability of larger and more numerous datasets fostered integrative applications, which accrue on average twice the citations of re-analyses of individual datasets. We also released an open-source citation tracking web-service allowing researchers to monitor reusage of their datasets in independent peer-reviewed reports. These results and tools can facilitate the recognition of shared data reuse for merit evaluations and funding decisions.
AVAILABILITY AND IMPLEMENTATION: The application is available at http://cng-nmo-dev3.orc.gmu.edu:8181/. The source code is available at https://github.com/HerveEmissah/nmo-authors-app and https://github.com/HerveEmissah/nmo-bibliometric-analysis.
Affiliation(s)
- Herve Emissah
  - Bioinformatics Program, College of Science, George Mason University, Fairfax, VA 22030, United States
  - Center for Neural Informatics, Structures, & Plasticity (CN3) and Bioengineering Department, College of Engineering & Computing, George Mason University, Fairfax, VA 22030, United States
- Bengt Ljungquist
  - Center for Neural Informatics, Structures, & Plasticity (CN3) and Bioengineering Department, College of Engineering & Computing, George Mason University, Fairfax, VA 22030, United States
- Giorgio A Ascoli
  - Bioinformatics Program, College of Science, George Mason University, Fairfax, VA 22030, United States
  - Center for Neural Informatics, Structures, & Plasticity (CN3) and Bioengineering Department, College of Engineering & Computing, George Mason University, Fairfax, VA 22030, United States
2
Falconnier C, Caparros-Roissard A, Decraene C, Lutz PE. Functional genomic mechanisms of opioid action and opioid use disorder: a systematic review of animal models and human studies. Mol Psychiatry 2023; 28:4568-4584. [PMID: 37723284 PMCID: PMC10914629 DOI: 10.1038/s41380-023-02238-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/20/2023]
Abstract
In the past two decades, over-prescription of opioids for pain management has driven a steep increase in opioid use disorder (OUD) and death by overdose, exerting a dramatic toll on western countries. OUD is a chronic relapsing disease associated with a lifetime struggle to control drug consumption, suggesting that opioids trigger long-lasting brain adaptations, notably through functional genomic and epigenomic mechanisms. Current understanding of these processes, however, remains scarce, and it has not previously been reviewed systematically. To do so, the goal of the present work was to synthesize current knowledge on genome-wide transcriptomic and epigenetic mechanisms of opioid action, in primate and rodent species. Using a prospectively registered methodology, comprehensive literature searches were completed in PubMed, Embase, and Web of Science. Of the 2709 articles identified, 73 met our inclusion criteria and were considered for qualitative analysis. Focusing on the 5 most studied nervous system structures (nucleus accumbens, frontal cortex, whole striatum, dorsal striatum, spinal cord; 44 articles), we also conducted a quantitative analysis of differentially expressed genes, in an effort to identify a putative core transcriptional signature of opioids. Only one gene, Cdkn1a, was consistently identified in eleven studies, and globally, our results unveil surprisingly low consistency across published work, even when considering the most recent single-cell approaches. Analysis of sources of variability detected significant contributions from species, brain structure, duration of opioid exposure, strain, time-point of analysis, and batch effects, but not type of opioid. To go beyond those limitations, we leveraged threshold-free methods to illustrate how genome-wide comparisons may generate new findings and hypotheses. Finally, we discuss current methodological developments in the field, and their implications for future research and, ultimately, better care.
Affiliation(s)
- Camille Falconnier
  - Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
- Alba Caparros-Roissard
  - Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
- Charles Decraene
  - Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
  - Centre National de la Recherche Scientifique, Université de Strasbourg, Laboratoire de Neurosciences Cognitives et Adaptatives UMR 7364, 67000, Strasbourg, France
- Pierre-Eric Lutz
  - Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
  - Douglas Mental Health University Institute, Montreal, QC, Canada
3
Emissah H, Ljungquist B, Ascoli GA. Bibliometric analysis of neuroscience publications quantifies the impact of data sharing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.12.557386. [PMID: 37745378 PMCID: PMC10515804 DOI: 10.1101/2023.09.12.557386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Motivation: Neural morphology, the branching geometry of neurons and glia in the nervous system, is an essential cellular substrate of brain function and pathology. Despite the accelerating production of digital reconstructions of neural morphology in laboratories worldwide, the public accessibility of data remains a core issue in neuroscience. Deficiencies in the availability of existing data create redundancy of research efforts and prevent researchers from building on others' work. Data sharing complements the development of computational resources and literature mining tools to accelerate scientific discovery.
Results: We carried out a comprehensive bibliometric analysis of neural morphology publications to quantify the impact of data sharing in the neuroscience community. Our findings demonstrate that sharing digital reconstructions of neural morphology via the NeuroMorpho.Org online repository leads to a significant increase of citations to the original article, thus directly benefiting the authors. Moreover, the rate of data reusage remains constant for at least 16 years after sharing (the whole period analyzed), altogether nearly doubling the peer-reviewed discoveries in the field. Furthermore, the recent availability of larger and more numerous datasets fostered integrative meta-analysis applications, which accrue on average twice the citations of re-analyses of individual datasets. We also designed and deployed an open-source citation tracking web-service that allows researchers to monitor reusage of their datasets in independent peer-reviewed reports. These results and the released tool can facilitate the recognition of shared data reuse for promotion and tenure considerations, merit evaluations, and funding decisions.
Affiliation(s)
- Herve Emissah
  - Bioinformatics Program, College of Science, George Mason University
- Bengt Ljungquist
  - Center for Neural Informatics, Structures, and Plasticity, College of Engineering & Computing, George Mason University
- Giorgio A. Ascoli
  - Bioinformatics Program, College of Science, George Mason University
  - Center for Neural Informatics, Structures, and Plasticity, College of Engineering & Computing, George Mason University
4
Shea MM, Kuppermann J, Rogers MP, Smith DS, Edwards P, Boehm AB. Systematic review of marine environmental DNA metabarcoding studies: toward best practices for data usability and accessibility. PeerJ 2023; 11:e14993. [PMID: 36992947 PMCID: PMC10042160 DOI: 10.7717/peerj.14993] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/04/2022] [Accepted: 02/12/2023] [Indexed: 03/31/2023] Open
Abstract
The emerging field of environmental DNA (eDNA) research lacks universal guidelines for ensuring data produced are FAIR (findable, accessible, interoperable, and reusable), despite growing awareness of the importance of such practices. In order to better understand these data usability challenges, we systematically reviewed 60 peer-reviewed articles conducting a specific subset of eDNA research: metabarcoding studies in marine environments. For each article, we characterized approximately 90 features across several categories: general article attributes and topics, methodological choices, types of metadata included, and availability and storage of sequence data. Analyzing these characteristics, we identified several barriers to data accessibility, including a lack of common context and vocabulary across the articles, missing metadata, supplementary information limitations, and a concentration of both sample collection and analysis in the United States. While some of these barriers require significant effort to address, we also found many instances where small choices made by authors and journals could have an outsized influence on the discoverability and reusability of data. Promisingly, articles also showed consistency and creativity in data storage choices as well as a strong trend toward open access publishing. Our analysis underscores the need to think critically about data accessibility and usability as marine eDNA metabarcoding studies, and eDNA projects more broadly, continue to proliferate.
Affiliation(s)
- Meghan M. Shea
  - Emmett Interdisciplinary Program in Environment & Resources (E-IPER), Stanford University, Stanford, CA, United States of America
- Jacob Kuppermann
  - Earth Systems Program, Stanford University, Stanford, CA, United States of America
- Megan P. Rogers
  - Program in Human Biology, Stanford University, Stanford, CA, United States of America
- Dustin Summer Smith
  - Earth Systems Program, Stanford University, Stanford, CA, United States of America
- Paul Edwards
  - Program in Science, Technology and Society, Stanford University, Stanford, CA, United States of America
- Alexandria B. Boehm
  - Department of Civil and Environmental Engineering, Stanford University, Stanford, CA, United States of America
5
Assidi M, Buhmeida A, Budowle B. Medicine and health of 21st Century: Not just a high biotech-driven solution. NPJ Genom Med 2022; 7:67. [PMID: 36379953 PMCID: PMC9666643 DOI: 10.1038/s41525-022-00336-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 10/27/2022] [Indexed: 11/16/2022] Open
Abstract
Many biotechnological innovations have shaped the contemporary healthcare system (CHS) with significant progress to treat or cure several acute conditions and diseases of known causes (particularly infectious, trauma). Some have been successful while others have created additional health care challenges. For example, a reliance on drugs has not been a panacea to meet the challenges related to multifactorial noncommunicable diseases (NCDs), the main health burden of the 21st century. In contrast, the advent of omics-based and big data technologies has raised global hope to predict, treat, and/or cure NCDs, effectively fight even the current COVID-19 pandemic, and improve overall healthcare outcomes. Although this digital revolution has introduced extensive changes on all aspects of contemporary society, economy, firms, job market, and healthcare management, it is facing and will face several intrinsic and extrinsic challenges, impacting precision medicine implementation, costs, possible outcomes, and managing expectations. With all of biotechnology's exciting promises, biological systems' complexity, unfortunately, continues to be underestimated since it cannot readily be compartmentalized as an independent and segregated set of problems, and therefore is, in a number of situations, not readily mimicable by the current algorithm-building proficiency tools. Although the potential of biotechnology is motivating, we should not lose sight of approaches that may not seem as glamorous but can have large impacts on the healthcare of many and across disparate population groups. A balanced approach combining "omics and big data" solutions in CHS with large-scale, simpler, and suitable strategies should be defined, with expectations properly managed.
Affiliation(s)
- Mourad Assidi
  - Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
  - Medical Laboratory Department, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Abdelbaset Buhmeida
  - Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Bruce Budowle
  - Department of Forensic Medicine, University of Helsinki, Helsinki, Finland
6
Lou RN, Therkildsen NO. Batch effects in population genomic studies with low-coverage whole genome sequencing data: Causes, detection and mitigation. Mol Ecol Resour 2021; 22:1678-1692. [PMID: 34825778 DOI: 10.1111/1755-0998.13559] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/22/2021] [Revised: 11/05/2021] [Accepted: 11/11/2021] [Indexed: 01/04/2023]
Abstract
Over the past few decades, there has been an explosion in the amount of publicly available sequencing data. This opens new opportunities for combining data sets to achieve unprecedented sample sizes, spatial coverage or temporal replication in population genomic studies. However, a common concern is that nonbiological differences between data sets may generate patterns of variation in the data that can confound real biological patterns, a problem known as batch effects. In this paper, we compare two batches of low-coverage whole genome sequencing (lcWGS) data generated from the same populations of Atlantic cod (Gadus morhua). First, we show that with a "batch-effect-naive" bioinformatic pipeline, batch effects systematically biased our genetic diversity estimates, population structure inference and selection scans. We then demonstrate that these batch effects resulted from multiple technical differences between our data sets, including the sequencing chemistry (four-channel vs. two-channel), sequencing run, read type (single-end vs. paired-end), read length (125 vs. 150 bp), DNA degradation level (degraded vs. well preserved) and sequencing depth (0.8× vs. 0.3× on average). Lastly, we illustrate that a set of simple bioinformatic strategies (such as different read trimming and single nucleotide polymorphism filtering) can be used to detect batch effects in our data and substantially mitigate their impact. We conclude that combining data sets remains a powerful approach as long as batch effects are explicitly accounted for. We focus on lcWGS data in this paper, which may be particularly vulnerable to certain causes of batch effects, but many of our conclusions also apply to other sequencing strategies.
7
Arribas P, Andújar C, Bidartondo MI, Bohmann K, Coissac É, Creer S, deWaard JR, Elbrecht V, Ficetola GF, Goberna M, Kennedy S, Krehenwinkel H, Leese F, Novotny V, Ronquist F, Yu DW, Zinger L, Creedy TJ, Meramveliotakis E, Noguerales V, Overcast I, Morlon H, Vogler AP, Papadopoulou A, Emerson BC. Connecting high-throughput biodiversity inventories: Opportunities for a site-based genomic framework for global integration and synthesis. Mol Ecol 2021; 30:1120-1135. [PMID: 33432777 PMCID: PMC7986105 DOI: 10.1111/mec.15797] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/13/2020] [Revised: 12/21/2020] [Accepted: 01/05/2021] [Indexed: 01/03/2023]
Abstract
High-throughput sequencing (HTS) is increasingly being used for the characterization and monitoring of biodiversity. If applied in a structured way, across broad geographical scales, it offers the potential for a much deeper understanding of global biodiversity through the integration of massive quantities of molecular inventory data generated independently at local, regional and global scales. The universality, reliability and efficiency of HTS data can potentially facilitate the seamless linking of data among species assemblages from different sites, at different hierarchical levels of diversity, for any taxonomic group and regardless of prior taxonomic knowledge. However, collective international efforts are required to optimally exploit the potential of site-based HTS data for global integration and synthesis, efforts that at present are limited to the microbial domain. To contribute to the development of an analogous strategy for the nonmicrobial terrestrial domain, an international symposium entitled "Next Generation Biodiversity Monitoring" was held in November 2019 in Nicosia (Cyprus). The symposium brought together evolutionary geneticists, ecologists and biodiversity scientists involved in diverse regional and global initiatives using HTS as a core tool for biodiversity assessment. In this review, we summarize the consensus that emerged from the 3-day symposium. We converged on the opinion that an effective terrestrial Genomic Observatories network for global biodiversity integration and synthesis should be spatially led and strategically united under the umbrella of the metabarcoding approach. Subsequently, we outline an HTS-based strategy to collectively build an integrative framework for site-based biodiversity data generation.
Affiliation(s)
- Paula Arribas
  - Island Ecology and Evolution Research Group, Instituto de Productos Naturales y Agrobiología (IPNA‐CSIC), San Cristóbal de la Laguna, Spain
- Carmelo Andújar
  - Island Ecology and Evolution Research Group, Instituto de Productos Naturales y Agrobiología (IPNA‐CSIC), San Cristóbal de la Laguna, Spain
- Martin I. Bidartondo
  - Department of Life Sciences, Imperial College London, London, UK
  - Comparative Plant and Fungal Biology, Royal Botanic Gardens, London, UK
- Kristine Bohmann
  - Section for Evolutionary Genomics, Faculty of Health and Medical Sciences, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Éric Coissac
  - Université Grenoble Alpes, CNRS, Université Savoie Mont Blanc, LECA, Laboratoire d’Ecologie Alpine, Grenoble, France
- Simon Creer
  - School of Natural Sciences, Bangor University, Gwynedd, UK
- Jeremy R. deWaard
  - Centre for Biodiversity Genomics, University of Guelph, Guelph, Canada
  - School of Environmental Sciences, University of Guelph, Guelph, Canada
- Vasco Elbrecht
  - Centre for Biodiversity Monitoring (ZBM), Zoological Research Museum Alexander Koenig, Bonn, Germany
- Gentile F. Ficetola
  - Université Grenoble Alpes, CNRS, Université Savoie Mont Blanc, LECA, Laboratoire d’Ecologie Alpine, Grenoble, France
  - Department of Environmental Sciences and Policy, University of Milano, Milano, Italy
- Marta Goberna
  - Department of Environment and Agronomy, INIA, Madrid, Spain
- Susan Kennedy
  - Biodiversity and Biocomplexity Unit, Okinawa Institute of Science and Technology Graduate University, Onna‐son, Japan
  - Department of Biogeography, Trier University, Trier, Germany
- Florian Leese
  - Aquatic Ecosystem Research, Faculty of Biology, University of Duisburg‐Essen, Essen, Germany
  - Centre for Water and Environmental Research (ZWU) Essen, University of Duisburg‐Essen, Essen, Germany
- Vojtech Novotny
  - Biology Centre, Institute of Entomology, Czech Academy of Sciences, Ceske Budejovice, Czech Republic
  - Faculty of Science, University of South Bohemia, Ceske Budejovice, Czech Republic
- Fredrik Ronquist
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Douglas W. Yu
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
  - School of Biological Sciences, University of East Anglia, Norwich, UK
- Lucie Zinger
  - Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, France
- Isaac Overcast
  - Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, France
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, USA
- Hélène Morlon
  - Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, France
- Alfried P. Vogler
  - Department of Life Sciences, Imperial College London, London, UK
  - Department of Life Sciences, Natural History Museum, London, UK
- Brent C. Emerson
  - Island Ecology and Evolution Research Group, Instituto de Productos Naturales y Agrobiología (IPNA‐CSIC), San Cristóbal de la Laguna, Spain
8
Reynolds T, Johnson EC, Huggett SB, Bubier JA, Palmer RHC, Agrawal A, Baker EJ, Chesler EJ. Interpretation of psychiatric genome-wide association studies with multispecies heterogeneous functional genomic data integration. Neuropsychopharmacology 2021; 46:86-97. [PMID: 32791514 PMCID: PMC7688940 DOI: 10.1038/s41386-020-00795-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/01/2020] [Revised: 07/27/2020] [Accepted: 07/29/2020] [Indexed: 02/08/2023]
Abstract
Genome-wide association studies and other discovery genetics methods provide a means to identify previously unknown biological mechanisms underlying behavioral disorders that may point to new therapeutic avenues, augment diagnostic tools, and yield a deeper understanding of the biology of psychiatric conditions. Recent advances in psychiatric genetics have been made possible through large-scale collaborative efforts. These studies have begun to unearth many novel genetic variants associated with psychiatric disorders and behavioral traits in human populations. Significant challenges remain in characterizing the resulting disease-associated genetic variants and prioritizing functional follow-up to make them useful for mechanistic understanding and development of therapeutics. Model organism research has generated extensive genomic data that can provide insight into the neurobiological mechanisms of variant action, but a cohesive effort must be made to establish which aspects of the biological modulation of behavioral traits are evolutionarily conserved across species. Scalable computing, new data integration strategies, and advanced analysis methods outlined in this review provide a framework to efficiently harness model organism data in support of clinically relevant psychiatric phenotypes.
Affiliation(s)
- Timothy Reynolds
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Computer Science Department, Baylor University, Waco, TX, USA
- Emma C Johnson
  - Department of Psychiatry, Washington University in St Louis, St Louis, MO, USA
- Arpana Agrawal
  - Department of Psychiatry, Washington University in St Louis, St Louis, MO, USA
- Erich J Baker
  - Computer Science Department, Baylor University, Waco, TX, USA
10
Zhao S, Agafonov O, Azab A, Stokowy T, Hovig E. Accuracy and efficiency of germline variant calling pipelines for human genome data. Sci Rep 2020; 10:20222. [PMID: 33214604 PMCID: PMC7678823 DOI: 10.1038/s41598-020-77218-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 03/30/2020] [Accepted: 11/02/2020] [Indexed: 12/30/2022] Open
Abstract
Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, a systematic performance comparison of these pipelines is needed to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using the Genome in a Bottle Consortium, "synthetic-diploid" and simulated WGS datasets. DRAGEN and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-score. The DRAGEN platform offers accuracy, flexibility and a highly efficient execution speed, and therefore superior performance in the analysis of WGS data on a large scale. The combination of DRAGEN and DeepVariant also suggests a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.
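For readers unfamiliar with the F1-score mentioned in the abstract above, the following is a minimal sketch of how it is commonly computed from true positive, false positive and false negative variant calls. The function name and the example counts are hypothetical and are not taken from the paper; real benchmarks typically rely on dedicated comparison tools rather than hand-counted totals.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall.

    tp: calls that match the truth set (true positives)
    fp: calls absent from the truth set (false positives)
    fn: truth-set variants that were not called (false negatives)
    """
    if tp == 0:
        # No correct calls: precision and recall are zero or undefined,
        # so the score is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = f1_score(tp=90, fp=10, fn=10)
```

Because the score is a harmonic mean, a pipeline cannot reach a high F1 by maximizing precision alone or recall alone, which is why it is a convenient single number for comparing callers.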
Affiliation(s)
- Sen Zhao
  - Department of Tumor Biology, Institute of Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, 0310, Oslo, Norway
- Abdulrahman Azab
  - Center for Bioinformatics, Department of Informatics, University of Oslo, 0316, Oslo, Norway
  - Division of Research Computing, University Center for Information Technology (USIT), University of Oslo, 0316, Oslo, Norway
- Tomasz Stokowy
  - Computational Biology Unit, Institute of Informatics, University of Bergen, 5008, Bergen, Norway
  - Department of Clinical Science, University of Bergen, 5021, Bergen, Norway
- Eivind Hovig
  - Department of Tumor Biology, Institute of Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, 0310, Oslo, Norway
  - Center for Bioinformatics, Department of Informatics, University of Oslo, 0316, Oslo, Norway
11
Abstract
In this chapter we discuss the past, present and future of clinical biomarker development. We explore the advent of new technologies, paving the way in which health, medicine and disease are understood. This review includes the identification of physicochemical assays, current regulations, the development and reproducibility of clinical trials, as well as the revolution of omics technologies and state-of-the-art integration and analysis approaches.
12
Dass G, Vu MT, Xu P, Audain E, Hitz MP, Grüning B, Hermjakob H, Perez-Riverol Y. The omics discovery REST interface. Nucleic Acids Res 2020; 48:W380-W384. [PMID: 32374843 PMCID: PMC7319562 DOI: 10.1093/nar/gkaa326] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2020] [Revised: 04/11/2020] [Accepted: 04/21/2020] [Indexed: 01/22/2023] Open
Abstract
The Omics Discovery Index is an open source platform that can be used to access, discover and disseminate omics datasets. OmicsDI integrates proteomics, genomics, metabolomics, models and transcriptomics datasets. Using an efficient indexing system, OmicsDI integrates different biological entities including genes, transcripts, proteins, metabolites and the corresponding publications from PubMed. In addition, it implements a group of pipelines to estimate the impact of each dataset by tracing the number of citations, reanalysis and biological entities reported by each dataset. Here, we present the OmicsDI REST interface (www.omicsdi.org/ws/) to enable programmatic access to any dataset in OmicsDI or all the datasets for a specific provider (database). Clients can perform queries on the API using different metadata information such as sample details (species, tissues, etc), instrumentation (mass spectrometer, sequencer), keywords and other provided annotations. In addition, we present two different libraries in R and Python to facilitate the development of tools that can programmatically interact with the OmicsDI REST interface.
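As a rough illustration of the kind of programmatic query the abstract above describes, the sketch below builds a search URL against the REST base address given in the abstract. The `dataset/search` path, the `query`, `start` and `size` parameter names, and the `https` scheme are assumptions made for illustration and should be checked against the service's own documentation; the helper name `build_search_url` is hypothetical. No request is sent, so the snippet runs offline.

```python
from urllib.parse import urlencode

# Base address as given in the abstract (www.omicsdi.org/ws/); https is assumed.
BASE_URL = "https://www.omicsdi.org/ws"


def build_search_url(query: str, start: int = 0, size: int = 20) -> str:
    """Build a dataset search URL.

    The 'dataset/search' path and the parameter names are assumptions,
    not confirmed by the abstract.
    """
    params = urlencode({"query": query, "start": start, "size": size})
    return f"{BASE_URL}/dataset/search?{params}"


# Hypothetical keyword query; pass the result to any HTTP client to run it.
url = build_search_url("human")
```

Keeping URL construction separate from the network call makes the query easy to inspect and test before any request is made.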
Affiliation(s)
- Gaurhari Dass
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, UK
| | - Manh-Tu Vu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, UK
| | - Pan Xu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, 102206 Beijing, China
| | - Enrique Audain
- Department of Human Genetics, University Medical Center Schleswig-Holstein (UKSH), Kiel, Germany
| | - Marc-Phillip Hitz
- Department of Human Genetics, University Medical Center Schleswig-Holstein (UKSH), Kiel, Germany
| | - Björn A Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, UK
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, 102206 Beijing, China
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, UK
|
13
|
Hemphill L, Hedstrom ML, Leonard SH. Saving social media data: Understanding data management practices among social media researchers and their implications for archives. J Assoc Inf Sci Technol 2020. [DOI: 10.1002/asi.24368] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Affiliation(s)
- Libby Hemphill
- School of Information University of Michigan Ann Arbor Michigan USA
- ICPSR University of Michigan Ann Arbor Michigan USA
|
14
|
|
15
|
Brumfield KD, Huq A, Colwell RR, Olds JL, Leddy MB. Microbial resolution of whole genome shotgun and 16S amplicon metagenomic sequencing using publicly available NEON data. PLoS One 2020; 15:e0228899. [PMID: 32053657 PMCID: PMC7018008 DOI: 10.1371/journal.pone.0228899] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 10/04/2019] [Accepted: 01/24/2020] [Indexed: 01/01/2023] Open
Abstract
Microorganisms are ubiquitous in the biosphere, playing a crucial role in both biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Widely used approaches in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing in the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address queries on a regional to continental scale around a variety of environmental challenges and provide high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results across projects to be assessed and compared are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with open access Metagenomics Rapid Annotation using the Subsystem Technology (MG-RAST) server to illustrate advantages of WGS compared to 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets, from surface soil samples prepared by NEON investigators, were selected for comparison, using standardized protocols collected at the same locations in Colorado between April-July 2014. The dominant bacterial phyla detected across samples agreed between sequencing methodologies.
However, WGS yielded greater microbial resolution, increased accuracy, and allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, and putative functional genes that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems.
Affiliation(s)
- Kyle D. Brumfield
- Maryland Pathogen Research Institute, University of Maryland, College Park, Maryland, United States of America
- University of Maryland Institute for Advanced Computer Studies, University of Maryland, College Park, Maryland, United States of America
| | - Anwar Huq
- Maryland Pathogen Research Institute, University of Maryland, College Park, Maryland, United States of America
| | - Rita R. Colwell
- Maryland Pathogen Research Institute, University of Maryland, College Park, Maryland, United States of America
- University of Maryland Institute for Advanced Computer Studies, University of Maryland, College Park, Maryland, United States of America
- CosmosID Inc., Rockville, MD, United States of America
| | - James L. Olds
- Schar School, George Mason University, Arlington, Virginia, United States of America
| | - Menu B. Leddy
- Essential Environmental and Engineering Systems, Huntington Beach, California, United States of America
|
16
|
Santiago CRDN, Assis RDAB, Moreira LM, Digiampietri LA. Gene Tags Assessment by Comparative Genomics (GTACG): A User-Friendly Framework for Bacterial Comparative Genomics. Front Genet 2019; 10:725. [PMID: 31507629 PMCID: PMC6718126 DOI: 10.3389/fgene.2019.00725] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/29/2019] [Accepted: 07/10/2019] [Indexed: 12/04/2022] Open
Abstract
Genomics research has produced an exponentially growing amount of data. However, the genetic knowledge pertaining to certain phenotypic characteristics is lacking. Also, a considerable part of these genomes have coding sequences (CDSs) with unknown functions, posing additional challenges to researchers. Phylogenetically close microorganisms share much of their CDSs, and certain phenotypes unique to a set of microorganisms may be the result of the genes found exclusively in those microorganisms. This study presents the GTACG framework, an easy-to-use tool that takes subgroups of bacterial genomes whose microorganisms share phenotypic characteristics and finds, in a simple and fast way, the data that differentiate them from other associated genomes. The GTACG analysis is based on the formation of homologous CDS clusters from local alignments. The front-end is easy to use, and the installation packages have been developed to enable users lacking knowledge of programming languages or bioinformatics to analyze high-throughput data using the tool. The validation of the GTACG framework has been carried out based on a case report involving a set of 161 genomes from the Xanthomonadaceae family, in which 19 families of orthologous proteins were found in 90% of the plant-associated genomes, allowing the identification of the proteins potentially associated with adaptation and virulence in plant tissue. The results show the potential use of GTACG in the search for new targets for molecular studies, and GTACG can be used as a research tool by biologists who lack advanced knowledge in the use of computational tools for bacterial comparative genomics.
Affiliation(s)
| | - Renata de Almeida Barbosa Assis
- Biotechnology Graduate Program, Núcleo de Pesquisas em Ciências Biológicas, Federal University of Ouro Preto, Ouro Preto, Brazil
| | - Leandro Marcio Moreira
- Biotechnology Graduate Program, Núcleo de Pesquisas em Ciências Biológicas, Federal University of Ouro Preto, Ouro Preto, Brazil
- Department of Biological Sciences, Federal University of Ouro Preto, Ouro Preto, Brazil
| | - Luciano Antonio Digiampietri
- Bioinformatics Graduate Program, University of Sao Paulo, Sao Paulo, Brazil
- School of Arts, Science, and Humanities, University of Sao Paulo, Sao Paulo, Brazil
|
17
|
Sansone SA, McQuilton P, Rocca-Serra P, Gonzalez-Beltran A, Izzo M, Lister AL, Thurston M. FAIRsharing as a community approach to standards, repositories and policies. Nat Biotechnol 2019; 37:358-367. [PMID: 30940948 PMCID: PMC6785156 DOI: 10.1038/s41587-019-0080-8] [Citation(s) in RCA: 149] [Impact Index Per Article: 29.8] [Indexed: 11/18/2022]
Affiliation(s)
- Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK.
| | - Peter McQuilton
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK
| | - Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK
| | | | - Massimiliano Izzo
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK
| | - Allyson L Lister
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK
| | - Milo Thurston
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK
|
18
|
Amann RI, Baichoo S, Blencowe BJ, Bork P, Borodovsky M, Brooksbank C, Chain PSG, Colwell RR, Daffonchio DG, Danchin A, de Lorenzo V, Dorrestein PC, Finn RD, Fraser CM, Gilbert JA, Hallam SJ, Hugenholtz P, Ioannidis JPA, Jansson JK, Kim JF, Klenk HP, Klotz MG, Knight R, Konstantinidis KT, Kyrpides NC, Mason CE, McHardy AC, Meyer F, Ouzounis CA, Patrinos AAN, Podar M, Pollard KS, Ravel J, Muñoz AR, Roberts RJ, Rosselló-Móra R, Sansone SA, Schloss PD, Schriml LM, Setubal JC, Sorek R, Stevens RL, Tiedje JM, Turjanski A, Tyson GW, Ussery DW, Weinstock GM, White O, Whitman WB, Xenarios I. Toward unrestricted use of public genomic data. Science 2019; 363:350-352. [PMID: 30679363 DOI: 10.1126/science.aaw1280] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 12/21/2022]
Abstract
Publication interests should not limit access to public data
|
19
|
Maxson Jones K, Ankeny RA, Cook-Deegan R. The Bermuda Triangle: The Pragmatics, Policies, and Principles for Data Sharing in the History of the Human Genome Project. JOURNAL OF THE HISTORY OF BIOLOGY 2018; 51:693-805. [PMID: 30390178 PMCID: PMC7307446 DOI: 10.1007/s10739-018-9538-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 05/24/2023]
Abstract
The Bermuda Principles for DNA sequence data sharing are an enduring legacy of the Human Genome Project (HGP). They were adopted by the HGP at a strategy meeting in Bermuda in February of 1996 and implemented in formal policies by early 1998, mandating daily release of HGP-funded DNA sequences into the public domain. The idea of daily sharing, we argue, emanated directly from strategies for large, goal-directed molecular biology projects first tested within the "community" of C. elegans researchers, and were introduced and defended for the HGP by the nematode biologists John Sulston and Robert Waterston. In the C. elegans community, and subsequently in the HGP, daily sharing served the pragmatic goals of quality control and project coordination. Yet in the HGP human genome, we also argue, the Bermuda Principles addressed concerns about gene patents impeding scientific advancement, and were aspirational and flexible in implementation and justification. They endured as an archetype for how rapid data sharing could be realized and rationalized, and permitted adaptation to the needs of various scientific communities. Yet in addition to the support of Sulston and Waterston, their adoption also depended on the clout of administrators at the US National Institutes of Health (NIH) and the UK nonprofit charity the Wellcome Trust, which together funded 90% of the HGP human sequencing effort. The other nations wishing to remain in the HGP consortium had to accommodate to the Bermuda Principles, requiring exceptions from incompatible existing or pending data access policies for publicly funded research in Germany, Japan, and France. We begin this story in 1963, with the biologist Sydney Brenner's proposal for a nematode research program at the Laboratory of Molecular Biology (LMB) at the University of Cambridge. 
We continue through 2003, with the completion of the HGP human reference genome, and conclude with observations about policy and the historiography of molecular biology.
Affiliation(s)
- Kathryn Maxson Jones
- Department of History, Princeton University, Princeton, NJ, USA.
- MBL McDonnell Foundation Scholar, Marine Biological Laboratory, Woods Hole, MA, USA.
| | - Rachel A Ankeny
- School of Humanities, The University of Adelaide, Adelaide, Australia
| | - Robert Cook-Deegan
- School for the Future of Innovation in Society, Consortium for Science, Policy & Outcomes, Arizona State University, Barrett & O'Connor Washington Center, Washington, D.C., USA
|
20
|
Falomir-Lockhart AH, Villegas-Castagnaso EE, Giovambattista G, Rogberg-Muñoz A. Computational prediction of nsSNPs effects on protein function and structure, a prioritization approach for further in vitro studies applied to bovine GSTP1. Free Radic Biol Med 2018; 129:486-491. [PMID: 30315934 DOI: 10.1016/j.freeradbiomed.2018.10.403] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/29/2018] [Revised: 09/20/2018] [Accepted: 10/03/2018] [Indexed: 11/20/2022]
Abstract
The development of high-throughput technologies in the last decade produced an exponential increase in the amount of biological data available. The case of redox biology and apoptosis is not an exception, and nowadays there is a need to integrate information from multiple "omics" studies. Therefore, validation of proposed discoveries is essential. However, the study in biological systems of the effect of the massive amounts of sequence variation data generated with next-generation sequencing (NGS) technologies can be a very difficult and expensive process. In this context, the present study aimed to demonstrate the advantages of a computational methodology to systematically analyze the structural and functional effects of protein variants, in order to prioritize further studies. This approach stands out for its easy implementation, low cost and low time requirement. First, the possible impact of mutations on protein structure and function was tested by a combination of tools based on evolutionary and structural information. Next, homology modeling was performed to predict and compare the 3D protein structures of unresolved amino acid sequences obtained from genomic resequencing. This analysis, applied to bovine GSTP1, showed that some of the amino acid substitutions may generate important changes in protein structure and function. Moreover, the haplotype analysis highlighted three structure variants worth studying through in vitro or in vivo experiments.
Affiliation(s)
- A H Falomir-Lockhart
- IGEVET - Instituto de Genética Veterinaria "Ing. Fernando Noel Dulout" (UNLP-CONICET La Plata), Facultad de Ciencias Veterinarias, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina.
| | - E E Villegas-Castagnaso
- IGEVET - Instituto de Genética Veterinaria "Ing. Fernando Noel Dulout" (UNLP-CONICET La Plata), Facultad de Ciencias Veterinarias, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina.
| | - G Giovambattista
- IGEVET - Instituto de Genética Veterinaria "Ing. Fernando Noel Dulout" (UNLP-CONICET La Plata), Facultad de Ciencias Veterinarias, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina.
| | - A Rogberg-Muñoz
- IGEVET - Instituto de Genética Veterinaria "Ing. Fernando Noel Dulout" (UNLP-CONICET La Plata), Facultad de Ciencias Veterinarias, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina; Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina.
|
21
|
Huan T, Palermo A, Ivanisevic J, Rinehart D, Edler D, Phommavongsay T, Benton HP, Guijas C, Domingo-Almenara X, Warth B, Siuzdak G. Autonomous Multimodal Metabolomics Data Integration for Comprehensive Pathway Analysis and Systems Biology. Anal Chem 2018; 90:8396-8403. [PMID: 29893550 DOI: 10.1021/acs.analchem.8b00875] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Abstract
Comprehensive metabolomic data can be achieved using multiple orthogonal separation and mass spectrometry (MS) analytical techniques. However, drawing biologically relevant conclusions from this data and combining it with additional layers of information collected by other omic technologies present a significant bioinformatic challenge. To address this, a data processing approach was designed to automate the comprehensive prediction of dysregulated metabolic pathways/networks from multiple data sources. The platform autonomously integrates multiple MS-based metabolomics data types without constraints due to different sample preparation/extraction, chromatographic separation, or MS detection method. This multimodal analysis streamlines the extraction of biological information from the metabolomics data as well as the contextualization within proteomics and transcriptomics data sets. As a proof of concept, this multimodal analysis approach was applied to a colorectal cancer (CRC) study, in which complementary liquid chromatography-mass spectrometry (LC-MS) data were combined with proteomic and transcriptomic data. Our approach provided a highly resolved overview of colon cancer metabolic dysregulation, with an average 17% increase of detected dysregulated metabolites per pathway and an increase in metabolic pathway prediction confidence. Moreover, 95% of the altered metabolic pathways matched with the dysregulated genes and proteins, providing additional validation at a systems level. The analysis platform is currently available via XCMS Online (XCMSOnline.scripps.edu).
Affiliation(s)
| | | | - Julijana Ivanisevic
- Metabolomics Platform, Faculty of Biology and Medicine, University of Lausanne, CH-1005 Lausanne, Switzerland
| | | | - David Edler
- Department of Molecular Medicine and Surgery, Karolinska Institute, 171 77 Stockholm, Sweden
| | | | | | | | | | - Benedikt Warth
- Department of Food Chemistry and Toxicology, Faculty of Chemistry and Vienna Metabolomics Center (VIME), University of Vienna, Währingerstrasse 38, 1090 Vienna, Austria
|
22
|
Dickie IA, Boyer S, Buckley HL, Duncan RP, Gardner PP, Hogg ID, Holdaway RJ, Lear G, Makiola A, Morales SE, Powell JR, Weaver L. Towards robust and repeatable sampling methods in eDNA-based studies. Mol Ecol Resour 2018; 18:940-952. [PMID: 29802793 DOI: 10.1111/1755-0998.12907] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 10/04/2017] [Revised: 05/10/2018] [Accepted: 05/14/2018] [Indexed: 01/28/2023]
Abstract
DNA-based techniques are increasingly used for measuring the biodiversity (species presence, identity, abundance and community composition) of terrestrial and aquatic ecosystems. While there are numerous reviews of molecular methods and bioinformatic steps, there has been little consideration of the methods used to collect samples upon which these later steps are based. This represents a critical knowledge gap, as methodologically sound field sampling is the foundation for subsequent analyses. We reviewed field sampling methods used for metabarcoding studies of both terrestrial and freshwater ecosystem biodiversity over a nearly three-year period (n = 75). We found that 95% (n = 71) of these studies used subjective sampling methods and inappropriate field methods and/or failed to provide critical methodological information. It would be possible for researchers to replicate only 5% of the metabarcoding studies in our sample, a poorer level of reproducibility than for ecological studies in general. Our findings suggest that greater attention to field sampling methods and reporting is necessary in eDNA-based studies of biodiversity to ensure robust outcomes and future reproducibility. Methods must be fully and accurately reported, and protocols developed that minimize subjectivity. Standardization of sampling protocols would be one way to help improve reproducibility and would have additional benefits in allowing compilation and comparison of data from across studies.
Affiliation(s)
- Ian A Dickie
- Bio-Protection Research Centre, Lincoln University, Lincoln, New Zealand
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Stephane Boyer
- Institut de Recherche sur la Biologie de l'Insecte - UMR 7261 CNRS, Université de Tours, Tours, France
- Applied Molecular Solutions Research Group, Environmental and Animal Sciences, Unitec Institute of Technology, Auckland, New Zealand
| | - Hannah L Buckley
- School of Science, Auckland University of Technology, Auckland, New Zealand
| | - Richard P Duncan
- Institute for Applied Ecology, University of Canberra, Bruce, ACT, Australia
| | - Paul P Gardner
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Ian D Hogg
- School of Science, University of Waikato, Hamilton, New Zealand
- Polar Knowledge Canada, CHARS Campus, Cambridge Bay, NU, Canada
| | | | - Gavin Lear
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand
| | - Andreas Makiola
- Bio-Protection Research Centre, Lincoln University, Lincoln, New Zealand
| | - Sergio E Morales
- Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
| | - Jeff R Powell
- Hawkesbury Institute for the Environment, Western Sydney University, Penrith, NSW, Australia
| | - Louise Weaver
- Institute of Environmental Science and Research Ltd., Christchurch, New Zealand
|
23
|
Kaye J, Terry SF, Juengst E, Coy S, Harris JR, Chalmers D, Dove ES, Budin-Ljøsne I, Adebamowo C, Ogbe E, Bezuidenhout L, Morrison M, Minion JT, Murtagh MJ, Minari J, Teare H, Isasi R, Kato K, Rial-Sebbag E, Marshall P, Koenig B, Cambon-Thomsen A. Including all voices in international data-sharing governance. Hum Genomics 2018. [PMID: 29514717 PMCID: PMC5842530 DOI: 10.1186/s40246-018-0143-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 01/01/2023] Open
Abstract
Background Governments, funding bodies, institutions, and publishers have developed a number of strategies to encourage researchers to facilitate access to datasets. The rationale behind this approach is that this will bring a number of benefits and enable advances in healthcare and medicine by allowing the maximum returns from the investment in research, as well as reducing waste and promoting transparency. As this approach gains momentum, these data-sharing practices have implications for many kinds of research as they become standard practice across the world. Main text The governance frameworks that have been developed to support biomedical research are not well equipped to deal with the complexities of international data sharing. This system is nationally based and is dependent upon expert committees for oversight and compliance, which has often led to piece-meal decision-making. This system tends to perpetuate inequalities by obscuring the contributions and the important role of different data providers along the data stream, whether they be low- or middle-income country researchers, patients, research participants, groups, or communities. As research and data-sharing activities are largely publicly funded, there is a strong moral argument for including the people who provide the data in decision-making and to develop governance systems for their continued participation. Conclusions We recommend that governance of science becomes more transparent, representative, and responsive to the voices of many constituencies by conducting public consultations about data-sharing addressing issues of access and use; including all data providers in decision-making about the use and sharing of data along the whole of the data stream; and using digital technologies to encourage accessibility, transparency, and accountability. 
We anticipate that this approach could enhance the legitimacy of the research process, generate insights that may otherwise be overlooked or ignored, and help to bring valuable perspectives into the decision-making around international data sharing.
Affiliation(s)
- Jane Kaye
- Centre for Health Law and Emerging Technologies, NDPH, University of Oxford, Ewert House, Ewert Place, Summertown, Oxford, OX2 7DD, UK; Melbourne Law School, University of Melbourne, 185 Pelham Street, Carlton, Victoria, 3053, Australia
| | - Sharon F Terry
- Genetic Alliance USA, 4301 Connecticut Ave NW, Suite 404, Washington DC, 20008-2369, USA
| | - Eric Juengst
- Center for Bioethics, University of North Carolina at Chapel Hill, 333 McNider Hall, Chapel Hill, NC, 27599-7240, USA
| | - Sarah Coy
- Centre for Health Law and Emerging Technologies, NDPH, University of Oxford, Ewert House, Ewert Place, Summertown, Oxford, OX2 7DD, UK
| | - Jennifer R Harris
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, PO Box 4404, Nydalen, 0403, Oslo, Norway
| | - Don Chalmers
- Faculty of Law, University of Tasmania, Private Bag 89, Hobart, Tasmania, 7001, Australia
| | - Edward S Dove
- School of Law, University of Edinburgh, Old College, South Bridge, Edinburgh, EH8 9YL, UK
| | - Isabelle Budin-Ljøsne
- Cohort Studies, Norwegian Institute of Public Health, PO Box 4404, Nydalen, 0403, Oslo, Norway
| | - Clement Adebamowo
- Center for Bioethics and Research, Ibadan, Nigeria; Institute of Human Virology Nigeria, Abuja, Nigeria; Greenebaum Comprehensive Cancer Center and Institute of Human Virology, University of Maryland School of Medicine, 725 W. Lombard St. Suite 445, Baltimore, MD, 21201, USA
| | - Emilomo Ogbe
- International Centre for Reproductive Health, University of Gent, De Pintepark II, De Pintelaan 185, 9000, Ghent, Belgium
| | - Louise Bezuidenhout
- Institute for Science, Innovation and Society, University of Oxford, 64 Banbury Road, Oxford, OX2 6PN, UK
| | - Michael Morrison
- Centre for Health Law and Emerging Technologies, NDPH, University of Oxford, Ewert House, Ewert Place, Summertown, Oxford, OX2 7DD, UK
| | - Joel T Minion
- Policy, Ethics and Life Sciences (PEALS) Research Centre, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
| | - Madeleine J Murtagh
- Policy, Ethics and Life Sciences (PEALS) Research Centre, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
| | - Jusaku Minari
- Uehiro Research Division for iPS Cell Ethics, Center for iPS Cell Research and Application, Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto, 606-8507, Japan
| | - Harriet Teare
- Centre for Health Law and Emerging Technologies, NDPH, University of Oxford, Ewert House, Ewert Place, Summertown, Oxford, OX2 7DD, UK; Melbourne Law School, University of Melbourne, 185 Pelham Street, Carlton, Victoria, 3053, Australia
| | - Rosario Isasi
- Institute for Bioethics and Health Policy, Department of Human Genetics, Leonard M. Miller School of Medicine, University of Miami, 1501 NW 10th Avenue, Biomedical Research Building (BRB) Room 361, Miami, FL, 33136, USA
| | - Kazuto Kato
- Department of Biomedical Ethics and Public Policy, Graduate School of Medicine, Osaka University, 2-2 Yamadaoka, Suita, Osaka, 565-0871, Japan
| | - Emmanuelle Rial-Sebbag
- National Institute for Research and Health (Inserm), UMR 1027 Inserm, Toulouse University, 37 allées Jules Guesde, 31000, Toulouse, France
| | - Patricia Marshall
- Department of Bioethics, School of Medicine, TA200, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH, 44106-4976, USA
| | - Barbara Koenig
- UCSF School of Nursing, Institute for Health and Aging, University of California, San Francisco, 3333 Calif. St, Laurel Heights, San Francisco, CA, 94118, USA
| | - Anne Cambon-Thomsen
- CNRS, Toulouse, France; Joint research unit on epidemiology and public health, Inserm (National Institute for Health and Medical Research) and University Toulouse III Paul Sabatier, Toulouse, France
|
24
|
Karcher S, Willighagen EL, Rumble J, Ehrhart F, Evelo CT, Fritts M, Gaheen S, Harper SL, Hoover MD, Jeliazkova N, Lewinski N, Marchese Robinson RL, Mills KC, Mustad AP, Thomas DG, Tsiliki G, Ogilvie Hendren C. Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations. NANOIMPACT 2018; 9:85-101. [PMID: 30246165 PMCID: PMC6145474 DOI: 10.1016/j.impact.2017.11.002] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 05/23/2023]
Abstract
Many groups within the broad field of nanoinformatics are already developing data repositories and analytical tools driven by their individual organizational goals. Integrating these data resources across disciplines and with non-nanotechnology resources can support multiple objectives by enabling the reuse of the same information. Integration can also serve as the impetus for novel scientific discoveries by providing the framework to support deeper data analyses. This article discusses current data integration practices in nanoinformatics and in comparable mature fields, and nanotechnology-specific challenges impacting data integration. Based on results from a nanoinformatics-community-wide survey, recommendations for achieving integration of existing operational nanotechnology resources are presented. Nanotechnology-specific data integration challenges, if effectively resolved, can foster the application and validation of nanotechnology within and across disciplines. This paper is one of a series of articles by the Nanomaterial Data Curation Initiative that address data issues such as data curation workflows, data completeness and quality, curator responsibilities, and metadata.
Affiliation(s)
- Sandra Karcher
- Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
- Center for the Environmental Implications of Nano Technology (CEINT) Duke University, Box 90287, 121 Hudson Hall, Durham, NC 27708-0287, USA
| | - Egon L. Willighagen
- Department of Bioinformatics - BiGCaT, Maastricht University, P.O. Box 616, UNS50, Box 19, NL-6200, MD, Maastricht, The Netherlands
| | - John Rumble
- R&R Data Services, 11 Montgomery Avenue, Gaithersburg, MD 20877, USA
- CODATA-VAMAS Working Group on Nanomaterials, Paris, France
| | - Friederike Ehrhart
- Department of Bioinformatics - BiGCaT, Maastricht University, P.O. Box 616, UNS50, Box 19, NL-6200, MD, Maastricht, The Netherlands
| | - Chris T. Evelo
- Department of Bioinformatics - BiGCaT, Maastricht University, P.O. Box 616, UNS50, Box 19, NL-6200, MD, Maastricht, The Netherlands
| | - Martin Fritts
- Clinical Research Directorate/Clinical Monitoring Research Program, Leidos Biomedical Research, Inc., NCI Campus at Frederick, Frederick, MD 21702, USA
| | - Sharon Gaheen
- Clinical Research Directorate/Clinical Monitoring Research Program, Leidos Biomedical Research, Inc., NCI Campus at Frederick, Frederick, MD 21702, USA
| | - Stacey L. Harper
- Environmental and Molecular Toxicology and School of Chemical, Biological and Environmental Engineering, Oregon State University, Corvallis, OR 97331, USA
| | - Mark D. Hoover
- National Institute for Occupational Safety and Health, 1095 Willowdale Road, Morgantown, WV 26505-2888, USA
| | | | - Nastassja Lewinski
- Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Richard L. Marchese Robinson
- School of Chemical and Process Engineering, University of Leeds, Leeds LS2 9JT, United Kingdom
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, United Kingdom
| | - Karmann C. Mills
- RTI International, 3040 Cornwallis Rd., Research Triangle Park, NC 27709, USA
| | - Axel P. Mustad
- Nordic Quantum Computing Group AS, Oslo Science Park, P.O. Box 1892, Vika, N-0124 Oslo, Norway
| | - Dennis G. Thomas
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Georgia Tsiliki
- School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechneiou Street, Zografou, 15780, Athens, Greece
- Institute for the management of Information Systems, ATHENA Research and Innovation Centre, Artemidos 6 & Epidavrou, Marousi, 15125 Athens, Greece
| | - Christine Ogilvie Hendren
- Center for the Environmental Implications of Nano Technology (CEINT) Duke University, Box 90287, 121 Hudson Hall, Durham, NC 27708-0287, USA
|
25
|
Abstract
The diversity and sheer volume of omics data have brought biological and biomedical research and application into a big data era, much as happened in wider society a decade earlier. This opens a new challenge: moving from horizontal data ensembles (e.g., similar types of data collected from different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of people with matched information). Meeting it requires integrative analysis in biology and biomedicine and calls for the urgent development of data integration methods to address the shift from population-guided to individual-guided investigations. Data integration is an effective concept for solving complex problems or understanding complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective, reproducible way. Current integration approaches for biological data have two modes: a "bottom-up integration" mode with follow-up manual integration, and a "top-down integration" mode with follow-up in silico integration. This paper first summarizes combinatory analysis approaches to give a candidate protocol for biological experiment design for effective integrative studies in genomics, and then surveys data fusion approaches to give helpful guidance on computational model development for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine that depends on big biomedical data. Finally, the problems and future directions are highlighted for integrative analysis of omics big data.
Affiliation(s)
- Xiang-Tian Yu
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China.
|
26
|
Park J, Gabbard JL. Factors that affect scientists' knowledge sharing behavior in health and life sciences research communities: Differences between explicit and implicit knowledge. COMPUTERS IN HUMAN BEHAVIOR 2018. [DOI: 10.1016/j.chb.2017.09.017] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
27
|
Garlid AO, Polson JS, Garlid KD, Hermjakob H, Ping P. Equipping Physiologists with an Informatics Tool Chest: Toward an Integerated Mitochondrial Phenome. Handb Exp Pharmacol 2017; 240:377-401. [PMID: 27995389 DOI: 10.1007/164_2016_93] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Understanding the complex involvement of mitochondrial biology in disease development often requires the acquisition, analysis, and integration of large-scale molecular and phenotypic data. An increasing number of bioinformatics tools are currently employed to aid in mitochondrial investigations, most notably in predicting or corroborating the spatial and temporal dynamics of mitochondrial molecules, in retrieving structural data of mitochondrial components, and in aggregating as well as transforming mitochondrial centric biomedical knowledge. With the increasing prevalence of complex Big Data from omics experiments and clinical cohorts, informatics tools have become indispensable in our quest to understand mitochondrial physiology and pathology. Here we present an overview of the various informatics resources that are helping researchers explore this vital organelle and gain insights into its form, function, and dynamics.
Affiliation(s)
- Anders Olav Garlid
- The NIH BD2K Center of Excellence in Biomedical Computing at UCLA, Department of Physiology, University of California, Los Angeles, CA, 90095, USA.
| | - Jennifer S Polson
- The NIH BD2K Center of Excellence in Biomedical Computing at UCLA, Department of Physiology, University of California, Los Angeles, CA, 90095, USA.
| | - Keith D Garlid
- The NIH BD2K Center of Excellence in Biomedical Computing at UCLA, Department of Physiology, University of California, Los Angeles, CA, 90095, USA
| | - Henning Hermjakob
- The NIH BD2K Center of Excellence in Biomedical Computing at UCLA, Department of Physiology, University of California, Los Angeles, CA, 90095, USA
- Molecular Systems Cluster, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Peipei Ping
- The NIH BD2K Center of Excellence in Biomedical Computing at UCLA, Departments of Physiology, Medicine, and Bioinformatics, University of California, Los Angeles, CA, 90095, USA
|
28
|
Pitchers WR, Constantinou SJ, Losilla M, Gallant JR. Electric fish genomics: Progress, prospects, and new tools for neuroethology. J Physiol Paris 2016; 110:259-272. [PMID: 27769923 DOI: 10.1016/j.jphysparis.2016.10.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2016] [Revised: 09/06/2016] [Accepted: 10/16/2016] [Indexed: 01/01/2023]
Abstract
Electric fish have served as a model system in biology since the 18th century, providing deep insight into the nature of bioelectrogenesis, the molecular structure of the synapse, and brain circuitry underlying complex behavior. Neuroethologists have collected extensive phenotypic data that span biological levels of analysis from molecules to ecosystems. This phenotypic data, together with genomic resources obtained over the past decades, have motivated new and exciting hypotheses that position the weakly electric fish model to address fundamental 21st century biological questions. This review article considers the molecular data collected for weakly electric fish over the past three decades, and the insights that data of this nature has motivated. For readers relatively new to molecular genetics techniques, we also provide a table of terminology aimed at clarifying the numerous acronyms and techniques that accompany this field. Next, we pose a research agenda for expanding genomic resources for electric fish research over the next 10 years. We conclude by considering some of the exciting research prospects for neuroethology that electric fish genomics may offer over the coming decades, if the electric fish community is successful in these endeavors.
Affiliation(s)
- William R Pitchers
- Dept. of Integrative Biology, Michigan State University, 288 Farm Lane RM 203, East Lansing, MI 48824, USA.
| | - Savvas J Constantinou
- Dept. of Integrative Biology, Michigan State University, 288 Farm Lane RM 203, East Lansing, MI 48824, USA
| | - Mauricio Losilla
- Dept. of Integrative Biology, Michigan State University, 288 Farm Lane RM 203, East Lansing, MI 48824, USA
| | - Jason R Gallant
- Dept. of Integrative Biology, Michigan State University, 288 Farm Lane RM 203, East Lansing, MI 48824, USA.
|
29
|
Forrest CB, Margolis P, Seid M, Colletti RB. PEDSnet: how a prototype pediatric learning health system is being expanded into a national network. Health Aff (Millwood) 2014; 33:1171-7. [PMID: 25006143 DOI: 10.1377/hlthaff.2014.0127] [Citation(s) in RCA: 110] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
Except for a few conditions, pediatric disorders are rare diseases. Because of this, no single institution has enough patients to generate adequate sample sizes to produce generalizable knowledge. Aggregating electronic clinical data from millions of children across many pediatric institutions holds the promise of producing sufficiently large data sets to accelerate knowledge discovery. However, without deliberately embedding these data in a pediatric learning health system (defined as a health care organization that is purposefully designed to produce research in routine care settings and implement evidence at the point of care), efforts to act on this new knowledge, reducing the distress and suffering that children experience when sick, will be ineffective. In this article we discuss a prototype pediatric learning health system, ImproveCareNow, for children with inflammatory bowel disease. This prototype is being scaled up to create PEDSnet, a national network that will support the efficient conduct of clinical trials, observational research, and quality improvement across diseases, specialties, and institutions.
Affiliation(s)
- Christopher B Forrest
- Christopher B. Forrest is a professor of pediatrics at the Children's Hospital of Philadelphia and the University of Pennsylvania as well as principal investigator for the PEDSnet learning health system, all in Philadelphia
| | - Peter Margolis
- Peter Margolis is a professor of pediatrics and director of research at the James M. Anderson Center for Health Systems Excellence at the Cincinnati Children's Hospital Medical Center, in Ohio, and scientific director of the ImproveCareNow network
| | - Michael Seid
- Michael Seid is director of health outcomes and quality of care research in the Division of Pulmonary Medicine and a professor of pediatrics at the Cincinnati Children's Hospital Medical Center
| | - Richard B Colletti
- Richard B. Colletti is a professor of pediatrics at the University of Vermont College of Medicine, in Burlington, and network director of the ImproveCareNow network
|
30
|
Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, Ma'ayan A. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford) 2016; 2016:baw100. [PMID: 27374120 PMCID: PMC4930834 DOI: 10.1093/database/baw100] [Citation(s) in RCA: 889] [Impact Index Per Article: 111.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2016] [Revised: 05/15/2016] [Accepted: 05/31/2016] [Indexed: 12/18/2022]
Abstract
Genomics, epigenomics, transcriptomics, proteomics and metabolomics efforts rapidly generate a plethora of data on the activity and levels of biomolecules within mammalian cells. At the same time, curation projects that organize knowledge from the biomedical literature into online databases are expanding. Hence, there is a wealth of information about genes, proteins and their associations, with an urgent need for data integration to achieve better knowledge extraction and data reuse. For this purpose, we developed the Harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins from over 70 major online resources. We extracted, abstracted and organized data into ∼72 million functional associations between genes/proteins and their attributes. Such attributes could be physical relationships with other biomolecules, expression in cell lines and tissues, genetic associations with knockout mouse or human phenotypes, or changes in expression after drug treatment. We stored these associations in a relational database along with rich metadata for the genes/proteins, their attributes and the original resources. The freely available Harmonizome web portal provides a graphical user interface, a web service and a mobile app for querying, browsing and downloading all of the collected data. To demonstrate the utility of the Harmonizome, we computed and visualized gene-gene and attribute-attribute similarity networks, and through unsupervised clustering, identified many unexpected relationships by combining pairs of datasets such as the association between kinase perturbations and disease signatures. We also applied supervised machine learning methods to predict novel substrates for kinases, endogenous ligands for G-protein coupled receptors, mouse phenotypes for knockout genes, and classified unannotated transmembrane proteins for likelihood of being ion channels. 
The Harmonizome is a comprehensive resource of knowledge about genes and proteins, and as such, it enables researchers to discover novel relationships between biological entities, as well as form novel data-driven hypotheses for experimental validation. Database URL: http://amp.pharm.mssm.edu/Harmonizome.
Affiliation(s)
- Andrew D Rouillard
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Gregory W Gundersen
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Nicolas F Fernandez
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Zichen Wang
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Caroline D Monteiro
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Michael G McDermott
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Avi Ma'ayan
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
31
|
Marchese Robinson RL, Lynch I, Peijnenburg W, Rumble J, Klaessig F, Marquardt C, Rauscher H, Puzyn T, Purian R, Åberg C, Karcher S, Vriens H, Hoet P, Hoover MD, Hendren CO, Harper SL. How should the completeness and quality of curated nanomaterial data be evaluated? NANOSCALE 2016; 8:9919-43. [PMID: 27143028 PMCID: PMC4899944 DOI: 10.1039/c5nr08944a] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/19/2023]
Abstract
Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?
Affiliation(s)
- Richard L. Marchese Robinson
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
| | - Iseult Lynch
- School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT Birmingham, United Kingdom
| | - Willie Peijnenburg
- National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Institute of Environmental Sciences, Leiden University, Leiden, The Netherlands
| | - John Rumble
- R&R Data Services, 11 Montgomery Avenue, Gaithersburg MD 20877 USA
| | - Fred Klaessig
- Pennsylvania Bio Nano Systems LLC, 3805 Old Easton Road, Doylestown, PA 18902
| | - Clarissa Marquardt
- Institute of Applied Computer Sciences (IAI), Karlsruhe Institute of Technology (KIT), Hermann v. Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Hubert Rauscher
- European Commission, Joint Research Centre, Institute for Health and Consumer Protection, Via Fermi 2749, 21027 Ispra (VA), Italy
| | - Tomasz Puzyn
- Laboratory of Environmental Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
| | - Ronit Purian
- Faculty of Engineering, Tel Aviv University, Tel Aviv 69978 Israel
| | - Christoffer Åberg
- Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
| | - Sandra Karcher
- Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890
| | - Hanne Vriens
- Department of Public Health and Primary Care, K.U.Leuven, Faculty of Medicine, Unit Environment & Health – Toxicology, Herestraat 49 (O&N 706), Leuven, Belgium
| | - Peter Hoet
- Department of Public Health and Primary Care, K.U.Leuven, Faculty of Medicine, Unit Environment & Health – Toxicology, Herestraat 49 (O&N 706), Leuven, Belgium
| | - Mark D. Hoover
- National Institute for Occupational Safety and Health, 1095 Willowdale Road, Morgantown, WV 26505-2888
| | - Christine Ogilvie Hendren
- Center for the Environmental Implications of NanoTechnology, Duke University, PO Box 90287 121 Hudson Hall, Durham NC 27708
| | - Stacey L. Harper
- Department of Environmental and Molecular Toxicology, School of Chemical, Biological and Environmental Engineering, Oregon State University, 1007 ALS, Corvallis, OR 97331
|
32
|
Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, Chibucos MC, Clancy K, Courtot M, Derom D, Dumontier M, Fan L, Fostel J, Fragoso G, Gibson F, Gonzalez-Beltran A, Haendel MA, He Y, Heiskanen M, Hernandez-Boussard T, Jensen M, Lin Y, Lister AL, Lord P, Malone J, Manduchi E, McGee M, Morrison N, Overton JA, Parkinson H, Peters B, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Schober D, Smith B, Soldatova LN, Stoeckert CJ, Taylor CF, Torniai C, Turner JA, Vita R, Whetzel PL, Zheng J. The Ontology for Biomedical Investigations. PLoS One 2016; 11:e0154556. [PMID: 27128319 PMCID: PMC4851331 DOI: 10.1371/journal.pone.0154556] [Citation(s) in RCA: 142] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2016] [Accepted: 04/17/2016] [Indexed: 12/18/2022] Open
Abstract
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.
Affiliation(s)
- Anita Bandrowski
- University of California San Diego, La Jolla, California, United States of America
| | - Ryan Brinkman
- British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada
| | - Mathias Brochhausen
- University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
| | - Matthew H. Brush
- Oregon Health and Science University, Portland, Oregon, United States of America
| | - Bill Bug
- Drexel University College of Medicine, Philadelphia, Pennsylvania, United States of America
| | - Marcus C. Chibucos
- University of Maryland School of Medicine, Baltimore, Maryland, United States of America
| | - Kevin Clancy
- Thermo Fisher Scientific, Carlsbad, California, United States of America
| | | | - Dirk Derom
- The Vrije Universiteit Brussel, Ixelles, Brussels, Belgium
| | - Michel Dumontier
- Stanford University, Stanford, California, United States of America
| | - Liju Fan
- Ontology Workshop, LLC, Columbia, Maryland, United States of America
| | - Jennifer Fostel
- National Toxicology Program, NIEHS, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
| | - Gilberto Fragoso
- Center for Biomedical Informatics and Information Technology, National Institutes of Health, Rockville, Maryland, United States of America
| | - Frank Gibson
- Royal Society of Chemistry, Cambridge, Cambridgeshire, United Kingdom
| | | | - Melissa A. Haendel
- Oregon Health and Science University, Portland, Oregon, United States of America
| | - Yongqun He
- University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Mervi Heiskanen
- National Cancer Institute, Rockville, Maryland, United States of America
| | | | - Mark Jensen
- University at Buffalo, Buffalo, New York, United States of America
| | - Yu Lin
- University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | | | - Phillip Lord
- Newcastle University, Newcastle-upon-Tyne, Tyne and Wear, United Kingdom
| | - James Malone
- European Molecular Biology Laboratory- European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
| | - Elisabetta Manduchi
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Monnie McGee
- Southern Methodist University, Dallas, Texas, United States of America
| | - Norman Morrison
- The University of Manchester, Manchester, Greater Manchester, United Kingdom
| | - James A. Overton
- La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
| | - Helen Parkinson
- European Molecular Biology Laboratory- European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
| | - Bjoern Peters
- La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
| | | | - Alan Ruttenberg
- University at Buffalo, Buffalo, New York, United States of America
| | | | | | - Daniel Schober
- Leibniz Institute of Plant Biochemistry, Halle, Saxony-Anhalt, Germany
| | - Barry Smith
- University at Buffalo, Buffalo, New York, United States of America
| | | | | | - Chris F. Taylor
- European Molecular Biology Laboratory- European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
| | - Carlo Torniai
- Oregon Health and Science University, Portland, Oregon, United States of America
| | - Jessica A. Turner
- Georgia State University, Atlanta, Georgia, United States of America
| | - Randi Vita
- La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
| | - Patricia L. Whetzel
- University of California San Diego, La Jolla, California, United States of America
| | - Jie Zheng
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
33
|
Arend D, Junker A, Scholz U, Schüler D, Wylie J, Lange M. PGP repository: a plant phenomics and genomics data publication infrastructure. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw033. [PMID: 27087305 PMCID: PMC4834206 DOI: 10.1093/database/baw033] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/05/2015] [Accepted: 02/26/2016] [Indexed: 11/22/2022]
Abstract
Plant genomics and phenomics represent the most promising tools for accelerating yield gains and overcoming emerging crop productivity bottlenecks. However, accessing this wealth of plant diversity requires the characterization of this material using state-of-the-art genomic, phenomic and molecular technologies and the release of subsequent research data via a long-term stable, open-access portal. Although several international consortia and public resource centres offer services for plant research data management, valuable digital assets remain unpublished and thus inaccessible to the scientific community. Recently, the Leibniz Institute of Plant Genetics and Crop Plant Research and the German Plant Phenotyping Network have jointly initiated the Plant Genomics and Phenomics Research Data Repository (PGP) as infrastructure to comprehensively publish plant research data. This covers in particular cross-domain datasets that are not being published in central repositories because of their volume or unsupported data scope, like image collections from plant phenotyping and microscopy, unfinished genomes, genotyping data, visualizations of morphological plant models, data from mass spectrometry as well as software and documents. The repository is hosted at the Leibniz Institute of Plant Genetics and Crop Plant Research using e!DAL as software infrastructure and a Hierarchical Storage Management System as data archival backend. A newly developed data submission tool, which features a high level of automation to lower the barriers to data publication, was made available to the consortium. After an internal review process, data are published with citable digital object identifiers and a core set of technical metadata is registered at DataCite. The e!DAL-embedded Web frontend generates a landing page for each dataset and supports interactive exploration.
PGP is registered as a research data repository at BioSharing.org, re3data.org and OpenAIRE as a valid EU Horizon 2020 open data archive. The above features, the programmatic interface and the support of standard metadata formats enable PGP to fulfil the FAIR data principles: findable, accessible, interoperable, reusable. Database URL: http://edal.ipk-gatersleben.de/repos/pgp/
Affiliation(s)
- Daniel Arend
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Astrid Junker
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Uwe Scholz
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Danuta Schüler
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Juliane Wylie
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Matthias Lange
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
|
34
|
Higdon R, Earl RK, Stanberry L, Hudac CM, Montague E, Stewart E, Janko I, Choiniere J, Broomall W, Kolker N, Bernier RA, Kolker E. The promise of multi-omics and clinical data integration to identify and target personalized healthcare approaches in autism spectrum disorders. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2016; 19:197-208. [PMID: 25831060 DOI: 10.1089/omi.2015.0020] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Indexed: 12/19/2022]
Abstract
Complex diseases are caused by a combination of genetic and environmental factors, creating a difficult challenge for diagnosis and defining subtypes. This review article describes how distinct disease subtypes can be identified through integration and analysis of clinical and multi-omics data. A broad shift toward molecular subtyping of disease using genetic and omics data has yielded successful results in cancer and other complex diseases. To determine molecular subtypes, patients are first classified by applying clustering methods to different types of omics data, then these results are integrated with clinical data to characterize distinct disease subtypes. An example of this molecular-data-first approach is in research on Autism Spectrum Disorder (ASD), a spectrum of social communication disorders marked by tremendous etiological and phenotypic heterogeneity. In the case of ASD, omics data such as exome sequences and gene and protein expression data are combined with clinical data such as psychometric testing and imaging to enable subtype identification. Novel ASD subtypes have been proposed, such as CHD8, using this molecular subtyping approach. Broader use of molecular subtyping in complex disease research is impeded by data heterogeneity, diversity of standards, and ineffective analysis tools. The future of molecular subtyping for ASD and other complex diseases calls for an integrated resource to identify disease mechanisms, classify new patients, and inform effective treatment options. This in turn will empower and accelerate precision medicine and personalized healthcare.
Affiliation(s)
- Roger Higdon
- Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
|
35
|
Cysique LA. Advancing research in NeuroAIDS using collaboration and public data sharing. BMC Med Genomics 2015; 8:76. [PMID: 26560870 PMCID: PMC4642768 DOI: 10.1186/s12920-015-0150-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2015] [Accepted: 11/03/2015] [Indexed: 11/16/2022] Open
Abstract
In this issue of BMC Medical Genomics Griffin et al. present a user-friendly and freely accessible HIV-associated neurocognitive disorder (HAND) genomic database that compiles viral (HIV-1) genetic sequences and other relevant clinical and treatment data. We discuss the benefits and caveats of public data sharing in NeuroAIDS research, while emphasizing the importance of such novel initiatives for advancing knowledge.
Affiliation(s)
- Lucette A Cysique
- University of New South Wales, Sydney, Australia; Neuroscience Research, Sydney, Australia; St. Vincent's Hospital Centre for Applied Medical Research, Sydney, Australia.
|
36
|
Sustaining large-scale infrastructure to promote pre-competitive biomedical research: lessons from mouse genomics. N Biotechnol 2015; 33:280-94. [PMID: 26563511 DOI: 10.1016/j.nbt.2015.10.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/08/2015] [Revised: 08/07/2015] [Accepted: 10/12/2015] [Indexed: 01/25/2023]
Abstract
Bio-repositories and databases for biomedical research enable the efficient community-wide sharing of reagents and data. These archives play an increasingly prominent role in the generation and dissemination of bioresources and data essential for fundamental and translational research. Evidence suggests, however, that current funding and governance models, generally short-term and nationally focused, do not adequately support the role of archives in long-term, transnational endeavours to make and share high-impact resources. Our qualitative case study of the International Knockout Mouse Consortium and the International Mouse Phenotyping Consortium examines new governance mechanisms for archive sustainability. Funders and archive managers highlight in interviews that archives need stable public funding and new revenue-generation models to be sustainable. Sustainability also requires archives, journal publishers, and funders to implement appropriate incentives, associated metrics, and enforcement mechanisms to ensure that researchers use archives to deposit reagents and data to make them publicly accessible for academia and industry alike.
|
37
|
Antman EM, Benjamin EJ, Harrington RA, Houser SR, Peterson ED, Bauman MA, Brown N, Bufalino V, Califf RM, Creager MA, Daugherty A, Demets DL, Dennis BP, Ebadollahi S, Jessup M, Lauer MS, Lo B, MacRae CA, McConnell MV, McCray AT, Mello MM, Mueller E, Newburger JW, Okun S, Packer M, Philippakis A, Ping P, Prasoon P, Roger VL, Singer S, Temple R, Turner MB, Vigilante K, Warner J, Wayte P. Acquisition, Analysis, and Sharing of Data in 2015 and Beyond: A Survey of the Landscape: A Conference Report From the American Heart Association Data Summit 2015. J Am Heart Assoc 2015; 4:e002810. [PMID: 26541391 PMCID: PMC4845234 DOI: 10.1161/jaha.115.002810] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 10/14/2015] [Accepted: 10/14/2015] [Indexed: 01/11/2023]
Abstract
BACKGROUND A 1.5-day interactive forum was convened to discuss critical issues in the acquisition, analysis, and sharing of data in the field of cardiovascular and stroke science. The discussion will serve as the foundation for the American Heart Association's (AHA's) near-term and future strategies in the Big Data area. The concepts evolving from this forum may also inform other fields of medicine and science. METHODS AND RESULTS A total of 47 participants representing stakeholders from 7 domains (patients, basic scientists, clinical investigators, population researchers, clinicians and healthcare system administrators, industry, and regulatory authorities) participated in the conference. Presentation topics included updates on data as viewed from conventional medical and nonmedical sources, building and using Big Data repositories, articulation of the goals of data sharing, and principles of responsible data sharing. Facilitated breakout sessions were conducted to examine what each of the 7 stakeholder domains wants from Big Data under ideal circumstances and the possible roles that the AHA might play in meeting their needs. Important areas that are high priorities for further study regarding Big Data include a description of the methodology of how to acquire and analyze findings, validation of the veracity of discoveries from such research, and integration into investigative and clinical care aspects of future cardiovascular and stroke medicine. Potential roles that the AHA might consider include facilitating a standards discussion (eg, tools, methodology, and appropriate data use), providing education (eg, healthcare providers, patients, investigators), and helping build an interoperable digital ecosystem in cardiovascular and stroke science. 
CONCLUSION There was a consensus across stakeholder domains that Big Data holds great promise for revolutionizing the way cardiovascular and stroke research is conducted and clinical care is delivered; however, there is a clear need for the creation of a vision of how to use it to achieve the desired goals. Potential roles for the AHA center around facilitating a discussion of standards, providing education, and helping establish a cardiovascular digital ecosystem. This ecosystem should be interoperable and needs to interface with the rapidly growing digital object environment of the modern-day healthcare system.
|
38
|
Kaye J, Muddyman D, Smee C, Kennedy K, Bell J. 'Pop-Up' Governance: developing internal governance frameworks for consortia: the example of UK10K. LIFE SCIENCES, SOCIETY AND POLICY 2015; 11:10. [PMID: 26412243 PMCID: PMC4584211 DOI: 10.1186/s40504-015-0028-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/27/2015] [Accepted: 09/15/2015] [Indexed: 06/05/2023]
Abstract
Innovations in information technologies have facilitated the development of new styles of research networks and forms of governance. This is evident in genomics where increasingly, research is carried out by large, interdisciplinary consortia focussing on a specific research endeavour. The UK10K project is an example of a human genomics consortium funded to provide insights into the genomics of rare conditions, and establish a community resource from generated sequence data. To achieve its objectives according to the agreed timetable, the UK10K project established an internal governance system to expedite the research and to deal with the complex issues that arose. The project's governance structure exemplifies a new form of network governance called 'pop-up' governance. 'Pop-up' because: it was put together quickly, existed for a specific period, was designed for a specific purpose, and was dismantled easily on project completion. In this paper, we use UK10K to describe how 'pop-up' governance works on the ground and how relational, hierarchical and contractual governance mechanisms are used in this new form of network governance.
Affiliation(s)
- Jane Kaye
- HeLEX Centre, Nuffield Department of Population Health, University of Oxford, Oxford, UK.
- Nuffield Department of Population Health, HeLEX - Centre for Health, Law and Emerging Technologies, University of Oxford, Old Road Campus, Oxford, OX3 7LF, UK.
- Carol Smee
- Wellcome Trust Sanger Institute, Cambridge, UK
- Jessica Bell
- HeLEX Centre, Nuffield Department of Population Health, University of Oxford, Oxford, UK
|
39
|
Alterovitz G, Warner J, Zhang P, Chen Y, Ullman-Cullere M, Kreda D, Kohane IS. SMART on FHIR Genomics: facilitating standardized clinico-genomic apps. J Am Med Inform Assoc 2015. [PMID: 26198304 DOI: 10.1093/jamia/ocv045] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Supporting clinical decision support for personalized medicine will require linking genome and phenome variants to a patient's electronic health record (EHR), at times on a vast scale. Clinico-genomic data standards will be needed to unify how genomic variant data are accessed from different sequencing systems. METHODS A specification for the basis of a clinico-genomic standard, building upon the current Health Level Seven International Fast Healthcare Interoperability Resources (FHIR®) standard, was developed. An FHIR application programming interface (API) layer was attached to proprietary sequencing platforms and EHRs in order to expose gene variant data for presentation to the end-user. Three representative apps based on the SMART platform were built to test end-to-end feasibility, including integration of genomic and clinical data. RESULTS Successful design, deployment, and use of the API were demonstrated, and the API was adopted by the HL7 Clinical Genomics Workgroup. Feasibility was shown through development of three apps by various types of users with differing background levels and locations. CONCLUSION This prototyping work suggests that an entirely data (and web) standards-based approach could prove both effective and efficient for advancing personalized medicine.
Affiliation(s)
- Gil Alterovitz
- Children's Hospital Informatics Program, Boston, MA Center for Biomedical Informatics, Harvard Medical School, Boston, MA Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
- Jeremy Warner
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN Department of Medicine, Division of Hematology/Oncology, Vanderbilt University, Nashville, TN
- Peijin Zhang
- Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA
- Yishen Chen
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL
- David Kreda
- Center for Biomedical Informatics, Harvard Medical School, Boston, MA
- Isaac S Kohane
- Children's Hospital Informatics Program, Boston, MA Center for Biomedical Informatics, Harvard Medical School, Boston, MA Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
|
40
|
Federer LM, Lu YL, Joubert DJ, Welsh J, Brandys B. Biomedical Data Sharing and Reuse: Attitudes and Practices of Clinical and Scientific Research Staff. PLoS One 2015; 10:e0129506. [PMID: 26107811 PMCID: PMC4481309 DOI: 10.1371/journal.pone.0129506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 01/13/2015] [Accepted: 05/08/2015] [Indexed: 01/01/2023] Open
Abstract
Background Significant efforts are underway within the biomedical research community to encourage sharing and reuse of research data in order to enhance research reproducibility and enable scientific discovery. While some technological challenges do exist, many of the barriers to sharing and reuse are social in nature, arising from researchers’ concerns about and attitudes toward sharing their data. In addition, clinical and basic science researchers face their own unique sets of challenges to sharing data within their communities. This study investigates these differences in experiences with and perceptions about sharing data, as well as barriers to sharing among clinical and basic science researchers. Methods Clinical and basic science researchers in the Intramural Research Program at the National Institutes of Health were surveyed about their attitudes toward and experiences with sharing and reusing research data. Of 190 respondents to the survey, the 135 respondents who identified themselves as clinical or basic science researchers were included in this analysis. Odds ratio and Fisher’s exact tests were the primary methods to examine potential relationships between variables. Worst-case scenario sensitivity tests were conducted when necessary. Results and Discussion While most respondents considered data sharing and reuse important to their work, they generally rated their expertise as low. Sharing data directly with other researchers was common, but most respondents did not have experience with uploading data to a repository. A number of significant differences exist between the attitudes and practices of clinical and basic science researchers, including their motivations for sharing, their reasons for not sharing, and the amount of work required to prepare their data. Conclusions Even within the scope of biomedical research, addressing the unique concerns of diverse research communities is important to encouraging researchers to share and reuse data. 
Efforts at promoting data sharing and reuse should be aimed at solving not only technological problems, but also addressing researchers’ concerns about sharing their data. Given the varied practices of individual researchers and research communities, standardizing data practices like data citation and repository upload could make sharing and reuse easier.
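The abstract above names odds ratios and Fisher's exact tests as its primary methods. As a minimal sketch of what those two statistics compute for a 2x2 table, the following uses only the Python standard library. The counts are hypothetical and are not taken from the survey, and the function names are illustrative assumptions rather than anything used by the authors.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]] (b and c must be non-zero)."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows = clinical / basic science researchers,
# columns = has / has not uploaded data to a repository.
table = (8, 2, 1, 5)
print("odds ratio:", odds_ratio(*table))
print("two-sided p-value:", fisher_exact_two_sided(*table))
```

Fisher's exact test is the usual choice when some cell counts are small, as they can be in a survey with 135 included respondents, because it does not rely on a large-sample approximation.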
Affiliation(s)
- Lisa M. Federer
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Ya-Ling Lu
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Douglas J. Joubert
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Judith Welsh
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Barbara Brandys
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
|
41
|
Intuitive web-based experimental design for high-throughput biomedical data. BIOMED RESEARCH INTERNATIONAL 2015; 2015:958302. [PMID: 25954760 PMCID: PMC4411450 DOI: 10.1155/2015/958302] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/26/2014] [Accepted: 03/09/2015] [Indexed: 11/25/2022]
Abstract
Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data is accompanied by accurate metadata annotation. Particularly in high-throughput experiments, intelligent approaches are needed to keep track of the experimental design, including the conditions that are studied as well as information that might be interesting for failure analysis or further experiments in the future. In addition to the management of this information, means for an integrated design and interfaces for structured data annotation are urgently needed by researchers. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface allowing the collection of arbitrary metadata. To exchange and edit information we provide a spreadsheet-based, human-readable format. Subsequently, sample sheets with identifiers and metainformation for data generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model.
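To make the factor-based design idea concrete, here is a minimal sketch that expands a set of experimental factors into a full-factorial sample sheet and prints it as tab-separated text. It assumes a full-factorial layout, and the factor names, identifier scheme and `build_sample_sheet` helper are hypothetical. It does not reproduce the web-based system described in the abstract.

```python
from itertools import product


def build_sample_sheet(factors, replicates=1, prefix="S"):
    """Return one row (dict) per combination of factor levels and replicate."""
    names = list(factors)
    rows = []
    for combo in product(*(factors[name] for name in names)):
        for rep in range(1, replicates + 1):
            # Sequential, zero-padded identifiers keep rows sortable in a spreadsheet.
            row = {"sample_id": f"{prefix}{len(rows) + 1:03d}", "replicate": rep}
            row.update(zip(names, combo))
            rows.append(row)
    return rows


# Hypothetical factors, for illustration only.
factors = {
    "genotype": ["wild type", "mutant"],
    "treatment": ["control", "drug"],
    "timepoint_h": [0, 24],
}
sheet = build_sample_sheet(factors, replicates=2)

# Emit a spreadsheet-friendly, tab-separated table with a header row.
print("\t".join(sheet[0].keys()))
for row in sheet:
    print("\t".join(str(value) for value in row.values()))
```

With three two-level factors and two replicates this yields 16 rows. Adding a factor or a level only changes the `factors` mapping, which is the main appeal of describing an experiment by its factors rather than by listing samples by hand.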
|
42
|
Dubé L, Labban A, Moubarac JC, Heslop G, Ma Y, Paquet C. A nutrition/health mindset on commercial Big Data and drivers of food demand in modern and traditional systems. Ann N Y Acad Sci 2015; 1331:278-295. [PMID: 25514866 DOI: 10.1111/nyas.12595] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 12/29/2022]
Abstract
Building greater reciprocity between traditional and modern food systems and better convergence of human and economic development outcomes may enable the production and consumption of accessible, affordable, and appealing nutritious food for all. Information being key to such transformations, this roadmap paper offers a strategy that capitalizes on Big Data and advanced analytics, setting the foundation for an integrative intersectoral knowledge platform to better inform and monitor behavioral change and ecosystem transformation. Building upon the four P's of marketing (product, price, promotion, placement), we examine digital commercial marketing data through the lenses of the four A's of food security (availability, accessibility, affordability, appeal) using advanced consumer choice analytics for archetypal traditional (fresh fruits and vegetables) and modern (soft drinks) product categories. We demonstrate that business practices typically associated with the latter also have an important, if not more important, impact on purchases of the former category. Implications and limitations of the approach are discussed.
Affiliation(s)
- Laurette Dubé
- Desautels Faculty of Management, McGill University, Montréal, Québec, Canada; McGill Centre for the Convergence of Health and Economics (MCCHE), McGill University, Montréal, Québec, Canada
- Alice Labban
- Desautels Faculty of Management, McGill University, Montréal, Québec, Canada
- Gabriela Heslop
- McGill Centre for the Convergence of Health and Economics (MCCHE), McGill University, Montréal, Québec, Canada
- Yu Ma
- Department of Marketing, Business Economics, and Law, Alberta School of Business, University of Alberta, Edmonton, Alberta, Canada
- Catherine Paquet
- School of Population Health, University of South Australia, Adelaide, Australia; Douglas Hospital Research Center, Douglas Mental Health University Institute, Montréal, Québec, Canada
|
43
|
Swamidass SJ, Matlock M, Rozenblit L. Securely measuring the overlap between private datasets with cryptosets. PLoS One 2015; 10:e0117898. [PMID: 25714898 PMCID: PMC4340911 DOI: 10.1371/journal.pone.0117898] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/19/2014] [Accepted: 01/04/2015] [Indexed: 11/19/2022] Open
Abstract
Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
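The idea of estimating overlap from a shareable summary can be illustrated with a small sketch. It assumes the summary is a fixed-length histogram of hashed identifiers, and it estimates the overlap with a simple method-of-moments formula derived from the expected number of chance collisions. The hash function, the histogram length, the estimator and the function names are assumptions made for illustration and may differ from the published cryptoset method.

```python
import hashlib


def cryptoset(ids, length=1000):
    """Histogram of hashed identifiers: bin i counts the ids whose hash falls in bin i."""
    counts = [0] * length
    for item in set(ids):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % length] += 1
    return counts


def estimate_overlap(cs_a, cs_b):
    """Estimate how many identifiers two datasets share from their histograms.

    A shared id always lands in the same bin of both histograms, while an
    unrelated pair collides with probability 1/length, so the expected dot
    product is n + (a*b - n)/length. Solving for n gives the estimate below.
    """
    length = len(cs_a)
    if length != len(cs_b) or length < 2:
        raise ValueError("histograms must have the same length (at least 2)")
    size_a, size_b = sum(cs_a), sum(cs_b)
    dot = sum(x * y for x, y in zip(cs_a, cs_b))
    return (dot - size_a * size_b / length) / (1 - 1 / length)


# Hypothetical identifiers: two sets of 1000 ids that share 500.
set_a = [f"patient-{i}" for i in range(1000)]
set_b = [f"patient-{i}" for i in range(500, 1500)]
estimate = estimate_overlap(cryptoset(set_a), cryptoset(set_b))
print(f"estimated overlap: {estimate:.0f} (true overlap: 500)")
```

Only the histograms need to be exchanged, not the identifiers. In this sketch a shorter histogram reveals less about any single identifier but makes the estimate noisier, so the length is a trade-off between privacy and accuracy.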
Affiliation(s)
- S. Joshua Swamidass
- Department of Pathology, Washington University School of Medicine, St. Louis, MO, USA
- Matthew Matlock
- Department of Pathology, Washington University School of Medicine, St. Louis, MO, USA
|
44
|
Kelder T, Summer G, Caspers M, van Schothorst EM, Keijer J, Duivenvoorde L, Klaus S, Voigt A, Bohnert L, Pico C, Palou A, Bonet ML, Dembinska-Kiec A, Malczewska-Malec M, Kieć-Wilk B, del Bas JM, Caimari A, Arola L, van Erk M, van Ommen B, Radonjic M. White adipose tissue reference network: a knowledge resource for exploring health-relevant relations. GENES & NUTRITION 2015; 10:439. [PMID: 25466819 PMCID: PMC4252261 DOI: 10.1007/s12263-014-0439-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/24/2014] [Accepted: 10/24/2014] [Indexed: 12/13/2022]
Abstract
Optimal health is maintained by interaction of multiple intrinsic and environmental factors at different levels of complexity, from molecular to physiological to social. Understanding and quantification of these interactions will aid design of successful health interventions. We introduce the reference network concept as a platform for multi-level exploration of biological relations relevant for metabolic health, by integration and mining of biological interactions derived from public resources and context-specific experimental data. A White Adipose Tissue Health Reference Network (WATRefNet) was constructed as a resource for discovery and prioritization of mechanism-based biomarkers for white adipose tissue (WAT) health status and the effect of food and drug compounds on WAT health status. The WATRefNet (6,797 nodes and 32,171 edges) is based on (1) experimental data obtained from 10 studies addressing different adiposity states, (2) seven public knowledge bases of molecular interactions, (3) expert definitions of five physiologically relevant processes key to WAT health, namely WAT expandability, Oxidative capacity, Metabolic state, Oxidative stress and Tissue inflammation, and (4) a collection of relevant biomarkers of these processes identified by BIOCLAIMS (http://bioclaims.uib.es). The WATRefNet encompasses multiple layers of biological complexity as it contains various types of nodes and edges that represent different biological levels and interactions. We have validated the reference network by showing overrepresentation of anti-obesity drug targets, pathology-associated genes and differentially expressed genes from an external disease model dataset. The resulting network has been used to extract subnetworks specific to the above-mentioned expert-defined physiological processes.
Each of these process-specific signatures represents a mechanistically supported composite biomarker for assessing and quantifying the effect of interventions on a physiological aspect that determines WAT health status. Following this principle, five anti-diabetic drug interventions and one diet intervention were scored for the match of their expression signature to the five biomarker signatures derived from the WATRefNet. This confirmed previous observations of successful intervention by dietary lifestyle and revealed WAT-specific effects of drug interventions. The WATRefNet represents a sustainable knowledge resource for extraction of relevant relationships such as mechanisms of action, nutrient intervention targets and biomarkers and for assessment of health effects for support of health claims made on food products.
Affiliation(s)
- Thomas Kelder
- Microbiology & Systems Biology, TNO, Zeist, The Netherlands
- Present Address: EdgeLeap B.V., Hooghiemstraplein 15, 3514 AX Utrecht, The Netherlands
- Georg Summer
- Microbiology & Systems Biology, TNO, Zeist, The Netherlands
- CARIM, Maastricht University, Maastricht, The Netherlands
- Jaap Keijer
- Human and Animal Physiology, Wageningen University, Wageningen, The Netherlands
- Loes Duivenvoorde
- Human and Animal Physiology, Wageningen University, Wageningen, The Netherlands
- Susanne Klaus
- Group of Energy Metabolism, German Institute of Human Nutrition in Potsdam, Nuthetal, Germany
- Anja Voigt
- Group of Energy Metabolism, German Institute of Human Nutrition in Potsdam, Nuthetal, Germany
- Laura Bohnert
- Group of Energy Metabolism, German Institute of Human Nutrition in Potsdam, Nuthetal, Germany
- Catalina Pico
- Molecular Biology, Nutrition and Biotechnology (Nutrigenomics), University of the Balearic Islands (UIB), Palma de Mallorca, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Palma de Mallorca, Spain
- Andreu Palou
- Molecular Biology, Nutrition and Biotechnology (Nutrigenomics), University of the Balearic Islands (UIB), Palma de Mallorca, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Palma de Mallorca, Spain
- M. Luisa Bonet
- Molecular Biology, Nutrition and Biotechnology (Nutrigenomics), University of the Balearic Islands (UIB), Palma de Mallorca, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Palma de Mallorca, Spain
- Aldona Dembinska-Kiec
- Department of Clinical Biochemistry, Jagiellonian University Medical College, Krakow, Poland
- Beata Kieć-Wilk
- Department of Metabolic Disorders, Jagiellonian University Medical College, Krakow, Poland
- Josep M. del Bas
- Centre Tecnològic de Nutrició i Salut (CTNS), TECNIO, Reus, Spain
- Antoni Caimari
- Centre Tecnològic de Nutrició i Salut (CTNS), TECNIO, Reus, Spain
- Lluis Arola
- Centre Tecnològic de Nutrició i Salut (CTNS), TECNIO, Reus, Spain
- Rovira i Virgili University, Tarragona, Spain
- Marjan van Erk
- Microbiology & Systems Biology, TNO, Zeist, The Netherlands
- Ben van Ommen
- Microbiology & Systems Biology, TNO, Zeist, The Netherlands
- Marijana Radonjic
- Microbiology & Systems Biology, TNO, Zeist, The Netherlands
- Present Address: EdgeLeap B.V., Hooghiemstraplein 15, 3514 AX Utrecht, The Netherlands
|
45
|
Angrist M, Cook-Deegan R. Distributing the future: The weak justifications for keeping human genomic databases secret and the challenges and opportunities in reverse engineering them. Appl Transl Genom 2014; 3:124-127. [PMID: 25642409 PMCID: PMC4307597 DOI: 10.1016/j.atg.2014.09.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 06/04/2023]
Affiliation(s)
- Misha Angrist
- Science and Society, Social Science Research Institute, Duke University, Durham, NC, United States
- Robert Cook-Deegan
- Sanford School of Public Policy, Duke University, Durham, NC, United States
|
46
|
Heatherly R, Denny JC, Haines JL, Roden DM, Malin BA. Size matters: how population size influences genotype-phenotype association studies in anonymized data. J Biomed Inform 2014; 52:243-50. [PMID: 25038554 PMCID: PMC4260994 DOI: 10.1016/j.jbi.2014.07.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/25/2013] [Revised: 05/21/2014] [Accepted: 07/07/2014] [Indexed: 12/29/2022]
Abstract
OBJECTIVE Electronic medical record (EMR) data are increasingly incorporated into genome-phenome association studies. Investigators hope to share data, but there are concerns that the data may be "re-identified" through the exploitation of various features, such as combinations of standardized clinical codes. Formal anonymization algorithms (e.g., k-anonymization) can prevent such violations, but prior studies suggest that the size of the population available for anonymization may influence the utility of the resulting data. We systematically investigate this issue using a large-scale biorepository and EMR system through which we evaluate the ability of researchers to learn from anonymized data for genome-phenome association studies under various conditions. METHODS We use a k-anonymization strategy to simulate a data protection process (on data sets containing clinical codes) for resources of similar size to those found at nine academic medical institutions within the United States. Following the protection process, we replicate an existing genome-phenome association study and compare the discoveries using the protected data and the original data through the correlation (r²) of the p-values of association significance. RESULTS Our investigation shows that anonymizing an entire dataset with respect to the population from which it is derived yields significantly more utility than small study-specific datasets anonymized unto themselves. When evaluated using the correlation of genome-phenome association strengths on anonymized data versus original data, for all nine simulated sites, results from the largest-scale anonymizations (population ∼100,000) retained better utility than those from smaller sizes (population ∼6000-75,000). We observed a general trend of increasing r² for larger data set sizes: r²=0.9481 for small-sized datasets, r²=0.9493 for moderately-sized datasets, r²=0.9934 for large-sized datasets.
CONCLUSIONS This research implies that regardless of the overall size of an institution's data, there may be significant benefits to anonymization of the entire EMR, even if the institution is planning on releasing only data about a specific cohort of patients.
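The two ideas named in this abstract, a k-anonymity condition on clinical-code records and an r² comparison of association p-values before and after protection, can be illustrated with a small sketch. It is not the authors' implementation: the function names `is_k_anonymous` and `r_squared`, the choice to treat each record's full set of codes as the identifying combination, and the toy values are all assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(records, k):
    """Return True if every distinct code combination appears in >= k records.

    Assumption: each record is an iterable of clinical codes, and the whole
    set of codes is treated as the potentially identifying combination.
    """
    counts = Counter(frozenset(record) for record in records)
    return all(count >= k for count in counts.values())


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: three records, one of which has a unique code set.
toy_records = [{"250.0", "401.9"}, {"401.9", "250.0"}, {"714.0"}]
print(is_k_anonymous(toy_records, 2))

# Hypothetical p-values for the same associations on original vs. protected data.
p_original = [0.001, 0.020, 0.300, 0.750]
p_protected = [0.002, 0.025, 0.280, 0.800]
print(r_squared(p_original, p_protected))
```

An r² close to 1 in such a comparison would indicate that the protected data reproduce the association signal of the original data, which is the sense of "utility" used in the abstract.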
Affiliation(s)
- Raymond Heatherly
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA.
| | - Joshua C Denny
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA; Department of Medicine, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA
| | - Jonathan L Haines
- Department of Epidemiology and Biostatistics, University School of Medicine, Case Western Reserve University, USA
| | - Dan M Roden
- Department of Medicine, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA; Department of Pharmacology, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA
| | - Bradley A Malin
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA; Department of Electrical Engineering and Computer Science, School of Engineering, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA
| |
|
47
|
Kyrpides NC, Hugenholtz P, Eisen JA, Woyke T, Göker M, Parker CT, Amann R, Beck BJ, Chain PSG, Chun J, Colwell RR, Danchin A, Dawyndt P, Dedeurwaerdere T, DeLong EF, Detter JC, De Vos P, Donohue TJ, Dong XZ, Ehrlich DS, Fraser C, Gibbs R, Gilbert J, Gilna P, Glöckner FO, Jansson JK, Keasling JD, Knight R, Labeda D, Lapidus A, Lee JS, Li WJ, MA J, Markowitz V, Moore ERB, Morrison M, Meyer F, Nelson KE, Ohkuma M, Ouzounis CA, Pace N, Parkhill J, Qin N, Rossello-Mora R, Sikorski J, Smith D, Sogin M, Stevens R, Stingl U, Suzuki KI, Taylor D, Tiedje JM, Tindall B, Wagner M, Weinstock G, Weissenbach J, White O, Wang J, Zhang L, Zhou YG, Field D, Whitman WB, Garrity GM, Klenk HP. Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains. PLoS Biol 2014; 12:e1001920. [PMID: 25093819 PMCID: PMC4122341 DOI: 10.1371/journal.pbio.1001920] [Citation(s) in RCA: 138] [Impact Index Per Article: 13.8] [Indexed: 11/22/2022] Open
Abstract
This manuscript calls for an international effort to generate a comprehensive catalog from genome sequences of all the archaeal and bacterial type strains. Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ∼11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
Affiliation(s)
- Nikos C. Kyrpides
- DOE-Joint Genome Institute, Walnut Creek, California, United States of America
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- * E-mail: (NCK); (HPK)
| | - Philip Hugenholtz
- Australian Centre for Ecogenomics Research, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Jonathan A. Eisen
- University of California, Davis, Davis, California, United States of America
| | - Tanja Woyke
- DOE-Joint Genome Institute, Walnut Creek, California, United States of America
| | - Markus Göker
- DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany
| | | | - Rudolf Amann
- Max Planck Institute for Marine Microbiology, Bremen, Germany
| | - Brian J. Beck
- American Type Culture Collection (ATCC), Manassas, Virginia, United States of America
| | - Patrick S. G. Chain
- Los Alamos National Laboratory, Bioscience Division, Los Alamos, New Mexico, United States of America
| | - Jongsik Chun
- School of Biological Sciences and Chunlab Inc., Seoul National University, Seoul, Korea
| | - Rita R. Colwell
- University of Maryland, College Park, College Park, Maryland, United States of America
- Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
| | | | - Peter Dawyndt
- Ghent University, Department of Applied Mathematics and Computer Science, Ghent, Belgium
| | - Tom Dedeurwaerdere
- Centre for Philosophy of Law, Université catholique de Louvain, Louvain-la-Neuve, Belgium
| | - Edward F. DeLong
- Department of Civil and Environmental Engineering and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - John C. Detter
- Los Alamos National Laboratory, Bioscience Division, Los Alamos, New Mexico, United States of America
| | - Paul De Vos
- Ghent University, Department of Applied Mathematics and Computer Science, Ghent, Belgium
- Ghent University, BCCM/LMG Bacteria collection, Laboratory of Microbiology, Ghent, Belgium
| | - Timothy J. Donohue
- University of Wisconsin-Madison, Great Lakes Bioenergy Research Center, Madison, Wisconsin, United States of America
| | - Xiu-Zhu Dong
- Bioresource Center (BRC) of Institute of Microbiology, Chinese Academy of Sciences, P. R. China
| | - Dusko S. Ehrlich
- Institut National de la Recherche Agronomique, Jouy en Josas, France
| | - Claire Fraser
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
| | - Richard Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
| | - Jack Gilbert
- Institute for Genomics and Systems Biology, Argonne National Laboratory, Argonne, Illinois, United States of America
| | - Paul Gilna
- BioEnergy Science Center (BESC), Oak Ridge National Laboratory, Knoxville, Tennessee, United States of America
| | - Frank Oliver Glöckner
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Jacobs University Bremen gGmbH, Bremen, Germany
| | - Janet K. Jansson
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Jay D. Keasling
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Joint BioEnergy Institute (JBEI), Berkeley, California, United States of America
| | - Rob Knight
- Howard Hughes Medical Institute and Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado, United States of America
| | - David Labeda
- ARS, USDA, National Center for Agricultural Utilization Research, Peoria, Illinois, United States of America
| | - Alla Lapidus
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
- Algorithmic Biology Lab, St. Petersburg Academic University, St. Petersburg, Russia
| | - Jung-Sook Lee
- Korean Collection for Type Cultures (KCTC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon, Korea
| | - Wen-Jun Li
- The Key Laboratory for Microbial Resources of the Ministry of Education, Kunming, People's Republic of China
| | - Juncai MA
- China General Microbiological Culture Collection Center (CGMCC), Institute of Microbiology, Chinese Academy of Sciences, Beijing, P. R. China
| | - Victor Markowitz
- DOE-Joint Genome Institute, Walnut Creek, California, United States of America
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Edward R. B. Moore
- CCUG - Culture Collection University of Gothenburg, Sahlgrenska Academy of the University of Gothenburg, Gothenburg, Sweden
| | - Mark Morrison
- Diamantina Institute, The University of Queensland, St Lucia, Queensland, Australia
| | - Folker Meyer
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, United States of America
| | - Karen E. Nelson
- The J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Moriya Ohkuma
- Riken Bioresource Center, Japan Collection of Microorganisms, Hirosawa, Wako, Saitama, Japan
| | - Christos A. Ouzounis
- Chemical Process & Energy Resources Institute, Centre for Research & Technology, Thessalonica, Greece
- Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
| | - Norman Pace
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado, United States of America
| | - Julian Parkhill
- The Pathogen Genomics, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
| | - Nan Qin
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
| | - Ramon Rossello-Mora
- Institut Mediterrani d'Estudis Avançats (IMEDEA, CSIC-UIB), Esporles, Illes Balears, Spain
| | - Johannes Sikorski
- DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany
| | - David Smith
- CABI, Bakeham Lane, Egham, Surrey, United Kingdom
| | - Mitch Sogin
- Josephine Bay Paul Center for Comparative Evolution and Molecular Biology, MBL, Woods Hole, Massachusetts, United States of America
| | - Rick Stevens
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, United States of America
| | - Uli Stingl
- Red Sea Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
| | | | - Dorothea Taylor
- NamesforLife, LLC, East Lansing, Michigan, United States of America
| | - Jim M. Tiedje
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
| | - Brian Tindall
- DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany
| | - Michael Wagner
- Department of Microbial Ecology, University of Vienna, Vienna, Austria
| | - George Weinstock
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut
| | - Jean Weissenbach
- Commissariat à l'Energie Atomique (CEA), Genoscope, Evry, France
| | - Owen White
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
| | - Jun Wang
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Lixin Zhang
- Bioresource Center (BRC) of Institute of Microbiology, Chinese Academy of Sciences, P. R. China
- Chinese Academy of Sciences Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, P. R. China
| | - Yu-Guang Zhou
- China General Microbiological Culture Collection Center (CGMCC), Institute of Microbiology, Chinese Academy of Sciences, Beijing, P. R. China
| | - Dawn Field
- U.K. Natural Environment Research Council (NERC), Environmental Bioinformatics Centre, Oxford, United Kingdom
| | - William B. Whitman
- Department of Microbiology, University of Georgia, Athens, Georgia, United States of America
| | - George M. Garrity
- NamesforLife, LLC, East Lansing, Michigan, United States of America
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
| | - Hans-Peter Klenk
- DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany
- * E-mail: (NCK); (HPK)
| |
|
48
|
Tenenbaum JD, Sansone SA, Haendel M. A sea of standards for omics data: sink or swim? J Am Med Inform Assoc 2014; 21:200-3. [PMID: 24076747 PMCID: PMC3932466 DOI: 10.1136/amiajnl-2013-002066] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 05/31/2013] [Revised: 07/08/2013] [Accepted: 09/10/2013] [Indexed: 11/29/2022] Open
Abstract
In the era of Big Data, omic-scale technologies, and increasing calls for data sharing, it is generally agreed that the use of community-developed, open data standards is critical. Far less agreed upon is exactly which data standards should be used, the criteria by which one should choose a standard, or even what constitutes a data standard. It is impossible simply to choose a domain and have it naturally follow which data standards should be used in all cases. The 'right' standards to use are often dependent on the use case scenarios for a given project. Potential downstream applications for the data, however, may not always be apparent at the time the data are generated. Similarly, technology evolves, adding further complexity. Would-be standards adopters must strike a balance between planning for the future and minimizing the burden of compliance. Better tools and resources are required to help guide this balancing act.
Affiliation(s)
- Jessica D Tenenbaum
- Duke Translational Medicine Institute, Duke University, Durham, North Carolina, USA
| | | | - Melissa Haendel
- Library and Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| |
|
49
|
Abstract
The field of human genomics has led advances in the sharing of data with a view to facilitating translation of research into innovations for human health. This change in scientific practice has been implemented through new policy developed by many principal investigators, project managers and funders, which has ultimately led to new forms of practice and innovative governance models for data sharing. Here, we examine the development of the governance of data sharing in genomics, and explore some of the key challenges associated with the design and implementation of these policies. We examine how the incremental nature of policy design, the perennial problem of consent, the gridlock caused by multiple and overlapping access systems, the administrative burden and the problems with incentives and acknowledgment all have an impact on the potential for data sharing to be maximized. We conclude by proposing ways in which the scientific community can address these problems, to improve the sustainability of data sharing into the future.
Affiliation(s)
- Jane Kaye
- HeLEX - Centre for Health, Law and Emerging Technologies, Department of Public Health, University of Oxford, Old Road Campus, Oxford OX3 7LF, UK
| | - Naomi Hawkins
- University of Exeter Law School, Amory Building, Rennes Drive, Exeter EX4 4RJ, UK
| |
|
50
|
Abstract
Portraying high-throughput genomics research as a wild frontier, Andrea Bild and colleagues use caricatures to highlight common pitfalls in genomic research and provide recommendations for navigating this terrain.
|