1. Kumar S, Tao Q, Lamarca AP, Tamura K. Computational Reproducibility of Molecular Phylogenies. Mol Biol Evol 2023; 40:msad165. [PMID: 37467477; PMCID: PMC10370456; DOI: 10.1093/molbev/msad165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 07/11/2023] [Accepted: 07/12/2023] [Indexed: 07/21/2023] Open
Abstract
Repeated runs of the same program can generate different molecular phylogenies from identical data sets under the same analytical conditions. This lack of reproducibility of inferred phylogenies casts a long shadow on downstream research employing these phylogenies in areas such as comparative genomics, systematics, and functional biology. We have assessed the relative accuracies and log-likelihoods of alternative phylogenies generated for computer-simulated and empirical data sets. Our findings indicate that these alternative phylogenies reconstruct evolutionary relationships with comparable accuracy. They also have similar log-likelihoods that are not inferior to the log-likelihoods of the true tree. We determined that the direct relationship between irreproducibility and inaccuracy is due to their common dependence on the amount of phylogenetic information in the data. While computational reproducibility can be enhanced through more extensive heuristic searches for the maximum likelihood tree, this does not lead to higher accuracy. We conclude that computational irreproducibility plays a minor role in molecular phylogenetics.
Affiliation(s)
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Alessandra P Lamarca
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Koichiro Tamura
  - Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
  - Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
2. Gao J, May MR, Rannala B, Moore BR. PrioriTree: a utility for improving phylodynamic analyses in BEAST. Bioinformatics 2023; 39:6967033. [PMID: 36592035; PMCID: PMC9841403; DOI: 10.1093/bioinformatics/btac849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Revised: 12/20/2022] [Accepted: 12/30/2022] [Indexed: 01/03/2023] Open
Abstract
SUMMARY: Phylodynamic methods are central to studies of the geographic and demographic history of disease outbreaks. Inference under discrete-geographic phylodynamic models, which involve many parameters that must be inferred from minimal information, is inherently sensitive to our prior beliefs about the model parameters. We present an interactive utility, PrioriTree, to help researchers identify and accommodate prior sensitivity in discrete-geographic inferences. Specifically, PrioriTree provides a suite of functions to generate input files for, and summarize output from, BEAST analyses for performing robust Bayesian inference, data-cloning analyses and assessing the relative and absolute fit of candidate discrete-geographic (prior) models to empirical datasets. AVAILABILITY AND IMPLEMENTATION: PrioriTree is distributed as an R package available at https://github.com/jsigao/prioritree, with a comprehensive user manual provided at https://bookdown.org/jsigao/prioritree_manual/.
Affiliation(s)
- Jiansi Gao
  - To whom correspondence should be addressed
- Michael R May
  - Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Bruce Rannala
  - Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
- Brian R Moore
  - Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA
3. Xu S, Li L, Luo X, Chen M, Tang W, Zhan L, Dai Z, Lam TT, Guan Y, Yu G. Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. IMETA 2022; 1:e56. [PMID: 38867905; PMCID: PMC10989815; DOI: 10.1002/imt2.56] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 06/29/2022] [Revised: 09/05/2022] [Accepted: 09/14/2022] [Indexed: 06/14/2024]
Abstract
While phylogenetic trees and associated data have been getting easier to generate, it has been difficult to reuse, combine, and synthesize the information they provided, because published trees are often only available as image files and associated data are often stored in incompatible formats. To increase the reproducibility and reusability of phylogenetic data, the ggtree object was designed for storing phylogenetic tree and associated data, as well as visualization directives. The ggtree object itself is a graphic object and can be rendered as a static image. More importantly, the input tree and associated data that are used in visualization can be extracted from the graphic object, making it an ideal data structure for publishing tree (image, tree, and data in one single object) and thus enhancing data reuse and analytical reproducibility, as well as facilitating integrative and comparative studies. The ggtree package is freely available at https://www.bioconductor.org/packages/ggtree.
Affiliation(s)
- Shuangbin Xu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Lin Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Xiao Luo
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Meijun Chen
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Wenli Tang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Li Zhan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Zehan Dai
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Tommy T. Lam
  - State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China
- Yi Guan
  - State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China
  - Joint Institute of Virology (Shantou University–The University of Hong Kong), Shantou University, Shantou, China
- Guangchuang Yu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
4. Evans TR, Pownall M, Collins E, Henderson EL, Pickering JS, O'Mahony A, Zaneva M, Jaquiery M, Dumbalska T. A network of change: united action on research integrity. BMC Res Notes 2022; 15:141. [PMID: 35421988; PMCID: PMC9008612; DOI: 10.1186/s13104-022-06026-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/16/2022] [Accepted: 04/04/2022] [Indexed: 11/10/2022] Open
Abstract
The last decade has seen renewed concern within the scientific community over the reproducibility and transparency of research findings. This paper outlines some of the various responsibilities of stakeholders in addressing the systemic issues that contribute to this concern. In particular, this paper asserts that a united, joined-up approach is needed, in which all stakeholders, including researchers, universities, funders, publishers, and governments, work together to set standards of research integrity and engender scientific progress and innovation. Using two developments as examples: the adoption of Registered Reports as a discrete initiative, and the use of open data as an ongoing norm change, we discuss the importance of collaboration across stakeholders.
Affiliation(s)
- Thomas Rhys Evans
  - School of Human Sciences, University of Greenwich, London, England
  - Institute for Lifecourse Development, University of Greenwich, London, England
- Aoife O'Mahony
  - School of Psychology, Cardiff University, Cardiff, Wales
5. Ribeiro CVR, Oliveira LP, Batista R, De Sousa M. UCEasy: A software package for automating and simplifying the analysis of ultraconserved elements (UCEs). Biodivers Data J 2021; 9:e78132. [PMID: 34934383; PMCID: PMC8683391; DOI: 10.3897/bdj.9.e78132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Accepted: 12/09/2021] [Indexed: 11/25/2022] Open
Abstract
Background: The use of Ultraconserved Elements (UCEs) as genetic markers in phylogenomics has become popular and has provided promising results. Although UCE data can be easily obtained from targeted enriched sequencing, the protocol for in silico analysis of UCEs consists of the execution of heterogeneous and complex tools, a challenge for scientists without training in bioinformatics. Developing tools with the adoption of best practices in research software can lessen this problem by improving the execution of computational experiments, thus promoting better reproducibility. New information: We present UCEasy, an easy-to-install and easy-to-use software package with a simple command line interface that facilitates the computational analysis of UCEs from sequencing samples, following the best practices of research software. UCEasy is a wrapper that standardises, automates and simplifies the quality control of raw reads, assembly and extraction and alignment of UCEs, generating at the end a data matrix with different levels of completeness that can be used to infer phylogenetic trees. We demonstrate the functionalities of UCEasy by reproducing the published results of phylogenomic studies of the bird genus Turdus (Aves) and of Adephaga families (Coleoptera) containing genomic datasets to efficiently extract UCEs.
Affiliation(s)
- Caio V R Ribeiro
  - Coordenação de Ciência da Computação, Centro Universitário do Estado do Pará (CESUPA), Belém, Brazil
- Lucas P Oliveira
  - Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil
- Romina Batista
  - Instituto Nacional de Pesquisas da Amazônia (INPA), Manaus, Brazil
  - Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Marcos De Sousa
  - Museu Paraense Emílio Goeldi (MPEG), Belém, Brazil
  - Coordenação de Ciência da Computação, Centro Universitário do Estado do Pará (CESUPA), Belém, Brazil
6. Toribio-Flórez D, Anneser L, deOliveira-Lopes FN, Pallandt M, Tunn I, Windel H. Where Do Early Career Researchers Stand on Open Science Practices? A Survey Within the Max Planck Society. Front Res Metr Anal 2021; 5:586992. [PMID: 33870051; PMCID: PMC8025980; DOI: 10.3389/frma.2020.586992] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/24/2020] [Accepted: 10/19/2020] [Indexed: 12/24/2022] Open
Abstract
Open science (OS) is of paramount importance for the improvement of science worldwide and across research fields. Recent years have witnessed a transition toward open and transparent scientific practices, but there is still a long way to go. Early career researchers (ECRs) are of crucial relevance in the process of steering toward the standardization of OS practices, as they will become the future decision makers of the institutional change that necessarily accompanies this transition. Thus, it is imperative to gain insight into where ECRs stand on OS practices. Under this premise, the Open Science group of the Max Planck PhDnet designed and conducted an online survey to assess the stance toward OS practices of doctoral candidates from the Max Planck Society. As one of the leading scientific institutions for basic research worldwide, the Max Planck Society provides a considerable population of researchers from multiple scientific fields, grouped into three sections: biomedical sciences; chemistry, physics, and technology; and human and social sciences. From an approximate total population of 5,100 doctoral candidates affiliated with the Max Planck Society, the survey collected responses from 568 doctoral candidates. The survey assessed self-reported knowledge, attitudes, and implementation of different OS practices, namely, open access publications, open data, preregistrations, registered reports, and replication studies. ECRs seemed to hold a generally positive view toward these different practices and to be interested in learning more about them. Furthermore, we found that ECRs' knowledge and positive attitudes predicted the extent to which they implemented these OS practices, although levels of implementation were rather low in the past. We observed differences and similarities between scientific sections.
We discuss these differences in terms of need and feasibility to apply these OS practices in specific scientific fields, but additionally in relation to the incentive systems that shape scientific communities. Lastly, we discuss the implications that these results can have for the training and career advancement of ECRs, and ultimately, for the consolidation of OS practices.
Affiliation(s)
- Lukas Anneser
  - Max Planck Institute for Brain Research, Frankfurt Am Main, Germany
- Isabell Tunn
  - Max Planck Institute of Colloids and Interfaces, Potsdam, Germany
7. Eckert EM, Di Cesare A, Fontaneto D, Berendonk TU, Bürgmann H, Cytryn E, Fatta-Kassinos D, Franzetti A, Larsson DGJ, Manaia CM, Pruden A, Singer AC, Udikovic-Kolic N, Corno G. Every fifth published metagenome is not available to science. PLoS Biol 2020; 18:e3000698. [PMID: 32243442; PMCID: PMC7159239; DOI: 10.1371/journal.pbio.3000698] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Revised: 04/15/2020] [Indexed: 01/17/2023] Open
Abstract
Have you ever sought to use metagenomic DNA sequences reported in scientific publications? Were you successful? Here, we reveal that metagenomes from no fewer than 20% of the papers found in our literature search, published between 2016 and 2019, were not deposited in a repository or were simply inaccessible. The proportion of inaccessible data within the literature has been increasing year-on-year. Noncompliance with Open Data is best predicted by the scientific discipline of the journal. The number of citations, journal type (e.g., Open Access or subscription journals), and publisher are not good predictors of data accessibility. However, many publications in high-impact factor journals do display a higher likelihood of accessible metagenomic data sets. Twenty-first century science demands compliance with the ethical standard of data sharing of metagenomes and DNA sequence data more broadly. Data accessibility must become one of the routine and mandatory components of manuscript submissions, a requirement that should be applicable across the increasing number of disciplines using metagenomics. Compliance must be ensured and reinforced by funders, publishers, editors, reviewers, and, ultimately, the authors.
Affiliation(s)
- Ester M. Eckert
  - Molecular Ecology Group (MEG), Water Research Institute, National Research Council of Italy, Verbania Pallanza, Italy
- Andrea Di Cesare
  - Molecular Ecology Group (MEG), Water Research Institute, National Research Council of Italy, Verbania Pallanza, Italy
- Diego Fontaneto
  - Molecular Ecology Group (MEG), Water Research Institute, National Research Council of Italy, Verbania Pallanza, Italy
- Helmut Bürgmann
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Kastanienbaum, Switzerland
- Eddie Cytryn
  - Institute of Soil Water and Environmental Sciences, Volcani Center, Agricultural Research Organization, Rishon Lezion, Israel
- Despo Fatta-Kassinos
  - Department of Civil and Environmental Engineering and Nireas-International Water Research Center, University of Cyprus, Nicosia, Cyprus
- D. G. Joakim Larsson
  - Centre for Antibiotic Resistance Research (CARe), University of Gothenburg, Gothenburg, Sweden
  - Department of Infectious Diseases, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Célia M. Manaia
  - Universidade Católica Portuguesa, CBQF—Centro de Biotecnologia e Química Fina–Laboratório Associado, Escola Superior de Biotecnologia, Porto, Portugal
- Amy Pruden
  - Department of Civil & Environmental Engineering, Blacksburg, Virginia, United States of America
- Gianluca Corno
  - Molecular Ecology Group (MEG), Water Research Institute, National Research Council of Italy, Verbania Pallanza, Italy
8. Baker E, Vincent S. A deafening silence: a lack of data and reproducibility in published bioacoustics research? Biodivers Data J 2019; 7:e36783. [PMID: 31723333; PMCID: PMC6834726; DOI: 10.3897/bdj.7.e36783] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/04/2019] [Accepted: 10/29/2019] [Indexed: 11/12/2022] Open
Abstract
A study of 100 papers from five journals that make use of bioacoustic recordings shows that only a minority (21%) deposit any of the recordings in a repository, supplementary materials section or a personal website. This lack of deposition hinders re-use of the raw data by other researchers, prevents the reproduction of a project's analyses and confirmation of its findings and impedes progress within the broader bioacoustics community. We make some recommendations for researchers interested in depositing their data.
Affiliation(s)
- Ed Baker
  - Natural History Museum, London, United Kingdom
  - University of York, York, United Kingdom
- Sarah Vincent
  - Natural History Museum, London, United Kingdom
9. Feilich KL, López-Fernández H. When Does Form Reflect Function? Acknowledging and Supporting Ecomorphological Assumptions. Integr Comp Biol 2019; 59:358-370. [DOI: 10.1093/icb/icz070] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
Abstract
Ecomorphology is the study of relationships between organismal morphology and ecology. As such, it is the only way to determine if morphometric data can be used as an informative proxy for ecological variables of interest. To achieve this goal, ecomorphology often depends on, or directly tests, assumptions about the nature of the relationships among morphology, performance, and ecology. We discuss three approaches to the study of ecomorphology: morphometry-driven, function-driven, and ecology-driven and study design choices inherent to each approach. We also identify 10 assumptions that underlie ecomorphological research: 4 of these are central to all ecomorphological studies and the remaining 6 are variably applicable to some of the specific approaches described above. We discuss how these assumptions may impact ecomorphological studies and affect the interpretation of their findings. We also point out some limitations of ecomorphological studies, and highlight some ways by which we can strengthen, validate, or eliminate systematic assumptions.
Affiliation(s)
- Kara L Feilich
  - Museum of Paleontology, University of Michigan, 1105 North University Ave, Ann Arbor, MI 48109, USA
- Hernán López-Fernández
  - Department of Ecology and Evolutionary Biology and Museum of Zoology, University of Michigan, 1105 North University Ave, Ann Arbor, MI 48109, USA
10. Chang J, Rabosky DL, Smith SA, Alfaro ME. An R package and online resource for macroevolutionary studies using the ray‐finned fish tree of life. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13182] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Indexed: 11/27/2022]
Affiliation(s)
- Jonathan Chang
  - School of Biological Sciences, Monash University, Clayton, VIC, Australia
- Daniel L. Rabosky
  - Museum of Zoology, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Stephen A. Smith
  - Museum of Zoology, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Michael E. Alfaro
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA
11. Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL, Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 2019; 7:e6399. [PMID: 30783571; PMCID: PMC6378093; DOI: 10.7717/peerj.6399] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 02/05/2018] [Accepted: 01/07/2019] [Indexed: 12/23/2022] Open
Abstract
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly relies, and we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Affiliation(s)
- Gustavo A. Bravo
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Alexandre Antonelli
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
  - Gothenburg Botanical Garden, Göteborg, Sweden
- Christine D. Bacon
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Krzysztof Bartoszek
  - Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Mozes P. K. Blom
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Stella Huynh
  - Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
- Graham Jones
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- L. Lacey Knowles
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Sangeet Lamichhaney
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Thomas Marcussen
  - Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
- Hélène Morlon
  - Institut de Biologie, Ecole Normale Supérieure de Paris, Paris, France
- Luay K. Nakhleh
  - Department of Computer Science, Rice University, Houston, TX, USA
- Bengt Oxelman
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Bernard Pfeil
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Alexander Schliep
  - Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Fernanda P. Werneck
  - Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisa da Amazônia, Manaus, AM, Brazil
- John Wiedenhoeft
  - Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
  - Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Sandi Willows-Munro
  - School of Life Sciences, University of Kwazulu-Natal, Pietermaritzburg, South Africa
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
  - Gothenburg Centre for Advanced Studies in Science and Technology, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
12. Wagholikar KB, Dessai P, Sanz J, Mendis ME, Bell DS, Murphy SN. Implementation of informatics for integrating biology and the bedside (i2b2) platform as Docker containers. BMC Med Inform Decis Mak 2018; 18:66. [PMID: 30012140; PMCID: PMC6048900; DOI: 10.1186/s12911-018-0646-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/01/2017] [Accepted: 06/27/2018] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Informatics for Integrating Biology and the Bedside (i2b2) is an open source clinical data analytics platform used at over 200 healthcare institutions for querying patient data. The i2b2 platform has several components with numerous dependencies and configuration parameters, which renders the task of installing or upgrading i2b2 a challenging one. Even with the availability of extensive documentation and tutorials, new users often require several weeks to correctly install a functional i2b2 platform. The goal of this work is to simplify the installation and upgrade process for i2b2. Specifically, we have containerized the core components of the platform, and evaluated the containers for ease of installation. RESULTS: We developed three Docker container images: WildFly, database, and web, to encapsulate the three major deployment components of i2b2. These containers isolate the core functionalities of the i2b2 platform, and work in unison to provide its functionalities. Our evaluations indicate that i2b2 containers function successfully on the Linux platform. Our results demonstrate that the containerized components work out-of-the-box, with minimal configuration. CONCLUSIONS: Containerization offers the potential to package the i2b2 platform components into standalone executable packages that are agnostic to the underlying host operating system. By releasing i2b2 as a Docker container, we anticipate that users will be able to create a working i2b2 hive installation without the need to download, compile, and configure individual components that constitute the i2b2 cells, thus making this platform accessible to a greater number of institutions.
Affiliation(s)
- Pralav Dessai
  - University of California Los Angeles, Los Angeles, CA USA
- Javier Sanz
  - University of California Los Angeles, Los Angeles, CA USA
- Shawn N. Murphy
  - Massachusetts General Hospital, Boston, MA USA
  - Harvard Medical School, Boston, MA USA
13. Federer LM, Belter CW, Joubert DJ, Livinski A, Lu YL, Snyders LN, Thompson H. Data sharing in PLOS ONE: An analysis of Data Availability Statements. PLoS One 2018; 13:e0194768. [PMID: 29719004; PMCID: PMC5931451; DOI: 10.1371/journal.pone.0194768] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 12/11/2017] [Accepted: 03/09/2018] [Indexed: 11/24/2022] Open
Abstract
A number of publishers and funders, including PLOS, have recently adopted policies requiring researchers to share the data underlying their results and publications. Such policies help increase the reproducibility of the published literature, as well as make a larger body of data available for reuse and re-analysis. In this study, we evaluate the extent to which authors have complied with this policy by analyzing Data Availability Statements from 47,593 papers published in PLOS ONE between March 2014 (when the policy went into effect) and May 2016. Our analysis shows that compliance with the policy has increased, with a significant decline over time in papers that did not include a Data Availability Statement. However, only about 20% of statements indicate that data are deposited in a repository, which the PLOS policy states is the preferred method. More commonly, authors state that their data are in the paper itself or in the supplemental information, though it is unclear whether these data meet the level of sharing required in the PLOS policy. These findings suggest that additional review of Data Availability Statements or more stringent policies may be needed to increase data sharing.
Affiliation(s)
- Lisa M. Federer
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Christopher W. Belter
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Douglas J. Joubert
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Alicia Livinski
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Ya-Ling Lu
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Lissa N. Snyders
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
- Holly Thompson
- NIH Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland, United States of America
|
14
|
Culina A, Baglioni M, Crowther TW, Visser ME, Woutersen-Windhouwer S, Manghi P. Navigating the unfolding open data landscape in ecology and evolution. Nat Ecol Evol 2018; 2:420-426. [DOI: 10.1038/s41559-017-0458-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 05/09/2017] [Accepted: 12/19/2017] [Indexed: 02/05/2023]
|
15
|
McTavish EJ, Drew BT, Redelings B, Cranston KA. How and Why to Build a Unified Tree of Life. Bioessays 2017; 39. [PMID: 28980328 DOI: 10.1002/bies.201700114] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 06/30/2017] [Revised: 08/27/2017] [Indexed: 01/20/2023]
Abstract
Phylogenetic trees are a crucial backbone for a wide breadth of biological research spanning systematics, organismal biology, ecology, and medicine. In 2015, the Open Tree of Life project published a first draft of a comprehensive tree of life, summarizing digitally available taxonomic and phylogenetic knowledge. This paper reviews, investigates, and addresses the following questions as a follow-up to that paper, from the perspective of researchers involved in building this summary of the tree of life: Is there a tree of life and should we reconstruct it? Is available data sufficient to reconstruct the tree of life? Do we have access to phylogenetic inferences in usable form? Can we combine different phylogenetic estimates across the tree of life? And finally, what is the future of understanding the tree of life?
Affiliation(s)
- Bryan T Drew
- University of Nebraska at Kearney, Kearney, NE, 68849, USA
- Ben Redelings
- University of Kansas, Lawrence, KS, 66045, USA; Duke University, Durham, NC 27705, USA; Ronin Institute, Durham, NC 27705, USA
|
16
|
Mounce R, Murray-Rust P, Wills M. A machine-compiled microbial supertree from figure-mining thousands of papers. Research Ideas and Outcomes 2017. [DOI: 10.3897/rio.3.e13589] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/27/2022] Open
|
17
|
Vasilevsky NA, Minnier J, Haendel MA, Champieux RE. Reproducible and reusable research: are journal data sharing policies meeting the mark? PeerJ 2017; 5:e3208. [PMID: 28462024 PMCID: PMC5407277 DOI: 10.7717/peerj.3208] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 11/10/2016] [Accepted: 03/20/2017] [Indexed: 11/20/2022] Open
Abstract
Background: There is wide agreement in the biomedical research community that research data sharing is a primary ingredient for ensuring that science is more transparent and reproducible. Publishers could play an important role in facilitating and enforcing data sharing; however, many journals have not yet implemented data sharing policies and the requirements vary widely across journals. This study set out to analyze the pervasiveness and quality of data sharing policies in the biomedical literature.
Methods: The online author’s instructions and editorial policies for 318 biomedical journals were manually reviewed to analyze the journal’s data sharing requirements and characteristics. The data sharing policies were ranked using a rubric to determine if data sharing was required, recommended, required only for omics data, or not addressed at all. The data sharing method and licensing recommendations were examined, as well as any mention of reproducibility or similar concepts. The data was analyzed for patterns relating to publishing volume, Journal Impact Factor, and the publishing model (open access or subscription) of each journal.
Results: A total of 11.9% of journals analyzed explicitly stated that data sharing was required as a condition of publication. A total of 9.1% of journals required data sharing, but did not state that it would affect publication decisions. 23.3% of journals had a statement encouraging authors to share their data but did not require it. A total of 9.1% of journals mentioned data sharing indirectly, and only 14.8% addressed protein, proteomic, and/or genomic data sharing. There was no mention of data sharing in 31.8% of journals. Impact factors were significantly higher for journals with the strongest data sharing policies compared to all other data sharing criteria. Open access journals were not more likely to require data sharing than subscription journals.
Discussion: Our study confirmed earlier investigations which observed that only a minority of biomedical journals require data sharing, and a significant association between higher Impact Factors and journals with a data sharing requirement. Moreover, while 65.7% of the journals in our study that required data sharing addressed the concept of reproducibility, as with earlier investigations, we found that most data sharing policies did not provide specific guidance on the practices that ensure data is maximally available and reusable.
Affiliation(s)
- Nicole A Vasilevsky
- OHSU Library, Oregon Health & Science University, Portland, OR, United States; Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, United States
- Jessica Minnier
- OHSU-PSU School of Public Health, Oregon Health & Science University, Portland, OR, United States
- Melissa A Haendel
- OHSU Library, Oregon Health & Science University, Portland, OR, United States; Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, United States
- Robin E Champieux
- OHSU Library, Oregon Health & Science University, Portland, OR, United States
|
18
|
Public availability of research data in dentistry journals indexed in Journal Citation Reports. Clin Oral Investig 2017; 22:275-280. [DOI: 10.1007/s00784-017-2108-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/16/2016] [Accepted: 03/16/2017] [Indexed: 12/30/2022]
|
19
|
Michonneau F, Brown JW, Winter DJ. rotl: an R package to interact with the Open Tree of Life data. Methods Ecol Evol 2016. [DOI: 10.1111/2041-210x.12593] [Citation(s) in RCA: 202] [Impact Index Per Article: 25.3] [Indexed: 11/26/2022]
Affiliation(s)
- François Michonneau
- Whitney Laboratory for Marine Sciences, University of Florida, St. Augustine, FL 32080, USA
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611-7800, USA
- Joseph W. Brown
- Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- David J. Winter
- Virginia G. Piper Centre for Personalized Diagnostics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5001, USA
|
20
|
Kidwell MC, Lazarević LB, Baranski E, Hardwicke TE, Piechowski S, Falkenberg LS, Kennett C, Slowik A, Sonnleitner C, Hess-Holden C, Errington TM, Fiedler S, Nosek BA. Badges to Acknowledge Open Practices: A Simple, Low-Cost, Effective Method for Increasing Transparency. PLoS Biol 2016; 14:e1002456. [PMID: 27171007 PMCID: PMC4865119 DOI: 10.1371/journal.pbio.1002456] [Citation(s) in RCA: 256] [Impact Index Per Article: 32.0] [Received: 01/20/2016] [Accepted: 04/08/2016] [Indexed: 11/18/2022] Open
Abstract
Beginning January 2014, Psychological Science gave authors the opportunity to signal open data and materials if they qualified for badges that accompanied published articles. Before badges, less than 3% of Psychological Science articles reported open data. After badges, 23% reported open data, with an accelerating trend; 39% reported open data in the first half of 2015, an increase of more than an order of magnitude from baseline. There was no change over time in the low rates of data sharing among comparison journals. Moreover, reporting openness does not guarantee openness. When badges were earned, reportedly available data were more likely to be actually available, correct, usable, and complete than when badges were not earned. Open materials also increased to a weaker degree, and there was more variability among comparison journals. Badges are simple, effective signals to promote open practices and improve preservation of data and materials by using independent repositories. Badges that acknowledge open practices significantly increase sharing of reported data and materials, as well as subsequent accessibility, correctness, usability, and completeness. Openness is a core value of scientific practice. The sharing of research materials and data facilitates critique, extension, and application within the scientific community, yet current norms provide few incentives for researchers to share evidence underlying scientific claims. In January 2014, the journal Psychological Science adopted such an incentive by offering “badges” to acknowledge and signal open practices in publications. In this study, we evaluated the effect that two types of badges—Open Data badges and Open Materials badges—have had on reported data and material sharing, as well as on the actual availability, correctness, usability, and completeness of those data and materials both in Psychological Science and in four comparison journals. 
We report an increase in reported data sharing of more than an order of magnitude from baseline in Psychological Science, as well as an increase in reported materials sharing, although to a weaker degree. Moreover, we show that reportedly available data and materials were more accessible, correct, usable, and complete when badges were earned. We demonstrate that badges are effective incentives that improve the openness, accessibility, and persistence of data and materials that underlie scientific research.
Affiliation(s)
- Mallory C. Kidwell
- Center for Open Science, Charlottesville, Virginia, United States of America
- Erica Baranski
- University of California, Riverside, Riverside, California, United States of America
- Sarah Piechowski
- Max Planck Institute for Research on Collective Goods, Bonn, Germany
- Curtis Kennett
- Mississippi State University, Starkville, Mississippi, United States of America
- Chelsey Hess-Holden
- Mississippi State University, Starkville, Mississippi, United States of America
- Susann Fiedler
- Max Planck Institute for Research on Collective Goods, Bonn, Germany
- Brian A. Nosek
- Center for Open Science, Charlottesville, Virginia, United States of America
- University of Virginia, Charlottesville, Virginia, United States of America
|
21
|
Abstract
It was recently proposed that long-term population studies be exempted from the expectation that authors publicly archive the primary data underlying published articles. Such studies are valuable to many areas of ecological and evolutionary biological research, and multiple risks to their viability were anticipated as a result of public data archiving (PDA), ultimately all stemming from independent reuse of archived data. However, empirical assessment was missing, making it difficult to determine whether such fears are realistic. I addressed this by surveying data packages from long-term population studies archived in the Dryad Digital Repository. I found no evidence that PDA results in reuse of data by independent parties, suggesting the purported costs of PDA for long-term population studies have been overstated.
|
22
|
|
23
|
Roche DG, Kruuk LEB, Lanfear R, Binning SA. Public Data Archiving in Ecology and Evolution: How Well Are We Doing? PLoS Biol 2015; 13:e1002295. [PMID: 26556502 PMCID: PMC4640582 DOI: 10.1371/journal.pbio.1002295] [Citation(s) in RCA: 130] [Impact Index Per Article: 14.4] [Indexed: 11/18/2022] Open
Abstract
Policies that mandate public data archiving (PDA) successfully increase accessibility to data underlying scientific publications. However, is the data quality sufficient to allow reuse and reanalysis? We surveyed 100 datasets associated with nonmolecular studies in journals that commonly publish ecological and evolutionary research and have a strong PDA policy. Out of these datasets, 56% were incomplete, and 64% were archived in a way that partially or entirely prevented reuse. We suggest that cultural shifts facilitating clearer benefits to authors are necessary to achieve high-quality PDA and highlight key guidelines to help authors increase their data's reuse potential and compliance with journal data policies.
Affiliation(s)
- Dominique G. Roche
- Division of Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia
- Éco-Éthologie, Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
- Loeske E. B. Kruuk
- Division of Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Robert Lanfear
- Division of Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia
- Department of Biological Sciences, Macquarie University, Sydney, Australia
- Sandra A. Binning
- Division of Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia
- Éco-Éthologie, Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
|
24
|
Hinchliff CE, Smith SA, Allman JF, Burleigh JG, Chaudhary R, Coghill LM, Crandall KA, Deng J, Drew BT, Gazis R, Gude K, Hibbett DS, Katz LA, Laughinghouse HD, McTavish EJ, Midford PE, Owen CL, Ree RH, Rees JA, Soltis DE, Williams T, Cranston KA. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proc Natl Acad Sci U S A 2015; 112:12764-9. [PMID: 26385966 PMCID: PMC4611642 DOI: 10.1073/pnas.1423041112] [Citation(s) in RCA: 372] [Impact Index Per Article: 41.3] [Indexed: 12/20/2022] Open
Abstract
Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips: the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
Affiliation(s)
- Cody E Hinchliff
- Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Stephen A Smith
- Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Ruchi Chaudhary
- Department of Biology, University of Florida, Gainesville, FL 32611
- Keith A Crandall
- Computational Biology Institute, George Washington University, Ashburn, VA 20147
- Jiabin Deng
- Department of Biology, University of Florida, Gainesville, FL 32611
- Bryan T Drew
- Department of Biology, University of Nebraska-Kearney, Kearney, NE 68849
- Romina Gazis
- Department of Biology, Clark University, Worcester, MA 01610
- Karl Gude
- School of Journalism, Michigan State University, East Lansing, MI 48824
- David S Hibbett
- Department of Biology, Clark University, Worcester, MA 01610
- Laura A Katz
- Biological Science, Clark Science Center, Smith College, Northampton, MA 01063
- Emily Jane McTavish
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045
- Jonathan A Rees
- National Evolutionary Synthesis Center, Duke University, Durham, NC 27705
- Douglas E Soltis
- Department of Biology, University of Florida, Gainesville, FL 32611; Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Tiffani Williams
- Computer Science and Engineering, Texas A&M University, College Station, TX 77843
- Karen A Cranston
- National Evolutionary Synthesis Center, Duke University, Durham, NC 27705
|
25
|
ReproPhylo: An Environment for Reproducible Phylogenomics. PLoS Comput Biol 2015; 11:e1004447. [PMID: 26335558 PMCID: PMC4559436 DOI: 10.1371/journal.pcbi.1004447] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/27/2015] [Accepted: 07/13/2015] [Indexed: 11/19/2022] Open
Abstract
The reproducibility of experiments is key to the scientific process, and particularly necessary for accurate reporting of analyses in data-rich fields such as phylogenomics. We present ReproPhylo, a phylogenomic analysis environment developed to ensure experimental reproducibility, to facilitate the handling of large-scale data, and to assist methodological experimentation. Reproducibility, and instantaneous repeatability, is built into the ReproPhylo system and does not require user intervention or configuration because it stores the experimental workflow as a single, serialized Python object containing explicit provenance and environment information. This ‘single file’ approach ensures the persistence of provenance across iterations of the analysis, with changes automatically managed by the version control program Git. This file and a Git repository are the primary reproducibility outputs of the program. In addition, ReproPhylo produces an extensive human-readable report and generates a comprehensive experimental archive file, both of which are suitable for submission with publications. The system facilitates thorough experimental exploration of both parameters and data. ReproPhylo is a platform-independent CC0 Python module and is easily installed as a Docker image or a WinPython self-sufficient package, with a Jupyter Notebook GUI, or as a slimmer version in a Galaxy distribution.
|
26
|
Pope LC, Liggins L, Keyse J, Carvalho SB, Riginos C. Not the time or the place: the missing spatio-temporal link in publicly available genetic data. Mol Ecol 2015; 24:3802-9. [DOI: 10.1111/mec.13254] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 01/04/2015] [Revised: 05/07/2015] [Accepted: 05/22/2015] [Indexed: 11/29/2022]
Affiliation(s)
- Lisa C. Pope
- School of Biological Sciences, The University of Queensland, Brisbane, Qld 4072, Australia
- Libby Liggins
- Allan Wilson Centre for Molecular Ecology and Evolution, New Zealand Institute for Advanced Study, Institute of Natural and Mathematical Sciences, Massey University, Auckland 0745, New Zealand
- Auckland War Memorial Museum, Tāmaki Paenga Hira, Auckland 1142, New Zealand
- Jude Keyse
- School of Biological Sciences, The University of Queensland, Brisbane, Qld 4072, Australia
- Silvia B Carvalho
- CIBIO/InBIO - Centro de Investigação em Biodiversidade e Recursos Genéticos da Universidade do Porto, R. Padre Armando Quintas, 4485-661 Vairão, Portugal
- Cynthia Riginos
- School of Biological Sciences, The University of Queensland, Brisbane, Qld 4072, Australia
|
27
|
McTavish EJ, Hinchliff CE, Allman JF, Brown JW, Cranston KA, Holder MT, Rees JA, Smith SA. Phylesystem: a git-based data store for community-curated phylogenetic estimates. Bioinformatics 2015; 31:2794-800. [PMID: 25940563 PMCID: PMC4547614 DOI: 10.1093/bioinformatics/btv276] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/15/2015] [Accepted: 04/27/2015] [Indexed: 11/13/2022] Open
Abstract
Motivation: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct.
Results: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git's version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the 'phylesystem-api', which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements.
Availability and implementation: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator. Code for that tool is available from https://github.com/OpenTreeOfLife/opentree.
Contact: mtholder@gmail.com.
Affiliation(s)
- Emily Jane McTavish
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA; Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Cody E Hinchliff
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Joseph W Brown
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Karen A Cranston
- National Evolutionary Synthesis Center, Duke University, Durham, NC, USA
- Mark T Holder
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA; Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Jonathan A Rees
- National Evolutionary Synthesis Center, Duke University, Durham, NC, USA
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
|
28
|
Anagnostou P, Capocasa M, Milia N, Sanna E, Battaggia C, Luzi D, Destro Bisol G. When data sharing gets close to 100%: what human paleogenetics can teach the open science movement. PLoS One 2015; 10:e0121409. [PMID: 25799293 PMCID: PMC4370607 DOI: 10.1371/journal.pone.0121409] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 08/01/2014] [Accepted: 02/02/2015] [Indexed: 12/31/2022] Open
Abstract
This study analyzes data sharing regarding mitochondrial, Y chromosomal and autosomal polymorphisms in a total of 162 papers on ancient human DNA published between 1988 and 2013. The estimated sharing rate was not far from totality (97.6% ± 2.1%) and substantially higher than observed in other fields of genetic research (evolutionary, medical and forensic genetics). Both a questionnaire-based survey and the examination of journals' editorial policies suggest that this high sharing rate cannot be simply explained by the need to comply with stakeholders' requests. Most data were made available through body text, but the use of primary databases increased in coincidence with the introduction of complete mitochondrial and next-generation sequencing methods. Our study highlights three important aspects. First, our results imply that researchers' awareness of the importance of openness and transparency for scientific progress may complement stakeholders' policies in achieving very high sharing rates. Second, widespread data sharing does not necessarily coincide with a prevalent use of practices which maximize data findability, accessibility, usability and preservation. A detailed look at the different ways in which data are released can be very useful to detect failures to adopt the best sharing modalities and understand how to correct them. Third and finally, the case of human paleogenetics tells us that a widespread awareness of the importance of Open Science may be important to build reliable scientific practices even in the presence of complex experimental challenges.
Affiliation(s)
- Paolo Anagnostou
- Dipartimento di Biologia Ambientale, “Sapienza” Università di Roma, Rome, Italy
- Istituto Italiano di Antropologia, Rome, Italy
- Marco Capocasa
- Istituto Italiano di Antropologia, Rome, Italy
- Dipartimento Biologia e Biotecnologie “Charles Darwin”, “Sapienza” Università di Roma, Rome, Italy
- Nicola Milia
- Dipartimento di Scienze della Vita e dell'Ambiente, Università di Cagliari, Cagliari, Italy
- Emanuele Sanna
- Dipartimento di Scienze della Vita e dell'Ambiente, Università di Cagliari, Cagliari, Italy
- Cinzia Battaggia
- Dipartimento di Biologia Ambientale, “Sapienza” Università di Roma, Rome, Italy
- Daniela Luzi
- Istituto di Ricerche sulla Popolazione e le Politiche Sociali, Consiglio Nazionale delle Ricerche, Rome, Italy
- Giovanni Destro Bisol
- Dipartimento di Biologia Ambientale, “Sapienza” Università di Roma, Rome, Italy
- Istituto Italiano di Antropologia, Rome, Italy
|
29
|
Ballantyne A. In Favor of a No-Consent/Opt-Out Model of Research With Clinical Samples. The American Journal of Bioethics: AJOB 2015; 15:65-67. [PMID: 26305761 DOI: 10.1080/15265161.2015.1062171] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/04/2023]
|
30
|
|