1. Abdill RJ, Talarico E, Grieneisen L. A how-to guide for code sharing in biology. PLoS Biol 2024; 22:e3002815. [PMID: 39255324 DOI: 10.1371/journal.pbio.3002815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024] Open
Abstract
In 2024, all biology is computational biology. Computer-aided analysis continues to spread into new fields, becoming more accessible to researchers trained in the wet lab who are eager to take advantage of growing datasets, falling costs, and novel assays that present new opportunities for discovery. It is currently much easier to find guidance for implementing these techniques than for reporting their use, leaving biologists to guess which details and files are relevant. In this essay, we review existing literature on the topic, summarize common tips, and link to additional resources for training. Following this overview, we then provide a set of recommendations for sharing code, with an eye toward guiding those who are comparatively new to applying open science principles to their computational work. Taken together, we provide a guide for biologists who seek to follow code sharing best practices but are unsure where to start.
Affiliation(s)
- Richard J Abdill
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Emma Talarico
  - Department of Biology, University of British Columbia-Okanagan Campus, Kelowna, Canada
- Laura Grieneisen
  - Department of Biology, University of British Columbia-Okanagan Campus, Kelowna, Canada
  - Okanagan Institute for Biodiversity, Resilience, and Ecosystem Services, University of British Columbia-Okanagan Campus, Kelowna, Canada

2. Gallagher K, Creswell R, Lambert B, Robinson M, Lok Lei C, Mirams GR, Gavaghan DJ. Ten simple rules for training scientists to make better software. PLoS Comput Biol 2024; 20:e1012410. [PMID: 39264985 PMCID: PMC11392269 DOI: 10.1371/journal.pcbi.1012410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2024] Open
Affiliation(s)
- Kit Gallagher
  - Doctoral Training Centre, University of Oxford, Oxford, United Kingdom
- Richard Creswell
  - Department of Computer Science, University of Oxford, Oxford, United Kingdom
- Ben Lambert
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Martin Robinson
  - Department of Computer Science, University of Oxford, Oxford, United Kingdom
- Chon Lok Lei
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Macau, China
- Gary R Mirams
  - Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, Nottingham, United Kingdom
- David J Gavaghan
  - Doctoral Training Centre, University of Oxford, Oxford, United Kingdom

3. Botvinik-Nezer R, Wager TD. Reproducibility in Neuroimaging Analysis: Challenges and Solutions. Biol Psychiatry Cogn Neurosci Neuroimaging 2023; 8:780-788. [PMID: 36906444 DOI: 10.1016/j.bpsc.2022.12.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/15/2022] [Revised: 11/27/2022] [Accepted: 12/11/2022] [Indexed: 12/23/2022]
Abstract
Recent years have marked a renaissance in efforts to increase research reproducibility in psychology, neuroscience, and related fields. Reproducibility is the cornerstone of a solid foundation of fundamental research, one that will support new theories built on valid findings and technological innovation that works. The increased focus on reproducibility has made the barriers to it increasingly apparent, along with the development of new tools and practices to overcome these barriers. Here, we review challenges, solutions, and emerging best practices with a particular emphasis on neuroimaging studies. We distinguish 3 main types of reproducibility, discussing each in turn. Analytical reproducibility is the ability to reproduce findings using the same data and methods. Replicability is the ability to find an effect in new datasets, using the same or similar methods. Finally, robustness to analytical variability refers to the ability to identify a finding consistently across variation in methods. The incorporation of these tools and practices will result in more reproducible, replicable, and robust psychological and brain research and a stronger scientific foundation across fields of inquiry.
Affiliation(s)
- Rotem Botvinik-Nezer
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire
- Tor D Wager
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire

4. Oza VH, Whitlock JH, Wilk EJ, Uno-Antonison A, Wilk B, Gajapathy M, Howton TC, Trull A, Ianov L, Worthey EA, Lasseigne BN. Ten simple rules for using public biological data for your research. PLoS Comput Biol 2023; 19:e1010749. [PMID: 36602970 DOI: 10.1371/journal.pcbi.1010749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023] Open
Abstract
With an increasing amount of biological data available publicly, there is a need for a guide on how to successfully download and use this data. The 10 simple rules for using public biological data are: (1) use public data purposefully in your research; (2) evaluate data for your use case; (3) check data reuse requirements and embargoes; (4) be aware of ethics for data reuse; (5) plan for data storage and compute requirements; (6) know what you are downloading; (7) download programmatically and verify integrity; (8) properly cite data; (9) make reprocessed data and models Findable, Accessible, Interoperable, and Reusable (FAIR) and share; and (10) make pipelines and code FAIR and share. These rules are intended as a guide for researchers wanting to make use of available data and to increase data reuse and reproducibility.
Affiliation(s)
- Vishal H Oza
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Jordan H Whitlock
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Elizabeth J Wilk
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Angelina Uno-Antonison
  - Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Brandon Wilk
  - Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Manavalan Gajapathy
  - Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Timothy C Howton
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Austyn Trull
  - Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Lara Ianov
  - Civitan International Research Center, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Elizabeth A Worthey
  - Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
  - Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Brittany N Lasseigne
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America

6. Bodner K, Rauen Firkowski C, Bennett JR, Brookson C, Dietze M, Green S, Hughes J, Kerr J, Kunegel‐Lion M, Leroux SJ, McIntire E, Molnár PK, Simpkins C, Tekwa E, Watts A, Fortin M. Bridging the divide between ecological forecasts and environmental decision making. Ecosphere 2021. [DOI: 10.1002/ecs2.3869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023] Open
Affiliation(s)
- Korryn Bodner
  - Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada
  - Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Carina Rauen Firkowski
  - Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada
  - Department of Biology, McGill University, Montreal, Quebec, Canada
- Cole Brookson
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Michael Dietze
  - Department of Earth & Environment, Boston University, Boston, Massachusetts, USA
- Stephanie Green
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Josie Hughes
  - National Wildlife Research Centre, Environment and Climate Change Canada, Ottawa, Ontario, Canada
- Jeremy Kerr
  - Department of Biology, University of Ottawa, Ottawa, Ontario, Canada
- Mélodie Kunegel‐Lion
  - Canadian Forest Service, Northern Forestry Centre, Natural Resources Canada, Edmonton, Alberta, Canada
- Shawn J. Leroux
  - Department of Biology, Memorial University of Newfoundland, St. John’s, Newfoundland, Canada
- Eliot McIntire
  - Canadian Forest Service, Pacific Forestry Centre, Natural Resources Canada, Victoria, British Columbia, Canada
  - Faculty of Forestry, Forest Resources Management, University of British Columbia, Vancouver, British Columbia, Canada
- Péter K. Molnár
  - Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada
  - Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Craig Simpkins
  - School of Environment, University of Auckland, Auckland, New Zealand
  - Department of Biology, Wilfrid Laurier University, Waterloo, Ontario, Canada
  - Department of Ecological Modelling, Georg‐August University of Goettingen, Goettingen, Germany
- Edward Tekwa
  - Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
- Marie‐Josée Fortin
  - Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada

7. Hunter-Zinck H, de Siqueira AF, Vásquez VN, Barnes R, Martinez CC. Ten simple rules on writing clean and reliable open-source scientific software. PLoS Comput Biol 2021; 17:e1009481. [PMID: 34762641 PMCID: PMC8584773 DOI: 10.1371/journal.pcbi.1009481] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/04/2022] Open
Abstract
Functional, usable, and maintainable open-source software is increasingly essential to scientific research, but there is a large variation in formal training for software development and maintainability. Here, we propose 10 "rules" centered on 2 best practice components: clean code and testing. These 2 areas are relatively straightforward and provide substantial utility relative to the learning investment. Adopting clean code practices helps to standardize and organize software code in order to enhance readability and reduce cognitive load for both the initial developer and subsequent contributors; this allows developers to concentrate on core functionality and reduce errors. Clean coding styles make software code more amenable to testing, including unit tests that work best with modular and consistent software code. Unit tests interrogate specific and isolated coding behavior to reduce coding errors and ensure intended functionality, especially as code increases in complexity; unit tests also implicitly provide example usages of code. Other forms of testing are geared to discover erroneous behavior arising from unexpected inputs or emerging from the interaction of complex codebases. Although conforming to coding styles and designing tests can add time to the software development project in the short term, these foundational tools can help to improve the correctness, quality, usability, and maintainability of open-source scientific software code. They also advance the principal point of scientific research: producing accurate results in a reproducible way. In addition to suggesting several tips for getting started with clean code and testing practices, we recommend numerous tools for the popular open-source scientific software languages Python, R, and Julia.
Affiliation(s)
- Haley Hunter-Zinck
  - Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, United States of America
  - VA Boston Healthcare System, Boston, Massachusetts, United States of America
  - VA St. Louis Health Care System, St. Louis, Missouri, United States of America
- Váleri N. Vásquez
  - Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America
  - Energy and Resources Group, Rausser College of Natural Resources, University of California, Berkeley, Berkeley, California, United States of America
- Richard Barnes
  - Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America
  - Energy and Resources Group, Rausser College of Natural Resources, University of California, Berkeley, Berkeley, California, United States of America
- Ciera C. Martinez
  - Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America
|