1
Afiaz A, Ivanov AA, Chamberlin J, Hanauer D, Savonen CL, Goldman MJ, Morgan M, Reich M, Getka A, Holmes A, Pati S, Knight D, Boutros PC, Bakas S, Caporaso JG, Del Fiol G, Hochheiser H, Haas B, Schloss PD, Eddy JA, Albrecht J, Fedorov A, Waldron L, Hoffman AM, Bradshaw RL, Leek JT, Wright C. Best practices to evaluate the impact of biomedical research software-metric collection beyond citations. Bioinformatics 2024; 40:btae469. [PMID: 39067017 PMCID: PMC11297485 DOI: 10.1093/bioinformatics/btae469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 05/28/2024] [Accepted: 07/22/2024] [Indexed: 07/30/2024] Open
Abstract
MOTIVATION: Software is vital for the advancement of biology and medicine. Impact evaluations of scientific software have primarily emphasized traditional citation metrics of associated papers, despite these metrics inadequately capturing the dynamic picture of impact and despite challenges with improper citation. RESULTS: To understand how software developers evaluate their tools, we conducted a survey of participants in the Informatics Technology for Cancer Research (ITCR) program funded by the National Cancer Institute (NCI). We found that although developers realize the value of more extensive metric collection, they are hindered by a lack of funding and time. We also investigated software among this community for how often infrastructure that supports more nontraditional metrics was implemented and how this impacted rates of papers describing usage of the software. We found that infrastructure such as social media presence, more in-depth documentation, the presence of software health metrics, and clear information on how to contact developers seemed to be associated with increased mention rates. Analysing more diverse metrics can enable developers to better understand user engagement, justify continued funding, identify novel use cases, pinpoint improvement areas, and ultimately amplify their software's impact. Associated challenges include distorted or misleading metrics, as well as ethical and security concerns. More attention to nuances involved in capturing impact across the spectrum of biomedical software is needed. For funders and developers, we outline guidance based on experience from our community. By considering how we evaluate software, we can empower developers to create tools that more effectively accelerate biological and medical research progress. AVAILABILITY AND IMPLEMENTATION: More information about the analysis, as well as access to data and code, is available at https://github.com/fhdsl/ITCR_Metrics_manuscript_website.
Affiliation(s)
- Awan Afiaz
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, United States
- Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, United States
| | - Andrey A Ivanov
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, Atlanta, GA, 30322, United States
| | - John Chamberlin
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, United States
| | - David Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, 48109, United States
| | - Candace L Savonen
- Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, United States
| | - Mary J Goldman
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95060, United States
| | - Martin Morgan
- Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, United States
| | - Michael Reich
- University of California, San Diego, La Jolla, CA, 92093, United States
| | - Alexander Getka
- University of Pennsylvania, Philadelphia, PA, 19104, United States
| | - Aaron Holmes
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, United States
- Institute for Precision Health, University of California, Los Angeles, CA, 90095, United States
- Department of Human Genetics, University of California, Los Angeles, CA, 90095, United States
- Department of Urology, University of California, Los Angeles, CA, 90095, United States
| | - Sarthak Pati
- University of Pennsylvania, Philadelphia, PA, 19104, United States
- Division of Computational Pathology, Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, 46202, United States
- Center for Federated Learning, Indiana University School of Medicine, Indianapolis, IN, 46202, United States
| | - Dan Knight
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, United States
- Institute for Precision Health, University of California, Los Angeles, CA, 90095, United States
- Department of Human Genetics, University of California, Los Angeles, CA, 90095, United States
- Department of Urology, University of California, Los Angeles, CA, 90095, United States
| | - Paul C Boutros
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, United States
- Institute for Precision Health, University of California, Los Angeles, CA, 90095, United States
- Department of Human Genetics, University of California, Los Angeles, CA, 90095, United States
- Department of Urology, University of California, Los Angeles, CA, 90095, United States
| | - Spyridon Bakas
- University of Pennsylvania, Philadelphia, PA, 19104, United States
- Division of Computational Pathology, Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, 46202, United States
- Center for Federated Learning, Indiana University School of Medicine, Indianapolis, IN, 46202, United States
| | - J Gregory Caporaso
- Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, 86011, United States
| | - Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, United States
| | - Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, United States
| | - Brian Haas
- Methods Development Laboratory, Broad Institute, Cambridge, MA, 02141, United States
| | - Patrick D Schloss
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, 48109, United States
| | - James A Eddy
- Sage Bionetworks, Seattle, WA, 98121, United States
| | | | - Andrey Fedorov
- Department of Radiology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02138, United States
| | - Levi Waldron
- Department of Epidemiology and Biostatistics, City University of New York Graduate School of Public Health and Health Policy, New York, NY, 10027, United States
| | - Ava M Hoffman
- Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, United States
| | - Richard L Bradshaw
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, United States
| | - Jeffrey T Leek
- Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, United States
| | - Carrie Wright
- Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, United States
2
Stoudt S, Jernite Y, Marshall B, Marwick B, Sharan M, Whitaker K, Danchev V. Ten simple rules for building and maintaining a responsible data science workflow. PLoS Comput Biol 2024; 20:e1012232. [PMID: 39024267 PMCID: PMC11257324 DOI: 10.1371/journal.pcbi.1012232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024] Open
Affiliation(s)
- Sara Stoudt
- Department of Mathematics, Bucknell University, Lewisburg, Pennsylvania, United States of America
| | - Yacine Jernite
- Hugging Face, Inc., New York, New York, United States of America
| | | | - Ben Marwick
- Department of Anthropology, University of Washington, Seattle, Washington, United States of America
| | | | | | - Valentin Danchev
- School of Business and Management, Queen Mary University of London, London, United Kingdom
3
Conte ML, Boisvert P, Barrison P, Seifi F, Landis-Lewis Z, Flynn A, Friedman CP. Ten simple rules to make computable knowledge shareable and reusable. PLoS Comput Biol 2024; 20:e1012179. [PMID: 38900708 PMCID: PMC11189186 DOI: 10.1371/journal.pcbi.1012179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024] Open
Abstract
Computable biomedical knowledge (CBK) is: "the result of an analytic and/or deliberative process about human health, or affecting human health, that is explicit, and therefore can be represented and reasoned upon using logic, formal standards, and mathematical approaches." Representing biomedical knowledge in a machine-interpretable, computable form increases its ability to be discovered, accessed, understood, and deployed. Computable knowledge artifacts can greatly advance the potential for implementation, reproducibility, or extension of the knowledge by users, who may include practitioners, researchers, and learners. Enriching computable knowledge artifacts may help facilitate reuse and translation into practice. Following the examples of 10 Simple Rules papers for scientific code, software, and applications, we present 10 Simple Rules intended to make shared computable knowledge artifacts more useful and reusable. These rules are mainly for researchers and their teams who have decided that sharing their computable knowledge is important, who wish to go beyond simply describing results, algorithms, or models via traditional publication pathways, and who want both to make their research findings more accessible and to help others use their computable knowledge. These rules are roughly organized into 3 categories: planning, engineering, and documentation. Finally, while many of the following examples are of computable knowledge in biomedical domains, these rules are generalizable to computable knowledge in any research domain.
Affiliation(s)
- Marisa L. Conte
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Peter Boisvert
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Philip Barrison
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Farid Seifi
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Zach Landis-Lewis
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Allen Flynn
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Charles P. Friedman
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
4
Shalom ES, Khan A, Van Loo S, Sourbron SP. Current status in spatiotemporal analysis of contrast-based perfusion MRI. Magn Reson Med 2024; 91:1136-1148. [PMID: 37929645 PMCID: PMC10962600 DOI: 10.1002/mrm.29906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 10/05/2023] [Accepted: 10/10/2023] [Indexed: 11/07/2023]
Abstract
In perfusion MRI, image voxels form a spatially organized network of systems, all exchanging indicator with their immediate neighbors. Yet the current paradigm for perfusion MRI analysis treats all voxels or regions-of-interest as isolated systems supplied by a single global source. This simplification not only leads to long-recognized systematic errors but also fails to leverage the embedded spatial structure within the data. Since the early 2000s, a variety of models and implementations have been proposed to analyze systems with between-voxel interactions. In general, this leads to large and connected numerical inverse problems that are intractable with conventional computational methods. With recent advances in machine learning, however, these approaches are becoming practically feasible, opening up the way for a paradigm shift in the approach to perfusion MRI. This paper seeks to review the work in spatiotemporal modelling of perfusion MRI using a coherent, harmonized nomenclature and notation, with clear physical definitions and assumptions. The aim is to introduce clarity in the state-of-the-art of this promising new approach to perfusion MRI, and help to identify gaps of knowledge and priorities for future research.
Affiliation(s)
- Eve S. Shalom
- School of Physics and Astronomy, University of Leeds, Leeds, UK
- Division of Clinical Medicine, University of Sheffield, Sheffield, UK
| | - Amirul Khan
- School of Civil Engineering, University of Leeds, Leeds, UK
| | - Sven Van Loo
- School of Physics and Astronomy, University of Leeds, Leeds, UK
- Department of Applied Physics, Ghent University, Ghent, Belgium
5
Ivimey-Cook ER, Pick JL, Bairos-Novak KR, Culina A, Gould E, Grainger M, Marshall BM, Moreau D, Paquet M, Royauté R, Sánchez-Tójar A, Silva I, Windecker SM. Implementing code review in the scientific workflow: Insights from ecology and evolutionary biology. J Evol Biol 2023; 36:1347-1356. [PMID: 37812156 DOI: 10.1111/jeb.14230] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2023] [Revised: 09/08/2023] [Accepted: 09/12/2023] [Indexed: 10/10/2023]
Abstract
Code review increases reliability and improves reproducibility of research. As such, code review is an inevitable step in software development and is common in fields such as computer science. However, despite its importance, code review is noticeably lacking in ecology and evolutionary biology. This is problematic as it facilitates the propagation of coding errors and a reduction in reproducibility and reliability of published results. To address this, we provide a detailed commentary on how to effectively review code, how to set up your project to enable this form of review and detail its possible implementation at several stages throughout the research process. This guide serves as a primer for code review, and adoption of the principles and advice here will go a long way in promoting more open, reliable, and transparent ecology and evolutionary biology.
Affiliation(s)
- Edward R Ivimey-Cook
- School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, UK
| | - Joel L Pick
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, UK
| | - Kevin R Bairos-Novak
- Australian Research Council Centre of Excellence for Coral Reef Studies & College of Science and Engineering, James Cook University, Townsville, Queensland, Australia
| | - Antica Culina
- Rudjer Boskovic Institute, Zagreb, Croatia
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, the Netherlands
| | - Elliot Gould
- School of Ecosystem and Forest Sciences, University of Melbourne, Melbourne, Victoria, Australia
| | | | - Benjamin M Marshall
- Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, UK
| | - David Moreau
- School of Psychology, Centre for Brain Research, University of Auckland, Auckland, New Zealand
| | - Matthieu Paquet
- Institute of Mathematics of Bordeaux, University of Bordeaux, CNRS, Bordeaux INP, Talence, France
| | - Raphaël Royauté
- Université ParisSaclay, INRAE, AgroParisTech, UMR EcoSys, Palaiseau, France
| | | | - Inês Silva
- Center for Advanced Systems Understanding (CASUS), Helmholtz-Zentrum Dresden-Rossendorf e.V. (HZDR), Görlitz, Germany
| | - Saras M Windecker
- School of Ecosystem and Forest Sciences, University of Melbourne, Melbourne, Victoria, Australia
6
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
7
Lubiana T, Lopes R, Medeiros P, Silva JC, Goncalves ANA, Maracaja-Coutinho V, Nakaya HI. Ten quick tips for harnessing the power of ChatGPT in computational biology. PLoS Comput Biol 2023; 19:e1011319. [PMID: 37561669 PMCID: PMC10414555 DOI: 10.1371/journal.pcbi.1011319] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 08/12/2023] Open
Affiliation(s)
- Tiago Lubiana
- School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
| | - Rafael Lopes
- Department of Epidemiology of Microbial Diseases and Public Health Modeling Unit, Yale School of Public Health, New Haven, Connecticut, United States of America
| | | | - Juan Carlo Silva
- School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
| | | | - Vinicius Maracaja-Coutinho
- Advanced Center for Chronic Diseases, Universidad de Chile, Santiago, Chile
- Centro de Modelamiento Molecular, Biofísica y Bioinformática—CM2B2, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- ANID Anillo ACT210004 SYSTEMIX, Rancagua, Chile
- Anillo Inflammation in HIV/AIDS—InflammAIDS, Santiago, Chile
- Beagle Bioinformatics, São Paulo, Brasil & Santiago, Chile
| | - Helder I. Nakaya
- School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Hospital Israelita Albert Einstein, São Paulo, Brazil
8
Chicco D, Jurman G. Ten simple rules for providing bioinformatics support within a hospital. BioData Min 2023; 16:6. [PMID: 36823520 PMCID: PMC9948383 DOI: 10.1186/s13040-023-00326-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
Bioinformatics has become a key aspect of the biomedical research programmes of many hospitals' scientific centres, and the establishment of bioinformatics facilities within hospitals has become a common practice worldwide. Bioinformaticians working in these facilities provide computational biology support to medical doctors and principal investigators who deal daily with patient data to analyze. These bioinformatics analysts, although pivotal, usually do not receive formal training for this job. We therefore propose these ten simple rules to guide these bioinformaticians in their work: ten pieces of advice on how to provide bioinformatics support to medical doctors in hospitals. We believe these simple rules can help bioinformatics facility analysts to produce better scientific results and to work in a serene and fruitful environment.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, 155 College Street, M5T 3M7, Toronto, Ontario, Canada.
| | - Giuseppe Jurman
- Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy (grid.11469.3b; ISNI 0000 0000 9780 0901)
9
Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 2023; 19:11. [PMID: 36745241 DOI: 10.1007/s11306-023-01974-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/08/2022] [Accepted: 01/20/2023] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles - Findability, Accessibility, Interoperability, and Reusability - were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS). AIM OF REVIEW: This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in the LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software. KEY SCIENTIFIC CONCEPTS OF REVIEW: We evaluated 124 LC-HRMS metabolomics data processing software tools obtained from a systematic review and selected 61 for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment of software were 21.6%, 47.7%, and 71.8%, respectively. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that covered multiple FAIR4RS categories but had low fulfillment: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software had official software containerization or virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. According to the results, we discussed improvement strategies and future directions.
Affiliation(s)
- Xinsong Du
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Farhad Dastmalchi
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Hao Ye
- Health Science Center Libraries, University of Florida, Florida, USA
| | - Timothy J Garrett
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Florida, USA
| | - Matthew A Diller
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Mei Liu
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - William R Hogan
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Mathias Brochhausen
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, USA
| | - Dominick J Lemas
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA.
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, Florida, United States.
- Center for Perinatal Outcomes Research, University of Florida College of Medicine, Gainesville, United States.
10
Cain JY, Yu JS, Bagheri N. The in silico lab: Improving academic code using lessons from biology. Cell Syst 2023; 14:1-6. [PMID: 36657389 DOI: 10.1016/j.cels.2022.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 10/27/2022] [Accepted: 11/22/2022] [Indexed: 01/19/2023]
Abstract
"Good code" is often regarded as a nebulous, impractical ideal. Common best practices toward improving code quality can be inaccessible to those without a rigorous computer science or software engineering background, contributing to a gap between advancing scientific research and FAIR practices. We seek to equip researchers with the necessary background and context to tackle the challenge of improving code quality in computational biology research using analogies from biology to synthesize why certain best practices are critical for advancing computational research. Improving code quality requires active stewardship; we encourage researchers to deliberately adopt and share practices that ensure reusability, repeatability, and reproducibility.
Affiliation(s)
- Jason Y Cain
- Department of Chemical Engineering, University of Washington, Seattle, WA 98195, USA
| | - Jessica S Yu
- Department of Biology, University of Washington, Seattle, WA 98195, USA
| | - Neda Bagheri
- Department of Chemical Engineering, University of Washington, Seattle, WA 98195, USA; Department of Biology, University of Washington, Seattle, WA 98195, USA.
11
Oza VH, Whitlock JH, Wilk EJ, Uno-Antonison A, Wilk B, Gajapathy M, Howton TC, Trull A, Ianov L, Worthey EA, Lasseigne BN. Ten simple rules for using public biological data for your research. PLoS Comput Biol 2023; 19:e1010749. [PMID: 36602970 DOI: 10.1371/journal.pcbi.1010749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023] Open
Abstract
With an increasing amount of biological data available publicly, there is a need for a guide on how to successfully download and use this data. The 10 simple rules for using public biological data are: (1) use public data purposefully in your research; (2) evaluate data for your use case; (3) check data reuse requirements and embargoes; (4) be aware of ethics for data reuse; (5) plan for data storage and compute requirements; (6) know what you are downloading; (7) download programmatically and verify integrity; (8) properly cite data; (9) make reprocessed data and models Findable, Accessible, Interoperable, and Reusable (FAIR) and share; and (10) make pipelines and code FAIR and share. These rules are intended as a guide for researchers wanting to make use of available data and to increase data reuse and reproducibility.
Affiliation(s)
- Vishal H Oza
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Jordan H Whitlock
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Elizabeth J Wilk
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Angelina Uno-Antonison
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Brandon Wilk
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Manavalan Gajapathy
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Timothy C Howton
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Austyn Trull
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Lara Ianov
- Civitan International Research Center, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Elizabeth A Worthey
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Brittany N Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
|
12
|
Hughes A, Ragonnet R, Jayasundara P, Ngo HA, de Lara-Tuprio E, Estuar MRJ, Teng TR, Boon LK, Peariasamy KM, Chong ZL, Ghazali IMM, Fox GJ, Nguyen TA, Le LV, Abayawardana M, Shipman D, McBryde ES, Meehan MT, Caldwell JM, Trauer JM. COVID-19 collaborative modelling for policy response in the Philippines, Malaysia and Vietnam. The Lancet Regional Health. Western Pacific 2022; 29:100563. [PMID: 35974800 PMCID: PMC9371475 DOI: 10.1016/j.lanwpc.2022.100563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Affiliation(s)
- Angus Hughes
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Romain Ragonnet
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Pavithra Jayasundara
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Hoang-Anh Ngo
- Woolcock Institute of Medical Research, Hanoi, Viet Nam
- Usher Institute, The University of Edinburgh, Edinburgh, United Kingdom
- Timothy Robin Teng
- Department of Mathematics, Ateneo de Manila University, Manila, Philippines
- Law Kian Boon
- Institute for Clinical Research, National Institutes of Health, Ministry of Health Malaysia, Kuala Lumpur, Malaysia
- Kalaiarasu M. Peariasamy
- Institute for Clinical Research, National Institutes of Health, Ministry of Health Malaysia, Kuala Lumpur, Malaysia
- Zhuo-Lin Chong
- Institute for Public Health, National Institutes of Health, Ministry of Health Malaysia, Kuala Lumpur, Malaysia
- Izzuna Mudla M Ghazali
- Malaysian Health Technology Assessment Section, Medical Development Division, Ministry of Health Malaysia, Kuala Lumpur, Malaysia
- Greg J. Fox
- Central Clinical School, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Thu-Anh Nguyen
- Woolcock Institute of Medical Research, Hanoi, Viet Nam
- Central Clinical School, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Linh-Vi Le
- WHO Regional Office for the Western Pacific, Manila, Philippines
- Milinda Abayawardana
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- David Shipman
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Emma S. McBryde
- Australian Institute of Tropical Health and Medicine, James Cook University, Townsville, Australia
- Michael T. Meehan
- Australian Institute of Tropical Health and Medicine, James Cook University, Townsville, Australia
- Jamie M. Caldwell
- High Meadows Environmental Institute, Princeton University, New Jersey, United States of America
- James M. Trauer
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
|
13
|
Vallet N, Michonneau D, Tournier S. Toward practical transparent verifiable and long-term reproducible research using Guix. Sci Data 2022; 9:597. [PMID: 36195618 PMCID: PMC9532446 DOI: 10.1038/s41597-022-01720-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 09/26/2022] [Indexed: 11/09/2022]
Abstract
The reproducibility crisis urges scientists to promote transparency, which allows peers to draw the same conclusions after performing identical steps from hypothesis to results. A growing number of resources are being developed to open access to methods, data and source code. Still, the computational environment, an interface between the data and the source code running the analyses, is not addressed. Environments are usually described with software and library names associated with version labels, or provided as an opaque container image. This is not enough to describe the complexity of the dependencies on which they rely to operate. We describe this issue and illustrate how open tools like Guix can be used by any scientist to share their environment and allow peers to reproduce it. Some steps of research might not be fully reproducible, but at least transparency for computation is technically addressable. These tools should be considered by scientists willing to promote transparency and open science.
Affiliation(s)
- Nicolas Vallet
- Université de Paris, INSERM U976, F-75010, Paris, France
- David Michonneau
- Université de Paris, INSERM U976, F-75010, Paris, France
- Hematology Transplantation, Saint Louis hospital, 1 avenue Claude Vellefaux, 75010, Paris, France
- Simon Tournier
- Université de Paris, INSERM US53, CNRS UAR 2030, Saint Louis Research Institute, 1 avenue Claude Vellefaux, 75010, Paris, France
|
14
|
Fisher JL, Jones EF, Flanary VL, Williams AS, Ramsey EJ, Lasseigne BN. Considerations and challenges for sex-aware drug repurposing. Biol Sex Differ 2022; 13:13. [PMID: 35337371 PMCID: PMC8949654 DOI: 10.1186/s13293-022-00420-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/11/2022] [Accepted: 03/06/2022] [Indexed: 01/09/2023]
Abstract
Sex differences are essential factors in the etiology and manifestation of many diseases, such as cardiovascular disease, cancer, and neurodegeneration [33]. The biological influence of sex differences (including genomic, epigenetic, hormonal, immunological, and metabolic differences between males and females) and the lack of biomedical studies considering sex differences in their study design have led to several policies, for example, the National Institutes of Health's (NIH) sex as a biological variable (SABV) and Sex and Gender Equity in Research (SAGER) policies, which motivate researchers to consider sex differences [204]. However, drug repurposing, a promising alternative to traditional drug discovery that identifies novel uses for FDA-approved drugs, lacks sex-aware methods that can improve the identification of drugs that have sex-specific responses [7, 11, 14, 33]. Sex-aware drug repurposing methods either select drug candidates that are more efficacious in one sex or deprioritize drug candidates based on whether they are predicted to cause a sex-bias adverse event (SBAE), an unintended therapeutic effect that is more likely to occur in one sex. Computational drug repurposing methods are encouraging approaches to develop for sex-aware drug repurposing because they can prioritize sex-specific drug candidates or SBAEs at lower cost and in less time than traditional drug discovery. Sex-aware methods currently exist for clinical, genomic, and transcriptomic information [1, 7, 155]. They have not expanded to other data types, such as DNA variation, which has been beneficial in other drug repurposing methods that do not consider sex [114]. Additionally, some sex-aware methods suffer from poorer performance because a disproportionate number of male and female samples are available to train computational methods [7]. However, there is development potential for several different categories (i.e., data mining, ligand binding predictions, molecular associations, and networks). Low-dimensional representations of molecular association and network approaches are also especially promising candidates for future sex-aware drug repurposing methodologies because they reduce the multiple hypothesis testing burden and capture sex-specific variation better than the other methods [151, 159]. Here we review how sex influences drug response, the current state of drug repurposing, including with respect to sex-bias drug response, and how model organism study design choices influence drug repurposing validation.
Affiliation(s)
- Jennifer L. Fisher
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Emma F. Jones
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Victoria L. Flanary
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Avery S. Williams
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Elizabeth J. Ramsey
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
- Brittany N. Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294 USA
|