1
|
Gallagher K, Creswell R, Lambert B, Robinson M, Lok Lei C, Mirams GR, Gavaghan DJ. Ten simple rules for training scientists to make better software. PLoS Comput Biol 2024; 20:e1012410. [PMID: 39264985 PMCID: PMC11392269 DOI: 10.1371/journal.pcbi.1012410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2024] Open
Affiliation(s)
- Kit Gallagher
- Doctoral Training Centre, University of Oxford, Oxford, United Kingdom
| | - Richard Creswell
- Department of Computer Science, University of Oxford, Oxford, United Kingdom
| | - Ben Lambert
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Martin Robinson
- Department of Computer Science, University of Oxford, Oxford, United Kingdom
| | - Chon Lok Lei
- Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Macau, China
| | - Gary R Mirams
- Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, Nottingham, United Kingdom
| | - David J Gavaghan
- Doctoral Training Centre, University of Oxford, Oxford, United Kingdom
| |
|
2
|
Berezin CT, Aguilera LU, Billerbeck S, Bourne PE, Densmore D, Freemont P, Gorochowski TE, Hernandez SI, Hillson NJ, King CR, Köpke M, Ma S, Miller KM, Moon TS, Moore JH, Munsky B, Myers CJ, Nicholas DA, Peccoud SJ, Zhou W, Peccoud J. Ten simple rules for managing laboratory information. PLoS Comput Biol 2023; 19:e1011652. [PMID: 38060459 PMCID: PMC10703290 DOI: 10.1371/journal.pcbi.1011652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023] Open
Abstract
Information is the cornerstone of research, from experimental (meta)data and computational processes to complex inventories of reagents and equipment. These 10 simple rules discuss best practices for leveraging laboratory information management systems to transform this large information load into useful scientific findings.
Affiliation(s)
- Casey-Tyler Berezin
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Luis U. Aguilera
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Sonja Billerbeck
- Molecular Microbiology Unit, Faculty of Science and Engineering, University of Groningen, Groningen, the Netherlands
| | - Philip E. Bourne
- School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
| | - Douglas Densmore
- College of Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Paul Freemont
- Department of Infectious Disease, Imperial College, London, United Kingdom
| | - Thomas E. Gorochowski
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
- BrisEngBio, University of Bristol, Bristol, United Kingdom
| | - Sarah I. Hernandez
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Nathan J. Hillson
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- US Department of Energy Agile BioFoundry, Emeryville, California, United States of America
- US Department of Energy Joint BioEnergy Institute, Emeryville, California, United States of America
| | - Connor R. King
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Michael Köpke
- LanzaTech, Skokie, Illinois, United States of America
| | - Shuyi Ma
- Center for Global Infectious Disease Research, Seattle Children’s Hospital, University of Washington Medicine, Seattle, Washington, United States of America
| | - Katie M. Miller
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Tae Seok Moon
- Department of Energy, Environmental & Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, United States of America
| | - Brian Munsky
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Chris J. Myers
- Department of Electrical, Computer & Energy Engineering, University of Colorado Boulder, Boulder, Colorado, United States of America
| | - Dequina A. Nicholas
- Department of Molecular Biology & Biochemistry, University of California Irvine, Irvine, California, United States of America
| | - Samuel J. Peccoud
- Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Wen Zhou
- Department of Statistics, Colorado State University, Fort Collins, Colorado, United States of America
| | - Jean Peccoud
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| |
|
3
|
Laine VN, Sepers B, Lindner M, Gawehns F, Ruuskanen S, van Oers K. An ecologist's guide for studying DNA methylation variation in wild vertebrates. Mol Ecol Resour 2023; 23:1488-1508. [PMID: 35466564 DOI: 10.1111/1755-0998.13624] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/03/2021] [Revised: 03/29/2022] [Accepted: 04/13/2022] [Indexed: 11/30/2022]
Abstract
The field of molecular biology is advancing fast with new powerful technologies, sequencing methods and analysis software being developed constantly. Commonly used tools originally developed for research on humans and model species are now regularly used in ecological and evolutionary research. There is also a growing interest in the causes and consequences of epigenetic variation in natural populations. Studying ecological epigenetics is currently challenging, especially for vertebrate systems, because of the required technical expertise, complications with analyses and interpretation, and limitations in acquiring sufficiently high sample sizes. Importantly, neglecting the limitations of the experimental setup, technology and analyses may affect the reliability and reproducibility, and the extent to which unbiased conclusions can be drawn from these studies. Here, we provide a practical guide for researchers aiming to study DNA methylation variation in wild vertebrates. We review the technical aspects of epigenetic research, concentrating on DNA methylation using bisulfite sequencing, discuss the limitations and possible pitfalls, and how to overcome them through rigid and reproducible data analysis. This review provides a solid foundation for the proper design of epigenetic studies, a clear roadmap on the best practices for correct data analysis and a realistic view on the limitations for studying ecological epigenetics in vertebrates. This review will help researchers studying the ecological and evolutionary implications of epigenetic variation in wild populations.
Affiliation(s)
- Veronika N Laine
- Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
| | - Bernice Sepers
- Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Behavioural Ecology Group, Wageningen University & Research (WUR), Wageningen, The Netherlands
| | - Melanie Lindner
- Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Chronobiology Unit, Groningen Institute for Evolutionary Life Sciences (GELIFES), University of Groningen, Groningen, The Netherlands
| | - Fleur Gawehns
- Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
| | - Suvi Ruuskanen
- Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
- Department of Biology, University of Turku, Finland
| | - Kees van Oers
- Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Behavioural Ecology Group, Wageningen University & Research (WUR), Wageningen, The Netherlands
| |
|
4
|
Chicco D, Jurman G. Ten simple rules for providing bioinformatics support within a hospital. BioData Min 2023; 16:6. [PMID: 36823520 PMCID: PMC9948383 DOI: 10.1186/s13040-023-00326-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
Bioinformatics has become a key aspect of the biomedical research programmes of many hospitals' scientific centres, and the establishment of bioinformatics facilities within hospitals has become a common practice worldwide. Bioinformaticians working in these facilities provide computational biology support to medical doctors and principal investigators who deal daily with patient data to analyze. These bioinformatics analysts, although pivotal, usually do not receive formal training for this job. We therefore propose these ten simple rules to guide these bioinformaticians in their work: ten pieces of advice on how to provide bioinformatics support to medical doctors in hospitals. We believe these simple rules can help bioinformatics facility analysts produce better scientific results and work in a serene and fruitful environment.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, 155 College Street, M5T 3M7 Toronto, Ontario, Canada
| | - Giuseppe Jurman
- Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy
| |
|
5
|
Schlaeppi A, Adams W, Haase R, Huisken J, MacDonald RB, Eliceiri KW, Kugler EC. Meeting in the Middle: Towards Successful Multidisciplinary Bioimage Analysis Collaboration. FRONTIERS IN BIOINFORMATICS 2022; 2. [PMID: 35600765 PMCID: PMC9122012 DOI: 10.3389/fbinf.2022.889755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
With an increase in subject knowledge expertise required to solve specific biological questions, experts from different fields need to collaborate to address increasingly complex issues. To successfully collaborate, everyone involved in the collaboration must take steps to “meet in the middle.” We thus present a guide on truly cross-disciplinary work using bioimage analysis as a showcase, where it is required that the expertise of biologists, microscopists, data analysts, clinicians, engineers, and physicists meet. We discuss considerations and best practices from the perspective of both users and technology developers, while offering suggestions for working together productively and how this can be supported by institutes and funders. Although this guide uses bioimage analysis as an example, the guiding principles of these perspectives are widely applicable to other cross-disciplinary work.
Affiliation(s)
- Anjalie Schlaeppi
- Morgridge Institute for Research, Madison, WI, United States
- BioImaging and Optics Platform, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- *Correspondence: Anjalie Schlaeppi; Elisabeth C. Kugler
| | - Wilson Adams
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, United States
- Department of Pharmacology, Vanderbilt University, Nashville, TN, United States
| | - Robert Haase
- DFG Cluster of Excellence “Physics of Life”, Germany and Center for Systems Biology Dresden, TU Dresden, Dresden, Germany
| | - Jan Huisken
- Morgridge Institute for Research, Madison, WI, United States
- Department of Biology and Psychology, Georg-August-University Göttingen, Göttingen, Germany
| | - Ryan B. MacDonald
- Faculty of Brain Sciences, Institute of Ophthalmology, University College London, London, United Kingdom
| | - Kevin W. Eliceiri
- Morgridge Institute for Research, Madison, WI, United States
- Center for Quantitative Cell Imaging, University of Wisconsin-Madison, Madison, WI, United States
| | - Elisabeth C. Kugler
- Faculty of Brain Sciences, Institute of Ophthalmology, University College London, London, United Kingdom
- *Correspondence: Anjalie Schlaeppi; Elisabeth C. Kugler
| |
|
6
|
Fungtammasan A, Lee A, Taroni J, Wheeler K, Chin CS, Davis S, Greene C. Ten simple rules for large-scale data processing. PLoS Comput Biol 2022; 18:e1009757. [PMID: 35143491 PMCID: PMC8830682 DOI: 10.1371/journal.pcbi.1009757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022] Open
Affiliation(s)
- Arkarachai Fungtammasan
- DNAnexus, Inc., Mountain View, California, United States of America
- * E-mail: (AF); (C-SC); (SD); (CG)
| | - Alexandra Lee
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Jaclyn Taroni
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, United States of America
| | - Kurt Wheeler
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Philadelphia, Pennsylvania, United States of America
| | - Chen-Shan Chin
- DNAnexus, Inc., Mountain View, California, United States of America
- * E-mail: (AF); (C-SC); (SD); (CG)
| | - Sean Davis
- Center for Health AI, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America
- Department of Medicine, Divisions of Medical Oncology and Hematology, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America
- * E-mail: (AF); (C-SC); (SD); (CG)
| | - Casey Greene
- Center for Health AI, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America
- * E-mail: (AF); (C-SC); (SD); (CG)
| |
|
7
|
Post AR, Luther J, Loveless JM, Ward M, Hewitt S. Enhancing research informatics core user satisfaction through agile practices. JAMIA Open 2021; 4:ooab103. [PMID: 34927001 PMCID: PMC8672926 DOI: 10.1093/jamiaopen/ooab103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Revised: 10/06/2021] [Accepted: 11/18/2021] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE: The Huntsman Cancer Institute Research Informatics Shared Resource (RISR), a software and database development core facility, sought to address a lack of published operational best practices for research informatics cores. It aimed to use those insights to enhance effectiveness after an increase in team size from 20 to 31 full-time equivalents coincided with a reduction in user satisfaction. MATERIALS AND METHODS: RISR migrated from a water-scrum-fall model of software development to agile software development practices, which emphasize iteration and collaboration. RISR's agile implementation emphasizes the product owner role, which is responsible for user engagement and may be particularly valuable in software development that requires close engagement with users like in science. RESULTS: All RISR's software development teams implemented agile practices in early 2020. All project teams are led by a product owner who serves as the voice of the user on the development team. Annual user survey scores for service quality and turnaround time recorded 9 months after implementation increased by 17% and 11%, respectively. DISCUSSION: RISR is illustrative of the increasing size of research informatics cores and the need to identify best practices for maintaining high effectiveness. Agile practices may address concerns about the fit of software engineering practices in science. The study had one time point after implementing agile practices and one site, limiting its generalizability. CONCLUSIONS: Agile software development may substantially increase a research informatics core facility's effectiveness and should be studied further as a potential best practice for how such cores are operated.
Affiliation(s)
- Andrew R Post
- Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
| | - Jared Luther
- Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
| | - J Maxwell Loveless
- Research Administration, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
| | - Melanie Ward
- Research Administration, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
| | - Shirleen Hewitt
- Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
| |
|
8
|
Groiss S, Somvilla I, Daxböck C, Fuchs J, Lang-Olip I, Stiegler P, Leber B, Liegl-Atzwanger B, Brislinger D. Quantification of increased MUC5AC expression in airway mucus of smoker using an automated image-based approach. Microsc Res Tech 2021; 85:5-18. [PMID: 34288207 DOI: 10.1002/jemt.23879] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/22/2021] [Accepted: 07/06/2021] [Indexed: 12/19/2022]
Abstract
Microscopic analysis of mucus quantity and composition is crucial in research and diagnostics on muco-obstructive diseases. Currently used image-based methods are unable to extract concrete numeric values of mucosal proteins, especially on the expression of the key mucosal proteins MUC5AC and MUC5B. Since their levels increase under pathologic conditions such as extensive exposure to cigarette smoke, it is imperative to quantify them to improve treatment strategies of pulmonary diseases. This study presents a simple, image-based, and high-processing computational method that allows determining the ratio of MUC5AC and MUC5B within the overall airway mucus while providing information on their spatial distribution. The presented pipeline was optimized for automated downstream analysis using a combination of bright field and immunofluorescence imaging suitable for tracheal and bronchial tissue samples, and air-liquid interface (ALI) cell cultures. To validate our approach, we compared tracheal tissue and ALI cell cultures of isolated primary normal human bronchial epithelial cells derived from smokers and nonsmokers. Our data indicated 18-fold higher levels of MUC5AC in submucosal glands of smokers covering about 8% of mucosal areas compared to <1% in nonsmoking individuals, confirming results of previous studies. We further identified a subpopulation of nonsmokers with slightly elevated glandular MUC5AC levels suggesting moderate exposure to second-hand smoke or fine particulate air pollution. Overall, this study demonstrates a novel, user-friendly and freely available tool for digital pathology and the analysis of therapeutic interventions tested in ALI cell cultures.
Affiliation(s)
- Silvia Groiss
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center, Medical University of Graz, Graz, Austria
| | - Ina Somvilla
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center, Medical University of Graz, Graz, Austria
| | - Christine Daxböck
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center, Medical University of Graz, Graz, Austria
| | - Julia Fuchs
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center, Medical University of Graz, Graz, Austria
| | - Ingrid Lang-Olip
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center, Medical University of Graz, Graz, Austria
| | - Philipp Stiegler
- Division of Transplantation Surgery, Department of Surgery, Medical University of Graz, Graz, Austria
| | - Bettina Leber
- Division of Transplantation Surgery, Department of Surgery, Medical University of Graz, Graz, Austria
| | - Bernadette Liegl-Atzwanger
- Diagnostic and Research Institute of Pathology, Diagnostic and Research Center for Molecular Biomedicine, Medical University Graz, Graz, Austria
| | - Dagmar Brislinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center, Medical University of Graz, Graz, Austria
| |
|
9
|
Balaban G, Grytten I, Rand KD, Scheffer L, Sandve GK. Ten simple rules for quick and dirty scientific programming. PLoS Comput Biol 2021; 17:e1008549. [PMID: 33705383 PMCID: PMC7951887 DOI: 10.1371/journal.pcbi.1008549] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022] Open
Affiliation(s)
- Gabriel Balaban
- Biomedical Informatics Group, Department of Informatics, University of Oslo, Oslo, Norway
- PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
| | - Ivar Grytten
- Biomedical Informatics Group, Department of Informatics, University of Oslo, Oslo, Norway
| | - Knut Dagestad Rand
- Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway
| | - Lonneke Scheffer
- Biomedical Informatics Group, Department of Informatics, University of Oslo, Oslo, Norway
| | - Geir Kjetil Sandve
- Biomedical Informatics Group, Department of Informatics, University of Oslo, Oslo, Norway
- PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
| |
|
10
|
Ten simple rules for navigating the computational aspect of an interdisciplinary PhD. PLoS Comput Biol 2021; 17:e1008554. [PMID: 33600411 PMCID: PMC7891742 DOI: 10.1371/journal.pcbi.1008554] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022] Open
|
11
|
Lucas TCD, Pollington TM, Davis EL, Hollingsworth TD. Responsible modelling: Unit testing for infectious disease epidemiology. Epidemics 2020; 33:100425. [PMID: 33307443 PMCID: PMC7690327 DOI: 10.1016/j.epidem.2020.100425] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/27/2020] [Revised: 10/21/2020] [Accepted: 11/21/2020] [Indexed: 11/30/2022] Open
Abstract
Infectious disease epidemiology is increasingly reliant on large-scale computation and inference. Models have guided health policy for epidemics including COVID-19 and Ebola and endemic diseases including malaria and tuberculosis. Yet a coding bug may bias results, yielding incorrect conclusions and actions causing avoidable harm. We are ethically obliged to make our code as free of error as possible. Unit testing is a coding method to avoid such bugs, but it is rarely used in epidemiology. We demonstrate how unit testing can handle the particular quirks of infectious disease models and aim to increase the uptake of this methodology in our field.
Affiliation(s)
- Tim C D Lucas
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
- Centre for Environment and Health, School of Public Health, Imperial College, UK
| | - Timothy M Pollington
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
- MathSys CDT, University of Warwick, UK
| | - Emma L Davis
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
| | - T Déirdre Hollingsworth
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
| |
|
12
|
Culina A, van den Berg I, Evans S, Sánchez-Tójar A. Low availability of code in ecology: A call for urgent action. PLoS Biol 2020; 18:e3000763. [PMID: 32722681 PMCID: PMC7386629 DOI: 10.1371/journal.pbio.3000763] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 01/27/2023] Open
Abstract
Access to analytical code is essential for transparent and reproducible research. We review the state of code availability in ecology using a random sample of 346 nonmolecular articles published between 2015 and 2019 under mandatory or encouraged code-sharing policies. Our results call for urgent action to increase code availability: only 27% of eligible articles were accompanied by code. In contrast, data were available for 79% of eligible articles, highlighting that code availability is an important limiting factor for computational reproducibility in ecology. Although the percentage of ecological journals with mandatory or encouraged code-sharing policies has increased considerably, from 15% in 2015 to 75% in 2020, our results show that code-sharing policies are not adhered to by most authors. We hope these results will encourage journals, institutions, funding agencies, and researchers to address this alarming situation. Publication of the analytical code underlying a scientific study is increasingly expected or even mandated by journals, allowing others to reproduce the results. However, a survey of more than 300 recently published ecology papers finds the majority have no code publicly available, handicapping efforts to improve scientific transparency.
Affiliation(s)
- Antica Culina
- Department of Animal Ecology, Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, the Netherlands
- * E-mail: (AC); (AST)
| | - Ilona van den Berg
- Department of Animal Ecology, Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, the Netherlands
- Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Simon Evans
- Centre for Ecology and Conservation, University of Exeter, Cornwall Campus, Penryn, United Kingdom
- Department of Zoology, University of Oxford, Oxford, United Kingdom
| | - Alfredo Sánchez-Tójar
- Department of Evolutionary Biology, Bielefeld University, Bielefeld, Germany
- * E-mail: (AC); (AST)
| |
|
13
|
Stres B, Kronegger L. Shift in the paradigm towards next-generation microbiology. FEMS Microbiol Lett 2020; 366:5533319. [PMID: 31314103 PMCID: PMC6759065 DOI: 10.1093/femsle/fnz159] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/05/2019] [Accepted: 07/15/2019] [Indexed: 12/14/2022] Open
Abstract
In this work, the position of contemporary microbiology is considered from the perspective of scientific success, and a list of historical points and lessons learned from the fields of medical microbiology, microbial ecology and systems biology is presented. In addition, patterns in the development of top-down research topics that emerged over time as well as overlapping ideas and personnel, which are the first signs of trans-domain research activities in the fields of metagenomics, metaproteomics, metatranscriptomics and metabolomics, are explored through analysis of the publication networks of 28 654 papers using the computer programme Pajek. The current state of affairs is defined, and the need for meta-analyses to leverage publication biases in the field of microbiology is put forward as a very important emerging field of microbiology, especially since microbiology is progressively dealing with multi-scale systems. Consequently, cross-fertilisation with other fields/disciplines, rather than ‘more microbiology’, is needed to advance the field of microbiology as such. The reader is directed to consider how novel technologies, the introduction of big data approaches and artificial intelligence have transformed microbiology into a multi-scale field and initiated a shift away from its history of mostly manual work and towards a largely technology-, data- and statistics-driven discipline that is often coupled with automation and modelling.
Affiliation(s)
- Blaž Stres
- Center for Clinical Neurophysiology, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, SI-1000 Ljubljana, Slovenia
- Group for Microbiology and Microbial Biotechnology, Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, SI-1000 Ljubljana, Slovenia
- Institute of Sanitary Engineering, Faculty of Civil and Geodetic Engineering, University of Ljubljana, Hajdrihova 28, SI-1000 Ljubljana, Slovenia
| | - Luka Kronegger
- Faculty of Social Sciences, University of Ljubljana, Kardeljeva ploščad 5, SI-1000 Ljubljana, Slovenia
| |
|
14
|
Abstract
Because of the inherent complexity of bioprocesses, mathematical models are more and more used for process design, control, optimization, etc. These models are generally based on a set of biochemical reactions. Model equations are then derived from mass balance, coupled with empirical kinetics. Biological models are nonlinear and represent processes, which by essence are dynamic and adaptive. The temptation to embed most of the biology is high, with the risk that calibration would not be significant anymore. The most important task for a modeler is thus to ensure a balance between model complexity and ease of use. Since a model should be tailored to the objectives, which will depend on applications and environment, a universal model representing any possible situation is probably not the best option.
Affiliation(s)
- Francis Mairet
- Ifremer, Physiology and Biotechnology of Algae laboratory, Nantes, France
| | - Olivier Bernard
- Côte d’Azur University, INRIA, BIOCORE, Sophia-Antipolis Cedex, France
- Sorbonne University, CNRS, LOV, Villefranche-sur-mer, France
- ENERSENSE, Department of Energy and Process Engineering, NTNU, Trondheim, Norway
| |
|
15
|
Lee J, Heath LS, Grene R, Li S. Comparing time series transcriptome data between plants using a network module finding algorithm. PLANT METHODS 2019; 15:61. [PMID: 31164912 PMCID: PMC6544932 DOI: 10.1186/s13007-019-0440-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/13/2018] [Accepted: 05/17/2019] [Indexed: 06/01/2023]
Abstract
BACKGROUND: Comparative transcriptome analysis is the comparison of expression patterns between homologous genes in different species. Since most molecular mechanistic studies in plants have been performed in model species, including Arabidopsis and rice, comparative transcriptome analysis is particularly important for functional annotation of genes in diverse plant species. Many biological processes, such as embryo development, are highly conserved between different plant species. The challenge is to establish one-to-one mapping of the developmental stages between two species. RESULTS: In this manuscript, we solve this problem by converting the gene expression patterns into co-expression networks and then apply network module finding algorithms to the cross-species co-expression network. We describe how such analyses are carried out using bash scripts for preliminary data processing followed by using the R programming language for module finding with a simulated annealing method. We also provide instructions on how to visualize the resulting co-expression networks across species. CONCLUSIONS: We provide a comprehensive pipeline from installing software and downloading raw transcriptome data to predicting homologous genes and finding orthologous co-expression networks. From the example provided, we demonstrate the application of our method to reveal functional conservation and divergence of genes in two plant species.
Affiliation(s)
- Jiyoung Lee
- Genetics, Bioinformatics and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA
- School of Plant and Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA
| | - Lenwood S. Heath
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA
| | - Ruth Grene
- School of Plant and Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA
| | - Song Li
- Genetics, Bioinformatics and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA
- School of Plant and Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA
| |
|
16
|
|
17
|
Russell S, Bennett TD, Ghosh D. Software engineering principles to improve quality and performance of R software. PeerJ Comput Sci 2019; 5:e175. [PMID: 33816828 PMCID: PMC7924430 DOI: 10.7717/peerj-cs.175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2018] [Accepted: 01/11/2019] [Indexed: 06/12/2023]
Abstract
Today's computational researchers are expected to be highly proficient in using software to solve a wide range of problems ranging from processing large datasets to developing personalized treatment strategies from a growing range of options. Researchers are well versed in their own field, but may lack formal training and appropriate mentorship in software engineering principles. Two major themes not covered in most university coursework nor current literature are software testing and software optimization. Through a survey of all currently available Comprehensive R Archive Network packages, we show that reproducible and replicable software tests are frequently not available and that many packages do not appear to employ software performance and optimization tools and techniques. Through use of examples from an existing R package, we demonstrate powerful testing and optimization techniques that can improve the quality of any researcher's software.
Affiliation(s)
- Seth Russell
- University of Colorado Data Science to Patient Value, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Tellen D. Bennett
- University of Colorado Data Science to Patient Value, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO, USA
| | - Debashis Ghosh
- University of Colorado Data Science to Patient Value, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
| |
|
18
|
Affiliation(s)
- Benjamin D. Lee
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
| |
|
19
|
Abstract
Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation. Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of errors are inappropriate use of statistical methods and incorrect interpretation of data by software. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained facilitates correcting mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting in computed numbers can also be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results. These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.
Affiliation(s)
| | - Sabrina Rueschenbaum
- Department of Internal Medicine 1, University Hospital Frankfurt, Goethe University, Theodor-Stern-Kai 7, Frankfurt (Main), Germany
| |
|
20
|
Biscarini F, Cozzi P, Orozco-Ter Wengel P. Lessons learnt on the analysis of large sequence data in animal genomics. Anim Genet 2018; 49:147-158. [PMID: 29624711 DOI: 10.1111/age.12655] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 02/11/2018] [Indexed: 11/28/2022]
Abstract
The 'omics revolution has made a large amount of sequence data available to researchers and the industry. This has had a profound impact in the field of bioinformatics, stimulating unprecedented advancements in this discipline. This is usually looked at from the perspective of human 'omics, in particular human genomics. Plant and animal genomics, however, have also been deeply influenced by next-generation sequencing technologies, with several genomics applications now popular among researchers and the breeding industry. Genomics tends to generate huge amounts of data, and genomic sequence data account for an increasing proportion of big data in biological sciences, due largely to decreasing sequencing and genotyping costs and to large-scale sequencing and resequencing projects. The analysis of big data poses a challenge to scientists, as data gathering currently takes place at a faster pace than does data processing and analysis, and the associated computational burden is increasingly taxing, making even simple manipulation, visualization and transferring of data a cumbersome operation. The time consumed by the processing and analysing of huge data sets may be at the expense of data quality assessment and critical interpretation. Additionally, when analysing lots of data, something is likely to go awry (the software may crash or stop), and it can be very frustrating to track the error. We herein review the most relevant issues related to tackling these challenges and problems, from the perspective of animal genomics, and provide researchers that lack extensive computing experience with guidelines that will help when processing large genomic data sets.
Affiliation(s)
- F Biscarini
- CNR-IBBA, Via Bassini 15, 20133, Milan, Italy; School of Medicine, Cardiff University, Heath Park, CF14 4XN, Cardiff, UK
| | - P Cozzi
- CNR-IBBA, Via Bassini 15, 20133, Milan, Italy; Department of Bioinformatics and Biostatistics, PTP Science Park, Via Einstein, 26900, Lodi, Italy
| | - P Orozco-Ter Wengel
- School of Biosciences, Cardiff University, Museum Avenue, CF10 3AX, Cardiff, UK
| |
|
21
|
Chicco D. Ten quick tips for machine learning in computational biology. BioData Min 2017; 10:35. [PMID: 29234465 PMCID: PMC5721660 DOI: 10.1186/s13040-017-0155-3] [Citation(s) in RCA: 327] [Impact Index Per Article: 46.7] [Received: 08/31/2017] [Accepted: 11/08/2017] [Indexed: 11/12/2022] Open
Abstract
Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices, that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects. We believe our ten suggestions can strongly help any machine learning practitioner to carry on a successful project in computational biology and related sciences.
Affiliation(s)
- Davide Chicco
- Princess Margaret Cancer Centre, PMCR Tower 11-401, 101 College Street, Toronto, Ontario, M5G 1L7 Canada
| |
|
22
|
Fidler F, Chee YE, Wintle BC, Burgman MA, McCarthy MA, Gordon A. Metaresearch for Evaluating Reproducibility in Ecology and Evolution. Bioscience 2017; 67:282-289. [PMID: 28596617 PMCID: PMC5384162 DOI: 10.1093/biosci/biw159] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022] Open
Abstract
Recent replication projects in other disciplines have uncovered disturbingly low levels of reproducibility, suggesting that those research literatures may contain unverifiable claims. The conditions contributing to irreproducibility in other disciplines are also present in ecology. These include a large discrepancy between the proportion of "positive" or "significant" results and the average statistical power of empirical research, incomplete reporting of sampling stopping rules and results, journal policies that discourage replication studies, and a prevailing publish-or-perish research culture that encourages questionable research practices. We argue that these conditions constitute sufficient reason to systematically evaluate the reproducibility of the evidence base in ecology and evolution. In some cases, the direct replication of ecological research is difficult because of strong temporal and spatial dependencies, so here, we propose metaresearch projects that will provide proxy measures of reproducibility.
Affiliation(s)
- Fiona Fidler
- Associate Professor Fiona Fidler holds a joint appointment in the School of BioSciences and the School of Historical and Philosophical Studies (History and Philosophy of Science Discipline) at the University of Melbourne, Australia; Fiona is interested in how scientists and experts make decisions. Bonnie C. Wintle is a postdoctoral fellow and Mark Burgman and Michael McCarthy are professors in the School of BioSciences at the University of Melbourne, Australia; they are interested in a broad range of topics related to environmental decisionmaking. Bonnie Wintle is now a research fellow at the Centre for Research in the Arts, Social Sciences and Humanities, University of Cambridge. Yung En Chee is a senior research fellow in the School of Ecosystem and Forest Sciences at the University of Melbourne, Australia; Yung applies ecological and decision-analytic theory and models to conservation problems. Ascelin Gordon is a senior research fellow in the Interdisciplinary Conservation Science Research Group in the School of Global, Urban, and Social Studies at RMIT University, in Melbourne, Australia; Ascelin is broadly interested in modeling approaches for understanding the impacts of environmental policies. FF, YC, BW, MB and MM were involved in discussion group about reproducibility and type 1 errors in ecology in 2014, which helped develop the outline for this article. AG and FF independently discussed the application of open science initiatives in ecology. FF wrote the first draft; YC wrote sections on data and code sharing with substantial input from AG. BW, MB, and MM made edits throughout
| | - Yung En Chee
| | - Bonnie C Wintle
| | - Mark A Burgman
| | - Michael A McCarthy
| | - Ascelin Gordon
| |
|
23
|
|
24
|
Kursawe J, Bardenet R, Zartman JJ, Baker RE, Fletcher AG. Robust cell tracking in epithelial tissues through identification of maximum common subgraphs. J R Soc Interface 2016; 13:20160725. [PMID: 28334699 PMCID: PMC5134023 DOI: 10.1098/rsif.2016.0725] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/07/2016] [Accepted: 10/17/2016] [Indexed: 11/30/2022] Open
Abstract
Tracking of cells in live-imaging microscopy videos of epithelial sheets is a powerful tool for investigating fundamental processes in embryonic development. Characterizing cell growth, proliferation, intercalation and apoptosis in epithelia helps us to understand how morphogenetic processes such as tissue invagination and extension are locally regulated and controlled. Accurate cell tracking requires correctly resolving cells entering or leaving the field of view between frames, cell neighbour exchanges, cell removals and cell divisions. However, current tracking methods for epithelial sheets are not robust to large morphogenetic deformations and require significant manual interventions. Here, we present a novel algorithm for epithelial cell tracking, exploiting the graph-theoretic concept of a 'maximum common subgraph' to track cells between frames of a video. Our algorithm does not require the adjustment of tissue-specific parameters, and scales in sub-quadratic time with tissue size. It does not rely on precise positional information, permitting large cell movements between frames and enabling tracking in datasets acquired at low temporal resolution due to experimental constraints such as phototoxicity. To demonstrate the method, we perform tracking on the Drosophila embryonic epidermis and compare cell-cell rearrangements to previous studies in other tissues. Our implementation is open source and generally applicable to epithelial tissues.
Affiliation(s)
- Jochen Kursawe
- Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
| | - Rémi Bardenet
- CNRS and CRIStAL, Université de Lille, 59651 Villeneuve d'Ascq, France
| | - Jeremiah J Zartman
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, 205D McCourtney Hall of Molecular Science and Engineering, Notre Dame, IN 46556, USA
| | - Ruth E Baker
- Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
| | - Alexander G Fletcher
- School of Mathematics and Statistics, University of Sheffield, Hicks Building, Hounsfield Road, Sheffield S3 7RH, UK
- Bateson Centre, University of Sheffield, Sheffield S10 2TN, UK
| |
|
25
|
Lewis J, Breeze CE, Charlesworth J, Maclaren OJ, Cooper J. Where next for the reproducibility agenda in computational biology? BMC Systems Biology 2016; 10:52. [PMID: 27422148 PMCID: PMC4946111 DOI: 10.1186/s12918-016-0288-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 03/22/2016] [Accepted: 06/08/2016] [Indexed: 11/24/2022]
Abstract
Background: The concept of reproducibility is a foundation of the scientific method. With the arrival of fast and powerful computers over the last few decades, there has been an explosion of results based on complex computational analyses and simulations. The reproducibility of these results has been addressed mainly in terms of exact replicability or numerical equivalence, ignoring the wider issue of the reproducibility of conclusions through equivalent, extended or alternative methods. Results: We use case studies from our own research experience to illustrate how concepts of reproducibility might be applied in computational biology. Several fields have developed ‘minimum information’ checklists to support the full reporting of computational simulations, analyses and results, and standardised data formats and model description languages can facilitate the use of multiple systems to address the same research question. We note the importance of defining the key features of a result to be reproduced, and the expected agreement between original and subsequent results. Dynamic, updatable tools for publishing methods and results are becoming increasingly common, but sometimes come at the cost of clear communication. In general, the reproducibility of computational research is improving but would benefit from additional resources and incentives. Conclusions: We conclude with a series of linked recommendations for improving reproducibility in computational biology through communication, policy, education and research practice. More reproducible research will lead to higher quality conclusions, deeper understanding and more valuable knowledge.
Affiliation(s)
- Joanna Lewis
- Centre for Maths and Physics in the Life Sciences and Experimental Biology, University College London, Physics Building, Gower Place, London, WC1E 6BT, UK; NIHR Health Protection Research Unit in Modelling Methodology, Department of Infectious Disease Epidemiology, Imperial College London, St Mary's Campus, Norfolk Place, London, W2 1PG, UK
| | - Charles E Breeze
- UCL Cancer Institute, University College London, 72 Huntley St, London, WC1E 6DD, UK
| | - Jane Charlesworth
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
| | - Oliver J Maclaren
- Department of Mathematics, University of Auckland, Auckland, 1142, New Zealand; Department of Engineering Science, University of Auckland, Auckland, 1142, New Zealand
| | - Jonathan Cooper
- Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK
| |
|
26
|
Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost FDV, Fufezan C, Ternent T, Eglen SJ, Katz DS, Pollard TJ, Konovalov A, Flight RM, Blin K, Vizcaíno JA. Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS Comput Biol 2016; 12:e1004947. [PMID: 27415786 PMCID: PMC4945047 DOI: 10.1371/journal.pcbi.1004947] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Indexed: 11/19/2022] Open
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- * E-mail: (YPR); (JAV)
| | - Laurent Gatto
- Computational Proteomics Unit, Cambridge Systems Biology Centre, University of Cambridge, Cambridge, United Kingdom
| | - Rui Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Timo Sachsenberg
- Applied Bioinformatics and Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Julian Uszkoreit
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
| | | | - Christian Fufezan
- Institute of Plant Biology and Biotechnology, University of Münster, Münster, Germany
| | - Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Stephen J. Eglen
- Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom
| | - Daniel S. Katz
- National Center for Supercomputing Applications and Graduate School of Library and Information Science, University of Illinois, Urbana, Illinois, United States of America
| | - Tom J. Pollard
- MIT Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Alexander Konovalov
- Centre for Interdisciplinary Research in Computational Algebra, University of St Andrews, St Andrews, United Kingdom
| | - Robert M. Flight
- Department of Molecular Biology and Biochemistry, Markey Cancer Center, Resource Center for Stable Isotope-Resolved Metabolomics, University of Kentucky, Lexington, Kentucky, United States of America
| | - Kai Blin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Hørsholm, Denmark
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- * E-mail: (YPR); (JAV)
| |
|
27
|
Abstract
Extremely large datasets have become routine in biology. However, performing a computational analysis of a large dataset can be overwhelming, especially for novices. Here, we present a step-by-step guide to computing workflows with the biologist end-user in mind. Starting from a foundation of sound data management practices, we make specific recommendations on how to approach and perform computational analyses of large datasets, with a view to enabling sound, reproducible biological research.
Affiliation(s)
- Ashley Shade
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
- BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, Michigan, United States of America
| | - Tracy K. Teal
- BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, Michigan, United States of America
- Data Carpentry, datacarpentry.org
| |
|
28
|
Gutierrez JB, Harb OS, Zheng J, Tisch DJ, Charlebois ED, Stoeckert CJ, Sullivan SA. A Framework for Global Collaborative Data Management for Malaria Research. Am J Trop Med Hyg 2015; 93:124-132. [PMID: 26259944 PMCID: PMC4574270 DOI: 10.4269/ajtmh.15-0003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/02/2015] [Accepted: 07/01/2015] [Indexed: 01/04/2023] Open
Abstract
Data generated during the course of research activities carried out by the International Centers of Excellence for Malaria Research (ICEMR) is heterogeneous, large, and multi-scaled. The complexity of federated and global data operations and the diverse uses planned for the data pose tremendous challenges and opportunities for collaborative research. In this article, we present the foundational principles for data management across the ICEMR Program, the logistics associated with multiple aspects of the data life cycle, and describe a pilot centralized web information system created in PlasmoDB to query a subset of this data. The paradigm proposed as a solution for the data operations in the ICEMR Program is widely applicable to large, multifaceted research projects, and could be reproduced in other contexts that require sophisticated data management.
Affiliation(s)
- Juan B. Gutierrez
- Institute of Bioinformatics and Department of Mathematics, University of Georgia, Athens, Georgia; Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania; Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania; Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania; The Center for Global Health and Diseases, Case Western Reserve University School of Medicine, Cleveland, Ohio; Department of Medicine, University of California, San Francisco, California; New York University Center for Genomics and Systems Biology, New York, New York
|
29
|
Woods NT, Jhuraney A, Monteiro ANA. Incorporating computational resources in a cancer research program. Hum Genet 2015; 134:467-78. [PMID: 25324189 PMCID: PMC4401625 DOI: 10.1007/s00439-014-1496-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/16/2014] [Accepted: 09/29/2014] [Indexed: 10/24/2022]
Abstract
Recent technological advances have transformed cancer genetics research. These advances have served as the basis for the generation of a number of richly annotated datasets relevant to the cancer geneticist. In addition, many of these technologies are now within reach of smaller laboratories to answer specific biological questions. Thus, one of the most pressing issues facing an experimental cancer biology research program in genetics is incorporating data from multiple sources to annotate, visualize, and analyze the system under study. Fortunately, there are several computational resources to aid in this process. However, a significant effort is required to adapt a molecular biology-based research program to take advantage of these datasets. Here, we discuss the lessons learned in our laboratory and share several recommendations to make this transition effective. This article is not meant to be a comprehensive evaluation of all the available resources, but rather highlight those that we have incorporated into our laboratory and how to choose the most appropriate ones for your research program.
Affiliation(s)
- Nicholas T Woods
- Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, 12902 Magnolia Drive, Tampa, FL, 33612, USA
|
30
|
|
31
|
Boulesteix AL. Ten simple rules for reducing overoptimistic reporting in methodological computational research. PLoS Comput Biol 2015; 11:e1004191. [PMID: 25905639 PMCID: PMC4407963 DOI: 10.1371/journal.pcbi.1004191] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022] Open
Affiliation(s)
- Anne-Laure Boulesteix
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilians University, Munich, Germany
|
32
|
Pernet C, Poline JB. Improving functional magnetic resonance imaging reproducibility. Gigascience 2015; 4:15. [PMID: 25830019 PMCID: PMC4379514 DOI: 10.1186/s13742-015-0055-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 12/28/2014] [Accepted: 03/15/2015] [Indexed: 11/11/2022] Open
Abstract
BACKGROUND: The ability to replicate an entire experiment is crucial to the scientific method. With the development of more and more complex paradigms, and the variety of analysis techniques available, fMRI studies are becoming harder to reproduce. RESULTS: In this article, we aim to provide practical advice to fMRI researchers not versed in computing, in order to make studies more reproducible. All of these steps require researchers to move towards a more open science, in which all aspects of the experimental method are documented and shared. CONCLUSION: Only by sharing experiments, data, metadata, derived data and analysis workflows will neuroimaging establish itself as a true data science.
Affiliation(s)
- Cyril Pernet
- Centre for Clinical Brain Sciences, Neuroimaging Sciences, University of Edinburgh Chancellor’s Building, 49 Little France Crescent, Edinburgh, EH16 4SB UK
- Jean-Baptiste Poline
- Henry H Wheeler, Jr Brain Imaging Center, Helen Wills Neuroscience Institute, University of California at Berkeley, 3210 Tolman Hall, Berkeley, CA 94720-1650 USA
|
33
|
Nelissen BGL, van Herwaarden JA, Moll FL, van Diest PJ, Pasterkamp G. SlideToolkit: an assistive toolset for the histological quantification of whole slide images. PLoS One 2014; 9:e110289. [PMID: 25372389 PMCID: PMC4220929 DOI: 10.1371/journal.pone.0110289] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 07/10/2014] [Accepted: 09/11/2014] [Indexed: 11/24/2022] Open
Abstract
The demand for accurate and reproducible phenotyping of a disease trait increases with the rising number of biobanks and genome wide association studies. Detailed analysis of histology is a powerful way of phenotyping human tissues. Nonetheless, purely visual assessment of histological slides is time-consuming and liable to sampling variation and optical illusions and thereby observer variation, and external validation may be cumbersome. Therefore, within our own biobank, computerized quantification of digitized histological slides is often preferred as a more precise and reproducible, and sometimes more sensitive approach. Relatively few free toolkits are, however, available for fully digitized microscopic slides, usually known as whole slides images. In order to comply with this need, we developed the slideToolkit as a fast method to handle large quantities of low contrast whole slides images using advanced cell detecting algorithms. The slideToolkit has been developed for modern personal computers and high-performance clusters (HPCs) and is available as an open-source project on github.com. We here illustrate the power of slideToolkit by a repeated measurement of 303 digital slides containing CD3 stained (DAB) abdominal aortic aneurysm tissue from a tissue biobank. Our workflow consists of four consecutive steps. In the first step (acquisition), whole slide images are collected and converted to TIFF files. In the second step (preparation), files are organized. The third step (tiles), creates multiple manageable tiles to count. In the fourth step (analysis), tissue is analyzed and results are stored in a data set. Using this method, two consecutive measurements of 303 slides showed an intraclass correlation of 0.99. In conclusion, slideToolkit provides a free, powerful and versatile collection of tools for automated feature analysis of whole slide images to create reproducible and meaningful phenotypic data sets.
Affiliation(s)
- Bastiaan G. L. Nelissen
- Department of Vascular Surgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Frans L. Moll
- Department of Vascular Surgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Paul J. van Diest
- Department of Pathology, University Medical Center Utrecht, Utrecht, The Netherlands
- Gerard Pasterkamp
- Laboratory of Experimental Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
|
34
|
Polyglot programming in applications used for genetic data analysis. Biomed Res Int 2014; 2014:253013. [PMID: 25197633 PMCID: PMC4150456 DOI: 10.1155/2014/253013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/30/2014] [Revised: 07/22/2014] [Accepted: 07/31/2014] [Indexed: 11/29/2022]
Abstract
Applications used for the analysis of genetic data process large volumes of data with complex algorithms. High performance, flexibility, and a user interface with a web browser are required by these solutions, which can be achieved by using multiple programming languages. In this study, I developed a freely available framework for building software to analyze genetic data, which uses C++, Python, JavaScript, and several libraries. This system was used to build a number of genetic data processing applications and it reduced the time and costs of development.
|