1. van Kampen AHC, Mahamune U, Jongejan A, van Schaik BDC, Balashova D, Lashgari D, Pras-Raves M, Wever EJM, Dane AD, García-Valiente R, Moerland PD. ENCORE: a practical implementation to improve reproducibility and transparency of computational research. Nat Commun 2024; 15:8117. [PMID: 39284801 PMCID: PMC11405857 DOI: 10.1038/s41467-024-52446-8] [Received: 02/19/2024] [Accepted: 09/06/2024] [Indexed: 09/20/2024]
Abstract
Reproducibility of computational research is often challenging despite established guidelines and best practices. Translating these guidelines into practical applications remains difficult. Here, we present ENCORE, an approach to enhance transparency and reproducibility by guiding researchers in how to structure and document a computational project. ENCORE builds on previous efforts in computational reproducibility and integrates all project components into a standardized file system structure. It utilizes pre-defined files as documentation templates, leverages GitHub for software versioning, and includes an HTML-based navigator. ENCORE is designed to be agnostic to the type of computational project, data, programming language, and ICT infrastructure, and does not rely on specific software tools. We also share our group's experience using ENCORE, highlighting that the most significant challenge to the routine adoption of approaches like ours is the lack of incentives to motivate researchers to dedicate sufficient time and effort to ensure reproducibility.
Affiliation(s)
- Antoine H C van Kampen
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, Amsterdam, Netherlands
- Utkarsh Mahamune
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Aldo Jongejan
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
- Barbera D C van Schaik
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Daria Balashova
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Danial Lashgari
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Mia Pras-Raves
  - Amsterdam UMC, University of Amsterdam, Department of Clinical Chemistry, Laboratory Genetic Metabolic Diseases, Meibergdreef 9, Amsterdam, Netherlands
  - Core Facility Metabolomics, Amsterdam UMC, Amsterdam, Netherlands
- Eric J M Wever
  - Amsterdam UMC, University of Amsterdam, Department of Clinical Chemistry, Laboratory Genetic Metabolic Diseases, Meibergdreef 9, Amsterdam, Netherlands
  - Core Facility Metabolomics, Amsterdam UMC, Amsterdam, Netherlands
- Adrie D Dane
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Core Facility Metabolomics, Amsterdam UMC, Amsterdam, Netherlands
- Rodrigo García-Valiente
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Perry D Moerland
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands

2. Gallagher K, Creswell R, Lambert B, Robinson M, Lok Lei C, Mirams GR, Gavaghan DJ. Ten simple rules for training scientists to make better software. PLoS Comput Biol 2024; 20:e1012410. [PMID: 39264985 PMCID: PMC11392269 DOI: 10.1371/journal.pcbi.1012410] [Indexed: 09/14/2024]
Affiliation(s)
- Kit Gallagher
  - Doctoral Training Centre, University of Oxford, Oxford, United Kingdom
- Richard Creswell
  - Department of Computer Science, University of Oxford, Oxford, United Kingdom
- Ben Lambert
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Martin Robinson
  - Department of Computer Science, University of Oxford, Oxford, United Kingdom
- Chon Lok Lei
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Macau, China
- Gary R Mirams
  - Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, Nottingham, United Kingdom
- David J Gavaghan
  - Doctoral Training Centre, University of Oxford, Oxford, United Kingdom

3. Poličar PG, Špendl M, Curk T, Zupan B. Teaching bioinformatics through the analysis of SARS-CoV-2: project-based training for computer science students. Bioinformatics 2024; 40:i20-i29. [PMID: 38940150 PMCID: PMC11211835 DOI: 10.1093/bioinformatics/btae208] [Indexed: 06/29/2024]
Abstract
Motivation: We learn more effectively through experience and reflection than through passive reception of information. Bioinformatics offers an excellent opportunity for project-based learning. Molecular data are abundant and accessible in open repositories, and important concepts in biology can be rediscovered by reanalyzing the data.
Results: In the manuscript, we report on five hands-on assignments we designed for master's computer science students to train them in bioinformatics for genomics. These assignments are the cornerstones of our introductory bioinformatics course and are centered around the study of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). They assume no prior knowledge of molecular biology but do require programming skills. Through these assignments, students learn about genomes and genes, discover their composition and function, relate SARS-CoV-2 to other viruses, and learn about the body's response to infection. Student evaluation of the assignments confirms their usefulness and value, their appropriate mastery-level difficulty, and their interesting and motivating storyline.
Availability and implementation: The course materials are freely available on GitHub at https://github.com/IB-ULFRI.
Affiliation(s)
- Pavlin G Poličar
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia
- Martin Špendl
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia
- Tomaž Curk
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia
- Blaž Zupan
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia
  - Department of Education, Innovation and Technology, Baylor College of Medicine, 1 Baylor Plz, Houston, TX 77030, United States

4. Cain JY, Yu JS, Bagheri N. The in silico lab: Improving academic code using lessons from biology. Cell Syst 2023; 14:1-6. [PMID: 36657389 DOI: 10.1016/j.cels.2022.11.006] [Received: 05/13/2022] [Revised: 10/27/2022] [Accepted: 11/22/2022] [Indexed: 01/19/2023]
Abstract
"Good code" is often regarded as a nebulous, impractical ideal. Common best practices toward improving code quality can be inaccessible to those without a rigorous computer science or software engineering background, contributing to a gap between advancing scientific research and FAIR practices. We seek to equip researchers with the necessary background and context to tackle the challenge of improving code quality in computational biology research using analogies from biology to synthesize why certain best practices are critical for advancing computational research. Improving code quality requires active stewardship; we encourage researchers to deliberately adopt and share practices that ensure reusability, repeatability, and reproducibility.
Affiliation(s)
- Jason Y Cain
  - Department of Chemical Engineering, University of Washington, Seattle, WA 98195, USA
- Jessica S Yu
  - Department of Biology, University of Washington, Seattle, WA 98195, USA
- Neda Bagheri
  - Department of Chemical Engineering, University of Washington, Seattle, WA 98195, USA
  - Department of Biology, University of Washington, Seattle, WA 98195, USA

5. Ten simple rules for teaching yourself R. PLoS Comput Biol 2022; 18:e1010372. [PMID: 36048770 PMCID: PMC9436135 DOI: 10.1371/journal.pcbi.1010372] [Indexed: 11/22/2022]

6.

7. Zitomer RA, Karr J, Kerstens M, Perry L, Ruth K, Adrean L, Austin S, Cornelius J, Dachenhaus J, Dinkins J, Harrington A, Kim H, Owens T, Revekant C, Schroeder V, Sink C, Valente JJ, Woodis E, Rivers JW. Ten simple rules for getting started with statistics in graduate school. PLoS Comput Biol 2022; 18:e1010033. [PMID: 35446846 PMCID: PMC9022819 DOI: 10.1371/journal.pcbi.1010033] [Indexed: 11/19/2022]
Affiliation(s)
- Rachel A. Zitomer
  - Department of Forest Ecosystems and Society, Oregon State University, Corvallis, Oregon, United States of America
- Jessica Karr
  - Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America
- Mark Kerstens
  - Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
- Lindsey Perry
  - Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Kayla Ruth
  - Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Lindsay Adrean
  - Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
- Suzanne Austin
  - Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America
- Jamie Cornelius
  - Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America
- Jonathan Dachenhaus
  - Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
- Jonathan Dinkins
  - Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Alan Harrington
  - Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Hankyu Kim
  - Department of Forest Ecosystems and Society, Oregon State University, Corvallis, Oregon, United States of America
- Terrah Owens
  - Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Claire Revekant
  - Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Vanessa Schroeder
  - Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Chelsea Sink
  - Department of Fisheries, Wildlife, and Conservation Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Jonathon J. Valente
  - Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
  - Smithsonian Conservation Biology Institute, Migratory Bird Center, National Zoological Park, Washington, DC, United States of America
- Ethan Woodis
  - Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
- James W. Rivers
  - Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America

8. Gygli G. On the reproducibility of enzyme reactions and kinetic modelling. Biol Chem 2022; 403:717-730. [PMID: 35357794 DOI: 10.1515/hsz-2021-0393] [Received: 10/15/2021] [Accepted: 03/09/2022] [Indexed: 12/20/2022]
Abstract
Enzyme reactions are highly dependent on reaction conditions. To ensure reproducibility of enzyme reaction parameters, experiments need to be carefully designed and kinetic modeling meticulously executed. Furthermore, to enable quality control of enzyme reaction parameters, the experimental conditions, the modeling process as well as the raw data need to be reported comprehensively. By taking these steps, enzyme reaction parameters can be open and FAIR (findable, accessible, interoperable, re-usable) as well as repeatable, replicable and reproducible. This review discusses these requirements and provides a practical guide to designing initial rate experiments for the determination of enzyme reaction parameters and gives an open, FAIR and re-editable example of the kinetic modeling of an enzyme reaction. Both the guide and example are scripted with Python in Jupyter Notebooks and are publicly available (https://fairdomhub.org/investigations/483/snapshots/1). Finally, the prerequisites of automated data analysis and machine learning algorithms are briefly discussed to provide further motivation for the comprehensive, open and FAIR reporting of enzyme reaction parameters.
Affiliation(s)
- Gudrun Gygli
  - Institute for Biological Interfaces (IBG 1), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany

9. Way GP, Greene CS, Carninci P, Carvalho BS, de Hoon M, Finley SD, Gosline SJC, Lê Cao KA, Lee JSH, Marchionni L, Robine N, Sindi SS, Theis FJ, Yang JYH, Carpenter AE, Fertig EJ. A field guide to cultivating computational biology. PLoS Biol 2021; 19:e3001419. [PMID: 34618807 PMCID: PMC8525744 DOI: 10.1371/journal.pbio.3001419] [Revised: 10/19/2021] [Indexed: 11/18/2022]
Abstract
Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.
Affiliation(s)
- Gregory P. Way
  - Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Casey S. Greene
  - Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Piero Carninci
  - RIKEN Center for Integrative Medical Sciences Yokohama, Kanagawa, Japan
  - Human Technopole, Milan, Italy
- Benilton S. Carvalho
  - Department of Statistics, Institute of Mathematics, Statistics and Scientific Computing, University of Campinas, Campinas, Brazil
- Michiel de Hoon
  - RIKEN Center for Integrative Medical Sciences Yokohama, Kanagawa, Japan
- Stacey D. Finley
  - Department of Biomedical Engineering, Quantitative and Computational Biology, and Chemical Engineering & Materials Science, University of Southern California, Los Angeles, California, United States of America
- Sara J. C. Gosline
  - Pacific Northwest National Laboratory, Seattle, Washington, United States of America
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
- Jerry S. H. Lee
  - Ellison Institute and Departments of Medicine/Oncology, Chemical Engineering, and Material Sciences, University of Southern California, Los Angeles, California, United States of America
- Luigi Marchionni
  - Department of Pathology and Laboratory Medicine, Weill-Cornell Medicine, New York, New York, United States of America
- Nicolas Robine
  - Computational Biology Lab, New York Genome Center, New York, New York, United States of America
- Suzanne S. Sindi
  - Department of Applied Mathematics, University of California Merced, Merced, California, United States of America
- Fabian J. Theis
  - Institute of Computational Biology, Helmholtz Center Munich and Department of Mathematics, Technical University of Munich, Munich, Germany
- Jean Y. H. Yang
  - Charles Perkins Centre and School of Mathematics and Statistics, The University of Sydney, Australia
- Anne E. Carpenter
  - Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Elana J. Fertig
  - Convergence Institute, Departments of Oncology, Biomedical Engineering, and Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, United States of America

10. Allbee Q, Barber R. Writing python programs to map alleles related to genetic disease. Biochem Mol Biol Educ 2021; 49:677-678. [PMID: 33991167 DOI: 10.1002/bmb.21528] [Received: 12/06/2020] [Accepted: 05/06/2021] [Indexed: 06/12/2023]
Abstract
Biology is a data-driven discipline facilitated greatly by computer programming skills. This article describes an introductory experiential programming activity that can be integrated into distance learning environments. Students are asked to develop their own Python programs to identify the nature of alleles linked to disease. This activity effectively engages students in a problem solving exercise that provides an opportunity for application of basic programming skills as well as understanding eukaryotic gene structure. We provide sets of mapped alleles for two well-known genes, CFTR and HFE, as well as a suite of relevant Python programs to achieve these outcomes or allow subsequent exercise modifications.
Affiliation(s)
- Quinn Allbee
  - University of Wisconsin-Parkside, Kenosha, Wisconsin, USA
- Robert Barber
  - University of Wisconsin-Parkside, Kenosha, Wisconsin, USA

11. Zeng H, Zhang J, Preising GA, Rubel T, Singh P, Ritz A. Graphery: interactive tutorials for biological network algorithms. Nucleic Acids Res 2021; 49:W257-W262. [PMID: 34037782 PMCID: PMC8262715 DOI: 10.1093/nar/gkab420] [Received: 03/05/2021] [Revised: 04/19/2021] [Accepted: 05/03/2021] [Indexed: 11/14/2022]
Abstract
Networks have been an excellent framework for modeling complex biological information, but the methodological details of network-based tools are often described for a technical audience. We have developed Graphery, an interactive tutorial webserver that illustrates foundational graph concepts frequently used in network-based methods. Each tutorial describes a graph concept along with executable Python code that can be interactively run on a graph. Users navigate each tutorial using their choice of real-world biological networks that highlight the diverse applications of network algorithms. Graphery also allows users to modify the code within each tutorial or write new programs, which all can be executed without requiring an account. Graphery accepts ideas for new tutorials and datasets that will be shaped by both computational and biological researchers, growing into a community-contributed learning platform. Graphery is available at https://graphery.reedcompbio.org/.
Affiliation(s)
- Heyuan Zeng
  - Computer Science Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Jinbiao Zhang
  - Information and Communication Technology Department, Xiamen University Malaysia, Jalan Sunsuria, Bandar Sunsuria, 43900 Sepang, Selangor Darul Ehsan, Malaysia
- Gabriel A Preising
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Tobias Rubel
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Pramesh Singh
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Anna Ritz
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA

12. Schweizer RM, Saarman N, Ramstad KM, Forester BR, Kelley JL, Hand BK, Malison RL, Ackiss AS, Watsa M, Nelson TC, Beja-Pereira A, Waples RS, Funk WC, Luikart G. Big Data in Conservation Genomics: Boosting Skills, Hedging Bets, and Staying Current in the Field. J Hered 2021; 112:313-327. [PMID: 33860294 DOI: 10.1093/jhered/esab019] [Received: 10/16/2020] [Accepted: 04/13/2021] [Indexed: 02/07/2023]
Abstract
A current challenge in the fields of evolutionary, ecological, and conservation genomics is balancing production of large-scale datasets with additional training often required to handle such datasets. Thus, there is an increasing need for conservation geneticists to continually learn and train to stay up-to-date through avenues such as symposia, meetings, and workshops. The ConGen meeting is a near-annual workshop that strives to guide participants in understanding population genetics principles, study design, data processing, analysis, interpretation, and applications to real-world conservation issues. Each year of ConGen gathers a diverse set of instructors, students, and resulting lectures, hands-on sessions, and discussions. Here, we summarize key lessons learned from the 2019 meeting and more recent updates to the field with a focus on big data in conservation genomics. First, we highlight classical and contemporary issues in study design that are especially relevant to working with big datasets, including the intricacies of data filtering. We next emphasize the importance of building analytical skills and simulating data, and how these skills have applications within and outside of conservation genetics careers. We also highlight recent technological advances and novel applications to conservation of wild populations. Finally, we provide data and recommendations to support ongoing efforts by ConGen organizers and instructors (and beyond) to increase participation of underrepresented minorities in conservation and eco-evolutionary sciences. The future success of conservation genetics requires both continual training in handling big data and a diverse group of people and approaches to tackle key issues, including the global biodiversity-loss crisis.
Affiliation(s)
- Rena M Schweizer
  - Division of Biological Sciences, University of Montana, Missoula, MT
- Norah Saarman
  - Department of Biology, Utah State University, Logan, UT
- Kristina M Ramstad
  - Department of Biology and Geology, University of South Carolina Aiken, Aiken, SC
- Joanna L Kelley
  - School of Biological Sciences, Washington State University, Pullman, WA
- Brian K Hand
  - Division of Biological Sciences, University of Montana, Missoula, MT
  - Flathead Lake Biological Station, University of Montana, Polson, MT
- Rachel L Malison
  - Flathead Lake Biological Station, University of Montana, Polson, MT
- Amanda S Ackiss
  - Wisconsin Cooperative Fishery Research Unit, University of Wisconsin Stevens Point, Stevens Point, WI
- Albano Beja-Pereira
  - Centro de Investigação em Biodiversidade e Recursos Genéticos (CIBIO-UP), InBIO, Universidade do Porto, Vairão, Portugal
  - DGAOT, Faculty of Sciences, University of Porto, Porto, Portugal
  - Sustainable Agrifood Production Research Centre (GreenUPorto), Faculty of Sciences, University of Porto, Porto, Portugal
- Robin S Waples
  - Northwest Fisheries Science Center, NOAA Fisheries, Seattle, WA
- W Chris Funk
  - Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO
- Gordon Luikart
  - Division of Biological Sciences, University of Montana, Missoula, MT
  - Flathead Lake Biological Station, University of Montana, Polson, MT

13. Balaban G, Grytten I, Rand KD, Scheffer L, Sandve GK. Ten simple rules for quick and dirty scientific programming. PLoS Comput Biol 2021; 17:e1008549. [PMID: 33705383 PMCID: PMC7951887 DOI: 10.1371/journal.pcbi.1008549] [Indexed: 11/19/2022]
Affiliation(s)
- Gabriel Balaban
  - Biomedical Informatics Group, Department of Informatics, University of Oslo, Oslo, Norway
  - PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- Ivar Grytten
  - Biomedical Informatics Group, Department of Informatics, University of Oslo, Oslo, Norway
- Knut Dagestad Rand
  - Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway
- Lonneke Scheffer
  - Biomedical Informatics Group, Department of Informatics, University of Oslo, Oslo, Norway
- Geir Kjetil Sandve
  - Biomedical Informatics Group, Department of Informatics, University of Oslo, Oslo, Norway
  - PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway

14. Werner J, Jeske D. Ten simple rules for running and managing virtual internships. PLoS Comput Biol 2021; 17:e1008599. [PMID: 33600416 PMCID: PMC7891720 DOI: 10.1371/journal.pcbi.1008599] [Indexed: 11/20/2022]
Affiliation(s)
- Johannes Werner
  - Department of Biological Oceanography, Leibniz Institute of Baltic Sea Research, Rostock-Warnemünde, Germany
  - High Performance and Cloud Computing Group, Zentrum für Datenverarbeitung (ZDV), Eberhard Karls University of Tübingen, Tübingen, Germany
- Debora Jeske
  - School of Applied Psychology, University College Cork, Cork, Republic of Ireland

15. Brandies PA, Hogg CJ. Ten simple rules for getting started with command-line bioinformatics. PLoS Comput Biol 2021; 17:e1008645. [PMID: 33600404 PMCID: PMC7891784 DOI: 10.1371/journal.pcbi.1008645] [Indexed: 12/25/2022]
Affiliation(s)
- Parice A. Brandies
  - School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
- Carolyn J. Hogg
  - School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia

16. Ten simple rules for navigating the computational aspect of an interdisciplinary PhD. PLoS Comput Biol 2021; 17:e1008554. [PMID: 33600411 PMCID: PMC7891742 DOI: 10.1371/journal.pcbi.1008554] [Indexed: 11/19/2022]

17. Bodner K, Brimacombe C, Chenery ES, Greiner A, McLeod AM, Penk SR, Vargas Soto JS. Ten simple rules for tackling your first mathematical models: A guide for graduate students by graduate students. PLoS Comput Biol 2021; 17:e1008539. [PMID: 33444343 PMCID: PMC7808623 DOI: 10.1371/journal.pcbi.1008539] [Indexed: 11/18/2022]
Affiliation(s)
- Korryn Bodner
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada
- Chris Brimacombe
- Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada
- Emily S. Chenery
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Ariel Greiner
- Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada
- Anne M. McLeod
- Department of Biology, Memorial University of Newfoundland, St John’s, Newfoundland, Canada
- Stephanie R. Penk
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada
- Juan S. Vargas Soto
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Department of Ecology and Evolution, University of Toronto, Toronto, Ontario, Canada
18
Mooers BHM, Brown ME. Templates for writing PyMOL scripts. Protein Sci 2021; 30:262-269. [PMID: 33179363 PMCID: PMC7737772 DOI: 10.1002/pro.3997] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/30/2020] [Revised: 11/06/2020] [Accepted: 11/10/2020] [Indexed: 11/10/2022]
Abstract
PyMOL commands are used to exert exquisite control over the appearance of a molecular model. This control has made PyMOL popular for making images of protein structures for publications and presentations. However, many users have poor recall of the commands due to infrequent use of PyMOL. This poor recall hinders the writing of new code in scripts. One solution is to build the new script by using code fragments as templates for modular parts of the task at hand. The code fragments can be accessed from a library while writing the code from inside a text editor (e.g., Visual Studio Code, Vim, and Emacs). We developed a library of PyMOL code templates or snippets called pymolsnips to ease the writing of PyMOL code in scripts. We made pymolsnips available on GitHub in formats for 18 popular text editors. Most of the supported text editors are available for Mac, Windows, and Linux operating systems. The GitHub site includes animations that complement the instructions for installing the library for each text editor. We expect that the library will help many PyMOL users to be more productive when writing PyMOL script files.
Affiliation(s)
- Blaine H. M. Mooers
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, USA
- Stephenson Cancer Center, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, USA
- Laboratory of Biomolecular Structure and Function, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, USA
- Marina E. Brown
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, USA
19
Mura C, Chalupa M, Newbury AM, Chalupa J, Bourne PE. Ten simple rules for starting research in your late teens. PLoS Comput Biol 2020; 16:e1008403. [PMID: 33211694 PMCID: PMC7676678 DOI: 10.1371/journal.pcbi.1008403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Affiliation(s)
- Cameron Mura
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
- Mike Chalupa
- City Neighbors Foundation, Baltimore, Maryland, United States of America
- Abigail M. Newbury
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- Jack Chalupa
- City Neighbors Foundation, Baltimore, Maryland, United States of America
- Philip E. Bourne
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
20
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
- School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
- Tomer Ventura
- Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
- J. Sook Chung
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
- Woo-Jin Kim
- Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
- Bo-Hye Nam
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Hee Jeong Kong
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Young-Ok Kim
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul, Korea
- Seong-il Eyun
- Department of Life Science, Chung-Ang University, Seoul, Korea
21
Mitchell K, Ronas J, Dao C, Freise AC, Mangul S, Shapiro C, Moberg Parker J. PUMAA: A Platform for Accessible Microbiome Analysis in the Undergraduate Classroom. Front Microbiol 2020; 11:584699. [PMID: 33123113 PMCID: PMC7573227 DOI: 10.3389/fmicb.2020.584699] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/17/2020] [Accepted: 09/14/2020] [Indexed: 12/22/2022]
Abstract
Improvements in high-throughput sequencing make targeted amplicon analysis an ideal method for the study of human and environmental microbiomes by undergraduates. Multiple bioinformatics programs are available to process and interpret raw microbial diversity datasets, and the choice of programs to use in curricula is largely determined by student learning goals. Many of the most commonly used microbiome bioinformatics platforms offer end-to-end data processing and data analysis using a command line interface (CLI), but the downside for novice microbiome researchers is the steep learning curve often required. Alternatively, some sequencing providers include processing of raw data and taxonomy assignments as part of their pipelines. This, when coupled with available web-based or graphical user interface (GUI) analysis and visualization tools, eliminates the need for students or instructors to have extensive CLI experience. However, lack of universal data formats can make integration of these tools challenging. For example, tools for upstream and downstream analyses frequently use multiple different data formats which then require writing custom scripts or hours of manual work to make the files compatible. Here, we describe a microbial ecology bioinformatics curriculum that focuses on data analysis, visualization, and statistical reasoning by taking advantage of existing web-based and GUI tools. We created the Program for Unifying Microbiome Analysis Applications (PUMAA), which solves the problem of inconsistent files by formatting the output files from several raw data processing programs to seamlessly transition to a suite of GUI programs for analysis and visualization of microbiome taxonomic and inferred functional profiles. Additionally, we created a series of tutorials to accompany each of the microbiome analysis curricular modules. From pre- and post-course surveys, students in this curriculum self-reported conceptual and confidence gains in bioinformatics and data analysis skills. Students also demonstrated gains in biologically relevant statistical reasoning based on rubric-guided evaluations of open-ended survey questions and the Statistical Reasoning in Biology Concept Inventory. The PUMAA program and associated analysis tutorials enable students and researchers with no computational experience to effectively analyze real microbiome datasets to investigate real-world research questions.
Affiliation(s)
- Keith Mitchell
- Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, United States
- Jiem Ronas
- Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, United States
- Christopher Dao
- Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, United States
- Amanda C Freise
- Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, United States
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, United States
- Casey Shapiro
- Center for Educational Assessment, Center for the Advancement of Teaching, University of California, Los Angeles, Los Angeles, CA, United States
- Jordan Moberg Parker
- Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, United States
22
Duruflé H, Selmani M, Ranocha P, Jamet E, Dunand C, Déjean S. A powerful framework for an integrative study with heterogeneous omics data: from univariate statistics to multi-block analysis. Brief Bioinform 2020; 22:5890507. [PMID: 32778869 DOI: 10.1093/bib/bbaa166] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/06/2020] [Revised: 06/23/2020] [Accepted: 07/03/2020] [Indexed: 01/25/2023]
Abstract
High-throughput data generated by new biotechnologies require specific and adapted statistical treatment in order to be efficiently used in biological studies. In this article, we propose a powerful framework to manage and analyse multi-omics heterogeneous data to carry out an integrative analysis. We have illustrated this using the mixOmics package for R software as it specifically addresses data integration issues. Our work also aims at applying the most recent functionalities of mixOmics to real datasets. Although multi-block integrative methodologies exist, we hope to encourage a more widespread use of such approaches in an operational framework by biologists. We have used natural populations of the model plant Arabidopsis thaliana in this work, but the framework proposed is not limited to this plant and can be deployed whatever the organisms of interest and the biological question may be. Four omics datasets (phenomics, metabolomics, cell wall proteomics and transcriptomics) were collected, analysed and integrated to study the cell wall plasticity of plants exposed to sub-optimal temperature growth conditions. The methodologies presented here start from basic univariate statistics leading to multi-block integration analysis. We have also highlighted the fact that each method, either unsupervised or supervised, is associated with one biological issue. Using this powerful framework enabled us to arrive at novel conclusions on the biological system, which would not have been possible using standard statistical approaches.
Affiliation(s)
- Merwann Selmani
- Laboratoire de Recherche en Sciences Végétales and the Institut de Mathématiques de Toulouse
23
Hagan AK, Lesniak NA, Balunas MJ, Bishop L, Close WL, Doherty MD, Elmore AG, Flynn KJ, Hannigan GD, Koumpouras CC, Jenior ML, Kozik AJ, McBride K, Rifkin SB, Stough JMA, Sovacool KL, Sze MA, Tomkovich S, Topcuoglu BD, Schloss PD. Ten simple rules to increase computational skills among biologists with Code Clubs. PLoS Comput Biol 2020; 16:e1008119. [PMID: 32853198 PMCID: PMC7451508 DOI: 10.1371/journal.pcbi.1008119] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Affiliation(s)
- Ada K. Hagan
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Nicholas A. Lesniak
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Marcy J. Balunas
- Division of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Lucas Bishop
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- William L. Close
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Matthew D. Doherty
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Amanda G. Elmore
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Kaitlin J. Flynn
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Geoffrey D. Hannigan
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Charlie C. Koumpouras
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Matthew L. Jenior
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Ariangela J. Kozik
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Kathryn McBride
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Samara B. Rifkin
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Joshua M. A. Stough
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Kelly L. Sovacool
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Marc A. Sze
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Sarah Tomkovich
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Begum D. Topcuoglu
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Patrick D. Schloss
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
24
Affiliation(s)
- Hoe-Han Goh
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, UKM Bangi, Selangor, Malaysia
- Philip E. Bourne
- School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
25
Archmiller AA, Johnson AD, Nolan J, Edwards M, Elliott LH, Ferguson JM, Iannarilli F, Vélez J, Vitense K, Johnson DH, Fieberg J. Computational Reproducibility in The Wildlife Society's Flagship Journals. J Wildl Manage 2020. [DOI: 10.1002/jwmg.21855] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/17/2022]
Affiliation(s)
- Andrew D. Johnson
- Biology Department, Concordia College, 901 8th St S, Moorhead, MN 56562, USA
- Jane Nolan
- Concordia College, 901 8th St S, Moorhead, MN 56562, USA
- Margaret Edwards
- Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, 2003 Upper Buford Circle, Suite 135, Saint Paul, MN 55108, USA
- Lisa H. Elliott
- Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, 2003 Upper Buford Circle, Suite 135, Saint Paul, MN 55108, USA
- Jake M. Ferguson
- Department of Biology, University of Hawaiʻi at Mānoa, 2538 McCarthy Mall, Honolulu, HI 96822, USA
- Fabiola Iannarilli
- Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, 2003 Upper Buford Circle, Suite 135, Saint Paul, MN 55108, USA
- Juliana Vélez
- Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, 2003 Upper Buford Circle, Suite 135, Saint Paul, MN 55108, USA
- Kelsey Vitense
- Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, 2003 Upper Buford Circle, Suite 135, Saint Paul, MN 55108, USA
- Douglas H. Johnson
- Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, 2003 Upper Buford Circle, Suite 135, Saint Paul, MN 55108, USA
- John Fieberg
- Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, 2003 Upper Buford Circle, Suite 135, Saint Paul, MN 55108, USA
26
Jo J, Oh J, Park C. Microbial community analysis using high-throughput sequencing technology: a beginner's guide for microbiologists. J Microbiol 2020; 58:176-192. [PMID: 32108314 DOI: 10.1007/s12275-020-9525-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/05/2019] [Revised: 12/11/2019] [Accepted: 12/16/2019] [Indexed: 12/19/2022]
Abstract
Microbial communities present in diverse environments from deep seas to human body niches play significant roles in the complex ecosystem and human health. Characterizing their structural and functional diversities is indispensable, and many approaches, such as microscopic observation, DNA fingerprinting, and PCR-based marker gene analysis, have been successfully applied to identify microorganisms. Since the revolutionary improvement of DNA sequencing technologies, direct and high-throughput analysis of genomic DNA from a whole environmental community without prior cultivation has become the mainstream approach, overcoming the constraints of the classical approaches. Here, we first briefly review the history of environmental DNA analysis applications with a focus on profiling the taxonomic composition and functional potentials of microbial communities. To this end, we aim to introduce the shotgun metagenomic sequencing (SMS) approach, which is used for the untargeted ("shotgun") sequencing of all ("meta") microbial genomes ("genomic") present in a sample. SMS data analyses are performed in silico using various software programs; however, in silico analysis is typically regarded as a burden on wet-lab experimental microbiologists. Therefore, in this review, we present microbiologists who are unfamiliar with in silico analyses with a basic and practical SMS data analysis protocol. This protocol covers all the bioinformatics processes of the SMS analysis in terms of data preprocessing, taxonomic profiling, functional annotation, and visualization.
Affiliation(s)
- Jihoon Jo
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Jooseong Oh
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Chungoo Park
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
27
Abstract
More than 10 years ago, we published the paper describing the mothur software package in Applied and Environmental Microbiology. Our goal was to create a comprehensive package that allowed users to analyze amplicon sequence data using the most robust methods available. mothur has helped lead the community through the ongoing sequencing revolution and continues to provide this service to the microbial ecology community. Beyond its success and impact on the field, mothur's development exposed a series of observations that are generally translatable across science. Perhaps the observation that stands out the most is that all science is done in the context of prevailing ideas and available technologies. Although it is easy to criticize choices that were made 10 years ago through a modern lens, if we were to wait for all of the possible limitations to be solved before proceeding, science would stall. Even preceding the development of mothur, it was necessary to address the most important problems and work backwards to other problems that limited access to robust sequence analysis tools. At the same time, we strive to expand mothur's capabilities in a data-driven manner to incorporate new ideas and accommodate changes in data and desires of the research community. It has been edifying to see the benefit that a simple set of tools can bring to so many other researchers.
Affiliation(s)
- Patrick D Schloss
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
28
Cereceda O, Quinn DE. A graduate student perspective on overcoming barriers to interacting with open-source software. Facets (Ott) 2020. [DOI: 10.1139/facets-2019-0020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Computational methods, coding, and software are important tools for conducting research. In both academic and industry data analytics, open-source software (OSS) has gained massive popularity. Collaborative source code allows students to interact with researchers, code developers, and users from a variety of disciplines. Based on the authors’ experiences as graduate students and coding instructors, this paper provides a unique overview of the obstacles that graduate students face in obtaining the knowledge and skills required to complete their research and in transitioning from an OSS user to a contributor: psychological, practical, and cultural barriers and challenges specific to graduate students including cognitive load in graduate school, the importance of a knowledgeable mentor, seeking help from both the online and local communities, and the ongoing campaign to recognize software as research output in career and degree progression. Specific and practical steps are recommended to provide a foundation for graduate students, supervisors, administrators, and members of the OSS community to help overcome these obstacles. In conclusion, the objective of these recommendations is to describe a possible framework that individuals from across the scientific community can adapt to their needs and facilitate a sustainable feedback loop between graduate students and OSS.
Affiliation(s)
- Oihane Cereceda
- Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Danielle E.A. Quinn
- Faculty of Science, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
29
Affiliation(s)
- Vincent Miele
- Université de Lyon, F-69000 Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Catherine Matias
- Laboratoire de Probabilités, Statistique et Modélisation, Centre National de la Recherche Scientifique, Sorbonne Université et Université de Paris, Paris, France
- Stéphane Robin
- UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
- Stéphane Dray
- Université de Lyon, F-69000 Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
30
Georgeson P, Syme A, Sloggett C, Chung J, Dashnow H, Milton M, Lonsdale A, Powell D, Seemann T, Pope B. Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software. Gigascience 2019; 8:giz109. [PMID: 31544213 PMCID: PMC6755254 DOI: 10.1093/gigascience/giz109] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/29/2019] [Revised: 07/16/2019] [Accepted: 08/13/2019] [Indexed: 11/14/2022]
Abstract
BACKGROUND Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results. FINDINGS We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization. CONCLUSIONS Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.
Affiliation(s)
- Peter Georgeson
- Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, Victoria, Australia 3000
- Anna Syme
- Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, Victoria, Australia 3004
- Clare Sloggett
- Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Jessica Chung
- Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Harriet Dashnow
- Bioinformatics, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, Australia 3052
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia 3052
- Michael Milton
- Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Melbourne Genomics Health Alliance, Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria, Australia 3052
- Andrew Lonsdale
- Bioinformatics, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, Australia 3052
- ARC Centre of Excellence in Plant Cell Walls, School of BioSciences, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia 3052
- David Powell
- Monash Bioinformatics Platform, Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, 15 Innovation Walk, Monash University, Clayton, Victoria, Australia 3800
- Torsten Seemann
- Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Department of Microbiology and Immunology, Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, Australia 3000
- Bernard Pope
- Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, Victoria, Australia 3000
- Department of Medicine, Central Clinical School, Monash University, Clayton, Victoria, Australia 3800
31
Coelho LP, Alves R, Monteiro P, Huerta-Cepas J, Freitas AT, Bork P. NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. Microbiome 2019; 7:84. [PMID: 31159881 PMCID: PMC6547473 DOI: 10.1186/s40168-019-0684-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/20/2018] [Accepted: 04/10/2019] [Indexed: 05/10/2023]
Abstract
BACKGROUND: Shotgun metagenomes contain a sample of all the genomic material in an environment, allowing for the characterization of a microbial community. In order to understand these communities, bioinformatics methods are crucial. A common first step in processing metagenomes is to compute abundance estimates of different taxonomic or functional groups from the raw sequencing data. Given the breadth of the field, computational solutions need to be flexible and extensible, enabling the combination of different tools into a larger pipeline.
RESULTS: We present NGLess and NG-meta-profiler. NGLess is a domain-specific language for describing next-generation sequence processing pipelines. It was developed with the goal of enabling user-friendly computational reproducibility. It provides built-in support for many common operations on sequencing data and can be extended with external tools through configuration files. Using this framework, we developed NG-meta-profiler, a fast profiler for metagenomes that performs sequence preprocessing, mapping to bundled databases, filtering of the mapping results, and profiling (taxonomic and functional). It is significantly faster than either MOCAT2 or htseq-count and (as it builds on NGLess) its results are perfectly reproducible.
CONCLUSIONS: NG-meta-profiler is a high-performance solution for metagenomics processing built on NGLess. It can be used as-is to execute standard analyses or serve as the starting point for customization in a perfectly reproducible fashion. NGLess and NG-meta-profiler are open source software (under the liberal MIT license) and can be downloaded from https://ngless.embl.de or installed through bioconda.
Affiliation(s)
- Luis Pedro Coelho
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Renato Alves
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Collaboration for joint PhD degree between EMBL and Heidelberg University, Faculty of Biosciences, Heidelberg, Germany
- Paulo Monteiro
- INESC-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
- Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
- Ana Teresa Freitas
- INESC-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, Heidelberg, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
32
Oliver JC, Kollen C, Hickson B, Rios F. Data Science Support at the Academic Library. Journal of Library Administration 2019. [DOI: 10.1080/01930826.2019.1583015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 10/27/2022]
Affiliation(s)
- Jeffrey C. Oliver
- Data Science Specialist, Office of Digital Innovation and Stewardship, University Libraries, University of Arizona, Tucson, AZ, USA
- Christine Kollen
- Data Curation Librarian, Office of Digital Innovation and Stewardship, University Libraries, University of Arizona, Tucson, AZ, USA
- Benjamin Hickson
- Geospatial Specialist, Office of Digital Innovation and Stewardship, University Libraries, University of Arizona, Tucson, AZ, USA
- Fernando Rios
- Research Data Management Specialist, Office of Digital Innovation and Stewardship, University Libraries, University of Arizona, Tucson, AZ, USA
33
Grabowski P, Rappsilber J. A Primer on Data Analytics in Functional Genomics: How to Move from Data to Insight? Trends Biochem Sci 2019; 44:21-32. [PMID: 30522862 PMCID: PMC6318833 DOI: 10.1016/j.tibs.2018.10.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/02/2018] [Revised: 10/19/2018] [Accepted: 10/25/2018] [Indexed: 02/06/2023]
Abstract
High-throughput methodologies and machine learning have been central in developing systems-level perspectives in molecular biology. Unfortunately, performing such integrative analyses has traditionally been reserved for bioinformaticians. This is now changing with the appearance of resources to help bench-side biologists become skilled at computational data analysis and handling large omics data sets. Here, we show an entry route into the field of omics data analytics. We provide information about easily accessible data sources and suggest some first steps for aspiring computational data analysts. Moreover, we highlight how machine learning is transforming the field and how it can help make sense of biological data. Finally, we suggest good starting points for self-learning and hope to convince readers that computational data analysis and programming are not intimidating.
Affiliation(s)
- Piotr Grabowski
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Juri Rappsilber
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
34
Kandlikar GS, Gold ZJ, Cowen MC, Meyer RS, Freise AC, Kraft NJB, Moberg-Parker J, Sprague J, Kushner DJ, Curd EE. ranacapa: An R package and Shiny web app to explore environmental DNA data with exploratory statistics and interactive visualizations. F1000Res 2018; 7:1734. [PMID: 30613396 PMCID: PMC6305237 DOI: 10.12688/f1000research.16680.1] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Accepted: 10/23/2018] [Indexed: 11/29/2022]
Abstract
Environmental DNA (eDNA) metabarcoding is becoming a core tool in ecology and conservation biology, and is being used in a growing number of education, biodiversity monitoring, and public outreach programs in which professional research scientists engage community partners in primary research. Results from eDNA analyses can engage and educate natural resource managers, students, community scientists, and naturalists, but without significant training in bioinformatics, it can be difficult for this diverse audience to interact with eDNA results. Here we present the R package ranacapa, at the core of which is a Shiny web app that helps perform exploratory biodiversity analyses and visualizations of eDNA results. The app requires a taxonomy-by-sample matrix and a simple metadata file with descriptive information about each sample. The app enables users to explore the data with interactive figures and presents results from simple community ecology analyses. We demonstrate the value of ranacapa to two groups of community partners engaging with eDNA metabarcoding results.
Affiliation(s)
- Gaurav S Kandlikar
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Zachary J Gold
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Madeline C Cowen
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Rachel S Meyer
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Amanda C Freise
- Department of Microbiology and Microbial Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Nathan J B Kraft
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Jordan Moberg-Parker
- Department of Microbiology and Microbial Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Joshua Sprague
- Channel Islands National Park, National Park Service, Ventura, CA, USA
- David J Kushner
- Channel Islands National Park, National Park Service, Ventura, CA, USA
- Emily E Curd
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
35
Abstract
Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation. Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of errors are inappropriate use of statistical methods and incorrect interpretation of data by software. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained facilitates correcting mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting computed numbers can also be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results. These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.
Affiliation(s)
- Sabrina Rueschenbaum
- Department of Internal Medicine 1, University Hospital Frankfurt, Goethe University, Theodor-Stern-Kai 7, Frankfurt (Main), Germany
36
McKain MR, Johnson MG, Uribe‐Convers S, Eaton D, Yang Y. Practical considerations for plant phylogenomics. Applications in Plant Sciences 2018; 6:e1038. [PMID: 29732268 PMCID: PMC5895195 DOI: 10.1002/aps3.1038] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 01/22/2018] [Accepted: 03/13/2018] [Indexed: 05/10/2023]
Abstract
The past decade has seen a major breakthrough in our ability to easily and inexpensively sequence genome-scale data from diverse lineages. The development of high-throughput sequencing and long-read technologies has ushered in the era of phylogenomics, where hundreds to thousands of nuclear genes and whole organellar genomes are routinely used to reconstruct evolutionary relationships. As a result, understanding which options are best suited for a particular set of questions can be difficult, especially for those just starting in the field. Here, we review the most recent advances in plant phylogenomic methods and make recommendations for project-dependent best practices and considerations. We focus on the costs and benefits of different approaches in regard to the information they provide researchers and the questions they can address. We also highlight unique challenges and opportunities in plant systems, such as polyploidy, reticulate evolution, and the use of herbarium materials, identifying optimal methodologies for each. Finally, we draw attention to lingering challenges in the field of plant phylogenomics, such as reusability of data sets, and look at some up-and-coming technologies that may help propel the field even further.
Affiliation(s)
- Michael R. McKain
- Department of Biological Sciences, The University of Alabama, Box 870344, Tuscaloosa, Alabama 35487, USA
- Matthew G. Johnson
- Department of Biological Sciences, Texas Tech University, 2901 Main Street, Box 43131, Lubbock, Texas 79409, USA
- Simon Uribe‐Convers
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University, Ann Arbor, Michigan 48109, USA
- Deren Eaton
- Department of Ecology, Evolution, and Environmental Biology, Columbia University, 1200 Amsterdam Avenue, New York, New York 10027, USA
- Ya Yang
- Department of Plant and Microbial Biology, University of Minnesota–Twin Cities, 1445 Gortner Avenue, St. Paul, Minnesota 55108, USA