1
Villani F, Guarracino A, Ward RR, Green T, Emms M, Pravenec M, Prins P, Garrison E, Williams RW, Chen H, Colonna V. Pangenome reconstruction in rats enhances genotype-phenotype mapping and novel variant discovery. bioRxiv 2024:2024.01.10.575041. [PMID: 38260597 PMCID: PMC10802574 DOI: 10.1101/2024.01.10.575041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
The HXB/BXH family of recombinant inbred rat strains is a unique genetic resource that has been extensively phenotyped over 25 years, resulting in a vast dataset of quantitative molecular and physiological phenotypes. We built a pangenome graph from 10x Genomics Linked-Read data for 31 recombinant inbred rats to study genetic variation and association mapping. The pangenome includes 0.2 Gb of sequence that is not present in the reference mRatBN7.2, confirming the capture of substantial additional variation. We validated variants in challenging regions, including complex structural variants resolving into multiple haplotypes. Phenome-wide association analysis of validated SNPs uncovered variants associated with glucose/insulin levels and hippocampal gene expression. We propose an interaction between Pirl1l1, chromogranin expression, TNF-α levels, and insulin regulation. This study demonstrates the utility of linked-read pangenomes for comprehensive variant detection and mapping phenotypic diversity in a widely used rat genetic reference panel.
2
de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. Cell Genom 2024; 4:100527. [PMID: 38537634 PMCID: PMC11019364 DOI: 10.1016/j.xgen.2024.100527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 12/26/2023] [Accepted: 02/29/2024] [Indexed: 04/09/2024]
Abstract
The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomics datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ∼20.0 million sequence variations, of which 18,700 are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.
Affiliation(s)
- Tristan V de Jong
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Yanchao Pan
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Pasi Rastas
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Daniel Munro
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
- Monika Tutaj
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Huda Akil
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Chris Benner
- Department of Medicine, University of California San Diego, San Diego, CA, USA
- Denghui Chen
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Apurva S Chitre
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Clifton L Dalgard
- Department of Anatomy, Physiology & Genetics, The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Wendy M Demos
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Peter A Doris
- The Brown Foundation Institute of Molecular Medicine, Center for Human Genetics, University of Texas Health Science Center, Houston, TX, USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Aron M Geurts
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Hakan M Gunturkun
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Victor Guryev
- Genome Structure and Ageing, University of Groningen, UMC, Groningen, the Netherlands
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Jun Huang
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Ted Kalbfleisch
- Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
- Panjun Kim
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ling Li
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Spencer Mahaffey
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Pejman Mohammadi
- Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA; Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Ayse Bilge Ozel
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Oksana Polesskaya
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Michal Pravenec
- Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
- Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jonathan Sebat
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Jennifer R Smith
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Leah C Solberg Woods
- Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Boris Tabakoff
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Alan Tracey
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Hongyang Wang
- Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Burt M Sharp
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Francesca Telese
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Zhihua Jiang
- Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Laura Saba
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Xusheng Wang
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Abraham A Palmer
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Anne E Kwitek
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Melinda R Dwinell
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jun Z Li
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Hao Chen
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
3
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Marschall T, Li H, Paten B, Abel HJ, Antonacci-Fulton LL, Asri M, Baid G, Baker CA, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson MJP, Chang PC, Chang XH, Cheng H, Chu J, Cody S, Colonna V, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler EE, Eizenga JM, Fairley S, Fedrigo O, Felsenfeld AL, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton RS, Gao Y, Garg S, Garrison E, Garrison NA, Giron CG, Green RE, Groza C, Guarracino A, Haggerty L, Hall IM, Harvey WT, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis ED, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Li H, Liao WW, Lu S, Lu TY, Lucas JK, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin FJ, McCartney A, McDaniel J, Miga KH, Mitchell MW, Monlong J, Mountcastle J, Munson KM, Mwaniki MN, Nattestad M, Novak AM, Nurk S, Olsen HE, Olson ND, Paten B, Pesout T, Phillippy AM, Popejoy AB, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Sibbesen JA, Sirén J, Smith MW, Sofia HJ, Tayoun ANA, Thibaud-Nissen F, Tomlinson C, Tricomi FF, Villani F, Vollger MR, Wagner J, Walenz B, Wang T, Wood JMD, Zimin AV, Zook JM. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol 2024; 42:663-673. [PMID: 37165083 PMCID: PMC10638906 DOI: 10.1038/s41587-023-01793-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 10/06/2022] [Accepted: 04/18/2023] [Indexed: 05/12/2023]
Abstract
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
Affiliation(s)
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jordan M. Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Yan Gao
- Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Haley J. Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Carl A. Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Canadian Center for Computational Genomics, McGill University, Montreal, QC, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Mark J. P. Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Xian H. Chang
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Robert M. Cook-Deegan
- Arizona State University, Barrett and O’Connor Washington Center, Washington, DC, USA
- Omar E. Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Daniel Doerr
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Ebert
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jana Ebler
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Jordan M. Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam L. Felsenfeld
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Robert S. Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Yan Gao
- Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Nanibaa’ A. Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Carlos Garcia Giron
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard E. Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
- Cristian Groza
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
- Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ira M. Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- David Haussler
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Miten Jain
- Northeastern University, Boston, MA, USA
- Erich D. Jarvis
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Hanlee P. Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Eimear E. Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Barbara A. Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jan O. Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Julian K. Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Hugo Magalhães
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
- Pierre Marijon
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Charles Markello
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Tobias Marschall
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Fergal J. Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- These authors contributed equally: Glenn Hickey, Jean Monlong
- Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Hugh E. Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Alice B. Popejoy
- Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Allison A. Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
- Ashley D. Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Baergen I. Schultz
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Jonas A. Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
- Jouni Sirén
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Michael W. Smith
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Heidi J. Sofia
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Ahmad N. Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children’s Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Aleksey V. Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
4
Heumos S, Guarracino A, Schmelzle JNM, Li J, Zhang Z, Hagmann J, Nahnsen S, Prins P, Garrison E. Pangenome graph layout by Path-Guided Stochastic Gradient Descent. bioRxiv 2023:2023.09.22.558964. [PMID: 37790531 PMCID: PMC10542513 DOI: 10.1101/2023.09.22.558964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Motivation: The increasing availability of complete genomes demands models for studying genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human-readable graph layout: a graph embedding in low-dimensional (e.g. two-dimensional) depictions. Due to a pangenome graph's potentially excessive size, this is a significant challenge. Results: In response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by Stochastic Gradient Descent (SGD). We show that our implementation efficiently computes the low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features. Availability: We integrated PG-SGD in ODGI, which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
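The path-guided sampling idea in this abstract can be illustrated with a toy sketch. The code below is not the ODGI implementation: the function name `pg_sgd_layout`, its parameters, the uniform pair sampling, the single point per node, and the learning-rate schedule are all assumptions made for illustration. It shows only the core step: pick a path, pick two steps on it, take their nucleotide distance along the path as the target, and nudge the two layout points toward that distance. Each iteration touches one pair, so no all-pairs distance matrix is built.

```python
import math
import random


def pg_sgd_layout(paths, node_len, iters=5000, eps=0.01, seed=0):
    """Toy 2D path-guided SGD layout (hypothetical helper, not ODGI's API).

    paths    : list of paths, each a list of node ids with at least 2 steps
    node_len : dict mapping node id -> sequence length (> 0)
    Returns a dict mapping node id -> [x, y].
    """
    rng = random.Random(seed)

    # Nucleotide offset of every step along its path: the positional system.
    offsets = []
    for path in paths:
        acc, pos = 0, []
        for node in path:
            pos.append(acc)
            acc += node_len[node]
        offsets.append(pos)

    # Random initial coordinates, one point per node (a simplification).
    xy = {n: [rng.random(), rng.random()] for n in node_len}

    # Assumed schedule: exponential decay from d_max^2 down to eps.
    d_max = max(pos[-1] for pos in offsets)
    eta_max = float(d_max * d_max)
    decay = math.log(eta_max / eps) / max(iters - 1, 1)

    for t in range(iters):
        eta = eta_max * math.exp(-decay * t)
        p = rng.randrange(len(paths))               # sample a path
        i, j = rng.sample(range(len(paths[p])), 2)  # sample two steps on it
        a, b = paths[p][i], paths[p][j]
        d = abs(offsets[p][i] - offsets[p][j])      # target distance in bp
        if a == b or d == 0:
            continue
        dx = xy[a][0] - xy[b][0]
        dy = xy[a][1] - xy[b][1]
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue
        mu = min(1.0, eta / (d * d))                # capped step, weight 1/d^2
        r = mu * (mag - d) / (2.0 * mag)
        xy[a][0] -= r * dx
        xy[a][1] -= r * dy
        xy[b][0] += r * dx
        xy[b][1] += r * dy
    return xy
```

For example, `pg_sgd_layout([["a", "b"]], {"a": 5, "b": 3})` places the two nodes 5 units apart, the nucleotide distance between the starts of their steps. With several paths that disagree about a distance, the layout settles on a compromise.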
Affiliation(s)
- Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Genomics Research Centre, Human Technopole, Milan 20157, Italy
| | - Jan-Niklas M. Schmelzle
- Department of Computer Engineering, School of Computation, Information and Technology (CIT), Technical University of Munich, Munich 80333, Germany
- School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
| | - Jiajie Li
- School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
| | - Zhiru Zhang
- School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
| | - Jörg Hagmann
- Computomics GmbH, Eisenbahnstr. 1, 72072 Tübingen, Germany
| | - Sven Nahnsen
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
- M3 Research Center, University Hospital Tübingen, 72076 Tübingen, Germany
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
|
5
|
de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. bioRxiv 2023:2023.04.13.536694. [PMID: 37214860 PMCID: PMC10197727 DOI: 10.1101/2023.04.13.536694] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/24/2023]
Abstract
The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors approximately 9-fold, and increases contiguity 290-fold compared to its predecessor. Gene annotations are now more complete, significantly improving the mapping precision of genomic, transcriptomic, and proteomics data sets. We jointly analyzed 163 short-read whole genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ~20.0 million sequence variations, of which 18.7 thousand are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.
Affiliation(s)
- Tristan V de Jong
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Yanchao Pan
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Pasi Rastas
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Daniel Munro
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
| | - Monika Tutaj
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Huda Akil
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
| | - Chris Benner
- Department of Medicine, University of California San Diego, San Diego, CA, USA
| | - Denghui Chen
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Apurva S Chitre
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Clifton L Dalgard
- Department of Anatomy, Physiology & Genetics; The American Genome Center, Uniformed Services University of the Health Sciences, Washington DC, USA
| | - Wendy M Demos
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Peter A Doris
- The Brown Foundation Institute of Molecular Medicine, Center For Human Genetics, University of Texas Health Science Center, Houston, TX, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Aron M Geurts
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Hakan M Gunturkun
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Victor Guryev
- Genome Structure and Ageing, University of Groningen, UMC Groningen, The Netherlands
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Jun Huang
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ted Kalbfleisch
- Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
| | - Panjun Kim
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ling Li
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN, USA
| | - Spencer Mahaffey
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
| | - Pejman Mohammadi
- Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
| | - Ayse Bilge Ozel
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Oksana Polesskaya
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Michal Pravenec
- Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jonathan Sebat
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Jennifer R Smith
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Leah C Solberg Woods
- Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Boris Tabakoff
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Alan Tracey
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Hongyang Wang
- Department of Animal Sciences, Washington State University, Pullman, WA, USA
| | - Burt M Sharp
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Francesca Telese
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Zhihua Jiang
- Department of Animal Sciences, Washington State University, Pullman, WA, USA
| | - Laura Saba
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Xusheng Wang
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Abraham A Palmer
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Anne E Kwitek
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Melinda R Dwinell
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jun Z Li
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Hao Chen
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
|
6
|
Yang Z, Guarracino A, Biggs PJ, Black MA, Ismail N, Wold JR, Merriman TR, Prins P, Garrison E, de Ligt J. Pangenome graphs in infectious disease: a comprehensive genetic variation analysis of Neisseria meningitidis leveraging Oxford Nanopore long reads. Front Genet 2023; 14:1225248. [PMID: 37636268 PMCID: PMC10448961 DOI: 10.3389/fgene.2023.1225248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 08/01/2023] [Indexed: 08/29/2023] Open
Abstract
Whole genome sequencing has revolutionized infectious disease surveillance for tracking and monitoring the spread and evolution of pathogens. However, using a linear reference genome for genomic analyses may introduce biases, especially when studies are conducted on highly variable bacterial genomes of the same species. Pangenome graphs provide an efficient model for representing and analyzing multiple genomes and their variants as a graph structure that includes all types of variations. In this study, we present a practical bioinformatics pipeline that employs the PanGenome Graph Builder and the Variation Graph toolkit to build pangenomes from assembled genomes, align whole genome sequencing data and call variants against a graph reference. The pangenome graph enables the identification of structural variants, rearrangements, and small variants (e.g., single nucleotide polymorphisms and insertions/deletions) simultaneously. We demonstrate that using a pangenome graph, instead of a single linear reference genome, improves mapping rates and variant calling for both simulated and real datasets of the pathogen Neisseria meningitidis. Overall, pangenome graphs offer a promising approach for comparative genomics and comprehensive genetic variation analysis in infectious disease. Moreover, this innovative pipeline, leveraging pangenome graphs, can bridge variant analysis, genome assembly, population genetics, and evolutionary biology, expanding the reach of genomic understanding and applications.
Affiliation(s)
- Zuyu Yang
- Institute of Environmental Science and Research, Porirua, New Zealand
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Patrick J. Biggs
- Molecular Biosciences Group, School of Natural Sciences, Massey University, Palmerston North, New Zealand
- Molecular Epidemiology and Public Health Laboratory, School of Veterinary Science, Massey University, Palmerston North, New Zealand
| | - Michael A. Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
| | - Nuzla Ismail
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
| | - Jana Renee Wold
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Tony R. Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
| | - Joep de Ligt
- Institute of Environmental Science and Research, Porirua, New Zealand
|
7
|
Mozhui K, O’Callaghan JP, Ashbrook DG, Prins P, Zhao W, Lu L, Jones BC. Epigenetic analysis in a murine genetic model of Gulf War illness. Front Toxicol 2023; 5:1162749. [PMID: 37389175 PMCID: PMC10300436 DOI: 10.3389/ftox.2023.1162749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 05/22/2023] [Indexed: 07/01/2023] Open
Abstract
Of the nearly 1 million military personnel who participated in the 1990-1991 Gulf War, between 25% and 35% became ill with what is now referred to as Gulf War Illness (GWI) by the Department of Defense. Symptoms varied from gastrointestinal distress to lethargy, memory loss, inability to concentrate, depression, and respiratory and reproductive problems. The symptoms have persisted for 30 years in those afflicted but the basis of the illness remains largely unknown. Nerve agents and other chemical exposures in the war zone have been implicated but the long-term effects of these acute exposures have left few if any identifiable signatures. The major aim of this study is to elucidate the possible genomic basis for the persistence of symptoms, especially of the neurological and behavioral effects. To address this, we performed a whole genome epigenetic analysis of the proposed cause of GWI, viz., exposure to organophosphate neurotoxicants combined with high circulating glucocorticoids in two inbred mouse strains, C57BL/6J and DBA/2J. The animals received corticosterone in their drinking water for 7 days followed by injection of diisopropylfluorophosphate (DFP), a nerve agent surrogate. Six weeks after DFP injection, the animals were euthanized and medial prefrontal cortex harvested for genome-wide DNA methylation analysis using high-throughput sequencing. We observed 67 differentially methylated genes, notably among them, Ttll7, Akr1c14, Slc44a4, and Rusc2, all related to different symptoms of GWI. Our results support proof of principle of genetic differences in the chronic effects of GWI-related exposures and may reveal why the disease has persisted in many of the now aging Gulf War veterans.
Affiliation(s)
- Khyobeni Mozhui
- Department of Preventive Medicine, College of Medicine, University of Tennessee Health Science Center, Memphis, TN, United States
- Department of Genetics, Genomics and Informatics, College of Medicine, University of Tennessee Health Science Center, Memphis, TN, United States
| | - James P. O’Callaghan
- Molecular Neurotoxicology Laboratory, Toxicology, and Molecular Biology Branch, Health Effects Laboratory Division, U. S. Centers for Disease Control and Prevention, NIOSH, Morgantown, WV, United States
| | - David G. Ashbrook
- Department of Genetics, Genomics and Informatics, College of Medicine, University of Tennessee Health Science Center, Memphis, TN, United States
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, College of Medicine, University of Tennessee Health Science Center, Memphis, TN, United States
| | - Wenyuan Zhao
- Department of Genetics, Genomics and Informatics, College of Medicine, University of Tennessee Health Science Center, Memphis, TN, United States
| | - Lu Lu
- Department of Genetics, Genomics and Informatics, College of Medicine, University of Tennessee Health Science Center, Memphis, TN, United States
| | - Byron C. Jones
- Department of Genetics, Genomics and Informatics, College of Medicine, University of Tennessee Health Science Center, Memphis, TN, United States
- Department of Pharmacology, Addiction Science, and Toxicology, College of Medicine, University of Tennessee Health Science Center, Memphis, TN, United States
|
8
|
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023; 617:312-324. [PMID: 37165242 PMCID: PMC10172123 DOI: 10.1038/s41586-023-05896-x] [Citation(s) in RCA: 163] [Impact Index Per Article: 163.0] [Received: 07/09/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Affiliation(s)
- Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Mobin Asri
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Daniel Doerr
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Marina Haukness
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
| | - Julian K Lucas
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jean Monlong
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haley J Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
| | - Xian H Chang
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Robert S Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
| | - Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
| | - Miten Jain
- Northeastern University, Boston, MA, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Charles Markello
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Adam M Novak
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Hugh E Olsen
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Trevor Pesout
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jonas A Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | - Carl A Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Robert M Cook-Deegan
- Barrett and O'Connor Washington Center, Arizona State University, Washington, DC, USA
| | - Omar E Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Mark Diekhans
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam L Felsenfeld
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Nanibaa' A Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Barbara A Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, CA, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hugo Magalhães
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Pierre Marijon
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | | | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Alice B Popejoy
- Department of Public Health Sciences, University of California, Davis, CA, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I Schultz
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Michael W Smith
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J Sofia
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children's Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Aleksey V Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - David Haussler
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Karen H Miga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany.
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany.
| | - Ira M Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA.
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA.
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, CA, USA.
|
9
|
Garrison E, Guarracino A, Heumos S, Villani F, Bao Z, Tattini L, Hagmann J, Vorbrugg S, Marco-Sola S, Kubica C, Ashbrook DG, Thorell K, Rusholme-Pilcher RL, Liti G, Rudbeck E, Nahnsen S, Yang Z, Moses MN, Nobrega FL, Wu Y, Chen H, de Ligt J, Sudmant PH, Soranzo N, Colonna V, Williams RW, Prins P. Building pangenome graphs. bioRxiv 2023:2023.04.05.535718. [PMID: 37066137 PMCID: PMC10104075 DOI: 10.1101/2023.04.05.535718] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Indexed: 04/18/2023]
Abstract
Pangenome graphs can represent all variation between multiple genomes, but existing methods for constructing them are biased due to reference-guided approaches. In response, we have developed PanGenome Graph Builder (PGGB), a reference-free pipeline for constructing unbiased pangenome graphs. PGGB uses all-to-all whole-genome alignments and learned graph embeddings to build and iteratively refine a model in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.
|
10
|
Verhoeven A, Finkers-Tomczak A, Prins P, Valkenburg-van Raaij DR, van Schaik CC, Overmars H, van Steenbrugge JJM, Tacken W, Varossieau K, Slootweg EJ, Kappers IF, Quentin M, Goverse A, Sterken MG, Smant G. The root-knot nematode effector MiMSP32 targets host 12-oxophytodienoate reductase 2 to regulate plant susceptibility. New Phytol 2023; 237:2360-2374. [PMID: 36457296 DOI: 10.1111/nph.18653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 11/23/2022] [Indexed: 06/17/2023]
Abstract
To establish persistent infections in host plants, herbivorous invaders, such as root-knot nematodes, must rely on effectors for suppressing damage-induced jasmonate-dependent host defenses. However, at present, the effector mechanisms targeting the biosynthesis of biologically active jasmonates to avoid adverse host responses are unknown. Using yeast two-hybrid, in planta co-immunoprecipitation, and mutant analyses, we identified 12-oxophytodienoate reductase 2 (OPR2) as an important host target of the stylet-secreted effector MiMSP32 of the root-knot nematode Meloidogyne incognita. MiMSP32 has no informative sequence similarities with other functionally annotated genes but was selected for the discovery of novel effector mechanisms based on evidence of positive, diversifying selection. OPR2 catalyzes the conversion of a derivative of 12-oxophytodienoate to jasmonic acid (JA) and operates parallel to 12-oxophytodienoate reductase 3 (OPR3), which controls the main pathway in the biosynthesis of jasmonates. We show that MiMSP32 targets OPR2 to promote parasitism of M. incognita in host plants independent of OPR3-mediated JA biosynthesis. Artificially manipulating the conversion of the 12-oxophytodienoate by OPRs increases susceptibility to multiple unrelated plant invaders. Our study is the first to shed light on a novel effector mechanism targeting this process to regulate the susceptibility of host plants.
Affiliation(s)
- Ava Verhoeven
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
- Plant Stress Resilience, Utrecht University, Padualaan 8, 3584 CH, Utrecht, the Netherlands
- Plant-Environment Signaling, Utrecht University, Padualaan 8, 3584 CH, Utrecht, the Netherlands
| | - Anna Finkers-Tomczak
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Pjotr Prins
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Debbie R Valkenburg-van Raaij
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Casper C van Schaik
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Hein Overmars
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Joris J M van Steenbrugge
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Wannes Tacken
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Koen Varossieau
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Erik J Slootweg
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Iris F Kappers
- Laboratory of Plant Physiology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Michaël Quentin
- INRAE, Université Côte d'Azur, CNRS, ISA, F-06903, Sophia Antipolis, France
| | - Aska Goverse
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Mark G Sterken
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Geert Smant
- Laboratory of Nematology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
|
11
|
Guarracino A, Heumos S, Nahnsen S, Prins P, Garrison E. ODGI: understanding pangenome graphs. Bioinformatics 2022; 38:3319-3326. [PMID: 35552372 DOI: 10.1101/2021.11.10.467921] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/09/2021] [Revised: 03/18/2022] [Indexed: 05/24/2023]
Abstract
MOTIVATION Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. RESULTS We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. AVAILABILITY AND IMPLEMENTATION ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
| | - Sven Nahnsen
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
|
12
|
Abstract
Motivation Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. Results We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. Availability and implementation ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Andrea Guarracino
- Genomics Research Centre, Human Technopole, Viale Rita Levi-Montalcini 1, Milan, 20157, Italy
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, 72076, Germany
- Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, 72076, Germany
| | - Sven Nahnsen
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, 72076, Germany
- Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, 72076, Germany
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, 38163, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, 38163, USA
|
13
|
Gunturkun MH, Flashner E, Wang T, Mulligan MK, Williams RW, Prins P, Chen H. GeneCup: mining PubMed and GWAS catalog for gene-keyword relationships. G3 (Bethesda) 2022; 12:jkac059. [PMID: 35285473 PMCID: PMC9073678 DOI: 10.1093/g3journal/jkac059] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/18/2022] [Accepted: 03/04/2022] [Indexed: 11/13/2022]
Abstract
Interpreting and integrating results from omics studies typically requires a comprehensive and time-consuming survey of extant literature. GeneCup is a literature mining web service that retrieves sentences containing user-provided gene symbols and keywords from PubMed abstracts. The keywords are organized into an ontology and can be extended to include results from human genome-wide association studies. We provide a drug addiction keyword ontology that contains over 300 keywords as an example. The literature search is conducted by querying the PubMed server using a programming interface, which is followed by retrieving abstracts from a local copy of the PubMed archive. The main results presented to the user are sentences where the gene symbol and keywords co-occur. These sentences are presented through an interactive graphical interface or as tables. All results are linked to the original abstract in PubMed. In addition, a convolutional neural network is employed to distinguish sentences describing systemic stress from those describing cellular stress. The automated and comprehensive search strategy provided by GeneCup facilitates the integration of new discoveries from omics studies with existing literature. GeneCup is free and open source software. The source code of GeneCup and the link to a running instance are available at https://github.com/hakangunturkun/GeneCup.
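The core retrieval step described above, keeping sentences in which a gene symbol and at least one keyword co-occur, can be illustrated with a short sketch. This is not GeneCup's code: the function name `cooccurring_sentences`, the naive sentence splitter, the case-insensitive matching, and the example text (made up for illustration, not taken from PubMed) are all assumptions.

```python
import re

def cooccurring_sentences(abstract, gene, keywords):
    """Return (sentence, matched_keywords) pairs where the gene and a keyword co-occur.

    Hypothetical helper: sentence splitting is a simple punctuation-based
    heuristic, and matching is case-insensitive by assumption.
    """
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    gene_re = re.compile(r"\b" + re.escape(gene) + r"\b", re.IGNORECASE)
    hits = []
    for sentence in sentences:
        if not gene_re.search(sentence):
            continue  # the gene symbol must appear in the sentence
        lowered = sentence.lower()
        matched = [k for k in keywords if k.lower() in lowered]
        if matched:
            hits.append((sentence, matched))
    return hits

# Made-up example text, for illustration only.
text = (
    "Bdnf expression was reduced after chronic cocaine exposure. "
    "Stress had no effect in controls. "
    "Withdrawal increased Bdnf in the amygdala."
)
hits = cooccurring_sentences(text, "Bdnf", ["cocaine", "withdrawal", "stress"])
for sentence, matched in hits:
    print(matched, "->", sentence)
```

The second sentence is skipped because it mentions a keyword but not the gene, which is the behaviour the co-occurrence criterion implies.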
Affiliation(s)
- Mustafa H Gunturkun
- Department of Pharmacology, Addiction Science and Toxicology, University of Tennessee Health Science, Memphis, TN 38103, USA
| | - Efraim Flashner
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science, Memphis, TN 38103, USA
| | - Tengfei Wang
- Department of Pharmacology, Addiction Science and Toxicology, University of Tennessee Health Science, Memphis, TN 38103, USA
| | - Megan K Mulligan
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science, Memphis, TN 38103, USA
| | - Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science, Memphis, TN 38103, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science, Memphis, TN 38103, USA
| | - Hao Chen
- Department of Pharmacology, Addiction Science and Toxicology, University of Tennessee Health Science, Memphis, TN 38103, USA
|
14
|
Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P. A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput Biol 2022. [PMID: 35639788 DOI: 10.1101/2021.05.21.445151] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 05/14/2023] Open
Abstract
Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformation of variant files. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how we can address more complex variation through pangenome graph formats, variation that cannot easily be represented by the VCF format.
Affiliation(s)
- Erik Garrison
- Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Zev N Kronenberg
- Pacific Biosciences, San Diego, California, United States of America
| | - Eric T Dawson
- NVIDIA Corporation, Santa Clara, California, United States of America
| | - Brent S Pedersen
- Center for Molecular Medicine, University Medical Center, Utrecht, The Netherlands
| | - Pjotr Prins
- Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
|
15
|
Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P. A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput Biol 2022; 18:e1009123. [PMID: 35639788 PMCID: PMC9286226 DOI: 10.1371/journal.pcbi.1009123] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 06/06/2021] [Revised: 07/15/2022] [Accepted: 04/11/2022] [Indexed: 11/30/2022] Open
Abstract
Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformation of variant files. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how we can address more complex variation through pangenome graph formats, variation that cannot easily be represented by the VCF format. Most bioinformatics workflows deal with DNA/RNA variations that are typically represented in the variant call format (VCF), a file format that describes mutations (SNP and MNP), insertions and deletions (INDEL) against a reference genome. Here we present a wide range of free and open source software tools that are used in biomedical sequencing workflows around the world today.
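The variant classes named in this abstract (SNP, MNP, insertion, deletion, simple structural variant) can be told apart from the REF and ALT columns of a VCF record. The following is a minimal sketch of that idea, not code from vcflib, bio-vcf, cyvcf2, hts-nim or slivar; the function names `classify_variant` and `parse_vcf_line` and the sample records are hypothetical, and only the fixed tab-separated column order of VCF (CHROM, POS, ID, REF, ALT, ...) is assumed.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair by comparing allele lengths."""
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"   # symbolic allele such as <DEL>, or a breakend
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"  # single nucleotide substitution
    if len(ref) == len(alt):
        return "MNP"  # same-length multi-nucleotide substitution
    return "INS" if len(alt) > len(ref) else "DEL"

def parse_vcf_line(line: str):
    """Split a VCF data line and classify each comma-separated ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
    return [(chrom, pos, ref, alt, classify_variant(ref, alt))
            for alt in alts.split(",")]

# Hypothetical records, for illustration only.
records = [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tGC\t50\tPASS\t.",
    "chr1\t300\t.\tA\tATT,<DEL>\t50\tPASS\t.",
    "chr1\t400\t.\tACG\tA\t50\tPASS\t.",
]
calls = [call for line in records for call in parse_vcf_line(line)]
for call in calls:
    print(call)
```

Length comparison alone is a simplification: it does not normalise or left-align alleles, which is one of the tasks the tools in the paper are described as handling.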
Affiliation(s)
- Erik Garrison
- Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Zev N. Kronenberg
- Pacific Biosciences, San Diego, California, United States of America
| | - Eric T. Dawson
- NVIDIA Corporation, Santa Clara, California, United States of America
| | - Brent S. Pedersen
- Center for Molecular Medicine, University Medical Center, Utrecht, The Netherlands
| | - Pjotr Prins
- Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
|
16
|
Wang X, Bajpai AK, Gu Q, Centeno A, Starlard-Davenport A, Prins P, Xu F, Lu L. A systems genetics approach delineates the role of Bcl2 in leukemia pathogenesis. Leuk Res 2022; 114:106804. [PMID: 35182904 PMCID: PMC9272521 DOI: 10.1016/j.leukres.2022.106804] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2021] [Revised: 01/11/2022] [Accepted: 02/06/2022] [Indexed: 01/11/2023]
Abstract
Leukemia is a group of malignancies of the blood forming tissues, and is characterized by the uncontrolled proliferation of blood cells. In the United States, it accounts for approximately 3.5% and 4% of all cancer-related incidences and mortalities, respectively. The current study aimed to explore the role of Bcl2 and associated genes in leukemia pathogenesis using a systems genetics approach. The transcriptome data from BXD Recombinant Inbred (RI) mice was analyzed to identify the expression of Bcl2 in myeloid cells. eQTL mapping was performed to select the potential chromosomal region and subsequently identify the candidate gene modulating the expression of Bcl2. Furthermore, gene enrichment and protein-protein interaction (PPI) analyses of the Bcl2-coexpressed genes were performed to demonstrate the role of Bcl2 in leukemia pathogenesis. The Bcl2-coexpressed genes were found to be enriched in various hematopoietic system related functions, and multiple pathways related to signaling, immune response, and cancer. The PPI network analysis demonstrated direct interaction of hematopoietic function related genes, such as Bag3, Bak1, Bcl2l11, Bmf, Mapk9, Myc, Ppp2r5c, and Ppp3ca with Bcl2. The eQTL mapping identified a 4.5 Mb genomic region on chromosome 11, potentially regulating the expression of Bcl2. A multi-criteria filtering process identified Top2a, among the genes located in the mapped locus, as the best candidate upstream regulator for Bcl2 expression variation. Hence, the current study provides better insights into the role of Bcl2 in leukemia pathogenesis and demonstrates the significance of our approach in gaining new knowledge on leukemia. Furthermore, our findings from the PPI network analysis and eQTL mapping provide supporting evidence of leukemia-associated genes, which can be further explored for their functional importance in leukemia. 
DATA AVAILABILITY: The myeloid cell transcriptomic data of the BXD mice used in this study can be accessed through our GeneNetwork (http://www.genenetwork.org) with the accession number of GN144.
Affiliation(s)
- Xinfeng Wang
- Department of Hematology, Affiliated Hospital of Nantong University, Jiangsu, China
| | - Akhilesh Kumar Bajpai
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Qingqing Gu
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Department of Cardiology, Affiliated Hospital of Nantong University, Jiangsu 226001, China
| | - Arthur Centeno
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Athena Starlard-Davenport
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Pjotr Prins
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Fuyi Xu
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA; School of Pharmacy, Binzhou Medical University, Yantai, Shandong 264003, China.
| | - Lu Lu
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
|
17
|
Trotter C, Kim H, Farage G, Prins P, Williams RW, Broman KW, Sen Ś. Speeding up eQTL scans in the BXD population using GPUs. G3 (Bethesda) 2021; 11:jkab254. [PMID: 34499130 PMCID: PMC8664437 DOI: 10.1093/g3journal/jkab254] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2020] [Accepted: 05/27/2021] [Indexed: 11/27/2022]
Abstract
The BXD family of mouse strains is an important reference population for systems biology and genetics that has been fully sequenced and deeply phenotyped. To facilitate interactive use of genotype-phenotype relations using many massive omics data sets for this and other segregating populations, we have developed new algorithms and code that enable near-real-time whole-genome quantitative trait locus (QTL) scans for up to one million traits. By using easily parallelizable operations including matrix multiplication, vectorized operations, and element-wise operations, our method is more than 700 times faster than an R/qtl linear model genome scan using 16 threads. We used parallelization of different CPU threads as well as GPUs. We found that the speed advantage of GPUs is dependent on problem size and shape (the number of cases, number of genotypes, and number of traits). Our approach is ideal for interactive web services, such as GeneNetwork.org, that need to display results in real time. Our implementation is available as the Julia language package LiteQTL at https://github.com/senresearch/LiteQTL.jl.
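The abstract attributes the speed-up to expressing the genome scan as matrix and element-wise operations. One way such a scan can be written is sketched below; it is an illustration under stated assumptions, not LiteQTL's code. Traits and genotypes are standardized, all trait-by-marker correlations come from a single matrix product, and each correlation r is converted element-wise to a LOD score with LOD = -(n/2) log10(1 - r^2), which holds for a single-marker linear model without covariates. The function names and the toy data are hypothetical, and plain Python lists are used only to keep the sketch self-contained; a practical implementation would use an optimized matrix library or a GPU.

```python
import math

def standardize(vec):
    """Center a vector and scale it to unit length (assumes it is not constant)."""
    mean = sum(vec) / len(vec)
    centered = [v - mean for v in vec]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

def matmul_t(a, b):
    """Return A @ B^T for two lists of row vectors."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

def lod_scan(traits, genotypes):
    """LOD matrix (traits x markers) from one matrix product of standardized data."""
    n = len(traits[0])
    y = [standardize(t) for t in traits]
    g = [standardize(m) for m in genotypes]
    r = matmul_t(y, g)  # Pearson correlations, since rows have unit length
    # Clamp r^2 just below 1 so a perfect correlation does not hit log10(0).
    return [[-(n / 2) * math.log10(1 - min(rij * rij, 1 - 1e-12)) for rij in row]
            for row in r]

# Hypothetical toy data: 2 markers and 2 traits measured in 8 strains.
genotypes = [[0, 0, 0, 0, 1, 1, 1, 1],
             [0, 1, 0, 1, 0, 1, 0, 1]]
traits = [[1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1],
          [5.0, 6.1, 5.2, 5.9, 5.1, 6.0, 4.9, 6.2]]
lod = lod_scan(traits, genotypes)
for row in lod:
    print([round(v, 2) for v in row])
```

Because every trait shares the same genotype matrix, adding traits only widens the matrix product, which is the property that makes the scan easy to parallelize.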
Affiliation(s)
- Chelsea Trotter
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Hyeonju Kim
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Gregory Farage
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Karl W Broman
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Śaunak Sen
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| |
|
18
|
Parks C, Rogers CM, Prins P, Williams RW, Chen H, Jones BC, Moore BM, Mulligan MK. Genetic Modulation of Initial Sensitivity to Δ9-Tetrahydrocannabinol (THC) Among the BXD Family of Mice. Front Genet 2021; 12:659012. [PMID: 34367237 PMCID: PMC8343140 DOI: 10.3389/fgene.2021.659012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2021] [Accepted: 04/08/2021] [Indexed: 11/16/2022] Open
Abstract
Cannabinoid receptor 1 activation by the major psychoactive component in cannabis, Δ9-tetrahydrocannabinol (THC), produces motor impairments, hypothermia, and analgesia upon acute exposure. In previous work, we demonstrated significant sex and strain differences in acute responses to THC following administration of a single dose (10 mg/kg, i.p.) in C57BL/6J (B6) and DBA/2J (D2) inbred mice. To determine the extent to which these differences are heritable, we quantified acute responses to a single dose of THC (10 mg/kg, i.p.) in males and females from 20 members of the BXD family of inbred strains derived by crossing and inbreeding B6 and D2 mice. Acute THC responses (initial sensitivity) were quantified as changes from baseline for: 1. spontaneous activity in the open field (mobility), 2. body temperature (hypothermia), and 3. tail withdrawal latency to a thermal stimulus (antinociception). Initial sensitivity to the immobilizing, hypothermic, and antinociceptive effects of THC varied substantially across the BXD family. Heritability was highest for mobility and hypothermia traits, indicating that segregating genetic variants modulate initial sensitivity to THC. We identified genomic loci and candidate genes, including Ndufs2, Scp2, Rps6kb1 or P70S6K, Pde4d, and Pten, that may control variation in THC initial sensitivity. We also detected strong correlations between initial responses to THC and legacy phenotypes related to intake or response to other drugs of abuse (cocaine, ethanol, and morphine). Our study demonstrates the feasibility of mapping genes and variants modulating THC responses in the BXDs to systematically define biological processes and liabilities associated with drug use and abuse.
Affiliation(s)
- Cory Parks
- Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
- Department of Agriculture, Biology and Health Sciences, Cameron University, Lawton, OK, United States
| | - Chris M. Rogers
- Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Robert W. Williams
- Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Hao Chen
- Department of Pharmacology, Addiction Science and Toxicology, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Byron C. Jones
- Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Bob M. Moore
- Department of Pharmaceutical Sciences, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Megan K. Mulligan
- Department of Genetics, Genomics and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| |
|
19
|
Palmer RHC, Johnson EC, Won H, Polimanti R, Kapoor M, Chitre A, Bogue MA, Benca‐Bachman CE, Parker CC, Verma A, Reynolds T, Ernst J, Bray M, Kwon SB, Lai D, Quach BC, Gaddis NC, Saba L, Chen H, Hawrylycz M, Zhang S, Zhou Y, Mahaffey S, Fischer C, Sanchez‐Roige S, Bandrowski A, Lu Q, Shen L, Philip V, Gelernter J, Bierut LJ, Hancock DB, Edenberg HJ, Johnson EO, Nestler EJ, Barr PB, Prins P, Smith DJ, Akbarian S, Thorgeirsson T, Walton D, Baker E, Jacobson D, Palmer AA, Miles M, Chesler EJ, Emerson J, Agrawal A, Martone M, Williams RW. Integration of evidence across human and model organism studies: A meeting report. Genes Brain Behav 2021; 20:e12738. [PMID: 33893716 PMCID: PMC8365690 DOI: 10.1111/gbb.12738] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/20/2021] [Revised: 04/11/2021] [Accepted: 04/21/2021] [Indexed: 12/13/2022]
Abstract
The National Institute on Drug Abuse and Joint Institute for Biological Sciences at the Oak Ridge National Laboratory hosted a meeting attended by a diverse group of scientists with expertise in substance use disorders (SUDs), computational biology, and FAIR (Findability, Accessibility, Interoperability, and Reusability) data sharing. The meeting's objective was to discuss and evaluate better strategies to integrate genetic, epigenetic, and 'omics data across human and model organisms to achieve deeper mechanistic insight into SUDs. Specific topics were to (a) evaluate the current state of substance use genetics and genomics research and fundamental gaps, (b) identify opportunities and challenges of integration and sharing across species and data types, (c) identify current tools and resources for integration of genetic, epigenetic, and phenotypic data, (d) discuss steps and impediments related to data integration, and (e) outline future steps to support more effective collaboration, particularly between animal model research communities and human genetics and clinical research teams. This review summarizes key facets of this catalytic discussion with a focus on new opportunities and gaps in resources and knowledge on SUDs.
Affiliation(s)
- Rohan H. C. Palmer
- Behavioral Genetics of Addiction Laboratory, Department of Psychology, Emory University, Atlanta, Georgia, USA
| | - Emma C. Johnson
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Hyejung Won
- Department of Genetics and Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Renato Polimanti
- Department of Psychiatry, Yale University School of Medicine, West Haven, Connecticut, USA
| | - Manav Kapoor
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Apurva Chitre
- Department of Psychiatry, University of California, San Diego, La Jolla, California, USA
| | | | - Chelsie E. Benca‐Bachman
- Behavioral Genetics of Addiction Laboratory, Department of Psychology, Emory University, Atlanta, Georgia, USA
| | - Clarissa C. Parker
- Department of Psychology and Program in Neuroscience, Middlebury College, Middlebury, Vermont, USA
| | - Anurag Verma
- Biomedical and Translational Informatics Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | | | - Jason Ernst
- Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, USA
| | - Michael Bray
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Soo Bin Kwon
- Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, USA
| | - Dongbing Lai
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Bryan C. Quach
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA
| | - Nathan C. Gaddis
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA
| | - Laura Saba
- Department of Pharmaceutical Sciences, University of Colorado, Anschutz Medical Campus, Aurora, Colorado, USA
| | - Hao Chen
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | | | - Shan Zhang
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
| | - Yuan Zhou
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| | - Spencer Mahaffey
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Colorado Denver, Aurora, Colorado, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Sandra Sanchez‐Roige
- Department of Psychiatry, University of California, San Diego, La Jolla, California, USA
| | - Anita Bandrowski
- Department of Neuroscience, University of California, San Diego, La Jolla, California, USA
| | - Qing Lu
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| | - Li Shen
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | | | - Joel Gelernter
- Department of Psychiatry, Yale University School of Medicine, West Haven, Connecticut, USA
| | - Laura J. Bierut
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Dana B. Hancock
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA
| | - Howard J. Edenberg
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Eric O. Johnson
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA
| | - Eric J. Nestler
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Peter B. Barr
- Department of Psychology, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Desmond J. Smith
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
| | - Schahram Akbarian
- Friedman Brain Institute and Departments of Psychiatry and Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | | | | | - Erich Baker
- Department of Computer Science, Baylor University, Waco, Texas, USA
| | - Daniel Jacobson
- Computational and Predictive Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Department of Psychology, University of Tennessee Knoxville, Knoxville, Tennessee, USA
| | - Abraham A. Palmer
- Department of Psychiatry, University of California, San Diego, La Jolla, California, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - Michael Miles
- Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, Virginia, USA
| | | | | | - Arpana Agrawal
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Maryann Martone
- Department of Neuroscience, University of California, San Diego, La Jolla, California, USA
| | - Robert W. Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| |
|
20
|
Ashbrook DG, Arends D, Prins P, Mulligan MK, Roy S, Williams EG, Lutz CM, Valenzuela A, Bohl CJ, Ingels JF, McCarty MS, Centeno AG, Hager R, Auwerx J, Lu L, Williams RW. A platform for experimental precision medicine: The extended BXD mouse family. Cell Syst 2021; 12:235-247.e9. [PMID: 33472028 PMCID: PMC7979527 DOI: 10.1016/j.cels.2020.12.002] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 06/29/2020] [Revised: 08/29/2020] [Accepted: 12/21/2020] [Indexed: 12/17/2022]
Abstract
The challenge of precision medicine is to model complex interactions among DNA variants, phenotypes, development, environments, and treatments. We address this challenge by expanding the BXD family of mice to 140 fully isogenic strains, creating a uniquely powerful model for precision medicine. This family segregates for 6 million common DNA variants, a level that exceeds many human populations. Because each member can be replicated, heritable traits can be mapped with high power and precision. Current BXD phenomes are unsurpassed in coverage and include much omics data and thousands of quantitative traits. BXDs can be extended by a single-generation cross to as many as 19,460 isogenic F1 progeny, and this extended BXD family is an effective platform for testing causal modeling and for predictive validation. BXDs are a unique core resource for the field of experimental precision medicine.
Affiliation(s)
- David G Ashbrook
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
| | - Danny Arends
- Lebenswissenschaftliche Fakultät, Albrecht Daniel Thaer-Institut, Humboldt-Universität zu Berlin, Invalidenstraße 42, 10115 Berlin, Germany
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Megan K Mulligan
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Suheeta Roy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Evan G Williams
- Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, L-4365 Esch-sur-Alzette, Luxembourg
| | - Cathleen M Lutz
- Mouse Repository and the Rare and Orphan Disease Center, the Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Alicia Valenzuela
- Mouse Repository and the Rare and Orphan Disease Center, the Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Casey J Bohl
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Jesse F Ingels
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Melinda S McCarty
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Arthur G Centeno
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Reinmar Hager
- Division of Evolution & Genomic Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
| | - Johan Auwerx
- Laboratory of Integrative Systems Physiology, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Lu Lu
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
| | - Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
| |
|
21
|
Anderson KR, Harris JA, Ng L, Prins P, Memar S, Ljungquist B, Fürth D, Williams RW, Ascoli GA, Dumitriu D. Highlights from the Era of Open Source Web-Based Tools. J Neurosci 2021; 41:927-936. [PMID: 33472826 PMCID: PMC7880282 DOI: 10.1523/jneurosci.1657-20.2020] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/30/2020] [Revised: 11/22/2020] [Accepted: 11/29/2020] [Indexed: 12/20/2022] Open
Abstract
High digital connectivity and a focus on reproducibility are contributing to an open science revolution in neuroscience. Repositories and platforms have emerged across the whole spectrum of subdisciplines, paving the way for a paradigm shift in the way we share, analyze, and reuse vast amounts of data collected across many laboratories. Here, we describe how open access web-based tools are changing the landscape and culture of neuroscience, highlighting six free resources that span subdisciplines from behavior to whole-brain mapping, circuits, neurons, and gene variants.
Affiliation(s)
- Kristin R Anderson
- Departments of Pediatrics and Psychiatry, Columbia University, New York, New York 10032
- Division of Developmental Psychobiology, New York State Psychiatric Institute, New York, New York 10032
- The Sackler Institute for Developmental Psychobiology, Columbia University, New York, New York 10032
- Columbia Population Research Center, Columbia University, New York, New York 10027
- Zuckerman Institute, Columbia University, New York, New York 10027
| | - Julie A Harris
- Allen Institute for Brain Science, Seattle, Washington 98109
| | - Lydia Ng
- Allen Institute for Brain Science, Seattle, Washington 98109
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, Tennessee 38163
| | - Sara Memar
- Robarts Research Institute, BrainsCAN, Schulich School of Medicine & Dentistry, Western University, London, Ontario N6A 3K7, Canada
| | - Bengt Ljungquist
- Center for Neural Informatics, Structures, and Plasticity, Krasnow Institute for Advanced Study; and Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia 22030
| | - Daniel Fürth
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724
| | - Robert W Williams
- Department of Genetics, Genomics and Informatics, Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, Tennessee 38163
| | - Giorgio A Ascoli
- Center for Neural Informatics, Structures, and Plasticity, Krasnow Institute for Advanced Study; and Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia 22030
| | - Dani Dumitriu
- Departments of Pediatrics and Psychiatry, Columbia University, New York, New York 10032
- Division of Developmental Psychobiology, New York State Psychiatric Institute, New York, New York 10032
- The Sackler Institute for Developmental Psychobiology, Columbia University, New York, New York 10032
- Columbia Population Research Center, Columbia University, New York, New York 10027
- Zuckerman Institute, Columbia University, New York, New York 10027
| |
|
22
|
Wang E, Song X, Burke A, Boca S, Prins P, He A, Unger K. DNA Damage Response Protein Mutations Associated with Response to Radiotherapy in Gastrointestinal Malignancies. Int J Radiat Oncol Biol Phys 2020. [DOI: 10.1016/j.ijrobp.2020.07.1774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
23
|
Mott R, Fischer C, Prins P, Davies RW. Private Genomes and Public SNPs: Homomorphic Encryption of Genotypes and Phenotypes for Shared Quantitative Genetics. Genetics 2020; 215:359-372. [PMID: 32327562 PMCID: PMC7268998 DOI: 10.1534/genetics.120.303153] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/27/2020] [Accepted: 04/13/2020] [Indexed: 11/18/2022] Open
Abstract
Sharing human genotype and phenotype data is essential to discover otherwise inaccessible genetic associations, but is a challenge because of privacy concerns. Here, we present a method of homomorphic encryption that obscures individuals' genotypes and phenotypes, and is suited to quantitative genetic association analysis. Encrypted ciphertext and unencrypted plaintext are analytically interchangeable. The encryption uses a high-dimensional random linear orthogonal transformation key that leaves the likelihood of quantitative trait data unchanged under a linear model with normally distributed errors. It also preserves linkage disequilibrium between genetic variants and associations between variants and phenotypes. It scrambles relationships between individuals: encrypted genotype dosages closely resemble Gaussian deviates, and can be replaced by quantiles from a Gaussian with negligible effects on accuracy. Likelihood-based inferences are unaffected by orthogonal encryption. These include linear mixed models to control for unequal relatedness between individuals, heritability estimation, and including covariates when testing association. Orthogonal transformations can be applied in a modular fashion for multiparty federated mega-analyses where the parties first agree to share a common set of genotype sites and covariates prior to encryption. Each then privately encrypts and shares their own ciphertext, and analyses all parties' ciphertexts. In the absence of private variants, or knowledge of the key, we show that it is infeasible to decrypt ciphertext using existing brute-force or noise-reduction attacks. We present the method as a challenge to the community to determine its security.
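The claim that an orthogonal key leaves the linear-model likelihood unchanged can be checked numerically with a small Python sketch. This is an illustration only, not the authors' method: it uses a single Householder reflection as a stand-in for the high-dimensional random orthogonal key, a no-intercept model with one marker, and made-up dosage and trait values. The names `reflect` and `fit` are hypothetical. Because an orthogonal map preserves dot products, the least-squares slope and the residual sum of squares (and hence the Gaussian likelihood) should match before and after encryption.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def reflect(key, vec):
    """Apply the Householder reflection Q = I - 2 k k^T / (k^T k) to vec.

    Q is orthogonal, so it preserves lengths and dot products. It stands
    in here for a random orthogonal encryption key (an assumption).
    """
    scale = 2.0 * dot(key, vec) / dot(key, key)
    return [v - scale * k for v, k in zip(vec, key)]


def fit(x, y):
    """Least-squares slope and residual sum of squares for y = beta * x + error."""
    beta = dot(x, y) / dot(x, x)
    rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    return beta, rss


# Hypothetical data: genotype dosages and a quantitative trait for 8 individuals.
dosage = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 1.0, 1.0]
trait = [0.1, 1.3, 1.9, 0.8, -0.2, 2.4, 1.1, 0.7]

# The "key" is a random vector; the same key encrypts genotypes and phenotypes.
rng = random.Random(0)
key = [rng.gauss(0.0, 1.0) for _ in dosage]
enc_dosage = reflect(key, dosage)
enc_trait = reflect(key, trait)

beta, rss = fit(dosage, trait)
enc_beta, enc_rss = fit(enc_dosage, enc_trait)
print(beta, enc_beta)  # equal up to floating-point rounding
print(rss, enc_rss)    # equal up to floating-point rounding
```

In a model with an intercept or covariates, those columns would be transformed with the same key, which is consistent with the abstract's requirement that parties agree on a common set of sites and covariates before encryption.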
Affiliation(s)
- Richard Mott
- Genetics Institute, University College London, WC1E 6BT, UK
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38103
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38103
| | | |
|
24
|
Affiliation(s)
- Leyla Garcia
- ZB MED Information Centre for Life Sciences, Cologne, Germany
| | - Erick Antezana
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Bayer CropScience SA-NV, Diegem, Belgium
| | | | - Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
| | | | - Pjotr Prins
- University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Juan M. Banda
- Georgia State University, Atlanta, Georgia, United States of America
| | | |
|
25
|
Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Res 2020; 9:136. [PMID: 32308977 PMCID: PMC7141167 DOI: 10.12688/f1000research.18236.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/05/2020] [Indexed: 01/08/2023] Open
Abstract
We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.
Affiliation(s)
- Rutger A. Vos
- Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
- Naturalis Biodiversity Center, Leiden, The Netherlands
| | | | - Hiroyuki Mishima
- Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
| | - Shin Kawano
- Database Center for Life Science, Tokyo, Japan
| | | | | | - Yuki Moriya
- Database Center for Life Science, Tokyo, Japan
| | | | | | | | - Hongyan Wu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | | | - Erick Antezana
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Nobuyuki P. Aoki
- Faculty of Science and Engineering, SOKA University, Tokyo, Japan
| | - Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Jerven T. Bolleman
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Lausanne, Switzerland
| | - Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | - Raoul J. P. Bonnal
- Istituto Nazionale Genetica Molecolare, Romeo ed Enrica Invernizzi, Milan, Italy
| | | | - Kees Burger
- Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
| | - Hirokazu Chiba
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Kevin B. Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, USA
- Université Paris-Saclay, LIMSI, CNRS, Paris, France
| | | | | | - Gang Fu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | | | | | | | - Naohisa Goto
- Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
| | - Tudor Groza
- St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, Australia
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
| | - Colin Hercus
- Novocraft Technologies Sdn. Bhd., Selangor, Malaysia
| | - Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Kotone Itaya
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Nick Juty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | | | - Jee-Hyub Kim
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Akira R. Kinjo
- Institute for Protein Research, Osaka University, Osaka, Japan
| | - Masaaki Kotera
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Kouji Kozaki
- The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
| | | | - Tatsuya Kushida
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
| | - Thomas Lütteke
- Institute of Veterinary Physiology and Biochemistry, Justus-Liebig University Giessen, Giessen, Germany
- Gesellschaft für innovative Personalwirtschaftssysteme mbH (GIP GmbH), Offenbach, Germany
| | | | | | - Attayeb Mohsen
- National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
| | - Hiroshi Mori
- Center for Information Biology, National Institute of Genetics, Mishima, Japan
| | - Yuki Naito
- Database Center for Life Science, Tokyo, Japan
| | | | | | | | - Naoki Nishida
- Department of Systems Science, Osaka University, Osaka, Japan
| | - Hiroyo Nishide
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Soichi Ogishima
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | - Tazro Ohta
- Database Center for Life Science, Tokyo, Japan
| | - Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, USA
| | | | - Philip Prathipati
- National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
| | - Pjotr Prins
- University Medical Center Utrecht, Utrecht, The Netherlands
- University of Tennessee Health Science Center, Memphis, USA
| | - Núria Queralt-Rosinach
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | | | - Shinya Suzuki
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Tsuyosi Tabata
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
| | | | - Kieron Taylor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Mark Thompson
- Leiden University Medical Center, Leiden, The Netherlands
| | - Ikuo Uchiyama
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Bruno Vieira
- WurmLab, School of Biological & Chemical Sciences, Queen Mary University of London, London, UK
| | - Chih-Hsuan Wei
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | - Mark Wilkinson
- Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
| | | | | | - Kazutoshi Yoshitake
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
| | | | - Michel Dumontier
- Institute of Data Science, Maastricht University, Maastricht, The Netherlands
| | - Kenjiro Kosaki
- Center for Medical Genetics, Keio University School of Medicine, Tokyo, Japan
| | - Toshihisa Takagi
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| |
|
26
|
Moelans CB, de Ligt J, van der Groep P, Prins P, Besselink NJM, Hoogstraat M, Ter Hoeve ND, Lacle MM, Kornegoor R, van der Pol CC, de Leng WWJ, Barbé E, van der Vegt B, Martens J, Bult P, Smit VTHBM, Koudijs MJ, Nijman IJ, Voest EE, Selenica P, Weigelt B, Reis-Filho JS, van der Wall E, Cuppen E, van Diest PJ. The molecular genetic make-up of male breast cancer. Endocr Relat Cancer 2019; 26:779-794. [PMID: 31340200 PMCID: PMC6938562 DOI: 10.1530/erc-19-0278] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/27/2019] [Accepted: 07/23/2019] [Indexed: 12/17/2022]
Abstract
Male breast cancer (MBC) is extremely rare and accounts for less than 1% of all breast malignancies. Therefore, clinical management of MBC is currently guided by research on the disease in females. In this study, DNA obtained from 45 formalin-fixed paraffin-embedded (FFPE) MBCs with and 90 MBCs (52 FFPE and 38 fresh-frozen) without matched normal tissues was subjected to massively parallel sequencing targeting all exons of 1943 cancer-related genes. The landscape of mutations and copy number alterations was compared to that of publicly available estrogen receptor (ER)-positive female breast cancers (smFBCs) and correlated to prognosis. From the 135 MBCs, 90% showed ductal histology, 96% were ER-positive, 66% were progesterone receptor (PR)-positive, and 2% HER2-positive, resulting in 50, 46 and 4% luminal A-like, luminal B-like and basal-like cases, respectively. Five patients had Klinefelter syndrome (4%) and 11% of patients harbored pathogenic BRCA2 germline mutations. The genomic landscape of MBC to some extent recapitulated that of smFBC, with recurrent PIK3CA (36%) and GATA3 (15%) somatic mutations, and with 40% of the most frequently amplified genes overlapping between both sexes. TP53 (3%) somatic mutations were significantly less frequent in MBC compared to smFBC, whereas somatic mutations in genes regulating chromatin function and homologous recombination deficiency-related signatures were more prevalent. MDM2 amplifications were frequent (13%), correlated with protein overexpression (P = 0.001) and predicted poor outcome (P = 0.007). In conclusion, despite similarities in the genomic landscape between MBC and smFBC, MBC is a molecularly unique and heterogeneous disease requiring its own clinical trials and treatment guidelines.
Affiliation(s)
- Cathy B Moelans
- Department of Pathology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Joep de Ligt
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Petra van der Groep
- Department of Pathology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Department of Internal Medicine, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Pjotr Prins
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Nicolle J M Besselink
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Center for Personalized Cancer Treatment, Rotterdam, The Netherlands
| | - Marlous Hoogstraat
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Natalie D Ter Hoeve
- Department of Pathology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Miangela M Lacle
- Department of Pathology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Robert Kornegoor
- Department of Pathology, Gelre Ziekenhuizen, Apeldoorn, The Netherlands
| | - Carmen C van der Pol
- Cancer Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Wendy W J de Leng
- Department of Pathology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Ellis Barbé
- Department of Pathology, VU University Medical Center, Amsterdam, The Netherlands
| | - Bert van der Vegt
- Department of Pathology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
| | - John Martens
- Department of Medical Oncology, Daniel den Hoed Cancer Center, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Peter Bult
- Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
| | | | - Marco J Koudijs
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Center for Personalized Cancer Treatment, Rotterdam, The Netherlands
| | - Isaac J Nijman
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Center for Personalized Cancer Treatment, Rotterdam, The Netherlands
| | - Emile E Voest
- Center for Personalized Cancer Treatment, Rotterdam, The Netherlands
- Department of Medical Oncology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Pier Selenica
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
| | - Britta Weigelt
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
| | - Jorge S Reis-Filho
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
| | - Elsken van der Wall
- Cancer Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Edwin Cuppen
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Cancer Genomics.nl, Center for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands
| | - Paul J van Diest
- Department of Pathology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
|
27
|
Burke A, Prins P, Khan A, Hwang A, Marshall J, Unger K. Comprehensive Genetic Profiling and Clinical Outcomes in Gastrointestinal Cancers Treated with Radiotherapy. Int J Radiat Oncol Biol Phys 2019. [DOI: 10.1016/j.ijrobp.2019.06.1998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
28
|
Strozzi F, Janssen R, Wurmus R, Crusoe MR, Githinji G, Di Tommaso P, Belhachemi D, Möller S, Smant G, de Ligt J, Prins P. Scalable Workflows and Reproducible Data Analysis for Genomics. Methods Mol Biol 2019; 1910:723-745. [PMID: 31278683 PMCID: PMC7613310 DOI: 10.1007/978-1-4939-9074-0_24] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 01/13/2023]
Abstract
Biological, clinical, and pharmacological research now often involves analyses of genomes, transcriptomes, proteomes, and interactomes, within and between individuals and across species. Due to large volumes, the analysis and integration of data generated by such high-throughput technologies have become computationally intensive, and analysis can no longer happen on a typical desktop computer. In this chapter we show how to describe and execute the same analysis using a number of workflow systems and how these follow different approaches to tackle execution and reproducibility issues. We show how any researcher can create a reusable and reproducible bioinformatics pipeline that can be deployed and run anywhere. We show how to create a scalable, reusable, and shareable workflow using four different workflow engines: the Common Workflow Language (CWL), Guix Workflow Language (GWL), Snakemake, and Nextflow, each of which can be run in parallel. We show how to bundle a number of tools used in evolutionary biology by using Debian, GNU Guix, and Bioconda software distributions, along with the use of container systems, such as Docker, GNU Guix, and Singularity. Together these distributions represent the overall majority of software packages relevant for biology, including PAML, Muscle, MAFFT, MrBayes, and BLAST. By bundling software in lightweight containers, they can be deployed on a desktop, in the cloud, and, increasingly, on compute clusters. By bundling software through these public software distributions, and by creating reproducible and shareable pipelines using these workflow engines, not only do bioinformaticians have to spend less time reinventing the wheel but we also get closer to the ideal of making science reproducible. The examples in this chapter allow a quick comparison of different solutions.
|
29
|
Mulligan MK, Zhao W, Dickerson M, Arends D, Prins P, Cavigelli SA, Terenina E, Mormede P, Lu L, Jones BC. Genetic Contribution to Initial and Progressive Alcohol Intake Among Recombinant Inbred Strains of Mice. Front Genet 2018; 9:370. [PMID: 30319684 PMCID: PMC6167410 DOI: 10.3389/fgene.2018.00370] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/25/2018] [Accepted: 08/22/2018] [Indexed: 01/12/2023]
Abstract
We profiled individual differences in alcohol consumption upon initial exposure and during 5 weeks of voluntary alcohol intake in female mice from 39 BXD recombinant inbred strains and parents using the drinking in the dark (DID) method. In this paradigm, a single bottle of 20% (v/v) alcohol was presented as the sole liquid source for 2 or 4 h starting 3 h into the dark cycle. For 3 consecutive days mice had access to alcohol for 2 h followed by a 4th day of 4 h access and 3 intervening days where alcohol was not offered. We followed this regime for 5 weeks. For most strains, 2 or 4 h alcohol intake increased over the 5-week period, with some strains demonstrating greatly increased intake. There was considerable and heritable genetic variation in alcohol consumption upon initial early and sustained weekly exposure. Two different mapping algorithms were used to identify QTLs associated with alcohol intake and only QTLs detected by both methods were considered further. Multiple suggestive QTLs for alcohol intake on chromosomes (Chrs) 2, 6, and 12 were identified for the first 4 h exposure. Suggestive QTLs for sustained intake during later weeks were identified on Chrs 4 and 8. Thirty high priority candidate genes, including Entpd2, Per3, and Fto were nominated for early and sustained alcohol intake QTLs. In addition, a suggestive QTL on Chr 15 was detected for change in 2 h alcohol intake over the duration of the study and Adcy8 was identified as a strong candidate gene. Bioinformatic analyses revealed that early and sustained alcohol intake is likely driven by genes and pathways involved in signaling, and/or immune and metabolic function, while a combination of epigenetic factors related to alcohol experience and genetic factors likely drives progressive alcohol intake.
Affiliation(s)
- Megan K Mulligan
- Department of Genetics, Genomics, and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Wenyuan Zhao
- Department of Genetics, Genomics, and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Morgan Dickerson
- Department of Genetics, Genomics, and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Danny Arends
- Albrecht Daniel Thaer-Institut für Agrar- und Gartenbauwissenschaften, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Pjotr Prins
- Biomedical Genetics, University Medical Center Utrecht, Utrecht, Netherlands
| | - Sonia A Cavigelli
- Department of BioBehavioral Health, The Pennsylvania State University, University Park, PA, United States
| | - Elena Terenina
- GenPhySE, INRA, ENVT, Université de Toulouse, Castanet-Tolosan, France
| | - Pierre Mormede
- GenPhySE, INRA, ENVT, Université de Toulouse, Castanet-Tolosan, France
| | - Lu Lu
- Department of Genetics, Genomics, and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
| | - Byron C Jones
- Department of Genetics, Genomics, and Informatics, The University of Tennessee Health Science Center, Memphis, TN, United States
|
30
|
Said Mohammed K, Kibinge N, Prins P, Agoti CN, Cotten M, Nokes D, Brand S, Githinji G. Evaluating the performance of tools used to call minority variants from whole genome short-read data. Wellcome Open Res 2018; 3:21. [PMID: 30483597 PMCID: PMC6234735 DOI: 10.12688/wellcomeopenres.13538.2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 09/03/2018] [Indexed: 01/06/2023]
Abstract
Background: High-throughput whole genome sequencing facilitates investigation of minority virus sub-populations from virus positive samples. Minority variants are useful in understanding within and between host diversity, population dynamics and can potentially assist in elucidating person-person transmission pathways. Several minority variant callers have been developed to describe low frequency sub-populations from whole genome sequence data. These callers differ based on bioinformatics and statistical methods used to discriminate sequencing errors from low-frequency variants. Methods: We evaluated the diagnostic performance and concordance between published minority variant callers used in identifying minority variants from whole-genome sequence data from virus samples. We used the ART-Illumina read simulation tool to generate three artificial short-read datasets of varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, Vardict, and VarScan2. The variant callers' agreement in identifying known variants was quantified using two measures; concordance accuracy and the inter-caller concordance. Results: The variant callers reported differences in identifying minority variants from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of variants although it was characterised by variable sensitivity and precision in addition to a high false positive rate relative to the other minority variant callers and which varied with sample coverage. LoFreq was the most conservative caller. Conclusions: We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low frequency variants. 
Inconsistency in the quality of sequenced samples impacts on sensitivity and accuracy of minority variant callers. Our study suggests that combining at least three tools when identifying minority variants is useful in filtering errors when calling low frequency variants.
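The recommendation to combine at least three tools can be illustrated with a simple consensus filter. The following is a minimal sketch, not the authors' pipeline: the function name `consensus_variants`, the `min_callers` parameter, the `(position, ref, alt)` tuple representation, and the example calls are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def consensus_variants(calls_by_caller, min_callers=3):
    """Return the variants reported by at least `min_callers` callers.

    `calls_by_caller` maps a caller name to an iterable of hashable
    variant keys, here (position, ref, alt) tuples (an assumed format).
    """
    counts = Counter()
    for variants in calls_by_caller.values():
        # set() ensures each caller contributes at most one vote per variant
        counts.update(set(variants))
    return {variant for variant, n in counts.items() if n >= min_callers}


# Made-up example calls; these are not results from the study.
calls = {
    "FreeBayes": {(101, "A", "G"), (250, "C", "T"), (400, "G", "A")},
    "LoFreq": {(101, "A", "G")},
    "VarDict": {(101, "A", "G"), (250, "C", "T")},
    "VarScan2": {(101, "A", "G"), (250, "C", "T")},
}

kept = consensus_variants(calls)
```

With these made-up inputs, the variant at position 400 is reported by a single caller and is filtered out, while the two variants supported by three or more callers are retained. Raising `min_callers` trades sensitivity for precision.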
Affiliation(s)
- Khadija Said Mohammed
- Pwani University, Kilifi, Kenya
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
| | - Nelson Kibinge
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
| | - Pjotr Prins
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
- University Medical Center Utrecht, Utrecht, The Netherlands
| | - Charles N. Agoti
- Pwani University, Kilifi, Kenya
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
| | - Matthew Cotten
- Virosciences Department, Erasmus Medical Centre, Rotterdam, The Netherlands
| | - D.J. Nokes
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
- School of Life Sciences and Zeeman Institute (SBIDER), University of Warwick, Coventry, UK
| | - Samuel Brand
- School of Life Sciences and Zeeman Institute (SBIDER), University of Warwick, Coventry, UK
| | - George Githinji
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
|
31
|
Said Mohammed K, Kibinge N, Prins P, Agoti CN, Cotten M, Nokes D, Brand S, Githinji G. Evaluating the performance of tools used to call minority variants from whole genome short-read data. Wellcome Open Res 2018; 3:21. [PMID: 30483597 PMCID: PMC6234735 DOI: 10.12688/wellcomeopenres.13538.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 02/06/2018] [Indexed: 01/11/2023]
Abstract
Background: High-throughput whole genome sequencing facilitates investigation of minority sub-populations from virus positive samples. Minority variants are useful in understanding within and between host diversity, population dynamics and can potentially help to elucidate person-person transmission chains. Several minority variant callers have been developed to describe the minority variant sub-populations from whole genome sequence data. However, they differ in the bioinformatics and statistical approaches used to discriminate sequencing errors from low-frequency variants. Methods: We evaluated the diagnostic performance and concordance between published minority variant callers used in identifying minority variants from whole-genome sequence data. The ART-Illumina read simulation tool was used to generate three artificial short-read datasets of varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, Vardict, and VarScan2. The variant callers' agreement in identifying known variants was quantified using two measures; concordance accuracy and the inter-caller concordance. Results: The variant callers reported differences in identifying minority variants from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of the variants although it was characterised by variable sensitivity and precision in addition to a high false positive rate relative to the other minority variant callers and which varied with sample coverage. LoFreq was the most conservative caller. Conclusions: We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low frequency variants. Inconsistency in the quality of sequenced samples impacts on sensitivity and accuracy of minority variant callers.
Our study suggests that combining at least three tools when identifying minority variants is useful in filtering errors when calling low frequency variants.
Affiliation(s)
- Khadija Said Mohammed
- Pwani University, Kilifi, Kenya
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
| | - Nelson Kibinge
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
| | - Pjotr Prins
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
- University Medical Center Utrecht, Utrecht, The Netherlands
| | - Charles N. Agoti
- Pwani University, Kilifi, Kenya
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
| | - Matthew Cotten
- Virosciences Department, Erasmus Medical Centre, Rotterdam, The Netherlands
| | - D.J. Nokes
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
- School of Life Sciences and Zeeman Institute (SBIDER), University of Warwick, Coventry, UK
| | - Samuel Brand
- School of Life Sciences and Zeeman Institute (SBIDER), University of Warwick, Coventry, UK
| | - George Githinji
- KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya
|
32
|
Smith AM, Niemeyer KE, Katz DS, Barba LA, Githinji G, Gymrek M, Huff KD, Madan CR, Cabunoc Mayes A, Moerman KM, Prins P, Ram K, Rokem A, Teal TK, Valls Guimera R, Vanderplas JT. Journal of Open Source Software (JOSS): design and first-year review. PeerJ Comput Sci 2018; 4:e147. [PMID: 32704456 PMCID: PMC7340488 DOI: 10.7717/peerj-cs.147] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/06/2017] [Accepted: 01/24/2018] [Indexed: 06/01/2023]
Abstract
This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a digital object identifier (DOI), deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services. Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative (OSI).
Affiliation(s)
- Arfon M. Smith
- Data Science Mission Office, Space Telescope Science Institute, Baltimore, MD, United States of America
| | - Kyle E. Niemeyer
- School of Mechanical, Industrial, and Manufacturing Engineering, Oregon State University, Corvallis, OR, United States of America
| | - Daniel S. Katz
- National Center for Supercomputing Applications & Department of Computer Science & Department of Electrical and Computer Engineering & School of Information Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
| | - Lorena A. Barba
- Department of Mechanical & Aerospace Engineering, The George Washington University, Washington, D.C., United States of America
| | | | - Melissa Gymrek
- Departments of Medicine & Computer Science and Engineering, University of California, San Diego, La Jolla, CA, United States of America
| | - Kathryn D. Huff
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
| | | | | | - Kevin M. Moerman
- MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Trinity Centre for Bioengineering, Trinity College, The University of Dublin, Dublin, Ireland
| | - Pjotr Prins
- University of Tennessee Health Science Center, Memphis, TN, United States of America
- University Medical Centre Utrecht, Utrecht, The Netherlands
| | - Karthik Ram
- Berkeley Institute for Data Science, University of California, Berkeley, CA, United States of America
| | - Ariel Rokem
- eScience Institute, University of Washington, Seattle, WA, United States of America
| | | | - Roman Valls Guimera
- University of Melbourne Centre for Cancer Research, University of Melbourne, Melbourne, Australia
| | - Jacob T. Vanderplas
- eScience Institute, University of Washington, Seattle, WA, United States of America
|
33
|
Li H, Wang X, Rukina D, Huang Q, Lin T, Sorrentino V, Zhang H, Bou Sleiman M, Arends D, McDaid A, Luan P, Ziari N, Velázquez-Villegas LA, Gariani K, Kutalik Z, Schoonjans K, Radcliffe RA, Prins P, Morgenthaler S, Williams RW, Auwerx J. An Integrated Systems Genetics and Omics Toolkit to Probe Gene Function. Cell Syst 2017; 6:90-102.e4. [PMID: 29199021 DOI: 10.1016/j.cels.2017.10.016] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/05/2017] [Revised: 08/31/2017] [Accepted: 10/25/2017] [Indexed: 01/20/2023]
Abstract
Identifying genetic and environmental factors that impact complex traits and common diseases is a high biomedical priority. Here, we developed, validated, and implemented a series of multi-layered systems approaches, including (expression-based) phenome-wide association, transcriptome-/proteome-wide association, and (reverse-) mediation analysis, in an open-access web server (systems-genetics.org) to expedite the systems dissection of gene function. We applied these approaches to multi-omics datasets from the BXD mouse genetic reference population, and identified and validated associations between genes and clinical and molecular phenotypes, including previously unreported links between Rpl26 and body weight, and Cpt1a and lipid metabolism. Furthermore, through mediation and reverse-mediation analysis we established regulatory relations between genes, such as the co-regulation of BCKDHA and BCKDHB protein levels, and identified targets of transcription factors E2F6, ZFP277, and ZKSCAN1. Our multifaceted toolkit enabled the identification of gene-gene and gene-phenotype links that are robust and that translate well across populations and species, and can be universally applied to any populations with multi-omics datasets.
Affiliation(s)
- Hao Li
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Xu Wang
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Daria Rukina
- Institute of Mathematics, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Qingyao Huang
- Laboratory of Metabolic Signaling, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Tao Lin
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Vincenzo Sorrentino
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Hongbo Zhang
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Maroun Bou Sleiman
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Danny Arends
- Albrecht Daniel Thaer-Institut für Agrar- und Gartenbauwissenschaften, Humboldt-Universität zu Berlin, D-10115 Berlin, Germany
| | - Aaron McDaid
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne 1010, Switzerland
| | - Peiling Luan
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Naveed Ziari
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Laura A Velázquez-Villegas
- Laboratory of Metabolic Signaling, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Karim Gariani
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Zoltan Kutalik
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne 1010, Switzerland
| | - Kristina Schoonjans
- Laboratory of Metabolic Signaling, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Richard A Radcliffe
- Department of Pharmaceutical Sciences, University of Colorado, Aurora, CO 80045, USA
| | - Pjotr Prins
- University Medical Center Utrecht, 3584CT Utrecht, the Netherlands; Department of Genetics, Genomics and Informatics, University of Tennessee, Memphis, TN 38163, USA
| | - Stephan Morgenthaler
- Institute of Mathematics, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee, Memphis, TN 38163, USA
| | - Johan Auwerx
- Laboratory for Integrative and Systems Physiology, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland.
|
34
|
Nijveen H, Ligterink W, Keurentjes JJB, Loudet O, Long J, Sterken MG, Prins P, Hilhorst HW, de Ridder D, Kammenga JE, Snoek BL. AraQTL - workbench and archive for systems genetics in Arabidopsis thaliana. Plant J 2017; 89:1225-1235. [PMID: 27995664 DOI: 10.1111/tpj.13457] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/07/2016] [Revised: 11/24/2016] [Accepted: 12/06/2016] [Indexed: 06/06/2023]
Abstract
Genetical genomics studies uncover genome-wide genetic interactions between genes and their transcriptional regulators. High-throughput measurement of gene expression in recombinant inbred line populations has enabled investigation of the genetic architecture of variation in gene expression. This has the potential to enrich our understanding of the molecular mechanisms affected by and underlying natural variation. Moreover, it contributes to the systems biology of natural variation, as a substantial number of experiments have resulted in a valuable amount of interconnectable phenotypic, molecular and genotypic data. A number of genetical genomics studies have been published for Arabidopsis thaliana, uncovering many expression quantitative trait loci (eQTLs). However, these complex data are not easily accessible to the plant research community, leaving most of the valuable genetic interactions unexplored as cross-analysis of these studies is a major effort. We address this problem with AraQTL (http://www.bioinformatics.nl/AraQTL/), an easily accessible workbench and database for comparative analysis and meta-analysis of all published Arabidopsis eQTL datasets. AraQTL provides a workbench for comparing, re-using and extending upon the results of these experiments. For example, one can easily screen a physical region for specific local eQTLs that could harbour candidate genes for phenotypic QTLs, or detect gene-by-environment interactions by comparing eQTLs under different conditions.
Affiliation(s)
- Harm Nijveen
  - Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
- Wilco Ligterink
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
- Joost J B Keurentjes
  - Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
- Olivier Loudet
  - Institut Jean-Pierre Bourgin, INRA, AgroParisTech, CNRS, Université Paris-Saclay, Versailles, 78000, France
- Jiao Long
  - Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
  - Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
- Mark G Sterken
  - Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
- Pjotr Prins
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Henk W Hilhorst
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
- Jan E Kammenga
  - Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
- Basten L Snoek
  - Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, Wageningen, NL-6708 PB, The Netherlands
|
35
|
Blokzijl F, de Ligt J, Jager M, Sasselli V, Roerink S, Sasaki N, Huch M, Boymans S, Kuijk E, Prins P, Nijman IJ, Martincorena I, Mokry M, Wiegerinck CL, Middendorp S, Sato T, Schwank G, Nieuwenhuis EES, Verstegen MMA, van der Laan LJW, de Jonge J, IJzermans JNM, Vries RG, van de Wetering M, Stratton MR, Clevers H, Cuppen E, van Boxtel R. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 2016; 538:260-264. [PMID: 27698416 PMCID: PMC5536223 DOI: 10.1038/nature19768] [Citation(s) in RCA: 605] [Impact Index Per Article: 75.6] [Received: 03/07/2016] [Accepted: 08/16/2016] [Indexed: 12/20/2022]
Abstract
The gradual accumulation of genetic mutations in human adult stem cells (ASCs) during life is associated with various age-related diseases, including cancer. Extreme variation in cancer risk across tissues was recently proposed to depend on the lifetime number of ASC divisions, owing to unavoidable random mutations that arise during DNA replication. However, the rates and patterns of mutations in normal ASCs remain unknown. Here we determine genome-wide mutation patterns in ASCs of the small intestine, colon and liver of human donors with ages ranging from 3 to 87 years by sequencing clonal organoid cultures derived from primary multipotent cells. Our results show that mutations accumulate steadily over time in all of the assessed tissue types, at a rate of approximately 40 novel mutations per year, despite the large variation in cancer incidence among these tissues. Liver ASCs, however, have different mutation spectra compared to those of the colon and small intestine. Mutational signature analysis reveals that this difference can be attributed to spontaneous deamination of methylated cytosine residues in the colon and small intestine, probably reflecting their high ASC division rate. In liver, a signature with an as-yet-unknown underlying mechanism is predominant. Mutation spectra of driver genes in cancer show high similarity to the tissue-specific ASC mutation spectra, suggesting that intrinsic mutational processes in ASCs can initiate tumorigenesis. Notably, the inter-individual variation in mutation rate and spectra is low, suggesting tissue-specific activity of common mutational processes throughout life.
Affiliation(s)
- Francis Blokzijl
  - Center for Molecular Medicine, Cancer Genomics Netherlands, Department of Genetics, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Joep de Ligt
  - Center for Molecular Medicine, Cancer Genomics Netherlands, Department of Genetics, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Myrthe Jager
  - Center for Molecular Medicine, Cancer Genomics Netherlands, Department of Genetics, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Valentina Sasselli
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Sophie Roerink
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Nobuo Sasaki
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Meritxell Huch
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Sander Boymans
  - Center for Molecular Medicine, Cancer Genomics Netherlands, Department of Genetics, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Ewart Kuijk
  - Center for Molecular Medicine, Cancer Genomics Netherlands, Department of Genetics, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Pjotr Prins
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Isaac J Nijman
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Inigo Martincorena
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Michal Mokry
  - Department of Pediatrics, University Medical Center Utrecht, Lundlaan 6, 3584 EA Utrecht, The Netherlands
- Caroline L Wiegerinck
  - Department of Pediatrics, University Medical Center Utrecht, Lundlaan 6, 3584 EA Utrecht, The Netherlands
- Sabine Middendorp
  - Department of Pediatrics, University Medical Center Utrecht, Lundlaan 6, 3584 EA Utrecht, The Netherlands
- Toshiro Sato
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Gerald Schwank
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Edward E S Nieuwenhuis
  - Department of Pediatrics, University Medical Center Utrecht, Lundlaan 6, 3584 EA Utrecht, The Netherlands
- Monique M A Verstegen
  - Department of Surgery, Erasmus MC-University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
- Luc J W van der Laan
  - Department of Surgery, Erasmus MC-University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
- Jeroen de Jonge
  - Department of Surgery, Erasmus MC-University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
- Jan N M IJzermans
  - Department of Surgery, Erasmus MC-University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
- Robert G Vries
  - Foundation Hubrecht Organoid Technology (HUB), Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Marc van de Wetering
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Michael R Stratton
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Hans Clevers
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Edwin Cuppen
  - Center for Molecular Medicine, Cancer Genomics Netherlands, Department of Genetics, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
- Ruben van Boxtel
  - Center for Molecular Medicine, Cancer Genomics Netherlands, Department of Genetics, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
  - Hubrecht Institute for Developmental Biology and Stem Cell Research, KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
|
36
|
Al-Hajeili M, Serzan M, Prins P, Marshall J. P-056 Outcome of maintenance therapy in patients who achieved NED after liver resection for mCRC. Ann Oncol 2016. [DOI: 10.1093/annonc/mdw199.54] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
|
37
|
Hassan MA, Jensen KD, Butty V, Hu K, Boedec E, Prins P, Saeij JPJ. Transcriptional and Linkage Analyses Identify Loci that Mediate the Differential Macrophage Response to Inflammatory Stimuli and Infection. PLoS Genet 2015; 11:e1005619. [PMID: 26510153 PMCID: PMC4625001 DOI: 10.1371/journal.pgen.1005619] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 03/17/2015] [Accepted: 09/29/2015] [Indexed: 12/18/2022]
Abstract
Macrophages display flexible activation states that range between pro-inflammatory (classical activation) and anti-inflammatory (alternative activation). These macrophage polarization states contribute to a variety of organismal phenotypes such as tissue remodeling and susceptibility to infectious and inflammatory diseases. Several macrophage- or immune-related genes have been shown to modulate infectious and inflammatory disease pathogenesis. However, the potential role that differences in macrophage activation phenotypes play in modulating differences in susceptibility to infectious and inflammatory disease is just emerging. We integrated transcriptional profiling and linkage analyses to determine the genetic basis for the differential murine macrophage response to inflammatory stimuli and to infection with the obligate intracellular parasite Toxoplasma gondii. We show that specific transcriptional programs, defined by distinct genomic loci, modulate macrophage activation phenotypes. In addition, we show that the difference between AJ and C57BL/6J macrophages in controlling Toxoplasma growth after stimulation with interferon gamma and tumor necrosis factor alpha mapped to chromosome 3, proximal to the Guanylate binding protein (Gbp) locus that is known to modulate the murine macrophage response to Toxoplasma. Using an shRNA-knockdown strategy, we show that the transcript levels of an RNA helicase, Ddx1, regulate strain differences in the amount of nitric oxide produced by macrophages after stimulation with interferon gamma and tumor necrosis factor. Our results provide a template for discovering candidate genes that modulate macrophage-mediated complex traits.
Macrophages provide a first line of defense against invading pathogens and play an important role in the initiation and resolution of immune responses. When in contact with pathogens or immune factors, such as cytokines, macrophages assume activation states that range between pro-inflammatory (classical activation) and anti-inflammatory (alternative activation). Even though it is known that macrophages from different individuals are biased towards one of the various activation states, the genetic factors that define individual differences in macrophage activation are not fully understood. Additionally, although macrophages are important in infectious disease pathogenesis, how individual differences in macrophage activation contribute to individual differences in susceptibility to infectious disease is just emerging. We used macrophages from genetically segregating mice to show that discrete transcriptional programs, which are modulated by specific genomic regions, modulate differences in macrophage activation. Differences among murine macrophages in controlling Toxoplasma growth mapped to chromosome 3, proximal to the Guanylate binding protein (Gbp) locus that is known to modulate the murine macrophage response to Toxoplasma. Using an shRNA-mediated knockdown approach, we show that the DEAD box polypeptide 1 (Ddx1) modulates nitric oxide production in macrophages stimulated with interferon gamma and tumor necrosis factor. These findings are a step towards the identification of genes that regulate macrophage phenotypes and disease outcome.
Affiliation(s)
- Musa A. Hassan
  - Wellcome Trust Centre for Molecular Parasitology, University of Glasgow, Glasgow, United Kingdom
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
  - * E-mail: (MAH); (JPJS)
- Kirk D. Jensen
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Vincent Butty
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Kenneth Hu
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Erwan Boedec
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
  - School of Biotechnology, University of Strasbourg, Strasbourg, France
- Pjotr Prins
  - Laboratory of Nematology, Wageningen University, Wageningen, The Netherlands
- Jeroen P. J. Saeij
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
  - Department of Pathology, Microbiology & Immunology, University of California, Davis, Davis, California, United States of America
  - * E-mail: (MAH); (JPJS)
|
38
|
Fuerweger C, Prins P, Coskan H, Heijmen B. SU-C-BRB-04: Characteristics and Performance Evaluation of the First Commercial MLC for a Robotic Delivery System. Med Phys 2015. [DOI: 10.1118/1.4923807] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
|
39
|
Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 2015; 31:2032-4. [PMID: 25697820 PMCID: PMC4765878 DOI: 10.1093/bioinformatics/btv098] [Citation(s) in RCA: 1056] [Impact Index Per Article: 117.3] [Received: 12/13/2014] [Accepted: 02/10/2015] [Indexed: 11/13/2022]
Abstract
Summary: Sambamba is a high-performance robust tool and library for working with SAM, BAM and CRAM sequence alignment files, the most common file formats for aligned next generation sequencing data. Sambamba is a faster alternative to samtools that exploits multi-core processing and dramatically reduces processing time. Sambamba is being adopted at sequencing centers, not only because of its speed, but also because of additional functionality, including coverage analysis and powerful filtering capability. Availability and implementation: Sambamba is free and open source software, available under a GPLv2 license. Sambamba can be downloaded and installed from http://www.open-bio.org/wiki/Sambamba. Sambamba v0.5.0 was released with doi:10.5281/zenodo.13200. Contact: j.c.p.prins@umcutrecht.nl
Affiliation(s)
- Artem Tarasov
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia; Illumina Cambridge, Cambridge, UK; Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands; Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands; and Department of Nematology, Wageningen University, Wageningen, The Netherlands
- Albert J Vilella
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia; Illumina Cambridge, Cambridge, UK; Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands; Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands; and Department of Nematology, Wageningen University, Wageningen, The Netherlands
- Edwin Cuppen
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia; Illumina Cambridge, Cambridge, UK; Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands; Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands; and Department of Nematology, Wageningen University, Wageningen, The Netherlands
- Isaac J Nijman
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia; Illumina Cambridge, Cambridge, UK; Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands; Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands; and Department of Nematology, Wageningen University, Wageningen, The Netherlands
- Pjotr Prins
  - Department of Statistical Simulation, St. Petersburg State University, St. Petersburg, Russia; Illumina Cambridge, Cambridge, UK; Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences, Utrecht, The Netherlands; Department of Medical Genetics, Institute for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands; and Department of Nematology, Wageningen University, Wageningen, The Netherlands
|
40
|
Butler B, Gamble-George J, Prins P, North A, Clarke JT, Khoshbouei H. Chronic Methamphetamine Increases Alpha-Synuclein Protein Levels in the Striatum and Hippocampus but not in the Cortex of Juvenile Mice. J Addict Prev 2015; 2:6. [PMID: 25621291 PMCID: PMC4303106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Methamphetamine is the second most widely used illicit drug worldwide. More than 290 tons of methamphetamine was synthesized in the year 2005 alone, corresponding to approximately 3 billion 100 mg doses of methamphetamine. Drug addicts abuse high concentrations of methamphetamine for months and even years. Current reports in the literature are consistent with the interpretation that methamphetamine-induced neuronal injury may render methamphetamine users more susceptible to neurodegenerative pathologies. Specifically, chronic exposure to psychostimulants is associated with increases in striatal alpha-synuclein expression, a synaptic protein implicated in the pathogenesis of neurodegenerative diseases. This raises the question whether methamphetamine exposure affects alpha-synuclein levels in the brain. In this short report, we examined alpha-synuclein protein and mRNA levels in the striatum, hippocampus and cortex of adolescent male mice following a neurotoxic regimen of methamphetamine (24 mg/kg daily for 14 days). We found that methamphetamine exposure resulted in a decrease in the monomeric form of alpha-synuclein (molecular species <19 kDa), while increasing higher molecular weight alpha-synuclein species (>19 kDa) in the striatum and hippocampus, but not in the cortex. Despite the elevation of high molecular weight alpha-synuclein species (>19 kDa), there was no change in the alpha-synuclein mRNA levels in the striatum, hippocampus and cortex of mice exposed to methamphetamine. The methamphetamine-induced increase in high molecular weight alpha-synuclein protein levels might be one of the causal mechanisms or one of the compensatory consequences of methamphetamine-mediated neurotoxicity.
Affiliation(s)
- B Butler
  - Department of Neuroscience, University of Florida School of Medicine, Gainesville, FL 32611, USA
- J Gamble-George
  - Department of Neuroscience, University of Florida School of Medicine, Gainesville, FL 32611, USA
- P Prins
  - Department of Neuroscience, University of Florida School of Medicine, Gainesville, FL 32611, USA
- A North
  - Department of Neuroscience, University of Florida School of Medicine, Gainesville, FL 32611, USA
- J T Clarke
  - Department of Neuroscience, University of Florida School of Medicine, Gainesville, FL 32611, USA
- H Khoshbouei
  - Department of Neuroscience, University of Florida School of Medicine, Gainesville, FL 32611, USA
|
41
|
Möller S, Afgan E, Banck M, Bonnal RJP, Booth T, Chilton J, Cock PJA, Gumbel M, Harris N, Holland R, Kalaš M, Kaján L, Kibukawa E, Powel DR, Prins P, Quinn J, Sallou O, Strozzi F, Seemann T, Sloggett C, Soiland-Reyes S, Spooner W, Steinbiss S, Tille A, Travis AJ, Guimera R, Katayama T, Chapman BA. Community-driven development for computational biology at Sprints, Hackathons and Codefests. BMC Bioinformatics 2014; 15 Suppl 14:S7. [PMID: 25472764 PMCID: PMC4255748 DOI: 10.1186/1471-2105-15-s14-s7] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 11/23/2022]
Abstract
Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together.
Results: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities.
Conclusions: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.
|
42
|
Katayama T, Wilkinson MD, Aoki-Kinoshita KF, Kawashima S, Yamamoto Y, Yamaguchi A, Okamoto S, Kawano S, Kim JD, Wang Y, Wu H, Kano Y, Ono H, Bono H, Kocbek S, Aerts J, Akune Y, Antezana E, Arakawa K, Aranda B, Baran J, Bolleman J, Bonnal RJ, Buttigieg PL, Campbell MP, Chen YA, Chiba H, Cock PJ, Cohen KB, Constantin A, Duck G, Dumontier M, Fujisawa T, Fujiwara T, Goto N, Hoehndorf R, Igarashi Y, Itaya H, Ito M, Iwasaki W, Kalaš M, Katoda T, Kim T, Kokubu A, Komiyama Y, Kotera M, Laibe C, Lapp H, Lütteke T, Marshall MS, Mori T, Mori H, Morita M, Murakami K, Nakao M, Narimatsu H, Nishide H, Nishimura Y, Nystrom-Persson J, Ogishima S, Okamura Y, Okuda S, Oshita K, Packer NH, Prins P, Ranzinger R, Rocca-Serra P, Sansone S, Sawaki H, Shin SH, Splendiani A, Strozzi F, Tadaka S, Toukach P, Uchiyama I, Umezaki M, Vos R, Whetzel PL, Yamada I, Yamasaki C, Yamashita R, York WS, Zmasek CM, Kawamoto S, Takagi T. BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains. J Biomed Semantics 2014; 5:5. [PMID: 24495517 PMCID: PMC3978116 DOI: 10.1186/2041-1480-5-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 05/16/2013] [Accepted: 11/26/2013] [Indexed: 01/24/2023]
Abstract
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
Affiliation(s)
- Toshiaki Katayama
- Database Center for Life Science, Research Organization of Information and Systems, 2-11-16, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.
|
43
|
Abstract
BACKGROUND: Biological data acquisition is raising new challenges, both in data analysis and handling. Not only is it proving hard to analyze the data at the rate it is generated today, but simply reading and transferring data files can be prohibitively slow due to their size. This primarily concerns logistics within and between data centers, but is also important for workstation users in the analysis phase. Common usage patterns, such as comparing and transferring files, are proving computationally expensive and are tying down shared resources.
RESULTS: We present an efficient method for calculating file uniqueness for large scientific data files that takes less computational effort than existing techniques. This method, called Probabilistic Fast File Fingerprinting (PFFF), exploits the variation present in biological data and computes file fingerprints by sampling randomly from the file instead of reading it in full. Consequently, it has a flat performance characteristic, correlated with data variation rather than file size. We demonstrate that probabilistic fingerprinting can be as reliable as existing hashing techniques, with provably negligible risk of collisions. We measure the performance of the algorithm on a number of data storage and access technologies, identifying its strengths as well as limitations.
CONCLUSIONS: Probabilistic fingerprinting may significantly reduce the use of computational resources when comparing very large files. Utilisation of probabilistic fingerprinting techniques can increase the speed of common file-related workflows, both in the data center and for workbench analysis. The implementation of the algorithm is available as an open-source tool named pfff, as a command-line tool as well as a C library. The tool can be downloaded from http://biit.cs.ut.ee/pfff.
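The sampling idea described in this abstract can be sketched in a few lines. The following is a minimal illustration in Python, not the pfff tool's actual algorithm or output format: the function name `sampled_fingerprint`, the parameters `n_samples`, `block_size` and `seed`, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os
import random
import tempfile


def sampled_fingerprint(path, n_samples=1024, block_size=16, seed=0):
    """Fingerprint a file from its size plus a pseudo-random sample of blocks.

    Hypothetical sketch: the names, defaults and use of SHA-256 are
    assumptions for illustration, not the pfff implementation.
    """
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    # Mix in the file size so that files of different lengths hash different inputs.
    digest.update(size.to_bytes(8, "little"))
    if size == 0:
        return digest.hexdigest()
    # A fixed seed makes the offsets reproducible: two files of equal size
    # are always sampled at the same positions.
    rng = random.Random(seed)
    offsets = sorted(rng.randrange(size) for _ in range(n_samples))
    with open(path, "rb") as handle:
        for offset in offsets:
            handle.seek(offset)
            digest.update(handle.read(block_size))
    return digest.hexdigest()


if __name__ == "__main__":
    # Two files with identical content should yield the same fingerprint.
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "a.bin")
        second = os.path.join(tmp, "b.bin")
        for name in (first, second):
            with open(name, "wb") as out:
                out.write(b"ACGT" * 250_000)
        print(sampled_fingerprint(first) == sampled_fingerprint(second))
```

The number of reads is bounded by `n_samples` regardless of file size, which is the source of the flat performance characteristic the abstract mentions. The trade-off is that two equal-sized files differing only in unsampled bytes receive the same fingerprint, so a sketch like this relies on the data being variable enough for the sample to catch differences.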
Affiliation(s)
- Konstantin Tretyakov
- Institute of Computer Science, University of Tartu, J, Liivi 2, 50409 Tartu, Estonia.
|
44
|
Prins P, Goto N, Yates A, Gautier L, Willis S, Fields C, Katayama T. Sharing programming resources between Bio* projects through remote procedure call and native call stack strategies. Methods Mol Biol 2012; 856:513-27. [PMID: 22399473 DOI: 10.1007/978-1-61779-585-5_21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
Open-source software (OSS) encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, OSS comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the two principal approaches for sharing software between different programming languages: either by remote procedure call (RPC) or by sharing a local call stack. RPC provides a language-independent protocol over a network interface; examples are RSOAP and Rserve. The local call stack provides a between-language mapping not over the network interface, but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java Virtual Machine stack. This functionality provides strategies for sharing of software between Bio* projects, which can be exploited more often. Here, we present cross-language examples for sequence translation, and measure throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces, with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations, and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite. In general, call stack approaches outperform native Bio* implementations and these, in turn, outperform RPC-based approaches. To test and compare strategies, we provide a downloadable BioNode image with all examples, tools, and libraries included. The BioNode image can be run on VirtualBox-supported operating systems, including Windows, OSX, and Linux.
Affiliation(s)
- Pjotr Prins
- Laboratory of Nematology, Wageningen University, Wageningen, The Netherlands.
|
45
|
Bonnal RJP, Aerts J, Githinji G, Goto N, MacLean D, Miller CA, Mishima H, Pagani M, Ramirez-Gonzalez R, Smant G, Strozzi F, Syme R, Vos R, Wennblom TJ, Woodcroft BJ, Katayama T, Prins P. Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics. Bioinformatics 2012; 28:1035-7. [PMID: 22332238 PMCID: PMC3315718 DOI: 10.1093/bioinformatics/bts080] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
SUMMARY Biogem provides a software development environment for the Ruby programming language, which encourages community-based software development for bioinformatics while lowering the barrier to entry and encouraging best practices. Biogem, with its targeted modular and decentralized approach, software generator, tools and tight web integration, is an improved general model for scaling up collaborative open source software development in bioinformatics. AVAILABILITY Biogem and modules are free and are OSS. Biogem runs on all systems that support recent versions of Ruby, including Linux, Mac OS X and Windows. Further information at http://www.biogems.info. A tutorial is available at http://www.biogems.info/howto.html. CONTACT bonnal@ingm.org.
Affiliation(s)
- Raoul J P Bonnal
- Integrative Biology Program, Istituto Nazionale Genetica Molecolare, Milan 20122, Italy.
|
46
|
Arends D, van der Velde KJ, Prins P, Broman KW, Möller S, Jansen RC, Swertz MA. xQTL workbench: a scalable web environment for multi-level QTL analysis. Bioinformatics 2012; 28:1042-4. [PMID: 22308096 PMCID: PMC3315722 DOI: 10.1093/bioinformatics/bts049] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
Abstract
Summary: xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types become available, xQTL workbench is quickly customized using the Molgenis software generator. Availability: xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. Contact: m.a.swertz@rug.nl
Affiliation(s)
- Danny Arends
- Groningen Bioinformatics Centre, University of Groningen, Groningen, The Netherlands
|
47
|
Abstract
Genetical genomics combines acquired high-throughput genomic data with genetic analysis. In this chapter, we discuss the application of genetical genomics for evolutionary studies, where new high-throughput molecular technologies are combined with mapping quantitative trait loci (QTL) on the genome in segregating populations. The recent explosion of high-throughput data (measuring thousands of proteins and metabolites, deep sequencing, chromatin, and methyl-DNA immunoprecipitation) allows the study of the genetic variation underlying quantitative phenotypes, together termed xQTL. At the same time, mining information is not getting easier. To deal with the sheer amount of information, powerful statistical tools are needed to analyze multidimensional relationships. In the context of evolutionary computational biology, a well-designed experiment may help dissect a complex evolutionary trait using proven statistical methods for associating phenotypical variation with genomic locations. Evolutionary expression QTL (eQTL) studies of recent years focus on gene expression adaptations, mapping the gene expression landscape, and, tentatively, eQTL networks. Here, we discuss the possibility of introducing an evolutionary prior, in the form of gene families displaying evidence of positive selection, and using that in the context of an eQTL experiment for elucidating host-pathogen protein-protein interactions. Through the example of an experimental design, we discuss the choice of xQTL platform, analysis methods, and scope of results. The resulting eQTL can be matched, resulting in putative interacting genes and their regulators. In addition, a prior may help distinguish QTL causality from reactivity, or independence of traits, by creating QTL networks.
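As a rough illustration of associating phenotypic variation with genomic locations, the sketch below computes a single-marker regression LOD score. It is one common textbook approach and not necessarily the method used in the chapter; the function names `lod_score` and `scan_markers` and the toy data are assumptions. The LOD compares a model with one mean per genotype class against a single-mean model: LOD = (n/2) log10(RSS_null / RSS_marker).

```python
import math


def lod_score(genotypes, phenotypes):
    """LOD for one marker: per-genotype-mean model versus single-mean model."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    if rss_null == 0.0:
        return 0.0  # no phenotypic variation to explain
    groups = {}
    for genotype, value in zip(genotypes, phenotypes):
        groups.setdefault(genotype, []).append(value)
    rss_marker = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        rss_marker += sum((y - mean) ** 2 for y in values)
    if rss_marker == 0.0:
        return math.inf  # the marker explains the phenotype perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)


def scan_markers(marker_genotypes, phenotypes):
    """Return {marker name: LOD} for every marker in a dict of genotype lists."""
    return {name: lod_score(calls, phenotypes)
            for name, calls in marker_genotypes.items()}
```

In an eQTL setting the same scan would be repeated for each expression trait, so the number of tests grows with markers times traits and the significance threshold has to account for that.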
Affiliation(s)
- Pjotr Prins
- Laboratory of Nematology, Wageningen University, Wageningen, The Netherlands.
|
48
|
Abstract
Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software.
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
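The "poor man's parallelization" mentioned in this abstract, running whole programs in parallel as separate processes, can be sketched on a single machine as follows. This is an illustration only and not BioNode's configuration scripts: the helper names `run_job` and `run_in_parallel` are hypothetical, and a local thread pool stands in for the job scheduler that a cluster would normally provide.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one whole program as a separate OS process and return its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_in_parallel(commands, max_workers=4):
    """Run independent commands concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


# Stand-in jobs: each "program" is a tiny Python one-liner. In practice each
# command would be an unmodified legacy tool invoked on its own input file.
example_commands = [
    [sys.executable, "-c", "print(len('ATGGCC'))"],
    [sys.executable, "-c", "print(len('ATGAAATTT'))"],
]
```

Calling `run_in_parallel(example_commands)` starts both programs at once and collects their outputs. Because each job is an ordinary process, the programs themselves need no parallel design; the approach only works when the jobs are independent of one another.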
Affiliation(s)
- Pjotr Prins
- Laboratory of Nematology, Wageningen University, Wageningen, The Netherlands.
|
49
|
McKeown PC, Laouielle-Duprat S, Prins P, Wolff P, Schmid MW, Donoghue MTA, Fort A, Duszynska D, Comte A, Lao NT, Wennblom TJ, Smant G, Köhler C, Grossniklaus U, Spillane C. Identification of imprinted genes subject to parent-of-origin specific expression in Arabidopsis thaliana seeds. BMC Plant Biol 2011; 11:113. [PMID: 21838868 PMCID: PMC3174879 DOI: 10.1186/1471-2229-11-113] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 01/07/2011] [Accepted: 08/12/2011] [Indexed: 05/02/2023]
Abstract
BACKGROUND Epigenetic regulation of gene dosage by genomic imprinting of some autosomal genes facilitates normal reproductive development in both mammals and flowering plants. While many imprinted genes have been identified and intensively studied in mammals, smaller numbers have been characterized in flowering plants, mostly in Arabidopsis thaliana. Identification of additional imprinted loci in flowering plants by genome-wide screening for parent-of-origin specific uniparental expression in seed tissues will facilitate our understanding of the origins and functions of imprinted genes in flowering plants. RESULTS cDNA-AFLP can detect allele-specific expression that is parent-of-origin dependent for expressed genes in which restriction site polymorphisms exist in the transcripts derived from each allele. Using a genome-wide cDNA-AFLP screen surveying allele-specific expression of 4500 transcript-derived fragments, we report the identification of 52 maternally expressed genes (MEGs) displaying parent-of-origin dependent expression patterns in Arabidopsis siliques containing F1 hybrid seeds (3, 4 and 5 days after pollination). We identified these MEGs by developing a bioinformatics tool (GenFrag) which can directly determine the identities of transcript-derived fragments from (i) their size and (ii) which selective nucleotides were added to the primers used to generate them. Hence, GenFrag facilitates increased throughput for genome-wide cDNA-AFLP fragment analyses. The 52 MEGs we identified were further filtered for high expression levels in the endosperm relative to the seed coat to identify the candidate genes most likely representing novel imprinted genes expressed in the endosperm of Arabidopsis thaliana. Expression in seed tissues of the three top-ranked candidate genes, ATCDC48, PDE120 and MS5-like, was confirmed by Laser-Capture Microdissection and qRT-PCR analysis. 
Maternal-specific expression of these genes in Arabidopsis thaliana F1 seeds was confirmed via allele-specific transcript analysis across a range of different accessions. Differentially methylated regions were identified adjacent to ATCDC48 and PDE120, which may represent candidate imprinting control regions. Finally, we demonstrate that expression levels of these three genes in vegetative tissues are MET1-dependent, while their uniparental maternal expression in the seed is not dependent on MET1. CONCLUSIONS Using a cDNA-AFLP transcriptome profiling approach, we have identified three genes, ATCDC48, PDE120 and MS5-like which represent novel maternally expressed imprinted genes in the Arabidopsis thaliana seed. The extent of overlap between our cDNA-AFLP screen for maternally expressed imprinted genes, and other screens for imprinted and endosperm-expressed genes is discussed.
Affiliation(s)
- Peter C McKeown
- Genetics and Biotechnology Lab, Botany and Plant Science, National University of Ireland Galway (NUIG), C306 Aras de Brun, University Road, Galway, Ireland
- Sylvia Laouielle-Duprat
- Genetics and Biotechnology Lab, Botany and Plant Science, National University of Ireland Galway (NUIG), C306 Aras de Brun, University Road, Galway, Ireland
- Pjotr Prins
- Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, Wageningen, The Netherlands
- Philip Wolff
- Department of Biology and Zürich-Basel Plant Science Center, Swiss Federal Institute of Technology, ETH Centre, CH-8092 Zürich, Switzerland
- Department of Plant Biology and Forest Genetics, Uppsala BioCenter, Swedish University of Agricultural Sciences, SE-75007 Uppsala, Sweden
- Marc W Schmid
- Institute of Plant Biology and Zürich-Basel Plant Science Center, University of Zürich, Zollikerstrasse 107, CH-8008 Zürich, Switzerland
- Mark TA Donoghue
- Genetics and Biotechnology Lab, Botany and Plant Science, National University of Ireland Galway (NUIG), C306 Aras de Brun, University Road, Galway, Ireland
- Antoine Fort
- Genetics and Biotechnology Lab, Botany and Plant Science, National University of Ireland Galway (NUIG), C306 Aras de Brun, University Road, Galway, Ireland
- Dorota Duszynska
- Genetics and Biotechnology Lab, Botany and Plant Science, National University of Ireland Galway (NUIG), C306 Aras de Brun, University Road, Galway, Ireland
- Aurélie Comte
- Genetics and Biotechnology Lab, Botany and Plant Science, National University of Ireland Galway (NUIG), C306 Aras de Brun, University Road, Galway, Ireland
- Nga Thi Lao
- Genetics and Biotechnology Lab, Botany and Plant Science, National University of Ireland Galway (NUIG), C306 Aras de Brun, University Road, Galway, Ireland
- Geert Smant
- Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, Wageningen, The Netherlands
- Claudia Köhler
- Department of Biology and Zürich-Basel Plant Science Center, Swiss Federal Institute of Technology, ETH Centre, CH-8092 Zürich, Switzerland
- Department of Plant Biology and Forest Genetics, Uppsala BioCenter, Swedish University of Agricultural Sciences, SE-75007 Uppsala, Sweden
- Ueli Grossniklaus
- Institute of Plant Biology and Zürich-Basel Plant Science Center, University of Zürich, Zollikerstrasse 107, CH-8008 Zürich, Switzerland
- Charles Spillane
- Genetics and Biotechnology Lab, Botany and Plant Science, National University of Ireland Galway (NUIG), C306 Aras de Brun, University Road, Galway, Ireland
|
50
|
Bakker E, Borm T, Prins P, van der Vossen E, Uenk G, Arens M, de Boer J, van Eck H, Muskens M, Vossen J, van der Linden G, van Ham R, Klein-Lankhorst R, Visser R, Smant G, Bakker J, Goverse A. A genome-wide genetic map of NB-LRR disease resistance loci in potato. Theor Appl Genet 2011; 123:493-508. [PMID: 21590328 PMCID: PMC3135832 DOI: 10.1007/s00122-011-1602-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 09/02/2010] [Accepted: 04/26/2011] [Indexed: 05/14/2023]
Abstract
Like all plants, potato has evolved a surveillance system consisting of a large array of genes encoding immune receptors that confer resistance to pathogens and pests. The majority of these so-called resistance or R proteins belong to the super-family of proteins that harbour a nucleotide binding and a leucine-rich-repeat domain (NB-LRR). Here, sequence information of the conserved NB domain was used to investigate the genome-wide genetic distribution of the NB-LRR resistance gene loci in potato. We analysed the sequences of 288 unique BAC clones selected using filter hybridisation screening of a BAC library of the diploid potato clone RH89-039-16 (S. tuberosum ssp. tuberosum) and a physical map of this BAC library. This resulted in the identification of 738 partial and full-length NB-LRR sequences. Based on homology of these sequences with known resistance genes, 280 and 448 sequences were classified as TIR-NB-LRR (TNL) and CC-NB-LRR (CNL) sequences, respectively. Genetic mapping revealed the presence of 15 TNL and 32 CNL loci. Thirty-six are novel, while three TNL loci and eight CNL loci are syntenic with previously identified functional resistance genes. The genetic map was complemented with 68 universal CAPS markers and 82 disease resistance trait loci described in the literature, providing an excellent template for genetic studies and applied research in potato.
Affiliation(s)
- Erin Bakker
- Laboratory of Nematology, Wageningen University and Research Centre, Droevendaalsesteeg 1, Wageningen, The Netherlands.
|